Add serialization documentation

Ben Letham 2020-08-19 17:36:00 -07:00
parent 6c3a8a8e07
commit 809c1b5662

@ -2,18 +2,125 @@
"cells": [
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 9,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The rpy2.ipython extension is already loaded. To reload it, use:\n",
" %reload_ext rpy2.ipython\n"
]
},
{
"data": {
"text/plain": [
"<fbprophet.forecaster.Prophet at 0x7f7794ac16d0>"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%load_ext rpy2.ipython\n",
"%matplotlib inline\n",
"import pandas as pd\n",
"import numpy as np\n",
"from fbprophet import Prophet\n",
"import logging\n",
"logging.getLogger('fbprophet').setLevel(logging.ERROR)\n",
"import warnings\n",
"warnings.filterwarnings(\"ignore\")"
"warnings.filterwarnings(\"ignore\")\n",
"\n",
"df = pd.DataFrame({\n",
" 'ds': pd.date_range(start='2020-01-01', periods=20),\n",
" 'y': np.arange(20),\n",
"})\n",
"m = Prophet(weekly_seasonality=False)\n",
"m.fit(df)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: Disabling yearly seasonality. Run prophet with yearly.seasonality=TRUE to override this.\n",
"\n",
"WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.\n",
"\n",
"WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: n.changepoints greater than number of observations. Using 15\n",
"\n"
]
}
],
"source": [
"%%R\n",
"library(prophet)\n",
"df <- data.frame(\n",
" ds=seq(as.Date(\"2020-01-01\"), by = \"day\", length.out = 20),\n",
" y=1:20\n",
")\n",
"m <- prophet(df, weekly.seasonality=FALSE)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Saving models\n",
"\n",
"It is possible to save fitted Prophet models so that they can be loaded and used later.\n",
"\n",
"In R, this is done with `saveRDS` and `readRDS`:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"%%R\n",
"saveRDS(m, file=\"model.RDS\") # Save model\n",
"m <- readRDS(file=\"model.RDS\") # Load model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In Python, models should not be saved with pickle; the Stan backend attached to the model object will not pickle well, and will produce issues under certain versions of Python. Instead, you should use the built-in serialization functions to serialize the model to json:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"from fbprophet.serialize import model_to_json, model_from_json\n",
"\n",
"with open('serialized_model.json', 'w') as fout:\n",
" json.dump(model_to_json(m), fout) # Save model\n",
"\n",
"with open('serialized_model.json', 'r') as fin:\n",
" m = model_from_json(json.load(fin)) # Load model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The json file will be portable across systems, and deserialization is backwards compatible with older versions of fbprophet."
]
},
{
@ -47,45 +154,39 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Partial Fitting "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Prophet also allows the option of partial fitting i.e. using a previous model's fitted parameters to initialize parameters of a new model. This could be useful when the model needs to be re-trained with new data coming in e.g. online learning. This works best when the newly added data follows the same trend as the history that has been previously fitted. An example is shown below in Python using the Peyton Manning dataset introduced in the <a href=\"https://facebook.github.io/prophet/docs/quick_start.html#python-api\">Quick Start</a>. In this case, a model `m1` is initially fit to `df1` with two years less history. A new model `m2` is then fit to `df` with full history, with parameters initialised to `m1`parameter values. These are passed to the `init` keyword as a dictionary by calling `stan_init`. Depending on the dataset, this can lead to an improvement in training time, as the parameters passed downstream to Stan's optimizing function have a more optimal initialization from the previous model's fit. In this case, we get over 20% improvement in training time compared to fitting model `m` to `df` with default parameter initialization (without partial fitting)."
"### Updating fitted models\n",
"\n",
"A common setting for forecasting is fitting models that need to be updated as additional data come in. Prophet models can only be fit once, and a new model must be re-fit when new data become available. In most settings, model fitting is fast enough that there isn't any issue with re-fitting from scratch. However, it is possible to speed things up a little by warm-starting the fit from the model parameters of the earlier model. This code example shows how this can be done in Python:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2.41 s ± 52.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n",
"3.06 s ± 35.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
"1.26 s ± 21.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n",
"716 ms ± 7.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"def stan_init(m):\n",
" \"\"\"Retrieving parameters from a trained model.\n",
" \"\"\"Retrieve parameters from a trained model.\n",
" \n",
" Retrieved parameters from a trained model \n",
" of the Prophet() Class,are used to initialise \n",
" parameters for a new model. This can help \n",
" speed up training, especially if new data\n",
" follows the same trend as the historical data.\n",
" Retrieve parameters from a trained model in the format\n",
" used to initialize a new Stan model.\n",
" \n",
" @Param\n",
" m: A trained model of the Prophet() Class\n",
" Parameters\n",
" ----------\n",
" m: A trained model of the Prophet class.\n",
" \n",
" @Return\n",
" res: A Dictionary containing retrieved parameters of m\n",
" Returns\n",
" -------\n",
" A Dictionary containing retrieved parameters of m.\n",
" \n",
" \"\"\"\n",
" res = {}\n",
@ -96,22 +197,21 @@
" return res\n",
"\n",
"df = pd.read_csv('../examples/example_wp_log_peyton_manning.csv')\n",
"df1 = df.loc[df['ds'] < '2014-01-21', :]\n",
"m1 = Prophet()\n",
"m1.fit(df1)\n",
"df1 = df.loc[df['ds'] < '2016-01-19', :] # All data except the last day\n",
"m1 = Prophet().fit(df1) # A model fit to all data except the last day\n",
"\n",
"%timeit m2 = Prophet().fit(df, init=stan_init(m1))\n",
"%timeit m = Prophet().fit(df)"
"\n",
"%timeit m2 = Prophet().fit(df) # Adding the last day, fitting from scratch\n",
"%timeit m2 = Prophet().fit(df, init=stan_init(m1)) # Adding the last day, warm-starting from m1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However, there are few caveats that need to kept in mind with this approach, which could lead to a bad model fit and worse results than using the default intiialization.\n",
"As can be seen, the parameters from the previous model are passed in to the fitting for the next with the kwarg `init`. In this case, model fitting was almost 2x faster when using warm starting. The speedup will generally depend on how much the optimal model parameters have changed with the addition of the new data.\n",
"\n",
"* The number of changepoints need to be consistent from one model to the next. Otherwise, an error will be generated because the changepoint prior parameter `delta` will be the wrong size.\n",
"* If the locations of the changepoints in time have changed greatly, this may do worse than the default initialization because the initial trend may be very bad."
"There are few caveats that should be kept in mind when considering warm-starting. First, warm-starting may work well for small updates to the data (like the addition of one day in the example above) but can be worse than fitting from scratch if there are large changes to the data (i.e., a lot of days have been added). This is because when a large amount of history is added, the location of the changepoints will be very different between the two models, and so the parameters from the previous model may actually produce a bad trend initialization. Second, as a detail, the number of changepoints need to be consistent from one model to the next or else an error will be raised because the changepoint prior parameter `delta` will be the wrong size."
]
}
],