Add partial fit example and caveats to notebook (#1486)

* Add partial fit example and caveats to notebook

* correct typos

* Shift example to separate ipynb

* fix typos

* rename notebook
This commit is contained in:
Ryan Nazareth 2020-05-15 05:59:37 +01:00 committed by GitHub
parent 0dd8e522f1
commit 3578e93062
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
2 changed files with 120 additions and 15 deletions

View file

@ -0,0 +1,106 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/ryannazareth/Documents/Python_sprints/prophet/python/fbprophet/diagnostics.py:10: TqdmExperimentalWarning: Using `tqdm.autonotebook.tqdm` in notebook mode. Use `tqdm.tqdm` instead to force console mode (e.g. in jupyter console)\n",
" from tqdm.autonotebook import tqdm\n"
]
}
],
"source": [
"%load_ext rpy2.ipython\n",
"%matplotlib inline\n",
"import pandas as pd\n",
"from fbprophet import Prophet\n",
"import logging\n",
"logging.getLogger('fbprophet').setLevel(logging.ERROR)\n",
"import warnings\n",
"warnings.filterwarnings(\"ignore\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Partial Fitting "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Prophet also allows the option of partial fitting i.e. using a previous model's fitted parameters to initialize parameters of a new model. This could be useful when the model needs to be re-trained with new data coming in e.g. online learning. This works best when the newly added data follows the same trend as the history that has been previously fitted. An example is shown below in Python using the Peyton Manning dataset introduced in the <a href=\"https://facebook.github.io/prophet/docs/quick_start.html#python-api\">Quick Start</a>. In this case, a model `m1` is initially fit to `df1` with two years less history. A new model `m2` is then fit to `df` with full history, with parameters initialised to `m1`parameter values. These are passed to the `init` keyword as a dictionary by calling `stan_init`. Depending on the dataset, this can lead to an improvement in training time, as the parameters passed downstream to Stan's optimizing function have a more optimal initialization from the previous model's fit. In this case, we get over 20% improvement in training time compared to fitting model `m` to `df` with default parameter initialization (without partial fitting)."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2.41 s ± 52.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n",
"3.06 s ± 35.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"def stan_init(m):\n",
" res = {}\n",
" for pname in ['k', 'm', 'sigma_obs']:\n",
" res[pname] = m.params[pname][0][0]\n",
" for pname in ['delta', 'beta']:\n",
" res[pname] = m.params[pname][0]\n",
" return res\n",
"\n",
"df = pd.read_csv('../examples/example_wp_log_peyton_manning.csv')\n",
"df1 = df.loc[df['ds'] < '2014-01-21', :]\n",
"m1 = Prophet()\n",
"m1.fit(df1)\n",
"\n",
"%timeit m2 = Prophet().fit(df, init=stan_init(m1))\n",
"%timeit m = Prophet().fit(df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However, there are few caveats that need to kept in mind with this approach, which could lead to a bad model fit and worse results than using the default intiialization.\n",
"\n",
"* The number of changepoints need to be consistent from one model to the next. Otherwise, an error will be generated because the changepoint prior parameter `delta` will be the wrong size.\n",
"* If the locations of the changepoints in time have changed greatly, this may do worse than the default initialization because the initial trend may be very bad."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python (fbprophet)",
"language": "python",
"name": "fbprophet"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View file

@ -6,23 +6,14 @@
"metadata": {
"block_hidden": true
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/usr/local/lib/python3.7/site-packages/rpy2/robjects/pandas2ri.py:15: FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace.\n",
" from pandas.core.index import Index as PandasIndex\n"
]
}
],
"outputs": [],
"source": [
"%load_ext rpy2.ipython\n",
"%matplotlib inline\n",
"import logging\n",
"logging.getLogger('fbprophet').setLevel(logging.ERROR)\n",
"import warnings\n",
"warnings.filterwarnings(\"ignore\")"
"warnings.filterwarnings(\"ignore\")\n"
]
},
{
@ -49,7 +40,15 @@
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Importing plotly failed. Interactive plots will not work.\n"
]
}
],
"source": [
"import pandas as pd\n",
"from fbprophet import Prophet"
@ -48435,9 +48434,9 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python (fbprophet)",
"language": "python",
"name": "python3"
"name": "fbprophet"
},
"language_info": {
"codemirror_mode": {
@ -48449,7 +48448,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
"version": "3.7.0"
}
},
"nbformat": 4,