{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Zipline beginner tutorial\n", "=========================\n", "\n", "Basics\n", "------\n", "\n", "Zipline is an open-source algorithmic trading simulator written in Python.\n", "\n", "The source can be found at: https://github.com/quantopian/zipline\n", "\n", "Some benefits include:\n", "\n", "* Realistic: slippage, transaction costs, order delays.\n", "* Stream-based: Processes each event individually, which avoids look-ahead bias.\n", "* Batteries included: Common transforms (moving average) as well as common risk calculations (Sharpe).\n", "* Developed and continuously updated by [Quantopian](https://www.quantopian.com), which provides an easy-to-use web interface to Zipline, 10 years of minute-resolution historical US stock data, and live-trading capabilities. This tutorial is aimed at users who want to use Zipline without Quantopian. If you instead want to get started on Quantopian, see [here](https://www.quantopian.com/faq#get-started).\n", "\n", "This tutorial assumes that you have zipline correctly installed; see the [installation instructions](https://github.com/quantopian/zipline#installation) if you haven't set up zipline yet.\n", "\n", "Every `zipline` algorithm consists of two functions you have to define:\n", "* `initialize(context)`\n", "* `handle_data(context, data)`\n", "\n", "Before the algorithm starts, `zipline` calls the `initialize()` function and passes in a `context` variable. `context` is a persistent namespace in which you can store variables you need to access from one algorithm iteration to the next.\n", "\n", "After the algorithm has been initialized, `zipline` calls the `handle_data()` function once for each event. At every call, it passes the same `context` variable and an event-frame called `data` containing the current trading bar with open, high, low, and close (OHLC) prices as well as volume for each stock in your universe. 
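The calling sequence described above can be sketched in plain Python. This is a hedged, minimal toy rather than zipline itself: `run_simulation()`, the `record()` stand-in and the dict-based `bars` are hypothetical names introduced for illustration, and real zipline supplies its own `context` and `data` objects with full OHLC and volume fields.

```python
# A toy, pure-Python imitation of the initialize()/handle_data() calling
# sequence. run_simulation(), record() and the bars list are hypothetical
# stand-ins for illustration; this is not zipline code.
from types import SimpleNamespace

recorded = []  # one dict per bar, filled by the record() stand-in


def record(**values):
    # Stand-in for the idea behind zipline's record(): save named values per bar.
    recorded.append(values)


def initialize(context):
    # Called once, before the first bar: set up state that must persist.
    context.bars_seen = 0


def handle_data(context, data):
    # Called once per bar, always with the same context object.
    context.bars_seen += 1
    record(AAPL=data['AAPL']['close'], bars_seen=context.bars_seen)


def run_simulation(bars):
    # One initialize() call, then one handle_data() call per event, in order,
    # so the algorithm only ever sees the current and past bars.
    context = SimpleNamespace()
    initialize(context)
    for bar in bars:
        handle_data(context, bar)
    return context


# Toy bars holding only a close price; the values are copied from the AAPL
# column of the performance table later in this notebook.
bars = [
    {'AAPL': {'close': 3.738314}},
    {'AAPL': {'close': 3.423135}},
    {'AAPL': {'close': 3.473229}},
]

ctx = run_simulation(bars)
print(ctx.bars_seen)
print(recorded[-1])
```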
For more information on these functions, see the [relevant part of the Quantopian docs](https://www.quantopian.com/help#api-toplevel)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "My first algorithm\n", "----------------------\n", "\n", "Let's take a look at a very simple algorithm from the `examples` directory, `buyapple.py`:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " # Load price data from yahoo.\r\n", " data = load_from_yahoo(stocks=['AAPL'], indexes={}, start=start,\r\n", " end=end)\r\n", "\r\n", " # Create and run the algorithm.\r\n", " algo = TradingAlgorithm(initialize=initialize, handle_data=handle_data,\r\n", " identifiers=['AAPL'])\r\n", " results = algo.run(data)\r\n", "\r\n", " analyze(results=results)\r\n" ] } ], "source": [ "!tail ../../zipline/examples/buyapple.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In `buyapple.py`, we first import the functions we would like to use. All functions commonly used in your algorithm can be found in `zipline.api`. Here we are using `order()`, which takes two arguments: a security object and a number specifying how many shares you would like to order (if the number is negative, `order()` will sell or short shares). In this case we want to order 10 shares of Apple at each iteration. For more documentation on `order()`, see the [Quantopian docs](https://www.quantopian.com/help#api-order).\n", "\n", "You don't have to use the `symbol()` function and could just pass in `AAPL` directly, but using it is good practice because it keeps your code Quantopian-compatible.\n", "\n", "Finally, the `record()` function allows you to save the value of a variable at each iteration. You provide it with a name for the variable together with the variable itself: `varname=var`. 
After the algorithm has finished running, you will have access to each value you tracked with `record()` under the name you provided (we will see this further below). The algorithm also shows how to access the current price data of the AAPL stock in the `data` event frame (for more information, see [here](https://www.quantopian.com/help#api-event-properties))." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Running the algorithm\n", "\n", "To test this algorithm on financial data, `zipline` provides two interfaces: a command-line interface and an `IPython Notebook` interface.\n", "\n", "### Command line interface\n", "After installing zipline, you should be able to execute the following from your command line (e.g. `cmd.exe` on Windows, or the Terminal app on OS X):" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "usage: run_algo.py [-h] [-c FILE] [--algofile ALGOFILE] [--data-frequency {minute,daily}] [--start START] [--end END]\r\n", " [--capital_base CAPITAL_BASE] [--source {yahoo}] [--source_time_column SOURCE_TIME_COLUMN] [--symbols SYMBOLS]\r\n", " [--output OUTPUT] [--metadata_path METADATA_PATH] [--metadata_index METADATA_INDEX] [--print-algo] [--no-print-algo]\r\n", "\r\n", "Zipline version 0.8.0rc1.\r\n", "\r\n", "optional arguments:\r\n", " -h, --help show this help message and exit\r\n", " -c FILE, --conf_file FILE\r\n", " Specify config file\r\n", " --algofile ALGOFILE, -f ALGOFILE\r\n", " --data-frequency {minute,daily}\r\n", " --start START, -s START\r\n", " --end END, -e END\r\n", " --capital_base CAPITAL_BASE\r\n", " --source {yahoo}, -d {yahoo}\r\n", " --source_time_column SOURCE_TIME_COLUMN, -t SOURCE_TIME_COLUMN\r\n", " --symbols SYMBOLS\r\n", " --output OUTPUT, -o OUTPUT\r\n", " --metadata_path METADATA_PATH, -m METADATA_PATH\r\n", " --metadata_index METADATA_INDEX, -x METADATA_INDEX\r\n", " --print-algo, -p\r\n", "
 --no-print-algo, -q\r\n" ] } ], "source": [ "!run_algo.py --help" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that you have to omit the preceding '!' when you call `run_algo.py`; it is only required by the IPython Notebook in which this tutorial was written.\n", "\n", "As you can see, there are a couple of flags that specify where to find your algorithm (`-f`), as well as parameters specifying which stock data to load from Yahoo! finance (`--symbols`) and the time range (`--start` and `--end`). Finally, you'll want to save the performance metrics of your algorithm so that you can analyze how it performed. This is done via the `--output` flag, which writes the performance `DataFrame` to a file in Python's pickle format. Note that you can also define a configuration file with these parameters and pass it to the `-c` option, so that you don't have to supply the command-line arguments every time (see the .conf files in the examples directory).\n", "\n", "Thus, to execute our algorithm from above and save the results to `buyapple_out.pickle`, we would call `run_algo.py` as follows:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "AAPL\n", "[2015-11-04 22:45:32.820166] INFO: Performance: Simulated 3521 trading days out of 3521.\n", "[2015-11-04 22:45:32.820314] INFO: Performance: first open: 2000-01-03 14:31:00+00:00\n", "[2015-11-04 22:45:32.820401] INFO: Performance: last close: 2013-12-31 21:00:00+00:00\n" ] } ], "source": [ "!run_algo.py -f ../../zipline/examples/buyapple.py --start 2000-1-1 --end 2014-1-1 --symbols AAPL -o buyapple_out.pickle" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`run_algo.py` first outputs the algorithm contents. It then fetches historical price and volume data for Apple from Yahoo!
finance in the desired time range, calls the `initialize()` function, and then streams the historical stock prices day by day through `handle_data()`. In each call to `handle_data()`, we instruct `zipline` to order 10 shares of AAPL. When `order()` is called, `zipline` enters the ordered stock and amount in the order book. After the `handle_data()` function has finished, `zipline` looks for any open orders and tries to fill them. If the trading volume for this stock is high enough, the order is executed after adding the commission and applying the slippage model, which models the influence of your order on the stock price, so your algorithm will be charged more than just the stock price * 10. (Note that you can also change the commission and slippage models that `zipline` uses; see the [Quantopian docs](https://www.quantopian.com/help#ide-slippage) for more information.)\n", "\n", "Note that the printed script also calls an `analyze()` function. `run_algo.py` looks either for a file with the same name as the algorithm ending in `_analyze.py` (here, `buyapple_analyze.py`) or for an `analyze()` function directly in the script. If an `analyze()` function is found, it is called *after* the simulation has finished and is passed the performance `DataFrame`. (The reason for allowing an `analyze()` function in a separate file is that `buyapple.py` then remains a valid Quantopian algorithm that you can copy and paste to the platform.)\n", "\n", "Let's take a quick look at the performance `DataFrame`. For this, we use `pandas` from inside the IPython Notebook and print the first five rows. Note that `zipline` makes heavy use of `pandas`, especially for data input and output, so it's worth spending some time learning it." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
|  | AAPL | algo_volatility | algorithm_period_return | alpha | benchmark_period_return | benchmark_volatility | beta | capital_used | ending_cash | ending_exposure | ... | short_exposure | short_value | shorts_count | sortino | starting_cash | starting_exposure | starting_value | trading_days | transactions | treasury_period_return |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000-01-03 21:00:00 | 3.738314 | 0.000000e+00 | 0.000000e+00 | -0.065800 | -0.009549 | 0.000000 | 0.000000 | 0.00000 | 10000000.00000 | 0.00000 | ... | 0 | 0 | 0 | 0.000000 | 10000000.00000 | 0.00000 | 0.00000 | 1 | [] | 0.0658 |
| 2000-01-04 21:00:00 | 3.423135 | 3.367492e-07 | -3.000000e-08 | -0.064897 | -0.047528 | 0.323229 | 0.000001 | -34.53135 | 9999965.46865 | 34.23135 | ... | 0 | 0 | 0 | 0.000000 | 10000000.00000 | 0.00000 | 0.00000 | 2 | [{u'order_id': u'513357725cb64a539e3dd02b47da7... | 0.0649 |
| 2000-01-05 21:00:00 | 3.473229 | 4.001918e-07 | -9.906000e-09 | -0.066196 | -0.045697 | 0.329321 | 0.000001 | -35.03229 | 9999930.43636 | 69.46458 | ... | 0 | 0 | 0 | 0.000000 | 9999965.46865 | 34.23135 | 34.23135 | 3 | [{u'order_id': u'd7d4ad03cfec4d578c0d817dc3829... | 0.0662 |
| 2000-01-06 21:00:00 | 3.172661 | 4.993979e-06 | -6.410420e-07 | -0.065758 | -0.044785 | 0.298325 | -0.000006 | -32.02661 | 9999898.40975 | 95.17983 | ... | 0 | 0 | 0 | -12731.780516 | 9999930.43636 | 69.46458 | 69.46458 | 4 | [{u'order_id': u'1fbf5e9bfd7c4d9cb2e8383e1085e... | 0.0657 |
| 2000-01-07 21:00:00 | 3.322945 | 5.977002e-06 | -2.201900e-07 | -0.065206 | -0.018908 | 0.375301 | 0.000005 | -33.52945 | 9999864.88030 | 132.91780 | ... | 0 | 0 | 0 | -12629.274583 | 9999898.40975 | 95.17983 | 95.17983 | 5 | [{u'order_id': u'9ea6b142ff09466b9113331a37437... | 0.0652 |
5 rows × 39 columns
|  | AAPL | algo_volatility | algorithm_period_return | alpha | benchmark_period_return | benchmark_volatility | beta | capital_used | ending_cash | ending_exposure | ... | short_exposure | short_value | shorts_count | sortino | starting_cash | starting_exposure | starting_value | trading_days | transactions | treasury_period_return |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000-01-03 21:00:00 | 3.738314 | 0.000000e+00 | 0.000000e+00 | -0.065800 | -0.009549 | 0.000000 | 0.000000 | 0.00000 | 10000000.00000 | 0.00000 | ... | 0 | 0 | 0 | 0.000000 | 10000000.00000 | 0.00000 | 0.00000 | 1 | [] | 0.0658 |
| 2000-01-04 21:00:00 | 3.423135 | 3.367492e-07 | -3.000000e-08 | -0.064897 | -0.047528 | 0.323229 | 0.000001 | -34.53135 | 9999965.46865 | 34.23135 | ... | 0 | 0 | 0 | 0.000000 | 10000000.00000 | 0.00000 | 0.00000 | 2 | [{u'commission': 0.3, u'amount': 10, u'sid': 0... | 0.0649 |
| 2000-01-05 21:00:00 | 3.473229 | 4.001918e-07 | -9.906000e-09 | -0.066196 | -0.045697 | 0.329321 | 0.000001 | -35.03229 | 9999930.43636 | 69.46458 | ... | 0 | 0 | 0 | 0.000000 | 9999965.46865 | 34.23135 | 34.23135 | 3 | [{u'commission': 0.3, u'amount': 10, u'sid': 0... | 0.0662 |
| 2000-01-06 21:00:00 | 3.172661 | 4.993979e-06 | -6.410420e-07 | -0.065758 | -0.044785 | 0.298325 | -0.000006 | -32.02661 | 9999898.40975 | 95.17983 | ... | 0 | 0 | 0 | -12731.780516 | 9999930.43636 | 69.46458 | 69.46458 | 4 | [{u'commission': 0.3, u'amount': 10, u'sid': 0... | 0.0657 |
| 2000-01-07 21:00:00 | 3.322945 | 5.977002e-06 | -2.201900e-07 | -0.065206 | -0.018908 | 0.375301 | 0.000005 | -33.52945 | 9999864.88030 | 132.91780 | ... | 0 | 0 | 0 | -12629.274583 | 9999898.40975 | 95.17983 | 95.17983 | 5 | [{u'commission': 0.3, u'amount': 10, u'sid': 0... | 0.0652 |
5 rows × 39 columns
\n", "