If an FX rate query requests a date that's later than the last date in the FX
rates file, forward-fill from the last value in the file rather than raising an
error.
We do this for a few reasons:
1. We'd like to gracefully handle the possibility of an FX rates file that's
older than another input file.
2. Relative to other non-erroring behaviors, forward-filling is the simplest
thing to implement. Specifically, it's what the implementation prior to this
change would do naturally if there weren't an explicit check to prevent it.
3. For an FX rates file containing prices on a 24/5 calendar, some amount of
forward-filling is required to handle any market with a non-weekday date.
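A minimal sketch of the forward-fill lookup described above, assuming the file's dates are sorted and stored next to a 1-D array of rates. The function name `lookup_rates_ffill` and its arguments are hypothetical and are not taken from the actual reader implementation.

```python
import numpy as np


def lookup_rates_ffill(file_dates, file_rates, query_dates):
    """Return the most recent rate at or before each query date.

    Queries past the last date in the file reuse the last row
    (forward-fill); queries before the first date get NaN.
    """
    file_dates = np.asarray(file_dates, dtype="datetime64[D]")
    query_dates = np.asarray(query_dates, dtype="datetime64[D]")
    file_rates = np.asarray(file_rates, dtype="float64")

    # Index of the last file date <= each query date; -1 means "before start".
    ix = np.searchsorted(file_dates, query_dates, side="right") - 1

    out = np.full(len(query_dates), np.nan)
    valid = ix >= 0
    out[valid] = file_rates[ix[valid]]
    return out
```

With a file ending on Friday 2020-01-03, a query for Saturday 2020-01-04 (or any later date) falls through to the Friday row, which is the same mechanism that covers weekend dates on a 24/5 calendar.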
- When reading before the start of data, return NaN. We do this because it's
hard to reliably apply a lower bound to the queried dates in core-loader
style pipeline loaders.
- When reading an unknown base currency, return NaN. We might get data from
third parties with unknown currencies. Doing so should not be an error.
- When reading after the end of data, emit an error rather than forward-filling
forever. We may want to revisit this in the future.
Rather than trying to use numpy S3 arrays everywhere, which is annoying in Python 3 and
makes it harder to represent missing data, just use object arrays with None as
the missing value. This is the representation we want anyway for loading
currency data in pipelines, and the main downsides are performance (which
doesn't appear to be meaningfully affected) and difficulty with sorting, which
we don't need to do (at least right now).
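To illustrate the trade-off, here is a small sketch comparing a fixed-width `S3` array with an object array that uses None for missing entries. The currency codes are made-up sample values.

```python
import numpy as np

# Fixed-width bytes dtype: under Python 3 the elements come back as bytes,
# and there is no natural way to mark an entry as missing.
s3 = np.array(["USD", "CAD"], dtype="S3")
first_s3 = s3[0]

# Object dtype: elements stay as plain str, and None marks a missing value.
codes = np.array(["USD", None, "CAD"], dtype=object)
missing = np.array([code is None for code in codes])
```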
Don't crash on queries for currency codes of possibly-unknown sids on calls to
`SessionBarReader.currency_codes`. When a currency code is requested for an
unknown sid, we return `zipline.currency.MISSING_CURRENCY_CODE` for that sid.
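A rough sketch of the non-crashing lookup, assuming the known codes are available as a dict keyed by sid. The `known_codes` argument is hypothetical, and the `"XXX"` value below is only a stand-in for the real constant defined in `zipline.currency`.

```python
import numpy as np

# Stand-in for zipline.currency.MISSING_CURRENCY_CODE; the value used here
# is an assumption for illustration only.
MISSING_CURRENCY_CODE = "XXX"


def currency_codes(known_codes, sids):
    """Look up one currency code per sid, tolerating unknown sids.

    known_codes is a hypothetical dict mapping sid -> currency code.
    """
    out = np.empty(len(sids), dtype=object)
    for i, sid in enumerate(sids):
        # Unknown sids get the sentinel instead of raising KeyError.
        out[i] = known_codes.get(sid, MISSING_CURRENCY_CODE)
    return out
```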
- Rather than using numpy's S3 and U3 types, which behave differently in py2
and py3, just expect object arrays of `str` everywhere.
This makes it slightly more expensive for us to read the currency index from
the file in py3, but that array is on the order of a hundred or so elements
total, so that's not a major concern, and it simplifies the handling
elsewhere.
Explicitly require that pipeline-compatible readers support 'default' as a
key. I'm currently just using the string 'default' for this. We should probably
move that value to a shared location, but it's not clear to me where that value
should live, since it defines an interface boundary between pipeline and fx
data.
- Add test coverage for non-scalar lookups.
- Performance optimizations for InMemoryFXRateReader and HDF5FXRateReader. We
no longer construct a DataFrame on each call to get_rates, since doing so is
surprisingly expensive and caused test_lookup_scalar to take more time than
was reasonable.
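One way to avoid building a DataFrame per call is to keep the raw arrays and a precomputed column map, then index directly. The class below is a hypothetical sketch, not the actual InMemoryFXRateReader or HDF5FXRateReader, and its simplified `get_rates(bases, dts)` signature is an assumption.

```python
import numpy as np


class ArrayBackedRates:
    """Hypothetical reader that indexes raw arrays instead of a DataFrame."""

    def __init__(self, dates, currencies, rates):
        self._dates = np.asarray(dates, dtype="datetime64[D]")
        # Built once, so each lookup is a dict access rather than an index build.
        self._col = {code: i for i, code in enumerate(currencies)}
        # Shape (n_dates, n_currencies).
        self._rates = np.asarray(rates, dtype="float64")

    def get_rates(self, bases, dts):
        dts = np.asarray(dts, dtype="datetime64[D]")
        # Row of the last stored date <= each query date; -1 is "before start".
        rows = np.searchsorted(self._dates, dts, side="right") - 1
        # Column per base currency; -1 marks an unknown currency.
        cols = np.array([self._col.get(b, -1) for b in bases], dtype=np.int64)

        out = np.full((len(dts), len(bases)), np.nan)
        ok_rows = rows >= 0
        ok_cols = cols >= 0
        out[np.ix_(ok_rows, ok_cols)] = self._rates[
            np.ix_(rows[ok_rows], cols[ok_cols])
        ]
        return out
```

Unknown base currencies and dates before the start of data both come back as NaN in this sketch, matching the behavior described for those cases.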
`test_unadjusted_get_value` and `test_unadjusted_get_value_no_data` were
hardcoded to reference US sids, so we were running checks through the
US machinery, even in the CA tests. This commit refactors both so that
they are general across countries.
- Adds a `holes` arg to make_bar_data() and expected_bar_values_2d(),
to allow adding gaps of missing values in the synthetic data.
- Updates test_unadjusted_get_value_no_data to check that missing values
within an asset's lifetime are returned as nan for OHLC, and 0.0 for
volume.
- Removes the ultra-hacky test_unadjusted_get_value_empty_value from
the bcolz daily bar tests, since test_unadjusted_get_value_no_data now
covers the same behavior.
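The expected read-back values for a hole can be sketched as follows. The helper `expected_values_with_holes` and its arguments are hypothetical; the actual `holes` argument of make_bar_data() and expected_bar_values_2d() may be shaped differently.

```python
import numpy as np


def expected_values_with_holes(values, dates, hole_dates, field):
    """Return the values a reader is expected to report once holes are applied.

    Hypothetical helper: on hole dates, OHLC fields read back as NaN and
    volume reads back as 0.0.
    """
    out = np.array(values, dtype="float64")  # copy, so the input is untouched
    mask = np.isin(dates, hole_dates)
    out[mask] = 0.0 if field == "volume" else np.nan
    return out
```

For example, with a hole on the middle of three sessions, a close series keeps its first and last values with NaN in between, while a volume series gets 0.0 in the same slot.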
Updates _DailyBarsTestCase to handle writing data for both US and CA
equities. The tests will then be run against the assets from the country
specified by the DAILY_BARS_TEST_QUERY_COUNTRY class attribute.
BcolzDailyBarTestCase uses 'US' to preserve its current behavior, and
HDF5DailyBarTestCase becomes HDF5DailyBarUSTestCase, with the new
HDF5DailyBarCanadaTestCase.