22 commits

Author SHA1 Message Date
gerrymanoim
c7df2e69f4 BLD: Try github actions again (#2743)
* BLD: Try github actions again

* new requirements for p36

* fix code that differs across numpy versions

* silence the correct warnings so that the tests run

* MAINT: Use loc instead of deprecated ix

* comment out windows for now

Co-authored-by: Richard Frank <rich@quantopian.com>
2020-08-17 12:05:36 -04:00
Richard Frank
54698f9c7a DOC: Updated http links to https 2020-08-17 10:17:46 -04:00
Scott Sanderson
94c936c0db PERF: Avoid unnecessary copy in LabelArray.
Avoid making an extra copy of non-C-contiguous arrays when factorizing inputs
to LabelArray. This requires taking care to ensure that we use the same memory
order both when ravelling and unravelling the input arrays.
2020-02-06 17:59:07 -05:00
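A minimal sketch of the idea in 94c936c0db, assuming NumPy's `np.unique` does the factorizing; `ravel_order` and `factorize` are hypothetical helpers, not zipline's code. The point is that the order used to flatten the input must also be used to reshape the codes, so a Fortran-ordered array can be flattened as a view instead of being copied into C order.

```python
import numpy as np

def ravel_order(arr):
    """'F' for Fortran-ordered inputs, else 'C' (hypothetical helper)."""
    if arr.flags.f_contiguous and not arr.flags.c_contiguous:
        return 'F'
    return 'C'

def factorize(arr):
    """Return (categories, codes) with codes shaped like arr."""
    order = ravel_order(arr)
    # For an input that is contiguous in the chosen order, ravel returns a view.
    flat = arr.ravel(order=order)
    categories, flat_codes = np.unique(flat, return_inverse=True)
    # Unravel with the SAME order used to ravel, or codes land in the wrong cells.
    return categories, flat_codes.reshape(arr.shape, order=order)

data = np.asfortranarray(
    np.array([['a', 'b', 'a'], ['c', 'a', 'b']], dtype=object)
)
categories, codes = factorize(data)
```

Reshaping with `order='C'` here would still produce a plausible-looking array of codes, just attached to the wrong positions, which is why using the same order in both directions matters.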
Frederick Wagner
ecb9d0fdd5 MAINT: Avoid flake8 variable name complaints
Newer versions of pycodestyle will emit errors (not even warnings!) for
variables named `l`.
2019-01-14 14:02:27 -05:00
Joe Jevnik
e5082cbdc9 BUG: Fix label array construction with known categories at an int size boundary 2018-02-05 17:43:32 -05:00
Joe Jevnik
cdaa8ceea6 TST: add regression test for labelarray category copy 2017-08-29 19:22:06 -04:00
Scott Sanderson
09cc54e08a ENH: Improve error message on bad return. 2017-06-07 17:07:19 -04:00
Scott Sanderson
709735de51 TEST: Test map returning None. 2017-06-07 15:28:15 -04:00
Scott Sanderson
6bb31b2544 TEST: Test map ignores missing with None. 2017-06-07 14:16:17 -04:00
Scott Sanderson
5b9d5fecfb ENH: Add relabel method to string classifiers.
- Adds a `map` method to `LabelArray` that maps a unary function over
  the categories of a LabelArray, shrinking the underlying codes if
  possible.

- Adds a new `.relabel` method to string-dtype classifiers that maps a
  unary function over the unique elements of the underlying LabelArray.
  This is useful for things like cleaning noisy label data.
2017-06-07 13:14:12 -04:00
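One way to picture the `map` described in 5b9d5fecfb: call the function once per category rather than once per element, merge categories that map to the same value, and narrow the code dtype afterwards. `map_categories` is a hypothetical helper, and using `np.min_scalar_type` for the narrowing is an assumption about what "shrinking the underlying codes" means.

```python
import numpy as np

def map_categories(categories, codes, f):
    """Map f over the categories of a (categories, codes) pair.

    Hypothetical sketch: f is called once per category, categories that map
    to the same value are merged, and the codes are narrowed afterwards.
    """
    new_categories = []
    position = {}  # mapped value -> its index in new_categories
    remap = np.empty(len(categories), dtype=np.intp)
    for old_code, label in enumerate(categories):
        mapped = f(label)
        if mapped not in position:
            position[mapped] = len(new_categories)
            new_categories.append(mapped)
        remap[old_code] = position[mapped]
    # One fancy-indexing pass translates every element's old code.
    new_codes = remap[codes]
    # Assumed meaning of "shrinking": use the smallest type that fits.
    narrow = np.min_scalar_type(max(len(new_categories) - 1, 0))
    return np.array(new_categories, dtype=object), new_codes.astype(narrow)

categories = np.array(['aapl', 'AAPL', 'msft'], dtype=object)
codes = np.array([0, 1, 2, 1, 0])
new_categories, new_codes = map_categories(categories, codes, str.upper)
```

Handling of `None` return values and missing entries (exercised by the tests in 709735de51 and 6bb31b2544) is left out of this sketch.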
Joe Jevnik
153f6636c7 BUG: fix label array code dtype condense 2017-03-08 20:54:57 -05:00
Joe Jevnik
35338df2b7 TST: add roundtrip check 2017-03-01 15:15:16 -05:00
Joe Jevnik
d66309a3a0 TST: add tests for inferred width labelarray 2017-02-07 16:28:13 -05:00
Joe Jevnik
82361e0542 ENH: store the 'codes' for a labelarray in the narrowest int type possible 2017-02-02 20:58:36 -05:00
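A sketch of choosing the narrowest integer type for the codes, as in 82361e0542. `narrowest_code_dtype` is hypothetical and uses unsigned types only; the real LabelArray may reserve a code for the missing value or pick different types. Codes run from 0 to n - 1, so n = 256 categories still fits in `uint8` while n = 257 does not, and those boundary cases are the ones worth testing explicitly.

```python
import numpy as np

def narrowest_code_dtype(n_categories):
    """Smallest unsigned integer dtype able to hold codes 0..n_categories-1.

    Hypothetical helper; it does not account for any reserved missing code.
    """
    largest_code = max(n_categories - 1, 0)
    for dtype in (np.uint8, np.uint16, np.uint32, np.uint64):
        if largest_code <= np.iinfo(dtype).max:
            return np.dtype(dtype)
    raise ValueError("too many categories: %d" % n_categories)
```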
Scott Sanderson
0c550dc592 MAINT: Fix warnings from numpy labelarray methods. 2016-09-20 17:12:07 -04:00
Scott Sanderson
e0aeda4c3e BUG: Fix bytes/unicode issues in py3. 2016-05-05 01:46:35 -04:00
Scott Sanderson
a29da32252 TEST: Don't assert particular numpy error.
They change from version to version.
2016-05-04 19:40:50 -04:00
Scott Sanderson
620d7648b0 BUG: Tests/bugfixes for LabelArray slicing.
- Fixes a bug where __setitem__ was not called when setting with a slice
  on Python 2 (__setslice__ was called instead), which caused strange
  behavior when setting an empty string.  This is fixed by overriding
  __setslice__ and forwarding to __setitem__.

- Fixes a bug where __getitem__ returned an instance of np.void when
  returning a scalar.  We now correctly return an entry from our
  categoricals.
2016-05-04 15:54:50 -04:00
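A toy class illustrating both fixes in 620d7648b0, assuming LabelArray-like storage of integer codes plus a categories array; `MiniLabels` is hypothetical and, unlike LabelArray, is not an `np.ndarray` subclass. Scalar indexing looks the code up in the categories, and `__setslice__` simply forwards to `__setitem__` (Python 3 never calls `__setslice__`, so the override is harmless there).

```python
import numpy as np

class MiniLabels(object):
    """Hypothetical stand-in for LabelArray: integer codes plus categories."""

    def __init__(self, codes, categories):
        self.codes = np.asarray(codes)
        self.categories = np.asarray(categories, dtype=object)

    def __getitem__(self, key):
        result = self.codes[key]
        if np.ndim(result) == 0:
            # Scalar access: return the label itself, not a raw storage scalar.
            return self.categories[result]
        return MiniLabels(result, self.categories)

    def __setitem__(self, key, value):
        # Translate the label into its code before writing.
        self.codes[key] = list(self.categories).index(value)

    def __setslice__(self, i, j, value):
        # Only Python 2 calls __setslice__ for a[i:j] = v; forward it so both
        # paths share the same __setitem__ logic.
        self.__setitem__(slice(i, j), value)

labels = MiniLabels([0, 1, 0], ['', 'a'])
labels[0:2] = 'a'  # on Python 3 this goes straight to __setitem__
```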
Scott Sanderson
8de45540f2 ENH: NaN semantics for LabelArray missing values. 2016-05-04 15:54:50 -04:00
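A sketch of one reading of "NaN semantics" for missing values: like a float NaN, a missing label compares unequal to everything, including another missing label. That interpretation, the reserved code 0, and the helper `labels_equal` are all assumptions for illustration.

```python
import numpy as np

MISSING_CODE = 0  # assumption: code 0 is reserved for the missing value

def labels_equal(codes, target_code, missing_code=MISSING_CODE):
    """Elementwise equality on integer codes with NaN-like missing values."""
    # Missing entries compare False even when the target is also missing.
    return (codes == target_code) & (codes != missing_code)

codes = np.array([0, 1, 2, 1])  # the first entry is missing
```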
Scott Sanderson
2395cbb671 ENH: Use np.void for labelarray storage.
This disables most of the ufuncs that were broken on LabelArray.
2016-05-04 15:54:50 -04:00
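A small demonstration of why `np.void` storage disables ufuncs, assuming one byte per code: viewing the bytes as an opaque void dtype leaves no numeric ufunc loop to dispatch to, so arithmetic raises `TypeError`, while the codes stay recoverable through another view. This shows the NumPy mechanism only, not zipline's exact storage layout.

```python
import numpy as np

codes = np.array([0, 2, 1, 2], dtype=np.uint8)

# Reinterpret the same bytes as opaque 1-byte void items (a view, no copy).
opaque = codes.view('V1')

# Numeric ufuncs have no loops for the void dtype, so arithmetic is refused.
try:
    opaque + opaque
    add_blocked = False
except TypeError:
    add_blocked = True

# The integer codes are still available through another view.
recovered = opaque.view(np.uint8)
```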
Scott Sanderson
7a65121e6e BUG: contains was renamed to has_substring 2016-05-04 15:54:50 -04:00
Scott Sanderson
5f190395ad ENH: Add support for strings in Pipeline.
- Adds a new class, ``LabelArray``, which is a subclass of np.ndarray.
  LabelArray is conceptually similar to pandas.Categorical, in that it
  stores data with many duplicate values as indices into an array of
  unique values.  For string data with many duplicates (e.g. time-series
  of tickers or industry classifications), this provides multiple
  orders of magnitude of improvement when doing string operations,
  especially string comparison/matching operations.

- Adds a new generic object "specialization" for `AdjustedArrayWindow`,
  and a corresponding ObjectOverwrite adjustment.

- Adds a new ``postprocess`` method to ``zipline.pipeline.term.Term``.
  This method is called on the final result of any pipeline expression
  after screen filtering has occurred. The default implementation of
  ``postprocess`` is identity, but Classifier overrides it to coerce
  string columns into pandas.Categoricals before presenting them to the
  user.
2016-05-04 15:50:52 -04:00
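A sketch of why categorical storage speeds up string operations, using `np.unique` for the factorization: equality against a label becomes one integer comparison over the codes, and a substring test runs once per unique string and is then broadcast through the codes. `labels_eq` and `labels_has_substring` are hypothetical helpers, not the Pipeline API.

```python
import numpy as np

# Hypothetical ticker column with many repeated strings.
tickers = np.array(['AAPL', 'MSFT', 'AAPL', 'IBM', 'MSFT', 'AAPL'], dtype=object)

# Factorize once: a few unique strings plus one integer code per element.
categories, codes = np.unique(tickers, return_inverse=True)

def labels_eq(categories, codes, value):
    """labels == value, done as one integer comparison over the codes."""
    hits = np.flatnonzero(categories == value)
    if hits.size == 0:
        return np.zeros(codes.shape, dtype=bool)
    return codes == hits[0]

def labels_has_substring(categories, codes, sub):
    """Test each unique string once, then broadcast through the codes."""
    per_category = np.array([sub in label for label in categories], dtype=bool)
    return per_category[codes]
```

For the `postprocess` step, `pandas.Categorical.from_codes(codes, categories)` is one way to build a `pandas.Categorical` from this representation; whether zipline uses that constructor is not stated here.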