plateau's Issues

Plateau errors when loading a partition with only `None` as a category

Problem description

When loading a column as a category and one of the partitions contains only None values, loading fails with `ValueError: Categorical categories cannot be null`. This is actually a bug in pyarrow, linked here.

It would be nice to patch this issue for the time being.
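A possible stopgap is sketched below. It assumes the patch is applied where dictionary-encoded data (integer codes plus a dictionary of values) is turned into a pandas Categorical. Null entries are dropped from the dictionary, and every code that pointed at one is remapped to -1, which is pandas' code for a missing value. `categorical_from_dictionary` is a hypothetical helper, not part of plateau or pyarrow.

```python
import numpy as np
import pandas as pd


def categorical_from_dictionary(codes, dictionary) -> pd.Categorical:
    """Build a pandas Categorical from dictionary-encoded values.

    Hypothetical helper: null entries in ``dictionary`` are dropped and every
    code pointing at them becomes -1 (missing). Without this, pandas raises
    "Categorical categories cannot be null".
    """
    codes = np.asarray(codes, dtype=np.int64)
    dictionary = pd.Index(dictionary, dtype=object)
    null_mask = np.asarray(dictionary.isna())
    if not null_mask.any():
        return pd.Categorical.from_codes(codes, categories=dictionary)

    # Old dictionary position -> new position among non-null categories (-1 for nulls).
    remap = np.full(len(dictionary), -1, dtype=np.int64)
    remap[~null_mask] = np.arange(int((~null_mask).sum()))
    # Codes that are already -1 (missing) stay -1.
    new_codes = np.where(codes >= 0, remap[codes], -1)
    return pd.Categorical.from_codes(new_codes, categories=dictionary[~null_mask])
```

For an all-None partition like the one in the report, `categorical_from_dictionary([0, 0], [None])` returns two missing values with no categories. By contrast, `pd.Categorical.from_codes([0, 0], categories=[None])` raises the `ValueError` quoted above.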

Example code (ideally copy-pastable)

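from functools import partial
import minimalkv
import pandas as pd
# Module path assumed (plateau.io.iter); it was missing from the original snippet.
from plateau.io.iter import read_dataset_as_dataframes__iterator, store_dataframes_as_dataset__iter

df = pd.DataFrame(
    {
        "x": ["a", "b", None, None],
        "chunk": [0, 0, 1, 1],
    }
)

path = "/tmp"
store = partial(minimalkv.get_store_from_url, f"hfs://{path}?create_if_missing=false")

# One partition per chunk; in the second one, "x" is all None. The dataset_uuid is illustrative.
store_dataframes_as_dataset__iter(
    df_generator=(df.loc[lambda x: x["chunk"] == chunk] for chunk in [0, 1]),
    store=store,
    dataset_uuid="categorical_none",
)

dfs = read_dataset_as_dataframes__iterator(
    dataset_uuid="categorical_none", store=store, categoricals=["x"]
)

# Consuming the iterator fails with ValueError: Categorical categories cannot be null
df = pd.concat(dfs)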


Used versions

# Name                    Version                   Build  Channel
alabaster                 0.7.12                     py_0    conda-forge
altair                    4.2.0              pyhd8ed1ab_1    conda-forge
appnope                   0.1.3              pyhd8ed1ab_0    conda-forge
argon2-cffi               21.3.0             pyhd8ed1ab_0    conda-forge
argon2-cffi-bindings      21.2.0           py39h63b48b0_2    conda-forge
arrow-cpp                 9.0.0           py39h04a14be_7_cpu    conda-forge
asn1crypto                1.5.1              pyhd8ed1ab_0    conda-forge
asttokens                 2.0.8              pyhd8ed1ab_0    conda-forge
attrs                     22.1.0             pyh71513ae_1    conda-forge
aws-c-cal                 0.5.11               hd2e2f4b_0    conda-forge
aws-c-common              0.6.2                h0d85af4_0    conda-forge
aws-c-event-stream        0.2.7               hb9330a7_13    conda-forge
aws-c-io                  0.10.5               h35aa462_0    conda-forge
aws-checksums             0.1.11               h0010a65_7    conda-forge
aws-sdk-cpp               1.8.186              h766a74d_3    conda-forge
babel                     2.10.3             pyhd8ed1ab_0    conda-forge
backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
backports                 1.0                        py_2    conda-forge
backports.functools_lru_cache 1.6.4              pyhd8ed1ab_0    conda-forge
beautifulsoup4            4.11.1             pyha770c72_0    conda-forge
bleach                    5.0.1              pyhd8ed1ab_0    conda-forge
bokeh                     2.4.3            py39h6e9494a_0    conda-forge
brotlipy                  0.7.0           py39h63b48b0_1004    conda-forge
bzip2                     1.0.8                h0d85af4_4    conda-forge
c-ares                    1.18.1               h0d85af4_0    conda-forge
ca-certificates           2022.9.24            h033912b_0    conda-forge
certifi                   2022.9.24          pyhd8ed1ab_0    conda-forge
cffi                      1.15.1           py39hae9ecf2_0    conda-forge
cfgv                      3.3.1              pyhd8ed1ab_0    conda-forge
charset-normalizer        2.1.1              pyhd8ed1ab_0    conda-forge
click                     8.1.3            py39h6e9494a_0    conda-forge
cloudpickle               2.2.0              pyhd8ed1ab_0    conda-forge
colorama                  0.4.5              pyhd8ed1ab_0    conda-forge
contextlib2               0.5.5                      py_2    conda-forge
coverage                  6.4.4            py39h6218fd2_0    conda-forge
cryptography              35.0.0           py39h209aa08_2    conda-forge
cytoolz                   0.12.0           py39h701faf5_0    conda-forge
dask                      2022.5.2           pyhd8ed1ab_0    conda-forge
dask-core                 2022.5.2           pyhd8ed1ab_0    conda-forge
dataclasses               0.8                pyhc8e2a94_3    conda-forge
debugpy                   1.6.3            py39hd91caee_0    conda-forge
decorator                 5.1.1              pyhd8ed1ab_0    conda-forge
defusedxml                0.7.1              pyhd8ed1ab_0    conda-forge
distlib                   0.3.5              pyhd8ed1ab_0    conda-forge
distributed               2022.5.2           pyhd8ed1ab_0    conda-forge
docutils                  0.17.1           py39h6e9494a_2    conda-forge
entrypoints               0.4                pyhd8ed1ab_0    conda-forge
executing                 1.0.0              pyhd8ed1ab_0    conda-forge
filelock                  3.8.0              pyhd8ed1ab_0    conda-forge
flit-core                 3.7.1              pyhd8ed1ab_0    conda-forge
freetype                  2.12.1               h3f81eb7_0    conda-forge
freezegun                 1.2.2              pyhd8ed1ab_0    conda-forge
fsspec                    2022.8.2           pyhd8ed1ab_0    conda-forge
gflags                    2.2.2             hb1e8313_1004    conda-forge
glog                      0.6.0                h8ac2a54_0    conda-forge
gmp                       6.2.1                h2e338ed_0    conda-forge
great-expectations        0.15.24            pyhd8ed1ab_0    conda-forge
grpc-cpp                  1.47.1               h834a566_6    conda-forge
heapdict                  1.0.1                      py_0    conda-forge
icu                       70.1                 h96cf925_0    conda-forge
identify                  2.5.5              pyhd8ed1ab_0    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
imagesize                 1.4.1              pyhd8ed1ab_0    conda-forge
importlib-metadata        4.11.4           py39h6e9494a_0    conda-forge
importlib_resources       5.9.0              pyhd8ed1ab_0    conda-forge
iniconfig                 1.1.1              pyh9f0ad1d_0    conda-forge
ipykernel                 6.15.3             pyh736e0ef_0    conda-forge
ipython                   8.5.0              pyhd1c38e8_1    conda-forge
ipython_genutils          0.2.0                      py_1    conda-forge
ipywidgets                8.0.2              pyhd8ed1ab_1    conda-forge
jedi                      0.18.1           py39h6e9494a_1    conda-forge
jinja2                    3.1.2              pyhd8ed1ab_1    conda-forge
jpeg                      9e                   hac89ed1_2    conda-forge
jsonpatch                 1.32               pyhd8ed1ab_0    conda-forge
jsonpointer               2.0                        py_0    conda-forge
jsonschema                4.16.0             pyhd8ed1ab_0    conda-forge
jupyter_client            7.3.4              pyhd8ed1ab_0    conda-forge
jupyter_core              4.11.1           py39h6e9494a_0    conda-forge
jupyterlab_pygments       0.2.2              pyhd8ed1ab_0    conda-forge
jupyterlab_widgets        3.0.3              pyhd8ed1ab_0    conda-forge
jupytext                  1.14.0             pyheef035f_0    conda-forge
kartothek                 4.0.4.dev0+gdb840b5.d20221018          pypi_0    pypi
krb5                      1.19.3               hb49756b_0    conda-forge
lcms2                     2.12                 h577c468_0    conda-forge
lerc                      4.0.0                hb486fe8_0    conda-forge
libabseil                 20220623.0      cxx17_h844d122_4    conda-forge
libblas                   3.9.0           16_osx64_openblas    conda-forge
libbrotlicommon           1.0.9                h5eb16cf_7    conda-forge
libbrotlidec              1.0.9                h5eb16cf_7    conda-forge
libbrotlienc              1.0.9                h5eb16cf_7    conda-forge
libcblas                  3.9.0           16_osx64_openblas    conda-forge
libcrc32c                 1.1.2                he49afe7_0    conda-forge
libcurl                   7.83.1               h372c54d_0    conda-forge
libcxx                    14.0.6               hccf4f1f_0    conda-forge
libdeflate                1.14                 hb7f2c08_0    conda-forge
libedit                   3.1.20191231         h0678c8f_2    conda-forge
libev                     4.33                 haf1e3a3_1    conda-forge
libevent                  2.1.10               h815e4d9_4    conda-forge
libffi                    3.4.2                h0d85af4_5    conda-forge
libgfortran               5.0.0           10_4_0_h97931a8_25    conda-forge
libgfortran5              11.3.0              h082f757_25    conda-forge
libgoogle-cloud           2.2.0                hb0fe3b0_1    conda-forge
libiconv                  1.16                 haf1e3a3_0    conda-forge
liblapack                 3.9.0           16_osx64_openblas    conda-forge
libnghttp2                1.47.0               h7cbc4dc_1    conda-forge
libopenblas               0.3.21          openmp_h429af6e_3    conda-forge
libpng                    1.6.38               ha978bb4_0    conda-forge
libprotobuf               3.21.7               hbc0c0cd_0    conda-forge
libsodium                 1.0.18               hbcb3906_1    conda-forge
libsqlite                 3.39.3               ha978bb4_0    conda-forge
libssh2                   1.10.0               h7535e13_3    conda-forge
libthrift                 0.16.0               h08c06f4_2    conda-forge
libtiff                   4.4.0                hdb44e8a_4    conda-forge
libutf8proc               2.7.0                h0d85af4_0    conda-forge
libwebp-base              1.2.4                h775f41a_0    conda-forge
libxcb                    1.13              h0d85af4_1004    conda-forge
libxml2                   2.9.14               hea49891_4    conda-forge
libxslt                   1.1.35               heaa0ce8_0    conda-forge
libzlib                   1.2.12               hfd90126_3    conda-forge
llvm-openmp               14.0.4               ha654fa7_0    conda-forge
locket                    1.0.0              pyhd8ed1ab_0    conda-forge
lxml                      4.9.1            py39h701faf5_0    conda-forge
lz4                       4.0.0            py39h263ca4c_2    conda-forge
lz4-c                     1.9.3                he49afe7_1    conda-forge
make                      4.3                  h22f3db7_1    conda-forge
makefun                   1.15.0             pyhd8ed1ab_0    conda-forge
markdown-it-py            2.1.0              pyhd8ed1ab_0    conda-forge
markupsafe                2.1.1            py39h63b48b0_1    conda-forge
marshmallow               3.18.0             pyhd8ed1ab_0    conda-forge
matplotlib-inline         0.1.6              pyhd8ed1ab_0    conda-forge
mdit-py-plugins           0.3.0              pyhd8ed1ab_0    conda-forge
mdurl                     0.1.0              pyhd8ed1ab_0    conda-forge
milksnake                 0.1.5              pyhd8ed1ab_1    conda-forge
minimalkv                 1.4.3                    pypi_0    pypi
mistune                   2.0.4              pyhd8ed1ab_0    conda-forge
msgpack-python            1.0.4            py39h7c694c3_0    conda-forge
nbclient                  0.6.8              pyhd8ed1ab_0    conda-forge
nbconvert                 7.0.0              pyhd8ed1ab_0    conda-forge
nbconvert-core            7.0.0              pyhd8ed1ab_0    conda-forge
nbconvert-pandoc          7.0.0              pyhd8ed1ab_0    conda-forge
nbformat                  5.6.0              pyhd8ed1ab_0    conda-forge
ncurses                   6.3                  h96cf925_1    conda-forge
nest-asyncio              1.5.5              pyhd8ed1ab_0    conda-forge
nodeenv                   1.7.0              pyhd8ed1ab_0    conda-forge
notebook                  6.4.12             pyha770c72_0    conda-forge
numpy                     1.23.3           py39h34843a6_0    conda-forge
numpydoc                  1.4.0              pyhd8ed1ab_1    conda-forge
openjpeg                  2.5.0                h5d0d7b0_1    conda-forge
openssl                   1.1.1q               hfe4f2af_0    conda-forge
orc                       1.8.0                ha9d861c_0    conda-forge
oscrypto                  1.2.1              pyhd3deb0d_0    conda-forge
packaging                 21.3               pyhd8ed1ab_0    conda-forge
pandas                    1.4.4            py39hca71b8a_0    conda-forge
pandoc                    2.19.2               h694c41f_0    conda-forge
pandocfilters             1.5.0              pyhd8ed1ab_0    conda-forge
parquet-cpp               1.5.1                         1    conda-forge
parso                     0.8.3              pyhd8ed1ab_0    conda-forge
partd                     1.3.0              pyhd8ed1ab_0    conda-forge
pbr                       5.10.0             pyhd8ed1ab_0    conda-forge
pexpect                   4.8.0              pyh9f0ad1d_2    conda-forge
pickleshare               0.7.5           py39hde42818_1002    conda-forge
pillow                    9.2.0            py39h4d560c1_2    conda-forge
pip                       22.2.2             pyhd8ed1ab_0    conda-forge
pkgutil-resolve-name      1.3.10             pyhd8ed1ab_0    conda-forge
plateau                   0.0.0                    pypi_0    pypi
platformdirs              2.5.2              pyhd8ed1ab_1    conda-forge
pluggy                    1.0.0            py39h6e9494a_3    conda-forge
pre-commit                2.20.0           py39h6e9494a_0    conda-forge
prometheus_client         0.14.1             pyhd8ed1ab_0    conda-forge
prompt-toolkit            3.0.31             pyha770c72_0    conda-forge
psutil                    5.9.2            py39ha30fb19_0    conda-forge
pthread-stubs             0.4               hc929b4f_1001    conda-forge
ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
pure_eval                 0.2.2              pyhd8ed1ab_0    conda-forge
py                        1.11.0             pyh6c4a22f_0    conda-forge
pyarrow                   6.0.1                    pypi_0    pypi
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pycryptodomex             3.15.0           py39h701faf5_0    conda-forge
pygments                  2.13.0             pyhd8ed1ab_0    conda-forge
pyjwt                     2.5.0              pyhd8ed1ab_0    conda-forge
pyopenssl                 22.0.0             pyhd8ed1ab_1    conda-forge
pyparsing                 3.0.9              pyhd8ed1ab_0    conda-forge
pyrsistent                0.18.1           py39h63b48b0_1    conda-forge
pysocks                   1.7.1            py39h6e9494a_5    conda-forge
pytest                    7.1.3            py39h6e9494a_0    conda-forge
pytest-cov                3.0.0              pyhd8ed1ab_0    conda-forge
pytest-mock               3.8.2              pyhd8ed1ab_0    conda-forge
python                    3.9.13          h57e37ff_0_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-fastjsonschema     2.16.2             pyhd8ed1ab_0    conda-forge
python-slugify            6.1.2              pyhd8ed1ab_0    conda-forge
python-tzdata             2022.2             pyhd8ed1ab_0    conda-forge
python-xxhash             3.0.0            py39h63b48b0_1    conda-forge
python_abi                3.9                      2_cp39    conda-forge
pytz                      2022.2.1           pyhd8ed1ab_0    conda-forge
pytz-deprecation-shim     0.1.0.post0      py39h6e9494a_2    conda-forge
pyyaml                    6.0              py39h63b48b0_4    conda-forge
pyzmq                     24.0.1           py39hed8f129_0    conda-forge
re2                       2022.06.01           hb486fe8_0    conda-forge
readline                  8.1.2                h3899abd_0    conda-forge
requests                  2.28.1             pyhd8ed1ab_1    conda-forge
ruamel.yaml               0.17.17          py39h89e85a6_1    conda-forge
ruamel.yaml.clib          0.2.6            py39h63b48b0_1    conda-forge
schema                    0.7.5              pyhd8ed1ab_0    conda-forge
scipy                     1.9.1            py39h9488793_0    conda-forge
send2trash                1.8.0              pyhd8ed1ab_0    conda-forge
setuptools                65.3.0           py39h6e9494a_0    conda-forge
setuptools-scm            7.0.5              pyhd8ed1ab_0    conda-forge
setuptools_scm            7.0.5                hd8ed1ab_0    conda-forge
simplejson                3.17.6           py39h63b48b0_1    conda-forge
simplekv                  0.14.1             pyh9f0ad1d_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
snappy                    1.1.9                h6e38e02_1    conda-forge
snowballstemmer           2.2.0              pyhd8ed1ab_0    conda-forge
snowflake-connector-python 2.7.12           py39h1584358_0    conda-forge
sortedcontainers          2.4.0              pyhd8ed1ab_0    conda-forge
soupsieve                 2.3.2.post1        pyhd8ed1ab_0    conda-forge
sphinx                    5.1.1              pyhd8ed1ab_1    conda-forge
sphinx_rtd_theme          1.0.0              pyhd8ed1ab_0    conda-forge
sphinxcontrib-apidoc      0.3.0                      py_1    conda-forge
sphinxcontrib-applehelp   1.0.2                      py_0    conda-forge
sphinxcontrib-devhelp     1.0.2                      py_0    conda-forge
sphinxcontrib-htmlhelp    2.0.0              pyhd8ed1ab_0    conda-forge
sphinxcontrib-jsmath      1.0.1                      py_0    conda-forge
sphinxcontrib-qthelp      1.0.3                      py_0    conda-forge
sphinxcontrib-serializinghtml 1.1.5              pyhd8ed1ab_2    conda-forge
sqlite                    3.39.3               h9ae0607_0    conda-forge
stack_data                0.5.0              pyhd8ed1ab_0    conda-forge
storefact                 0.10.0                     py_0    conda-forge
tabulate                  0.8.10             pyhd8ed1ab_0    conda-forge
tblib                     1.7.0              pyhd8ed1ab_0    conda-forge
termcolor                 2.0.1              pyhd8ed1ab_1    conda-forge
terminado                 0.15.0           py39h6e9494a_0    conda-forge
text-unidecode            1.3                        py_0    conda-forge
tinycss2                  1.1.1              pyhd8ed1ab_0    conda-forge
tk                        8.6.12               h5dbffcc_0    conda-forge
toml                      0.10.2             pyhd8ed1ab_0    conda-forge
tomli                     2.0.1              pyhd8ed1ab_0    conda-forge
toolz                     0.12.0             pyhd8ed1ab_0    conda-forge
tornado                   6.1              py39h63b48b0_3    conda-forge
tqdm                      4.64.1             pyhd8ed1ab_0    conda-forge
traitlets                 5.4.0              pyhd8ed1ab_0    conda-forge
typing-extensions         4.3.0                hd8ed1ab_0    conda-forge
typing_extensions         4.3.0              pyha770c72_0    conda-forge
tzdata                    2022c                h191b570_0    conda-forge
tzlocal                   4.2              py39h6e9494a_1    conda-forge
ukkonen                   1.0.1            py39h7248d28_2    conda-forge
unidecode                 1.3.4              pyhd8ed1ab_0    conda-forge
uritools                  4.0.0              pyhd8ed1ab_0    conda-forge
urllib3                   1.26.11            pyhd8ed1ab_0    conda-forge
urlquote                  1.1.4            py39h9b8c074_5    conda-forge
virtualenv                20.16.5          py39h6e9494a_0    conda-forge
wcwidth                   0.2.5              pyh9f0ad1d_2    conda-forge
webencodings              0.5.1                      py_1    conda-forge
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
widgetsnbextension        4.0.3              pyhd8ed1ab_0    conda-forge
xorg-libxau               1.0.9                h35c211d_0    conda-forge
xorg-libxdmcp             1.1.3                h35c211d_0    conda-forge
xxhash                    0.8.0                h35c211d_3    conda-forge
xz                        5.2.6                h775f41a_0    conda-forge
yaml                      0.2.5                h0d85af4_2    conda-forge
zeromq                    4.3.4                he49afe7_1    conda-forge
zict                      2.2.0              pyhd8ed1ab_0    conda-forge
zipp                      3.8.1              pyhd8ed1ab_0    conda-forge
zlib                      1.2.12               hfd90126_3    conda-forge
zstandard                 0.18.0           py39h701faf5_0    conda-forge
zstd                      1.5.2                hfa58983_4    conda-forge

Rename code to `plateau`

We should rename the module itself completely to `plateau`, without making any API-breaking changes.

Drop pyarrow<3

We shouldn't be supporting and testing these older versions. Instead, we should add a `pyarrow>=3` pin.

Test Python 3.11


Support pandas 2.0

Our CI jobs for pandas 2.0 are currently failing (see, e.g., here).

I see (at least) two issues with supporting pandas 2.0:

  • pandas-dev/pandas#50127 (open PR here): this issue is causing our CI failures
  • pandas-dev/pandas#52212 changed how pandas infers dtypes from scalars. This causes issues when we partition by a datetime: loading the data eagerly returns a datetime64[ns] column, while loading it as a dask DataFrame returns a datetime64[s] column. I don't think we have tests for this, by the way. We could address it by explicitly setting the unit to nanoseconds here.
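Below is a minimal sketch of the nanosecond normalization. It assumes the cast is applied to a loaded DataFrame, and `ensure_ns` is a hypothetical name. It uses `astype("datetime64[ns]")`, which works on both pandas 1.x and 2.x, whereas `Series.dt.as_unit` is only available from pandas 2.0 on.

```python
import numpy as np
import pandas as pd


def ensure_ns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of ``df`` with all tz-naive datetime64 columns cast to nanoseconds.

    Hypothetical helper: tz-aware columns are left untouched.
    """
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_datetime64_dtype(out[col].dtype):
            # Works on pandas 1.x and 2.x alike (Series.dt.as_unit needs pandas>=2.0).
            out[col] = out[col].astype("datetime64[ns]")
    return out
```

Applying the same cast on both the eager and the dask code paths would make them agree on `datetime64[ns]`.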

Support `pandas=1.5`

Since the Index classes have been removed, plateau is currently not compatible with `pandas=1.5`. We should add support for it and add a CI matrix entry for older pandas versions.

Re-add numpy nightlies

The nightly tests against numpy have been failing for a significant period, because `np.array([[np.nan]]).astype(str)` raises an error. We should investigate the underlying issue and re-add the tests once we have fixed it.
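The standalone check below could be run against the numpy nightly wheels to see whether the failure is still present. On released numpy versions the cast should produce the string `'nan'`. The function name is hypothetical.

```python
import numpy as np


def nan_to_str_cast_works() -> bool:
    """Return True if casting a 2-D float array holding NaN to str yields 'nan'."""
    try:
        result = np.array([[np.nan]]).astype(str)
    except Exception as exc:  # the nightly failure surfaces as an exception
        print(f"astype(str) failed with {type(exc).__name__}: {exc}")
        return False
    return result.shape == (1, 1) and result[0, 0] == "nan"


if __name__ == "__main__":
    print("numpy", np.__version__, "->", "ok" if nan_to_str_cast_works() else "broken")
```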

