plateau's People

Contributors

amerkel2, aniruddhgoteti, crepererum, damianbarabonkovqc, dependabot[bot], eacheson, fhoehle, fjetter, florian-jetter-by, gkohen, hoffmann, imkumarg, janjagusch, jochen-ott-by, jonashaag, jorisvandenbossche, jtilly, kshitij68, lr4d, lucas-rademaker-by, marco-neumann-by, mlondschien, nerocorleone, pacman82, steffen-schroeder-by, stephan-hesselmann-by, svoons, treebee, usha-nemani-by, xhochy

plateau's Issues

Plateau errors when loading a partition with only `None` as a category

Problem description

When loading a column as a categorical and one of the partitions contains only None values, loading fails with `ValueError: Categorical categories cannot be null`. This is actually a bug in pyarrow, linked here.

It would be nice to patch this issue for the time being.

Example code (ideally copy-pastable)

from functools import partial
import minimalkv
import pandas as pd
from plateau.io.iter import read_dataset_as_dataframes__iterator, store_dataframes_as_dataset__iter

df = pd.DataFrame(
    {
        "x": ["a", "b", None, None],
        "chunk": [0, 0, 1, 1],
    }
)

path = "/tmp"
store = partial(minimalkv.get_store_from_url, f"hfs://{path}?create_if_missing=false")

store_dataframes_as_dataset__iter(
    df_generator=(df.loc[lambda x: x["chunk"] == chunk] for chunk in [0, 1]),
    dataset_uuid="lol",
    store=store,
    partition_on=["chunk"],
    overwrite=True
)

dfs = read_dataset_as_dataframes__iterator(
    dataset_uuid="lol",
    store=store,
    columns=["x"],
    categoricals=["x"],
    dispatch_by=["chunk"],
)

# Raises ValueError: Categorical categories cannot be null
df = pd.concat(dfs)

print(df['x'])
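
Until a patch lands, one possible workaround is to load the column without `categoricals` and convert it in pandas after concatenation. The following is only a sketch under that assumption: `concat_as_categorical` is a hypothetical helper, not part of plateau, and it gives up the memory savings of reading the column as a categorical directly.

```python
import pandas as pd


def concat_as_categorical(partitions, column):
    """Concatenate one column from several partitions, then convert it once.

    Hypothetical helper: `partitions` is any iterable of DataFrames that were
    loaded WITHOUT the `categoricals` argument, so an all-None partition is
    just an ordinary column of missing values.
    """
    combined = pd.concat([df[column] for df in partitions], ignore_index=True)
    # Converting after concatenation derives the categories from the non-null
    # values only; None entries become missing values, never categories.
    return combined.astype("category")


# Two in-memory partitions mirroring the example above.
parts = [
    pd.DataFrame({"x": ["a", "b"]}),
    pd.DataFrame({"x": [None, None]}),
]
result = concat_as_categorical(parts, "x")
```

The all-None partition contributes only missing values, so the resulting categories are built from the first partition alone.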

Used versions

# Name                    Version                   Build  Channel
alabaster                 0.7.12                     py_0    conda-forge
altair                    4.2.0              pyhd8ed1ab_1    conda-forge
appnope                   0.1.3              pyhd8ed1ab_0    conda-forge
argon2-cffi               21.3.0             pyhd8ed1ab_0    conda-forge
argon2-cffi-bindings      21.2.0           py39h63b48b0_2    conda-forge
arrow-cpp                 9.0.0           py39h04a14be_7_cpu    conda-forge
asn1crypto                1.5.1              pyhd8ed1ab_0    conda-forge
asttokens                 2.0.8              pyhd8ed1ab_0    conda-forge
attrs                     22.1.0             pyh71513ae_1    conda-forge
aws-c-cal                 0.5.11               hd2e2f4b_0    conda-forge
aws-c-common              0.6.2                h0d85af4_0    conda-forge
aws-c-event-stream        0.2.7               hb9330a7_13    conda-forge
aws-c-io                  0.10.5               h35aa462_0    conda-forge
aws-checksums             0.1.11               h0010a65_7    conda-forge
aws-sdk-cpp               1.8.186              h766a74d_3    conda-forge
babel                     2.10.3             pyhd8ed1ab_0    conda-forge
backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
backports                 1.0                        py_2    conda-forge
backports.functools_lru_cache 1.6.4              pyhd8ed1ab_0    conda-forge
beautifulsoup4            4.11.1             pyha770c72_0    conda-forge
bleach                    5.0.1              pyhd8ed1ab_0    conda-forge
bokeh                     2.4.3            py39h6e9494a_0    conda-forge
brotlipy                  0.7.0           py39h63b48b0_1004    conda-forge
bzip2                     1.0.8                h0d85af4_4    conda-forge
c-ares                    1.18.1               h0d85af4_0    conda-forge
ca-certificates           2022.9.24            h033912b_0    conda-forge
certifi                   2022.9.24          pyhd8ed1ab_0    conda-forge
cffi                      1.15.1           py39hae9ecf2_0    conda-forge
cfgv                      3.3.1              pyhd8ed1ab_0    conda-forge
charset-normalizer        2.1.1              pyhd8ed1ab_0    conda-forge
click                     8.1.3            py39h6e9494a_0    conda-forge
cloudpickle               2.2.0              pyhd8ed1ab_0    conda-forge
colorama                  0.4.5              pyhd8ed1ab_0    conda-forge
contextlib2               0.5.5                      py_2    conda-forge
coverage                  6.4.4            py39h6218fd2_0    conda-forge
cryptography              35.0.0           py39h209aa08_2    conda-forge
cytoolz                   0.12.0           py39h701faf5_0    conda-forge
dask                      2022.5.2           pyhd8ed1ab_0    conda-forge
dask-core                 2022.5.2           pyhd8ed1ab_0    conda-forge
dataclasses               0.8                pyhc8e2a94_3    conda-forge
debugpy                   1.6.3            py39hd91caee_0    conda-forge
decorator                 5.1.1              pyhd8ed1ab_0    conda-forge
defusedxml                0.7.1              pyhd8ed1ab_0    conda-forge
distlib                   0.3.5              pyhd8ed1ab_0    conda-forge
distributed               2022.5.2           pyhd8ed1ab_0    conda-forge
docutils                  0.17.1           py39h6e9494a_2    conda-forge
entrypoints               0.4                pyhd8ed1ab_0    conda-forge
executing                 1.0.0              pyhd8ed1ab_0    conda-forge
filelock                  3.8.0              pyhd8ed1ab_0    conda-forge
flit-core                 3.7.1              pyhd8ed1ab_0    conda-forge
freetype                  2.12.1               h3f81eb7_0    conda-forge
freezegun                 1.2.2              pyhd8ed1ab_0    conda-forge
fsspec                    2022.8.2           pyhd8ed1ab_0    conda-forge
gflags                    2.2.2             hb1e8313_1004    conda-forge
glog                      0.6.0                h8ac2a54_0    conda-forge
gmp                       6.2.1                h2e338ed_0    conda-forge
great-expectations        0.15.24            pyhd8ed1ab_0    conda-forge
grpc-cpp                  1.47.1               h834a566_6    conda-forge
heapdict                  1.0.1                      py_0    conda-forge
icu                       70.1                 h96cf925_0    conda-forge
identify                  2.5.5              pyhd8ed1ab_0    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
imagesize                 1.4.1              pyhd8ed1ab_0    conda-forge
importlib-metadata        4.11.4           py39h6e9494a_0    conda-forge
importlib_resources       5.9.0              pyhd8ed1ab_0    conda-forge
iniconfig                 1.1.1              pyh9f0ad1d_0    conda-forge
ipykernel                 6.15.3             pyh736e0ef_0    conda-forge
ipython                   8.5.0              pyhd1c38e8_1    conda-forge
ipython_genutils          0.2.0                      py_1    conda-forge
ipywidgets                8.0.2              pyhd8ed1ab_1    conda-forge
jedi                      0.18.1           py39h6e9494a_1    conda-forge
jinja2                    3.1.2              pyhd8ed1ab_1    conda-forge
jpeg                      9e                   hac89ed1_2    conda-forge
jsonpatch                 1.32               pyhd8ed1ab_0    conda-forge
jsonpointer               2.0                        py_0    conda-forge
jsonschema                4.16.0             pyhd8ed1ab_0    conda-forge
jupyter_client            7.3.4              pyhd8ed1ab_0    conda-forge
jupyter_core              4.11.1           py39h6e9494a_0    conda-forge
jupyterlab_pygments       0.2.2              pyhd8ed1ab_0    conda-forge
jupyterlab_widgets        3.0.3              pyhd8ed1ab_0    conda-forge
jupytext                  1.14.0             pyheef035f_0    conda-forge
kartothek                 4.0.4.dev0+gdb840b5.d20221018          pypi_0    pypi
krb5                      1.19.3               hb49756b_0    conda-forge
lcms2                     2.12                 h577c468_0    conda-forge
lerc                      4.0.0                hb486fe8_0    conda-forge
libabseil                 20220623.0      cxx17_h844d122_4    conda-forge
libblas                   3.9.0           16_osx64_openblas    conda-forge
libbrotlicommon           1.0.9                h5eb16cf_7    conda-forge
libbrotlidec              1.0.9                h5eb16cf_7    conda-forge
libbrotlienc              1.0.9                h5eb16cf_7    conda-forge
libcblas                  3.9.0           16_osx64_openblas    conda-forge
libcrc32c                 1.1.2                he49afe7_0    conda-forge
libcurl                   7.83.1               h372c54d_0    conda-forge
libcxx                    14.0.6               hccf4f1f_0    conda-forge
libdeflate                1.14                 hb7f2c08_0    conda-forge
libedit                   3.1.20191231         h0678c8f_2    conda-forge
libev                     4.33                 haf1e3a3_1    conda-forge
libevent                  2.1.10               h815e4d9_4    conda-forge
libffi                    3.4.2                h0d85af4_5    conda-forge
libgfortran               5.0.0           10_4_0_h97931a8_25    conda-forge
libgfortran5              11.3.0              h082f757_25    conda-forge
libgoogle-cloud           2.2.0                hb0fe3b0_1    conda-forge
libiconv                  1.16                 haf1e3a3_0    conda-forge
liblapack                 3.9.0           16_osx64_openblas    conda-forge
libnghttp2                1.47.0               h7cbc4dc_1    conda-forge
libopenblas               0.3.21          openmp_h429af6e_3    conda-forge
libpng                    1.6.38               ha978bb4_0    conda-forge
libprotobuf               3.21.7               hbc0c0cd_0    conda-forge
libsodium                 1.0.18               hbcb3906_1    conda-forge
libsqlite                 3.39.3               ha978bb4_0    conda-forge
libssh2                   1.10.0               h7535e13_3    conda-forge
libthrift                 0.16.0               h08c06f4_2    conda-forge
libtiff                   4.4.0                hdb44e8a_4    conda-forge
libutf8proc               2.7.0                h0d85af4_0    conda-forge
libwebp-base              1.2.4                h775f41a_0    conda-forge
libxcb                    1.13              h0d85af4_1004    conda-forge
libxml2                   2.9.14               hea49891_4    conda-forge
libxslt                   1.1.35               heaa0ce8_0    conda-forge
libzlib                   1.2.12               hfd90126_3    conda-forge
llvm-openmp               14.0.4               ha654fa7_0    conda-forge
locket                    1.0.0              pyhd8ed1ab_0    conda-forge
lxml                      4.9.1            py39h701faf5_0    conda-forge
lz4                       4.0.0            py39h263ca4c_2    conda-forge
lz4-c                     1.9.3                he49afe7_1    conda-forge
make                      4.3                  h22f3db7_1    conda-forge
makefun                   1.15.0             pyhd8ed1ab_0    conda-forge
markdown-it-py            2.1.0              pyhd8ed1ab_0    conda-forge
markupsafe                2.1.1            py39h63b48b0_1    conda-forge
marshmallow               3.18.0             pyhd8ed1ab_0    conda-forge
matplotlib-inline         0.1.6              pyhd8ed1ab_0    conda-forge
mdit-py-plugins           0.3.0              pyhd8ed1ab_0    conda-forge
mdurl                     0.1.0              pyhd8ed1ab_0    conda-forge
milksnake                 0.1.5              pyhd8ed1ab_1    conda-forge
minimalkv                 1.4.3                    pypi_0    pypi
mistune                   2.0.4              pyhd8ed1ab_0    conda-forge
msgpack-python            1.0.4            py39h7c694c3_0    conda-forge
nbclient                  0.6.8              pyhd8ed1ab_0    conda-forge
nbconvert                 7.0.0              pyhd8ed1ab_0    conda-forge
nbconvert-core            7.0.0              pyhd8ed1ab_0    conda-forge
nbconvert-pandoc          7.0.0              pyhd8ed1ab_0    conda-forge
nbformat                  5.6.0              pyhd8ed1ab_0    conda-forge
ncurses                   6.3                  h96cf925_1    conda-forge
nest-asyncio              1.5.5              pyhd8ed1ab_0    conda-forge
nodeenv                   1.7.0              pyhd8ed1ab_0    conda-forge
notebook                  6.4.12             pyha770c72_0    conda-forge
numpy                     1.23.3           py39h34843a6_0    conda-forge
numpydoc                  1.4.0              pyhd8ed1ab_1    conda-forge
openjpeg                  2.5.0                h5d0d7b0_1    conda-forge
openssl                   1.1.1q               hfe4f2af_0    conda-forge
orc                       1.8.0                ha9d861c_0    conda-forge
oscrypto                  1.2.1              pyhd3deb0d_0    conda-forge
packaging                 21.3               pyhd8ed1ab_0    conda-forge
pandas                    1.4.4            py39hca71b8a_0    conda-forge
pandoc                    2.19.2               h694c41f_0    conda-forge
pandocfilters             1.5.0              pyhd8ed1ab_0    conda-forge
parquet-cpp               1.5.1                         1    conda-forge
parso                     0.8.3              pyhd8ed1ab_0    conda-forge
partd                     1.3.0              pyhd8ed1ab_0    conda-forge
pbr                       5.10.0             pyhd8ed1ab_0    conda-forge
pexpect                   4.8.0              pyh9f0ad1d_2    conda-forge
pickleshare               0.7.5           py39hde42818_1002    conda-forge
pillow                    9.2.0            py39h4d560c1_2    conda-forge
pip                       22.2.2             pyhd8ed1ab_0    conda-forge
pkgutil-resolve-name      1.3.10             pyhd8ed1ab_0    conda-forge
plateau                   0.0.0                    pypi_0    pypi
platformdirs              2.5.2              pyhd8ed1ab_1    conda-forge
pluggy                    1.0.0            py39h6e9494a_3    conda-forge
pre-commit                2.20.0           py39h6e9494a_0    conda-forge
prometheus_client         0.14.1             pyhd8ed1ab_0    conda-forge
prompt-toolkit            3.0.31             pyha770c72_0    conda-forge
psutil                    5.9.2            py39ha30fb19_0    conda-forge
pthread-stubs             0.4               hc929b4f_1001    conda-forge
ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
pure_eval                 0.2.2              pyhd8ed1ab_0    conda-forge
py                        1.11.0             pyh6c4a22f_0    conda-forge
pyarrow                   6.0.1                    pypi_0    pypi
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pycryptodomex             3.15.0           py39h701faf5_0    conda-forge
pygments                  2.13.0             pyhd8ed1ab_0    conda-forge
pyjwt                     2.5.0              pyhd8ed1ab_0    conda-forge
pyopenssl                 22.0.0             pyhd8ed1ab_1    conda-forge
pyparsing                 3.0.9              pyhd8ed1ab_0    conda-forge
pyrsistent                0.18.1           py39h63b48b0_1    conda-forge
pysocks                   1.7.1            py39h6e9494a_5    conda-forge
pytest                    7.1.3            py39h6e9494a_0    conda-forge
pytest-cov                3.0.0              pyhd8ed1ab_0    conda-forge
pytest-mock               3.8.2              pyhd8ed1ab_0    conda-forge
python                    3.9.13          h57e37ff_0_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-fastjsonschema     2.16.2             pyhd8ed1ab_0    conda-forge
python-slugify            6.1.2              pyhd8ed1ab_0    conda-forge
python-tzdata             2022.2             pyhd8ed1ab_0    conda-forge
python-xxhash             3.0.0            py39h63b48b0_1    conda-forge
python_abi                3.9                      2_cp39    conda-forge
pytz                      2022.2.1           pyhd8ed1ab_0    conda-forge
pytz-deprecation-shim     0.1.0.post0      py39h6e9494a_2    conda-forge
pyyaml                    6.0              py39h63b48b0_4    conda-forge
pyzmq                     24.0.1           py39hed8f129_0    conda-forge
re2                       2022.06.01           hb486fe8_0    conda-forge
readline                  8.1.2                h3899abd_0    conda-forge
requests                  2.28.1             pyhd8ed1ab_1    conda-forge
ruamel.yaml               0.17.17          py39h89e85a6_1    conda-forge
ruamel.yaml.clib          0.2.6            py39h63b48b0_1    conda-forge
schema                    0.7.5              pyhd8ed1ab_0    conda-forge
scipy                     1.9.1            py39h9488793_0    conda-forge
send2trash                1.8.0              pyhd8ed1ab_0    conda-forge
setuptools                65.3.0           py39h6e9494a_0    conda-forge
setuptools-scm            7.0.5              pyhd8ed1ab_0    conda-forge
setuptools_scm            7.0.5                hd8ed1ab_0    conda-forge
simplejson                3.17.6           py39h63b48b0_1    conda-forge
simplekv                  0.14.1             pyh9f0ad1d_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
snappy                    1.1.9                h6e38e02_1    conda-forge
snowballstemmer           2.2.0              pyhd8ed1ab_0    conda-forge
snowflake-connector-python 2.7.12           py39h1584358_0    conda-forge
sortedcontainers          2.4.0              pyhd8ed1ab_0    conda-forge
soupsieve                 2.3.2.post1        pyhd8ed1ab_0    conda-forge
sphinx                    5.1.1              pyhd8ed1ab_1    conda-forge
sphinx_rtd_theme          1.0.0              pyhd8ed1ab_0    conda-forge
sphinxcontrib-apidoc      0.3.0                      py_1    conda-forge
sphinxcontrib-applehelp   1.0.2                      py_0    conda-forge
sphinxcontrib-devhelp     1.0.2                      py_0    conda-forge
sphinxcontrib-htmlhelp    2.0.0              pyhd8ed1ab_0    conda-forge
sphinxcontrib-jsmath      1.0.1                      py_0    conda-forge
sphinxcontrib-qthelp      1.0.3                      py_0    conda-forge
sphinxcontrib-serializinghtml 1.1.5              pyhd8ed1ab_2    conda-forge
sqlite                    3.39.3               h9ae0607_0    conda-forge
stack_data                0.5.0              pyhd8ed1ab_0    conda-forge
storefact                 0.10.0                     py_0    conda-forge
tabulate                  0.8.10             pyhd8ed1ab_0    conda-forge
tblib                     1.7.0              pyhd8ed1ab_0    conda-forge
termcolor                 2.0.1              pyhd8ed1ab_1    conda-forge
terminado                 0.15.0           py39h6e9494a_0    conda-forge
text-unidecode            1.3                        py_0    conda-forge
tinycss2                  1.1.1              pyhd8ed1ab_0    conda-forge
tk                        8.6.12               h5dbffcc_0    conda-forge
toml                      0.10.2             pyhd8ed1ab_0    conda-forge
tomli                     2.0.1              pyhd8ed1ab_0    conda-forge
toolz                     0.12.0             pyhd8ed1ab_0    conda-forge
tornado                   6.1              py39h63b48b0_3    conda-forge
tqdm                      4.64.1             pyhd8ed1ab_0    conda-forge
traitlets                 5.4.0              pyhd8ed1ab_0    conda-forge
typing-extensions         4.3.0                hd8ed1ab_0    conda-forge
typing_extensions         4.3.0              pyha770c72_0    conda-forge
tzdata                    2022c                h191b570_0    conda-forge
tzlocal                   4.2              py39h6e9494a_1    conda-forge
ukkonen                   1.0.1            py39h7248d28_2    conda-forge
unidecode                 1.3.4              pyhd8ed1ab_0    conda-forge
uritools                  4.0.0              pyhd8ed1ab_0    conda-forge
urllib3                   1.26.11            pyhd8ed1ab_0    conda-forge
urlquote                  1.1.4            py39h9b8c074_5    conda-forge
virtualenv                20.16.5          py39h6e9494a_0    conda-forge
wcwidth                   0.2.5              pyh9f0ad1d_2    conda-forge
webencodings              0.5.1                      py_1    conda-forge
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
widgetsnbextension        4.0.3              pyhd8ed1ab_0    conda-forge
xorg-libxau               1.0.9                h35c211d_0    conda-forge
xorg-libxdmcp             1.1.3                h35c211d_0    conda-forge
xxhash                    0.8.0                h35c211d_3    conda-forge
xz                        5.2.6                h775f41a_0    conda-forge
yaml                      0.2.5                h0d85af4_2    conda-forge
zeromq                    4.3.4                he49afe7_1    conda-forge
zict                      2.2.0              pyhd8ed1ab_0    conda-forge
zipp                      3.8.1              pyhd8ed1ab_0    conda-forge
zlib                      1.2.12               hfd90126_3    conda-forge
zstandard                 0.18.0           py39h701faf5_0    conda-forge
zstd                      1.5.2                hfa58983_4    conda-forge

Rename code to `plateau`

We should rename the module itself to `plateau` throughout, without making any breaking API changes.

Drop pyarrow<3

We shouldn't be supporting and testing these older versions. Instead, we should add a >=3 pin.

Test Python 3.11

Support pandas 2.0

Our CI jobs for pandas 2.0 are currently failing (see, e.g., here).

I see (at least) two issues with supporting pandas 2.0:

  • pandas-dev/pandas#50127 (open PR here): this issue is causing our CI failures.
  • pandas-dev/pandas#52212 changed how pandas infers dtypes from scalars, which causes problems when we partition by a datetime: loading the data eagerly returns datetime64[ns], while loading it as a dask data frame returns datetime64[s]. I don't think we have tests for this, by the way. We could address it by explicitly setting the unit to nanoseconds here.
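
A minimal sketch of what "explicitly setting the unit to nanoseconds" could look like in plain pandas. The helper name `normalize_datetime_unit` is an assumption for illustration and does not exist in plateau; where such a cast would actually belong in the code base is not specified here.

```python
import numpy as np
import pandas as pd


def normalize_datetime_unit(series: pd.Series) -> pd.Series:
    """Cast timezone-naive datetime columns to nanosecond resolution.

    Hypothetical helper; other dtypes are returned unchanged.
    """
    if pd.api.types.is_datetime64_dtype(series.dtype):
        return series.astype("datetime64[ns]")
    return series


# A second-resolution value, as pandas 2.0 may infer it from a scalar.
partition_values = pd.Series(np.array(["2023-01-01"], dtype="datetime64[s]"))
normalized = normalize_datetime_unit(partition_values)
```

Casting on both the eager and the dask code path would make the two return the same dtype regardless of what pandas infers.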

Support `pandas=1.5`

As the Index classes have been removed, plateau is not compatible with pandas=1.5. We should add support for it and add a matrix entry for older pandas versions.

Re-add numpy nightlies

The nightly tests against numpy have been failing for a significant period: `np.array([[np.nan]]).astype(str)` raises an error. We should investigate the underlying issue and re-add the tests once it is fixed.
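
To check whether a given numpy build is affected, the expression from the report can be wrapped in a small probe. This is only a sketch: `nan_to_str_works` is a hypothetical name, and the probe only reports whether the conversion raises without diagnosing why.

```python
import numpy as np


def nan_to_str_works():
    """Return (True, converted values) on success or (False, error text) on failure."""
    try:
        result = np.array([[np.nan]]).astype(str)
    except Exception as exc:  # the report does not say which exception type is raised
        return False, repr(exc)
    return True, result.tolist()


ok, detail = nan_to_str_works()
print(np.__version__, ok, detail)
```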
