data-engineering-collective / plateau
Flat files, flat land.
License: MIT License
When loading a column as a categorical and one of the partitions contains only None
values, the read fails with the error ValueError: Categorical categories cannot be null.
This is actually a bug in pyarrow, linked here.
It would be nice to patch this issue for the time being.
from functools import partial

import minimalkv
import pandas as pd
from plateau.io.iter import (
    read_dataset_as_dataframes__iterator,
    store_dataframes_as_dataset__iter,
)

# Chunk 0 holds real values for "x"; chunk 1 holds nothing but None.
df = pd.DataFrame(
    {
        "x": ["a", "b", None, None],
        "chunk": [0, 0, 1, 1],
    }
)

path = "/tmp"
store = partial(minimalkv.get_store_from_url, f"hfs://{path}?create_if_missing=false")

# Write one partition per chunk value.
store_dataframes_as_dataset__iter(
    df_generator=(df.loc[lambda x: x["chunk"] == chunk] for chunk in [0, 1]),
    dataset_uuid="lol",
    store=store,
    partition_on=["chunk"],
    overwrite=True,
)

# Read "x" back as a categorical, one dataframe per chunk.
dfs = read_dataset_as_dataframes__iterator(
    dataset_uuid="lol",
    store=store,
    columns=["x"],
    categoricals=["x"],
    dispatch_by=["chunk"],
)

# ERROR
df = pd.concat(dfs)
print(df["x"])
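Until the upstream fix lands, one possible stopgap is to read without the categoricals argument and cast to a categorical only after the partitions have been concatenated. The sketch below is pandas-only: two hand-built frames stand in for the partitions that plateau would return, and the helper name concat_as_categorical is hypothetical, not part of plateau.

```python
import pandas as pd


def concat_as_categorical(frames, columns):
    """Concatenate partition frames, then cast `columns` to category.

    Hypothetical helper (not part of plateau): casting after the concat
    means an all-None partition never needs categories of its own.
    """
    combined = pd.concat(frames, ignore_index=True)
    for column in columns:
        combined[column] = combined[column].astype("category")
    return combined


# Stand-ins for the two partitions written by the reproducer above.
part_0 = pd.DataFrame({"x": ["a", "b"]})
part_1 = pd.DataFrame({"x": [None, None]})

result = concat_as_categorical([part_0, part_1], columns=["x"])
print(result["x"].cat.categories.tolist())
print(result["x"].isna().sum())
```

The trade-off is that each partition is held as an object column until the final cast, so the memory savings of categoricals only apply to the combined frame.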
# Name Version Build Channel
alabaster 0.7.12 py_0 conda-forge
altair 4.2.0 pyhd8ed1ab_1 conda-forge
appnope 0.1.3 pyhd8ed1ab_0 conda-forge
argon2-cffi 21.3.0 pyhd8ed1ab_0 conda-forge
argon2-cffi-bindings 21.2.0 py39h63b48b0_2 conda-forge
arrow-cpp 9.0.0 py39h04a14be_7_cpu conda-forge
asn1crypto 1.5.1 pyhd8ed1ab_0 conda-forge
asttokens 2.0.8 pyhd8ed1ab_0 conda-forge
attrs 22.1.0 pyh71513ae_1 conda-forge
aws-c-cal 0.5.11 hd2e2f4b_0 conda-forge
aws-c-common 0.6.2 h0d85af4_0 conda-forge
aws-c-event-stream 0.2.7 hb9330a7_13 conda-forge
aws-c-io 0.10.5 h35aa462_0 conda-forge
aws-checksums 0.1.11 h0010a65_7 conda-forge
aws-sdk-cpp 1.8.186 h766a74d_3 conda-forge
babel 2.10.3 pyhd8ed1ab_0 conda-forge
backcall 0.2.0 pyh9f0ad1d_0 conda-forge
backports 1.0 py_2 conda-forge
backports.functools_lru_cache 1.6.4 pyhd8ed1ab_0 conda-forge
beautifulsoup4 4.11.1 pyha770c72_0 conda-forge
bleach 5.0.1 pyhd8ed1ab_0 conda-forge
bokeh 2.4.3 py39h6e9494a_0 conda-forge
brotlipy 0.7.0 py39h63b48b0_1004 conda-forge
bzip2 1.0.8 h0d85af4_4 conda-forge
c-ares 1.18.1 h0d85af4_0 conda-forge
ca-certificates 2022.9.24 h033912b_0 conda-forge
certifi 2022.9.24 pyhd8ed1ab_0 conda-forge
cffi 1.15.1 py39hae9ecf2_0 conda-forge
cfgv 3.3.1 pyhd8ed1ab_0 conda-forge
charset-normalizer 2.1.1 pyhd8ed1ab_0 conda-forge
click 8.1.3 py39h6e9494a_0 conda-forge
cloudpickle 2.2.0 pyhd8ed1ab_0 conda-forge
colorama 0.4.5 pyhd8ed1ab_0 conda-forge
contextlib2 0.5.5 py_2 conda-forge
coverage 6.4.4 py39h6218fd2_0 conda-forge
cryptography 35.0.0 py39h209aa08_2 conda-forge
cytoolz 0.12.0 py39h701faf5_0 conda-forge
dask 2022.5.2 pyhd8ed1ab_0 conda-forge
dask-core 2022.5.2 pyhd8ed1ab_0 conda-forge
dataclasses 0.8 pyhc8e2a94_3 conda-forge
debugpy 1.6.3 py39hd91caee_0 conda-forge
decorator 5.1.1 pyhd8ed1ab_0 conda-forge
defusedxml 0.7.1 pyhd8ed1ab_0 conda-forge
distlib 0.3.5 pyhd8ed1ab_0 conda-forge
distributed 2022.5.2 pyhd8ed1ab_0 conda-forge
docutils 0.17.1 py39h6e9494a_2 conda-forge
entrypoints 0.4 pyhd8ed1ab_0 conda-forge
executing 1.0.0 pyhd8ed1ab_0 conda-forge
filelock 3.8.0 pyhd8ed1ab_0 conda-forge
flit-core 3.7.1 pyhd8ed1ab_0 conda-forge
freetype 2.12.1 h3f81eb7_0 conda-forge
freezegun 1.2.2 pyhd8ed1ab_0 conda-forge
fsspec 2022.8.2 pyhd8ed1ab_0 conda-forge
gflags 2.2.2 hb1e8313_1004 conda-forge
glog 0.6.0 h8ac2a54_0 conda-forge
gmp 6.2.1 h2e338ed_0 conda-forge
great-expectations 0.15.24 pyhd8ed1ab_0 conda-forge
grpc-cpp 1.47.1 h834a566_6 conda-forge
heapdict 1.0.1 py_0 conda-forge
icu 70.1 h96cf925_0 conda-forge
identify 2.5.5 pyhd8ed1ab_0 conda-forge
idna 3.4 pyhd8ed1ab_0 conda-forge
imagesize 1.4.1 pyhd8ed1ab_0 conda-forge
importlib-metadata 4.11.4 py39h6e9494a_0 conda-forge
importlib_resources 5.9.0 pyhd8ed1ab_0 conda-forge
iniconfig 1.1.1 pyh9f0ad1d_0 conda-forge
ipykernel 6.15.3 pyh736e0ef_0 conda-forge
ipython 8.5.0 pyhd1c38e8_1 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
ipywidgets 8.0.2 pyhd8ed1ab_1 conda-forge
jedi 0.18.1 py39h6e9494a_1 conda-forge
jinja2 3.1.2 pyhd8ed1ab_1 conda-forge
jpeg 9e hac89ed1_2 conda-forge
jsonpatch 1.32 pyhd8ed1ab_0 conda-forge
jsonpointer 2.0 py_0 conda-forge
jsonschema 4.16.0 pyhd8ed1ab_0 conda-forge
jupyter_client 7.3.4 pyhd8ed1ab_0 conda-forge
jupyter_core 4.11.1 py39h6e9494a_0 conda-forge
jupyterlab_pygments 0.2.2 pyhd8ed1ab_0 conda-forge
jupyterlab_widgets 3.0.3 pyhd8ed1ab_0 conda-forge
jupytext 1.14.0 pyheef035f_0 conda-forge
kartothek 4.0.4.dev0+gdb840b5.d20221018 pypi_0 pypi
krb5 1.19.3 hb49756b_0 conda-forge
lcms2 2.12 h577c468_0 conda-forge
lerc 4.0.0 hb486fe8_0 conda-forge
libabseil 20220623.0 cxx17_h844d122_4 conda-forge
libblas 3.9.0 16_osx64_openblas conda-forge
libbrotlicommon 1.0.9 h5eb16cf_7 conda-forge
libbrotlidec 1.0.9 h5eb16cf_7 conda-forge
libbrotlienc 1.0.9 h5eb16cf_7 conda-forge
libcblas 3.9.0 16_osx64_openblas conda-forge
libcrc32c 1.1.2 he49afe7_0 conda-forge
libcurl 7.83.1 h372c54d_0 conda-forge
libcxx 14.0.6 hccf4f1f_0 conda-forge
libdeflate 1.14 hb7f2c08_0 conda-forge
libedit 3.1.20191231 h0678c8f_2 conda-forge
libev 4.33 haf1e3a3_1 conda-forge
libevent 2.1.10 h815e4d9_4 conda-forge
libffi 3.4.2 h0d85af4_5 conda-forge
libgfortran 5.0.0 10_4_0_h97931a8_25 conda-forge
libgfortran5 11.3.0 h082f757_25 conda-forge
libgoogle-cloud 2.2.0 hb0fe3b0_1 conda-forge
libiconv 1.16 haf1e3a3_0 conda-forge
liblapack 3.9.0 16_osx64_openblas conda-forge
libnghttp2 1.47.0 h7cbc4dc_1 conda-forge
libopenblas 0.3.21 openmp_h429af6e_3 conda-forge
libpng 1.6.38 ha978bb4_0 conda-forge
libprotobuf 3.21.7 hbc0c0cd_0 conda-forge
libsodium 1.0.18 hbcb3906_1 conda-forge
libsqlite 3.39.3 ha978bb4_0 conda-forge
libssh2 1.10.0 h7535e13_3 conda-forge
libthrift 0.16.0 h08c06f4_2 conda-forge
libtiff 4.4.0 hdb44e8a_4 conda-forge
libutf8proc 2.7.0 h0d85af4_0 conda-forge
libwebp-base 1.2.4 h775f41a_0 conda-forge
libxcb 1.13 h0d85af4_1004 conda-forge
libxml2 2.9.14 hea49891_4 conda-forge
libxslt 1.1.35 heaa0ce8_0 conda-forge
libzlib 1.2.12 hfd90126_3 conda-forge
llvm-openmp 14.0.4 ha654fa7_0 conda-forge
locket 1.0.0 pyhd8ed1ab_0 conda-forge
lxml 4.9.1 py39h701faf5_0 conda-forge
lz4 4.0.0 py39h263ca4c_2 conda-forge
lz4-c 1.9.3 he49afe7_1 conda-forge
make 4.3 h22f3db7_1 conda-forge
makefun 1.15.0 pyhd8ed1ab_0 conda-forge
markdown-it-py 2.1.0 pyhd8ed1ab_0 conda-forge
markupsafe 2.1.1 py39h63b48b0_1 conda-forge
marshmallow 3.18.0 pyhd8ed1ab_0 conda-forge
matplotlib-inline 0.1.6 pyhd8ed1ab_0 conda-forge
mdit-py-plugins 0.3.0 pyhd8ed1ab_0 conda-forge
mdurl 0.1.0 pyhd8ed1ab_0 conda-forge
milksnake 0.1.5 pyhd8ed1ab_1 conda-forge
minimalkv 1.4.3 pypi_0 pypi
mistune 2.0.4 pyhd8ed1ab_0 conda-forge
msgpack-python 1.0.4 py39h7c694c3_0 conda-forge
nbclient 0.6.8 pyhd8ed1ab_0 conda-forge
nbconvert 7.0.0 pyhd8ed1ab_0 conda-forge
nbconvert-core 7.0.0 pyhd8ed1ab_0 conda-forge
nbconvert-pandoc 7.0.0 pyhd8ed1ab_0 conda-forge
nbformat 5.6.0 pyhd8ed1ab_0 conda-forge
ncurses 6.3 h96cf925_1 conda-forge
nest-asyncio 1.5.5 pyhd8ed1ab_0 conda-forge
nodeenv 1.7.0 pyhd8ed1ab_0 conda-forge
notebook 6.4.12 pyha770c72_0 conda-forge
numpy 1.23.3 py39h34843a6_0 conda-forge
numpydoc 1.4.0 pyhd8ed1ab_1 conda-forge
openjpeg 2.5.0 h5d0d7b0_1 conda-forge
openssl 1.1.1q hfe4f2af_0 conda-forge
orc 1.8.0 ha9d861c_0 conda-forge
oscrypto 1.2.1 pyhd3deb0d_0 conda-forge
packaging 21.3 pyhd8ed1ab_0 conda-forge
pandas 1.4.4 py39hca71b8a_0 conda-forge
pandoc 2.19.2 h694c41f_0 conda-forge
pandocfilters 1.5.0 pyhd8ed1ab_0 conda-forge
parquet-cpp 1.5.1 1 conda-forge
parso 0.8.3 pyhd8ed1ab_0 conda-forge
partd 1.3.0 pyhd8ed1ab_0 conda-forge
pbr 5.10.0 pyhd8ed1ab_0 conda-forge
pexpect 4.8.0 pyh9f0ad1d_2 conda-forge
pickleshare 0.7.5 py39hde42818_1002 conda-forge
pillow 9.2.0 py39h4d560c1_2 conda-forge
pip 22.2.2 pyhd8ed1ab_0 conda-forge
pkgutil-resolve-name 1.3.10 pyhd8ed1ab_0 conda-forge
plateau 0.0.0 pypi_0 pypi
platformdirs 2.5.2 pyhd8ed1ab_1 conda-forge
pluggy 1.0.0 py39h6e9494a_3 conda-forge
pre-commit 2.20.0 py39h6e9494a_0 conda-forge
prometheus_client 0.14.1 pyhd8ed1ab_0 conda-forge
prompt-toolkit 3.0.31 pyha770c72_0 conda-forge
psutil 5.9.2 py39ha30fb19_0 conda-forge
pthread-stubs 0.4 hc929b4f_1001 conda-forge
ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge
pure_eval 0.2.2 pyhd8ed1ab_0 conda-forge
py 1.11.0 pyh6c4a22f_0 conda-forge
pyarrow 6.0.1 pypi_0 pypi
pycparser 2.21 pyhd8ed1ab_0 conda-forge
pycryptodomex 3.15.0 py39h701faf5_0 conda-forge
pygments 2.13.0 pyhd8ed1ab_0 conda-forge
pyjwt 2.5.0 pyhd8ed1ab_0 conda-forge
pyopenssl 22.0.0 pyhd8ed1ab_1 conda-forge
pyparsing 3.0.9 pyhd8ed1ab_0 conda-forge
pyrsistent 0.18.1 py39h63b48b0_1 conda-forge
pysocks 1.7.1 py39h6e9494a_5 conda-forge
pytest 7.1.3 py39h6e9494a_0 conda-forge
pytest-cov 3.0.0 pyhd8ed1ab_0 conda-forge
pytest-mock 3.8.2 pyhd8ed1ab_0 conda-forge
python 3.9.13 h57e37ff_0_cpython conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python-fastjsonschema 2.16.2 pyhd8ed1ab_0 conda-forge
python-slugify 6.1.2 pyhd8ed1ab_0 conda-forge
python-tzdata 2022.2 pyhd8ed1ab_0 conda-forge
python-xxhash 3.0.0 py39h63b48b0_1 conda-forge
python_abi 3.9 2_cp39 conda-forge
pytz 2022.2.1 pyhd8ed1ab_0 conda-forge
pytz-deprecation-shim 0.1.0.post0 py39h6e9494a_2 conda-forge
pyyaml 6.0 py39h63b48b0_4 conda-forge
pyzmq 24.0.1 py39hed8f129_0 conda-forge
re2 2022.06.01 hb486fe8_0 conda-forge
readline 8.1.2 h3899abd_0 conda-forge
requests 2.28.1 pyhd8ed1ab_1 conda-forge
ruamel.yaml 0.17.17 py39h89e85a6_1 conda-forge
ruamel.yaml.clib 0.2.6 py39h63b48b0_1 conda-forge
schema 0.7.5 pyhd8ed1ab_0 conda-forge
scipy 1.9.1 py39h9488793_0 conda-forge
send2trash 1.8.0 pyhd8ed1ab_0 conda-forge
setuptools 65.3.0 py39h6e9494a_0 conda-forge
setuptools-scm 7.0.5 pyhd8ed1ab_0 conda-forge
setuptools_scm 7.0.5 hd8ed1ab_0 conda-forge
simplejson 3.17.6 py39h63b48b0_1 conda-forge
simplekv 0.14.1 pyh9f0ad1d_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
snappy 1.1.9 h6e38e02_1 conda-forge
snowballstemmer 2.2.0 pyhd8ed1ab_0 conda-forge
snowflake-connector-python 2.7.12 py39h1584358_0 conda-forge
sortedcontainers 2.4.0 pyhd8ed1ab_0 conda-forge
soupsieve 2.3.2.post1 pyhd8ed1ab_0 conda-forge
sphinx 5.1.1 pyhd8ed1ab_1 conda-forge
sphinx_rtd_theme 1.0.0 pyhd8ed1ab_0 conda-forge
sphinxcontrib-apidoc 0.3.0 py_1 conda-forge
sphinxcontrib-applehelp 1.0.2 py_0 conda-forge
sphinxcontrib-devhelp 1.0.2 py_0 conda-forge
sphinxcontrib-htmlhelp 2.0.0 pyhd8ed1ab_0 conda-forge
sphinxcontrib-jsmath 1.0.1 py_0 conda-forge
sphinxcontrib-qthelp 1.0.3 py_0 conda-forge
sphinxcontrib-serializinghtml 1.1.5 pyhd8ed1ab_2 conda-forge
sqlite 3.39.3 h9ae0607_0 conda-forge
stack_data 0.5.0 pyhd8ed1ab_0 conda-forge
storefact 0.10.0 py_0 conda-forge
tabulate 0.8.10 pyhd8ed1ab_0 conda-forge
tblib 1.7.0 pyhd8ed1ab_0 conda-forge
termcolor 2.0.1 pyhd8ed1ab_1 conda-forge
terminado 0.15.0 py39h6e9494a_0 conda-forge
text-unidecode 1.3 py_0 conda-forge
tinycss2 1.1.1 pyhd8ed1ab_0 conda-forge
tk 8.6.12 h5dbffcc_0 conda-forge
toml 0.10.2 pyhd8ed1ab_0 conda-forge
tomli 2.0.1 pyhd8ed1ab_0 conda-forge
toolz 0.12.0 pyhd8ed1ab_0 conda-forge
tornado 6.1 py39h63b48b0_3 conda-forge
tqdm 4.64.1 pyhd8ed1ab_0 conda-forge
traitlets 5.4.0 pyhd8ed1ab_0 conda-forge
typing-extensions 4.3.0 hd8ed1ab_0 conda-forge
typing_extensions 4.3.0 pyha770c72_0 conda-forge
tzdata 2022c h191b570_0 conda-forge
tzlocal 4.2 py39h6e9494a_1 conda-forge
ukkonen 1.0.1 py39h7248d28_2 conda-forge
unidecode 1.3.4 pyhd8ed1ab_0 conda-forge
uritools 4.0.0 pyhd8ed1ab_0 conda-forge
urllib3 1.26.11 pyhd8ed1ab_0 conda-forge
urlquote 1.1.4 py39h9b8c074_5 conda-forge
virtualenv 20.16.5 py39h6e9494a_0 conda-forge
wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge
webencodings 0.5.1 py_1 conda-forge
wheel 0.37.1 pyhd8ed1ab_0 conda-forge
widgetsnbextension 4.0.3 pyhd8ed1ab_0 conda-forge
xorg-libxau 1.0.9 h35c211d_0 conda-forge
xorg-libxdmcp 1.1.3 h35c211d_0 conda-forge
xxhash 0.8.0 h35c211d_3 conda-forge
xz 5.2.6 h775f41a_0 conda-forge
yaml 0.2.5 h0d85af4_2 conda-forge
zeromq 4.3.4 he49afe7_1 conda-forge
zict 2.2.0 pyhd8ed1ab_0 conda-forge
zipp 3.8.1 pyhd8ed1ab_0 conda-forge
zlib 1.2.12 hfd90126_3 conda-forge
zstandard 0.18.0 py39h701faf5_0 conda-forge
zstd 1.5.2 hfa58983_4 conda-forge
We should change the module name itself completely to plateau, but without making any API-breaking changes.
We should update all occurrences, such as setup.py and the linter configurations, to move to a Python 3.8+ setup.
We shouldn't be supporting and testing these older versions. Instead, we should add a >=2 pin.
Add CI for Python 3.9 and fix the failing tests.
Please describe the current behavior, why it is a problem and what the expected behavior should be.
Please provide a minimal reproducible code example to reproduce the behavior,
c.f. https://stackoverflow.com/help/minimal-reproducible-example
# Your code example
# Paste your output of `pip freeze` or `conda list` here
We should also enable the use of type annotations from minimalkv, as the package is typed.
Our CI jobs for pandas 2.0 are currently failing (see, e.g., here).
I see (at least) two issues with supporting pandas 2.0:
datetime64[ns], loading the data as a dask data frame will return a datetime64[s]). I don't think we have tests for this, by the way. We could address that by explicitly setting the units to nanoseconds here.
We currently have to pin Python in the mypy pre-commit hook as we pin numpy to 1.22 there. We should update to a new version of numpy and remove the Python pin.
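For the datetime64 unit mismatch raised in the pandas 2.0 issue above, explicitly setting the units to nanoseconds could look roughly like the following with plain pandas. The helper name normalize_datetime_units is hypothetical, and where such a cast would sit inside plateau is not specified here.

```python
import numpy as np
import pandas as pd


def normalize_datetime_units(df):
    """Return a copy of `df` with datetime64 columns cast to nanoseconds.

    Hypothetical helper, not part of plateau. Only timezone-naive
    numpy datetime64 columns are touched; everything else is left as-is.
    """
    out = df.copy()
    for name in out.columns:
        dtype = out[name].dtype
        if isinstance(dtype, np.dtype) and dtype.kind == "M":
            out[name] = out[name].astype("datetime64[ns]")
    return out


# A second-resolution input column, as an example of a non-nanosecond unit.
seconds = np.array(["2020-01-01T00:00:00"], dtype="datetime64[s]")
df = pd.DataFrame({"ts": pd.Series(seconds), "value": [1]})

normalized = normalize_datetime_units(df)
print(normalized["ts"].dtype)
```

A test along these lines, run for both the eager and the dask read path, would also cover the gap in test coverage mentioned above.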
As the Index classes have been removed, plateau is not compatible with pandas=1.5. We should add support for it and add a matrix entry for older pandas versions.
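One version-agnostic pattern, assuming the incompatibility comes from references to the numeric Index subclasses (Int64Index, UInt64Index, Float64Index), is to inspect the dtype of a plain pd.Index instead of its class. Which classes plateau actually relies on is not stated above, so this is only a sketch, and is_integer_index is a hypothetical helper name.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype


def is_integer_index(index):
    """Dtype-based check standing in for `isinstance(index, pd.Int64Index)`.

    Hypothetical helper, not part of plateau. It looks at the dtype rather
    than the class, so it does not need the numeric Index subclasses to
    exist. Note that it also accepts unsigned integer and range indexes.
    """
    return isinstance(index, pd.Index) and is_integer_dtype(index.dtype)


print(is_integer_index(pd.Index([1, 2, 3])))
print(is_integer_index(pd.Index(["a", "b"])))
```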
The nightly tests against numpy have been failing for a significant period, with np.array([[np.nan]]).astype(str) leading to an error. We should investigate the underlying issue and re-add the tests once it is fixed.
Add CI for Python 3.10 and fix the failing tests.
In the process of fixing the documentation in #5, I've added :okwarning: markers in various places. We should fix the warnings and then remove the markers again.
@jtilly @DamianB-BitFlipper @DamianBarabonkovQC @xhochy @fjetter seems like a good collection. Anyone else interested?