I have the same issue. Please downgrade to pyarrow==15.0.2; it seems the datasets library needs to be fixed.
from datasets.
Please note that the error is raised right at import:
import pyarrow.parquet as pq
Therefore, it must be caused by a problem with your pyarrow installation. I would recommend uninstalling and reinstalling pyarrow.
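To see which pyarrow is actually being imported, before and after reinstalling, a small check like the following can help. This is a minimal sketch: `describe_module` is a hypothetical helper (not part of pyarrow or datasets) that only uses the standard library `importlib`. It catches AttributeError as well as ImportError because a broken installation can fail at import time with either.

```python
import importlib
from typing import Any, Dict


def describe_module(name: str) -> Dict[str, Any]:
    """Try to import `name` and report its version and file location.

    Hypothetical helper: catches both ImportError and AttributeError, since a
    broken pyarrow installation can fail at import time with either.
    """
    try:
        module = importlib.import_module(name)
    except (ImportError, AttributeError) as exc:
        return {"name": name, "ok": False, "error": f"{type(exc).__name__}: {exc}"}
    return {
        "name": name,
        "ok": True,
        "version": getattr(module, "__version__", None),
        "file": getattr(module, "__file__", None),
    }


if __name__ == "__main__":
    # pyarrow.parquet is the import that fails in this thread.
    for mod_name in ("pyarrow", "pyarrow.parquet"):
        print(describe_module(mod_name))
```

Running it prints the version and file location of pyarrow, and whether pyarrow.parquet imports. A location outside the environment you expect can point to a mixed pip/conda installation.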
I also see that you seem to use conda to install pyarrow. Please note that pyarrow offers 3 different packages on conda-forge: https://arrow.apache.org/docs/python/install.html#using-conda
conda install -c conda-forge pyarrow
While the pyarrow conda-forge package is the right choice for most users, both a minimal and maximal variant of the package exist, either of which may be better for your use case. See Differences between conda-forge packages.
Please make sure you install the right one: I guess it is either pyarrow (or pyarrow-all).
It is not a problem with the datasets library: we support the latest version of pyarrow, and our Continuous Integration tests use pyarrow 16.1.0 without any problem.
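When comparing your environment against the pyarrow 16.1.0 used in CI (or the 15.0.2 suggested in the first comment), it helps to know exactly which versions are installed. This is a minimal sketch using the standard library `importlib.metadata`. `installed_version` and `release_tuple` are hypothetical helpers. `release_tuple` deliberately simplifies version parsing by dropping suffixes such as `rc1`, so `packaging.version` is the better choice for full PEP 440 comparisons.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of distribution `dist`, or None if absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def release_tuple(ver: str) -> Tuple[int, ...]:
    """Leading numeric release parts, e.g. '16.1.0' -> (16, 1, 0).

    Simplified parsing (assumption): pre-release/dev suffixes are dropped,
    so '17.0.0rc1' -> (17, 0, 0).
    """
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"no numeric release in {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


if __name__ == "__main__":
    for dist in ("pyarrow", "datasets"):
        print(dist, installed_version(dist))
    pa_ver = installed_version("pyarrow")
    if pa_ver is not None:
        # 16.1.0 is the version the maintainers report using in CI.
        print("pyarrow >= 16.1.0:", release_tuple(pa_ver) >= (16, 1, 0))
```

This reads installed package metadata. If it disagrees with `pyarrow.__version__` at runtime, that can help spot a mixed installation.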
The error reported here is raised when importing pyarrow.parquet:
---> 29 import pyarrow.parquet as pq
File /opt/conda/lib/python3.10/site-packages/pyarrow/parquet/__init__.py:20
1 # Licensed to the Apache Software Foundation (ASF) under one
2 # or more contributor license agreements. See the NOTICE file
3 # distributed with this work for additional information
(...)
17
18 # flake8: noqa
---> 20 from .core import *
File /opt/conda/lib/python3.10/site-packages/pyarrow/parquet/core.py:33
30 import pyarrow as pa
32 try:
---> 33 import pyarrow._parquet as _parquet
34 except ImportError as exc:
35 raise ImportError(
36 "The pyarrow installation is not built with support "
37 f"for the Parquet file format ({str(exc)})"
38 ) from None
File /opt/conda/lib/python3.10/site-packages/pyarrow/_parquet.pyx:1, in init pyarrow._parquet()
AttributeError: module 'pyarrow.lib' has no attribute 'ListViewType'
This can only be explained if pyarrow was not properly installed.
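The traceback above passes through the guard in pyarrow/parquet/core.py, which only catches ImportError. The sketch below mirrors that pattern to show how the two failure modes differ. The function name `import_parquet_backend` and the two demo loaders are hypothetical; only the error message is copied from the traceback. A missing `pyarrow._parquet` extension surfaces as the friendlier "not built with support for the Parquet file format" ImportError. An AttributeError raised while the extension initialises escapes unchanged, which is why the traceback ends in the raw AttributeError.

```python
from typing import Callable


def import_parquet_backend(load: Callable[[], object]) -> object:
    """Mirror the try/except guard from pyarrow/parquet/core.py in the traceback.

    Only ImportError is translated into the friendlier message; any other
    exception raised while loading (e.g. AttributeError) propagates unchanged.
    """
    try:
        return load()
    except ImportError as exc:
        raise ImportError(
            "The pyarrow installation is not built with support "
            f"for the Parquet file format ({str(exc)})"
        ) from None


# Hypothetical demo loaders standing in for `import pyarrow._parquet`.
def missing_extension() -> object:
    raise ImportError("No module named 'pyarrow._parquet'")


def mismatched_extension() -> object:
    raise AttributeError("module 'pyarrow.lib' has no attribute 'ListViewType'")


if __name__ == "__main__":
    for loader in (missing_extension, mismatched_extension):
        try:
            import_parquet_backend(loader)
        except (ImportError, AttributeError) as exc:
            print(f"{type(exc).__name__}: {exc}")
```

In the reported traceback, `pyarrow._parquet` was found (the error is raised in `init pyarrow._parquet()`) but failed to initialise against the `pyarrow.lib` that was loaded.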
If the user just installed pyarrow-core from conda-forge, then its parquet subpackage is not installed and cannot be imported. You can check the pyarrow docs:
- Differences between conda-forge packages: https://arrow.apache.org/docs/python/install.html#python-conda-differences
The pyarrow-core package includes the following functionality:
...
The pyarrow package adds the following:
...
Parquet (i.e., pyarrow.parquet)