Comments (8)
I've only seen this when I'm referencing a file or container that isn't present. Can you try running:
from adlfs import AzureBlobFileSystem
fs = AzureBlobFileSystem(account_name=ACCOUNT_NAME, account_key=ACCOUNT_KEY, container_name=CONTAINER)
files = fs.glob('/nyctaxi/2015/*.csv')
This instantiates the filesystem and should return a list of all the files Dask will expect to find. The most likely explanation is that one of the items being returned has a size of 0. You can also try fs.walk(filepath) and fs.info(file) to get more detailed information.
The same error occurs when I just run your code above:
from adlfs import AzureBlobFileSystem
fs = AzureBlobFileSystem(account_name=ACCOUNT_NAME, account_key=ACCOUNT_KEY, container_name=CONTAINER)
files = fs.glob('/nyctaxi/2015/*.csv')
so the glob is somewhat unhappy. If I look at the files found by glob, they are complete:
['nyctaxi/2015/yellow_tripdata_2015-01.csv',
'nyctaxi/2015/yellow_tripdata_2015-02.csv',
'nyctaxi/2015/yellow_tripdata_2015-03.csv',
'nyctaxi/2015/yellow_tripdata_2015-04.csv',
'nyctaxi/2015/yellow_tripdata_2015-05.csv',
'nyctaxi/2015/yellow_tripdata_2015-06.csv',
'nyctaxi/2015/yellow_tripdata_2015-07.csv',
'nyctaxi/2015/yellow_tripdata_2015-08.csv',
'nyctaxi/2015/yellow_tripdata_2015-09.csv',
'nyctaxi/2015/yellow_tripdata_2015-10.csv',
'nyctaxi/2015/yellow_tripdata_2015-11.csv',
'nyctaxi/2015/yellow_tripdata_2015-12.csv']
This matches what I see in Storage Explorer:
^ there appears to be a difference in the initial "/"
I tried files = fs.glob('nyctaxi/2015/*.csv'), but that yields the same error....
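Since the listings above show that blob keys carry no leading "/", one quick thing to rule out is path normalization. This is a purely illustrative sketch, not the actual adlfs code:

```python
def normalize_blob_path(path):
    """Strip the leading slash that blob keys never carry.

    Blob names in the flat Azure namespace are stored without a
    leading "/", so a filesystem layer has to normalize user input
    before matching keys against the listing.
    """
    return path.lstrip("/")

print(normalize_blob_path("/nyctaxi/2015/*.csv"))  # nyctaxi/2015/*.csv
```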
It looks like it is happening in info in core.py, when testing whether the directory is a file; that in turn is called from find in spec.py.
Path is 'nyctaxi/2015/' in the above case, and getting blob properties on a directory path seems to fail, since directories don't really exist as entities in Blob storage (unless created by BlobFUSE, and even then not with a trailing /).
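The failure mode described above can be sketched in a few lines. The blob namespace is flat, so 'nyctaxi/2015/' is only a key prefix, and asking the service for its properties 404s with BlobNotFound. The names below are illustrative, not the real adlfs internals; the "fixed" variant shows the usual fsspec-style remedy of treating a prefix with children as a directory:

```python
# A flat key -> size mapping standing in for the Blob service.
BLOBS = {
    "nyctaxi/2015/yellow_tripdata_2015-01.csv": 1000,
    "nyctaxi/2015/yellow_tripdata_2015-02.csv": 1000,
}

def info_naive(path):
    # Mirrors asking the service for blob properties directly: a
    # directory prefix has no blob entry, so this raises (the real
    # service returns 404 BlobNotFound).
    return {"name": path, "size": BLOBS[path], "type": "file"}

def info_fixed(path):
    key = path.strip("/")
    if key in BLOBS:
        return {"name": key, "size": BLOBS[key], "type": "file"}
    # Treat a prefix that has children as a directory instead of
    # propagating the service's "blob does not exist" error.
    if any(name.startswith(key + "/") for name in BLOBS):
        return {"name": key, "size": 0, "type": "directory"}
    raise FileNotFoundError(path)

print(info_fixed("nyctaxi/2015/"))  # reported as a directory
```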
Since it is happening after the files in the directory were listed, I am still getting all my results. Somewhat consistently, though, this also gives me an error: files = fs.glob('/nyctaxi/*/*.csv'), and the returned array is empty.
@danielsc -- I've written a few tests that (I think) replicate the source of the problem you're observing, and pushed a branch (blob_not_exist_exception). Any chance you can test that branch and give some feedback?
I get the same kind of error when reading a partitioned Parquet data set. ls returns:
['bike.parq/_common_metadata',
'bike.parq/_metadata',
'bike.parq/part.0.parquet',
'bike.parq/part.1.parquet',
'bike.parq/part.10.parquet',
'bike.parq/part.2.parquet',
'bike.parq/part.3.parquet',
'bike.parq/part.4.parquet',
'bike.parq/part.5.parquet',
'bike.parq/part.6.parquet',
'bike.parq/part.7.parquet',
'bike.parq/part.8.parquet',
'bike.parq/part.9.parquet']
Then, when I attempt to read the directory, it fails:
ERROR:azure.storage.common.storageclient:Client-Request-ID=31e51f18-1452-11ea-a105-3af9d3e408b5 Retry policy did not allow for a retry: Server-Timestamp=Sun, 01 Dec 2019 15:49:44 GMT, Server-Request-ID=99d864d0-4a64-42b5-b7ab-120b04143175, HTTP status code=404, Exception=The specified blob does not exist. ErrorCode: BlobNotFound.
---------------------------------------------------------------------------
AzureMissingResourceHttpError Traceback (most recent call last)
<ipython-input-33-0f207f108c7a> in <module>
----> 1 b = dd.read_parquet('abfs://data/bike.parquet', engine='fastparquet', storage_options=STORAGE_OPTIONS)
...
~/Development/AnacondaPlatform/training-ae5-projects/RemoteDataAzure/envs/default/lib/python3.7/site-packages/azure/storage/common/_error.py in _http_error_handler(http_error)
113 ex.error_code = error_code
114
--> 115 raise ex
116
117
AzureMissingResourceHttpError: The specified blob does not exist. ErrorCode: BlobNotFound
I will give your branch a run in a few days.
I've confirmed that combining the two PRs fixes the glob and dask issues.
Thanks for verifying, @AlbertDeFusco.