
Comments (8)

hayesgb commented on May 27, 2024

I've only seen this when I'm referencing a file or container that isn't present. Can you try running:

from adlfs import AzureBlobFileSystem
fs = AzureBlobFileSystem(account_name=ACCOUNT_NAME, account_key=ACCOUNT_KEY, container_name=CONTAINER)
files = fs.glob('/nyctaxi/2015/*.csv')

This instantiates the filesystem and should return a list of all files Dask will expect to find. The most likely explanation is that one of the items being returned has a size of 0. You can also try fs.walk(filepath) and fs.info(file) to get more detailed information.
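
A minimal sketch of that zero-size check, assuming `fs` is any fsspec-style filesystem object (such as the `AzureBlobFileSystem` above) whose `info()` returns a dict with a `size` key. The helper name `find_zero_size` is hypothetical and not part of adlfs.

```python
def find_zero_size(fs, pattern):
    """Return paths matched by `pattern` whose reported size is 0 or missing.

    `fs` is assumed to expose fsspec-style glob() and info() methods.
    This helper is illustrative only; it is not an adlfs API.
    """
    suspicious = []
    for path in fs.glob(pattern):
        details = fs.info(path)  # expected to be a dict describing the entry
        if not details.get("size"):
            suspicious.append(path)
    return suspicious
```

Calling `find_zero_size(fs, 'nyctaxi/2015/*.csv')` would then list any matched entries worth inspecting by hand.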

from adlfs.

danielsc commented on May 27, 2024

The same error occurs when I just run your code above:

from adlfs import AzureBlobFileSystem
fs = AzureBlobFileSystem(account_name=ACCOUNT_NAME, account_key=ACCOUNT_KEY, container_name=CONTAINER)
files = fs.glob('/nyctaxi/2015/*.csv')

So the glob is somewhat unhappy. If I look at the files found by glob, though, they are complete:

['nyctaxi/2015/yellow_tripdata_2015-01.csv',
 'nyctaxi/2015/yellow_tripdata_2015-02.csv',
 'nyctaxi/2015/yellow_tripdata_2015-03.csv',
 'nyctaxi/2015/yellow_tripdata_2015-04.csv',
 'nyctaxi/2015/yellow_tripdata_2015-05.csv',
 'nyctaxi/2015/yellow_tripdata_2015-06.csv',
 'nyctaxi/2015/yellow_tripdata_2015-07.csv',
 'nyctaxi/2015/yellow_tripdata_2015-08.csv',
 'nyctaxi/2015/yellow_tripdata_2015-09.csv',
 'nyctaxi/2015/yellow_tripdata_2015-10.csv',
 'nyctaxi/2015/yellow_tripdata_2015-11.csv',
 'nyctaxi/2015/yellow_tripdata_2015-12.csv']

This matches what I see in the storage explorer:
[image]


martindurant commented on May 27, 2024

^ there appears to be a difference in the initial "/"


danielsc commented on May 27, 2024

I tried files = fs.glob('nyctaxi/2015/*.csv'), but that yields the same error....

It looks like it is happening in info in core.py, when testing whether the directory is a file:

[image]

That is called from find in spec.py:
[image]

Path is 'nyctaxi/2015/' in the above case, and getting blob properties on a directory path seems to fail, since directories don't really exist as entities in Blob (unless created by BlobFUSE, and even then not with a trailing /?).
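
A toy illustration of that point, using a plain dict as a flat blob namespace. Every name here (`BLOBS`, `BlobNotFound`, `get_blob_properties`, `info`) is a hypothetical stand-in, and the fallback is only one possible way to handle the case; it is not the adlfs implementation.

```python
class BlobNotFound(Exception):
    """Stand-in for the service's 404 BlobNotFound error."""


# Toy flat namespace: blob name -> size in bytes. The "directory"
# nyctaxi/2015/ has no entry of its own; it is only a shared prefix.
BLOBS = {
    "nyctaxi/2015/yellow_tripdata_2015-01.csv": 100,
    "nyctaxi/2015/yellow_tripdata_2015-02.csv": 200,
}


def get_blob_properties(name):
    """Mimic a per-blob lookup: only exact blob names resolve."""
    if name not in BLOBS:
        raise BlobNotFound(name)
    return {"name": name, "size": BLOBS[name], "type": "file"}


def info(path):
    """Try the blob lookup first, then fall back to a prefix check."""
    key = path.strip("/")
    try:
        return get_blob_properties(key)
    except BlobNotFound:
        prefix = key + "/"
        if any(name.startswith(prefix) for name in BLOBS):
            # Some blob lives under this prefix, so report a directory.
            return {"name": key, "size": 0, "type": "directory"}
        raise
```

In this model, `get_blob_properties('nyctaxi/2015/')` raises, while `info('nyctaxi/2015/')` reports a directory because blobs exist under that prefix.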

Since it is happening after the files in the directory were listed, I still get all my results. Somewhat consistently with that, files = fs.glob('/nyctaxi/*/*.csv') also gives me an error, and the returned array is empty.


hayesgb commented on May 27, 2024

@danielsc -- I've written a few tests that (I think) replicate the source of the problem you're observing, and pushed a branch (blob_not_exist_exception). Any chance you can test that branch and give some feedback?


AlbertDeFusco commented on May 27, 2024

I get the same kind of error when reading a partitioned parquet data set. ls returns

['bike.parq/_common_metadata',
 'bike.parq/_metadata',
 'bike.parq/part.0.parquet',
 'bike.parq/part.1.parquet',
 'bike.parq/part.10.parquet',
 'bike.parq/part.2.parquet',
 'bike.parq/part.3.parquet',
 'bike.parq/part.4.parquet',
 'bike.parq/part.5.parquet',
 'bike.parq/part.6.parquet',
 'bike.parq/part.7.parquet',
 'bike.parq/part.8.parquet',
 'bike.parq/part.9.parquet']

Then, when I attempt to read the directory, it fails.

ERROR:azure.storage.common.storageclient:Client-Request-ID=31e51f18-1452-11ea-a105-3af9d3e408b5 Retry policy did not allow for a retry: Server-Timestamp=Sun, 01 Dec 2019 15:49:44 GMT, Server-Request-ID=99d864d0-4a64-42b5-b7ab-120b04143175, HTTP status code=404, Exception=The specified blob does not exist. ErrorCode: BlobNotFound.
---------------------------------------------------------------------------
AzureMissingResourceHttpError             Traceback (most recent call last)
<ipython-input-33-0f207f108c7a> in <module>
----> 1 b = dd.read_parquet('abfs://data/bike.parquet', engine='fastparquet', storage_options=STORAGE_OPTIONS)


...

~/Development/AnacondaPlatform/training-ae5-projects/RemoteDataAzure/envs/default/lib/python3.7/site-packages/azure/storage/common/_error.py in _http_error_handler(http_error)
    113     ex.error_code = error_code
    114 
--> 115     raise ex
    116 
    117 

AzureMissingResourceHttpError: The specified blob does not exist. ErrorCode: BlobNotFound

I will give your branch a run in a few days.


AlbertDeFusco commented on May 27, 2024

I've confirmed that combining the two PRs fixes the glob and dask issues.


hayesgb commented on May 27, 2024

Thanks for verifying @AlbertDeFusco.
