Comments (3)
Thanks for opening this issue. A couple questions:
- Can you share an example of an operation that's enabled by the data lake APIs that isn't possible (or is maybe slower) with the Blob Storage APIs? Just trying to understand why a user might want this.
- Do you have a suggestion for how this might be implemented, and what the user facing API would be?
It looks like historically this library had two implementations: one for Data Lake (Gen 1) and one for Blob Storage. API-wise, would we want to keep these separate? Or would we want a single AzureBlobFileSystem with a keyword that controls the underlying Azure client we use?
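To make the second option concrete, here is a minimal sketch of what a keyword-controlled client choice could look like. It is only an illustration: the `backend` keyword, `BACKENDS`, and `choose_backend` are hypothetical names and not part of adlfs. The only real names are the two SDK module paths.

```python
# Hypothetical sketch: `backend`, BACKENDS and choose_backend are made-up
# names for illustration; adlfs does not expose them.
BACKENDS = {
    "blob": "azure.storage.blob",              # azure-storage-blob
    "datalake": "azure.storage.filedatalake",  # azure-storage-file-datalake
}


def choose_backend(backend: str = "blob") -> str:
    """Return the SDK module a single filesystem class would import."""
    try:
        return BACKENDS[backend]
    except KeyError:
        raise ValueError(
            f"unknown backend {backend!r}; expected one of {sorted(BACKENDS)}"
        ) from None
```

Defaulting to `"blob"` would keep current behavior unchanged for existing users, with the Data Lake client strictly opt-in.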
from adlfs.
Not OP here, but the initial sales pitch for data lake gen2 when it was launched was that it understands file system structure. The name is a bit unfortunate, because it suggests that the product is related to data lake gen1, which it really isn't, to any significant degree. I always considered it "blob storage with first class folder structure".
With azure-storage-file-datalake, listing the contents of a directory is very fast and you can expand the file tree one level at a time. If I recall correctly, you need to do prefix/glob-match with BlobServiceClient (this might not be true anymore, it's been years since I worked with it). Depending on the use case, that might be very slow. For datasets with many partitions, it makes a pretty big difference if you mostly access them with partition filters, since listing blobs is (or used to be) so slow. I guess you also get atomic/cheap renames of folders, which I can't imagine is easy to achieve with the blob API.
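For reference, a rough sketch of the two listing styles described above. The helper names are hypothetical. It assumes `file_system_client` is a `FileSystemClient` from azure-storage-file-datalake and `container_client` is a `ContainerClient` from azure-storage-blob, both already configured. It says nothing about relative speed, which would need measuring on a real account.

```python
def list_one_level_datalake(file_system_client, directory):
    """List the direct children of `directory` via the Data Lake client.

    recursive=False asks the service for a single level only.
    """
    paths = file_system_client.get_paths(path=directory, recursive=False)
    return [p.name for p in paths]


def list_one_level_blob(container_client, directory):
    """Approximate the same listing with the Blob client.

    Blob names are matched by prefix; the "/" delimiter groups deeper
    blobs into virtual-directory prefixes.
    """
    prefix = directory.rstrip("/") + "/"
    items = container_client.walk_blobs(name_starts_with=prefix, delimiter="/")
    return [item.name for item in items]
```

With a partitioned dataset, either helper could be called once per level (for example on `data`, then on `data/year=2024`) instead of enumerating every blob under the root.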
Data lake gen2 also supports a bunch of things that I don't think are relevant to this project, like setting up ACL/RBAC for folders, that aren't supported by blob storage. People who also use adlfs may be using azure-storage-file-datalake to do those things before/after writing data, so they may already have a configured client instance available, but that seems like a pretty weak reason to take on the complexity of supporting both clients.
@efiop @hayesgb this will increase the speed a lot, please take a look :)