

mrocklin commented on August 20, 2024

Perhaps there should be a separate small library for interacting with Google Storage? Libraries like hdfs3 and s3fs are really designed with just their one storage system in mind. Their interfaces are similar enough that it's easy to connect them to dask.distributed. Perhaps what you want would require another project, a gsfs?
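To make the "similar enough interfaces" point concrete, here is a minimal sketch assuming a hypothetical `ByteStore` protocol with just `ls` and `open`. This is not the actual hdfs3 or s3fs API; `ByteStore`, `InMemoryStore` and `total_bytes` are illustrative names. The idea is that downstream code written only against such a protocol works unchanged with any backend, including a future gsfs.

```python
import io
from typing import Dict, List, Protocol


class ByteStore(Protocol):
    """Hypothetical minimal interface a storage backend would expose."""

    def ls(self, prefix: str) -> List[str]: ...

    def open(self, path: str, mode: str = "rb") -> io.IOBase: ...


class InMemoryStore:
    """Toy backend standing in for a real hdfs3/s3fs/gsfs-style library."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self._files = dict(files)

    def ls(self, prefix: str) -> List[str]:
        # List stored paths that start with the given prefix, sorted.
        return sorted(p for p in self._files if p.startswith(prefix))

    def open(self, path: str, mode: str = "rb") -> io.BytesIO:
        if mode != "rb":
            raise ValueError("this sketch only supports reading")
        return io.BytesIO(self._files[path])


def total_bytes(store: ByteStore, prefix: str) -> int:
    """Backend-agnostic code: relies only on ls() and open()."""
    total = 0
    for path in store.ls(prefix):
        with store.open(path) as f:
            total += len(f.read())
    return total
```

For example, `total_bytes(InMemoryStore({"b/x": b"abc"}), "b/")` returns 3. Swapping in a different backend only requires that it provide the same two methods.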

It looks like Google Storage is trying hard to piggyback on tools like boto and HDFS clients by mimicking their interfaces (see https://cloud.google.com/storage/docs/gspythonlibrary). I'm not particularly surprised that these efforts aren't perfectly smooth. However, it might be worth checking whether Google's boto support can be coerced into working with boto3, at which point s3fs would be a good candidate. (According to the comments in this SO post, that support appears to be incomplete.)

@broxtronix, you're a heavy user of GCS. What would you use to pull out a byte range from a file on GCS? Has anyone heard of Google working to support boto3?
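A byte range can be pulled in one of two ways. The first is a raw HTTP request with a standard `Range` header; that GCS download endpoints honor such headers is an assumption here and should be checked against the GCS docs. The second is `seek` plus `read` on any seekable file-like object a client library returns. The helper names `range_header` and `read_range` are hypothetical and stand in for the two pieces below.

```python
import io
from typing import BinaryIO


def range_header(start: int, length: int) -> str:
    """HTTP Range header value for `length` bytes starting at `start`.

    HTTP byte ranges are inclusive on both ends (RFC 7233), hence the -1.
    """
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length > 0")
    return f"bytes={start}-{start + length - 1}"


def read_range(f: BinaryIO, start: int, length: int) -> bytes:
    """Read up to `length` bytes at offset `start` from a seekable file-like object."""
    f.seek(start)
    return f.read(length)


if __name__ == "__main__":
    print(range_header(10, 5))  # bytes=10-14
    print(read_range(io.BytesIO(b"0123456789abcdef"), 10, 5))  # b'abcde'
```

The inclusive end in the header is the usual off-by-one trap. The file-like path avoids it but depends on the client supporting efficient seeks rather than downloading from the start.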

Alternatively, you could stick with the HDFS interface layer if you could convince the libhdfs3 project to support whatever odd dialect GS is using. At that point this Python wrapper library would probably work.

from hdfs3.

martindurant commented on August 20, 2024

It seems that GCS already has something that provides file-like objects for interacting with stored files: https://cloud.google.com/appengine/docs/python/googlecloudstorageclient/functions#open

It would not be difficult to use distributed to write a distcp-style tool that copies data from S3, or from GCS through this API, into HDFS via hdfs3 for further use.
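A distcp-style copy amounts to "one streaming copy task per file, run in parallel". The sketch below uses `concurrent.futures.ThreadPoolExecutor` as a stand-in for a dask.distributed cluster, so it stays self-contained. The opener callables are hypothetical placeholders for whatever source (S3, GCS) and destination (hdfs3) libraries provide file-like objects. `distcp`, `copy_one`, `Opener` and `SinkFile` are illustrative names, not an existing API.

```python
import io
import shutil
from concurrent.futures import ThreadPoolExecutor
from typing import BinaryIO, Callable, Dict, List

# Hypothetical opener type: path -> binary file-like object.
Opener = Callable[[str], BinaryIO]


class SinkFile(io.BytesIO):
    """Toy destination file: stores its contents in a dict when closed."""

    def __init__(self, store: Dict[str, bytes], key: str) -> None:
        super().__init__()
        self._store = store
        self._key = key

    def close(self) -> None:
        if not self.closed:
            self._store[self._key] = self.getvalue()
        super().close()


def copy_one(path: str, open_src: Opener, open_dst: Opener,
             chunk_size: int = 1 << 20) -> str:
    """Stream one file from source to destination in fixed-size chunks."""
    with open_src(path) as src, open_dst(path) as dst:
        shutil.copyfileobj(src, dst, chunk_size)
    return path


def distcp(paths: List[str], open_src: Opener, open_dst: Opener,
           workers: int = 4) -> List[str]:
    """Copy many files in parallel, one task per file, results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(copy_one, p, open_src, open_dst) for p in paths]
        return [f.result() for f in futures]


if __name__ == "__main__":
    source = {"logs/a.csv": b"1,2\n", "logs/b.csv": b"3,4\n" * 1000}
    dest: Dict[str, bytes] = {}
    distcp(list(source), lambda p: io.BytesIO(source[p]),
           lambda p: SinkFile(dest, p))
    print(dest == source)  # True
```

Here each whole file is the unit of parallelism. That is the simplest fit for a write-once destination such as HDFS, where a single file cannot easily be written from several tasks at different offsets.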


martindurant commented on August 20, 2024

I have written some proof of concept code, and may be making a separate library to handle GCS.

