Comments (14)

colobas avatar colobas commented on May 26, 2024

@martindurant @zooba thanks for the replies!

@martindurant not complaining that it doesn't exist yet, just trying to gather the info needed to maybe kickstart it myself :)

@zooba you mean the library I want if I'm thinking of implementing this type of dask connector?

from adlfs.

colobas avatar colobas commented on May 26, 2024

OK, I understand now that you have to pass the storage_options dict like so:

import dask.dataframe as dd

df = dd.read_csv("adl://somedatalakestore.azuredatalakestore.net/somefile.csv",
                 storage_options={"tenant_id": "something", "client_id": "something"})
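To keep the credentials out of the source file, the dict could be assembled from environment variables first. This is only a sketch: the helper name `storage_options_from_env` and the variable names `AZURE_TENANT_ID` and `AZURE_CLIENT_ID` are assumptions made for illustration, not something the connector defines, and only the two keys shown above are covered.

```python
import os


def storage_options_from_env(env=None):
    """Build a storage_options dict from environment variables.

    The environment variable names are assumptions for this sketch;
    only the two keys used in the read_csv call above are collected.
    """
    env = os.environ if env is None else env
    mapping = {
        "tenant_id": "AZURE_TENANT_ID",
        "client_id": "AZURE_CLIENT_ID",
    }
    missing = [var for var in mapping.values() if var not in env]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in mapping.items()}


# Possible usage (requires dask and the ADL connector to be installed):
# import dask.dataframe as dd
# df = dd.read_csv("adl://somedatalakestore.azuredatalakestore.net/somefile.csv",
#                  storage_options=storage_options_from_env())
```

Any further keys the connector accepts could be added to `mapping` in the same way.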

mrocklin avatar mrocklin commented on May 26, 2024

martindurant avatar martindurant commented on May 26, 2024

@mrocklin , I am not aware of a way to get Azure default credentials, but I would be surprised if there isn't one, probably in https://github.com/AzureAD/azure-activedirectory-library-for-python. I would ask some people at MS directly.

mrocklin avatar mrocklin commented on May 26, 2024

> @mrocklin , I am not aware of a way to get Azure default credentials, but I would be surprised if there isn't one, probably in https://github.com/AzureAD/azure-activedirectory-library-for-python. I would ask some people at MS directly.

@noelbundick or @zooba might know the right people to contact about helping out here

zooba avatar zooba commented on May 26, 2024

@lmazuel may be able to help, but I suspect the easiest way is to use classes from msrestazure (which should already be installed as a dependency of the DataLake SDK)

colobas avatar colobas commented on May 26, 2024

What about a similar thing for azure blob storage? I reckon it resembles S3 better and is much cheaper. And the authentication is done via shared secrets rather than active directory credentials.

I'm guessing it would amount to writing an equivalent to https://github.com/dask/s3fs, and then a wrapper like this one. Is that right?

martindurant avatar martindurant commented on May 26, 2024

@colobas , it's simply a case of no-one having spent the time to look into it. I don't know how usable the existing MS library code might be for deriving from.

zooba avatar zooba commented on May 26, 2024

@colobas The library you want in that case is azure-storage, and it should be pretty trivial to map. It is essentially the same thing as S3, though with some bonus features more-or-less layered on top of blob storage.

Both would be good, as DataLake is the better service for "dump all my files and process them later" (though it comes with its own analytics service, but I'd rather use Dask, so I guess other people would too :) )

lmazuel avatar lmazuel commented on May 26, 2024

Thanks @zooba for the mention :)
There are two ways to automatically authenticate a ServicePrincipal for the SDK, without any configuration:

This will require this dask module to depend on azure-common, plus a little code in this dask file (i.e. if the CLI is available, try the CLI; if not, try environment variables; if not, die; etc.). It will probably need a few changes on my side as well: since these helpers are tied to the SDK client, I'm sure there is a small gap to fill to make them a little more generic. But it makes sense to enable this scenario :)
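The "try the CLI, then environment variables, then die" logic could look roughly like the sketch below. It is not code from azure-common or from the dask connector: `first_available`, `from_env`, and the environment variable names are hypothetical, and the CLI lookup is left as a callable supplied by the caller so that no SDK function is assumed here.

```python
import os


def from_env(env=None):
    """Hypothetical provider: read credentials from environment variables.

    The variable names are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    try:
        return {
            "tenant_id": env["AZURE_TENANT_ID"],
            "client_id": env["AZURE_CLIENT_ID"],
        }
    except KeyError as exc:
        raise RuntimeError(f"environment variable {exc} is not set") from exc


def first_available(providers):
    """Return (name, credentials) from the first provider that succeeds.

    `providers` is an ordered list of (name, zero-argument callable) pairs.
    A provider signals failure by raising or by returning None.
    """
    errors = []
    for name, provider in providers:
        try:
            credentials = provider()
        except Exception as exc:  # any failure means "try the next one"
            errors.append(f"{name}: {exc}")
            continue
        if credentials is not None:
            return name, credentials
        errors.append(f"{name}: no credentials returned")
    # Every provider failed: give up with a message listing each failure.
    raise RuntimeError("no Azure credentials found (" + "; ".join(errors) + ")")
```

A caller would pass something like `[("cli", some_cli_lookup), ("env", from_env)]`, where `some_cli_lookup` stands for whatever CLI-based helper ends up being used.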

zooba avatar zooba commented on May 26, 2024

@colobas That's right. Our Azure library is broken up more than boto, so sometimes it's less obvious that you don't need to depend upon the whole thing. (In particular, azure-mgmt-* will pull in a lot of dependencies, and if you're just trying to use one of the client libraries then you'll want to bypass that by depending on the more specific piece.)

Our Azure SDK expert for Python is @lmazuel, so feel free to call him in whenever you have questions :)

colobas avatar colobas commented on May 26, 2024

@zooba thanks a lot for the quick answers and tips! Take care and have a good holiday

martindurant avatar martindurant commented on May 26, 2024

@lmazuel , assuming you do

from azure.common.client_factory import get_client_from_cli_profile
from azure.mgmt.compute import ComputeManagementClient

client = get_client_from_cli_profile(ComputeManagementClient)

how do you get the appropriate credentials out of the client object?

lmazuel avatar lmazuel commented on May 26, 2024

@martindurant That's why I was saying "And probably a few changes for me as well, since these are tied to SDK client" :)
But you can do

from azure.common.credentials import get_azure_cli_credentials
credentials, subscription_id = get_azure_cli_credentials()

From the credentials attributes, you should be able to get client_id (not tested).
