Giter VIP home page Giter VIP logo

Comments (8)

ap-- avatar ap-- commented on August 16, 2024 1

Okay, yes I see how the typing issue can be problem, since mypy will infer variables to be UPath instances by default.

The webknossos-libs approach is what I would prefer too. I think it'll be easier to maintain universal_pathlib if we make the public class API identical to pathlib and provide a few additional functions to access the special fsspec parts needed for 3rd party libraries.

I would create two new issues, summarize and move the discussion?
(1) Should UPath.__new__ return pathlib.Path instances for local paths
and
(2) Should the UPath public class API be identical to the public pathlib.Path class API

from universal_pathlib.

ap-- avatar ap-- commented on August 16, 2024

Hi @danielgafni

could you describe why you need access to the fsspec filesystem instance?
Or in other words: what functionality is currently missing in universal_pathlib that requires you to access the fsspec filesystem directly?

Regarding the .fs property: If we support this, I personally would prefer to add a function:

def get_filesystem(obj: str | pathlib.Path | universal_pathlib.UPath) -> fsspec.AbstractFileSystem:
    ...

to universal_pathlib. This function can then return LocalFileSystem for pathlib.Path instances and the correct instance for str paths.

Otherwise user code would always have to ensure that a universal_pathlib.UPath instance is provided to access the .fs property.

Cheers,
Andreas 😃

from universal_pathlib.

ap-- avatar ap-- commented on August 16, 2024

Okay, sorry I should have checked first 😅:

This is already part of the public API:

@property
def fs(self) -> AbstractFileSystem:
return self._accessor._fs

But maybe you had exactly the problem I described:

UPath("/some/local/path")  # will try to return a pathlib.Path instance

And the returned instance does not have a .fs property.

from universal_pathlib.

danielgafni avatar danielgafni commented on August 16, 2024

@ap-- You are right 🤣 I simply didn't see the .fs property.

Answering the first question, I needed the .fs property to pass it to third-party libraries like pyarrow (here is a snippet I wrote previously):

fs: fsspec.AbstractFileSystem | None = None

try:
    fs = path._accessor._fs
except AttributeError:
    pass

dataset = ds.dataset(str(path), filesystem=fs)

Obviously, this has to do with third-party API design. pyarrow doesn't support reading from file-like objects. So I don't think universal-pathlib can do anything about this.

Regrading the second issue - yeah, it's quite not convenient how normal pathlib.Path doesn't have a a .fs property.

How can we fix this?

I can see 2 options:

  • patch pathlib.Path and add this property (not very good solution)
  • don't convert local UPath to a Path. Support local filesystem in UPath. Always return UPath objects.

I think the second solution is better. It would also remove the need to do ugly stuff like:

path = UPath("/local/file")  # will currently return pathlib.Path

...
def read_pyarrow(path: Path | UPath) -> ...
    # inside a general function that has to work with local and remote paths
    if isinstance(path, Path):
        ...
    elif isinstance(path, UPath):
        ...

I would like to remove the union typing Path | UPath and the isinstance check and just deal only with UPath instead.

from universal_pathlib.

ap-- avatar ap-- commented on August 16, 2024

Okay. I understand.

I believe that returning a pathlib.Path instance is preferable to returning a UPath instance for local file systems. This allows 3rd party libraries built on the pathlib API to drop in universal_pathlib as a replacement for pathlib and support any filesystem supported by fsspec. Given that, I also believe that UPath should not provide any attributes or methods that pathlib.Path doesn't support, but of course that's an opinion and there are valid arguments for extending UPath.

So for your issue, I would propose to implement a function:

def get_openfile(obj: str | pathlib.Path | universal_pathlib.UPath, /, **kwargs) -> fsspec.core.OpenFile:
    ...

This would provide you with an OpenFile instance, that has a .fs attribute holding the corresponding FileSystem instance, and a .path property for the path (which compared to your example, won't contain the protocol anymore).

Your example would then be:

upth: UPath = ...

of = get_openfile(upth)
dataset = ds.dataset(of.path, fs=of.fs)

Let me know what you think,
Cheers,
Andreas

from universal_pathlib.

danielgafni avatar danielgafni commented on August 16, 2024

I believe that returning a pathlib.Path instance is preferable to returning a UPath instance for local file systems. This allows 3rd party libraries built on the pathlib API to drop in universal_pathlib as a replacement for pathlib and support any filesystem supported by fsspec. Given that, I also believe that UPath should not provide any attributes or methods that pathlib.Path doesn't support, but of course that's an opinion and there are valid arguments for extending UPath.

I think replacing Path with UPath will always be possible until UPath is a superset of Path. Anyway, I don't think this is the point. I'm not proposing to add additional functionality to UPath. I'm proposing to support the local filesystem without converting to Path with the same interface a Path has.

In addition to problems described above (and I agree your solution makes the code more clean), there is another one: runtime type checking.

The first time this happened to me (i was using dagster which has runtime type checking) I had to spend some time to understand what's going on. This behavior is a little non-intuitive: you create one object (UPath), but get a different (Path). I had type annotations for UPath, which failed in runtime.

I think it's really weird how this simple type annotation is actually not true:

path: UPath = UPath("/maybe/local/path")

Instead you must write

path: Path | UPath = UPath("/maybe/local/path")

I think this is just wrong. The logic of returning Path or UPath can't even be expressed with Python's type system (you can't create an overload for a regex checking if the string is local or not), and of course the value of the string often is not known until runtime.

I still don't see how supporting local filesystem would harm third-party libraries in any way. The only exception might be runtime type-checking in third-party libraries, but this seems like an explicit choice to strictly require Path.

What are your thoughts? We can also move this discussion elsewhere since the original issue is resolved.

from universal_pathlib.

normanrz avatar normanrz commented on August 16, 2024

I think the decision to return Path values has been made before I joined this project. So, I cannot comment on the motivation. In our projects, the typing issue has not been a problem because we treat all instances as path: Path = UPath("/maybe/local/path"), which is true because UPath is a subclass of Path. Only where we really need to access upath-specific attributes, we perform an instance check: https://github.com/scalableminds/webknossos-libs/blob/master/webknossos/webknossos/dataset/_array.py#L26-L33

from universal_pathlib.

ap-- avatar ap-- commented on August 16, 2024

I opened two new issues to focus the discussion on the two topics. Let's close this issue, and continue in the other two.

Cheers,
Andreas 😃

from universal_pathlib.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.