Giter VIP home page Giter VIP logo

Comments (16)

brl0 avatar brl0 commented on August 16, 2024 1

I would say as a general case that for a given class C type(C(*args, **kwargs)) != C is a weird pattern to me

the promotion to specialized subclasses is a common pattern, also used in stdlib pathlib

The usage in the standard library allows for isinstance(Path(*args, **kwargs), Path), even though that will always return a WindowsPath or PosixPath.

My current thinking is that this behavior is a bad thing:

assert isinstance(UPath("."), Path) and not isinstance(UPath("."), UPath)

My current preference would be to revert to the old behavior, but that might be because I'm not clear on how that is breaking static type checkers. But since the consensus seems to be to avoid that approach, I would probably lean toward option 2B, while acknowledging the concern that it might run into some bugs in fsspec local file handling, but it is the backbone of this project, and those issues should be fixable.

I'd be curious what @barneygale would suggest as the most appropriate approach to inheriting from and being compatible with pathlib.

from universal_pathlib.

barneygale avatar barneygale commented on August 16, 2024 1

I'm not very familiar with this project so forgive me if I get something wrong!

For Python 3.13 (>12 months away) I'm hoping to add a pathlib.AbstractPath class. Users could then do:

pth: pathlib.AbstractPath = UPath(...)

UPath.__new__() would be free to return a pathlib.Path instance, which will be a subclass of AbstractPath.

However, Path is not a subclass of UPath, so this still feels weird to me. So I think I learn towards your option 2B.

the promotion to specialized subclasses is a common pattern, also used in stdlib pathlib

FWIW, I never liked this. If I were to design pathlib from scratch, I would remove the *PosixPath and *WindowsPath classes, and instead add a flavour argument to PurePath. This would remove the need for __new__() methods and guarantee that users get back the type they asked for.

Taking that approach in universal_pathlib would mean adding dedicated classes for HTTPPath, S3Path, etc, and probably adding a module-level function to parse a URI and return an appropriate path instance:

s3_pth: universal_pathlib.S3Path = universal_pathlib.S3Path('foo/bar', bucket='blah')
any_pth: pathlib.AbstractPath = universal_pathlib.parse_uri('...')

... but obviously that's quite different! So take it or leave it :)

from universal_pathlib.

brl0 avatar brl0 commented on August 16, 2024

I thought, perhaps mistakenly, that this previous behavior was a good thing:

import pathlib
from upath import UPath

path = UPath(".")
assert isinstance(path, pathlib.Path)
assert isinstance(path, UPath)

The idea, or at least so I thought, was to enable tools that support pathlib objects to easily support upath objects with little or no code changes.

If needed to check for identity of the class, something like this might work:

type(path) in (pathlib.WindowsPath, pathlib.PosixPath)

I'm sure there were good reasons why this behavior was removed, but I'm not sure why, and would greatly appreciate any clarification for my own understanding.

I was never sure that my previous metaclass approach was the right way to do it because the instance check stuff felt hacky, but it did seem to achieve the behavior that I thought I wanted. For reference, here's a simplified version of what that looked like:

class UPathMeta(ABCMeta):
    def __instancecheck__(cls, instance):
        return isinstance(instance, pathlib.Path)

    def __subclasscheck__(cls, subclass):
        return issubclass(subclass, pathlib.Path)

class UPath(pathlib.Path, metaclass=UPathMeta):
    ...

On a related note, I also think it would be cleaner to put the class resolution logic into get_upath_class, it seems fragmented to have some of that logic there and some in __new__.

I'm really just curious and interested in the pros and cons of various approaches, so I'd love any feedback.

from universal_pathlib.

thomasw21 avatar thomasw21 commented on August 16, 2024

Hi! Thanks for the repository, we were trying to find a way to work with multiple schemes and use the same API as pathlib, and this seems to be perfect for our usecase. We're trying to use UPath as well, and are encountering the same issue:

>>> isinstance(UPath("."), UPath)
False

This is causing some issues as we use UPath as both the constructor and the type validator. Given the previous example, I would always expect it to be True for any python class ...

from universal_pathlib.

danielgafni avatar danielgafni commented on August 16, 2024

Getting back to this... Would it be possible to get the old behavior back?

from universal_pathlib.

normanrz avatar normanrz commented on August 16, 2024

UPath subclasses Path which allows for this behavior:

assert isinstance(UPath("s3://..."), Path)
assert isinstance(UPath("."), Path) and not isinstance(UPath("."), UPath)

So, UPath can be used everywhere Path is accepted which allows for an easy upgrade path in an existing code base. There is still a way to distinguish between UPath and Path.

I don't think it is great to change the behavior through metaclasses, because that messes with static analysis tooling such as type checkers.

Perhaps the type checking issues would be resolved if the signature of https://github.com/fsspec/universal_pathlib/blob/main/upath/core.py#L152 would be changed to return Path instead of PT?

from universal_pathlib.

ap-- avatar ap-- commented on August 16, 2024

I don't think it is great to change the behavior through metaclasses, because that messes with static analysis tooling such as type checkers.

I agree. This implicitly rules out restoring the old behavior.

@thomasw21 @danielgafni and @brl0, have I understood correctly, that the issues you're facing all boil down to:

>>> isinstance(UPath("."), UPath)  # for local paths
False

If yes, can any of you construct a case in which an intermediate UPath class would not work for your use case? (scenario 2A)

Basically the class hierarchy would then look like this:

pathlib.Path
|
+- upath.UPath  # adds properties like `.fs`, etc...
                # and would be used for pathlib style local paths
   |
   +- FSSpecPath  # default class for fsspec filesystems without special implementation in universal_pathlib
      |
      +- HTMLPath
      +- HDF5Path
      +- ... etc           

I will work on a PR this Sunday, so we can test if it would work for everyone.

Cheers,
Andreas

from universal_pathlib.

thomasw21 avatar thomasw21 commented on August 16, 2024

I was able to fix my issue by decorrelating constructors and typing. I would say as a general case that for a given class C type(C(*args, **kwargs)) != C is a weird pattern to me.

from universal_pathlib.

ap-- avatar ap-- commented on August 16, 2024

I would say as a general case that for a given class C type(C(*args, **kwargs)) != C is a weird pattern to me.

I partially agree: it's confusing the way it's implemented right now, since UPath(*args, **kwargs) can currently return an instance which is not a subclass of UPath.

Though the promotion to specialized subclasses is a common pattern, also used in stdlib pathlib: https://github.com/python/cpython/blob/7279fb64089abc73c03247fb8191082ee42a9671/Lib/pathlib.py#L1154-L1157

I was able to fix my issue by decorrelating constructors and typing.

Happy to hear that you found a workaround. It'll be interesting to check if it's not needed anymore after this issue is addressed.

from universal_pathlib.

thomasw21 avatar thomasw21 commented on August 16, 2024

Ah it's not needed anymore (I guess it'd be a good to have, but not blocking), we used dacite to parse yaml file into a bunch of data classes: https://github.com/konradhalas/dacite

@dataclass
class MyRemoteConfig:
    path: UPath

The original issue is that dacite uses UPath as both the constructor UPath(path_str) and as a type validator. So all we had to do is:

@dataclass
class MyRemoteConfig:
    path: Path

(we still use UPath as a constructor so that we can actually pass string with different schemes)

from universal_pathlib.

danielgafni avatar danielgafni commented on August 16, 2024

I have similar issues with validating values in Dagster. Which are supposed to be UPath, but are actually Path for local paths.

from universal_pathlib.

barneygale avatar barneygale commented on August 16, 2024

Oh sorry, I see that there are already dedicated HTTPPath etc classes!

from universal_pathlib.

ap-- avatar ap-- commented on August 16, 2024

Thanks for the feedback!

FWIW, I never liked this. If I were to design pathlib from scratch, I would remove the *PosixPath and *WindowsPath classes, and instead add a flavour argument to PurePath. This would remove the need for __new__() methods and guarantee that users get back the type they asked for.

Is pathlib going to move in that direction further down? Looking at the current 3.12 implementation, it seems to be reasonable if we'd implement a filesystem_spec-url equivalent of posixpath/ntpath to be used as the flavour. In UPath we'd then select the filesystem flavour based on protocol.
Some of this functionality is already there.

Taking that approach in universal_pathlib would mean adding dedicated classes for HTTPPath, S3Path, etc, and probably adding a module-level function to parse a URI and return an appropriate path instance:

we kind of do this via the different implementations for specific filesystems.
UPath.__new__ then returns the specialized subclass (ie. the HttpPath, or an S3Path equivalent etc...) instead of a module-level function. But I see the appeal of the module level uri parse function.


Regarding `pathlib.AbstractPath`: I would be happy to contribute to this. `universal_pathlib` could be an early adopter to check if the implementation covers all use cases.

from universal_pathlib.

brl0 avatar brl0 commented on August 16, 2024

This might be a terrible idea, but another option might be to backport the latest pathlib in place of option 2A.

from universal_pathlib.

ap-- avatar ap-- commented on August 16, 2024

But since the consensus seems to be to avoid that approach, I would probably lean toward option 2B, while acknowledging the concern that it might run into some bugs in fsspec local file handling, but it is the backbone of this project, and those issues should be fixable.

Fully agree. It's just quite a bit of work to port the important parts of the test suite. But definitely doable.

This might be a terrible idea, but another option might be to backport the latest pathlib in place of option 2A.

We're basically doing this for python 3.7, 3.8 and 3.9 already 😄
Ultimately we have to do this, to avoid having to have different implementations on our side. We didn't do it for 3.11 because we were waiting for the new implementation in 3.12.

from universal_pathlib.

barneygale avatar barneygale commented on August 16, 2024

Is pathlib going to move in that direction further down?

Unlikely, but not impossible.

When users subclass AbstractPath, they should be able to choose between POSIX, Windows, or current-OS semantics. This can be done by setting _flavour to posixpath, ntpath, or os.path respectively. But the underscore at the beginning of _flavour will need to go, so PurePath.flavour will become a public attribute.

Users may then point out that there are two ways to check if a path uses Windows semantics (for example):

  1. isinstance(pth, PureWindowsPath)
  2. pth.flavour is ntpath

The first of these will not work for subclasses of AbstractPath.

If this situation proves very confusing, we might add a flavour argument to PurePath, turn the *PosixPath and *WindowsPath classes into pure protocols that check flavour and return PurePath when instantiated, and deprecate them.

But it's hard to project that far into the future, and the change would need to be extremely well motivated.

Ticket: python/cpython#100502

Regarding pathlib.AbstractPath: I would be happy to contribute to this. universal_pathlib could be an early adopter to check if the implementation covers all use cases.

Fantastic, thank you. My plan at the moment is to add a tarfile.TarPath class, using a new pathlib._AbstractPath class under the hood. I'll ping you as soon as I have something you can play with!

Ticket: python/cpython#89812

from universal_pathlib.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.