Comments (16)
I would say as a general case that for a given class C
type(C(*args, **kwargs)) != C
is a weird pattern to me
the promotion to specialized subclasses is a common pattern, also used in stdlib pathlib
The usage in the standard library allows for isinstance(Path(*args, **kwargs), Path)
, even though that will always return a WindowsPath
or PosixPath
.
My current thinking is that this behavior is a bad thing:
assert isinstance(UPath("."), Path) and not isinstance(UPath("."), UPath)
My current preference would be to revert to the old behavior, but that might be because I'm not clear on how that is breaking static type checkers. But since the consensus seems to be to avoid that approach, I would probably lean toward option 2B, while acknowledging the concern that it might run into some bugs in fsspec
local file handling, but it is the backbone of this project, and those issues should be fixable.
I'd be curious what @barneygale would suggest as the most appropriate approach to inheriting from and being compatible with pathlib
.
from universal_pathlib.
I'm not very familiar with this project so forgive me if I get something wrong!
For Python 3.13 (>12 months away) I'm hoping to add a pathlib.AbstractPath
class. Users could then do:
pth: pathlib.AbstractPath = UPath(...)
UPath.__new__()
would be free to return a pathlib.Path
instance, which will be a subclass of AbstractPath
.
However, Path
is not a subclass of UPath
, so this still feels weird to me. So I think I learn towards your option 2B.
the promotion to specialized subclasses is a common pattern, also used in stdlib pathlib
FWIW, I never liked this. If I were to design pathlib from scratch, I would remove the *PosixPath
and *WindowsPath
classes, and instead add a flavour argument to PurePath
. This would remove the need for __new__()
methods and guarantee that users get back the type they asked for.
Taking that approach in universal_pathlib
would mean adding dedicated classes for HTTPPath
, S3Path
, etc, and probably adding a module-level function to parse a URI and return an appropriate path instance:
s3_pth: universal_pathlib.S3Path = universal_pathlib.S3Path('foo/bar', bucket='blah')
any_pth: pathlib.AbstractPath = universal_pathlib.parse_uri('...')
... but obviously that's quite different! So take it or leave it :)
from universal_pathlib.
I thought, perhaps mistakenly, that this previous behavior was a good thing:
import pathlib
from upath import UPath
path = UPath(".")
assert isinstance(path, pathlib.Path)
assert isinstance(path, UPath)
The idea, or at least so I thought, was to enable tools that support pathlib objects to easily support upath objects with little or no code changes.
If needed to check for identity of the class, something like this might work:
type(path) in (pathlib.WindowsPath, pathlib.PosixPath)
I'm sure there were good reasons why this behavior was removed, but I'm not sure why, and would greatly appreciate any clarification for my own understanding.
I was never sure that my previous metaclass approach was the right way to do it because the instance check stuff felt hacky, but it did seem to achieve the behavior that I thought I wanted. For reference, here's a simplified version of what that looked like:
class UPathMeta(ABCMeta):
def __instancecheck__(cls, instance):
return isinstance(instance, pathlib.Path)
def __subclasscheck__(cls, subclass):
return issubclass(subclass, pathlib.Path)
class UPath(pathlib.Path, metaclass=UPathMeta):
...
On a related note, I also think it would be cleaner to put the class resolution logic into get_upath_class
, it seems fragmented to have some of that logic there and some in __new__
.
I'm really just curious and interested in the pros and cons of various approaches, so I'd love any feedback.
from universal_pathlib.
Hi! Thanks for the repository, we were trying to find a way to work with multiple schemes and use the same API as pathlib, and this seems to be perfect for our usecase. We're trying to use UPath as well, and are encountering the same issue:
>>> isinstance(UPath("."), UPath)
False
This is causing some issues as we use UPath
as both the constructor and the type validator. Given the previous example, I would always expect it to be True for any python class ...
from universal_pathlib.
Getting back to this... Would it be possible to get the old behavior back?
from universal_pathlib.
UPath
subclasses Path
which allows for this behavior:
assert isinstance(UPath("s3://..."), Path)
assert isinstance(UPath("."), Path) and not isinstance(UPath("."), UPath)
So, UPath
can be used everywhere Path
is accepted which allows for an easy upgrade path in an existing code base. There is still a way to distinguish between UPath
and Path
.
I don't think it is great to change the behavior through metaclasses, because that messes with static analysis tooling such as type checkers.
Perhaps the type checking issues would be resolved if the signature of https://github.com/fsspec/universal_pathlib/blob/main/upath/core.py#L152 would be changed to return Path
instead of PT
?
from universal_pathlib.
I don't think it is great to change the behavior through metaclasses, because that messes with static analysis tooling such as type checkers.
I agree. This implicitly rules out restoring the old behavior.
@thomasw21 @danielgafni and @brl0, have I understood correctly, that the issues you're facing all boil down to:
>>> isinstance(UPath("."), UPath) # for local paths
False
If yes, can any of you construct a case in which an intermediate UPath
class would not work for your use case? (scenario 2A)
Basically the class hierarchy would then look like this:
pathlib.Path
|
+- upath.UPath # adds properties like `.fs`, etc...
# and would be used for pathlib style local paths
|
+- FSSpecPath # default class for fsspec filesystems without special implementation in universal_pathlib
|
+- HTMLPath
+- HDF5Path
+- ... etc
I will work on a PR this Sunday, so we can test if it would work for everyone.
Cheers,
Andreas
from universal_pathlib.
I was able to fix my issue by decorrelating constructors and typing. I would say as a general case that for a given class C type(C(*args, **kwargs)) != C
is a weird pattern to me.
from universal_pathlib.
I would say as a general case that for a given class C type(C(*args, **kwargs)) != C is a weird pattern to me.
I partially agree: it's confusing the way it's implemented right now, since UPath(*args, **kwargs)
can currently return an instance which is not a subclass of UPath.
Though the promotion to specialized subclasses is a common pattern, also used in stdlib pathlib: https://github.com/python/cpython/blob/7279fb64089abc73c03247fb8191082ee42a9671/Lib/pathlib.py#L1154-L1157
I was able to fix my issue by decorrelating constructors and typing.
Happy to hear that you found a workaround. It'll be interesting to check if it's not needed anymore after this issue is addressed.
from universal_pathlib.
Ah it's not needed anymore (I guess it'd be a good to have, but not blocking), we used dacite
to parse yaml file into a bunch of data classes: https://github.com/konradhalas/dacite
@dataclass
class MyRemoteConfig:
path: UPath
The original issue is that dacite
uses UPath
as both the constructor UPath(path_str)
and as a type validator. So all we had to do is:
@dataclass
class MyRemoteConfig:
path: Path
(we still use UPath as a constructor so that we can actually pass string with different schemes)
from universal_pathlib.
I have similar issues with validating values in Dagster. Which are supposed to be UPath, but are actually Path for local paths.
from universal_pathlib.
Oh sorry, I see that there are already dedicated HTTPPath
etc classes!
from universal_pathlib.
Thanks for the feedback!
FWIW, I never liked this. If I were to design pathlib from scratch, I would remove the
*PosixPath
and*WindowsPath
classes, and instead add a flavour argument toPurePath
. This would remove the need for__new__()
methods and guarantee that users get back the type they asked for.
Is pathlib
going to move in that direction further down? Looking at the current 3.12 implementation, it seems to be reasonable if we'd implement a filesystem_spec-url equivalent of posixpath/ntpath
to be used as the flavour. In UPath we'd then select the filesystem flavour based on protocol
.
Some of this functionality is already there.
Taking that approach in universal_pathlib would mean adding dedicated classes for HTTPPath, S3Path, etc, and probably adding a module-level function to parse a URI and return an appropriate path instance:
we kind of do this via the different implementations for specific filesystems.
UPath.__new__
then returns the specialized subclass (ie. the HttpPath, or an S3Path equivalent etc...) instead of a module-level function. But I see the appeal of the module level uri parse function.
Regarding `pathlib.AbstractPath`: I would be happy to contribute to this. `universal_pathlib` could be an early adopter to check if the implementation covers all use cases.
from universal_pathlib.
This might be a terrible idea, but another option might be to backport the latest pathlib
in place of option 2A.
from universal_pathlib.
But since the consensus seems to be to avoid that approach, I would probably lean toward option 2B, while acknowledging the concern that it might run into some bugs in fsspec local file handling, but it is the backbone of this project, and those issues should be fixable.
Fully agree. It's just quite a bit of work to port the important parts of the test suite. But definitely doable.
This might be a terrible idea, but another option might be to backport the latest
pathlib
in place of option 2A.
We're basically doing this for python 3.7
, 3.8
and 3.9
already 😄
Ultimately we have to do this, to avoid having to have different implementations on our side. We didn't do it for 3.11
because we were waiting for the new implementation in 3.12
.
from universal_pathlib.
Is
pathlib
going to move in that direction further down?
Unlikely, but not impossible.
When users subclass AbstractPath
, they should be able to choose between POSIX, Windows, or current-OS semantics. This can be done by setting _flavour
to posixpath
, ntpath
, or os.path
respectively. But the underscore at the beginning of _flavour
will need to go, so PurePath.flavour
will become a public attribute.
Users may then point out that there are two ways to check if a path uses Windows semantics (for example):
isinstance(pth, PureWindowsPath)
pth.flavour is ntpath
The first of these will not work for subclasses of AbstractPath
.
If this situation proves very confusing, we might add a flavour argument to PurePath
, turn the *PosixPath
and *WindowsPath
classes into pure protocols that check flavour
and return PurePath
when instantiated, and deprecate them.
But it's hard to project that far into the future, and the change would need to be extremely well motivated.
Ticket: python/cpython#100502
Regarding
pathlib.AbstractPath
: I would be happy to contribute to this.universal_pathlib
could be an early adopter to check if the implementation covers all use cases.
Fantastic, thank you. My plan at the moment is to add a tarfile.TarPath
class, using a new pathlib._AbstractPath
class under the hood. I'll ping you as soon as I have something you can play with!
Ticket: python/cpython#89812
from universal_pathlib.
Related Issues (20)
- Dependencies: introduce minimum version requirement for fsspec HOT 1
- Provide support for relative file uris? HOT 2
- UPath handling of paths with double slashes `//` HOT 1
- Different return types for S3Path.stat() and WindowsUPath.stat() HOT 4
- S3Path.iterdir() gives wrong paths when used with trailing slash HOT 1
- UPath formatting of empty netloc breaks compatibility HOT 1
- Incorrect type hint for `register_implementation`
- GitHubPath not working after 0.1.0 HOT 2
- storage_options doesn't work for local paths HOT 3
- ci: add code-coverage runs
- UPath(path).parent does not work as expected for memory URIs with authority component HOT 3
- .glob output contains twitch the dir path when AzurePath(path).glob("*") HOT 4
- Fail to instantiate s3 path with `#` character in URI HOT 2
- / (__truediv__) does not handle trailing slash correctly
- mkdir broken for adlfs
- S3 path resolution extra slash when joined to bucket with key HOT 2
- Reuse s3 boto client? HOT 1
- `relative_to` on `GCSPath`s does not return a relative path HOT 1
- Problem with opening local file with "compression=..." HOT 1
- Provide py38-py311 and py312 compatible way for subclassing
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from universal_pathlib.