Comments (8)
Okay, yes I see how the typing issue can be problem, since mypy will infer variables to be UPath
instances by default.
The webknossos-libs approach is what I would prefer too. I think it'll be easier to maintain universal_pathlib if we make the public class API identical to pathlib and provide a few additional functions to access the special fsspec parts needed for 3rd party libraries.
I would create two new issues, summarize and move the discussion?
(1) Should UPath.__new__
return pathlib.Path
instances for local paths
and
(2) Should the UPath
public class API be identical to the public pathlib.Path
class API
from universal_pathlib.
Hi @danielgafni
could you describe why you need access to the fsspec filesystem instance?
Or in other words: what functionality is currently missing in universal_pathlib that requires you to access the fsspec filesystem directly?
Regarding the .fs
property: If we support this, I personally would prefer to add a function:
def get_filesystem(obj: str | pathlib.Path | universal_pathlib.UPath) -> fsspec.AbstractFileSystem:
...
to universal_pathlib
. This function can then return LocalFileSystem for pathlib.Path
instances and the correct instance for str
paths.
Otherwise user code would always have to ensure that a universal_pathlib.UPath
instance is provided to access the .fs
property.
Cheers,
Andreas
from universal_pathlib.
Okay, sorry I should have checked first
This is already part of the public API:
universal_pathlib/upath/core.py
Lines 593 to 595 in 5d844b2
But maybe you had exactly the problem I described:
UPath("/some/local/path") # will try to return a pathlib.Path instance
And the returned instance does not have a .fs
property.
from universal_pathlib.
@ap-- You are right .fs
property.
Answering the first question, I needed the .fs
property to pass it to third-party libraries like pyarrow
(here is a snippet I wrote previously):
fs: fsspec.AbstractFileSystem | None = None
try:
fs = path._accessor._fs
except AttributeError:
pass
dataset = ds.dataset(str(path), filesystem=fs)
Obviously, this has to do with third-party API design. pyarrow
doesn't support reading from file-like objects. So I don't think universal-pathlib
can do anything about this.
Regrading the second issue - yeah, it's quite not convenient how normal pathlib.Path
doesn't have a a .fs
property.
How can we fix this?
I can see 2 options:
- patch
pathlib.Path
and add this property (not very good solution) - don't convert local
UPath
to aPath
. Support local filesystem inUPath
. Always returnUPath
objects.
I think the second solution is better. It would also remove the need to do ugly stuff like:
path = UPath("/local/file") # will currently return pathlib.Path
...
def read_pyarrow(path: Path | UPath) -> ...
# inside a general function that has to work with local and remote paths
if isinstance(path, Path):
...
elif isinstance(path, UPath):
...
I would like to remove the union typing Path | UPath
and the isinstance
check and just deal only with UPath
instead.
from universal_pathlib.
Okay. I understand.
I believe that returning a pathlib.Path
instance is preferable to returning a UPath
instance for local file systems. This allows 3rd party libraries built on the pathlib
API to drop in universal_pathlib
as a replacement for pathlib
and support any filesystem supported by fsspec
. Given that, I also believe that UPath
should not provide any attributes or methods that pathlib.Path
doesn't support, but of course that's an opinion and there are valid arguments for extending UPath
.
So for your issue, I would propose to implement a function:
def get_openfile(obj: str | pathlib.Path | universal_pathlib.UPath, /, **kwargs) -> fsspec.core.OpenFile:
...
This would provide you with an OpenFile
instance, that has a .fs
attribute holding the corresponding FileSystem instance, and a .path
property for the path (which compared to your example, won't contain the protocol anymore).
Your example would then be:
upth: UPath = ...
of = get_openfile(upth)
dataset = ds.dataset(of.path, fs=of.fs)
Let me know what you think,
Cheers,
Andreas
from universal_pathlib.
I believe that returning a pathlib.Path instance is preferable to returning a UPath instance for local file systems. This allows 3rd party libraries built on the pathlib API to drop in universal_pathlib as a replacement for pathlib and support any filesystem supported by fsspec. Given that, I also believe that UPath should not provide any attributes or methods that pathlib.Path doesn't support, but of course that's an opinion and there are valid arguments for extending UPath.
I think replacing Path
with UPath
will always be possible until UPath
is a superset of Path
. Anyway, I don't think this is the point. I'm not proposing to add additional functionality to UPath
. I'm proposing to support the local filesystem without converting to Path
with the same interface a Path
has.
In addition to problems described above (and I agree your solution makes the code more clean), there is another one: runtime type checking.
The first time this happened to me (i was using dagster
which has runtime type checking) I had to spend some time to understand what's going on. This behavior is a little non-intuitive: you create one object (UPath
), but get a different (Path
). I had type annotations for UPath
, which failed in runtime.
I think it's really weird how this simple type annotation is actually not true:
path: UPath = UPath("/maybe/local/path")
Instead you must write
path: Path | UPath = UPath("/maybe/local/path")
I think this is just wrong. The logic of returning Path
or UPath
can't even be expressed with Python's type system (you can't create an overload
for a regex checking if the string is local or not), and of course the value of the string often is not known until runtime.
I still don't see how supporting local filesystem would harm third-party libraries in any way. The only exception might be runtime type-checking in third-party libraries, but this seems like an explicit choice to strictly require Path
.
What are your thoughts? We can also move this discussion elsewhere since the original issue is resolved.
from universal_pathlib.
I think the decision to return Path
values has been made before I joined this project. So, I cannot comment on the motivation. In our projects, the typing issue has not been a problem because we treat all instances as path: Path = UPath("/maybe/local/path")
, which is true because UPath
is a subclass of Path
. Only where we really need to access upath-specific attributes, we perform an instance check: https://github.com/scalableminds/webknossos-libs/blob/master/webknossos/webknossos/dataset/_array.py#L26-L33
from universal_pathlib.
I opened two new issues to focus the discussion on the two topics. Let's close this issue, and continue in the other two.
Cheers,
Andreas
from universal_pathlib.
Related Issues (20)
- Adhere to semver and add a changelog
- UPath broken in Python 3.11
- The noxfile has to be changed to support newer nox versions
- UPath with_name, with_stem, with_suffix is broken
- Glob on s3 bucket should be coherent with other implementations? HOT 5
- Handling of absolute file:// paths HOT 6
- mkdir does not respect exist_ok when parents is set to True
- URI paths are incorrectly parsed as posix paths HOT 3
- Updating admins/maintainers HOT 3
- Use `urllib.parse.urljoin` when joining paths HOT 5
- Support for data URI scheme? HOT 5
- Should `UPath.__new__` return `pathlib.Path` instances for local paths HOT 16
- Should the `UPath` public class API be identical to the public `pathlib.Path` class API HOT 2
- URI query component is ignored when opening a file HOT 5
- Inconsistency with Path HOT 3
- `joinpath` does not update url
- Refactor tests HOT 3
- Updating package dev tools HOT 4
- Add ability to register custom UPath implementations HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from universal_pathlib.