jgoutin / airfs
A Python library for cloud and remote file systems
License: Apache License 2.0
Add (optional) async evaluation for some operations that require multiple requests (like listdir on some storages).
Add full type annotations to the library, like most modern packages.
Add a local cache system for request results, file content, and more.
The aim is to reduce the latency caused by web requests and data transfers.
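A minimal sketch of such a cache, assuming a time-to-live (TTL) expiry policy and entries keyed on the request arguments. The `TTLCache` class and the `fetch_header` stand-in are hypothetical and are not existing airfs APIs.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Hypothetical in-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_call(self, key: Hashable, func: Callable[[], Any]) -> Any:
        """Return the cached value for `key`, calling `func` on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        value = func()
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop an entry, e.g. after the object was written or removed."""
        self._entries.pop(key, None)


requests_sent = []


def fetch_header(path: str) -> dict:
    """Stand-in for a HEAD request to a storage (hypothetical)."""
    requests_sent.append(path)
    return {"Content-Length": "42"}


cache = TTLCache(ttl=60.0)
key = ("head", "s3://bucket/key")
first = cache.get_or_call(key, lambda: fetch_header("s3://bucket/key"))
second = cache.get_or_call(key, lambda: fetch_header("s3://bucket/key"))  # served from cache

# Expiry check with a controllable clock.
fake_now = [0.0]
short_cache = TTLCache(ttl=10.0, clock=lambda: fake_now[0])
calls = []
short_cache.get_or_call("k", lambda: calls.append("call"))
fake_now[0] = 5.0
short_cache.get_or_call("k", lambda: calls.append("call"))  # still fresh
calls_before_expiry = len(calls)
fake_now[0] = 11.0
short_cache.get_or_call("k", lambda: calls.append("call"))  # expired: called again
```

Invalidation is the hard part. Every write, remove, or move would need to call `invalidate` for the affected keys; otherwise readers could see stale results until the TTL expires.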
Add asyncio support for all functions.
ObjectBufferedIOBase does not support the "a" mode.
Add Linux FUSE support.
See libfuse Python bindings.
Add buffer management features to improve performance:
Listing files and directories
Removing files and directories
Creating directories
Checking whether a directory exists:
Path handling:
Tree iterators:
Moving files:
Copy files and directories
Archives:
Others:
Add Google Cloud Storage support. See the WIP on the feature_google_cloud_storage branch.
Provide a Unix-like CLI.
Add a "sync" command.
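A minimal sketch of what a one-way "sync" could do, shown on local directories only. A file is copied when it is missing from the destination or when its size or modification time differs, which is similar to rsync's default quick check. The `sync` function and this comparison rule are assumptions, not a defined airfs behavior.

```python
import os
import shutil
import tempfile
from typing import List


def sync(src: str, dst: str) -> List[str]:
    """Copy files from `src` to `dst` when missing or changed (hypothetical).

    A file counts as changed if its size or modification time differs.
    Returns the relative paths that were copied.
    """
    copied = []
    for root, _dirs, files in os.walk(src):
        rel_root = os.path.relpath(root, src)
        target_root = os.path.normpath(os.path.join(dst, rel_root))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            source = os.path.join(root, name)
            target = os.path.join(target_root, name)
            src_stat = os.stat(source)
            if os.path.exists(target):
                dst_stat = os.stat(target)
                if (
                    dst_stat.st_size == src_stat.st_size
                    and int(dst_stat.st_mtime) == int(src_stat.st_mtime)
                ):
                    continue  # unchanged: skip the transfer
            shutil.copy2(source, target)  # copy2 also preserves the mtime
            copied.append(os.path.normpath(os.path.join(rel_root, name)))
    return copied


with tempfile.TemporaryDirectory() as source_dir, tempfile.TemporaryDirectory() as dest_dir:
    os.makedirs(os.path.join(source_dir, "sub"))
    with open(os.path.join(source_dir, "a.txt"), "w") as file:
        file.write("hello")
    with open(os.path.join(source_dir, "sub", "b.txt"), "w") as file:
        file.write("world")
    first_run = sync(source_dir, dest_dir)  # copies both files
    second_run = sync(source_dir, dest_dir)  # nothing changed: copies nothing
```

On storages, the same loop would compare the size and last-modified time returned by a stat call instead of reading the content. This keeps an unchanged sync to listing and metadata requests only.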
Originally, methods only had a path argument; they now also have client_kwargs and headers arguments that allow caching some results related to the path.
Replace all these arguments with a single Path object containing all of them.
Advantages:
client_kwargs, headers, ... and even more.
To improve performance, also add a caching feature for these objects:
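A minimal sketch of such a Path object. It assumes the object bundles the path string with client_kwargs and headers, and that instances are cached per path string so that headers retrieved once are reused. The `StoragePath` class, the `get_path` factory, and the `parse_client_kwargs` helper are hypothetical.

```python
from dataclasses import dataclass, field
from functools import lru_cache
from typing import Any, Dict


def parse_client_kwargs(path: str) -> Dict[str, Any]:
    """Hypothetical parser: split 'scheme://bucket/key' into client arguments."""
    _scheme, _, rest = path.partition("://")
    bucket, _, key = rest.partition("/")
    return {"Bucket": bucket, "Key": key}


@dataclass
class StoragePath:
    """Bundle what methods currently receive as separate arguments."""

    path: str
    client_kwargs: Dict[str, Any]
    headers: Dict[str, Any] = field(default_factory=dict)  # filled lazily, e.g. by a HEAD request


@lru_cache(maxsize=1024)
def get_path(path: str) -> StoragePath:
    """Return the same StoragePath for the same string (the cache is keyed on the string)."""
    return StoragePath(path, parse_client_kwargs(path))


def getsize(path: StoragePath) -> int:
    """Example method taking one StoragePath instead of (path, client_kwargs, headers)."""
    return int(path.headers["Content-Length"])


p = get_path("s3://bucket/dir/file.txt")
p.headers["Content-Length"] = "10"  # pretend a HEAD request filled this in
```

Caching mutable objects means that a write must refresh or drop `headers`. Calling `get_path.cache_clear()` is the blunt version of that.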
The azure-storage-blob API changed in version 12.
The version was pinned to 2.1.0 in setup.py, but Airfs needs to be updated to use the new API.
Evaluate the results inside "stat_result" when they are accessed instead of at object creation, and memoize them.
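A minimal sketch of a lazily evaluated stat result. It assumes the raw HTTP headers are stored at creation and that each `st_*` field is parsed only on first access, then memoized on the instance. The `LazyStatResult` class and the header names used are assumptions for illustration.

```python
from email.utils import parsedate_to_datetime
from typing import Any, Callable, Dict


class LazyStatResult:
    """Hypothetical stat-like object evaluated on attribute access."""

    # Field name -> function computing it from the raw headers.
    _PARSERS: Dict[str, Callable[[Dict[str, str]], Any]] = {
        "st_size": lambda headers: int(headers["Content-Length"]),
        "st_mtime": lambda headers: parsedate_to_datetime(
            headers["Last-Modified"]
        ).timestamp(),
    }

    def __init__(self, headers: Dict[str, str]):
        self._headers = headers
        self.evaluations = 0  # counts parser calls, for demonstration

    def __getattr__(self, name: str) -> Any:
        # Only called when normal lookup fails, i.e. on first access.
        parser = type(self)._PARSERS.get(name)
        if parser is None:
            raise AttributeError(name)
        value = parser(self._headers)
        self.evaluations += 1
        setattr(self, name, value)  # memoize: later accesses skip __getattr__
        return value


lazy_stat = LazyStatResult(
    {"Content-Length": "1024", "Last-Modified": "Fri, 10 Jan 2020 22:58:52 GMT"}
)
size_first = lazy_stat.st_size  # parsed now
size_again = lazy_stat.st_size  # read from the instance dict
```
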
Some cases require re-running the full test sequence, at least with mocks:
Expected test results must be adapted to each case (currently, this only works with full read/write access).
Allow accessing and managing versioned files (like on AWS S3).
Today, GitHub is one of the largest file storage services on the internet. Repositories stored on GitHub are a valuable source of data.
The idea is to allow navigating GitHub repositories as a filesystem.
Sometimes the path is fully resolved even though this is not required to get the needed data.
Example:
On GitHub storage, a branch is resolved as a commit, and then headers are retrieved. But in this case, the branch already contains some information that can be used directly. With the resolution, two API requests (branch, commit) are required to get information that could be obtained with a single API request to the branch.
For the path target and parent directories:
For parent directories:
Use functools.cached_property to handle memoization.
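A minimal sketch of lazy resolution with `functools.cached_property` (Python 3.8+), using the GitHub branch example. The branch response already carries the commit SHA, so the commit is only requested when a field the branch response lacks is accessed. The `Branch` class, the `api_get` callable, and the simplified response fields are assumptions, not airfs's GitHub model.

```python
from functools import cached_property
from typing import Any, Callable, Dict

ApiGet = Callable[[str], Dict[str, Any]]


class Branch:
    """Hypothetical lazily resolved GitHub branch."""

    def __init__(self, api_get: ApiGet, repo: str, name: str):
        self._api_get = api_get
        self.repo = repo
        self.name = name

    @cached_property
    def _branch(self) -> Dict[str, Any]:
        # One request, memoized on the instance by cached_property.
        return self._api_get(f"/repos/{self.repo}/branches/{self.name}")

    @cached_property
    def sha(self) -> str:
        return self._branch["commit"]["sha"]  # no extra request needed

    @cached_property
    def _commit(self) -> Dict[str, Any]:
        # Only requested when something the branch response lacks is needed.
        return self._api_get(f"/repos/{self.repo}/commits/{self.sha}")

    @cached_property
    def changed_files(self) -> int:
        return len(self._commit["files"])


requested = []


def fake_api_get(url: str) -> Dict[str, Any]:
    """Fake API recording requested URLs (no network access)."""
    requested.append(url)
    if "/branches/" in url:
        return {"name": "main", "commit": {"sha": "abc123"}}
    return {"sha": "abc123", "files": [{"filename": "README.md"}]}


branch = Branch(fake_api_get, "jgoutin/airfs", "main")
sha_value = branch.sha
sha_again = branch.sha
requests_after_sha = list(requested)  # only the branch was requested
files = branch.changed_files  # triggers the commit request
```
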
Refactor the function tests to use tests.storage_package.mock, perform more complete tests, and add comparisons with the equivalent local filesystem functions.
CPython tests can also be used as a base to get better compliance with the stdlib:
https://github.com/python/cpython/blob/master/Lib/test/test_os.py
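A minimal sketch of such a comparison. The same tree is created on the local filesystem and in a mock object storage, then each function runs on both and any mismatch is reported. `MockObjectStorage` is a hypothetical flat key store in which directories are implied by "/" prefixes. It is not the existing tests.storage_package.mock.

```python
import os
import tempfile
from typing import Iterable, List, Set


class MockObjectStorage:
    """Hypothetical flat key store emulating directories through '/' prefixes."""

    def __init__(self, keys: Iterable[str]):
        self.keys: Set[str] = set(keys)

    def isdir(self, path: str) -> bool:
        prefix = path.rstrip("/") + "/"
        return path in ("", "/") or any(k.startswith(prefix) for k in self.keys)

    def exists(self, path: str) -> bool:
        return path in self.keys or self.isdir(path)

    def listdir(self, path: str) -> List[str]:
        prefix = path.rstrip("/") + "/" if path else ""
        names = {k[len(prefix):].split("/", 1)[0] for k in self.keys if k.startswith(prefix)}
        return sorted(names)


def compare(files: List[str], queries: List[str]) -> List[str]:
    """Return a description of every mismatch between os.path and the mock."""
    mismatches = []
    storage = MockObjectStorage(files)
    with tempfile.TemporaryDirectory() as root:
        for rel in files:  # build the same tree locally
            local = os.path.join(root, *rel.split("/"))
            os.makedirs(os.path.dirname(local), exist_ok=True)
            open(local, "wb").close()
        for rel in queries:
            local = os.path.join(root, *rel.split("/"))
            for name in ("exists", "isdir"):
                expected = getattr(os.path, name)(local)
                actual = getattr(storage, name)(rel)
                if expected != actual:
                    mismatches.append(f"{name}({rel!r}): {actual} != {expected}")
            if os.path.isdir(local) and storage.listdir(rel) != sorted(os.listdir(local)):
                mismatches.append(f"listdir({rel!r})")
    return mismatches


FILES = ["a.txt", "dir/b.txt", "dir/sub/c.txt"]
QUERIES = ["a.txt", "dir", "dir/sub", "missing", "dir/missing.txt"]
mismatches = compare(FILES, QUERIES)
```

A comparison like this would also surface known divergences. For example, an empty local directory has no counterpart in a key-only store.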
Add symlink support.
The new Python 3.10 "strict" argument currently only works with local paths, but it should also work with storage objects.
Some storages support random-access writes, either fully (by allowing byte ranges to be written) or partially (by allowing blocks to be rewritten).
Currently, ObjectBufferedIOBase only allows writing a file from the start.
It would be useful to improve it to allow seeking and writing when random writes are possible.
This would also enable support for the "a" mode.
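A minimal sketch of seek-and-write on a storage that can only rewrite whole blocks, which is the partial case above. Each write reads the affected blocks, patches the bytes, and rewrites only those blocks. Append mode is then just a move to the end before each write. `BlockStorage` and `RandomWriter` are hypothetical, and a real storage would replace the dictionary with per-block requests.

```python
import io
from typing import Dict


class BlockStorage:
    """Hypothetical object made of fixed-size blocks that can be rewritten."""

    def __init__(self, block_size: int):
        self.block_size = block_size
        self.blocks: Dict[int, bytes] = {}
        self.size = 0
        self.block_writes = 0  # number of block rewrite "requests"

    def read_block(self, index: int) -> bytes:
        return self.blocks.get(index, b"")

    def write_block(self, index: int, data: bytes) -> None:
        self.blocks[index] = data
        self.block_writes += 1

    def content(self) -> bytes:
        return b"".join(self.blocks[i] for i in sorted(self.blocks))


class RandomWriter:
    """Writer supporting seek() and write() anywhere up to the end, plus "a" mode."""

    def __init__(self, storage: BlockStorage, mode: str = "w"):
        self._storage = storage
        self._append = mode == "a"
        self._pos = storage.size if self._append else 0

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos, io.SEEK_END: self._storage.size}[whence]
        self._pos = base + offset
        return self._pos

    def write(self, data: bytes) -> int:
        if self._append:
            self._pos = self._storage.size  # "a" mode: always write at the end
        if self._pos > self._storage.size:
            raise ValueError("writing past the end (holes) is not handled in this sketch")
        if not data:
            return 0
        bs = self._storage.block_size
        start, end = self._pos, self._pos + len(data)
        for index in range(start // bs, (end - 1) // bs + 1):
            block_start = index * bs
            block = bytearray(self._storage.read_block(index))  # read
            lo, hi = max(start, block_start), min(end, block_start + bs)
            if len(block) < hi - block_start:  # grow the last block if needed
                block.extend(bytes(hi - block_start - len(block)))
            block[lo - block_start:hi - block_start] = data[lo - start:hi - start]  # modify
            self._storage.write_block(index, bytes(block))  # rewrite only this block
        self._pos = end
        self._storage.size = max(self._storage.size, end)
        return len(data)


storage = BlockStorage(block_size=4)
writer = RandomWriter(storage)
writer.write(b"hello world!")  # three blocks: b"hell", b"o wo", b"rld!"
writes_initial = storage.block_writes
writer.seek(6)
writer.write(b"W")  # rewrites block 1 only
appender = RandomWriter(storage, mode="a")
appender.write(b"?")  # creates block 3
```
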
RAID over multiple storages.
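A minimal sketch of RAID 0-style striping over several storages. The object is split into fixed-size chunks distributed round-robin across backends and reassembled on read. Plain dictionaries stand in for storages, and the `StripedStorage` class and its chunk naming scheme are assumptions. Mirroring (RAID 1) would instead write every chunk to all backends.

```python
from typing import Dict, List

Backend = Dict[str, bytes]  # stand-in for a storage: key -> object content


class StripedStorage:
    """Hypothetical RAID 0-like layer striping objects over several backends."""

    def __init__(self, backends: List[Backend], chunk_size: int):
        self.backends = backends
        self.chunk_size = chunk_size

    def write(self, key: str, data: bytes) -> None:
        chunks = [data[i:i + self.chunk_size] for i in range(0, len(data), self.chunk_size)]
        for index, chunk in enumerate(chunks):
            backend = self.backends[index % len(self.backends)]  # round-robin
            backend[f"{key}.part{index}"] = chunk
        # Record the chunk count so reads know how many parts to fetch.
        self.backends[0][f"{key}.count"] = str(len(chunks)).encode()

    def read(self, key: str) -> bytes:
        count = int(self.backends[0][f"{key}.count"])
        return b"".join(
            self.backends[index % len(self.backends)][f"{key}.part{index}"]
            for index in range(count)
        )


disks: List[Backend] = [{}, {}, {}]
raid = StripedStorage(disks, chunk_size=4)
raid.write("file", b"0123456789")
```

Striping spreads the transfer load, but losing any single backend loses the object. This is why the other RAID levels add mirroring or parity.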
Object/blob Cloud storage:
Documents cloud storage:
Git Repositories (Read only)
Clustering/Big Data
Other protocols
Appeared in CI, only on Python 3.6 but on all operating systems.
It seems random and does not occur every time the jobs are re-run.
Failed to reproduce it locally.
tests/test_storage_github.py:151:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/test_storage_github.py:162: in github_storage_scenario
exists_scenario()
tests/test_storage_github.py:618: in exists_scenario
assert airfs.exists("https://github.com/jgoutin/airfs/releases/latest")
airfs/_core/functions_core.py:88: in decorated
result = cos_function(path_str, *args, **kwargs)
airfs/_core/functions_os_path.py:25: in exists
return get_instance(path).exists(path, follow_symlinks=True)
airfs/_core/io_base_system.py:154: in exists
path, client_kwargs, header, follow_symlinks
airfs/_core/io_base_system.py:938: in resolve
return self._resolve(path, client_kwargs, header)
airfs/_core/io_base_system.py:955: in _resolve
target = self.read_link(path, client_kwargs, header)
airfs/storage/github/__init__.py:349: in read_link
return client_kwargs["object"].read_link(self.client, client_kwargs)
airfs/storage/github/_model_base.py:417: in read_link
target = cls.SYMLINK.format(**ChainMap(spec, cls.head(client, spec)))
../../../hostedtoolcache/Python/3.6.12/x64/lib/python3.6/collections/__init__.py:883: in __getitem__
return self.__missing__(key) # support subclasses that define __missing__
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = ChainMap({'full_path': 'https://github.com/jgoutin/airfs/releases/latest', 'object': <class 'airfs.storage.github._mod...20-01-10T22:58:52Z', 'name': '1.4.0', 'tag': '1.4.0', 'sha': '<Not evaluated yet>', 'tree_sha': '<Not evaluated yet>'})
key = 'tree_sha'
def __missing__(self, key):
> raise KeyError(key)
E KeyError: 'tree_sha'
../../../hostedtoolcache/Python/3.6.12/x64/lib/python3.6/collections/__init__.py:875: KeyError
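One mechanism consistent with this traceback is offered here as a hypothesis, not a confirmed diagnosis. The repr shows the second mapping holding lazy values ('<Not evaluated yet>'). ChainMap.__getitem__ swallows any KeyError raised by an underlying mapping before falling back to __missing__. If evaluating tree_sha itself raised a KeyError (for example, because an API response lacked an expected field), the real cause would be hidden behind KeyError: 'tree_sha'. The sketch below reproduces that masking with a hypothetical `LazyHeaders` mapping. It indexes the ChainMap directly instead of unpacking it with `**` as the traceback does.

```python
from collections import ChainMap
from collections.abc import Mapping
from typing import Any, Callable, Dict, Iterator


class LazyHeaders(Mapping):
    """Hypothetical mapping whose values are computed on first access."""

    def __init__(self, loaders: Dict[str, Callable[[], Any]]):
        self._loaders = loaders
        self._cache: Dict[str, Any] = {}

    def __getitem__(self, key: str) -> Any:
        if key not in self._cache:
            # A KeyError raised *inside* the loader escapes from here as well.
            self._cache[key] = self._loaders[key]()
        return self._cache[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._loaders)

    def __len__(self) -> int:
        return len(self._loaders)


def load_tree_sha() -> str:
    response: Dict[str, Any] = {"sha": "abc123"}  # hypothetical response without "tree"
    return response["tree"]["sha"]  # raises KeyError('tree')


spec = {"full_path": "https://github.com/jgoutin/airfs/releases/latest"}
headers = LazyHeaders({"tag": lambda: "1.4.0", "tree_sha": load_tree_sha})
merged = ChainMap(spec, headers)

reported = None
try:
    merged["tree_sha"]
except KeyError as error:
    reported = error.args[0]  # 'tree_sha': the inner KeyError('tree') is lost

direct = None
try:
    headers["tree_sha"]
except KeyError as error:
    direct = error.args[0]  # 'tree': the actual cause
```

If this is the mechanism, re-raising lookup failures inside the lazy loaders as a different exception type would keep the real cause visible.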
Add file locking for concurrent file access.
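A minimal sketch of advisory locking with a lock file. It assumes exclusive creation is atomic, which holds for `os.open` with `O_CREAT | O_EXCL` on local filesystems. On object storages, an equivalent would need a conditional "create only if absent" write where one is supported. The `file_lock` helper and the `.lock` suffix are hypothetical.

```python
import os
import tempfile
import time
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def file_lock(path: str, timeout: float = 5.0, poll: float = 0.05) -> Iterator[str]:
    """Hold `path + '.lock'` for the duration of the block (advisory only)."""
    lock_path = path + ".lock"
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Fails with FileExistsError if another holder created it first.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)
    try:
        try:
            os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
        finally:
            os.close(fd)
        yield lock_path
    finally:
        os.remove(lock_path)  # release


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "data.bin")
    with file_lock(target) as held:
        lock_existed = os.path.exists(held)
        try:
            with file_lock(target, timeout=0.1):
                second_acquired = True
        except TimeoutError:
            second_acquired = False  # the lock is already held
    lock_released = not os.path.exists(target + ".lock")
```

Stale locks left by crashed processes are the usual weak point of this approach. The PID written in the file is one way to detect them.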
Add support for access controls.
Also add mode support in makedirs, mkdir, stat, and lstat.
Add support for os.chmod.
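A minimal sketch of the translation layer this would need. It assumes a storage whose access controls can be reduced to read/write/execute grants for owner, group, and others. In that model, `mode_to_acl` would serve chmod, mkdir, and makedirs, and `acl_to_mode` would fill st_mode in stat and lstat results. The ACL model is hypothetical; real storage ACLs are usually richer and do not map one-to-one onto POSIX bits.

```python
import stat
from typing import Dict, Set

# Hypothetical ACL model: principal class -> set of granted permissions.
Acl = Dict[str, Set[str]]

_BITS = {
    "owner": (stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),
    "group": (stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),
    "others": (stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),
}


def mode_to_acl(mode: int) -> Acl:
    """Translate the permission bits of `mode` (as passed to chmod or mkdir)."""
    return {
        who: {perm for perm, bit in zip("rwx", bits) if mode & bit}
        for who, bits in _BITS.items()
    }


def acl_to_mode(acl: Acl, is_dir: bool = False) -> int:
    """Rebuild an st_mode value (file type + permission bits) for stat()/lstat()."""
    mode = stat.S_IFDIR if is_dir else stat.S_IFREG
    for who, bits in _BITS.items():
        for perm, bit in zip("rwx", bits):
            if perm in acl.get(who, set()):
                mode |= bit
    return mode


acl = mode_to_acl(0o640)
st_mode = acl_to_mode(acl)
```
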
In many cases, using the GitHub API v4 (GraphQL) can give far better performance and rate-limit usage than API v3 (REST).
Airfs GitHub support could be improved by using it.
Note that API v3 support must be kept to allow unauthenticated use. API v4 must only be enabled when authenticated use is detected.
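A minimal sketch of that selection, assuming the client knows whether a token is configured. With a token, a single GraphQL (API v4) query fetches the branch head. Without one, the client falls back to the REST (API v3) branch endpoint. Requests are only built here, never sent. The `build_branch_request` function is hypothetical. The endpoints and the `repository`/`ref`/`target`/`oid` GraphQL fields come from GitHub's public APIs.

```python
import json
import urllib.request
from typing import Optional

GRAPHQL_QUERY = """
query($owner: String!, $name: String!, $ref: String!) {
  repository(owner: $owner, name: $name) {
    ref(qualifiedName: $ref) { target { oid } }
  }
}
"""


def build_branch_request(
    owner: str, repo: str, branch: str, token: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) the request resolving a branch head."""
    if token:
        # API v4 (GraphQL) requires authentication.
        payload = {
            "query": GRAPHQL_QUERY,
            "variables": {"owner": owner, "name": repo, "ref": f"refs/heads/{branch}"},
        }
        return urllib.request.Request(
            "https://api.github.com/graphql",
            data=json.dumps(payload).encode(),
            headers={"Authorization": f"bearer {token}", "Content-Type": "application/json"},
            method="POST",
        )
    # API v3 (REST) also works unauthenticated, with a lower rate limit.
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/branches/{branch}",
        headers={"Accept": "application/vnd.github.v3+json"},
    )


v4 = build_branch_request("jgoutin", "airfs", "master", token="TOKEN")
v3 = build_branch_request("jgoutin", "airfs", "master")
```
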