
airfs's People

Contributors

dependabot[bot], jgoutin, stewartadam

Forkers

darnox

airfs's Issues

Full typing

Fully type the library like most modern packages.

Caching system

Add a local cache for request results, file content, and more.

The aim is to reduce the latency caused by web requests and data transfer.
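
As a rough illustration of the idea (not airfs code), the sketch below keeps request results in memory with a size limit and a time-to-live. The class name TTLCache, its parameters, and the loader callback are all assumptions made for this example.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Small in-memory cache with a size limit and a per-entry time-to-live.

    Hypothetical helper for illustration; not part of airfs.
    """

    def __init__(self, max_entries=128, ttl=60.0, clock=time.monotonic):
        self._entries = OrderedDict()  # key -> (expiry time, value)
        self._max_entries = max_entries
        self._ttl = ttl
        self._clock = clock

    def get(self, key, loader):
        """Return the cached value for key, calling loader() on a miss."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self._entries.move_to_end(key)  # mark as recently used
            return entry[1]
        # Missing or expired: perform the (slow) request and store the result.
        value = loader()
        self._entries[key] = (now + self._ttl, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used
        return value
```

A caller would pass the web request as the loader, so the request only runs when the entry is missing or expired.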

asyncio

Add asyncio support for all functions.
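
One possible first step, sketched here under assumptions, is to run the existing blocking functions in a thread pool. The decorator name to_async is hypothetical, and os.listdir only stands in for a blocking storage function; a native asynchronous implementation would look different.

```python
import asyncio
import functools
import os


def to_async(func):
    """Wrap a blocking function so it runs in the default thread pool."""

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        loop = asyncio.get_running_loop()
        call = functools.partial(func, *args, **kwargs)
        # None selects the event loop's default executor.
        return await loop.run_in_executor(None, call)

    return wrapper


# os.listdir stands in for a blocking storage function here.
listdir_async = to_async(os.listdir)
```

Inside a coroutine, `await listdir_async(".")` then returns the same list as `os.listdir(".")` without blocking the event loop.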

Buffer management

Add buffer management features to improve performance:

  • Define the maximum buffer count based on available memory.
  • Allow swapping buffers to disk (HDD).
  • Adjust buffer size and preloading dynamically based on the access type (sparse random access or sequential access).
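
The first and last points could look roughly like the sketch below. Both function names, the memory fraction, and the size limits are assumptions chosen for illustration; the available memory is passed in as a parameter rather than measured.

```python
def max_buffer_count(available_bytes, buffer_size, fraction=0.25):
    """Return how many buffers fit in a fraction of the available memory."""
    return max(1, int(available_bytes * fraction) // buffer_size)


def next_buffer_size(previous_end, offset, current_size,
                     min_size=64 * 1024, max_size=8 * 1024 * 1024):
    """Pick the next read buffer size from the observed access pattern."""
    if offset == previous_end:
        # Sequential access: the read continues where the last one ended,
        # so grow the buffer to preload more data per request.
        return min(current_size * 2, max_size)
    # Random access: fall back to small buffers to avoid wasted transfer.
    return min_size
```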

More standard-library-like functions

Listing files and directories

  • os.listdir: Done 1.2.0

Removing files and directories

  • os.remove / os.unlink: Done 1.2.0
  • shutil.rmtree
  • os.rmdir: Done 1.2.0
  • os.removedirs

Creating directories

  • os.mkdir: Done 1.1.0
  • os.makedirs: Done 1.1.0

Checking whether files and directories exist:

  • os.path.isdir: Done 1.1.0
  • os.path.exists: Done 1.1.0
  • os.path.lexists: Done 1.5.0

Path handling:

  • os.path.isabs: Done 1.1.0
  • os.path.ismount: Done 1.1.0
  • os.path.splitdrive: Done 1.1.0
  • os.path.samefile: Done 1.1.0
  • os.path.realpath: Done 1.5.0
  • os.path.normpath
  • os.path.normcase
  • os.path.join
  • os.path.abspath
  • os.path.split

Tree iterators:

  • os.scandir: Done 1.2.0
  • os.walk

Moving files:

  • os.rename
  • os.renames
  • os.replace
  • shutil.move

Copy files and directories

  • shutil.copyfile: Done 1.2.0
  • shutil.copytree (With new Python 3.8 API)

Archives:

  • bz2.open
  • gzip.open
  • gzip.GzipFile
  • lzma.open
  • tarfile.open
  • tarfile.TarFile
  • zipfile.ZipFile

Others:

  • os.lstat: Done 1.2.0 (+ "st_mode" 1.5.0)
  • os.stat: Done 1.2.0 (+ "st_mode" 1.5.0)
  • os.path.samestat
  • os.chmod
  • os.truncate
  • os.symlink: Done 1.5.0
  • os.readlink: Done 1.5.0
  • os.access
  • pathlib

Google Cloud Storage

Add "Google Cloud Storage" support. See the work in progress on the feature_google_cloud_storage branch.

[Performance, maintainability] Refactor SystemBase methods arguments to Path object

Originally, methods only had a path argument. They now also have client_kwargs and headers arguments, which allow caching some results related to the path.

Replace all these arguments with a single Path object that contains all of them.

Advantages:

  • Simplify methods code
  • Improve caching (and performance) by allowing methods to lazily evaluate and memoize client_kwargs, headers, and more
  • Move some methods from the system class to the path class to reduce system class size.
  • Easier addition of cached results in the future.

To improve performance, also add caching of these objects:

  • Cache a specified number of path objects (remove the oldest when the limit is reached).
  • First get the path from the cache if it exists, and cache any new entry.
  • Reset the path object's internal cache (headers and other evaluated properties) when:
    • airfs changes the file/directory associated with the path.
    • airfs requests cached values after too long a time and the file has changed. To check whether the file changed, send a HEAD request for the file and compare a storage-specific value (modification date, ETag, ...).
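
A minimal sketch of this proposal, under stated assumptions: PathInfo, PathCache, and head_func are hypothetical names, and the real SystemBase interface may differ. It shows lazy evaluation and memoization of headers, a bounded cache that drops the oldest entry, and an explicit reset.

```python
from collections import OrderedDict


class PathInfo:
    """Hypothetical path object bundling a path with lazily evaluated data."""

    def __init__(self, path, head_func):
        self.path = path
        self._head_func = head_func  # callable performing the storage request
        self._headers = None

    @property
    def headers(self):
        """Fetch headers on first access, then reuse the memoized result."""
        if self._headers is None:
            self._headers = self._head_func(self.path)
        return self._headers

    def reset(self):
        """Drop memoized values, for example after the file was modified."""
        self._headers = None


class PathCache:
    """Keep at most max_size path objects, removing the oldest first."""

    def __init__(self, head_func, max_size=1024):
        self._head_func = head_func
        self._max_size = max_size
        self._paths = OrderedDict()

    def get(self, path):
        """Return the cached object for path, creating it if needed."""
        try:
            info = self._paths[path]
        except KeyError:
            info = self._paths[path] = PathInfo(path, self._head_func)
            if len(self._paths) > self._max_size:
                self._paths.popitem(last=False)  # remove the oldest entry
        return info
```

The time-based revalidation described in the last bullet is not shown; it would call reset() when the storage-specific value no longer matches.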

Partial access test

Some cases require re-running the full test sequence, at least with mocks:

  • Write-only access to storage.
  • Read-only access to storage.

Expected test results must be adapted to each case (currently, the tests only work with full read/write access).

Files versions

Allow accessing and managing versioned files (like on AWS S3).

GitHub as a storage

Today, GitHub is one of the largest file stores on the internet. Repositories stored on GitHub are a valuable source of data.

The idea is to allow navigating GitHub repositories as a filesystem.

[Performance] Partial symlink resolution when enough data

Sometimes the path is fully resolved even though this is not required to get the needed data.

Example:
On GitHub storage, a branch is resolved to a commit, and then the headers are retrieved. However, the branch response already contains information that can be used directly. With full resolution, two API requests (branch, then commit) are needed to get information that a single API request to the branch would provide.

Functions that do not yet support symlinks

For path target and parents directories:

  • airfs.rmdir
  • airfs.samefile
  • airfs.copy
  • airfs.copypath

For parent directories:

  • airfs.makedirs
  • airfs.mkdir
  • airfs.remove
  • airfs.lstat
  • airfs.lexists
  • airfs.islink

Seekable ObjectBufferedIOBase in write mode

Some storages support random-access writes (either fully, by allowing byte ranges to be written, or partially, by allowing blocks to be rewritten).

Currently, ObjectBufferedIOBase only allows writing a file from the start.

It would be useful to improve it to allow seeking and writing when random writing is possible.

This would also make it possible to support the "a" (append) mode.

Raid

RAID over multiple storages.

Implement more storage support

Object/blob Cloud storage:

  • Alibaba Cloud OSS: Done 1.0.0
  • AWS S3: Done 1.0.0
  • Google Cloud Storage: see #11
  • Huawei OBS (Object Storage Service)
  • Microsoft Azure Blobs Storage: Done 1.3.0
  • Microsoft Azure Files Storage: Done 1.3.0
  • OpenStack Swift: Done 1.0.0
  • SIA

Documents cloud storage:

  • Dropbox
  • Google Drive
  • Microsoft OneDrive
  • Nextcloud/ownCloud
  • Cozy Cloud

Git Repositories (Read only)

  • GitHub: Done 1.5.0
  • GitLab
  • Bitbucket
  • Git (Any git repository)

Clustering/Big Data

Other protocols

  • FTP
  • SFTP (See Paramiko library)
  • SMB
  • IPFS (Python library: py-ipfs-api)
  • WebDAV

GitHub: Python 3.6, random "KeyError" in read_link

Appeared in CI, only on Python 3.6 but on all operating systems.
It seems to be random and does not occur every time the jobs are re-run.
I failed to reproduce it locally.

tests/test_storage_github.py:151: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/test_storage_github.py:162: in github_storage_scenario
    exists_scenario()
tests/test_storage_github.py:618: in exists_scenario
    assert airfs.exists("https://github.com/jgoutin/airfs/releases/latest")
airfs/_core/functions_core.py:88: in decorated
    result = cos_function(path_str, *args, **kwargs)
airfs/_core/functions_os_path.py:25: in exists
    return get_instance(path).exists(path, follow_symlinks=True)
airfs/_core/io_base_system.py:154: in exists
    path, client_kwargs, header, follow_symlinks
airfs/_core/io_base_system.py:938: in resolve
    return self._resolve(path, client_kwargs, header)
airfs/_core/io_base_system.py:955: in _resolve
    target = self.read_link(path, client_kwargs, header)
airfs/storage/github/__init__.py:349: in read_link
    return client_kwargs["object"].read_link(self.client, client_kwargs)
airfs/storage/github/_model_base.py:417: in read_link
    target = cls.SYMLINK.format(**ChainMap(spec, cls.head(client, spec)))
../../../hostedtoolcache/Python/3.6.12/x64/lib/python3.6/collections/__init__.py:883: in __getitem__
    return self.__missing__(key)            # support subclasses that define __missing__
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = ChainMap({'full_path': 'https://github.com/jgoutin/airfs/releases/latest', 'object': <class 'airfs.storage.github._mod...20-01-10T22:58:52Z', 'name': '1.4.0', 'tag': '1.4.0', 'sha': '<Not evaluated yet>', 'tree_sha': '<Not evaluated yet>'})
key = 'tree_sha'

    def __missing__(self, key):
>       raise KeyError(key)
E       KeyError: 'tree_sha'

../../../hostedtoolcache/Python/3.6.12/x64/lib/python3.6/collections/__init__.py:875: KeyError

ACL support

Add support for access controls.

Also add mode support in makedirs, mkdir, stat, and lstat.
This would allow adding os.chmod.

GitHub storage performance improvement using API v4 (GraphQL)

In many cases, using API v4 can give far better performance and rate-limit usage than using API v3.

The airfs GitHub support can be improved by using it.

Note that API v3 support needs to be kept to allow unauthenticated use. API v4 must be enabled only if authenticated use is detected.
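
The selection rule in the note above could be sketched as follows. The function name select_api and its return shape are assumptions made for this example, not airfs code; the two URLs are GitHub's public REST and GraphQL endpoints.

```python
REST_API_URL = "https://api.github.com"  # API v3 (REST)
GRAPHQL_API_URL = "https://api.github.com/graphql"  # API v4 (GraphQL)


def select_api(token=None):
    """Return (version, url, headers) depending on whether a token is set.

    API v4 is only selected for authenticated use; unauthenticated
    callers stay on API v3.
    """
    if token:
        return "v4", GRAPHQL_API_URL, {"Authorization": "bearer " + token}
    return "v3", REST_API_URL, {}
```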
