tinche / aiofiles
File support for asyncio
License: Apache License 2.0
Hi there! Pytest is correctly complaining about the use of the @coroutine
decorator at aiofiles.os line 8. This is for the current latest version, 0.5.0.
Lines 7 to 16 in 258e956
The fix would be to use async def directly:

def wrap(func):
    @wraps(func)
    async def run(*args, loop=None, executor=None, **kwargs):
        if loop is None:
            loop = asyncio.get_event_loop()
        pfunc = partial(func, *args, **kwargs)
        return await loop.run_in_executor(executor, pfunc)
    return run
https://docs.python.org/3/library/gzip.html
Would be nice to have gzip support.
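Until such support exists, a minimal sketch of a non-blocking gzip read, using the same delegate-to-a-worker-thread pattern aiofiles itself uses (the helper name read_gzip is mine, not part of any library):

```python
import asyncio
import gzip

async def read_gzip(path, loop=None, executor=None):
    # run the blocking gzip read in a worker thread so the event loop stays free
    loop = loop or asyncio.get_event_loop()

    def _read():
        with gzip.open(path, 'rb') as f:
            return f.read()

    return await loop.run_in_executor(executor, _read)
```

A full integration would presumably wrap the gzip file object the way aiofiles wraps regular file objects, so reads and writes can be interleaved.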
I would find it useful to be able to create symlinks in a non-blocking way.
It seems like aiofile doesn't support writing bytes, or respect this line: async with aiofile.AIOFile(temp, "wb") as f:
One of the potential uses of aiofiles is to read sys.stdin asynchronously, which is currently not possible using only the publicly available aiofiles.open function.
README.md says:
The closing of a file may block, and yielding from a coroutine while exiting from a context manager isn't possible, so aiofiles file objects can't be used as context managers. Use the try/finally construct from the introductory section to ensure files are closed.
Iteration is also unsupported. To iterate over a file, call readline repeatedly until an empty result is returned. Keep in mind readline doesn't strip newline characters.
Python 3.5 provides asynchronous context managers and iterators, using __aenter__, __aexit__, __aiter__ and __anext__. I think adding support for these could provide a nice API for aiofiles, and it wouldn't be harder than something along the lines of:
class IterableContextManagerFileWrapper:  # yes, it needs a new name
    def __init__(self, *args, **kwargs):
        # store the arguments so the file can be created in __aenter__
        self.__args, self.__kwargs = args, kwargs
        self.__file = None  # set in __aenter__

    def __getattr__(self, name):
        assert self.__file is not None, "file context manager not entered"
        return getattr(self.__file, name)  # delegate to the wrapped async file object

    async def __aenter__(self):
        self.__file = await open(*self.__args, **self.__kwargs)
        return self  # return self, not the file, so the value can be used as an iterable

    async def __aexit__(self, exc_type, exc_value, exc_tb):
        await self.close()
        return False  # no reason to intercept exceptions

    def __aiter__(self):
        # plain def: __aiter__ should return the iterator itself, not an awaitable
        return self

    async def __anext__(self):
        line = await self.readline()
        if not line:  # EOF
            raise StopAsyncIteration  # not StopIteration!
        return line
The resulting wrapper could then be used as:

async with IterableContextManagerFileWrapper("/tmp/lol") as aiofile:
    async for line in aiofile:
        line = line.rstrip()
        ...
My proposed POC has a weird name because I don't really know how to integrate it into the class hierarchy, so I made a wrapper that provides the proposed features on top of the objects returned by aiofiles coroutines.
This is also why I am submitting this as an issue and not a PR: I would like your opinion on the best way to implement it.
I saw this article, but async is slower than the standard way?
When I try to open a JSON file every half second, I get this error.
It'd be really nice to be able to use mypy against files that include aiofiles, but that's not possible now since aiofiles isn't typed.
A simple example shows that mypy --strict doesn't allow use of aiofiles.open:

$ mypy --strict main.py
main.py:13: error: Call to untyped function "open" in typed context

Here's the code:
#!/usr/bin/env python
import aiofiles
import asyncio

def main() -> None:
    el = asyncio.get_event_loop()
    el.run_until_complete(async_main())

async def async_main() -> None:
    async with aiofiles.open('test_aiofiles', 'wb') as f:
        await f.write(b'Hello, world!\n')

if __name__ == '__main__':
    main()
==================================================================================================== FAILURES =====================================================================================================
_______________________________________________________________________________________________ test_sendfile_file ________________________________________________________________________________________________
tmpdir = local('/tmp/pytest-of-nwani/pytest-6/test_sendfile_file0')
@pytest.mark.asyncio
def test_sendfile_file(tmpdir):
    """Test the sendfile functionality, file-to-file."""
    filename = join(dirname(__file__), 'resources', 'test_file1.txt')
    tmp_filename = tmpdir.join('tmp.bin')
    with open(filename) as f:
        contents = f.read()
    input_file = yield from aiofiles.open(filename)
    output_file = yield from aiofiles.open(str(tmp_filename), mode='w+')
    size = (yield from aiofiles.os.stat(filename)).st_size
    input_fd = input_file.fileno()
    output_fd = output_file.fileno()
>   yield from aiofiles.os.sendfile(output_fd, input_fd, 0, size)
tests/test_os.py:35:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../_t_env/lib/python3.6/asyncio/coroutines.py:213: in coro
res = yield from res
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <concurrent.futures.thread._WorkItem object at 0x2af170783ef0>
def run(self):
if not self.future.set_running_or_notify_cancel():
return
try:
> result = self.fn(*self.args, **self.kwargs)
E OSError: [Errno 22] Invalid argument
../_t_env/lib/python3.6/concurrent/futures/thread.py:55: OSError
====================================================================================== 1 failed, 229 passed in 6.33 seconds =======================================================================================
Strace to the rescue:
669 sendfile(19, 18, [0], 10) = -1 EINVAL (Invalid argument)
Are there any known issues running in python 3.7?
Hi,
It would be great to support recursive directory removal.
Regards
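Until then, one possible shape for such a helper, following the run-in-a-worker-thread pattern the library already uses elsewhere (rmtree here is my sketch, not an existing aiofiles API):

```python
import asyncio
import shutil
from functools import partial

async def rmtree(path, loop=None, executor=None):
    # delegate the blocking recursive removal to a worker thread
    loop = loop or asyncio.get_event_loop()
    await loop.run_in_executor(executor, partial(shutil.rmtree, path))
```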
I just pip installed the package and attempted to call aiofiles.os.mkdir, and it crashed because that function didn't exist. After debugging, it appears the pip package is missing mkdir inside os.py. Please update the pip package.
When I import aiofile, it raises an OSError like this:
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import aiofile
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/rouzip/.local/lib/python3.5/site-packages/aiofile/__init__.py", line 1, in <module>
from .utils import Reader, Writer
File "/home/rouzip/.local/lib/python3.5/site-packages/aiofile/utils.py", line 3, in <module>
from .aio import AIOFile
File "/home/rouzip/.local/lib/python3.5/site-packages/aiofile/aio.py", line 6, in <module>
from .posix_aio import IO_NOP, IO_WRITE, IO_READ, AIOOperation
File "aiofile/posix_aio.pyx", line 67, in init aiofile.posix_aio (aiofile/posix_aio.c:7366)
OSError: [Errno 22] Invalid argument
I don't know what I should do to fix this problem; can anybody help me?
I'm using Python 3.5.2 and my OS is Ubuntu 16.04.
async with aiofiles.open(fileName) as f:
    i = 0
    async for line in f:
        i += 1
        print(i, len(line))
Memory usage grows like there's no tomorrow.
Using an explicit executor and closing it after every use seems to solve this problem, but I don't know why it works. What happens to the data of those run_in_executor coroutines?
Reports "FileNotFoundError: [Errno 2] No file or directory: '/path/to/file'"
e.g.
/home/daves/github/aiofiles/.pybuild/cpython3_3.8/build/aiofiles/threadpool/utils.py:33: DeprecationWarning: "@coroutine" decorator is deprecated since Python 3.8, use "async def" instead
def method(self, *args, **kwargs):
.pybuild/cpython3_3.8/build/tests/test_simple.py::test_serve_small_bin_file_sync
/home/daves/github/aiofiles/.pybuild/cpython3_3.8/build/tests/test_simple.py:28: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
server = yield from asyncio.start_server(serve_file, port=unused_tcp_port,
I want to write to a file many times during the execution of my code.
Looking at the examples and the code, a thread is created using run_in_executor during the open method of aiofiles.
So each time I want to write to the file, a new thread is created, which I assume is expensive? (I am developing for a Raspberry Pi.)
Is it possible to dedicate a specific thread to a specific file? Or should I keep the file open all the time, do my writes, and finally close it?
Thanks !
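One way to get both: keep the file open for the whole batch and route every operation through a single-worker executor, so one dedicated thread serves one file. This is a stdlib-only sketch of the idea (write_many is my name; aiofiles.open also appears to accept an executor argument, which would achieve the same thing more directly):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

async def write_many(path, lines):
    loop = asyncio.get_event_loop()
    # max_workers=1 dedicates a single thread to this file's operations
    executor = ThreadPoolExecutor(max_workers=1)
    try:
        f = await loop.run_in_executor(executor, open, path, 'w')
        try:
            for line in lines:
                await loop.run_in_executor(executor, f.write, line + '\n')
        finally:
            await loop.run_in_executor(executor, f.close)
    finally:
        executor.shutdown()
```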
I believe this is a fundamental limitation of the current approach of running blocking operations in threads, and I don't see any way to work around it.
Any ideas?
Accept pathlib.Path as filename etc.
In Fedora we package from github tarball rather than PyPI, so this would help us updating to 0.4.0.
Thanks!
I'm trying to remove a file from a local directory asynchronously; however, I get the following error:
object NoneType can't be used in 'await' expression (<class 'TypeError'>)
I'm using aiofiles 0.5.0 and Python 3.6.5.
My code is as straightforward as this:
async def delete_local_file(file_to_del):
    await aiof.os.remove(file_to_del)
    print("deleted: " + file_to_del)

await delete_local_file(localfile)
I have used aiofiles to read files when we have a path to the file:
#! /usr/bin/python3.6
import asyncio
import aiofiles

async def read_input():
    async with aiofiles.open('hello.txt', mode='rb') as f:
        while True:
            inp = await f.read(4)
            if not inp:
                break
            print('got :', inp)

async def main():
    await asyncio.wait([read_input()])

event_loop = asyncio.get_event_loop()
event_loop.run_until_complete(main())
event_loop.run_forever()
I need to read the file from standard input, as in ./reader.py < file.txt.
The above code reads a file as binary, but I need to read stdin as binary using coroutines, and I am unable to figure out a way to do that.
Regards,
Sarbjit Singh
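One stdlib-only way to do this, pending support in aiofiles: delegate the blocking reads on sys.stdin.buffer to a worker thread. The stream parameter is my addition so the sketch can be exercised with any binary stream:

```python
import asyncio
import sys
from functools import partial

async def read_binary(stream=None, chunk_size=4):
    # read a blocking binary stream (default: sys.stdin.buffer) in a
    # worker thread, so the event loop is never blocked waiting on input
    stream = stream if stream is not None else sys.stdin.buffer
    loop = asyncio.get_event_loop()
    chunks = []
    while True:
        chunk = await loop.run_in_executor(None, partial(stream.read, chunk_size))
        if not chunk:
            break
        chunks.append(chunk)
    return chunks
```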
The FileIO attributes .name and .mode are missing from the wrappers.
So this works:

with open(filename) as f:
    print(f.name, f.mode)

This doesn't, resulting in attribute errors:

async with aiofiles.open(filename) as f:
    print(f.name, f.mode)
I would have use for aiofiles if it supported creating and managing temporary files, especially via the standard tempfile interface. Potential new feature?
Version is 0.3.0, but the tag is 0.3... Would be nice to have some consistency ;)
shutil.copyfile and shutil.copyfileobj
I prefer not to install binary packages like wheels for libraries that don't require compilation during installation.
But there is no sdist (.tar.gz) package on PyPI, so pip install --no-binary :all: aiofiles will fail.
It would be nice to get access to latest changes made in project through installing the package from pypi.
I believe the tell() function is bugged for subclassed AsyncBufferedReader objects.
The following program produces this output:

b'ADSEGMENTE' <generator object _make_delegate_method..method at 0x10ac77938>
#!/usr/bin/env python3.6
# -*- encoding: utf-8 -*-
"""
How do I create attributes in async functions?
Expected result:
b'THISISTEXT' 10
Actual result:
b'THISISTEXT' <generator object _make_delegate_method.<locals>.method at 0x10e7cda40>
"""
import aiofiles
import asyncio


class Buffer(aiofiles.threadpool.binary.AsyncBufferedReader):
    def __init__(self, stream, block_size):
        super().__init__(stream, loop=stream._loop, executor=stream._executor)
        self.stream = stream
        self.block_size = block_size

    async def __aiter__(self):
        return self

    async def __anext__(self):
        return await self.read(self.block_size)

    async def get_index(self):
        # the following line is a workaround...
        # return await self.stream.tell()
        return await self.tell()

    @classmethod
    async def new(cls, file_path, mode='rb', block_size=1024):
        stream = await aiofiles.open(file_path, mode=mode)
        return cls(stream, block_size)


async def run_read():
    bin_buffer = await Buffer.new('file_name.raw', block_size=10)
    async for chunk_data in bin_buffer:
        # the following line is also a workaround...
        # print(await chunk_data, await (await bin_buffer.get_index()))
        print(await chunk_data, await bin_buffer.get_index())


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(run_read())
When I call stream directly, which is an AsyncBufferedReader, I can access tell() with one await, when I subclass the AsyncBufferedReader I have to await twice to access tell().
It would be great to know what the possible gain of using it is (other than being forced to while using aio stuff).
Hi,
For a project, I need to have atomic writes for files.
I've found this library: https://github.com/untitaker/python-atomicwrites
I don't see an obvious way to integrate python-atomicwrites with aiofiles.
What's your recommended approach? Use python-atomicwrites with a thread pool, fork aiofiles to use python-atomicwrites, or some other approach?
Have a nice day.
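For what it's worth, the usual write-temp-then-rename trick can be run off-thread without either library; a rough sketch under my own names (atomic_write_bytes is not an existing API, and atomicity relies on os.replace over a temp file on the same filesystem):

```python
import asyncio
import os

async def atomic_write_bytes(path, data, loop=None, executor=None):
    loop = loop or asyncio.get_event_loop()
    tmp = path + '.tmp'

    def _write():
        # write a sibling temp file, fsync it, then atomically rename over the target
        with open(tmp, 'wb') as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)

    await loop.run_in_executor(executor, _write)
```

Readers of the target path therefore see either the old contents or the complete new contents, never a partial write.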
Windows 7 64-bit, Python 3.7.0 64-bit, aiofiles 0.4.0
Test program:
import asyncio, aiofiles, timeit

with open('sample.txt', 'w') as f:
    for i in range(10000):
        print('A'*100, file=f)

loop = asyncio.ProactorEventLoop()
asyncio.set_event_loop(loop)

def traditional_read():
    with open('sample.txt', encoding='utf-8') as f:
        for ln in f:
            pass

async def aiofiles_read():
    async with aiofiles.open('sample.txt', encoding='utf-8') as f:
        async for ln in f:
            pass

print('traditional:', timeit.timeit('traditional_read()', number=1, globals=globals()))
print('aiofiles:', timeit.timeit('loop.run_until_complete(aiofiles_read())', number=1, globals=globals()))
Output:
traditional: 0.005477579999999982
aiofiles: 1.563328274
The sample file is not very big (less than 1 MB), but it contains rather a lot of lines. So is the result caused by excess thread context switches? Can this be avoided somehow?
While answering aio-libs/aiohttp-devtools#118 I suddenly wondered if aiofiles had support for watching for file changes.
It seems it doesn't at the moment, is it something you'd consider in the future?
Like watchdog, it could start by implementing a brute-force "iterate over files looking for changes" interface, then go on to use inotify (and equivalents for other OSs) in the future.
Just an easy way to read line by line from the end of a file without loading all of it into memory.
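A possible stdlib sketch of the blocking core, which could then be pushed to a thread with run_in_executor (reverse_lines is my name, not an existing API): read fixed-size chunks backwards from the end, splitting out complete lines as they appear.

```python
import os

def reverse_lines(path, chunk_size=8192):
    # yield the file's lines last-to-first without reading it all at once
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        tail = b''
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            block = f.read(step) + tail
            lines = block.split(b'\n')
            tail = lines[0]  # possibly incomplete line; prepend to the next block
            for line in reversed(lines[1:]):
                yield line
        if tail:
            yield tail
```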
I have a standard method to save a file. When I use

with open(file_path, 'wb') as fsave:
    fsave.write(file_data)

files are saved without errors. If I use

async with aiofiles.open(file_path, 'wb') as fsave:
    fsave.write(file_data)

files are created, but with 0 size. file_data is the same.
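The likely cause: with aiofiles, write is a coroutine, and the second snippet never awaits it, so a coroutine object is created but never executed (the file itself is created by open, hence 0 bytes); the fix would be await fsave.write(file_data). A minimal stdlib illustration of the same mistake (fake_write is a stand-in, not an aiofiles function):

```python
import asyncio

async def fake_write(buf, data):
    # stands in for an aiofiles write coroutine
    buf.append(data)

async def demo():
    buf = []
    fake_write(buf, b'lost')        # not awaited: never runs (RuntimeWarning)
    await fake_write(buf, b'kept')  # awaited: actually executes
    return buf
```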
To provide an async API for file descriptors, like https://docs.python.org/3/library/os.html#os.fdopen.
If you like the idea, I could make a patch.
Hi Tinche,
First of all, thanks for the great work! Asynchronous file support for asyncio is a great thing to have!
While testing a small project, I noticed a large number of threads and big memory consumption in the Python process. I decided to write a small test script which just writes to a file in a loop and tracks the memory:
#!/usr/bin/python3
import asyncio
import aiofiles
import os
import psutil

async def printMemory():
    for iteration in range(0, 20):
        # grab the memory statistics
        p = psutil.Process(os.getpid())
        vms = p.memory_info().vms / (1024.0 * 1024.0)
        threads = p.num_threads()
        print(f'Iteration {iteration:>2d} - Memory usage (VMS): {vms:>6.1f} Mb; # threads: {threads:>2d}')
        # simple write to a test file
        async with aiofiles.open('test.txt', mode='w') as f:
            await f.write('hello\n')
        # a wait, just for the sake of it
        await asyncio.sleep(1)

loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(printMemory())
finally:
    loop.close()
The output shows some worrisome numbers (run with Python 3.6.5 on Debian 8.10 (Jessie)):
Iteration 0 - Memory usage (VMS): 92.5 Mb; # threads: 1
Iteration 1 - Memory usage (VMS): 308.5 Mb; # threads: 4
Iteration 2 - Memory usage (VMS): 524.6 Mb; # threads: 7
Iteration 3 - Memory usage (VMS): 740.6 Mb; # threads: 10
Iteration 4 - Memory usage (VMS): 956.6 Mb; # threads: 13
Iteration 5 - Memory usage (VMS): 1172.6 Mb; # threads: 16
Iteration 6 - Memory usage (VMS): 1388.7 Mb; # threads: 19
Iteration 7 - Memory usage (VMS): 1604.7 Mb; # threads: 22
Iteration 8 - Memory usage (VMS): 1820.8 Mb; # threads: 25
Iteration 9 - Memory usage (VMS): 2036.8 Mb; # threads: 28
Iteration 10 - Memory usage (VMS): 2252.8 Mb; # threads: 31
Iteration 11 - Memory usage (VMS): 2468.8 Mb; # threads: 34
Iteration 12 - Memory usage (VMS): 2684.8 Mb; # threads: 37
Iteration 13 - Memory usage (VMS): 2900.8 Mb; # threads: 40
Iteration 14 - Memory usage (VMS): 2972.8 Mb; # threads: 41
Iteration 15 - Memory usage (VMS): 2972.8 Mb; # threads: 41
Iteration 16 - Memory usage (VMS): 2972.8 Mb; # threads: 41
Iteration 17 - Memory usage (VMS): 2972.8 Mb; # threads: 41
Iteration 18 - Memory usage (VMS): 2972.8 Mb; # threads: 41
Iteration 19 - Memory usage (VMS): 2972.8 Mb; # threads: 41
Any idea where this could come from?
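One mitigation to try, assuming the growth comes from executors lazily spawning worker threads: install a single small, shared default executor on the loop, so every delegated call reuses the same bounded pool instead of growing it:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
# all run_in_executor(None, ...) calls on this loop now share one bounded pool
loop.set_default_executor(ThreadPoolExecutor(max_workers=2))
```

If the numbers still grow with this in place, the leak is elsewhere (e.g. objects held per open call), which would be worth reporting separately.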
I've been comparing how aiofiles does things compared to other libraries like aiohttp.
aiofiles runs blocking functions in thread pools using loop.run_in_executor, while aiohttp mostly sets file descriptors to non-blocking mode and then uses loop.add_writer/loop.add_reader as necessary.
Why did aiofiles choose the former approach?
Right now the tempfile module included in the Python stdlib operates on classic file objects.
Looking at the code, though, it would be pretty simple to adapt it to work with aiofiles.
I noticed aiofiles' unit tests use real file IO.
In my project that uses aiofiles, I think it might make more sense to unit test with faked file IO, using something like pyfakefs.
The setup code I wrote for this feels a little clunky, but it seems to work:
@pytest.fixture
def fs(request):
    """Fake filesystem."""
    patcher = Patcher()
    patcher.setUp()
    patcher._stubs.SmartSet(threadpool, '_sync_open', patcher.fake_open)
    request.addfinalizer(patcher.tearDown)
    return patcher.fs

@threadpool.wrap.register(FakeFileWrapper)
def _(file, *, loop=None, executor=None):
    return AsyncBufferedIOBase(file, loop=loop, executor=executor)
So there are two problems that this code solves:

1. Patcher monkey-patches builtins.open, but because aiofiles already made a reference to open, this doesn't get replaced (contrast with referencing builtins.open instead of _sync_open).
2. FakeFileWrapper from pyfakefs, I think, probably should be a subclass or registered class of io.FileIO or similar.

I am unsure if I should either:
What do you think?
https://docs.python.org/3/library/asyncio-stream.html#asyncio.StreamWriter, for those more used to x.write(b"foo") and await x.drain(), so we can do things like:
import asyncio
import aiostream
from aiofiles import pathlib

async def export_as_js(p: pathlib.Path) -> None:
    t = await p.read_text()
    await asyncio.gather(
        p.with_suffix('.js').write_text(f'export default `{t}`;'),
        p.unlink(),
    )

async def amain():
    await aiostream.map(pathlib.Path('.').glob('*/**.html'), export_as_js)

def main():
    asyncio.run(amain())
I have this non-async method that reads rows:

def lastname():
    with open('surnames.csv') as csvfile:
        readCSV = csv.reader(csvfile, delimiter=',')
        for row in readCSV:
            print(row[0], row[1])
and I have this async one:

async def firstname():
    async with aiofiles.open('firstnames.csv') as csvfile:
        readCSV = await csvfile.read()
        return readCSV
How can I read a row in the async method?
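One approach, assuming it is fine to read the whole file first: await the read, then hand the contents to the regular csv module via io.StringIO. This sketch uses a stdlib executor read so it is self-contained; with aiofiles, the marked line would be contents = await csvfile.read() inside async with aiofiles.open(...) (read_rows is my name):

```python
import asyncio
import csv
import io

async def read_rows(path):
    loop = asyncio.get_event_loop()

    def _read():
        with open(path, newline='') as f:
            return f.read()

    # with aiofiles this would be: contents = await csvfile.read()
    contents = await loop.run_in_executor(None, _read)
    return list(csv.reader(io.StringIO(contents), delimiter=','))
```

For very large files, iterating line by line (async for line in csvfile) and feeding each line to csv.reader would avoid holding the whole file in memory.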
Hi,
Thanks for the good library.
Can we check path existence asynchronously with this library? I mean something like this:
path_existence = await aiofiles.path.exists(some_path)
If not, I request this feature. Thanks again.
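Pending such a feature, a small helper in the same delegate-to-a-thread style the library uses (aexists is my name, not an aiofiles API):

```python
import asyncio
import os.path
from functools import partial

async def aexists(path, loop=None, executor=None):
    # run the blocking os.path.exists call in a worker thread
    loop = loop or asyncio.get_event_loop()
    return await loop.run_in_executor(executor, partial(os.path.exists, path))
```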
import aiofiles
print(aiofiles.__version__)

async with aiofiles.open('wctDA2.tmp', mode='r') as f:
    contents = await f.read()
print(contents)
Output:
File "a.py", line 4
async with aiofiles.open('wctDA2.tmp', mode='r') as f:
^
SyntaxError: invalid syntax
Details:
I'm just poking at this; I came to it through a reference in MagicStack/uvloop#1 (comment).
Would you be interested in adding support to the project for more calls from the os module, like os.stat?
e.g.
e.g.
os.close
os.fstat
os.read
os.write
os.unlink
os.listdir
os.path.exists
os.rmdir
In a small bit of test code I trigger (I think) the changed behavior of __aiter__ between Python 3.5 and 3.6. The code is below, with the error. This is my very first use of aiofiles (and almost my first use of asyncio), so I'm happy to accept that I could be doing something wrong.
async def query_feed(src, queue):
    print('Starting read from {}'.format(src))
    # The file handle itself can be non-blocking and asynchronous
    async with aiofiles.open(src) as fh:
        async for line in fh:
            await asyncio.sleep(2)
            await queue.put(line.strip())
            # print() is synchronous and completes atomically w/o control switch
            print("From feed:\t", line.strip())
DeprecationWarning: 'AsyncTextIOWrapper' implements legacy __aiter__ protocol; __aiter__ should return an asynchronous iterator, not awaitable