tinche / aiofiles
File support for asyncio
License: Apache License 2.0
Hi there! Pytest is correctly complaining about the use of the @coroutine
decorator at aiofiles.os line 8. This is for the current latest version, 0.5.0.
Lines 7 to 16 in 258e956
The fix would be to use async def directly:

def wrap(func):
    @wraps(func)
    async def run(*args, loop=None, executor=None, **kwargs):
        if loop is None:
            loop = asyncio.get_event_loop()
        pfunc = partial(func, *args, **kwargs)
        return await loop.run_in_executor(executor, pfunc)
    return run
https://docs.python.org/3/library/gzip.html
Would be nice to have gzip support.
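Until such support exists, a minimal sketch of a non-blocking gzip read, using the same delegate-to-a-worker-thread pattern aiofiles itself uses (the helper name read_gzip is mine, not part of any library):

```python
import asyncio
import gzip

async def read_gzip(path, loop=None, executor=None):
    # run the blocking gzip read in a worker thread so the event loop stays free
    loop = loop or asyncio.get_event_loop()

    def _read():
        with gzip.open(path, 'rb') as f:
            return f.read()

    return await loop.run_in_executor(executor, _read)
```

A full integration would presumably wrap the gzip file object the way aiofiles wraps regular file objects, so reads and writes can be interleaved.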
I would find it useful to be able to create symlinks in a non-blocking way.
It seems like aiofile doesn't support writing bytes, or respect this line: async with aiofile.AIOFile(temp, "wb") as f:
One of the potential uses of aiofiles is to read sys.stdin asynchronously, which is currently not possible using only the publicly available aiofiles.open function.
README.md says:
The closing of a file may block, and yielding from a coroutine while exiting from a context manager isn't possible, so aiofiles file objects can't be used as context managers. Use the try/finally construct from the introductory section to ensure files are closed.
Iteration is also unsupported. To iterate over a file, call readline repeatedly until an empty result is returned. Keep in mind readline doesn't strip newline characters.
Python 3.5 provides asynchronous context managers and iterators, using __aenter__, __aexit__, __aiter__ and __anext__. I think adding support for these could provide a nice API for aiofiles, and it wouldn't be harder than something along the lines of:
class IterableContextManagerFileWrapper:  # yes, it needs a new name
    def __init__(self, *args, **kwargs):
        # store the arguments so the file can be created in __aenter__
        self.__args, self.__kwargs = args, kwargs
        self.__file = None  # set in __aenter__

    def __getattr__(self, name):
        assert self.__file is not None, "file context manager not entered"
        return getattr(self.__file, name)  # delegate to the wrapped async file object

    async def __aenter__(self):
        self.__file = await open(*self.__args, **self.__kwargs)
        return self  # return self, not the file, so the value can be used as an iterable

    async def __aexit__(self, exc_type, exc_value, exc_tb):
        await self.close()
        return False  # no reason to intercept exceptions

    def __aiter__(self):
        # plain def: __aiter__ should return the iterator itself, not an awaitable
        return self

    async def __anext__(self):
        line = await self.readline()
        if not line:  # EOF
            raise StopAsyncIteration  # not StopIteration!
        return line
The resulting wrapper could then be used as:

async with IterableContextManagerFileWrapper("/tmp/lol") as aiofile:
    async for line in aiofile:
        line = line.rstrip()
        ...
My proposed POC has a weird name because I don't really know how to integrate it into the class hierarchy, so I made a wrapper that provides the proposed features on top of the objects returned by aiofiles coroutines.
This is also why I am submitting this as an issue and not a PR: I would like your opinion on the best way to implement it.
I saw this article, but async is slower than the standard way?
When I try to open a JSON file every half second, I get this error.
It'd be really nice to be able to use mypy against files that include aiofiles, but that's not possible now since aiofiles isn't typed.
A simple example shows that mypy --strict doesn't allow use of aiofiles.open:

$ mypy --strict main.py
main.py:13: error: Call to untyped function "open" in typed context

Here's the code:
#!/usr/bin/env python
import aiofiles
import asyncio

def main() -> None:
    el = asyncio.get_event_loop()
    el.run_until_complete(async_main())

async def async_main() -> None:
    async with aiofiles.open('test_aiofiles', 'wb') as f:
        await f.write(b'Hello, world!\n')

if __name__ == '__main__':
    main()
==================================================================================================== FAILURES =====================================================================================================
_______________________________________________________________________________________________ test_sendfile_file ________________________________________________________________________________________________
tmpdir = local('/tmp/pytest-of-nwani/pytest-6/test_sendfile_file0')
@pytest.mark.asyncio
def test_sendfile_file(tmpdir):
    """Test the sendfile functionality, file-to-file."""
    filename = join(dirname(__file__), 'resources', 'test_file1.txt')
    tmp_filename = tmpdir.join('tmp.bin')
    with open(filename) as f:
        contents = f.read()
    input_file = yield from aiofiles.open(filename)
    output_file = yield from aiofiles.open(str(tmp_filename), mode='w+')
    size = (yield from aiofiles.os.stat(filename)).st_size
    input_fd = input_file.fileno()
    output_fd = output_file.fileno()
>   yield from aiofiles.os.sendfile(output_fd, input_fd, 0, size)
tests/test_os.py:35:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../_t_env/lib/python3.6/asyncio/coroutines.py:213: in coro
res = yield from res
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <concurrent.futures.thread._WorkItem object at 0x2af170783ef0>
def run(self):
if not self.future.set_running_or_notify_cancel():
return
try:
> result = self.fn(*self.args, **self.kwargs)
E OSError: [Errno 22] Invalid argument
../_t_env/lib/python3.6/concurrent/futures/thread.py:55: OSError
====================================================================================== 1 failed, 229 passed in 6.33 seconds =======================================================================================
Strace to the rescue:
669 sendfile(19, 18, [0], 10) = -1 EINVAL (Invalid argument)
Are there any known issues running in python 3.7?
Hi,
It would be great to support recursive directory removal.
Regards
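Until then, one possible shape for such a helper, following the run-in-a-worker-thread pattern the library already uses elsewhere (rmtree here is my sketch, not an existing aiofiles API):

```python
import asyncio
import shutil
from functools import partial

async def rmtree(path, loop=None, executor=None):
    # delegate the blocking recursive removal to a worker thread
    loop = loop or asyncio.get_event_loop()
    await loop.run_in_executor(executor, partial(shutil.rmtree, path))
```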
I just pip installed the package and attempted to call aiofiles.os.mkdir, and it crashed because that function didn't exist. After debugging, it appears the pip package is missing mkdir inside os.py. Please update the pip package.
When I import aiofile, it raises an OSError like this:
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import aiofile
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/rouzip/.local/lib/python3.5/site-packages/aiofile/__init__.py", line 1, in <module>
from .utils import Reader, Writer
File "/home/rouzip/.local/lib/python3.5/site-packages/aiofile/utils.py", line 3, in <module>
from .aio import AIOFile
File "/home/rouzip/.local/lib/python3.5/site-packages/aiofile/aio.py", line 6, in <module>
from .posix_aio import IO_NOP, IO_WRITE, IO_READ, AIOOperation
File "aiofile/posix_aio.pyx", line 67, in init aiofile.posix_aio (aiofile/posix_aio.c:7366)
OSError: [Errno 22] Invalid argument
I don't know what I should do to fix this problem; can anybody help me?
I'm using Python 3.5.2 and my OS is Ubuntu 16.04.
async with aiofiles.open(fileName) as f:
    i = 0
    async for line in f:
        i += 1
        print(i, len(line))
Memory usage grows like there's no tomorrow.
Using an explicit executor and closing it after every use seems to solve this problem, but I don't know why it works. What happens to the data of those run_in_executor coroutines?
Reports "FileNotFoundError: [Errno 2] No file or directory: '/path/to/file'"
e.g.
/home/daves/github/aiofiles/.pybuild/cpython3_3.8/build/aiofiles/threadpool/utils.py:33: DeprecationWarning: "@coroutine" decorator is deprecated since Python 3.8, use "async def" instead
def method(self, *args, **kwargs):
.pybuild/cpython3_3.8/build/tests/test_simple.py::test_serve_small_bin_file_sync
/home/daves/github/aiofiles/.pybuild/cpython3_3.8/build/tests/test_simple.py:28: DeprecationWarning: The loop argument is deprecated since Python 3.8, and scheduled for removal in Python 3.10.
server = yield from asyncio.start_server(serve_file, port=unused_tcp_port,
I want to write to a file many times during the execution of my code.
Looking at the examples and the code, a thread is created using run_in_executor during the open method of aiofiles.
So each time I want to write to the file, a new thread is created, which I assume is expensive? (I am developing for a Raspberry Pi.)
Is it possible to dedicate a specific thread to a specific file? Or should I keep the file open all the time, do my writes, and finally close it?
Thanks !
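One way to get both: keep the file open for the whole batch and route every operation through a single-worker executor, so one dedicated thread serves one file. This is a stdlib-only sketch of the idea (write_many is my name; aiofiles.open also appears to accept an executor argument, which would achieve the same thing more directly):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

async def write_many(path, lines):
    loop = asyncio.get_event_loop()
    # max_workers=1 dedicates a single thread to this file's operations
    executor = ThreadPoolExecutor(max_workers=1)
    try:
        f = await loop.run_in_executor(executor, open, path, 'w')
        try:
            for line in lines:
                await loop.run_in_executor(executor, f.write, line + '\n')
        finally:
            await loop.run_in_executor(executor, f.close)
    finally:
        executor.shutdown()
```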
I believe this is a fundamental limitation of the current approach of running blocking operations in threads, and I don't see any way to work around it.
Any ideas?
Accept pathlib.Path as filename etc.
In Fedora we package from github tarball rather than PyPI, so this would help us updating to 0.4.0.
Thanks!
I'm trying to remove a file from a local directory asynchronously; however, I get the following error:
object NoneType can't be used in 'await' expression (<class 'TypeError'>)
I'm using aiofiles 0.5.0 and Python 3.6.5.
My code is as straightforward as this:
async def delete_local_file(file_to_del):
    await aiof.os.remove(file_to_del)
    print("deleted: " + file_to_del)

await delete_local_file(localfile)
I have used aiofiles to read files when we have a path to the file:
#! /usr/bin/python3.6
import asyncio
import aiofiles

async def read_input():
    async with aiofiles.open('hello.txt', mode='rb') as f:
        while True:
            inp = await f.read(4)
            if not inp:
                break
            print('got :', inp)

async def main():
    await asyncio.wait([read_input()])

event_loop = asyncio.get_event_loop()
event_loop.run_until_complete(main())
event_loop.run_forever()
I need to read the file from standard input, as in ./reader.py < file.txt.
The above code reads a file as binary, but I need to read stdin as binary using coroutines, and I am unable to figure out a way to do that.
Regards,
Sarbjit Singh
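One stdlib-only way to do this, pending support in aiofiles: delegate the blocking reads on sys.stdin.buffer to a worker thread. The stream parameter is my addition so the sketch can be exercised with any binary stream:

```python
import asyncio
import sys
from functools import partial

async def read_binary(stream=None, chunk_size=4):
    # read a blocking binary stream (default: sys.stdin.buffer) in a
    # worker thread, so the event loop is never blocked waiting on input
    stream = stream if stream is not None else sys.stdin.buffer
    loop = asyncio.get_event_loop()
    chunks = []
    while True:
        chunk = await loop.run_in_executor(None, partial(stream.read, chunk_size))
        if not chunk:
            break
        chunks.append(chunk)
    return chunks
```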
The FileIO attributes .name and .mode are missing from the wrappers.
So this works:

with open(filename) as f:
    print(f.name, f.mode)

This doesn't, resulting in attribute errors:

async with aiofiles.open(filename) as f:
    print(f.name, f.mode)
I would have use for aiofiles if it supported creating and managing temporary files, especially via the standard tempfile interface. Potential new feature?
Version is 0.3.0, but the tag is 0.3... Would be nice to have some consistency ;)
shutil.copyfile and shutil.copyfileobj
I prefer not to install binary packages like wheels for libraries that don't require compilation during installation.
But there is no sdist (.tar.gz) package on PyPI, so pip install --no-binary :all: aiofiles will fail.
It would be nice to get access to latest changes made in project through installing the package from pypi.
I believe the tell() function is bugged for subclassed AsyncBufferedReader objects.
The following program produces this output:

b'ADSEGMENTE' <generator object _make_delegate_method..method at 0x10ac77938>
#!/usr/bin/env python3.6
# -*- encoding: utf-8 -*-
"""
How do I create attributes in async functions?
Expected result:
b'THISISTEXT' 10
Actual result:
b'THISISTEXT' <generator object _make_delegate_method.<locals>.method at 0x10e7cda40>
"""
import aiofiles
import asyncio


class Buffer(aiofiles.threadpool.binary.AsyncBufferedReader):
    def __init__(self, stream, block_size):
        super().__init__(stream, loop=stream._loop, executor=stream._executor)
        self.stream = stream
        self.block_size = block_size

    async def __aiter__(self):
        return self

    async def __anext__(self):
        return await self.read(self.block_size)

    async def get_index(self):
        # the following line is a workaround...
        # return await self.stream.tell()
        return await self.tell()

    @classmethod
    async def new(cls, file_path, mode='rb', block_size=1024):
        stream = await aiofiles.open(file_path, mode=mode)
        return cls(stream, block_size)


async def run_read():
    bin_buffer = await Buffer.new('file_name.raw', block_size=10)
    async for chunk_data in bin_buffer:
        # the following line is also a workaround...
        # print(await chunk_data, await (await bin_buffer.get_index()))
        print(await chunk_data, await bin_buffer.get_index())


if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(run_read())
When I call stream directly, which is an AsyncBufferedReader, I can access tell() with one await, when I subclass the AsyncBufferedReader I have to await twice to access tell().
It would be great to know what the possible gain of using it is (other than being forced to while using aio stuff).
Hi,
For a project, I need to have atomic writes for files.
I've found this library: https://github.com/untitaker/python-atomicwrites
I don't see an obvious way to integrate python-atomicwrites with aiofiles.
What's your recommended approach? Use python-atomicwrites with a thread pool, fork aiofiles to use python-atomicwrites, or some other approach?
Have a nice day.
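For what it's worth, the usual write-temp-then-rename trick can be run off-thread without either library; a rough sketch under my own names (atomic_write_bytes is not an existing API, and atomicity relies on os.replace over a temp file on the same filesystem):

```python
import asyncio
import os

async def atomic_write_bytes(path, data, loop=None, executor=None):
    loop = loop or asyncio.get_event_loop()
    tmp = path + '.tmp'

    def _write():
        # write a sibling temp file, fsync it, then atomically rename over the target
        with open(tmp, 'wb') as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)

    await loop.run_in_executor(executor, _write)
```

Readers of the target path therefore see either the old contents or the complete new contents, never a partial write.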
Windows 7 64-bit, Python 3.7.0 64-bit, aiofiles 0.4.0
Test program:
import asyncio, aiofiles, timeit

with open('sample.txt', 'w') as f:
    for i in range(10000):
        print('A'*100, file=f)

loop = asyncio.ProactorEventLoop()
asyncio.set_event_loop(loop)

def traditional_read():
    with open('sample.txt', encoding='utf-8') as f:
        for ln in f:
            pass

async def aiofiles_read():
    async with aiofiles.open('sample.txt', encoding='utf-8') as f:
        async for ln in f:
            pass

print('traditional:', timeit.timeit('traditional_read()', number=1, globals=globals()))
print('aiofiles:', timeit.timeit('loop.run_until_complete(aiofiles_read())', number=1, globals=globals()))
Output:
traditional: 0.005477579999999982
aiofiles: 1.563328274
The sample file is not very big (less than 1 MB), but it contains rather a lot of lines. So is the result caused by excess thread context switches? Can this be avoided somehow?
While answering aio-libs/aiohttp-devtools#118 I suddenly wondered if aiofiles had support for watching for file changes.
It seems it doesn't at the moment, is it something you'd consider in the future?
Like watchdog, it could start by implementing a brute-force "iterate over files looking for changes" interface, then go on to use inotify (and equivalents for other OSs) in the future.
Just an easy way to read line by line from the end of a file without loading all of it into memory.
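A possible stdlib sketch of the blocking core, which could then be pushed to a thread with run_in_executor (reverse_lines is my name, not an existing API): read fixed-size chunks backwards from the end, splitting out complete lines as they appear.

```python
import os

def reverse_lines(path, chunk_size=8192):
    # yield the file's lines last-to-first without reading it all at once
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        tail = b''
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            block = f.read(step) + tail
            lines = block.split(b'\n')
            tail = lines[0]  # possibly incomplete line; prepend to the next block
            for line in reversed(lines[1:]):
                yield line
        if tail:
            yield tail
```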
I have a standard method to save a file. When I use

with open(file_path, 'wb') as fsave:
    fsave.write(file_data)

files are saved without errors. If I use

async with aiofiles.open(file_path, 'wb') as fsave:
    fsave.write(file_data)

files are created, but with 0 size. file_data is the same.
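The likely cause: with aiofiles, write is a coroutine, and the second snippet never awaits it, so a coroutine object is created but never executed (the file itself is created by open, hence 0 bytes); the fix would be await fsave.write(file_data). A minimal stdlib illustration of the same mistake (fake_write is a stand-in, not an aiofiles function):

```python
import asyncio

async def fake_write(buf, data):
    # stands in for an aiofiles write coroutine
    buf.append(data)

async def demo():
    buf = []
    fake_write(buf, b'lost')        # not awaited: never runs (RuntimeWarning)
    await fake_write(buf, b'kept')  # awaited: actually executes
    return buf
```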
To provide an async API for file descriptors, like https://docs.python.org/3/library/os.html#os.fdopen.
If you like the idea, I could make a patch.
Hi Tinche,
First of all, thanks for the great work! Asynchronous file support for asyncio is a great thing to have!
While testing a small project, I noticed a large number of threads and big memory consumption in the Python process. I decided to write a small test script which just writes to a file in a loop and tracks the memory:
#!/usr/bin/python3
import asyncio
import aiofiles
import os
import psutil

async def printMemory():
    for iteration in range(0, 20):
        # grab the memory statistics
        p = psutil.Process(os.getpid())
        vms = p.memory_info().vms / (1024.0 * 1024.0)
        threads = p.num_threads()
        print(f'Iteration {iteration:>2d} - Memory usage (VMS): {vms:>6.1f} Mb; # threads: {threads:>2d}')
        # simple write to a test file
        async with aiofiles.open('test.txt', mode='w') as f:
            await f.write('hello\n')
        # a wait, just for the sake of it
        await asyncio.sleep(1)

loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(printMemory())
finally:
    loop.close()
The output shows some worrisome numbers (run with Python 3.6.5 on Debian 8.10 (Jessie)):
Iteration 0 - Memory usage (VMS): 92.5 Mb; # threads: 1
Iteration 1 - Memory usage (VMS): 308.5 Mb; # threads: 4
Iteration 2 - Memory usage (VMS): 524.6 Mb; # threads: 7
Iteration 3 - Memory usage (VMS): 740.6 Mb; # threads: 10
Iteration 4 - Memory usage (VMS): 956.6 Mb; # threads: 13
Iteration 5 - Memory usage (VMS): 1172.6 Mb; # threads: 16
Iteration 6 - Memory usage (VMS): 1388.7 Mb; # threads: 19
Iteration 7 - Memory usage (VMS): 1604.7 Mb; # threads: 22
Iteration 8 - Memory usage (VMS): 1820.8 Mb; # threads: 25
Iteration 9 - Memory usage (VMS): 2036.8 Mb; # threads: 28
Iteration 10 - Memory usage (VMS): 2252.8 Mb; # threads: 31
Iteration 11 - Memory usage (VMS): 2468.8 Mb; # threads: 34
Iteration 12 - Memory usage (VMS): 2684.8 Mb; # threads: 37
Iteration 13 - Memory usage (VMS): 2900.8 Mb; # threads: 40
Iteration 14 - Memory usage (VMS): 2972.8 Mb; # threads: 41
Iteration 15 - Memory usage (VMS): 2972.8 Mb; # threads: 41
Iteration 16 - Memory usage (VMS): 2972.8 Mb; # threads: 41
Iteration 17 - Memory usage (VMS): 2972.8 Mb; # threads: 41
Iteration 18 - Memory usage (VMS): 2972.8 Mb; # threads: 41
Iteration 19 - Memory usage (VMS): 2972.8 Mb; # threads: 41
Any idea where this could come from?
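One mitigation to try, assuming the growth comes from executors lazily spawning worker threads: install a single small, shared default executor on the loop, so every delegated call reuses the same bounded pool instead of growing it:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
# all run_in_executor(None, ...) calls on this loop now share one bounded pool
loop.set_default_executor(ThreadPoolExecutor(max_workers=2))
```

If the numbers still grow with this in place, the leak is elsewhere (e.g. objects held per open call), which would be worth reporting separately.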
I've been comparing how aiofiles does things compared to other libraries like aiohttp.
aiofiles runs blocking functions in thread pools using loop.run_in_executor, while aiohttp mostly sets file descriptors to non-blocking mode and then uses loop.add_writer/loop.add_reader as necessary.
Why did aiofiles choose the former approach?
Right now the tempfile module included in the Python stdlib operates on classic file objects.
Looking at the code, though, it would be pretty simple to adapt it to work with aiofiles.
I noticed aiofiles' unit tests use real file IO.
In my project that uses aiofiles, I think it might make more sense to unit test with faked file IO, using something like pyfakefs.
The setup code I wrote for this feels a little clunky, but it seems to work:
@pytest.fixture
def fs(request):
    """Fake filesystem."""
    patcher = Patcher()
    patcher.setUp()
    patcher._stubs.SmartSet(threadpool, '_sync_open', patcher.fake_open)
    request.addfinalizer(patcher.tearDown)
    return patcher.fs

@threadpool.wrap.register(FakeFileWrapper)
def _(file, *, loop=None, executor=None):
    return AsyncBufferedIOBase(file, loop=loop, executor=executor)
So there are two problems that this code solves:

1. Patcher monkey-patches builtins.open, but because aiofiles already made a reference to open, this doesn't get replaced (contrast with referencing builtins.open instead of _sync_open).
2. FakeFileWrapper from pyfakefs, I think, probably should be a subclass or registered class of io.FileIO or similar.

I am unsure if I should either:
What do you think?
https://docs.python.org/3/library/asyncio-stream.html#asyncio.StreamWriter, for those more used to x.write(b"foo") and await x.drain(), so we can do things like:
import asyncio
import aiostream
from aiofiles import pathlib

async def export_as_js(p: pathlib.Path) -> None:
    t = await p.read_text()
    await asyncio.gather(
        p.with_suffix('.js').write_text(f'export default `{t}`;'),
        p.unlink(),
    )

async def amain():
    await aiostream.map(pathlib.Path('.').glob('*/**.html'), export_as_js)

def main():
    asyncio.run(amain())
I have this non-async method that reads rows:

def lastname():
    with open('surnames.csv') as csvfile:
        readCSV = csv.reader(csvfile, delimiter=',')
        for row in readCSV:
            print(row[0], row[1])
and I have this async one:

async def firstname():
    async with aiofiles.open('firstnames.csv') as csvfile:
        readCSV = await csvfile.read()
        return readCSV
How can I read a row in the async method?
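One approach, assuming it is fine to read the whole file first: await the read, then hand the contents to the regular csv module via io.StringIO. This sketch uses a stdlib executor read so it is self-contained; with aiofiles, the marked line would be contents = await csvfile.read() inside async with aiofiles.open(...) (read_rows is my name):

```python
import asyncio
import csv
import io

async def read_rows(path):
    loop = asyncio.get_event_loop()

    def _read():
        with open(path, newline='') as f:
            return f.read()

    # with aiofiles this would be: contents = await csvfile.read()
    contents = await loop.run_in_executor(None, _read)
    return list(csv.reader(io.StringIO(contents), delimiter=','))
```

For very large files, iterating line by line (async for line in csvfile) and feeding each line to csv.reader would avoid holding the whole file in memory.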
Hi,
Thanks for the good library.
Can we check path existence asynchronously with this library? I mean something like this:
path_existence = await aiofiles.path.exists(some_path)
If not, I request this feature. Thanks again.
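Pending such a feature, a small helper in the same delegate-to-a-thread style the library uses (aexists is my name, not an aiofiles API):

```python
import asyncio
import os.path
from functools import partial

async def aexists(path, loop=None, executor=None):
    # run the blocking os.path.exists call in a worker thread
    loop = loop or asyncio.get_event_loop()
    return await loop.run_in_executor(executor, partial(os.path.exists, path))
```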
import aiofiles
print(aiofiles.__version__)

async with aiofiles.open('wctDA2.tmp', mode='r') as f:
    contents = await f.read()
print(contents)
Output:
File "a.py", line 4
async with aiofiles.open('wctDA2.tmp', mode='r') as f:
^
SyntaxError: invalid syntax
Details:
I'm just poking at this; I came to it through a reference in MagicStack/uvloop#1 (comment).
Would you be interested in adding support to the project for more calls from the os module, like os.stat?
e.g.
e.g.
os.close
os.fstat
os.read
os.write
os.unlink
os.listdir
os.path.exists
os.rmdir
In a small bit of test code I trigger (I think) the changed behavior of __aiter__ between Python 3.5 and 3.6. The code is below, with the error. This is my very first use of aiofiles (and almost my first use of asyncio), so I'm happy to accept that I could be doing something wrong.
async def query_feed(src, queue):
    print('Starting read from {}'.format(src))
    # The file handle itself can be non-blocking and asynchronous
    async with aiofiles.open(src) as fh:
        async for line in fh:
            await asyncio.sleep(2)
            await queue.put(line.strip())
            # print() is synchronous and completes atomically w/o control switch
            print("From feed:\t", line.strip())
DeprecationWarning: 'AsyncTextIOWrapper' implements legacy __aiter__ protocol; __aiter__ should return an asynchronous iterator, not awaitable