python-thread / thread
A Python threading library extension ⭐️ Star to support our work!
Home Page: https://thread.ngjx.org
License: BSD 3-Clause "New" or "Revised" License
Your issue may already be reported!
Please check out our active issues before creating one.
To extend type safety to developers' projects, non-private types should be exposed through the main import.
Types can be accessed by:
import thread
thread.types.ThreadStatus
The current main branch code is ready for a v0.1.2 minor release, and the docs will be updated shortly before the release.
When used in a Flask application run with threading (e.g. under gunicorn), the graceful exit kills the application's worker threads. This can cause unintended behavior, such as the WSGI server not shutting down properly.
A way to disable graceful exit, or a way to exit gracefully without affecting WSGI worker threads.
At present, a simple parallel processing run takes roughly 26-30s, while the synchronous equivalent takes barely half a second.
This is because of how .kill() is implemented. Hooking into the global and local trace functions tanks performance.
Using the ctypes library to raise SystemExit from within the sub-threads.
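The ctypes approach mentioned above can be sketched as follows. This is a minimal illustration, assuming CPython (it relies on the C-API function PyThreadState_SetAsyncExc); the names raise_in_thread and busy_worker are hypothetical and are not part of the library.

```python
import ctypes
import threading
import time


def raise_in_thread(thread: threading.Thread, exc_type: type = SystemExit) -> bool:
    """Ask CPython to raise exc_type asynchronously inside a running thread."""
    if thread.ident is None:  # the thread was never started
        return False
    affected = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(thread.ident), ctypes.py_object(exc_type)
    )
    if affected > 1:
        # More than one thread state was modified: revert and report failure.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_ulong(thread.ident), None)
        raise RuntimeError("PyThreadState_SetAsyncExc affected multiple threads")
    return affected == 1


def busy_worker() -> None:
    # The exception is only delivered while the thread runs Python bytecode,
    # so this loop keeps returning control to the interpreter.
    while True:
        time.sleep(0.01)


if __name__ == "__main__":
    worker = threading.Thread(target=busy_worker, daemon=True)
    worker.start()
    raise_in_thread(worker)
    worker.join(timeout=5)
    print("worker alive:", worker.is_alive())
```

Unlike a trace hook, this adds no per-line overhead to the worker, but the exception cannot interrupt a thread that is blocked inside a long-running C call.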
The README.md file usage section still states "Docs soon!!"
Type hinting should not mark a function with no parameters as incompatible with Thread.target
Type hinting marks parameter-less functions as incompatible
No.
__init__.py
pyproject.toml
Since Thread() has a decorator, parallelprocessing should also have a decorator that allows users to mark a function as a data function.
An example usage of the decorator:
@thread.parallelprocess
def myFunc(dataEntry) -> Any: ...
data = myFunc([1, 2, 3, 4, 5, ...])
Similarly to @thread.threaded, @thread.parallelprocess should also support decorator parameters.
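A rough sketch of what such a decorator could look like, usable both bare and with parameters. It uses concurrent.futures.ThreadPoolExecutor as a stand-in and is not the library's implementation; the max_workers parameter is an assumption made for illustration.

```python
import functools
from concurrent.futures import ThreadPoolExecutor


def parallelprocess(func=None, *, max_workers: int = 4):
    """Hypothetical decorator: map the wrapped function over a dataset."""

    def decorate(target):
        @functools.wraps(target)
        def wrapper(dataset, *args, **kwargs):
            # Each data entry is passed as the first argument of the target.
            with ThreadPoolExecutor(max_workers=max_workers) as pool:
                return list(pool.map(lambda entry: target(entry, *args, **kwargs), dataset))

        return wrapper

    # Support both @parallelprocess and @parallelprocess(max_workers=...)
    return decorate(func) if func is not None else decorate


@parallelprocess
def square(entry: int) -> int:
    return entry * entry


@parallelprocess(max_workers=2)
def scale(entry: int, factor: int) -> int:
    return entry * factor
```

Calling square([1, 2, 3]) returns the results in dataset order, because pool.map preserves input order.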
As laid out in the docs, initializing Thread() with ignore_errors = [Exception] should have ignored all exceptions
Currently, initializing Thread() with ignore_errors = [Exception] does not ignore all exceptions
No
Code Snippet:
from thread import Thread

def myFunction(x=False) -> str:
    raise RuntimeError()


newThread = Thread(
    target = myFunction,
    ignore_errors = [Exception]
)
newThread.start()
newThread.join()
Traceback:
Traceback (most recent call last):
  File "/main.py", line 12, in <module>
    newThread.join()
  File "/thread/thread.py", line 208, in join
    self._handle_exceptions()
  File "/thread/thread.py", line 143, in _handle_exceptions
    raise e
  File "/thread/thread.py", line 108, in wrapper
    self.returned_value = target(*args, **kwargs)
                          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/main.py", line 4, in myFunction
    raise RuntimeError()
RuntimeError
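The report does not say why the exception escapes. One possible cause, stated here purely as an assumption, is that the raised exception is matched by exact type rather than by subclass. The two hypothetical helpers below illustrate the difference; neither is taken from the library.

```python
from typing import Iterable, Type


def is_ignored_exact(error: BaseException, ignore_errors: Iterable[Type[BaseException]]) -> bool:
    # Exact type membership: RuntimeError is NOT matched by [Exception].
    return type(error) in list(ignore_errors)


def is_ignored_subclass(error: BaseException, ignore_errors: Iterable[Type[BaseException]]) -> bool:
    # isinstance respects inheritance: RuntimeError IS matched by [Exception].
    return isinstance(error, tuple(ignore_errors))
```

With ignore_errors = [Exception], only the isinstance-based check treats RuntimeError as ignored, which is the behavior the docs describe.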
At present, decorated functions do not have type hints.
Take this code, for example:
import thread

@thread.threaded
def myFunc(x: str) -> int: ...

myFunc(4) # This is not properly type hinted as "x: str", but "..."
Decorated functions should be type hinted.
Extensive testing is essential to ensure stability
In development, people may not want to explicitly wrap a function every single time, and would prefer not to maintain a private function with a public function wrapping it.
def _doWork(*args, **kwargs): ...

def doWork(*args, **kwargs):
    job = Thread(_doWork, args = args, kwargs = kwargs)
    return job
An example of the feature:
@thread.threaded
def doWork(*args, **kwargs): ...

# OR
@thread.threaded(args = ['defaultArg'], arg_mode = 'join | replace')
def doWork(*args, **kwargs): ...
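A sketch of how the args and arg_mode parameters might behave. The semantics are assumptions, since the request does not define them: here 'join' prepends the default arguments to the call arguments, and 'replace' uses the call arguments when any are given and falls back to the defaults otherwise. It is built on threading.Thread rather than the library's Thread class.

```python
import functools
import threading
from typing import Any, Sequence


def threaded(func=None, *, args: Sequence[Any] = (), arg_mode: str = "join"):
    """Hypothetical decorator supporting bare and parameterized use."""
    if arg_mode not in ("join", "replace"):
        raise ValueError(f"unknown arg_mode: {arg_mode!r}")

    def decorate(target):
        @functools.wraps(target)
        def wrapper(*call_args, **call_kwargs):
            if arg_mode == "join":
                # Assumed meaning: defaults first, then the caller's arguments.
                final_args = (*args, *call_args)
            else:
                # Assumed meaning: caller's arguments override the defaults.
                final_args = call_args or tuple(args)
            worker = threading.Thread(target=target, args=final_args, kwargs=call_kwargs)
            worker.start()
            return worker

        return wrapper

    return decorate(func) if func is not None else decorate
```

The caller receives the started thread and can call .join() on it when the result is needed.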
The current method for parsing kwargs is not ideal. Misspelt options or arguments could lead to unintended kwargs being passed to the function.
Similar to how args are processed, a --kwarg option could be used and then processed within the command.
$ thread process ... --kwarg a1:a2 --kwarg a3:a4
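A small sketch of parsing repeated --kwarg KEY:VALUE options into a dictionary. It uses the standard library's argparse purely as a stand-in, since the CLI framework the project uses is not stated here; parse_kwargs is a hypothetical helper.

```python
import argparse
from typing import Dict, List


def parse_kwargs(pairs: List[str]) -> Dict[str, str]:
    """Turn ['a1:a2', 'a3:a4'] into {'a1': 'a2', 'a3': 'a4'}."""
    kwargs: Dict[str, str] = {}
    for pair in pairs:
        key, separator, value = pair.partition(":")
        if not separator or not key:
            raise ValueError(f"expected KEY:VALUE, got {pair!r}")
        kwargs[key] = value
    return kwargs


parser = argparse.ArgumentParser(prog="thread-sketch")
# action="append" collects every occurrence of --kwarg into a list.
parser.add_argument("--kwarg", action="append", default=[], metavar="KEY:VALUE")

namespace = parser.parse_args(["--kwarg", "a1:a2", "--kwarg", "a3:a4"])
parsed = parse_kwargs(namespace.kwarg)
```

Because only values given through --kwarg end up in the dictionary, a misspelt option is rejected by the parser instead of being silently forwarded to the target function.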
No. There are some specific use cases where a thread needs to be killed without utilizing daemon.
Add a .kill() method to the Thread class.
Currently, multiple new lists are created from the dataset and are passed into each child thread.
This is not memory efficient and could limit performance in large datasets.
A more memory-efficient solution using generators or indexes.
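One way to read the index-based idea: instead of slicing the dataset into new lists, hand each child thread a (start, stop) pair and let it read from the shared dataset. The chunk_bounds generator below is a hypothetical sketch, not the library's code.

```python
from typing import Iterator, Tuple


def chunk_bounds(length: int, chunks: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) index pairs that split range(length) into near-equal parts."""
    if chunks <= 0:
        raise ValueError("chunks must be positive")
    base, extra = divmod(length, chunks)
    start = 0
    for index in range(chunks):
        # The first `extra` chunks take one additional entry each.
        stop = start + base + (1 if index < extra else 0)
        if stop > start:  # skip empty chunks when length < chunks
            yield start, stop
        start = stop


def process_range(dataset, start: int, stop: int, function):
    # Each worker indexes into the shared dataset; no per-chunk copy is made.
    return [function(dataset[i]) for i in range(start, stop)]
```

Only two integers per worker are allocated up front, regardless of how large the dataset is.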
"See our active issues" links in issue templates link to the wrong repository
I understand that this project uses Poetry to manage and handle project dependencies, though I'm unsure how to build the project. I understand there is documentation for installing it manually. These are the steps I went through.
pipx install poetry
poetry install
poetry run python src/thread
However, when I run the project it returns this error:
Traceback (most recent call last):
  File "C:\Users\shirotohu\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\shirotohu\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Mine\Programming\thread\src\thread\__main__.py", line 3, in <module>
    from .cli import app
ImportError: attempted relative import with no known parent package
poetry run python
and importing sys
D:\\Mine\\Programming\\thread\\src
At present, the Parallel Processing class uses numpy to split datasets into chunks.
Numpy is only used for calculating the chunks, which is unlikely to be the best solution for thread: the numpy library is huge and impractical for this use case.
This got me thinking: would it be more practical to drop numpy for a pure Python alternative, or stick with numpy's C implementation?
To figure this out, I profiled a pure Python solution against the numpy solution for a dataset of 10^6 entries:
profilingNP.py
import time
import numpy

def profile(func):
    def wrapped(*args, **kwargs):
        iteration = 100
        total_time = 0
        for _ in range(iteration):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            total_time += (time.perf_counter() - start)
        avg_time = round(total_time / iteration, 10)
        print(f'{func.__name__} took on average of {avg_time}s for {iteration} iterations')
        return result, avg_time
    return wrapped

dataset = list(range(10**6))
threads = 8

# numpy solution
@profile
def np():
    chunks = numpy.array_split(dataset, threads)
    return [ chunk.tolist() for chunk in chunks ]

# pure python solution
@profile
def pure():
    length = len(dataset)
    chunk_count = length // threads
    overflow = length % threads
    i = 0
    final = []
    while i < length:
        chunk_length = chunk_count + int(overflow > 0)
        b = i + chunk_length
        final.append(dataset[i:b])
        overflow -= 1
        i = b
    return final

if __name__ == '__main__':
    npResult, npTime = np()
    pureResult, pureTime = pure()
    print(f'Pure python was {-1 * round(((pureTime - npTime) / npTime) * 100, 10)}% faster than the numpy solution')
    assert npResult == pureResult, 'There was an algorithm error'
Parallel Processing's function signature should accept one or more keyword arguments to the function argument.
import thread
def my_func(a: int, b: int) -> int: ...
thread.ParallelProcessing(function = my_func, dataset = [], args = (1,))
It only accepts functions that take a single keyword argument
No
Update function signature
A dataset of length 0 should raise an error when initialized, instead of raising a ZeroDivisionError when the .start() method is invoked.
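A minimal sketch of validating the dataset at construction time, so the failure surfaces where the mistake was made. EagerProcessor is a hypothetical class used only for illustration, and the choice of ValueError is an assumption.

```python
from typing import Any, Callable, Sequence


class EagerProcessor:
    """Hypothetical stand-in that rejects empty datasets in __init__."""

    def __init__(self, function: Callable[[Any], Any], dataset: Sequence[Any]) -> None:
        if len(dataset) == 0:
            # Fail early with a descriptive message instead of a later
            # ZeroDivisionError during chunk calculation.
            raise ValueError("dataset cannot be empty")
        self.function = function
        self.dataset = dataset
```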
Data processing in Python is usually done with pandas or other libraries. Datasets created with these libraries are not fully compatible with ParallelProcessing.
We will not support any one library explicitly, as it would increase the maintenance burden of keeping up to date with each of their best practices and breaking changes.
It makes more sense to provide a way to customize how data and length are retrieved with optional arguments.
from thread import ParallelProcessing
ParallelProcessing(
    function=lambda x: x,
    dataset=[1, 2],
    _get_value=lambda dataset, index: dataset[index],
    _length=2
)
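A sketch of how optional accessors like these might be resolved, falling back to dataset[index] and len(dataset) when the dataset supports them. The helper resolve_accessors and the Countdown class are hypothetical and only illustrate the idea.

```python
from typing import Any, Callable, Optional, Tuple


def resolve_accessors(
    dataset: Any,
    get_value: Optional[Callable[[Any, int], Any]] = None,
    length: Optional[int] = None,
) -> Tuple[Callable[[Any, int], Any], int]:
    """Return a (getter, length) pair, using the dataset's own protocols as defaults."""
    if length is None:
        if not hasattr(dataset, "__len__"):
            raise TypeError("dataset has no __len__; pass length explicitly")
        length = len(dataset)
    if get_value is None:
        if not hasattr(dataset, "__getitem__"):
            raise TypeError("dataset has no __getitem__; pass get_value explicitly")
        get_value = lambda data, index: data[index]
    return get_value, length


class Countdown:
    """A dataset that exposes neither __getitem__ nor __len__."""

    def __init__(self, start: int) -> None:
        self.start = start

    def item(self, index: int) -> int:
        return self.start - index
```

A plain list needs no extra arguments, while an object like Countdown works once both a getter and a length are supplied.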
dataset[index] (__getitem__) and len(dataset) (__len__)
dataset[index] (__getitem__)
len(dataset) (__len__)
Documentation needed for CLI
Should core and CLI be separated into 2 packages, with CLI requiring core as a dependency?
The CLI is undoubtedly not as widely used and adds bloat to the core library.
This will add more libraries that developers have to watch for version releases and CVEs.
Furthermore, the CLI is not stable: due to my limited knowledge, I've had to rely on eval(), which is an attack vector for arbitrary code execution. Continuing to include it as a primary dependency of thread may introduce unintended security vulnerabilities from a not-as-utilized feature.
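For context on the eval() concern, the standard library's ast.literal_eval is a commonly used narrower alternative when only literal values (numbers, strings, lists, dicts and similar) need to be parsed. The helper below is a hypothetical sketch; whether literals alone cover the CLI's needs is an assumption.

```python
import ast
from typing import Any


def parse_cli_value(raw: str) -> Any:
    """Parse a literal from the command line, falling back to the raw string."""
    try:
        # literal_eval accepts only Python literals and never calls functions.
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw
```

An input such as "__import__('os')" is returned unchanged as a string here, rather than being executed as it would be under eval().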