thread's People

Contributors

caffeine-addictt, robvdl, shirotohu

thread's Issues

Expose internal types

Enhancement Request

Your issue may already be reported!
Please check out our active issues before creating one.

Is Your Enhancement Request Related to an Issue?

To allow the type safety to extend to developers' projects, non-private types should be exposed to the main import

Describe the Solution You'd Like

Types can be accessed via:

import thread
thread.types.ThreadStatus

Option to disable graceful exiting

Feature Request

Is Your Feature Request Related to an Issue?

When used in a Flask application run with threading (i.e. with gunicorn), the graceful exiting kills the application's worker threads. This could cause unintended behavior, such as the WSGI server not shutting down properly.

Describe the Solution You'd Like

A way to disable graceful exit or a way to gracefully exit without affecting WSGI worker threads

[CI] Improvements

Checklist

  • Adding linting CI
  • Require linting to pass for a PR merge
  • Adding labeler CI

Improving runtime performance

Enhancement Request

Is Your Enhancement Request Related to an Issue?

At present, a simple parallel processing run takes roughly 26-30s, while the synchronous equivalent takes barely half a second.

Why

This is because of how .kill() is implemented. Hooking into the global and local trace functions tanks performance.

Describe the Solution You'd Like

Using the ctypes library to raise SystemExit from within the sub threads.
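
A minimal sketch of what this could look like, assuming CPython: `ctypes.pythonapi.PyThreadState_SetAsyncExc` schedules an exception in another thread, identified by its thread id. The helper name `async_raise` and the `spin` worker are hypothetical, and this is not necessarily how thread implements .kill(). The exception is only delivered once the target thread executes Python bytecode again, so a thread blocked inside a long C call will not stop immediately.

```python
import ctypes
import threading
import time


def async_raise(thread: threading.Thread, exc_type: type = SystemExit) -> bool:
    """Schedule exc_type to be raised inside `thread` (CPython only)."""
    if thread.ident is None:
        # The thread was never started, so there is nothing to interrupt.
        return False

    affected = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(thread.ident), ctypes.py_object(exc_type)
    )
    if affected > 1:
        # More than one thread state was modified: undo the request and bail out.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_ulong(thread.ident), None)
        raise SystemError('PyThreadState_SetAsyncExc affected multiple threads')
    return affected == 1


def spin() -> None:
    # Hypothetical worker that never returns on its own.
    while True:
        time.sleep(0.01)


worker = threading.Thread(target=spin, daemon=True)
worker.start()
killed = async_raise(worker)
worker.join(timeout=5)
```

Unlike a trace hook, nothing runs on every line of the target function, which is where the speed-up would come from.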

After switching to ctypes

[Image: timing after switching to ctypes]

v2.0.0

Planned Features and Changes for v2.0.0

New

Changes

  • BREAKING: Change naming of "Parallel" to "Concurrent"

[Bug] Thread type annotation does not support target with no parameters

Bug report

Expected Behavior

Type hinting should not mark a function with no parameters as incompatible with Thread.target

Current Behavior

Type hinting marks parameter-less functions as incompatible

Is this a regression?

No.

Possible Solution

Steps to Reproduce (for bugs)

  1. write a function that takes no arguments
  2. instantiate a Thread()
  3. type hinting acts up

Context

Your Environment

  • Version used: 0.1.1
  • Python version: 3.11.6
  • Operating System and version (desktop or mobile): Linux

v1.0.0 Release

Version Release Checklist

  • I have updated the README.md file
  • I have ensured that all tests pass
  • I have incremented the version number in __init__.py
  • I have incremented the version number in pyproject.toml

ParallelProcessing decorators

Feature Request

Is Your Feature Request Related to an Issue?

Since Thread() has a decorator, ParallelProcessing should also have a decorator that allows users to mark a function as a data-processing function.

Describe the Solution You'd Like

An example usage of the decorator

@thread.parallelprocess
def myFunc(dataEntry) -> Any: ...

data = myFunc([1, 2, 3, 4, 5, ...])

Similarly to @thread.threaded, @thread.parallelprocess should also support decorator parameters.
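
To illustrate the requested behavior, here is a rough stand-in built on the standard library's `ThreadPoolExecutor` rather than on thread's own ParallelProcessing class. The `max_threads` parameter is a hypothetical example of a decorator parameter; the real decorator may accept different options.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import wraps


def parallelprocess(func=None, *, max_threads=8):
    """Sketch: turn a per-entry function into one that processes a whole dataset."""

    def decorate(fn):
        @wraps(fn)
        def wrapper(dataset):
            # Each entry is handled in a worker thread; results keep dataset order.
            with ThreadPoolExecutor(max_workers=max_threads) as pool:
                return list(pool.map(fn, dataset))

        return wrapper

    # Support both @parallelprocess and @parallelprocess(max_threads=...).
    if func is not None:
        return decorate(func)
    return decorate


@parallelprocess
def square(entry):
    return entry * entry


@parallelprocess(max_threads=2)
def negate(entry):
    return -entry


squares = square([1, 2, 3, 4, 5])
negated = negate([1, 2, 3])
```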

[Bug] Ignore_Errors does not support inheritance

Bug report

Expected Behavior

As laid out in the docs, initializing Thread() with ignore_errors = [Exception] should have ignored all exceptions

Current Behavior

Currently, initializing Thread() with ignore_errors = [Exception] does not ignore all exceptions

Is this a regression?

No

Possible Solution

Steps to Reproduce (for bugs)

  1. make a function which raises an exception
  2. initialize a Thread() with ignore_errors = [Exception]
  3. start the thread
  4. unexpected behavior occurs

Context

Code Snippet:

from thread import Thread

def myFunction(x=False) -> str:
  raise RuntimeError()

newThread = Thread(
  target = myFunction,
  ignore_errors = [Exception]
)

newThread.start()
newThread.join()

Traceback:

Traceback (most recent call last):
  File "/main.py", line 12, in <module>
    newThread.join()
  File "/thread/thread.py", line 208, in join
    self._handle_exceptions()
  File "/thread/thread.py", line 143, in _handle_exceptions
    raise e
  File "/thread/thread.py", line 108, in wrapper
    self.returned_value = target(*args, **kwargs)
                          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/main.py", line 4, in myFunction
    raise RuntimeError()
RuntimeError
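
One plausible cause, assumed here rather than confirmed by the report, is that the raised exception's exact type is compared against ignore_errors instead of being checked with `isinstance`. The two helper functions below are hypothetical and only contrast the two checks:

```python
def is_ignored_exact(exc, ignore_errors):
    # Matches only when the exception's exact class is listed.
    return type(exc) in ignore_errors


def is_ignored_with_inheritance(exc, ignore_errors):
    # Matches subclasses too, so [Exception] covers RuntimeError.
    return isinstance(exc, tuple(ignore_errors))


error = RuntimeError()
exact = is_ignored_exact(error, [Exception])
inherited = is_ignored_with_inheritance(error, [Exception])
```

With the exact-type check, `RuntimeError` is not ignored even though it derives from `Exception`, which matches the behavior in the traceback above.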

Your Environment

  • Version used: 0.0.1
  • Operating System and version (desktop or mobile): Linux
  • Python version: 3.11.6

Type Safety on decorators

Enhancement Request

Is Your Enhancement Request Related to an Issue?

At present, decorated functions lose their type hints.
Take this code, for example:

import thread

@thread.threaded
def myFunc(x: str) -> int: ...

myFunc(4) # This is not properly type hinted as "x: str", but "..."

Describe the Solution You'd Like

Decorated functions to be type hinted

Tests for decorators

Feature Request

Is Your Feature Request Related to an Issue?

Extensive testing is essential to ensure stability

Threading decorators

Feature Request

Is Your Feature Request Related to an Issue?

In development, people may not want to explicitly wrap a function every single time, and would prefer not to maintain a private function that a public function wraps.

def _doWork(*args, **kwargs): ...

def doWork(*args, **kwargs):
  job = Thread(_doWork, args = args, kwargs = kwargs)
  return job

Describe the Solution You'd Like

An example of the feature

@thread.threaded
def doWork(*args, **kwargs): ...

#OR

@thread.threaded(args = ['defaultArg'], arg_mode = 'join | replace')
def doWork(*args, **kwargs): ...

CLI process command kwargs

Enhancement Request

Is Your Enhancement Request Related to an Issue?

The current method for parsing kwargs is not ideal. Misspelt options or arguments could lead to unintended kwargs being passed to the function.

Describe the Solution You'd Like

Similar to how args are processed, a --kwarg option could be utilised, then processed within the command.

$ thread process ... --kwarg a1:a2  --kwarg a3:a4
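
The parsing step could look roughly like this. The sketch uses the standard library's `argparse` as a stand-in for whatever CLI framework thread actually uses, and the `parse_kwargs` helper is hypothetical. Values are kept as strings here; any type conversion is left out.

```python
import argparse


def parse_kwargs(pairs):
    """Turn ['a1:a2', 'a3:a4'] into {'a1': 'a2', 'a3': 'a4'}."""
    kwargs = {}
    for pair in pairs:
        key, separator, value = pair.partition(':')
        if not separator or not key:
            # Reject malformed input instead of silently passing it on.
            raise ValueError(f'expected KEY:VALUE, got {pair!r}')
        kwargs[key] = value
    return kwargs


parser = argparse.ArgumentParser(prog='thread-process-sketch')
# action='append' collects every occurrence of --kwarg into a list.
parser.add_argument('--kwarg', action='append', default=[], metavar='KEY:VALUE')

namespace = parser.parse_args(['--kwarg', 'a1:a2', '--kwarg', 'a3:a4'])
parsed = parse_kwargs(namespace.kwarg)
```

Because only explicit --kwarg options are collected, a misspelt option would be rejected by the parser rather than ending up in the function's kwargs.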

Ability to kill threads

Feature Request

Is Your Feature Request Related to an Issue?

No. There are some specific use cases where a thread needs to be killed without relying on daemon threads.

Describe the Solution You'd Like

A .kill() method on the Thread class

[Enhancement] Optimise memory performance in Parallel Processing

Enhancement Request

Is Your Enhancement Request Related to an Issue?

Currently, multiple new lists are created from the dataset and are passed into each child thread.
This is not memory efficient and could limit performance in large datasets.

Describe the Solution You'd Like

A more memory-efficient solution using generators or indexes.
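
As one illustration of the index-based idea, each child thread could receive a `range` of indexes into the shared dataset instead of its own copied list. The function name `chunk_ranges` is hypothetical, and this is a sketch, not the library's implementation.

```python
def chunk_ranges(length, max_threads):
    """Split range(length) into at most max_threads contiguous index ranges."""
    if length <= 0 or max_threads <= 0:
        raise ValueError('length and max_threads must be positive')

    count = min(length, max_threads)  # never create empty chunks
    base, overflow = divmod(length, count)

    ranges = []
    start = 0
    for index in range(count):
        # The first `overflow` chunks take one extra entry each.
        size = base + (1 if index < overflow else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


dataset = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
ranges = chunk_ranges(len(dataset), 3)

# A worker reads entries lazily through its range; no sub-list is created.
first_chunk = [dataset[i] for i in ranges[0]]
```

A `range` object stores only its start, stop, and step, so the memory cost per chunk stays constant regardless of dataset size.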

Issue templates have faulty links

Documentation Issue Report

Describe the Bug

"See our active issues" links in issue templates link to the wrong repository

Steps to Reproduce

  1. Create a new documentation issue report
  2. The first link for "active issues" links to somewhere else

How to build project for local development?

How to build project for local development?

I understand that this project uses Poetry to manage and handle project dependencies, though I'm unsure how to build the project. I understand there is documentation for installing it manually. These are the steps I went through.

pipx install poetry
poetry install
poetry run python src/thread

However, when I run the project it returns this error:

Traceback (most recent call last):
  File "C:\Users\shirotohu\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\shirotohu\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Mine\Programming\thread\src\thread\__main__.py", line 3, in <module>
    from .cli import app
ImportError: attempted relative import with no known parent package
  • I am running python version 3.9.2
  • I have looked at sys.path using poetry run python and importing sys
    • Only thing to note with this output is D:\\Mine\\Programming\\thread\\src

Expected Output

[Image: expected output]
If I'm not mistaken, this should be the expected output.

Optimizing parallelprocessing

Enhancement Request

Is Your Enhancement Request Related to an Issue?

At present, the Parallel Processing class uses numpy to split datasets into chunks, where

  • (Number of chunks == throttled max threads) <= specified max threads
  • Chunk size == ( (dataset size // throttled max threads) + (0 or 1) )

Numpy is only used for calculating the chunks, which is unlikely to be the best solution for thread: the numpy library is huge, making it an impractical dependency for this use case.

This got me thinking: would it be more practical to drop numpy for a pure Python alternative, or to stick with numpy's C implementation?

Additional Context

To figure this out, I profiled a pure Python solution against the numpy solution for a dataset of 10^6 entries:

[Image: profiling result]

profilingNP.py

import time
import numpy

def profile(func):
  def wrapped(*args, **kwargs):
    iteration = 100
    total_time = 0

    for _ in range(iteration):
      start = time.perf_counter()
      result = func(*args, **kwargs)
      total_time += (time.perf_counter() - start)

    avg_time = round(total_time / iteration, 10)
    print(f'{func.__name__} took on average of {avg_time}s for {iteration} iterations')

    return result, avg_time
  return wrapped


dataset = list(range(10**6))
threads = 8

# numpy solution
@profile
def np():
  chunks = numpy.array_split(dataset, threads)
  return [ chunk.tolist() for chunk in chunks ]

@profile
def pure():
  length = len(dataset)
  chunk_count = length // threads
  overflow = length % threads

  i = 0
  final = []
  while i < length:
    chunk_length = chunk_count + int(overflow > 0)
    b = i + chunk_length

    final.append(dataset[i:b])
    overflow -= 1
    i = b

  return final


if __name__ == '__main__':
  npResult, npTime = np()
  pureResult, pureTime = pure()

  print(f'Pure python was {-1 * round(((pureTime - npTime) / npTime) * 100, 10)}% faster than the numpy solution')

  assert npResult == pureResult, 'There was an algorithm error'

[Bug] Parallel Processing function signature does not support multi keyword arguments

Bug report

Expected Behavior

Parallel Processing's function signature should accept one or more keyword arguments to the function argument.

import thread

def my_func(a: int, b: int) -> int: ...
thread.ParallelProcessing(function = my_func, dataset = [], args=(1,))

Current Behavior

It only accepts functions that take only one keyword argument

Is this a regression?

No

Possible Solution

Update function signature

Your Environment

  • Version used: main branch
  • Python version: 3.11.7
  • Operating System and version (desktop or mobile): Archlinux

[Bug] Dataset of length 0 is not validated

Bug report

Expected Behavior

A dataset of length 0 should raise an error when the object is initialized, instead of raising a ZeroDivisionError when .start() is invoked

Steps to Reproduce (for bugs)

  1. Initialize a ParallelProcessing object with an empty dataset
  2. Notice how no error is raised
  3. Invoke the .start() method
  4. Notice a ZeroDivisionError is raised
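
The expected behavior could be sketched as an early check in the constructor. The class below is a hypothetical stand-in, not the real ParallelProcessing class, and the choice of `ValueError` is an assumption:

```python
class ParallelProcessingSketch:
    """Hypothetical stand-in that validates the dataset up front."""

    def __init__(self, function, dataset):
        if len(dataset) == 0:
            # Fail at construction time, before any chunk arithmetic runs.
            raise ValueError('dataset cannot be empty')
        self.function = function
        self.dataset = dataset


try:
    ParallelProcessingSketch(function=lambda x: x, dataset=[])
    error_message = None
except ValueError as error:
    error_message = str(error)
```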

Your Environment

  • Version used: v1.0.1

[Feature] ParallelProcessing Support for dataframes

Feature Request

Is Your Feature Request Related to an Issue?

Data processing in Python is usually done with pandas or other libraries. Datasets created with these libraries are not fully compatible with ParallelProcessing.

Describe the Solution You'd Like

We will not support any one library explicitly, as it would increase the maintenance burden of keeping up to date with each library's best practices and breaking changes.

It makes more sense to provide a way to customize how data and length are retrieved, using optional arguments.

from thread import ParallelProcessing

ParallelProcessing(
    function=lambda x:x,
    dataset=[1, 2],
    _get_value=lambda dataset, index: dataset[index],
    _length=2
)

Progress

  • Drop explicit Sequence typing
  • Updating ParallelProcessing function signature to include the 2 arguments
  • Overloading __init__ to support optional arguments when dataset supports dataset[index] (__getitem__) and len(dataset) (__len__)
  • Overloading __init__ to require arguments when dataset does not support dataset[index] (__getitem__) and len(dataset) (__len__)
  • Overloading __init__ to require _get_value when dataset does not support dataset[index] (__getitem__)
  • Overloading __init__ to require _length when dataset does not support len(dataset) (__len__)
  • Implementing the logic
  • Updating ParallelProcessing decorator
  • Updating Documentation
  • [4/4] Add tests for ParallelProcessing
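
The fallback rules in the checklist above could be resolved along these lines. The helper `resolve_accessors` and the `Rows` class are hypothetical; only the `_get_value` and `_length` argument names come from the example in this issue.

```python
def resolve_accessors(dataset, _get_value=None, _length=None):
    """Return (get_value, length), requiring overrides when the dataset lacks support."""
    if _length is None:
        if not hasattr(dataset, '__len__'):
            raise TypeError('_length is required when dataset has no __len__')
        length = len(dataset)
    else:
        length = _length

    if _get_value is None:
        if not hasattr(dataset, '__getitem__'):
            raise TypeError('_get_value is required when dataset has no __getitem__')

        def get_value(data, index):
            return data[index]
    else:
        get_value = _get_value

    return get_value, length


class Rows:
    """Hypothetical container with neither __len__ nor __getitem__."""

    def __init__(self, rows):
        self._rows = rows

    def row(self, index):
        return self._rows[index]


rows = Rows(['a', 'b'])
get_value, length = resolve_accessors(
    rows,
    _get_value=lambda data, index: data.row(index),
    _length=2,
)
values = [get_value(rows, index) for index in range(length)]
```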

CLI Documentation

Documentation Issue Report

Documentation needed for CLI

Separating Core and CLI

Question

Describe your question or ask for support

Should core and CLI be separated into 2 packages, with CLI requiring core as a dependency?

Reasons

The CLI is undoubtedly not as widely used and adds bloat to the core library.
This will add more libraries that developers have to watch for version releases and CVEs.

Furthermore, the CLI is not stable: due to my limited knowledge, I've had to rely on eval(), which is an attack vector for arbitrary code execution. Continuing to include it as a primary dependency of thread may introduce unintended security vulnerabilities from a less-used feature.
