Comments (7)
Related Work
- async-executor
- global queue + local queues + steal. All tasks will be firstly pushed to global queue. When a thread tries to get a task, it will find tasks from its local queue. If local queue is empty, the thread will try to steal half of the tasks from the global queue. If the global queue is still empty, then the thread will try to steal tasks from other thread's local queue (select a random thread and try to steal half of the tasks from the local queue. If failed, try next thread.)
- crossbeam-deque
- Concurrent work-stealing deques. most commonly used in work-stealing schedulers.
- Worker + Injector + Stealer. Each thread corresponds to a worker. And each worker has a corresponding stealer, this stealer can steal tasks from its worker. Injector is a queue shared by threads. Hence, worker is the local queue, injector is the global queue. get tasks from local worker's queue => get tasks from global injector's queue => get tasks from stealer (other worker's queue)
- tokio
- local queue + Inject + lifo_slot + Steal. Similar to crossbeam-deque, inject is global queue. 1. Tasks will be pushed to local queue. if local queue is full, push tasks to global queue. 2. Get tasks from the local queue first most of the time. 3. each thread has a lifo_slot, which with higher priority than local_queue. 3. If local queue is empty, steal tasks from other local queues first. if steal failed, steal from global queue.
from ngo.
The results of the discussion with @tatetian :
- idle vcpu (the thead of the async runtime) should sleep and wakeup later. When the vcpu's run_queue is empty, the vcpu need sleep some time, eg, sleep 50ms. We can use eventfd (better) or futex in occlum to implement this feature. However, when the run_queue is empty, there may be some pending tasks on this vcpu, other vcpu need to invoke wake(), insert the task to the run_queue, and wakeup the corresbonding vcpu.
- performance problem of io_uring callback. In current ngo implementation, we set a sched_callback (io_uring callback) before init async runtime, then each vcpu will invoke sched_callback (eg, poll io_uring completion queue) in each event loop. Actually, It is not necessary for all vcpu to poll io_uring queue. We can use only one vcpu, or several vcpu to poll the io_uring queue. We can deprecated sched_callback, and treat polling io_uring queue as a async task.
- [RFC] Config the number vCPUs #16 . Implement this RFC
- sleep function for user task. User task maybe want to sleep some time, eg, timer. Since async-rt is a no_std runtime, there is no time function in no_std env. We can implement sleep function in ngo libos.
from ngo.
Our Design Goals
A Rust async runtime with no_std
, for LibOS, In particular, for SGX LibOS.
Take ngo as an example, ngo need rust async runtime with following features:
- Idle thread sleep
- Load balancing
- Support priority
Idle thread sleep (Already supported)
If one thread is idle, this thread will sleep. When a new task is ready, the sleeping thread will be waked up.
Load balancing
- Assign tasks to each thread in a balanced manner
- Threads with fewer workload should execute some workload on other threads
- Try to avoid starvation
- Can not violate the cpu affinity.
Support priority
There are different kinds of tasks in LibOS, e.g. user tasks and LibOS tasks. User tasks generally have higher priority, while LibOS tasks maybe running in the background and with lower priority.
- Distinguish different kinds of tasks.
- Support priority in the scheduler
- Shutdown background tasks correctly. (background tasks may be infinite loop pattern, we need shutdown it correctly before exits)
from ngo.
Design Overview
Firstly, thanks to tokio's blog https://tokio.rs/blog/2019-10-scheduler, those ideas are very usefull.
Basic Idea
- local queues: each thread has a local queue, in particular, a bounded mpmc / spmc queue.
- global queue: a unbounded mpmc queue. Since the local queue is bounded (fixed size), the global queue is used to store tasks when the local queue is full. The implementation of global queue in tokio is a intrusive linked list guarded by mutex.
- steal: if one thread's local queue is empty, it can steal tasks from other threads' local queue and global queue.
Task
Task priority
Task priority affects how to select tasks in the thread.
- default (mid) : If not specify priority, use default. User tasks usually are default priority.
- high
- low
Task type
Task type affects how to assign task to threads.
- default : if not specify type, use default. User tasks usually are default type.
- blocking : a task that will block for a time. If one thread executes a blocking task, other tasks on this thread may starve. Hence, when accept a blocking task to one thread, we will move all tasks on this thread to global queue to avoid starvation.
- background : e.g. LibOS tasks. These tasks may be keep running until runtime shutdown.
CPU affinity
In our async runtime, the thead is called vcpu. Since user can bind threads to specific cpu set in OS, we allow user to bind tasks to specific vcpu set.
If one task is bound to a vcpu set, then the task can only run in those vcpus. vcpu outside the set can not steal the task.
Run queue
Each thread has three local queues. All threads share one global queue.
- high priority queue
- mid priority queue
- low priority queue
- global queue
from ngo.
Assign tasks to threads
When accept one task, we need to decide which thread to run the task.
In the assign algorithm, we consider following factors:
- If the task is new or not: If the task is not new (a wake up task), that is, the task has been scheduled before, we will try to schedule the task to the previous thread.
- cpu affinity: we need meet the cpu affinity.
- task type: If the type is blocking, we need try to find a thread that has relatively few tasks, and we need to move those tasks to other threads. And we need consider the cpu affinity.
- local queue lengh: according to the task's priority, we find a thread with short corresponding local queue.
- wait time: when each task is assigned, we record a assign_tick. And we record a global_tick.
wait_time = global_tick - queue.front().assign_tick
. we find a thread with short wait time.
from ngo.
Select one task to run
- high probability: select from high_priority queue.
- medium probability: select form mid_pri queue.
- low probability: select from low_pri queue.
- very low probability: select from global queue.
- If first try failed (the queue is empty), try other local queues.
- If still failed, try to steal.
Steal
- select one random thread to steal half of tasks.
- do not steal tasks with invalid cpu affinity.
- If steal failed, try next thread. (try k times)
- If steal still failed, try steal global queue.
from ngo.
Optimization: Temporarily increase priority
- If a mid_pri default_type task is wake up, insert it to high_pri quue temporarily to reduce delay.
Optimization: Throttle stealing (tokio's opt)
- Limit the max number of concurrent threads performing steal to reduce contention.
from ngo.
Related Issues (20)
- [BUG] ioctl TCGETS return and glibc throws "**** stack smashing detected ***"
- [BUG]ioctl UT crashed
- Use arc_new_cyclic
- [RFC] Memory Cleaning Threads
- [BUG] pthread case hang when running glibc test with simulation mode
- [RFC] Use Completely Fair Scheduler (CFS) in NGO
- [BUG] fstatat crashed with invalid path
- [BUG] Cannot update atime, mtime and ctime for inode
- The design of the `page-cache` crate HOT 1
- [RFC] Use PKU for isolation between LibOS and userspace program HOT 3
- Support io_uring's IORING_SETUP_SQPOLL flag HOT 1
- [BUG] Running iperf and the client side never exits HOT 1
- Improve the scalability of SyncIoDisk HOT 4
- Improve the performance of IoUringDisk HOT 1
- [RFC] Introduce Asynchronous Filesystem in NGO
- [RFC] Introduce Page Cache in NGO
- [RFC] Use Occlum configuration in yaml format in NGO HOT 3
- [RFC] Extend the SGX untrusted allocator to support untrusted device memory HOT 1
- [RFC] Introduce SwornDisk in NGO
- Steps to Enable Async-SFS + SwornDisk as the RootFS
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from ngo.