I have created a heavily threaded test rig to try and point out what I think are thread-safety problems in aws-sdk-core. Before diving too far into investigation I wanted to throw this a bit wider to see if I'm missing something.
These tests were carried out in the following environment:
$ uname -a
Linux myitcv-virtual-machine 3.11.0-17-generic #31-Ubuntu SMP Mon Feb 3 21:52:43 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$ ruby -v
rubinius 2.2.5 (2.1.0 e543ba32 2014-02-08 JI) [x86_64-linux-gnu]
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 4
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 70
Stepping: 1
CPU MHz: 2594.193
BogoMIPS: 5188.38
Hypervisor vendor: VMware
Virtualisation type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 6144K
L4 cache: 131072K
NUMA node0 CPU(s): 0-3
Source code for v1 of the test rig and the accompanying Gemfile are behind those links. Run bundle install to get up and running.
The reason for using celluloid is that we are building a process atop celluloid, hence the test is more fair (but admittedly not fully stripped back to bare Ruby/Rubinius).
access_key_id etc. will need to be populated before using test.rb.
- The code creates a pool of 50 threads, then makes 100 async calls into that pool
- Each call makes a call to DynamoDB to list tables
- After bundle install, ruby test.rb (assuming you have the right Ruby interpreter set via rbenv etc.) should be enough
- Yes, this line could be made more efficient, but leaving it as such makes the thread-safety problem more apparent (see the later discussion of the revised versions v2 and v3)
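For anyone who wants to reproduce the shape of the rig without celluloid, the steps above can be approximated with plain Ruby threads. This is only a minimal sketch, not the rig itself: `hammer` is a hypothetical helper name, and the block passed to it stands in for the DynamoDB list-tables call, so no AWS credentials or gems are needed to run it.

```ruby
require "thread"

# Hypothetical helper (not part of the original rig): run `calls` invocations
# of the given block across `pool_size` plain Ruby threads and tally any
# exceptions by class name.
def hammer(pool_size: 50, calls: 100, &operation)
  jobs = Queue.new
  calls.times { |i| jobs << i }
  pool_size.times { jobs << :stop } # one stop marker per worker

  errors = Hash.new(0)
  lock = Mutex.new

  workers = Array.new(pool_size) do
    Thread.new do
      while (job = jobs.pop) != :stop
        begin
          operation.call(job)
        rescue StandardError => e
          # The tally itself is shared state, so guard it.
          lock.synchronize { errors[e.class.name] += 1 }
        end
      end
    end
  end

  workers.each(&:join)
  errors
end

# In the real rig the block would list DynamoDB tables; here every tenth
# call fails on purpose so that the tally has something to show.
tally = hammer { |i| raise ArgumentError, "simulated failure" if (i % 10).zero? }
p tally
```

With the defaults this mirrors the 50-thread, 100-call setup, and the returned hash makes it easy to see which exception classes turn up and how often across runs.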
The rig should be as vanilla as it gets, yet I've been hitting three types of exception, though not consistently, which is what leads me to believe there's a thread-safety issue. They are:
1. Unrecognized properties exception
2. Tuple out of bounds exception
3. Invalid signature exception
Looking at the top of the call stack of exception 1, we are taken to this code. There are lots of class instance variables here, which I don't believe are thread safe, unless I'm missing something about how this gets called?
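To make the concern about class instance variables concrete, here is a small sketch of the general pattern. The `Registry` class is hypothetical and is not taken from aws-sdk-core; it only shows why an unsynchronised read-modify-write on class-level state can lose updates on an interpreter without a global lock (such as Rubinius), and how a mutex serialises the same update.

```ruby
# Hypothetical class, not taken from aws-sdk-core: it only illustrates
# mutable state held in class-level instance variables.
class Registry
  @entries = {}
  @lock = Mutex.new

  class << self
    # Unsafe: read-modify-write on shared state with no synchronisation.
    # Two threads can read the same old value and one increment is lost.
    def add_unsafe(key)
      @entries[key] = (@entries[key] || 0) + 1
    end

    # Safer: the same update, serialised through a mutex.
    def add_safe(key)
      @lock.synchronize { @entries[key] = (@entries[key] || 0) + 1 }
    end

    def count(key)
      @lock.synchronize { @entries[key] || 0 }
    end
  end
end

threads = Array.new(8) do
  Thread.new { 1_000.times { Registry.add_safe(:hits) } }
end
threads.each(&:join)

puts Registry.count(:hits) # prints 8000
```

Swapping `add_safe` for `add_unsafe` in the loop is a quick way to check whether a given interpreter drops increments; whether it does on any particular run is not guaranteed.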
v2 and v3 present alternatives which create one Aws::DynamoDB instance per thread and one globally, respectively. Both suffer similar issues to varying degrees. The one regularly occurring error common to all three versions is point 3 above, the invalid signature error.
Before we look any further, is there an assumed usage pattern here? I.e. should one create a single, global Aws::DynamoDB instance, or one per thread, or one per call?
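For clarity, the three candidate patterns can be sketched as follows. `FakeClient` is a hypothetical stand-in so the snippet runs without the SDK, and the helper names are made up for illustration. One detail worth noting for a celluloid-based process: `Thread.current[:key]` is fiber-local in Ruby, so the per-thread variant below uses `thread_variable_get` / `thread_variable_set` instead.

```ruby
# Hypothetical stand-in for a real client object, so the sketch runs
# without the SDK or credentials.
class FakeClient
  def list_tables
    []
  end
end

# Pattern 1: a single, global instance shared by every thread.
GLOBAL_CLIENT = FakeClient.new

def global_client
  GLOBAL_CLIENT
end

# Pattern 2: one instance per thread, created lazily.
# Thread#[] would be fiber-local, so true thread-locals are used here.
def per_thread_client
  client = Thread.current.thread_variable_get(:client)
  unless client
    client = FakeClient.new
    Thread.current.thread_variable_set(:client, client)
  end
  client
end

# Pattern 3: a fresh instance for every call.
def per_call_client
  FakeClient.new
end

# Each pattern is used the same way at the call site.
[global_client, per_thread_client, per_call_client].each do |client|
  client.list_tables
end
```

The patterns differ only in how much state is shared between threads: everything, nothing across threads, or nothing at all.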
Any thoughts on what the issue is here?
Are any of these issues potentially related to #43?