Comments (9)
For the staging site that kept running into issues yesterday:
DB_POOL = 30
RAILS_MAX_THREADS = 10
GRAPHITI_CONCURRENCY_MAX_THREADS = 3
WEB_CONCURRENCY = 2
It's running on a VPS and when inspecting the process list it didn't seem like any processes had CPU spiked, or anything else that might usually jump out at me as a problem. I wasn't sure what I should be looking for in spotting any hung threads from the command line, even. Any pointers there?
I can give #472 a try this week sometime. Appreciate the follow up
from graphiti.
Sure I can do that. There's no (real) limit to the threads that the released version of graphiti can use. Instead the problem is that the unbounded threads eventually can exhaust the available database connection pool as ActiveRecord has a 1 thread = 1 connection policy.
For this failure example, we have a puma thread pool of 1 and a database connection pool of 1:
- Thread 1 - The parent resource is accessed.
- Thread 1 - The parent resource loads data from ActiveRecord. ActiveRecord leases the thread a connection from the connection pool. The database connection pool is now empty
- Thread 1 - The parent resource creates Thread 2 to load the :authors resource.
- Thread 1 - waits on thread 2.
- Thread 2 - Attempts to load author data from ActiveRecord. AR attempts to lease a database connection to the thread but can't because thread 1 has the only available connection.
- Thread 2 - raises ActiveRecord::ConnectionTimeoutError "all pooled connections were in use".
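The failure steps above can be reproduced without Rails. This is a minimal sketch, assuming a hypothetical TinyPool class as a stand-in for the connection pool (it is not ActiveRecord's implementation, and the 0.2 second checkout timeout is arbitrary). The parent keeps its connection while it waits on the sideload thread, so a pool of 1 times out and a pool of 2 succeeds.

```ruby
require "timeout"

# Hypothetical stand-in for a database connection pool; not ActiveRecord's code.
class TinyPool
  class ConnectionTimeoutError < StandardError; end

  def initialize(size, checkout_timeout: 0.2)
    @checkout_timeout = checkout_timeout
    @connections = Queue.new
    size.times { |i| @connections << "conn-#{i}" }
  end

  # Leases a connection to the calling thread for the duration of the block.
  def with_connection
    conn = checkout
    yield conn
  ensure
    @connections << conn if conn
  end

  private

  # Waits for a free connection, giving up after the checkout timeout.
  def checkout
    Timeout.timeout(@checkout_timeout) { @connections.pop }
  rescue Timeout::Error
    raise ConnectionTimeoutError, "all pooled connections were in use"
  end
end

# The parent holds its connection while it waits on the sideload thread.
def load_parent_with_sideload(pool)
  pool.with_connection do
    sideload = Thread.new do
      Thread.current.report_on_exception = false
      pool.with_connection { :authors_loaded }
    end
    sideload.value # joins the thread and re-raises its exception, if any
  end
end

pool_of_one = begin
  load_parent_with_sideload(TinyPool.new(1))
rescue TinyPool::ConnectionTimeoutError => e
  e.message
end

pool_of_two = load_parent_with_sideload(TinyPool.new(2))
```

With one connection the sideload thread cannot lease anything until the parent finishes, and the parent cannot finish until the sideload does, so the checkout times out.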
In the real world this is less likely to happen because there have to be enough web threads running at the same time to exhaust the database pool.
For example, consider we have 5 puma threads and a database connection pool of 5:
- Thread 1 - The parent resource is accessed.
- Thread 1 - The parent resource loads data from ActiveRecord. ActiveRecord leases the thread a connection from the connection pool. There are 4 connections left in the pool
- Thread 1 - The parent resource creates Thread 2 to load the :authors resource.
- Thread 1 - waits on thread 2.
- Thread 2 - Attempts to load author data from ActiveRecord. ActiveRecord leases the thread a connection from the connection pool. There are 3 connections left in the pool
- Thread 3 - a new request comes in. ActiveRecord leases the thread a connection from the connection pool. There are 2 connections left in the pool
- Thread 2 - finishes. The connection is made available to the pool again. There are 3 connections left in the pool
- Thread 1 - finishes waiting. The connection is made available to the pool again. There are 4 connections left in the pool
And so on. If someone has limited traffic, limited sideloads, or a large database connection pool, they may never see this issue.
A naive way to solve it is to just increase the database connection pool, but the AR error will rear its head eventually with enough traffic and sideloads.
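A rough way to see why a bigger pool only delays the error is to count worst-case connections per process. This is a back-of-the-envelope sketch, not something graphiti computes: it assumes every web thread is mid-request at once, each parent holds its connection while waiting, and all of a request's sideloads run concurrently on their own threads. The method names and numbers are illustrative.

```ruby
# Worst-case connections in use for one process under the assumptions above:
# one per web thread, plus one per concurrent sideload thread it spawned.
def peak_connections(web_threads:, sideloads_per_request:)
  web_threads * (1 + sideloads_per_request)
end

# True when the worst case needs more connections than the pool holds.
def pool_exhausted?(pool_size:, **load)
  peak_connections(**load) > pool_size
end

five_by_five = peak_connections(web_threads: 5, sideloads_per_request: 1)
small_pool   = pool_exhausted?(pool_size: 5,  web_threads: 5,  sideloads_per_request: 1)
bigger_pool  = pool_exhausted?(pool_size: 40, web_threads: 10, sideloads_per_request: 3)
more_loads   = pool_exhausted?(pool_size: 40, web_threads: 10, sideloads_per_request: 4)
```

Under these assumptions, raising the pool size moves the threshold but one more concurrent sideload per request crosses it again.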
from graphiti.
@MattFenelon I was adjusting values for db pool, max threads, and web_concurrency today when I ran square into this exact problem! I never ran into it before because of how my database.yml is set up: RAILS_MAX_THREADS wasn't being used for the pool value; instead I had a separate DB_POOL var defined (and set at 40). Others might have a similar setup, thereby never running into it.
![image](https://private-user-images.githubusercontent.com/8379/337783685-5e00eb80-e37f-4ea8-a5a0-d2b727d8f474.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkyNjM4OTksIm5iZiI6MTcxOTI2MzU5OSwicGF0aCI6Ii84Mzc5LzMzNzc4MzY4NS01ZTAwZWI4MC1lMzdmLTRlYTgtYTVhMC1kMmI3MjdkOGY0NzQucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDYyNCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA2MjRUMjExMzE5WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9Mzk4NDAyOGNkNWIxOWYxYjdkNDMyYTI5YjZjMjYwMGNiZTBiOGI2MmI4ZWRhMzg4NzYyMTg1MTZmZjA1MWM5MyZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.-riUTlmWAVp6ySdqSj0UMGbPFbS7ttEp8gQYg5y52sU)
from graphiti.
🎉 This issue has been resolved in version 1.6.0 🎉
The release is available on:
- v1.6.0
- GitHub release
Your semantic-release bot 📦🚀
from graphiti.
@MattFenelon moving our convo from #470 into here.
In short I'm still having a ton of trouble with web server lockups after this change and it's hard to figure out what exactly is going on... but it seems like without this change everything is a-ok. My DB pool values and the sideload thread max values seem to be reasonable, and yet sometimes things will just hang, causing the webserver to not respond to requests.
I think we need to roll this thing back and investigate further on an unreleased branch. The part that worries me most is that there's no error message or any sort of clue that this is the issue, meaning there could be people out there with apps breaking without a clear cut cause. Also for me in order to get things running properly again it has required a server restart, which is a real bummer.
I'm going to put out a new patch release without these changes, and then maybe we can work together to try and resolve what's going on by working off a branch?
from graphiti.
Sorry to hear that @jkeen. I see you've reverted, that makes sense until we can clear this up.
In terms of diagnosing:
- What're your puma thread and workers set to?
- Did you use the default concurrency_max_threads setting?
- What's pool set to in database.yml?
- Does your database have a connection limit? If so, what is it?
- How many dynos/containers/vms do you have running?
from graphiti.
The fact that it's locking up completely makes me think it's a deadlock related to the use of the Mutex. I've gone with a different approach in #472 but I haven't been able to test those changes yet. Do you want to give it a try? I can test them in about a week or so.
from graphiti.
I think I've found the issue. It only seems to happen when side-loading children of child resources. Does that tally up with the side loading you're doing, e.g. include=resource1.resource2(.resource3)+? I don't think this happens when the includes are only 1 level deep, e.g. include=resource1,resource2,(resource3)+.
For the sake of an example, let's imagine a thread pool of 1 and an unbounded queue:
- The parent resource creates Promise 1 to load child-resource :authors
- Waits for Promise 1 to finish
- Thread 1 is leased to run Promise 1. There are no more threads available in the pool.
- Thread 1 - Promise 1 - the authors resource creates Promise 2 to load child-resource :books
- Thread 1 - Promise 1 - waits for Promise 2 to finish
- Promise 2 can't run because the thread pool is exhausted. Thread 1 can't be released because Promise 1 is waiting on Promise 2. This is a deadlock.
Though it's technically a deadlock, the graphiti code uses sleep to wait for the promises, so it manifests itself as a hang. In my PR I replaced the sleep with Concurrent::Promise.zip(*promises).value!, which connects the threads together, allowing ruby to detect the deadlock and raise a fatal exception: No live threads left. Deadlock?
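The nested-promise hang can be sketched with the standard library alone. The TinyThreadPool class and the await helper below are hypothetical stand-ins, not concurrent-ruby or graphiti code, and the 0.2 second timeout exists only so the example reports the stall instead of hanging. With a pool of 1, the worker running Promise 1 is the only thread that could run Promise 2.

```ruby
require "timeout"

# Hypothetical fixed-size thread pool with an unbounded job queue (stdlib only).
class TinyThreadPool
  def initialize(size)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        while (job = @jobs.pop) # a nil job tells the worker to stop
          job.call
        end
      end
    end
  end

  # Queues the block and returns a Queue that will receive its result,
  # playing the role of a promise.
  def post(&block)
    result = Queue.new
    @jobs << -> { result << block.call }
    result
  end

  def shutdown
    @workers.size.times { @jobs << nil }
    @workers.each(&:join)
  end
end

# Waits for a "promise", giving up after `seconds` instead of hanging forever.
def await(promise, seconds)
  Timeout.timeout(seconds) { promise.pop }
rescue Timeout::Error
  :timed_out
end

def nested_sideload(pool_size)
  pool = TinyThreadPool.new(pool_size)
  authors = pool.post do                # Promise 1
    books = pool.post { :books_loaded } # Promise 2, queued from inside Promise 1
    await(books, 0.2)                   # blocks the worker that runs Promise 1
  end
  authors.pop
ensure
  pool.shutdown
end

one_thread  = nested_sideload(1)
two_threads = nested_sideload(2)
```

Without the timeout in await, the single-thread case would wait indefinitely, which matches the hang described above.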
I think it can be fixed by making the child resource sideload logic non-blocking. That would remove the need for the child resources to wait, which would then free up the threads to be used for other promises in the queue. But it probably needs some kind of queue to allow the parent resource to know when all child resources are loaded.
- The parent resource creates Promise 1 to load child-resource :authors
- Waits for all promises in the queue to finish
- Thread 1 is leased to run Promise 1. There are no more threads available in the pool.
- Thread 1 - Promise 1 - the authors resource creates Promise 2 to load child-resource :books
- Thread 1 - Promise 1 finishes. Thread 1 is added back to the thread pool.
- Thread 1 is leased to run Promise 2.
- Thread 1 - Promise 2 - the books resource is loaded. There are no other resources to load.
- Thread 1 - Promise 2 finishes.
- The parent resource stops waiting as all promises in the queue have finished
- The parent resource returns. All resources have been loaded at this point.
A few ifs there but I'm going to try this approach and see how I get on.
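One way the non-blocking idea could look, as a sketch only: children queue their own sideloads and return immediately, and a shared counter lets the parent wait for everything at once. PendingCounter and schedule are hypothetical names for illustration, not graphiti's API or the eventual fix, and a single worker thread stands in for a thread pool of 1.

```ruby
# Counts outstanding sideloads so the parent can wait for all of them at once.
class PendingCounter
  def initialize
    @mutex = Mutex.new
    @all_done = ConditionVariable.new
    @pending = 0
  end

  def increment
    @mutex.synchronize { @pending += 1 }
  end

  def decrement
    @mutex.synchronize do
      @pending -= 1
      @all_done.broadcast if @pending.zero?
    end
  end

  def wait_until_zero
    @mutex.synchronize { @all_done.wait(@mutex) until @pending.zero? }
  end
end

# Queues a load for `name`; the job queues its children and returns without waiting.
def schedule(jobs, counter, loaded, name, children)
  counter.increment
  jobs << lambda do
    loaded << name
    children.each { |child, nested| schedule(jobs, counter, loaded, child, nested) }
    counter.decrement # children are already counted, so the total cannot hit zero early
  end
end

jobs = Queue.new
worker = Thread.new do # a "thread pool" of exactly one thread
  while (job = jobs.pop)
    job.call
  end
end

counter = PendingCounter.new
loaded = Queue.new
schedule(jobs, counter, loaded, :authors, { books: {} })
counter.wait_until_zero # the only blocking wait, done by the parent

jobs << nil # stop the worker
worker.join
load_order = Array.new(loaded.size) { loaded.pop }
```

Because no job ever blocks on another job, the single worker is free to pick up the books load as soon as the authors load returns.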
I've added a successfully failing test case that shows the deadlock: e4de93e
from graphiti.
And just for the sake of clarity, how is that scenario you outlined working now in the released version of graphiti?
I guess what I'm trying to understand is: if the strategy you're describing works, does it yield performance gains, or is it a safer approach than what currently happens by default?
from graphiti.