Comments (13)
I was able to strip the needed parts out of my existing code and created a first cut of code to demonstrate the difference of prepared statements. Please review - https://github.com/lbe/go-sql-test
As I state in the readme, something is going bad wrong with update - horrendous performance and it gets worse as the number of records in the database increases. I'm sure that I am doing something stupid here. I will look at and correct.,
The numbers for insert scenario are now closer to 10:1 better. I found some sub-optimal code in my previous 2 - 3:1 improvement.
Please take a look and let me know if this helps. I'm happy to accept any feedback and/or pull requests to make it a better test bed.
lbe
from jet.
Regarding the performance difference. I don't think your code is testing only differences without and with a prepared statement.
A side note, but still important, is that both test cases will use a prepared statement. Standard go SQL library will prepare a statement for each query before execution. So for each Query/Exec, there would be two calls to db, one to prepare and one to execute the query. With an explicitly prepared statement there would be two times fewer calls to db(one to prepare and then one execute for each query). Also, some databases like Postgres will cache the most used prepared statements.
Ordinary requests to some web server with typical time durations between each step look something like this:
User --------------> Server -----------------> Database
~10ms ~200ns ~1ms ~1ms
These times are rough estimates and can differentiate significantly depending of the closeness of the user and database server. What is important to note is that time spent on the server is on average 10,000 less than any other step. This means server computation time is negligible and not of huge importance. If this were not the case, everybody would be forced to use C/C++ or Rust for web development.
In your tests, there is no latency between the user and the server. Also, there is no latency between the server and database, because the SQLite database is accessed from the same process as the test(server). The database is empty, meaning the computation on the database would be on par with the test code. This leads to a conclusion what your tests are additionally measuring, is how much faster querying a database using raw string vs constructing the query dynamically with each request.
So, if you want to test the difference with and without explicitly prepared statements, you can compare the time between raw query with and without explicitly prepared statements, or jet query with and without explicitly prepared statements. That would probably give you around 5% speed up. Which shouldn't make any difference for common web server requests. I don't know what type of application you are building, but if those 5% or 40% percent are making a difference, maybe go is not the right language for your project.
In summary, usually, there is no need to use a prepared statement explicitly. There are only a couple of use cases in which I would consider using a prepared statement. One is if the server and database are not close, for instance not in the same AWS region. Using a prepared cached statement would save you one round trip to the database. The second use case is if there is a huge query and I want to make sure the database is using prepared statements every time, to save time on parsing the query each time on db. The third is if you are querying a database in a for loop(for instance to migrate some data).
from jet.
I've deleted previous message about prepared statement branch, to avoid confusion.
The prepared statement implementation can be found on stmt-cache
branch. To use it, you'll just need to update database opening code.
For instance, currenlty you would have something like this for sqlite:
var db *sql.DB
db, err := sql.Open(driverName, connectionString)
and you need to replace it with:
var db *sqlite.DB
sqlDB, err := sql.Open(driverName, connectionString)
db = sqlite.NewDB(sqlDB).WithStatementsCaching(true)
sqlite.DB
is a wrapper around sql.DB
and can be used in all the places where you would use sql.DB
. For instance:
err := stmt.Query(db, &dest)
This wrapper will intercept all the database queries. For new queries this wrapper will first create prepared statement and cache it, and then it will execute query. Any subsequent query will use cached prepared statement.
If you don't want to use prepared statement caching, you can save *sql.DB
also, or you can access underline *sql.DB
from *sqlite.DB
. For instance:
err := stmt.Query(sqlDB, &dest)
// or
err := stmt.Query(db.DB, &dest)
Prepared statement caching is enabled on transaction created from *sqlite.DB
as well.
from jet.
Hi @lbe ,
I see a performance increase of 2.5 - 3 times
Interesting. I've never managed to get much performance improvement from prepared statements. Although I mainly tested with postgres. How complex are your queries? And how many times is insert or update called?
Regardless, I think we should add a prepared statements to jet.
from jet.
The queries are not very complex. The insert insert/update has about 12 columns, none of them very long. My test case inserts 11K records and updates one column in the same 11K records via an upsert. The select calls each record by the key.
Since you are interested in adding prepared statements. I will create a project on GitHub that is similar to my application to demonstrate the performance differences. I will need to work on this in my spare time. I should have something available within a couple of weeks.
from jet.
I will create a project on GitHub that is similar to my application to demonstrate the performance differences.
That would be great. Thanks.
from jet.
As I state in the readme, something is going bad wrong with update - horrendous performance and it gets worse as the number of records in the database increases. I'm sure that I am doing something stupid here. I will look at and correct.,
I seems it is because of a trigger.
from jet.
Sorry for not responding sooner. Too much work and too little life :(
I will check out the prep-stmt branch. It will probably be next weekend before I can.
Please share how you determined the update performance problem is due to the trigger. When I ran the profile, I could not see what was going on in cGo. I would love to know how to gain that visibility.
Thanks! lbe
from jet.
I will check out the prep-stmt branch. It will probably be next weekend before I can.
Final version of prepared statement would be different, but you still can give this version a test.
Please share how you determined the update performance problem is due to the trigger. When I ran the profile, I could not see what was going on in cGo. I would love to know how to gain that visibility.
It was a guess. When I removed the trigger, performance went up.
from jet.
I took a look at the prep-stmt branch and saw a 4X improvement on inserts, but it is still only about 40% of the speed of the raw insert. See below;
Run without prepared statement
go run main.go -updateCount=0 -useBoth -useTransaction
2024/03/03 09:06:00 Execution Starting
2024/03/03 09:06:00 dbFilename = ./data/go-sql-test.sqlite
2024/03/03 09:06:00 dsn = ./data/go-sql-test.sqlite?cache=shared&_journal_mode=WAL&_synchronous=NORMAL
2024/03/03 09:06:01 Sort data Starting
2024/03/03 09:06:01 Sort data Ended
2024/03/03 09:06:01 Executing insertWithRawSQLUpsert
100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (10000/10000, 94311 it/s)
2024/03/03 09:06:01 Commit Start
2024/03/03 09:06:01 Commit Finished
2024/03/03 09:06:01 Executing selectWithRawSQLUpsert
100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (10000/10000, 49383 it/s)
2024/03/03 09:06:01 Commit Start
2024/03/03 09:06:01 Commit Finished
2024/03/03 09:06:01 Reset database for Jet
2024/03/03 09:06:01 Executing insertWithJet
100% |█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (10000/10000, 9673 it/s)
2024/03/03 09:06:02 Commit Start
2024/03/03 09:06:02 Commit Finished
2024/03/03 09:06:02 Executing selectWithJet
100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (10000/10000, 38788 it/s)
2024/03/03 09:06:02 Commit Start
2024/03/03 09:06:02 Commit Finished
2024/03/03 09:06:02 Execution Completed
Run with Prepared Statement
go run main.go -updateCount=0 -useBoth -useTransaction
2024/03/03 09:02:22 Execution Starting
2024/03/03 09:02:22 dbFilename = ./data/go-sql-test.sqlite
2024/03/03 09:02:22 dsn = ./data/go-sql-test.sqlite?cache=shared&_journal_mode=WAL&_synchronous=NORMAL
2024/03/03 09:02:23 Sort data Starting
2024/03/03 09:02:23 Sort data Ended
2024/03/03 09:02:23 Executing insertWithRawSQLUpsert
100% |███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (10000/10000, 103841 it/s)
2024/03/03 09:02:23 Commit Start
2024/03/03 09:02:23 Commit Finished
2024/03/03 09:02:23 Executing selectWithRawSQLUpsert
100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (10000/10000, 48054 it/s)
2024/03/03 09:02:23 Commit Start
2024/03/03 09:02:23 Commit Finished
2024/03/03 09:02:23 Reset database for Jet
2024/03/03 09:02:23 Executing insertWithJet
100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (10000/10000, 39444 it/s)
2024/03/03 09:02:23 Commit Start
2024/03/03 09:02:23 Commit Finished
2024/03/03 09:02:23 Executing selectWithJet
100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (10000/10000, 37583 it/s)
2024/03/03 09:02:23 Commit Start
2024/03/03 09:02:23 Commit Finished
2024/03/03 09:02:23 Execution Completed
I don't have enough time right now to fully implement the switches to select which test to run in my test app. I quickly hacked in the code for the test above in the prep-stmt branch of my test app. The command line to run it is shown above.
One question, are you only planning on supporting the calls with context? Since database/sql supports both with and without context, I think it would be great if you supported both in a way similar to database/sql.
Thanks for improvement. I look forward to seeing more.
lbe
from jet.
Regarding the performance difference. I don't think your code is testing only differences without and with a prepared
...
data).
Sorry for not responding sooner. For some reason, I did not get a notification on GitHub.
I suspect you may be correct that the remaining 60% gap can largely be attributed to the dynamic query generation. I had not originally anticipated much overhead regarding this as I thought the pre-generated code handled this with runtime similar to code written by hand. My original thought was that once the statement was prepared, there would be very little overhead in jet creating the query for each insert, select or update. After reading your response, I realize that I am probably way off here.
Will you be adding the stmt-cache into a release in the near future?
With respect to my application, the cases where I am most concerned about speed are in loops for large initial data loads which include coordination across tables. For those cases, I have concluded that I will need to write the SQL by hand to make sure that it is as efficient at runtime as possible. After running a lot of benchmarks and analyzing in pprof, I have a code base that will reliably load 100MM records across 4 tables in 4 hours vs my original run that took almost 4 days. I will use jet for the rest of the app where where execution speeds are less critical
Lastly, from an earlier post - I was able to get the expected performance on my upserts when I dropped the trigger on the changed timestamp and modified the on conflict section of the insert to handle it. I still don't know why the trigger was so slow - a quirk of sqlite having to do with non-integer primary keys is my guess. Given that this is a bespoke database and will not be used directly outside of my application, I am not sufficiently concerned over this issue to jump into C to investigate.
Cheers, lbe
from jet.
Hi @go-jet , I have similar problems without prepared statement support. My GoLang project subscribes to a pubsub topic(approx.300 thousands of messages) and process the messages and store them to postgres table which maintains history.
Here is metrics extraction from GCP SQL Query insights, which is measured in first 1 hour time after the message processing has started
(with prepared statement - branch used 'stmt-cache')
times called: 52,340
avg exec time(ms): 88
(with out prepared statement - branch used 'master')
times called: 25,283
avg exec time(ms): 1230
When are you planning to officially release this branch 'stmt-cache'? Great if you can respond ASAP.
from jet.
Hi @kiranmylam , the next release should be around October.
In the meantime you can use stmt-cache
branch. It is now in sync with master branch.
from jet.
Related Issues (20)
- Postgis geography type shows as `postgres.ColumnString` HOT 3
- More convenient way to implement raw/custom operators HOT 5
- Support `WHERE "column" = ANY($1)` HOT 7
- Should return value with return type HOT 1
- Support `WHERE()` on `INSERT ON CONFLICT DO UPDATE` HOT 2
- Support for SQLite FTS5? HOT 2
- Missing struct fields in SELECT due to Postgres truncation of identifiers HOT 4
- Unsupported sql column for the array in postgres HOT 2
- Incorrect Docs: Generator Customization references invalid interfaces HOT 5
- `Generate` function signature doesn't match documentation HOT 1
- Is there a way to make generic db functions for all models/tables? HOT 1
- Sqlite GENERATE columns, but being used in INSERT HOT 4
- Support for keyword arguments to functions HOT 3
- Support for filtered aggregations HOT 1
- Improving Document about SELECT clause. HOT 3
- ProjectionList{ColumnList{...}}.As("some_alias") results in no columns being rendered in the SQL HOT 2
- Ability to add db tags to use pgx struct scanning HOT 2
- Support Reuse of Custom Raw Expressions HOT 4
- View models - NOT NULL columns have pointers HOT 2
- Database Name Change HOT 4
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from jet.