Giter VIP home page Giter VIP logo

Comments (13)

lbe avatar lbe commented on August 11, 2024 1

I was able to strip the needed parts out of my existing code and created a first cut of code to demonstrate the difference of prepared statements. Please review - https://github.com/lbe/go-sql-test

As I state in the readme, something is going bad wrong with update - horrendous performance and it gets worse as the number of records in the database increases. I'm sure that I am doing something stupid here. I will look at and correct.,

The numbers for insert scenario are now closer to 10:1 better. I found some sub-optimal code in my previous 2 - 3:1 improvement.

Please take a look and let me know if this helps. I'm happy to accept any feedback and/or pull requests to make it a better test bed.

lbe

from jet.

go-jet avatar go-jet commented on August 11, 2024 1

Regarding the performance difference. I don't think your code is testing only differences without and with a prepared statement.
A side note, but still important, is that both test cases will use a prepared statement. Standard go SQL library will prepare a statement for each query before execution. So for each Query/Exec, there would be two calls to db, one to prepare and one to execute the query. With an explicitly prepared statement there would be two times fewer calls to db(one to prepare and then one execute for each query). Also, some databases like Postgres will cache the most used prepared statements.

Ordinary requests to some web server with typical time durations between each step look something like this:

User --------------> Server -----------------> Database
          ~10ms     ~200ns         ~1ms         ~1ms

These times are rough estimates and can differentiate significantly depending of the closeness of the user and database server. What is important to note is that time spent on the server is on average 10,000 less than any other step. This means server computation time is negligible and not of huge importance. If this were not the case, everybody would be forced to use C/C++ or Rust for web development.

In your tests, there is no latency between the user and the server. Also, there is no latency between the server and database, because the SQLite database is accessed from the same process as the test(server). The database is empty, meaning the computation on the database would be on par with the test code. This leads to a conclusion what your tests are additionally measuring, is how much faster querying a database using raw string vs constructing the query dynamically with each request.

So, if you want to test the difference with and without explicitly prepared statements, you can compare the time between raw query with and without explicitly prepared statements, or jet query with and without explicitly prepared statements. That would probably give you around 5% speed up. Which shouldn't make any difference for common web server requests. I don't know what type of application you are building, but if those 5% or 40% percent are making a difference, maybe go is not the right language for your project.

In summary, usually, there is no need to use a prepared statement explicitly. There are only a couple of use cases in which I would consider using a prepared statement. One is if the server and database are not close, for instance not in the same AWS region. Using a prepared cached statement would save you one round trip to the database. The second use case is if there is a huge query and I want to make sure the database is using prepared statements every time, to save time on parsing the query each time on db. The third is if you are querying a database in a for loop(for instance to migrate some data).

from jet.

go-jet avatar go-jet commented on August 11, 2024 1

I've deleted previous message about prepared statement branch, to avoid confusion.

The prepared statement implementation can be found on stmt-cache branch. To use it, you'll just need to update database opening code.

For instance, currenlty you would have something like this for sqlite:

var db *sql.DB


db, err := sql.Open(driverName, connectionString)

and you need to replace it with:

var db *sqlite.DB

sqlDB, err := sql.Open(driverName, connectionString)

db = sqlite.NewDB(sqlDB).WithStatementsCaching(true)

sqlite.DB is a wrapper around sql.DB and can be used in all the places where you would use sql.DB. For instance:

err := stmt.Query(db, &dest)

This wrapper will intercept all the database queries. For new queries this wrapper will first create prepared statement and cache it, and then it will execute query. Any subsequent query will use cached prepared statement.

If you don't want to use prepared statement caching, you can save *sql.DB also, or you can access underline *sql.DB from *sqlite.DB. For instance:

err := stmt.Query(sqlDB, &dest)

// or

err := stmt.Query(db.DB, &dest)

Prepared statement caching is enabled on transaction created from *sqlite.DB as well.

from jet.

go-jet avatar go-jet commented on August 11, 2024

Hi @lbe ,

I see a performance increase of 2.5 - 3 times

Interesting. I've never managed to get much performance improvement from prepared statements. Although I mainly tested with postgres. How complex are your queries? And how many times is insert or update called?

Regardless, I think we should add a prepared statements to jet.

from jet.

lbe avatar lbe commented on August 11, 2024

The queries are not very complex. The insert insert/update has about 12 columns, none of them very long. My test case inserts 11K records and updates one column in the same 11K records via an upsert. The select calls each record by the key.

Since you are interested in adding prepared statements. I will create a project on GitHub that is similar to my application to demonstrate the performance differences. I will need to work on this in my spare time. I should have something available within a couple of weeks.

from jet.

go-jet avatar go-jet commented on August 11, 2024

I will create a project on GitHub that is similar to my application to demonstrate the performance differences.

That would be great. Thanks.

from jet.

go-jet avatar go-jet commented on August 11, 2024

As I state in the readme, something is going bad wrong with update - horrendous performance and it gets worse as the number of records in the database increases. I'm sure that I am doing something stupid here. I will look at and correct.,

I seems it is because of a trigger.

from jet.

lbe avatar lbe commented on August 11, 2024

Sorry for not responding sooner. Too much work and too little life :(

I will check out the prep-stmt branch. It will probably be next weekend before I can.

Please share how you determined the update performance problem is due to the trigger. When I ran the profile, I could not see what was going on in cGo. I would love to know how to gain that visibility.

Thanks! lbe

from jet.

go-jet avatar go-jet commented on August 11, 2024

I will check out the prep-stmt branch. It will probably be next weekend before I can.

Final version of prepared statement would be different, but you still can give this version a test.

Please share how you determined the update performance problem is due to the trigger. When I ran the profile, I could not see what was going on in cGo. I would love to know how to gain that visibility.

It was a guess. When I removed the trigger, performance went up.

from jet.

lbe avatar lbe commented on August 11, 2024

I took a look at the prep-stmt branch and saw a 4X improvement on inserts, but it is still only about 40% of the speed of the raw insert. See below;

Run without prepared statement

go run main.go -updateCount=0 -useBoth -useTransaction
2024/03/03 09:06:00 Execution Starting
2024/03/03 09:06:00 dbFilename = ./data/go-sql-test.sqlite
2024/03/03 09:06:00 dsn = ./data/go-sql-test.sqlite?cache=shared&_journal_mode=WAL&_synchronous=NORMAL
2024/03/03 09:06:01 Sort data Starting
2024/03/03 09:06:01 Sort data Ended
2024/03/03 09:06:01 Executing insertWithRawSQLUpsert
 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (10000/10000, 94311 it/s)
2024/03/03 09:06:01 Commit Start
2024/03/03 09:06:01 Commit Finished
2024/03/03 09:06:01 Executing selectWithRawSQLUpsert
 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (10000/10000, 49383 it/s)
2024/03/03 09:06:01 Commit Start
2024/03/03 09:06:01 Commit Finished
2024/03/03 09:06:01 Reset database for Jet
2024/03/03 09:06:01 Executing insertWithJet
 100% |█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (10000/10000, 9673 it/s)
2024/03/03 09:06:02 Commit Start
2024/03/03 09:06:02 Commit Finished
2024/03/03 09:06:02 Executing selectWithJet
 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (10000/10000, 38788 it/s)
2024/03/03 09:06:02 Commit Start
2024/03/03 09:06:02 Commit Finished
2024/03/03 09:06:02 Execution Completed

Run with Prepared Statement

go run main.go -updateCount=0 -useBoth -useTransaction
2024/03/03 09:02:22 Execution Starting
2024/03/03 09:02:22 dbFilename = ./data/go-sql-test.sqlite
2024/03/03 09:02:22 dsn = ./data/go-sql-test.sqlite?cache=shared&_journal_mode=WAL&_synchronous=NORMAL
2024/03/03 09:02:23 Sort data Starting
2024/03/03 09:02:23 Sort data Ended
2024/03/03 09:02:23 Executing insertWithRawSQLUpsert
 100% |███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (10000/10000, 103841 it/s)
2024/03/03 09:02:23 Commit Start
2024/03/03 09:02:23 Commit Finished
2024/03/03 09:02:23 Executing selectWithRawSQLUpsert
 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (10000/10000, 48054 it/s)
2024/03/03 09:02:23 Commit Start
2024/03/03 09:02:23 Commit Finished
2024/03/03 09:02:23 Reset database for Jet
2024/03/03 09:02:23 Executing insertWithJet
 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (10000/10000, 39444 it/s)
2024/03/03 09:02:23 Commit Start
2024/03/03 09:02:23 Commit Finished
2024/03/03 09:02:23 Executing selectWithJet
 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (10000/10000, 37583 it/s)
2024/03/03 09:02:23 Commit Start
2024/03/03 09:02:23 Commit Finished
2024/03/03 09:02:23 Execution Completed

I don't have enough time right now to fully implement the switches to select which test to run in my test app. I quickly hacked in the code for the test above in the prep-stmt branch of my test app. The command line to run it is shown above.

One question, are you only planning on supporting the calls with context? Since database/sql supports both with and without context, I think it would be great if you supported both in a way similar to database/sql.

Thanks for improvement. I look forward to seeing more.

lbe

from jet.

lbe avatar lbe commented on August 11, 2024

Regarding the performance difference. I don't think your code is testing only differences without and with a prepared
...
data).

Sorry for not responding sooner. For some reason, I did not get a notification on GitHub.

I suspect you may be correct that the remaining 60% gap can largely be attributed to the dynamic query generation. I had not originally anticipated much overhead regarding this as I thought the pre-generated code handled this with runtime similar to code written by hand. My original thought was that once the statement was prepared, there would be very little overhead in jet creating the query for each insert, select or update. After reading your response, I realize that I am probably way off here.

Will you be adding the stmt-cache into a release in the near future?

With respect to my application, the cases where I am most concerned about speed are in loops for large initial data loads which include coordination across tables. For those cases, I have concluded that I will need to write the SQL by hand to make sure that it is as efficient at runtime as possible. After running a lot of benchmarks and analyzing in pprof, I have a code base that will reliably load 100MM records across 4 tables in 4 hours vs my original run that took almost 4 days. I will use jet for the rest of the app where where execution speeds are less critical

Lastly, from an earlier post - I was able to get the expected performance on my upserts when I dropped the trigger on the changed timestamp and modified the on conflict section of the insert to handle it. I still don't know why the trigger was so slow - a quirk of sqlite having to do with non-integer primary keys is my guess. Given that this is a bespoke database and will not be used directly outside of my application, I am not sufficiently concerned over this issue to jump into C to investigate.

Cheers, lbe

from jet.

kiranmylam avatar kiranmylam commented on August 11, 2024

Hi @go-jet , I have similar problems without prepared statement support. My GoLang project subscribes to a pubsub topic(approx.300 thousands of messages) and process the messages and store them to postgres table which maintains history.

Here is metrics extraction from GCP SQL Query insights, which is measured in first 1 hour time after the message processing has started

(with prepared statement - branch used 'stmt-cache')
times called: 52,340
avg exec time(ms): 88

(with out prepared statement - branch used 'master')
times called: 25,283
avg exec time(ms): 1230

When are you planning to officially release this branch 'stmt-cache'? Great if you can respond ASAP.

from jet.

go-jet avatar go-jet commented on August 11, 2024

Hi @kiranmylam , the next release should be around October.
In the meantime you can use stmt-cache branch. It is now in sync with master branch.

from jet.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.