I'm having big performance issue iterating over a query, this is my code: <div cla

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

I have created a project with benchmark <a href="https://github.com/TomasLinhart/SQLit

Thanks! I'll take a look at this one—and <a class="user-mention notranslate" data-hove

performance issue iterating over a query,about stephencelis/sqlite.swift

Comments (47)

stephencelis commented on May 18, 2024

There are likely some low-hanging performance improvements to be made right now, especially around row iteration. I've just been optimizing the interface, first. Can we isolate a couple factors, though?

Is this a Release build, or Debug?
Performance vs. raw query. Let's find out how much more expensive the type-safe iteration is. You can run pdfDb!.trace(println) before all the above queries to set up a SQL query logger, from here, take the queries that spit out and test them with raw iteration (for _ in db.prepare(query) {}). Run both queries at least twice to account for warm-up.

If you have runnable code you can share, please email it and it should make any performance testing a bit easier.

from sqlite.swift.

violabg commented on May 18, 2024

I have changed the code to trace raw iteration, and it is much faster

func getPdfWithCategory(language:Language) -> CategoriesForLanguage {
        let categoriesForLanguage = CategoriesForLanguage()

        pdfDb!.db!.trace(println)

        let PDFs = pdfDb!.pdfsTb.table
        let categories = pdfDb!.categoriesTb.table
        start = NSDate()

        let query = "SELECT p.* , c.id as categoryid, c.name as categoryname, c.thumb as categorythumb, c.orderid as categorysortid FROM pdf p inner join categories c on c.id=p.categoryid WHERE p.languageId='\(language.id)'"
        let timeInterval:NSTimeInterval = start!.timeIntervalSinceNow
        println("query \(timeInterval)")

        start = NSDate()

        let stmt = pdfDb!.db!.prepare(query)
        for row in stmt {

        }

        let timeInterval2:NSTimeInterval = start!.timeIntervalSinceNow
        println("for row in query \(timeInterval2)")

        return categoriesForLanguage
    }

query -0.00111699104309082
SELECT p.* , c.id as categoryid, c.name as categoryname, c.thumb as categorythumb, c.orderid as categorysortid FROM pdf p inner join categories c on c.id=p.categoryid WHERE p.languageId='4'
for row in query -0.623710989952087

compared with the original code output:

query -0.00927197933197021
SELECT count() FROM pdf INNER JOIN categories ON (categories.id = pdf.categoryId) WHERE (pdf.languageId = '4')
query count = 88
SELECT pdf., categories.id, categories.name, categories.thumb, categories.orderid FROM pdf INNER JOIN categories ON (categories.id = pdf.categoryId) WHERE (pdf.languageId = '4')
for row in query -2.36225700378418

regarding sharing my code, I need to create a new project without all sensitive data and will email it to you.

from sqlite.swift.

stephencelis commented on May 18, 2024

Great, thanks! I have a few ideas already, but would be helpful to have a baseline to work with.

from sqlite.swift.

TomasLinhart commented on May 18, 2024

I can also confirm it is really slow. To get 1000 rows from my database it takes over 5 seconds. When I use db.prepare then the performance is much better.

from sqlite.swift.

stephencelis commented on May 18, 2024

@TomasLinhart Can you give actual benchmarks? I'm under the impression that the difference is linear and about 4x slower at the moment, but any specifics to the contrary (with examples) would be helpful.

from sqlite.swift.

TomasLinhart commented on May 18, 2024

@stephencelis Sure, I can provide you something. But it is not so hard to test. Just create a database with over 1000 rows and at least 15 columns and try to load everything.

from sqlite.swift.

stephencelis commented on May 18, 2024

@TomasLinhart Sure, but less time for me to have a reproducible problem the better. My personal use cases aren't dealing with this much data all at once, so it's been more of a back burner fix for me. (In fact, if you give me your use case that would be helpful knowledge, as well.) The easier you make it for me to take it off the back burner, the more likely fixes will come sooner when I have personal time to spare and allot to this part of the project. I'm also happy to take pull requests if you'd like to dig in yourself and spend time contributing to the project.

from sqlite.swift.

TomasLinhart commented on May 18, 2024

@stephencelis Sure, I will prepare an example project. Currently I just switched into db.prepare but in future I might need more so I could dive in and do some pull requests but for now I just want to finish my hobby project.

My use case is I have data set that I want to have in memory so I can manipulate it easily.

from sqlite.swift.

TomasLinhart commented on May 18, 2024

I have created a project with benchmark https://github.com/TomasLinhart/SQLitePerformance Just download it and run it and you will see difference in the performance.

from sqlite.swift.

stephencelis commented on May 18, 2024

Thanks! I'll take a look at this one—and @violabg's, above—when I have the chance.

from sqlite.swift.

TomasLinhart commented on May 18, 2024

I have updated the benchmark with C API example which is 17x times faster than db.prepare and 225x faster than typed API.

from sqlite.swift.

stephencelis commented on May 18, 2024

@TomasLinhart db.prepare is fairly lightweight, but there is implicit type conversion happening there that isn't happening in your native example. Assuming your benchmarks are from a release build, we may hit the point where Swift becomes the bottleneck (and will hopefully be optimized automatically in the future). I'll check for optimizations that can be made there, as well, though.

from sqlite.swift.

TomasLinhart commented on May 18, 2024

@stephencelis You are right, I checked your code for db.prepare and it is looking good. I tried to remove the logic for generator and then it is almost same speed as my C API example. So I guess biggest bottleneck in db.prepare is row.append but it is not possible to do much about it...

from sqlite.swift.

stephencelis commented on May 18, 2024

@TomasLinhart Were you running things in debug mode? My results:

using typed API took 6.32480198144913 seconds
using db.prepare took 0.377619981765747 seconds
using C native API took 0.0522429943084717 seconds

If I change your iteration logic to use sqlite3_column_type and sqlite3_column_{type} methods to determine the types and yield arrays of rows, there is negligible overhead:

using db.prepare took 0.388479948043823 seconds
using C native API took 0.382124006748199 seconds

        var results = [[Any?]]()
        if (sqlite3_prepare_v2(db, query.cStringUsingEncoding(NSUTF8StringEncoding)!, -1, &statement, nil)
            == SQLITE_OK) {
                let columnCount = sqlite3_column_count(statement)
                while (sqlite3_step(statement) == SQLITE_ROW) {
                    var row = [Any?]()
                    for idx in 0..<columnCount {
                        switch sqlite3_column_type(statement, Int32(idx)) {
                        case SQLITE_BLOB:
                            let bytes = sqlite3_column_blob(statement, Int32(idx))
                            let length = sqlite3_column_bytes(statement, Int32(idx))
                            row.append(Blob(bytes: bytes, length: Int(length)))
                        case SQLITE_FLOAT:
                            row.append(Double(sqlite3_column_double(statement, Int32(idx))))
                        case SQLITE_INTEGER:
                            let int = Int(sqlite3_column_int64(statement, Int32(idx)))
                            var bool = false
                            if let type = String.fromCString(sqlite3_column_decltype(statement, Int32(idx))) {
                                bool = type.hasPrefix("BOOL")
                            }
                            row.append(bool ? int != 0 : int)
                        case SQLITE_NULL:
                            row.append(nil)
                        case SQLITE_TEXT:
                            row.append(String.fromCString(UnsafePointer<CChar>(sqlite3_column_text(statement, Int32(idx))))!)
                        case let type:
                            assertionFailure("unsupported column type: \(type)")
                        }
                    }
                    results.append(row)
                }
                sqlite3_finalize(statement)
        }

Assuming you add logic to your C API example that handles the underlying types at all, I imagine there is little to no performance hit using SQLite.swift.

Given that, the typed interface is still weighing in at about 21x less performant in this case, which is better than 225x, but slow enough to warrant investigation and cleanup when I can.

Meanwhile, you should be able to batch/stream the results into your interface to avoid such big delays and still be able to use the typed interface.

from sqlite.swift.

TomasLinhart commented on May 18, 2024

Yeah, I was, sorry about that. So db.prepare is fast enough, check my last comment, biggest overhead is the append but there is not much we can do about that.

So only problem is only the typed interface. 😄

from sqlite.swift.

stephencelis commented on May 18, 2024

@TomasLinhart Ah, good to hear you're seeing the same results as me now :)

In the end I think the typed interface will become more performant as Swift improves its own performance, but I do believe in making optimizations in SQLite.swift wherever possible, so I'll try to dig in sometime soon!

from sqlite.swift.

stephencelis commented on May 18, 2024

@TomasLinhart Actually, the point about append is a good one. I've seen cases where things like reduce and map are slower, but map actually improves the performance here quite a bit:

using db.prepare took 0.273506999015808 seconds
using C native API took 0.397485971450806 seconds

I'll commit this optimization (and more) when I have more time to dig in. Thanks!

from sqlite.swift.

stephencelis commented on May 18, 2024

Also quickly cached columnNames in the QueryGenerator and we're down to 3.4x the slowdown of the original db.prepare and 5x with the new improvements.

using typed API took 1.3848859667778 seconds
using db.prepare (old) took 0.388479948043823 seconds
using db.prepare took 0.279254972934723 seconds

This may be the easiest win for now. I'm not sure if you want to dig in for any more optimizations.

from sqlite.swift.

TomasLinhart commented on May 18, 2024

Hehe, I was also digging down and discovered same problem 😄

from sqlite.swift.

stephencelis commented on May 18, 2024

One last improvement is an internal restructuring of how Row stores its data:

using typed API took 0.305689036846161 seconds
using db.prepare (new) took 0.307142019271851 seconds

Iteration should basically be identically performant, now.

OK! I'll push these fixes but then I really need to get back to work 💨

from sqlite.swift.

TomasLinhart commented on May 18, 2024

Good job, I am glad the problem was discovered. Looking forward to your commit! 😃

from sqlite.swift.

stephencelis commented on May 18, 2024

Closed by:

from sqlite.swift.

stephencelis commented on May 18, 2024

Thanks for your help, @TomasLinhart!

from sqlite.swift.

violabg commented on May 18, 2024

Hi Stephen,
I don't know if you had any chances to look at my sample code, but the last commit did not improve much for me.
Also when I try this:

let PDFs = pdfDb!.pdfsTb.table
let categories = pdfDb!.categoriesTb.table

let query = PDFs.select(PDFs[*], categories[pdfDb!.categoriesTb.id], categories[pdfDb!.categoriesTb.name], categories[pdfDb!.categoriesTb.thumb], categories[pdfDb!.categoriesTb.orderid])
.join(categories, on: categories[pdfDb!.categoriesTb.id] == PDFs[pdfDb!.pdfsTb.categoryId])
.filter(PDFs[pdfDb!.pdfsTb.languageId] == language.id)

it breaks saying :
fatal error: no such table: "pdf": file /workspace/apple/ios8/SQLitePerformance-master/SQLite.swift/SQLite Common/Database.swift, line 335

with it didn't happen before

from sqlite.swift.

stephencelis commented on May 18, 2024

@violabg That sounds like an unrelated issue. What does pdfDb!.pdfsTb.table look like? If you run pdfDb!.run("select count(*) from pdfs") (or whatever your PDFs table name is) you should get the same error. It sounds like the table doesn't exist yet.

from sqlite.swift.

violabg commented on May 18, 2024

let PDFs = pdfDb!.pdfsTb.table
let categories = pdfDb!.categoriesTb.table
println(PDFs.count) //prints 467
println(categories.count) //prints 13
let query = PDFs.select(PDFs[*], categories[pdfDb!.categoriesTb.id], categories[pdfDb!.categoriesTb.name], categories[pdfDb!.categoriesTb.thumb], categories[pdfDb!.categoriesTb.orderid]).join(categories, on: categories[pdfDb!.categoriesTb.id] == PDFs[pdfDb!.pdfsTb.categoryId]).filter(PDFs[pdfDb!.pdfsTb.languageId] == language.id)

        var categoriesDictionary = [Int: PdfCategory]()
        var currentCategory:PdfCategory

        for row in query {

here I get the error, when it try to enter the for in loop
fatal error: no such table: "pdf": file /workspace/apple/ios8/SQLitePerformance-master/SQLite.swift/SQLite Common/Database.swift, line 335

wich is table:
let PDFs = pdfDb!.pdfsTb.table

but it works here
println(PDFs.count) //prints 467
println(categories.count) //prints 13

from sqlite.swift.

stephencelis commented on May 18, 2024

Very strange. The error is coming from SQLite, so the query must be getting malformed somehow. Can you email me the latest version of your code so that I can better troubleshoot?

from sqlite.swift.

kfmfe04 commented on May 18, 2024

Aside: just wanted to throw in another sample from my personal project.

+65% longer for a flat-out table scan when using typed versus raw (8.54s vs 5.16s).

FWIW, I have been using flat-out binary block dumps for large arrays (10k) of structs/fixed-data for speed (like a column-based DBMS), but I am migrating to sqlite for ease-of-maintenance and flexibility.

        /// typed: db_read -8.54326200485229
        let tbl = GGGTable.CardAnnotation
        for r in tbl {
            if var s = notes[ r[e_cid] ]
            {
                s.unixtime = r[e_show_me_on]
            } else {
                notes[ r[e_cid] ] = CardAnnotation( unixtime: r[e_show_me_on] )
            }
        }

        /// raw: db_read -5.16345697641373
        let db = GGGDatabase.db
        let stmt = db.prepare( "SELECT * FROM cardannotation" )
        for r in stmt {
            let idx = r[0] as Int
            let ut  = r[1] as Int
            if var s = notes[ idx ]
            {
                s.unixtime = ut
            } else {
                notes[ idx ] = CardAnnotation( unixtime: ut )
            }
        }

from sqlite.swift.

stephencelis commented on May 18, 2024

@kfmfe04 And this is in a release build, not debug?

Regardless, the difference isn't too bad, considering. Certain optimizations are going to come with time as Swift improves. In the meantime there will be a tradeoff in speed vs. safety and code clarity.

SQLite.swift's type-safe interface should be pretty speedy when you use SQLite itself for querying and scoping the data you're working with, but if you're loading a large dataset into memory, you're probably going to want to optimize as you did, dipping down into the raw API with fewer generics.

I'm hoping that Swift's performance improves over time and the gap you're seeing narrows quite a bit.

I'm also open to pull requests that continue to improve performance with the type-safe interface, so feel free to dig in if you have ideas! I'll also keep performance in mind as I work with things.

from sqlite.swift.

stephencelis commented on May 18, 2024

@violabg Looks like the issue you've stumbled upon is a table quoting bug. I'll try to fix it soon!

from sqlite.swift.

violabg commented on May 18, 2024

thanks Stephen,
once you fix it, could you test the performance of my code and see if they match your results?

from sqlite.swift.

stephencelis commented on May 18, 2024

@violabg Fix pushed here: 1f070f7

I'm not sure how to check the performance difference, though. Please let me know how it's improved!

from sqlite.swift.

violabg commented on May 18, 2024

Thanks Stephen, the problem is fixed now and this are my test results:

func getPdfWithCategory(language:Language) -> CategoriesForLanguage {
        let categoriesForLanguage = CategoriesForLanguage()
        categoriesForLanguage.laguangeId = language.id

        let PDFs = pdfDb!.pdfsTb.table
        let categories = pdfDb!.categoriesTb.table

        self.start = NSDate()
        let query = PDFs.select(PDFs[*], categories[pdfDb!.categoriesTb.id], categories[pdfDb!.categoriesTb.name], categories[pdfDb!.categoriesTb.thumb], categories[pdfDb!.categoriesTb.orderid]).join(categories, on: categories[pdfDb!.categoriesTb.id] == PDFs[pdfDb!.pdfsTb.categoryId]).filter(PDFs[pdfDb!.pdfsTb.languageId] == language.id)

        let timeInterval:NSTimeInterval = start!.timeIntervalSinceNow
        println("query \(timeInterval)")

        println("query.count \(query.count)")

        var categoriesDictionary = [Int: PdfCategory]()
        var currentCategory:PdfCategory

        self.start = NSDate()
        for row in query {
//            
        }
        let timeInterval2:NSTimeInterval = start!.timeIntervalSinceNow
        println("for \(timeInterval2)")

        return categoriesForLanguage
    }

on mac:
query -0.00147002935409546
query.count 87
for -0.021390974521637
timeInterval -0.0214470028877258

on ipad 4:
query -0.0244329571723938
query.count 87
for -0.350589990615845
timeInterval -0.350932002067566

iteration over 87 records fills a little slow, I have ported this project from Objective C and is much faster.
Probably as you said, Swift is the bottleneck.

from sqlite.swift.

kfmfe04 commented on May 18, 2024

that was a debug build - ok - I have a more pressing issue

I do a table scan something like

    let sentences = db[ "sentences" ]
    let id              = Expression<Int>("id")
    let entry     = Expression<String>("entry")
    let address   = Expression<String>("address")
    let tokens = Expression<String>("tokens")
    for r in sentences
    {
    // ....
    }

for about 8,000 records, this takes 3 minutes on an iPod touch. Parsing a pipe-delimited ASCII file equivalent takes 25 seconds. Is this within range of expectations or should a table-scan be faster?

Maybe I should try passing in a "SELECT" statement instead and see how fast it is?

from sqlite.swift.

stephencelis commented on May 18, 2024

@kfmfe04 What are you doing with the data as you iterate? If you're doing empty iteration, it's not a fair comparison because SQLite.swift has type-handling code built in. If you're working with such a large dataset all at once, you may need to optimize by ditching the type-safe interface for something lower-level. Or better yet, limit your queries to only the data you need, and use aggregate functions for calculations across the dataset.

from sqlite.swift.

androidcn commented on May 18, 2024

so slow ....
using
for about 2000rows

from sqlite.swift.

stephencelis commented on May 18, 2024

@androidcn Please try to constructively contribute to the conversation and read this thread thoroughly (check my questions to others throughout, including whether you've compiled for "Release" or not, and what device you're running on).

In short:

The basic iteration API should be as fast as interfacing with libsqlite3 directly, while the typed API is as fast as adding type-checking logic for each row. You're likely hitting a Swift or SQLite bottleneck.
In the end, iterating over 2,000 rows seems a bit heavy-handed when SQLite provides aggregation APIs and when you can scope/batch your queries.

from sqlite.swift.

androidcn commented on May 18, 2024

@stephencelis but i want to use typed API

from sqlite.swift.

stephencelis commented on May 18, 2024

@androidcn The typed API is as fast as manually-typed, so please do. It will improve with Swift. Scope your queries for performance.

from sqlite.swift.

stephencelis commented on May 18, 2024

An update to those listening: Swift 1.2 performs 2x+ faster for me. If you end up re-running your benchmarks (off the swift-1-2 branch), please share your experiences.

from sqlite.swift.

androidcn commented on May 18, 2024

i will check it out later
On Sat, 14 Feb, 2015 at 9:30 am Stephen Celis [email protected]
wrote:

An update to those listening: Swift 1.2 performs 2x+ faster for me. If you
end up re-running your benchmarks (off the swift-1-2 branch), please
share your experiences.

Reply to this email directly or view it on GitHub
#23 (comment)
.

from sqlite.swift.

violabg commented on May 18, 2024

I have tested my code with branche Swift 1.2.
I get 1,3 sec.
with Swift 1.1 I get 6,9 sec.

using raw sql I get 0,28 sec. with Swift 1.2
and 0,98 sec. with Swift 1.1

from sqlite.swift.

stephencelis commented on May 18, 2024

Thanks @violabg! Hopefully things will continue to improve steadily.

from sqlite.swift.

TomasLinhart commented on May 18, 2024

A little bit OT. When will you merge the Swift 1.2 branch into master, @stephencelis ?

from sqlite.swift.

stephencelis commented on May 18, 2024

@TomasLinhart per #62:

This PR is open to track changes against the Swift 1.2 beta. It won't be merged till Swift 1.2 goes GM, but will be continuously rebased onto master.

from sqlite.swift.

sehgalsaheel commented on May 18, 2024

Is the limit set to 1000? Or is there a way to do more than 1000 (8000 to be specific)

from sqlite.swift.

stephencelis commented on May 18, 2024

@sehgalsaheel Have you tried it? Have you had problems? You should have the same ability, limit-wise, as you'd have with raw sqlite3. If the device can't handle 8,000, you should be batching.

from sqlite.swift.

performance issue iterating over a query about sqlite.swift HOT 47 CLOSED

Comments (47)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent