Comments (15)

arp242 commented on June 21, 2024

How are you running the server instance? Are there any errors in the log for that?

from goatcounter.

ValdikSS commented on June 21, 2024

How are you running the server instance?

./goatcounter-v2.4.1-linux-amd64 serve -tls http -listen :80, with only a single site and API key added.

Are there any errors in the log for that?

There are errors, but they appear sporadically. It's either this one, with the full sent buffer logged before these lines,

May 17 20:17:05 api-import: ERROR: Site.UpdateFirstHitAt: zdb.Exec: context canceled
        zgo.at/goatcounter/v2.(*Site).UpdateFirstHitAt()
                zgo.at/goatcounter/v2/site.go:297
        zgo.at/goatcounter/v2/handlers.api.count()
                zgo.at/goatcounter/v2/handlers/api.go:619
        zgo.at/goatcounter/v2/handlers.addcsp.func1.1()
                zgo.at/goatcounter/v2/handlers/mw.go:274
        zgo.at/goatcounter/v2/handlers.addctx.func1.1()
                zgo.at/goatcounter/v2/handlers/mw.go:205

or this one

May 17 20:08:33 api-import: ERROR: Site.UpdateFirstHitAt: zdb.Exec: database is locked
        zgo.at/goatcounter/v2.(*Site).UpdateFirstHitAt()
                zgo.at/goatcounter/v2/site.go:297
        zgo.at/goatcounter/v2/handlers.api.count()
                zgo.at/goatcounter/v2/handlers/api.go:619
        zgo.at/goatcounter/v2/handlers.addcsp.func1.1()
                zgo.at/goatcounter/v2/handlers/mw.go:274
        zgo.at/goatcounter/v2/handlers.addctx.func1.1()
                zgo.at/goatcounter/v2/handlers/mw.go:205
 {firstHitAt="2023-05-15 00:02:16 +0000 UTC" site=1}
May 17 20:08:33 cron: ERROR: site 1: cron.updateHitStats: zdb.TX fn: delete: zdb.Exec: database is locked

ValdikSS commented on June 21, 2024

Check the email, I've mailed you the log file.

arp242 commented on June 21, 2024

Thanks, that's helpful. I think I have an idea why this is happening, but it may be a few days before I can look at it.

In the meantime, setting the site's created_at to some time before your first pageview should fix it (no guarantees, but hopefully...) – stop the server and run something like:

./goatcounter-v2.4.1-linux-amd64 db query "update sites set created_at='2015-01-01 00:00:00'"

Not sure that will fix it, but there's a chance.

I'll take a proper look and fix after I've gone through some other things.

ValdikSS commented on June 21, 2024

Nope, unfortunately still the same.

arp242 commented on June 21, 2024

Okay, well, worth a punt 😅

arp242 commented on June 21, 2024

Actually that should have been the first_hit_at column, not created_at.

./goatcounter-v2.4.1-linux-amd64 db query "update sites set first_hit_at='2015-01-01 00:00:00'"

ValdikSS commented on June 21, 2024

No, still hangs :(

arp242 commented on June 21, 2024

Should be fixed with the above commit: I can import the logfile you sent me without problem.

Note you may want to (temporarily) increase the ratelimit a bit when importing big files, for example to 200 requests (of 100 pageviews each) per 5 seconds:

goatcounter serve -ratelimit api-count:200/5 [..]
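As a sanity check, that example setting works out to the following sustained rate (assuming full batches of 100 pageviews per request):

```shell
# Effective import throughput implied by -ratelimit api-count:200/5
# when each request carries 100 pageviews.
awk 'BEGIN { printf "%d pageviews/s\n", 200 / 5 * 100 }'
```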

ValdikSS commented on June 21, 2024

Thanks, the patch seems to solve the issue. However, importing still takes quite a bit of time: importing a 162 MB file with 1 million access log lines into a db in tmpfs took 7m9.548s.
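As a back-of-the-envelope figure, those numbers work out to roughly 2,300 lines per second:

```shell
# Reported import rate: 1,000,000 log lines in 7m9.548s.
awk 'BEGIN { secs = 7 * 60 + 9.548; printf "%.0f lines/s\n", 1000000 / secs }'
```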

arp242 commented on June 21, 2024

The current method is not super-optimized; it could be made a lot faster by bypassing the HTTP API entirely and inserting directly into the database, as all of that adds quite a bit of overhead. It also doesn't batch things very much.

The big advantage of the current method is that it requires very little code and special handling around database locks, and you can safely ingest pageviews from the regular /count endpoint at the same time without the batch import affecting anything else (which is important if you're running a bunch of sites, like goatcounter.com). This is why there is a ratelimit in the first place.

For your use case, you probably don't care about any of this though. A "batch import" functionality might be useful, but thus far no one really seemed to have needed it (or if they did, they never told me).

ValdikSS commented on June 21, 2024

I have a fairly loaded web service, which is not quite a website; that's why I'm looking for a web analytics system that can parse access logs rather than relying only on js/pixel tracking, which is both not possible in my case and not really needed.

I know of only a few projects that analyze logs: AWStats, which is very dated analytics-wise (it cannot detect modern versions of browsers and operating systems); goaccess, which is really intended for one-shot log analysis (not historical data); and your goatcounter, which I decided to give a try because it keeps historical data and has basically everything I need.

I have about 10–11 million daily hits in the access log, which would take a fairly long time to load into goatcounter. I don't need realtime parsing and could run the import once a day, but it would still take quite a while compared to my awk/shell aggregation scripts, which take only 3–5 minutes but are not convenient.
If batch processing could be implemented, that would be great!

arp242 commented on June 21, 2024

Is there a reason you're not just tailing the logfiles (goatcounter import -follow), other than that you don't need it? That's how it was kind of intended to be used; the load is then spread out over the entire day.
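A tailing setup along those lines might look like the sketch below; the site URL, log path, and -format value are placeholder assumptions, so check goatcounter help import for the exact flags on your version:

```shell
# Continuously follow the access log and feed new pageviews to a
# running GoatCounter instance. The -site URL, log format, and log
# path here are placeholders, not values from the thread.
goatcounter import -follow \
    -site https://stats.example.com \
    -format combined \
    /var/log/nginx/access.log
```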

I can see if there's some low-hanging fruit that can be optimized, but I don't expect it will be as fast as the 3-5 minutes of your awk script.

arp242 commented on June 21, 2024

It turned out the importer would sleep for a second every 5,000 pageviews; I think I put that in way early when I created this to prevent overloading the server, but I should have taken it out after I put in the ratelimiter 😅 A classic speed-up loop.

Anyway, I tested on the logfile you emailed me, cat'd together to a million lines:

% wc -l logfile.log
10001 logfile.log
% for i in {1..100}; cat logfile.log >>1mil.log
% wc -l 1mil.log
1000100 1mil.log
% ls -lh 1mil.log
-rw-r--r-- 1 martin martin 162M May 21 10:22 1mil.log
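The short for loop in that transcript is zsh syntax; a portable POSIX-sh equivalent (using a one-line stand-in file here instead of the real 10,001-line sample) would be:

```shell
# Replicate a sample log 100 times, as in the zsh one-liner above.
# The one-line logfile is a stand-in for the real sample.
printf 'example log line\n' > logfile.log
: > 1mil.log
i=1
while [ "$i" -le 100 ]; do
    cat logfile.log >> 1mil.log
    i=$((i + 1))
done
wc -l < 1mil.log
```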

Without any changes it takes about 4.5 minutes:

19.47s user 3.19s system 8% cpu 4:36.68 total

And by just removing the sleep it's reduced to about a third:

20.63s user 3.07s system 21% cpu 1:48.01 total

So that's not a bad win.

I ran the server with -ratelimit api-count:1000/1; the syntax is "max number of requests / over period in seconds". This effectively disables the rate-limit; if you get timeouts you may want to set it a bit lower.

I looked at some other things, but most time is spent in encoding and decoding the JSON, checking for bots, parsing the User-Agent, GeoIP lookup, cleaning up the referrals, etc. There isn't anything super-obvious that can be improved there. goaccess does about 50k pageviews/second on my system, but it doesn't persist anything to disk so that's not comparable – if I exclude the time it takes to persist and store to disk it's about 35k/second for GoatCounter, so a bit slower but not too much.

I did test with PostgreSQL, rather than SQLite. With these numbers of pageviews I really recommend using PostgreSQL, as it's so much faster. Persisting 5,000 pageviews with SQLite takes about 10 seconds on my system (you can see the timings if you run the server with -debug=cron); on PostgreSQL it takes about half a second. Quite a big difference. There are some benchmarks at https://github.com/arp242/goatcounter/blob/master/docs/benchmark.markdown (which is just for the dashboard).
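For reference, switching backends is done with the -db flag; the connection string below follows goatcounter's "postgresql+" scheme, but treat the exact string as an assumption and check goatcounter help db for your version:

```shell
# Run the server against PostgreSQL instead of the default SQLite.
# The connection string is an assumption based on goatcounter's
# "postgresql+" scheme; adjust dbname/credentials for your setup.
goatcounter serve -db 'postgresql+dbname=goatcounter'
```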

ValdikSS commented on June 21, 2024

Is there a reason you're not just tailing the logfiles (goatcounter import -follow), other than that you don't need it?

The server is quite loaded during the day; I'd rather add the extra load only at night.

Thanks for the patches, I'll try it.
