Just because it's not 100% reliable doesn't mean it's not useful. It's mostly accurate and gives a good indication of which browsers people are using, which is useful in making decisions about browser support and the like.
I don't know what the future will hold. I know about Google's recently announced plans (linked above), but older browsers won't implement those changes, and it's especially useful to see whether people are using older browsers. I suspect it will still be useful for several years to come.
uasurfer doesn't seem that great either; in a few test runs I got a lot of wrong data (see 4143a04).
@ptman, the problem here is a legal/ethical one, not a performance/space one. Storing the full User-Agent header makes it easier to identify individuals from the statistical data, and I'd like to make that harder where possible.
Just "Firefox 72" is both useful and quite anonymous, but "Mozilla/5.0 (X11; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0" leaks my OS as well. A lot of mobile browsers in particular send a ridiculous amount of information detailing the OS version, device model, device build version, and language. Here's an example:
Mozilla/5.0 (Linux; U; Android 9; fr-fr; Redmi Note 8 Pro Build/PPR1.180610.011) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/71.0.3578.141 Mobile Safari/537.36 XiaoMi/MiuiBrowser/11.1.7-g
It's ridiculous that they're sending this in the first place, but that's not something in my power to fix.
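To illustrate the "Firefox 72" idea, here is a rough Go sketch that reduces a full UA to a browser name plus major version. It is not GoatCounter's parser. The regexp, the `shortBrowser` function, and the short `browserOrder` priority list are all hypothetical. A dedicated parser library handles far more cases, such as Edge, Opera, and Safari's `Version/` token.

```go
package main

import (
	"fmt"
	"regexp"
)

// productRe matches "Name/123" product tokens. The first capture group is the
// product name and the second is the leading digits of its version (the major
// version). Assumed format; real UA strings have many exceptions.
var productRe = regexp.MustCompile(`([A-Za-z]+)/(\d+)`)

// browserOrder is a hypothetical, deliberately short priority list. More
// specific products must come first, because browsers also claim to be the
// engines they build on (the MIUI example contains both "MiuiBrowser" and
// "Chrome"). Edge, Opera, Safari, etc. are not handled.
var browserOrder = []string{"MiuiBrowser", "Firefox", "Chrome"}

// shortBrowser reduces a User-Agent string to "Name MajorVersion", or
// "Unknown" if none of the listed products appears.
func shortBrowser(ua string) string {
	// Record the first version seen for each product name.
	found := make(map[string]string)
	for _, m := range productRe.FindAllStringSubmatch(ua, -1) {
		if _, ok := found[m[1]]; !ok {
			found[m[1]] = m[2]
		}
	}
	for _, name := range browserOrder {
		if v, ok := found[name]; ok {
			return name + " " + v
		}
	}
	return "Unknown"
}

func main() {
	ua := "Mozilla/5.0 (X11; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0"
	fmt.Println(shortBrowser(ua)) // prints "Firefox 72"
}
```

For the MIUI example this returns "MiuiBrowser 11". The device model, build, OS version, and language are all dropped.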
I considered normalizing as well; for example, we can probably get away with removing the data between parentheses ("(X11; Linux x86_64; rv:72.0)") for all User-Agent strings. That would already be an improvement, as a lot (though not all) of the excessive information is contained there, but I need to run tests and see how well that works out.
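A minimal Go sketch of the parentheses-stripping idea follows. It assumes the groups are never nested, which holds for the two example strings in this thread. `stripParens` is a hypothetical name, and this is only an illustration of the transformation, not what GoatCounter does.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// parenRe matches one parenthesized group that contains no nested
// parentheses, such as "(X11; Linux x86_64; rv:72.0)".
var parenRe = regexp.MustCompile(`\([^()]*\)`)

// spaceRe matches the runs of spaces left behind once a group is removed.
var spaceRe = regexp.MustCompile(` {2,}`)

// stripParens removes every (non-nested) parenthesized group from a
// User-Agent string and tidies up the leftover whitespace.
func stripParens(ua string) string {
	out := parenRe.ReplaceAllString(ua, "")
	out = spaceRe.ReplaceAllString(out, " ")
	return strings.TrimSpace(out)
}

func main() {
	ua := "Mozilla/5.0 (X11; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0"
	fmt.Println(stripParens(ua)) // prints "Mozilla/5.0 Gecko/20100101 Firefox/72.0"
}
```

Applied to the MIUI example, this leaves "Mozilla/5.0 AppleWebKit/537.36 Version/4.0 Chrome/71.0.3578.141 Mobile Safari/537.36 XiaoMi/MiuiBrowser/11.1.7-g". That output shows the "though not all" caveat: the vendor and browser-build token outside the parentheses survives.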
So in short, I'm not 100% sure yet what the best solution is here.
Another possible project: https://github.com/ua-parser/uap-go
https://github.com/matomo-org/device-detector
I would still seriously recommend storing the full UA header/string, but perhaps in a normalized form, e.g. with requests referencing a separate UA table to save space. The UA table will probably end up fully cached.
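The suggested "reference from requests to a UA table" is database normalization. Each distinct UA string is stored once, and each request row stores only a small ID. Below is a minimal in-memory Go sketch. `uaTable`, `hit`, `ID`, and `Lookup` are hypothetical stand-ins for a real SQL table and foreign key, not GoatCounter's schema.

```go
package main

import (
	"fmt"
	"sync"
)

// uaTable is a hypothetical in-memory stand-in for a separate UA table:
// each distinct UA string is stored once and gets a small integer ID.
type uaTable struct {
	mu   sync.Mutex
	ids  map[string]int64
	strs []string
}

func newUATable() *uaTable {
	return &uaTable{ids: make(map[string]int64)}
}

// ID returns the existing ID for ua, or stores ua and assigns the next ID.
func (t *uaTable) ID(ua string) int64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	if id, ok := t.ids[ua]; ok {
		return id
	}
	t.strs = append(t.strs, ua)
	id := int64(len(t.strs)) // IDs start at 1, like a typical serial column
	t.ids[ua] = id
	return id
}

// Lookup returns the full UA string for an ID, or false if it is unknown.
func (t *uaTable) Lookup(id int64) (string, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if id < 1 || id > int64(len(t.strs)) {
		return "", false
	}
	return t.strs[id-1], true
}

// hit is a hypothetical request row that stores only the UA reference.
type hit struct {
	Path string
	UAID int64
}

func main() {
	tbl := newUATable()
	ff := "Mozilla/5.0 (X11; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0"
	hits := []hit{
		{Path: "/", UAID: tbl.ID(ff)},
		{Path: "/about", UAID: tbl.ID(ff)}, // same UA, same ID: stored once
	}
	ua, _ := tbl.Lookup(hits[1].UAID)
	fmt.Println(hits[0].UAID, hits[1].UAID, ua)
}
```

In a real database, this would be a unique index on the UA column plus a lookup-or-insert per request. The premise is that distinct UA strings are far fewer than requests, so the table stays small enough to cache.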
UA strings are useful for debugging and also for grouping different clients that ignore cookies. It's data sent willingly by the browser, not something you have to go digging around to extract. Operating systems can make a huge difference in browser behaviour, and the UA ends up in httpd logs by default. I understand the desire for privacy, but I would just store the whole UA string, especially since UA strings have been tricky to parse in the past and may remain so in the future.
Yeah, I appreciate there are advantages to storing it as well, which is why that is what GoatCounter is doing now. It's a bit of a tricky balancing act. Aside from what "the right thing" to do here is, there is also the legal aspect to consider; the GDPR specifically mentions:
Natural persons may be associated with online identifiers provided by their devices, applications, tools and protocols, such as internet protocol addresses, cookie identifiers or other identifiers such as radio frequency identification tags. This may leave traces which, in particular when combined with unique identifiers and other information received by the servers, may be used to create profiles of the natural persons and identify them.
Does this cover these kinds of User-Agent strings? Possibly.
As for "data sent willingly from the browser": I don't think most users know that their full device info and language are being sent.
I'm not a GDPR lawyer, but AFAIK UA strings are OK in logs. The GDPR allows processing information on several legal bases, one being consent. But logs aren't processed based on consent; it's probably "for the legitimate interests of the data controller", i.e. technical maintenance, troubleshooting, debugging, etc. One could argue that UA strings are an old technical debugging device that helps with maintenance, e.g. identifying scrapers.
Yeah, maybe. With the lack of case law and the inconsistent interpretations right now, I don't think anyone can say for sure how it applies here.
https://groups.google.com/a/chromium.org/forum/m/#!msg/blink-dev/-2JIRNMWJ7s/yHe4tQNLCgAJ
I was aware of client hints, but I had no idea things were going to move this fast...
UA strings were never reliable and will not be very relevant in the near future.