Comments (12)

arp242 commented on May 27, 2024

Just because it's not 100% reliable doesn't mean it's not useful. It's mostly accurate and gives a good indication of which browsers people are using, which is useful in making decisions about browser support and the like.

I don't know what the future will hold. I know about Google's recently announced plans (linked above) but older browsers won't implement that, and it's especially useful to see if people are using older browsers. I suspect it will still be useful for several years to come.

from goatcounter.

arp242 commented on May 27, 2024

uasurfer also doesn't seem that great; on a few test runs I got a lot of wrong data; see: 4143a04

arp242 commented on May 27, 2024

The problem here is a legal/ethical one @ptman, not a performance/space one. Storing the full User-Agent header makes it easier to identify persons based on the statistical data, and I'd like to make that harder when possible.

Just "Firefox 72" is both useful and quite anonymous, but Mozilla/5.0 (X11; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0 leaks my OS as well, and especially a lot of mobile browsers send a ridiculous amount of information detailing the OS version, device model, device build version, and language. Here's an example of that:

Mozilla/5.0 (Linux; U; Android 9; fr-fr; Redmi Note 8 Pro Build/PPR1.180610.011) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/71.0.3578.141 Mobile Safari/537.36 XiaoMi/MiuiBrowser/11.1.7-g

It's ridiculous that they're sending this in the first place, but that's not something in my power to fix.
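To illustrate the difference between the two forms, here is a minimal Go sketch that reduces a full User-Agent string to a short "name major-version" label such as "Firefox 72". The helper shortName is hypothetical and not part of GoatCounter; it assumes the caller already knows which product token to look for, which is the hard part of real User-Agent parsing.

```go
package main

import (
	"fmt"
	"strings"
)

// shortName is a hypothetical helper: it looks for a "product/version" token
// in ua and returns "product major", e.g. "Firefox 72". It does not decide
// which product is the actual browser; the caller has to choose that.
func shortName(ua, product string) (string, bool) {
	for _, tok := range strings.Fields(ua) {
		name, version, ok := strings.Cut(tok, "/")
		if !ok || name != product {
			continue
		}
		// Keep only the part of the version before the first dot.
		major, _, _ := strings.Cut(version, ".")
		return product + " " + major, true
	}
	return "", false
}

func main() {
	ua := "Mozilla/5.0 (X11; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0"
	label, ok := shortName(ua, "Firefox")
	fmt.Println(label, ok) // Firefox 72 true
}
```

The label drops the OS, architecture, and minor version, which is the information that makes the full string more identifying.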

I considered normalizing as well; for example, we can probably get away with removing the data between parentheses, such as (X11; Linux x86_64; rv:72.0), from all User-Agent strings. That would already be an improvement, since a lot of the excessive information (though not all of it) is contained in there, but I need to run tests and see how well that works out.
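A rough Go sketch of that normalization idea, under the assumption that a simple regular expression is good enough for a first test run. The helper stripParens is hypothetical, and it only removes non-nested parenthesized groups in a single pass.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// parenGroup matches one parenthesized group that contains no further
// parentheses, e.g. "(X11; Linux x86_64; rv:72.0)".
var parenGroup = regexp.MustCompile(`\([^()]*\)`)

// spaces matches the runs of whitespace left behind after a removal.
var spaces = regexp.MustCompile(`\s+`)

// stripParens is a hypothetical helper that drops every non-nested
// parenthesized group from ua and tidies up the remaining whitespace.
func stripParens(ua string) string {
	ua = parenGroup.ReplaceAllString(ua, "")
	return strings.TrimSpace(spaces.ReplaceAllString(ua, " "))
}

func main() {
	fmt.Println(stripParens("Mozilla/5.0 (X11; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0"))
	// Mozilla/5.0 Gecko/20100101 Firefox/72.0
}
```

Applied to the Xiaomi example above, this removes the Android version, language, and device model, but the full Chrome and MiuiBrowser version numbers remain, so it reduces the detail without removing all of it.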

So in short, I'm not 100% sure yet what the best solution is here.

arp242 commented on May 27, 2024

Another possible project: https://github.com/ua-parser/uap-go

arp242 commented on May 27, 2024

https://github.com/matomo-org/device-detector

ptman commented on May 27, 2024

I would seriously recommend still storing the full UA header/string, but maybe store it normalized, e.g. have each request reference a row in a UA table to save space. The UA table will probably end up fully cached.

ptman commented on May 27, 2024

UA strings are useful for debugging and also for grouping different clients that ignore cookies. It's data sent willingly from the browser, not something you have to go digging around to extract. Operating systems can make a huge difference in browser behaviour. And it's something that by default ends up in httpd logs. I understand the desire for privacy, but I would just store the whole UA string. Especially since they have been tricky to parse in the past and can be tricky to parse in the future.

arp242 commented on May 27, 2024

Yeah, I appreciate there are advantages to storing it as well, which is why that is what GoatCounter is doing now. It's a bit of a tricky balancing act. Aside from what "the right thing" to do is here, there is also the legal aspect to consider; the GDPR specifically mentions:

Natural persons may be associated with online identifiers provided by their devices, applications, tools and protocols, such as internet protocol addresses, cookie identifiers or other identifiers such as radio frequency identification tags. This may leave traces which, in particular when combined with unique identifiers and other information received by the servers, may be used to create profiles of the natural persons and identify them

Does this cover these kinds of User-Agent strings? Possibly.

"data sent willingly from the browser"

I don't think most users know that the full device info and language are being sent.

ptman commented on May 27, 2024

I'm not a GDPR lawyer, but UA strings are OK in logs, AFAIK. GDPR allows processing information for different purposes, one being consent. But logs aren't processed based on consent. It's probably "for legitimate interests of data controller", i.e. technical maintenance, troubleshooting, debugging, etc. One could argue that UA strings are an old technical debugging device that helps with maintenance, e.g. identifying scrapers.

arp242 commented on May 27, 2024

Yeah, maybe. I think with the lack of case law and inconsistent interpretations right now no one can really tell how it applies here for sure.

arp242 commented on May 27, 2024

https://groups.google.com/a/chromium.org/forum/m/#!msg/blink-dev/-2JIRNMWJ7s/yHe4tQNLCgAJ

I was aware of client hints, but had no idea things were going to move this fast...

DanielRuf commented on May 27, 2024

UA strings were never reliable and will not be very relevant in the near future.
