
Comments (2)

hannosch commented on June 15, 2024

A different example seen in the wild:
8a-5a-a3-30-a1-10 vs. 8a-5b-a3-30-a1-10
So we should likely filter out any WiFi networks whose BSSIDs differ by just a single character. Potentially even by two characters, in case the increment-by-one carries over into the next digit, e.g. from 1f to 20.
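A minimal sketch of such a character-level check, assuming BSSIDs arrive as hex strings separated by `-` or `:`. The function name `char_differences` and the second pair of addresses are made up for illustration; this is not ichnaea's actual code.

```python
def char_differences(mac_a: str, mac_b: str) -> int:
    """Count the hex digits that differ between two equally long MACs."""
    # Normalize: lower-case and drop the separators.
    a = mac_a.lower().replace("-", "").replace(":", "")
    b = mac_b.lower().replace("-", "").replace(":", "")
    if len(a) != len(b):
        raise ValueError("MAC addresses must have the same length")
    return sum(x != y for x, y in zip(a, b))


# The pair seen in the wild differs in one character.
print(char_differences("8a-5a-a3-30-a1-10", "8a-5b-a3-30-a1-10"))  # 1
# Hypothetical pair where incrementing 1f by one changes two characters.
print(char_differences("00-11-22-33-44-1f", "00-11-22-33-44-20"))  # 2
```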


mvglasow commented on June 15, 2024

Following a discussion on the mailing list (which touched the same subject regarding a downloadable database):

sam tygier wrote:

One potential problem is hotspots with 2 BSSIDs. This is common with dual-band (2.4 + 5 GHz) routers. For mine the BSSIDs are the same apart from the last byte ('c' vs '8')

Which means we have to catch cases like xx:xx:xx:xx:xx:xc vs. xx:xx:xx:xx:xx:x8 (basically just one bit flipped, 1100 vs. 1000).
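To make the bit flip visible, here is a small sketch that XORs the two differing hex digits and counts the set bits. The helper name `differing_bits` is hypothetical.

```python
def differing_bits(a: int, b: int) -> int:
    """Number of bit positions in which two integers differ."""
    return bin(a ^ b).count("1")


# Only the last hex digit differs in the example: c vs. 8.
diff = 0xC ^ 0x8
print(f"{0xC:04b} xor {0x8:04b} = {diff:04b}")  # 1100 xor 1000 = 0100
print(differing_bits(0xC, 0x8))  # 1
```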

mvglasow wrote:

Subsequent/neighboring BSSIDs are a tricky matter: One is the case you describe, where a dual-technology device has two subsequent (or neighboring) BSSIDs. [...]

On the other hand, companies often purchase WiFi hardware in batches, resulting in multiple devices with adjacent BSSIDs in a single building or site. [...] these devices may be the only ones in range. Since the hardware in this case is operated by an entity rather than an individual, privacy is less of a concern.

When in doubt, privacy prevails, but we should keep in mind that there are cases in which it is desirable to get [a location supplying] two subsequent BSSIDs.

[D]evices with multiple, closely related BSSIDs are also a potential issue with online APIs. It would be interesting to know if they handle that case [..] and how they do it.

hannosch wrote:

I'm leaning towards something like a Levenshtein distance check, considering anything with a distance of two or less to be too similar.

Hanno, your example of 8a-5a-a3-30-a1-10 vs. 8a-5b-a3-30-a1-10 is interesting since it is a locally administered address (indicated by the xxxxxx1x pattern in the first byte) and the changed bit is in the first three bytes, which in an officially assigned MAC address would form the Organizationally Unique Identifier (OUI, vendor ID) part. We probably need to include all six bytes in the check for locally administered addresses. OTOH, xxxxxx0x indicates an officially assigned OUI, thus for these MACs it might be sufficient to compare the last three (NIC-specific) bytes.
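A rough sketch of that distinction, assuming MACs are given as hex strings with `-` or `:` separators. The helper names (`parse_mac`, `is_locally_administered`, `bytes_to_compare`) and the second address are invented for illustration.

```python
def parse_mac(mac: str) -> bytes:
    """Turn 'aa-bb-cc-dd-ee-ff' or 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    raw = bytes.fromhex(mac.replace("-", "").replace(":", ""))
    if len(raw) != 6:
        raise ValueError("expected a 6-byte MAC address")
    return raw


def is_locally_administered(mac: str) -> bool:
    """True if the first byte matches xxxxxx1x."""
    return bool(parse_mac(mac)[0] & 0b00000010)


def bytes_to_compare(mac: str) -> bytes:
    """All six bytes for local addresses, else only the NIC-specific three."""
    raw = parse_mac(mac)
    return raw if is_locally_administered(mac) else raw[3:]


print(is_locally_administered("8a-5a-a3-30-a1-10"))  # True (0x8a = 10001010)
# Hypothetical address whose first byte matches xxxxxx0x.
print(bytes_to_compare("00-1a-2b-3c-4d-5e").hex())   # 3c4d5e
```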

How about Hamming instead of Levenshtein distance? Hamming distance is basically a count of bits that need to be flipped to convert one stream of bits into another of the same length. The algorithm is simpler, and the additional transforms that would increase the Levenshtein distance by 1 (e.g. insert, delete, append) are probably not very relevant for MAC addresses.
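For reference, a bit-level Hamming distance over two MACs could look roughly like this. The function name `hamming_distance` and the string input format are assumptions.

```python
def hamming_distance(mac_a: str, mac_b: str) -> int:
    """Count the bits that differ between two MACs of equal length."""
    a = mac_a.replace("-", "").replace(":", "")
    b = mac_b.replace("-", "").replace(":", "")
    if len(a) != len(b):
        raise ValueError("MAC addresses must have the same length")
    # XOR leaves a 1 in every position where the two addresses differ.
    return bin(int(a, 16) ^ int(b, 16)).count("1")


print(hamming_distance("8a-5a-a3-30-a1-10", "8a-5b-a3-30-a1-10"))  # 1
```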

Hamming distance won't catch the 1f to 20 case (011111 to 100000 has a Hamming distance of 6), hence we might want to also check for an arithmetic difference in the last byte (or the last three bytes combined for official, or all bytes for locally administered addresses). The criteria could be the same as for the Hamming distance check, i.e. if either check yields two or less, the addresses are probably too similar.
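Combining both checks might look something like the sketch below. For simplicity it compares the full 48-bit address rather than picking the byte range per address type, and the names `too_similar` and `THRESHOLD` as well as the sample addresses are made up.

```python
THRESHOLD = 2  # "two or less" counts as too similar


def to_int(mac: str) -> int:
    """Parse 'aa-bb-cc-dd-ee-ff' or 'aa:bb:cc:dd:ee:ff' as one integer."""
    return int(mac.replace("-", "").replace(":", ""), 16)


def too_similar(mac_a: str, mac_b: str) -> bool:
    """True if the Hamming OR the arithmetic difference is within THRESHOLD."""
    a, b = to_int(mac_a), to_int(mac_b)
    hamming = bin(a ^ b).count("1")
    arithmetic = abs(a - b)
    return hamming <= THRESHOLD or arithmetic <= THRESHOLD


# 1f -> 20: Hamming distance 6, but arithmetic difference 1.
print(too_similar("00-11-22-33-44-1f", "00-11-22-33-44-20"))  # True
# c vs. 8 in the last digit: arithmetic difference 4, Hamming distance 1.
print(too_similar("00-11-22-33-44-5c", "00-11-22-33-44-58"))  # True
# Hamming distance 6 and arithmetic difference 369: not flagged.
print(too_similar("00-11-22-33-44-1f", "00-11-22-33-45-90"))  # False
```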

