Comments (2)
A different example seen in the wild: 8a-5a-a3-30-a1-10 vs. 8a-5b-a3-30-a1-10
So we should likely filter out any WiFi networks whose BSSIDs differ by just a single character, and potentially even by two characters, in case the increment-by-one carries into the next digit, as in 1f to 20.
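A minimal sketch of such a filter, assuming BSSIDs arrive as hex strings with `-` or `:` separators. The names `normalize`, `hex_char_diff` and `too_similar` are hypothetical and not taken from ichnaea:

```python
def normalize(mac: str) -> str:
    """Strip separators and lowercase, leaving 12 hex characters."""
    digits = mac.replace("-", "").replace(":", "").lower()
    if len(digits) != 12 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def hex_char_diff(mac_a: str, mac_b: str) -> int:
    """Count the hex-digit positions at which the two addresses differ."""
    return sum(a != b for a, b in zip(normalize(mac_a), normalize(mac_b)))


def too_similar(mac_a: str, mac_b: str, max_diff: int = 2) -> bool:
    """Assumed rule: at most max_diff differing hex digits means 'same device'."""
    return hex_char_diff(mac_a, mac_b) <= max_diff
```

With `max_diff=2`, both the single-character example above and a 1f to 20 rollover in the last byte would be flagged.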
from ichnaea.
Following a discussion on the mailing list (which touched the same subject regarding a downloadable database):
sam tygier wrote:
One potential problem is hotspots with 2 BSSIDs. This is common with dual-band (2.4 + 5 GHz) routers. For mine the BSSIDs are the same apart from the last byte ('c' vs '8')
Which means we have to catch cases like xx:xx:xx:xx:xx:xc vs. xx:xx:xx:xx:xx:x8 (basically just one bit flipped, 1100 vs. 1000).
mvglasow wrote:
Subsequent/neighboring BSSIDs are a tricky matter: One is the case you describe, where a dual-technology device has two subsequent (or neighboring) BSSIDs. [...]
On the other hand, companies often purchase WiFi hardware in batches, resulting in multiple devices with adjacent BSSIDs in a single building or site. [...] these devices may be the only ones in range. Since the hardware in this case is operated by an entity rather than an individual, privacy is less of a concern.
When in doubt, privacy prevails, but we should keep in mind that there are cases in which it is desirable to get [a location supplying] two subsequent BSSIDs.
[D]evices with multiple, closely related BSSIDs are also a potential issue with online APIs. It would be interesting to know if they handle that case [..] and how they do it.
hannosch wrote:
I'm leaning towards something like a Levenshtein distance check, considering anything with a distance of two or less to be too similar.
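For reference, the check proposed in the quote could look roughly like the following sketch. It assumes the comparison runs on the 12 hex digits with separators removed. `levenshtein` and `too_similar_levenshtein` are hypothetical names, and the threshold of two is the one suggested in the quote:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def too_similar_levenshtein(mac_a: str, mac_b: str, threshold: int = 2) -> bool:
    """Assumed rule: edit distance of threshold or less means 'too similar'."""
    def strip(mac: str) -> str:
        return mac.replace("-", "").replace(":", "").lower()
    return levenshtein(strip(mac_a), strip(mac_b)) <= threshold
```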
Hanno, your example of 8a-5a-a3-30-a1-10 vs. 8a-5b-a3-30-a1-10 is interesting since it is a locally administered address (indicated by the xxxxxx1x pattern in the first byte) and the changed bit is in the first three bytes, which in an officially assigned MAC address would form the Organizationally Unique Identifier (OUI, vendor ID) part. We probably need to include all six bytes in the check for locally administered addresses. OTOH, xxxxxx0x indicates an officially assigned OUI, thus for these MACs it might be sufficient to compare the last three (NIC-specific) bytes.
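The distinction can be sketched as follows. The function names are hypothetical. The only fact relied on is that the second-least-significant bit of the first byte marks a locally administered address:

```python
def mac_bytes(mac: str) -> bytes:
    """Parse a MAC written with '-' or ':' separators into 6 raw bytes."""
    raw = bytes.fromhex(mac.replace("-", "").replace(":", ""))
    if len(raw) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return raw


def is_locally_administered(mac: str) -> bool:
    """True if the first byte matches the xxxxxx1x pattern."""
    return bool(mac_bytes(mac)[0] & 0b00000010)


def comparable_part(mac: str) -> bytes:
    """All six bytes for locally administered addresses,
    otherwise only the last three (NIC-specific) bytes."""
    raw = mac_bytes(mac)
    return raw if is_locally_administered(mac) else raw[3:]
```

For officially assigned addresses a caller would presumably also require the OUI bytes to match before comparing the NIC-specific part. That is an assumption here, not something settled in this thread.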
How about Hamming instead of Levenshtein distance? Hamming distance is basically a count of bits that need to be flipped to convert one stream of bits into another of the same length. The algorithm is simpler, and the additional transforms that would increase the Levenshtein distance by 1 (e.g. insert, delete, append) are probably not very relevant for MAC addresses.
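A bitwise Hamming distance over the full 48 bits is only a few lines. This is a sketch with a hypothetical function name, assuming both inputs are well-formed MAC strings:

```python
def hamming_distance(mac_a: str, mac_b: str) -> int:
    """Number of bit positions in which two MAC addresses differ."""
    def to_int(mac: str) -> int:
        return int(mac.replace("-", "").replace(":", ""), 16)
    # XOR leaves a 1 bit wherever the inputs differ; count those bits.
    return bin(to_int(mac_a) ^ to_int(mac_b)).count("1")
```

Both examples from this thread, 5a vs. 5b in the second byte and c vs. 8 in the last hex digit, differ in exactly one bit.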
Hamming distance won't catch the 1f to 20 case (011111 to 100000 has a Hamming distance of 6), hence we might want to also check for an arithmetic difference in the last byte (or in the last three bytes combined for officially assigned addresses, or in all bytes for locally administered ones). The criteria could be the same as for the Hamming distance check, i.e. if either check yields two or less, the addresses are probably too similar.
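The two criteria could be combined roughly as below. For simplicity this sketch treats all six bytes as one 48-bit integer for both checks rather than selecting bytes by address type. `probably_same_device` and the other names are hypothetical, and the threshold of two is the value floated in this thread:

```python
def _to_int(mac: str) -> int:
    """Interpret a MAC with '-' or ':' separators as a 48-bit integer."""
    return int(mac.replace("-", "").replace(":", ""), 16)


def hamming_distance(mac_a: str, mac_b: str) -> int:
    """Count of differing bits."""
    return bin(_to_int(mac_a) ^ _to_int(mac_b)).count("1")


def arithmetic_difference(mac_a: str, mac_b: str) -> int:
    """Absolute numeric difference between the two addresses."""
    return abs(_to_int(mac_a) - _to_int(mac_b))


def probably_same_device(mac_a: str, mac_b: str, threshold: int = 2) -> bool:
    """Too similar if either check yields threshold or less."""
    return (hamming_distance(mac_a, mac_b) <= threshold
            or arithmetic_difference(mac_a, mac_b) <= threshold)
```

The arithmetic check covers the 1f to 20 rollover, while the Hamming check covers a single flipped bit anywhere in the address, such as 5a vs. 5b in the second byte.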