ignis-sec / pwdb-public
A collection of all the data i could extract from 1 billion leaked credentials from internet.
License: MIT License
It would be very useful to have a list showing the top words that are A) very common in a given leak, and B) very uncommon in other leaks.
The most obvious entries would look redundant:
linkedin|linkedin
chegg|chegg
... etc., but there are less-obvious ones likely hiding in the data that you are very well-positioned to analyze.
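One way to surface such leak-specific words is sketched below. It assumes the data is available as a mapping from leak name to a list of passwords, which is a hypothetical input shape and not necessarily how the repository stores it. The function name `distinctive_words` and the add-one smoothing are illustrative choices as well. Each word is scored by its relative frequency inside the leak divided by its smoothed relative frequency in all other leaks.

```python
from collections import Counter


def distinctive_words(leaks, top_n=3, smoothing=1.0):
    """Rank words that are common in one leak and rare in the others.

    `leaks` maps a leak name to a list of passwords. This shape is an
    assumption made for this sketch, not the repository's actual format.
    """
    counts = {name: Counter(words) for name, words in leaks.items()}
    totals = {name: sum(c.values()) for name, c in counts.items()}
    result = {}
    for name, own in counts.items():
        other_total = sum(t for n, t in totals.items() if n != name)
        scored = []
        for word, k in own.items():
            other_k = sum(counts[n][word] for n in counts if n != name)
            inside = k / totals[name]
            # Smoothing keeps the ratio finite for words never seen elsewhere.
            outside = (other_k + smoothing) / (other_total + smoothing)
            scored.append((inside / outside, word))
        # Highest score first; ties broken alphabetically for stable output.
        scored.sort(key=lambda item: (-item[0], item[1]))
        result[name] = [word for _, word in scored[:top_n]]
    return result
```

With real data a minimum-count threshold would probably be needed so that one-off passwords do not dominate the ranking.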
The list is, indeed, mysterious. Interestingly, even though you had a huge dataset to start with, it is missing several passwords that match the pattern, and appear in a ton of records in HIBP, which means the 763K password list is hardly exhaustive.
"tgPw53j3kG" shows up 4354 times in HIBP
"odz1w1rB9T" appears 3769 times
"ZZ8807zpl" appears 7508 times
Any chance you could match the passwords to the emails they were used with, to see if there's a pattern? E.g., in the case of the passwords above, the first one shows up primarily next to gmail.com addresses in my (very limited) dataset, whereas the other two belong to hotmail users with very similar usernames (but not always! there are exceptions, too). This suggests to me that these could be either mass account takeovers, where the attackers would reset all passwords to a single password, or auto-generated email accounts used for bot farms.
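A rough sketch of that check, assuming the records can be read as (email, password) pairs. That input shape and the name `domains_by_password` are assumptions for illustration only. It tallies the email domains seen next to each password of interest.

```python
from collections import Counter, defaultdict


def domains_by_password(records, passwords):
    """Count email domains per password of interest.

    `records` is an iterable of (email, password) pairs, which is a
    hypothetical shape; the real dumps may be laid out differently.
    """
    wanted = set(passwords)
    tally = defaultdict(Counter)
    for email, password in records:
        if password in wanted and "@" in email:
            # Use the part after the last "@" as the domain, lowercased.
            domain = email.rsplit("@", 1)[1].lower()
            tally[password][domain] += 1
    return tally
```

Calling `tally[pw].most_common(5)` on the result would show whether one provider dominates for a given password. Comparing the usernames for similarity would need a separate step.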
Hello, what kinds of resources would you recommend for learning Python?
Also, I have basic knowledge of PHP and JS; would that help with learning Python?
It would be interesting to know what the most common patterns are.
I'm thinking of something like converting all the passwords to pattern masks and sorting the masks by number of occurrences.
For example, the following passwords:
123456
password
Password123
Summer2019!
would be translated to the following pattern masks:
dddddd
llllllll
ulllllllddd
ullllldddds
where:
lowercase letter is l
uppercase letter is u
digit is d
special character is s
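The conversion described above could be sketched roughly as follows. The function names `to_mask` and `most_common_masks` are made up for this example, and treating every character that is not a digit or a cased letter as "special" is an assumption about how edge cases should be handled.

```python
from collections import Counter


def to_mask(password):
    """Translate a password into its l/u/d/s pattern mask."""
    mask = []
    for ch in password:
        if ch.isdigit():
            mask.append("d")
        elif ch.islower():
            mask.append("l")
        elif ch.isupper():
            mask.append("u")
        else:
            # Assumption: anything else (punctuation, spaces, caseless
            # letters) is counted as a special character.
            mask.append("s")
    return "".join(mask)


def most_common_masks(passwords, top_n=10):
    """Return (mask, count) pairs sorted by number of occurrences."""
    return Counter(to_mask(p) for p in passwords).most_common(top_n)
```

For the four example passwords, `to_mask` gives the same masks as listed above. Weighting each password by its occurrence count in the dataset, instead of counting each distinct password once, would be a small change to `most_common_masks`.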
The Spanish words seem OK. You may run into some issues if the letter "Ñ" is used; it is an n with a little ~ on top of it.
If you can, please consider the country of Colombia, domain is ".co"
Javier
Would it be possible to dump password, ratio_of_occurrence_in_the_db?
That would show not only which passwords are popular, but also by how much.
Thank you!
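For reference, the requested password / ratio dump could be computed along these lines. This is a minimal sketch that assumes the passwords fit in memory as a plain list; the name `occurrence_ratios` is hypothetical and nothing here reflects how the repository actually produced its lists.

```python
from collections import Counter


def occurrence_ratios(passwords):
    """Return (password, ratio) pairs, most frequent first.

    The ratio is the password's count divided by the total number of
    password entries, so all ratios sum to 1.
    """
    counts = Counter(passwords)
    total = sum(counts.values())
    return [(pw, n / total) for pw, n in counts.most_common()]


if __name__ == "__main__":
    # Tiny made-up sample, only to show the output format.
    for pw, ratio in occurrence_ratios(["123456", "123456", "password", "qwerty"]):
        print(f"{pw},{ratio:.6f}")
```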
Did you capture usernames / email addresses in your data set? Can you determine password uniqueness, or lack thereof, per email address? For example, what fraction of the passwords associated with a specific username (or email address, if relevant) are unique, and how does that fraction vary with the number of times the username appears in the data set (i.e., password reuse vs. number of records matching the username)? Thanks!
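The measurement asked about here might look like the sketch below, assuming (username, password) pairs are available. That shape and the name `uniqueness_by_record_count` are assumptions for illustration. For each username it computes the fraction of its passwords that are distinct, then averages that fraction over all usernames with the same number of records.

```python
from collections import defaultdict


def uniqueness_by_record_count(records):
    """Map number of records per username -> mean fraction of unique passwords.

    `records` is an iterable of (username, password) pairs (hypothetical
    shape). Usernames are compared case-insensitively.
    """
    per_user = defaultdict(list)
    for username, password in records:
        per_user[username.lower()].append(password)

    buckets = defaultdict(list)
    for passwords in per_user.values():
        # 1.0 means every record had a different password; lower means reuse.
        buckets[len(passwords)].append(len(set(passwords)) / len(passwords))

    return {n: sum(fracs) / len(fracs) for n, fracs in sorted(buckets.items())}
```

A falling average as the record count grows would indicate that users who appear in many leaks tend to reuse passwords.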
What are the usage rights of this work?
First of all, thank you for your work. It would be useful if you shared the methods or tools you used to process the 1B records. We are also very interested in that part of the study.
Did you notice any sort of trend for particular types of dump?
For example, in shopping-type dumps, is there a certain 'type' or 'format' of password that is specifically used or trending?