Giter VIP home page Giter VIP logo

pwdb-public's People

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

pwdb-public's Issues

top 'leak fingerprint' words (frequent unique passwords per leak)

It would be very useful to have a list that showed the top words that are A) very common to a leak, and B) very uncommon in other leaks.

The most obvious entries would look redundant:

linkedin|linkedin
chegg|chegg

... etc., but there are less-obvious ones likely hiding in the data that you are very well-positioned to analyze.

The mystery list

The list is, indeed, mysterious. Interestingly, even though you had a huge dataset to start with, it is missing several passwords that match the pattern, and appear in a ton of records in HIBP, which means the 763K password list is hardly exhaustive.

"tgPw53j3kG" shows up 4354 times in HIBP
"odz1w1rB9T" appears 3769 times
"ZZ8807zpl" appears 7508 times

Any chance you could match the passwords to emails they were used with, to see if there's a pattern? E.g., in the case of the passowrds above the first one shows up primarily next to gmail.com addresses in my (very limited) dataset, whereas the other two belong to hotmail users with very similar usernames (but not always! there are exceptions, too). It hints me that these could be either mass account takeovers where the attackers woudl reset all passowrds to a single password, or auto-generated email accounts used for botfarms.

-bug veya sorun degildir-

Merhabalar, Python öğrenmek icin ne tür kaynakları önerirsiniz?
ve temel seviyede php ve js bilgim var python ögrenme konusunda yardımcı olur mu ?

Most common pattern masks

It would be interesting to know what the most common patterns are.

I'm thinking something like converting all the passwords to pattern masks and sorting them by most occurances.

For example, the following passwords:

123456
password
Password123
Summer2019!

would be translated to the following pattern masks:

dddddd
llllllll
ulllllllddd
ullllldddds

where:
lowercase letter is l
uppercase letter is u
digit is d
special character is s

Spanish 150 word list and a suggestion

  1. the Spanish words seems ok You may run in some isues if the word "Ñ" is used, its an n with a litle ~ on top of it

  2. If you can, please consider the country of Colombia, domain is ".co"

Javier

Occurrence ratios

Would it be possible to dump password , ratio_of_occutrence_in_the_db?

That would allow to see not only which passwords are popular but also know by how much.

Thank you!

Recurrence by email address / username

Did you capture usernames / email addresses in your data set? Can you determine uniqueness or lack thereof by email addresses? For example, what fraction of the passwords associated with a specific username (email address if relevant) are unique, and how does that vary with the number of duplicates of the username (i.e., reuse of passwords vs # of times the username is matched in the data set). Thanks!

License

What are the usage rights of this work?

Which methods & tools used?

First of all, thank you for your work. What method or tool you used to process 1B data will be useful if you share the original methods you used. We are also very interested in this part of the study.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.