
diminutives.db

This project is a database of common English diminutives (nicknames and shortened forms) of formal given names. It is useful whenever you need to search lists of people's names for matches in a way that tolerates common colloquial variation. For example, "Daniel" may appear in databases as "Danny" or "Dan", and "Catherine" as "Cathie" or "Kate".

Methodology

The databases of diminutives, male_diminutives.csv and female_diminutives.csv, are manually edited versions of data that was automatically extracted from Wiktionary by the PHP script bin/generate_diminutives_csv.php. Manual editing is needed for three reasons. First, although the script does a good job of extracting information from Wiktionary, it cannot process every page on diminutives. Second, it mistakes various capitalized words (e.g. "Sometimes" and "Popular") for proper nouns. Third, it is designed to stop processing an article and simply print the article's title to the console if it detects irregularities, so those articles have to be handled by hand.

The output of generate_diminutives_csv.php is stored in the gen folder for reference. Each time the script is run, any resulting changes to the files in gen need to be applied to male_diminutives.csv and female_diminutives.csv.

Format of the CSV files

Each line of male_diminutives.csv and female_diminutives.csv consists of a formal given name followed by common diminutives of that name. For example, the following line from male_diminutives.csv indicates that "Nat" and "Nate" are common diminutives of the given name "Nathaniel":

Nathaniel,Nat,Nate

The CSV files are encoded in UTF-8.
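
As an illustration of how this format might be consumed, the following Python sketch parses rows into two lookup tables and uses them for a tolerant name comparison. The function names load_diminutives and same_person_name are hypothetical and not part of this project, and the comparison rule (two names match if they are equal or share a formal name) is only one possible choice.

```python
import csv
from collections import defaultdict


def load_diminutives(lines):
    """Parse rows of the form 'Formal,Dim1,Dim2,...' into two lookup tables.

    `lines` is any iterable of text lines, such as an open CSV file.
    """
    formal_to_dims = {}
    dim_to_formals = defaultdict(set)
    for row in csv.reader(lines):
        # Drop empty cells and skip blank rows.
        names = [cell.strip() for cell in row if cell.strip()]
        if not names:
            continue
        formal, dims = names[0], names[1:]
        formal_to_dims.setdefault(formal, []).extend(dims)
        for dim in dims:
            # Index diminutives case-insensitively.
            dim_to_formals[dim.lower()].add(formal)
    return formal_to_dims, dict(dim_to_formals)


def same_person_name(a, b, dim_to_formals):
    """Return True if a and b are equal or share a formal name (assumed rule)."""
    def candidates(name):
        key = name.strip().lower()
        # A name stands for itself and for every formal name it abbreviates.
        return {key} | {f.lower() for f in dim_to_formals.get(key, ())}
    return bool(candidates(a) & candidates(b))


# Sample row taken from this README. A real program would instead pass
# open("male_diminutives.csv", encoding="utf-8", newline="").
sample = ["Nathaniel,Nat,Nate"]
formal_to_dims, dim_to_formals = load_diminutives(sample)
print(formal_to_dims["Nathaniel"])
print(same_person_name("Nate", "Nathaniel", dim_to_formals))
print(same_person_name("Nate", "Nat", dim_to_formals))
```

Because some diminutives belong to more than one formal name, the reverse table maps each diminutive to a set of formal names rather than to a single one.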

Special exceptions

You should be aware of the following special case, which cannot be represented in the databases because it depends on a person's initials rather than on a given name:

  • When a man's initials are J.E.B., he may go by Jeb.
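
Since this case cannot live in the CSV files, an application would have to check it separately. A minimal sketch, assuming the application has the person's name parts or initials available; initials_nickname and may_go_by are hypothetical helpers, not part of this project:

```python
def initials_nickname(*names):
    """Join the first letter of each non-empty name part, e.g. 'Jeb'."""
    letters = [part.strip()[0] for part in names if part.strip()]
    return "".join(letters).capitalize()


def may_go_by(nickname, *names):
    """Return True if nickname equals the initials of names, ignoring case."""
    return nickname.strip().lower() == initials_nickname(*names).lower()


# The initials from the special case above.
print(initials_nickname("J.", "E.", "B."))
print(may_go_by("Jeb", "J.", "E.", "B."))
```

A check like this would match any name whose initials happen to spell the nickname, so it is best applied only to nicknames known to arise this way.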

License

The scripts are licensed under the terms of the GNU General Public License, version 3 or any later version.

The data in male_diminutives.csv and female_diminutives.csv are in the public domain.

diminutives.db's People

Contributors

dtrebbien
