list-of-dirty-naughty-obscene-and-otherwise-bad-words's Introduction

Our List of Dirty, Naughty, Obscene, and Otherwise Bad Words

With millions of images in our library and billions of user-submitted keywords, we work hard at Shutterstock to make sure that bad words don't show up in places they shouldn't. This repo contains a list of words that we use to filter results from our autocomplete server and recommendation engine.

Please add to it as you see fit (particularly in non-English languages) or use it to spice up your next game of Scrabble :)

Obvious warning: These lists contain material that many will find offensive. (But that's the point!)

Miscellaneous caveat: Clearly, what goes in these lists is subjective. In our case, the question we use is, "What wouldn't we want to suggest that people look at?" This of course varies across cultures, languages, and geographies, so in the end we just have to make our best guess.

Languages

Name Code
Arabic ar
Chinese zh
Czech cs
Danish da
Dutch nl
English en
Esperanto eo
Filipino fil
Finnish fi
French fr
French (CA) fr-CA-u-sd-caqc
German de
Hindi hi
Hungarian hu
Italian it
Japanese ja
Kabyle kab
Klingon tlh
Korean ko
Norwegian no
Persian fa
Polish pl
Portuguese pt
Russian ru
Spanish es
Swedish sv
Thai th
Turkish tr

See also the list of projects, documents, and organizations that use these lists.

Node Module

If you are using the word lists as .json, or in an npm project, you can install the word list using the naughty-words package.

npm install naughty-words

© 2012–2020 Shutterstock, Inc.

This work is licensed under a Creative Commons Attribution 4.0 International License.

list-of-dirty-naughty-obscene-and-otherwise-bad-words's Issues

Add a license?

Hi there, thanks for providing this dirty word list, and in several languages no less. Would it be possible to add an official LICENSE file to this repository? I'd be happy to send a pull request containing the right license, just let me know which license you prefer.

If you do not want to add the LICENSE file, would someone be able to describe the license in this issue? Thank you.

Remove words that are not offensive or proper names?

Well, I am Dutch, and after finding a place where a friend's name was blocked I noticed that you really have a list that is kind of.. Bad.

There are lots of words that are clearly not offensive except in an already really explicit use:

asbak - ashtray -> never ever used
aso - short for "asocial person" and only ever used like it would be in English
balen - I guess this is a translation by google translate from a version of "sucks", it is really just "to be fed up/annoyed with something" and is actually considered to be the proper formal way to say it. The other translation is a "bundle" in the farming sense (bundle of wheat).
bedonderen - literally to cheat, but hardly ever (never?) used in an obscene way.
belazeren - same as above, but even more formal. Used mostly by media to state "die politicus belazerde de boel" - "that politician lied/cheated about things he said/did".
besodemieteren/besodemieterd zijn - to "bugger" or "screw" somebody over.. But again not used in obscene sense EVER.
beurt - Ok this can be used in an obscene sense. But it is also one of the 1000 most commonly used words in non obscene sense. "jij bent aan de beurt/het is jouw beurt" - it is your turn. "omstebeurt" - one after another.
de hond uitlaten - this is a joke right? It's a common version of "my dad went to the store and never came back". But it still has the normal meaning of "walking the dog" - and there is no other way to say that
dombo - lit. "dumb person" and in a joking not obscene manner (like one would say to a toddler after he did something stupid).
droogkloot - someone who makes silly jokes
gras maaien - mowing the lawn...???
hol - "cave", honestly used in the same sense as the English version.
hufter - asocial person, often used by politicians.
klootjesvolk - everyone who isn't a white collar worker.
nicht - niece
op z'n sodemieter geven - archaic to "blame someone and punish him for it" often as done by a parent/grandparent to a child
opzouten - to go (away) ???
ouwehoeren - to chat ?????????
publiciteitsgeil - yearning for public attention
teef - bitch (but only used for dogs)
vergallen - to mess up the atmosphere not used in obscene sense
verkloten - to mess up something not used in obscene sense
voor jan lul - saying that you went/are somewhere but have no purpose
voor jan-met-de-korte-achternaam - same as above

And then the worst:

anita - That's a girl's name, quite offensive that you consider it obscene.

pt_PT vs pt_BR

You should really think about separating "pt" into pt_PT and pt_BR.
As a Portuguese person who speaks and reads pt_PT, I don't know half of the words on this "pt" list. The other half are almost all not offensive at all!
"aborto" means "miscarriage"
"amador" means "amateur"
"aranha" means "spider"
"burro" means "donkey"
"cerveja" means "beer"
"comer" means "to eat"
"frango assado" means "roasted chicken"
"heterosexual" well.. you know what that means
"inferno" I believe you know what that means too
"torneira" means "faucet"
... and so on.

If I were to put this list in a Portuguese discussion board's blacklist, people wouldn't be able to write a complete sentence.
For a Portuguese, this list is useless.

Why is there no entry for "idiot"?

I think that the first version of this repo should include the most obvious bad words like "idiot", but this word is not part of the list. Why is that?

Thank you.

Include uppercase version of words on German list

In German the first letter of nouns is generally a capital letter. The German bad word list contains mostly nouns, which are written in lowercase. I think this might lead to many situations where bad words are missed because people are not aware of this aspect of German and don't check for the uppercase version of the bad word in some given text.

Maybe all nouns in the German bad word lists should begin with a capital letter?
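
One alternative to capitalizing the list entries is to compare case-insensitively in the filter itself. The following is a minimal sketch of that idea, not how Shutterstock's filter works; the function name `contains_listed_word` and the placeholder word are assumptions made for illustration.

```python
import string
from collections.abc import Iterable


def contains_listed_word(text: str, words: Iterable[str]) -> bool:
    """Return True if any whitespace-separated token of text is on the list.

    Both sides are compared with str.casefold(), so "Beispiel", "beispiel"
    and "BEISPIEL" are treated alike, and "ß" compares equal to "ss".
    """
    folded = {word.casefold() for word in words}
    # Strip surrounding ASCII punctuation so "Beispiel." still matches.
    tokens = (token.strip(string.punctuation) for token in text.split())
    return any(token.casefold() in folded for token in tokens)


# Hypothetical placeholder entry standing in for a lowercase list word.
print(contains_listed_word("Ein Beispiel.", {"beispiel"}))
```

With this approach the list can stay lowercase, and the capitalization rule for German nouns stops mattering to the match.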

Wording

Hey @patch,

is there a standard or defined way the words should be added? Singular or plural? Phrases as separate words or as one word ("hello world" vs. "helloworld")? Maybe all versions to allow simpler matching? Interested to hear what your thoughts are @patch.

Cheers :)

Italian bad word omitted

Inculare (or inculato, inculata)
The slang verb for sodomizing, also commonly used to express being scammed or otherwise damaged such as "I bought this and the day after the price halved. Che inculata!"

New list format suggestion

I think the list format is too simple and is missing some features:

  1. weighting/score: some words are unmistakably insulting/bad/etc. but others are more inoffensive or even ambiguous, therefore there should be some weighting like a score from 1-10 (for very bad to slang)
  2. matching: most words have simple plural versions and some words have multiple variants (e.g. German umlauts ä, ö and ü can be expressed as ae, oe and ue, or the letter ß can be written ss) or letters can be left out. Maybe this should be left to the filter implementation, but this would be difficult if you don't know the language
  3. grouping/categories: it would be nice to have some sort of grouping or categories like crime, violence, pornography, illegal drugs, insults etc. And it would be even nicer if words can be in multiple categories (because insults can be used in "dirty talk" or violence in crime…)
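
As an illustration of the matching point in item 2, the alternative German spellings could be generated mechanically. This is only a sketch: the `VARIANTS` table and the function name `spelling_variants` are assumptions, the mapping covers just the four lowercase substitutions named above, and the sample word is a harmless placeholder.

```python
from itertools import product

# Alternative spellings named in the list above; illustrative, not exhaustive.
VARIANTS = {
    "ä": ("ä", "ae"),
    "ö": ("ö", "oe"),
    "ü": ("ü", "ue"),
    "ß": ("ß", "ss"),
}


def spelling_variants(word: str) -> set[str]:
    """Return every spelling of a lowercase word under the VARIANTS table."""
    # Each character contributes either itself or its listed alternatives.
    choices = [VARIANTS.get(ch, (ch,)) for ch in word]
    return {"".join(combo) for combo in product(*choices)}


print(sorted(spelling_variants("größe")))
```

The number of variants doubles with each substitutable letter, so a filter might prefer to normalize the input text once instead of expanding every list entry.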

I'd suggest a rather simple format like CSV (comma separated values) with individual files for groups and the word lists, e.g.:
The groups file with unique group IDs:

#0 should be reserved for uncategorized
crime;1
violence;2
insult;3
# …

And the word list with regular expressions (you can check them with RegExr or similar tools) optionally followed by the group IDs (can be left blank or set to 0 for uncategorized, multiple groups separated by ,) and the score from 1–10 (0 or empty for unrated):

cock(?!pit);;7 # This is a nice one: matches 'cock' but not 'cockpit' (uncategorized)
idiots?;3;7 # matches 'idiot' and 'idiots'
motherfucker;3;10
rap(e|ist|ing);1,2;6 # matches 'rape', 'rapist' and 'raping' but NOT 'rap'
# …

A small issue in this format is that matches are weighted the same, maybe sub-pattern matching could be used to rate each, but I don't know if this is needed (e.g. the pattern ((ass)(hole)?) results in three groups: ass, asshole and hole and multiple comma separated ratings apply to each group in order: ((ass)(hole)?);3;4,7).
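
To make the proposal concrete, here is a rough sketch of how the word-list lines above might be parsed and applied. It assumes that patterns contain neither `;` nor `#`, that everything after `#` is a comment, and that matching is case-insensitive; the names `Entry`, `parse_word_list`, and `max_score` are hypothetical, and per-group ratings are not handled.

```python
import re
from dataclasses import dataclass


@dataclass
class Entry:
    pattern: re.Pattern
    groups: tuple[int, ...]  # empty tuple means uncategorized
    score: int               # 0 means unrated


def parse_word_list(text: str) -> list[Entry]:
    """Parse lines of the form 'regex;group_ids;score # comment'."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (assumption)
        if not line:
            continue
        # Pad so that missing group and score fields default to empty.
        pattern, groups, score = (line.split(";") + ["", ""])[:3]
        entries.append(Entry(
            pattern=re.compile(pattern, re.IGNORECASE),
            groups=tuple(int(g) for g in groups.split(",") if g.strip()),
            score=int(score) if score.strip() else 0,
        ))
    return entries


def max_score(text: str, entries: list[Entry]) -> int:
    """Return the highest score among matching entries, or 0 if none match."""
    return max((e.score for e in entries if e.pattern.search(text)), default=0)


sample = "# comment line\ncock(?!pit);;7 # uncategorized\nidiots?;3;7\n"
entries = parse_word_list(sample)
print(max_score("the cockpit door", entries), max_score("Two idiots", entries))
```

Taking the maximum score is one possible way to combine matches; summing or counting matches per group would be equally plausible under this format.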

Some of the ideas (weighting and groups) were taken from this list: http://contentfilter.futuragts.com/phraselists/

What do you think?

P.S.: Somehow I feel guilty for contributing to a filter/censorship list, but I think it can be useful to some extent to keep trolls and unconstructive discussions away. I hope these lists will be used responsibly…

French words

Hi,

I've read the French list, and some words are strangely part of it.

Allumé means alight
Bosser is a common word for work
Veuve means widow
teuf is a slang word for party but is not pejorative
folle means crazy
bourré can mean stuffed

Goal of this List/Normal german words

Hi,

I found some German words which seem rather normal to me. As I don't know in which direction this list is going, I thought I would provide some context for them:

geil: colloquially used to describe something good/awesome. Also used to describe the state of sexual arousal
Hupen: literally is just multiple horns. Is also used as a metaphor for female breasts.
Knackwurst: Is just a type of sausage. If it were used as an insult, that insult would be very weak. Maybe you mean Kackwurst?
Latte: colloquial word for a type of wood. Also describes an erect penis.
Milchtüten: Is just the packaging of milk. I could imagine this could refer to female breasts, but I never heard someone call them this way.
Picheln: Is colloquial for drinking alcohol.

Maybe you should consider sorting the list into strong bad words and weak bad words. There are many figures of speech which describe the act of masturbation or are used to describe genitalia and female breasts. If you banned them all you would cripple the German language.
If you could clarify the goals of this list, one would be better able to complete it.

CS dršťka is not a swear word

This is actually a common word describing beef stomach as an ingredient in meals. See "dršťková polévka". It is sometimes used in a similar form, but when it is used as a swear word, the form "držka" is used most of the time, because the people swearing do not actually waste time writing it properly. Including the full form in this list will falsely filter out a lot of texts containing information about cooking, eating or traditional Czech meals.

Duplicated words in lists

(Edited due to some unicode weirdness)
Several files have duplicated words in their list.

$ uniq -d da
kussekryller

$ uniq -d zh
口交
性交
性爱
阴茎
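
Note that `uniq -d` only reports repeats on adjacent lines, so it finds every duplicate only when a file is sorted. As a hedged alternative, the sketch below counts entries regardless of order; the function name `find_duplicates` is an assumption, and the file name `da` in the comment simply mirrors the command above.

```python
from collections import Counter
from collections.abc import Iterable


def find_duplicates(lines: Iterable[str]) -> list[str]:
    """Return entries that occur more than once, ignoring blank lines."""
    # Strip whitespace so trailing spaces or newlines do not hide a repeat.
    counts = Counter(line.strip() for line in lines if line.strip())
    return sorted(word for word, n in counts.items() if n > 1)


# Possible usage against one of the list files:
#   with open("da", encoding="utf-8") as handle:
#       print(find_duplicates(handle))
print(find_duplicates(["b", "a", "b", "c", "a "]))
```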

Add Russian bad words (now only 151, but we have more!)

падла
нафиг
жлоб
мразь
нахуй
гад
хуйлуша
ебаный
идиот
козел
негодяй
дегенерат
трахнуть
мразота
нахуя
ебанутые
дурак
кретин
хуипи
падла
онанист
дифичент
мерзавец
говно собачье
хуя
хуй пинать
гнидр
блять
ебаный
впизду
мать твою
засранец вонючий
пидрила
ублюдок
мудак
чёртов
дибилы

About pull requests.

There are many PRs here. Is there any reason why the owner and collaborators are not responding to them?

Some English bad word lists

ethical slut?

That's ... a book about open relationships and polyamory ... I ... wtf is that doing on there?

(note that I'm absolutely not intending to get into an argument about the reason, I just can't work out what the heck the reason is)

You forgot…

… chocolatine !
Well, joking aside, sucer is not necessarily a bad word; a «pastille à *****» or « ***** la moëlle» would be weird.

Suggestion

I think you should include 1 guy 1 jar, if you have included 2 girls 1 cup. Just in case, you know ... from the girls' perspective

Milestone: Over 81 Welsh Swears Translated

My pull request on Welsh swear words (ISO cy) has now got over 81 words! This is more than the lists for Arabic, Czech, Danish, Esperanto, Filipino, Canadian French, Kabyle, Klingon, Korean, Norwegian, Persian, Polish, Portuguese, Spanish and Thai.

Potential poor choice of words

Bisexual, homosexual, or similar words that describe sexual orientation should not be considered dirty, naughty or obscene in any manner.

But maybe I misunderstood the purpose of this project.

[ITA] Adding

boccaciccio
meretrice
sise
soffocotto
uallera
zinne

Request for collaborator access

@jacobemerick apparently you can give people this access. Could you please give me this access?

Now the reason I’d like collaborator access is so I can add new language files. I’ve made lists for Welsh and Afrikaans but because I don’t have collaborator access they become pull requests.

Thanks.
