Giter VIP home page Giter VIP logo

go-away's Introduction

go-away

go-away

build Go Report Card codecov Go Reference Follow TwinProduction

go-away is a stand-alone, lightweight library for detecting profanities in Go.

This library must remain extremely easy to use. Its original intent of not adding overhead will always remain.

Installation

go get -u github.com/TwinProduction/go-away

Usage

import (
	"github.com/TwinProduction/go-away"
)

goaway.IsProfane("fuck this shit")         // returns true
goaway.IsProfane("F   u   C  k th1$ $h!t") // returns true
goaway.IsProfane("@$$h073")                // returns true
goaway.IsProfane("hello, world!")          // returns false

By default, IsProfane uses the default profanity detector, but if you'd like to disable leet speak, numerical character or special character sanitization, you have to create a ProfanityDetector instead:

profanityDetector := goaway.NewProfanityDetector().WithSanitizeLeetSpeak(false).WithSanitizeSpecialCharacters(false).WithSanitizeAccents(false)
profanityDetector.IsProfane("b!tch") // returns false because we're not sanitizing special characters

By default, the NewProfanityDetector constructor uses the default dictionaries for profanities, false positives and false negatives. These dictionaries are exposed as goaway.DefaultProfanities, goaway.DefaultFalsePositives and goaway.DefaultFalseNegatives respectively.

If you need to load a different dictionary, you could create a new instance of ProfanityDetector on this way:

profanities    := []string{"ass"}
falsePositives := []string{"bass"}
falseNegatives := []string{"dumbass"}

profanityDetector := goaway.NewProfanityDetector().WithCustomDictionary(profanities, falsePositives, falseNegatives)

In the background

While using a giant regex query to handle everything would be a way of doing it, as more words are added to the list of profanities, that would slow down the filtering considerably.

Instead, the following steps are taken before checking for profanities in a string:

  • Numbers are replaced to their letter counterparts (e.g. 1 -> L, 4 -> A, etc)
  • Special characters are replaced to their letter equivalent (e.g. @ -> A, ! -> i)
  • The resulting string has all of its spaces removed to prevent w ords lik e tha t
  • The resulting string has all of its characters converted to lowercase
  • The resulting string has all words deemed as false positives (e.g. assassin) removed

In the future, the following additional steps could also be considered:

  • All non-transformed special characters are removed to prevent s~tring li~ke tha~~t
  • All words that have the same character repeated more than twice in a row are removed (e.g. poooop -> poop)
    • NOTE: This is obviously not a perfect approach, as words like fuuck wouldn't be detected, but it's better than nothing.
    • The upside of this method is that we only need to add base bad words, and not all tenses of said bad word. (e.g. the fuck entry would support fucker, fucking, etc.)

go-away's People

Contributors

twin avatar mattwhite180 avatar sp-adrian-perez avatar psidex avatar tommahle avatar finnbear avatar shane-od avatar three7six avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.