
censor() and censor_word() give different results for profanity (profanity-filter, closed)

nataliGitHub commented on May 29, 2024

censor() and censor_word() from profanity-filter give different results for profanity.

Comments (3)

rominf commented on May 29, 2024

I fixed the formatting for you. Please learn GitHub Flavored Markdown: unformatted code is hard to read.
You are using censor_word the wrong way: it is designed to censor single words, but you are passing it a phrase, so it will not behave as expected.
You have also found an actual bug: the function that extracts lemmas did not call str.lower on its input. This is already fixed; please try the latest version from PyPI.
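To illustrate both points (why the case of the input mattered, and why a phrase should not be handed to a word-level function), here is a toy sketch. It is not profanity-filter's implementation: PROFANE_LEMMAS, censor_word and censor_text below are hypothetical stand-ins, and lemma extraction is reduced to a plain lowercase dictionary lookup.

```python
import re

# Hypothetical word list; the real library uses lemmatization and fuzzy matching.
PROFANE_LEMMAS = {"darn", "heck"}


def censor_word(word: str) -> str:
    """Censor a single word (toy version, not the library's function).

    Lowercasing before the lookup is the kind of fix described above:
    without it, "DARN" would not match the lowercase entry "darn".
    """
    if word.lower() in PROFANE_LEMMAS:
        return "*" * len(word)
    return word


def censor_text(text: str) -> str:
    """Censor a phrase by splitting it into words first (toy version)."""
    # re.sub calls censor_word once per run of word characters,
    # leaving spaces and punctuation untouched.
    return re.sub(r"\w+", lambda m: censor_word(m.group(0)), text)


print(censor_word("DARN"))           # ****
print(censor_word("darn it"))        # darn it  (a phrase is not a single word)
print(censor_text("darn it, HECK"))  # **** it, ****
```

The word-level function treats its whole argument as one token, so a phrase never matches; the text-level function does the tokenizing and delegates each word.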

Here is how it works now:

In [1]: from profanity_filter import ProfanityFilter

In [2]: pf = ProfanityFilter(censor_whole_words=False)

In [3]: pf.censor("FUKUHARA")
Detector is not able to detect the language reliably.
Out[3]: '******RA'

In [4]: pf.censor_word('FUKUHARA')
Out[4]: Word(uncensored='FUKUHARA', censored='******RA', original_profane_word='fukka')

In [5]: # Add "FUKUHARA" to the dictionary so that it is not censored

In [6]: pf.spells['en'].add('FUKUHARA')
Out[6]: 0

In [7]: pf.clear_cache()  # Required, because `pf` remembers that 'FUKUHARA' is a profane word

In [8]: pf.censor_word('FUKUHARA')
Out[8]: Word(uncensored='FUKUHARA', censored='FUKUHARA', original_profane_word=None)
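The clear_cache() step is needed because verdicts are remembered per word. A minimal sketch of that failure mode, assuming a memoized check built with functools.lru_cache; PROFANE_PREFIXES, KNOWN_WORDS and is_profane are hypothetical and do not reflect how profanity-filter actually stores its cache or its Hunspell dictionaries.

```python
from functools import lru_cache

# Hypothetical data: a profane prefix and a user dictionary of known-good words.
PROFANE_PREFIXES = ("fuk",)
KNOWN_WORDS: set[str] = set()


@lru_cache(maxsize=None)
def is_profane(word: str) -> bool:
    """Flag a word unless the dictionary vouches for it (result is memoized)."""
    w = word.lower()
    if w in KNOWN_WORDS:
        return False
    return w.startswith(PROFANE_PREFIXES)


print(is_profane("FUKUHARA"))  # True
KNOWN_WORDS.add("fukuhara")    # analogous to pf.spells['en'].add(...)
print(is_profane("FUKUHARA"))  # True: the stale cached verdict is returned
is_profane.cache_clear()       # analogous to pf.clear_cache()
print(is_profane("FUKUHARA"))  # False
```

Updating the dictionary changes the inputs to the check but not the memoized results, so the cache has to be invalidated explicitly.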