profanity.dev's Issues
Feature Request: Add PDF File Input and API Support
Description
Currently, the website and API only support text input in the form of JSON. However, it would be beneficial to be able to upload PDF files and send them through the API for processing.
Proposed Solution
- PDF File Input: Add a new file input field or drag-and-drop area to allow users to upload PDF files directly into the application.
- API Support for PDF Files: Extend the existing API endpoints to accept PDF file uploads and process them accordingly. This could involve converting the PDF to text or images, depending on the use case.
@joschan21, I would like to work on this if you approve of this feature.
Creative slurs to consider
Every online FPS:
commit not live
Avatar the Last Air Bender:
I'll end you
Normalise score
After running this on many different pieces of text, I got no score lower than 0.7106113. So naturally it would make sense to assume that the output is between 0.7 and 1. If the score were calculated by Math.max((vector.score - 0.7) / 0.3, 0), the score would be a number between 0 and 1, and a threshold of 0.85 using the original formula would be 0.5 (50%) using this adaptation, which makes more sense. The score is also clamped at 0, so if somehow the raw score was less than 0.7, the score outputted would never be negative.
If you wanted 50% to be equivalent to the current 86% threshold, you would do Math.max((vector.score - 0.72) / 0.28, 0).
It may be useful to return this as a normalizedScore / normalisedScore property from the API so as to not break anything currently using the score property.
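A minimal sketch of the rescaling described above. The function name `normalizeScore` and its `floor` parameter are hypothetical and not part of the profanity.dev API; the upper clamp at 1 is an extra safety assumption that the formula in the issue does not include.

```typescript
// Hypothetical helper: rescales a raw similarity score from [floor, 1] to [0, 1].
// The name and the default floor of 0.7 are assumptions taken from the issue text.
function normalizeScore(rawScore: number, floor: number = 0.7): number {
  const scaled = (rawScore - floor) / (1 - floor);
  // Clamp so that raw scores below the floor map to 0 and nothing exceeds 1.
  return Math.min(Math.max(scaled, 0), 1);
}

console.log(normalizeScore(0.85).toFixed(2)); // "0.50"
console.log(normalizeScore(0.86, 0.72).toFixed(2)); // "0.50"
console.log(normalizeScore(0.5)); // 0
```

With the default floor, the existing 0.85 threshold lands at roughly 0.5, which matches the arithmetic in the issue.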
Unclear LICENSE
I was unable to find a LICENSE file at the root or in the sub project folders. I would appreciate it if the license terms for this code were clarified in a standard LICENSE file 😜
Quote from https://choosealicense.com/no-permission/
No License
When you make a creative work (which includes code), the work is under exclusive copyright by default. Unless you include a license that specifies otherwise, nobody else can copy, distribute, or modify your work without being at risk of take-downs, shake-downs, or litigation. Once the work has other contributors (each a copyright holder), “nobody” starts including you.
tldr nobody can use the code without you giving them permission via a LICENSE of some kind 🤷♂️ and right now there is no clear LICENSE 😭
fix case warnings for html attributes
word "suck" is flagged as highest profanity
Found a comment on the recent YouTube video mentioning that the input "mosquitoes suck blood" is flagged as 1.000 profanity. Tried this a few more times and noticed that the word "suck" is the culprit. Funnily enough, the word "sucks" is not flagged like this.
Should we whitelist this word along with the several others that are whitelisted right now?
Just some tests
pick a license
here are some workarounds that should be worked on!
swear words with spaces in between them don't work.
the reason for this is that they naturally score lower, as shown here:
and also ascii text isn't supported:
etc etc... i used https://lingojam.com/SpecialText for this
other languages:
^ tbf turkish is obscure but- still...
random letters? honestly i don't know what the correlation is, but if i find one i'll update it:
there's a million ways to cheese this, i think this project would be awesome if you kept on adding to it!
if nobody plans on implementing it, i could give it a try :p
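One possible pre-processing step for the spaced-out case, assuming the issue refers to letters separated by single spaces: collapse runs of three or more single letters before sending the text to the API. The function name `collapseSpacedLetters` is hypothetical, and the sketch only handles ASCII letters.

```typescript
// Hypothetical helper: joins runs of single ASCII letters separated by single
// spaces ("h e l l o" -> "hello"). Runs shorter than three letters are left alone.
function collapseSpacedLetters(text: string): string {
  // Group 1: start of string or a non-letter; group 2: the spaced-out run.
  const spacedRun = /(^|[^A-Za-z])((?:[A-Za-z] ){2,}[A-Za-z])(?![A-Za-z])/g;
  return text.replace(
    spacedRun,
    (_match: string, prefix: string, run: string) => prefix + run.replace(/ /g, "")
  );
}

console.log(collapseSpacedLetters("h e l l o world")); // "hello world"
console.log(collapseSpacedLetters("I am a b c")); // "I am abc"
```

This can also merge legitimate sequences of single letters, so it would probably need to run alongside the original text rather than replace it.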
Removed <img/> element related warnings
[SUGGESTION] Allow configuration
Allow developers to configure it, such as allowing them to add custom allowed or blocked words, and disable specific categories from being flagged (as an example, allowing fuck this, but not allowing racial slurs)
The letter f is flagged for profanity higher than 0.850
Use form for the demo
Use a form for the demo so that the checkProfanity mutation runs when the user presses the enter key.
Support for text symbol
When I use the text symbol "f", which looks almost the same as "f", the app doesn't detect it.
It would be nice if the app treated that symbol the same as the character f.
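A sketch of one way to fold many lookalike symbols back to plain letters, using Unicode NFKC normalization via the standard `String.prototype.normalize`. The function name `foldLookalikes` is hypothetical. NFKC covers compatibility characters such as fullwidth and mathematical-alphabet letters, but it does not map letters from other scripts (for example Cyrillic "а") to Latin ones, so it would only be a partial fix.

```typescript
// Hypothetical helper: applies NFKC compatibility normalization, then lowercases.
function foldLookalikes(text: string): string {
  return text.normalize("NFKC").toLowerCase();
}

// "\uFF46\uFF55\uFF4E" is "fun" written with fullwidth Latin letters.
console.log(foldLookalikes("\uFF46\uFF55\uFF4E")); // "fun"
console.log(foldLookalikes("Plain Text")); // "plain text"
```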
feat: add portuguese language support
the formatted data getting undefined after running the for loop function
Add semicolons to end of line
bad practice to leave the end of lines empty without a semicolon (;)
some words getting detected for no reason
like "its h" gets flagged for sh_t
same for n and more!
Repeating Consecutive Words causes False Positives
Phrases like:
- "dog dog dog"
- "monkey monkey monkey"
trick the algorithm into responding with false positives. Removing consecutive repeated words could improve API latency and accuracy.
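The suggested clean-up could look roughly like this. The function name `collapseRepeatedWords` is hypothetical, and whether this actually removes the false positives is the issue's claim, not something verified here.

```typescript
// Hypothetical helper: drops a word when it repeats the word directly before it
// (case-insensitive), so "dog dog dog" becomes "dog".
function collapseRepeatedWords(text: string): string {
  const words = text.split(/\s+/).filter((word) => word.length > 0);
  const kept: string[] = [];
  for (const word of words) {
    const previous = kept.length > 0 ? kept[kept.length - 1] : undefined;
    if (previous === undefined || previous.toLowerCase() !== word.toLowerCase()) {
      kept.push(word);
    }
  }
  return kept.join(" ");
}

console.log(collapseRepeatedWords("dog dog dog")); // "dog"
console.log(collapseRepeatedWords("the monkey monkey monkey sleeps")); // "the monkey sleeps"
```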
"don't waste oxygen, just do it" is not detected
Multi-language support
While checking out PR #11 I realized that supporting multiple languages should be fairly easy to implement. However, I would allow an optional langs parameter to pass a list of languages (eg eng,deu,ita) and split each language into its own training data.
If no langs param is passed, then all are checked.
Why? Well:
- we don't want to update/rebuild the whole vector/dataset if we add a new language or update an existing one
- we don't want to overload the server if we know that a certain site is going to use mostly one or two languages (eg, german and english)
- it makes the code more sustainable (not just a single huge .csv file with a bunch of commits)
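A sketch of how the optional langs parameter might be parsed on the server. Everything named here is hypothetical: `SUPPORTED_LANGS`, `resolveLangs`, and the choice to reject unknown codes are illustration only, and the language codes are the ones used as examples in the proposal.

```typescript
// Hypothetical list of languages that each have their own training data.
const SUPPORTED_LANGS = ["eng", "deu", "ita"] as const;
type Lang = (typeof SUPPORTED_LANGS)[number];

// Turns a comma-separated langs value into the list of datasets to query.
// A missing or empty value means "check all languages", as proposed above.
function resolveLangs(langsParam?: string): Lang[] {
  if (langsParam === undefined || langsParam.trim() === "") {
    return SUPPORTED_LANGS.slice();
  }
  const requested = langsParam
    .split(",")
    .map((code) => code.trim().toLowerCase())
    .filter((code) => code.length > 0);
  const known: readonly string[] = SUPPORTED_LANGS;
  const unknown = requested.filter((code) => known.indexOf(code) === -1);
  if (unknown.length > 0) {
    // Assumption: unknown codes are rejected rather than silently ignored.
    throw new Error("Unsupported language code(s): " + unknown.join(", "));
  }
  // Remove duplicates while keeping the order of first appearance.
  return requested.filter((code, index) => requested.indexOf(code) === index) as Lang[];
}

console.log(resolveLangs("eng, DEU")); // [ 'eng', 'deu' ]
console.log(resolveLangs()); // [ 'eng', 'deu', 'ita' ]
```

Keeping the resolved list small is what would let the server query only the per-language datasets a site actually needs.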
Python library
Went ahead and made a Python library for this: https://pypi.org/project/profanity-api/
Would it be better to use ONNX instead of the TensorFlow model?
ONNX can be more performant.
https://onnxruntime.ai/docs/get-started/with-javascript/node.html
The API is down and responds back with an error.
I was testing this and found out that Cloudflare rate-limited the API.
https://vector.profanity.dev/ currently sends you to a "Please check back later" screen instead.
This is probably a known issue, but it's not on the issues page already, so I'm putting it here for others trying to figure out why it's not working.
Add a README.md file at root
I am a new contributor, and it would be nice for other newcomers if they could quickly understand the project and get started without needing to ask questions or search through code.
I think it is a key tool for a project like this. Maybe someone is already working on this, but I didn't find it in my quick tour.
Thanks!
The letter "f" is flagged for profanity higher than 0.850