
multilingual_fairness_lrec's People

Contributors

dependabot[bot], franck-dernoncourt, hanxudong, xiaoleihuang

multilingual_fairness_lrec's Issues

What do the different directories in the data folder mean?

In rnn.py, cnn.py and lr.py, there are some hard-coded paths that do not exist in the repository, for example ./data/encode, ./data/split, ./data/indices and ./resources/weights. Some paths in preprocess.py do not exist either. I found that the data in ./data/indices/ has already been split into train/eval/test, so does that mean I don't need to run preprocess.py?
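A quick way to see which of these directories are missing in a local checkout is sketched below. The directory list is copied from the paths named above; the helper name `missing_dirs` and the idea of passing the repository root as an argument are my own assumptions, not something the repository provides.

```python
from pathlib import Path

# Directories named in this issue as hard-coded in rnn.py, cnn.py and lr.py.
EXPECTED_DIRS = ["data/encode", "data/split", "data/indices", "resources/weights"]


def missing_dirs(repo_root="."):
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Hypothetical usage: run from the repository root.
    for d in missing_dirs():
        print(f"missing: {d}")
```

Running it from the repository root lists only the directories that still have to be created or generated.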

Issue regarding the data format

Hi,

I have encountered a serious issue in your data. When the data is read in pandas with a \t separator (just as you did), several rows are merged into one for some tweets because of formatting issues in the data.

For instance:

import pandas as pd

data = pd.read_csv('anonymize/English/corpus.tsv', sep='\t', na_values='x')
print(data.shape)

prints (83077, 11), which matches the number of documents you reported in Table 1 of your manuscript.

However, check this out:
print(data.iloc[623].text) results in

user : yes ! rt user : are you upset about tonight's elimination result ? hashtag 7 for the main ? i call rigged	2015-3-2	male	33.0	Melbourne	Victoria	Australia	white	neither
9119338424788223288	4926862304955734550	if you * really * don't like something a gamedev is doing , don't pirate their game . that makes you an asshole . just don't play it .	2015-5-11	female	15.0	Portland	Oregon	United States	white	neither
9175196123394046748	-2252236138639149560	user user all he does is attack black men . he hates himself and he doesn't even know it hashtag	2015-5-23	male	44.0	New Orleans	Louisiana	United States	black	racism
-7011258209083017299	4926862304955734550	user i hate numpads .	2015-2-19	female	15.0	Portland	Oregon	United States	white	neither
3133444189354840744	4926862304955734550	not sure about a stream , but i'll have at the very least a vine of zoe making the announcement . and i'll be livetweeting .	2015-3-4	female	15.0	Portland	Oregon	United States	white	neither
5487817575003690724	-5098803017287206708	automotive service manager - coon rapids , mn , 55433 hashtag hashtag rapids pls rt : * * overview : * * tires plus total car … url	2015-5-23	x	x	x	x	x	x	neither
-7442543494653349143	4926862304955734550	user i don't mention the name of the place i go to publicly :)	2015-5-3	female	15.0	Portland	Oregon	United States	white	neither
-7954606856165127796	4926862304955734550	this clan chat continues to be hilarious . url	2015-5-8	female	15.0	Portland	Oregon	United States	white	neither
9211584801926278231	5811374477814742037	hashtag hashtag hashtag hashtag hashtag hashtag hashtag hashtag hashtag hashtag url	2015-3-11	x	x	x	x	x	x	neither
-860627564535227330	4926862304955734550	a lot of women in tech have had to commit themselves so utterly to their work in order to be taken seriously . he's denying their identity .	2015-2-20	female	15.0	Portland	Oregon	United States	white	neither
-8609012007119515273	4926862304955734550	it is a * really bad thing * that now i know blackmilk swims fit well and are super comfy . really , really bad .	2015-2-10	female	15.0	Portland	Oregon	United States	white	neither
2032150622664019262	4926862304955734550	user user haha , how true .	2015-2-13	female	15.0	Portland	Oregon	United States	white	neither
-9121800105459111559	4926862304955734550	rt user : ok hearing this mask fucker saying the exact same shit that's been screamed at me for 6 months isn't fun anymore ...	2015-2-12	female	15.0	Portland	Oregon	United States	white	neither
1966015708529508302	5651239258254581284	catching up on hashtag did nikki & katie get a script to say the things they are saying because i wouldn't be caught dead saying any of that !	2015-3-2	x	x	Sydney	New South Wales	Australia	x	neither
-4550724697291005892	4926862304955734550	user i was somewhere . maybe ? pink pullover , pink backpack ?	2015-2-11	female	15.0	Portland	Oregon	United States	white	neither
147731823526758452	4926862304955734550	 on twitter , you don't know who you are talking to . " - oh , this woman couldn't be a software dev . oh lordy .

meaning that the text field of this single row has swallowed many other rows.

There are several more examples like this that I cannot list one by one.

I strongly suggest that you address this issue. Would be happy to help you.
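A possible workaround, as a minimal sketch: it assumes the merged rows are caused by unbalanced double-quote characters in the text column, which pandas by default treats as the start of a quoted field that may span several lines. I have not verified this against the full corpus. The sample data below is made up for illustration; `csv.QUOTE_NONE` tells pandas to treat quote characters as ordinary text.

```python
import csv
import io

import pandas as pd

# Tiny made-up TSV (not taken from the corpus): rows 1 and 3 each contain
# a lone double quote at the start of the text field.
SAMPLE = (
    "tid\tuid\ttext\tlabel\n"
    '1\t10\t"user : yes !\tneither\n'
    "2\t20\tuser i hate numpads .\tneither\n"
    '3\t30\t" on twitter , you never know\tneither\n'
)

# Default quoting: the first '"' opens a quoted field that runs until the next '"',
# so the tabs and newlines in between end up inside one text value.
naive = pd.read_csv(io.StringIO(SAMPLE), sep="\t", na_values="x")

# QUOTE_NONE: quote characters are kept as literal text, one row per line.
fixed = pd.read_csv(
    io.StringIO(SAMPLE), sep="\t", na_values="x", quoting=csv.QUOTE_NONE
)

print("default quoting:", naive.shape)
print("QUOTE_NONE:     ", fixed.shape)
```

With the real file, the same `quoting=csv.QUOTE_NONE` argument could be added to the read_csv call above. If the row count then goes up, that would suggest the document count in Table 1 is affected as well.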

Subcategory definition of hate speech

Thanks a lot for this dataset!

I have realized that your encoding treats the subcategories normal and neither as non-hate speech (at least for English). There are several label subcategories, so what exactly does neither mean?

Also, what about the subcategories link and spam? Your encoding currently treats them as hate speech, and your publication does not mention this either.
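To make the concern concrete, here is a hypothetical sketch of how such an encoding could arise. It is not taken from the repository. The label names come from this issue and from the corpus rows I have seen (racism); both function names and the HATE set are my own assumptions.

```python
# Hypothetical label sets for illustration only; not the repository's code.
NON_HATE = {"normal", "neither"}
HATE = {"racism"}  # assumed; would need the full list of hate subcategories


def encode_allowlist(label):
    """Everything outside NON_HATE becomes 1, so 'link' and 'spam' count as hate."""
    return 0 if label in NON_HATE else 1


def encode_explicit(label):
    """Map known labels only; return None when a label has no clear class."""
    if label in HATE:
        return 1
    if label in NON_HATE:
        return 0
    return None


for label in ["neither", "racism", "link", "spam"]:
    print(label, encode_allowlist(label), encode_explicit(label))
```

An explicit mapping like the second function would at least surface link and spam as undecided instead of silently counting them as hate speech.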
