tattle-made / uli

Software and Resources for Mitigating Online Gender Based Violence in India

Home Page: https://uli.tattle.co.in

License: GNU General Public License v3.0

Jupyter Notebook 47.78% Python 6.90% JavaScript 27.65% CSS 0.21% Dockerfile 0.08% Shell 0.07% HTML 0.20% MDX 17.11%
ogbv ml nlp extension-chrome machine-learning gender-based-violence content-moderation browser-extension india indian-languages

uli's Introduction

Uli : Reclaim your Online Space

Uli is a browser plugin that:

  • De-normalizes the everyday violence that people of marginalized genders experience online in India.
  • Provides tools for relief and collective response.

It is an attempt to invert the top-down logics of platform moderation and center the experiences of those subject to online gender-based violence.

🎉 Contribution Pathways 🎉

We cherish diversity of experiences and perspectives; it adds value to our work. To this end, we strongly encourage anyone who feels aligned with the project and is driven to learn to contribute to Uli. There are both code and no-code issues that you can contribute to.

To contribute effectively, we recommend doing some of these:

  • Peruse our Wiki. It will help you navigate our repository and adhere to our standards for contributions.
  • We've labeled beginner-friendly issues with good first issue.
  • Read our Setup Guides on the Uli Wiki or watch a video tutorial.
  • Join our Slack to interact with the team and get any clarifications. Introduce yourself in the #introductions channel and feel free to discuss any Hacktoberfest-related questions in the #issue_uli_hacktoberfest channel.

Technologies we use

If you are new to any of these, we've created a learning guide for you.

Repository Structure

Directory - Description
browser extension - Uli browser extension that helps moderate and mitigate online gender-based violence on Twitter. All Hacktoberfest work is limited to this directory.
annotators - A web app to annotate tweets (unmaintained).
slur-replacement - Python notebook that documents our exact and approximate slur replacement techniques.
ogbv-ml-rest - Hosted access for our machine learning model.

Quick Guide

  1. Motivation
  2. Approach
  3. Roadmap
  4. Contribute
  5. Contact

Motivation

The graphic narrative titled ‘Personal (Cyber) Space’, published in 2016 by Parthasarthy and Malhotra, narrates the experience of a young internet user. The animated short comic, hosted by Kadak, a South Asian women's collective, asks: ‘If one says something, there’s the fear of hateful response. But if one doesn’t say something, isn’t that silence counterproductive?’, only to end with the question, ‘so what does one say?’

Violence, abuse, and hate speech on the web have become pervasive in people's experience of social media, and the existing scholarship suggests that it is those situated at the margins who are worst affected. People of marginalized genders in India are disproportionately affected. Simultaneously, the business models of social media platforms skew the incentives against protecting users who use social media in non-dominant languages. Uli is an attempt to build tools that protect, and enable collective response for, social media users of marginalized genders.

Approach

The problem of online violence encompasses legal, political, social, cultural and technological complexities that make any easy solution impossible. This overdetermined nature mandates that we seek solutions from multiple avenues. As with all Tattle projects, we don't expect technology to provide all the answers, but for it to be intertwined with human action.

Specifically around redaction, the project borrows from feminist approaches to machine learning and aims to intervene in the ongoing debate around content moderation. Existing algorithmic approaches to automated content moderation are generally biased towards English-language content, paying very limited attention to social, cultural and linguistic diversity elsewhere. Moreover, existing approaches understand moderation through a binary logic: leave content up or remove it. With multiple political and legal implications emerging from these biases, the existing approaches threaten to create more problems than they solve. With this tool, the project aims to redress these problems and find creative ways in which moderation can empower users, especially the ones that are most affected.

We started the project with a period of qualitative data collection and participatory analysis. Based on the needs articulated by gender rights activists and researchers, we focused on building the following features:

  • Detection/ filtering of abuse
  • Tools for archiving locally and through email.
  • Localized resources for understanding the effects of online gender based violence.
  • Invoking networks for action on abusive content.

Based on feedback from the beta about potential misuse, the last feature (invoking networks for action) has been removed from the current version until more checks and balances can be built in.

The ultimate aim of the project is to envision creative and collective responses to the structural problem of violence experienced online and help build solidarity and shared understanding while empowering users to take back control of their digital experience.

Situating machine learning:

Machine learning based approaches are a commonly used technique to automate decision making when the volume of data is large. To put it briefly, machine learning works by finding patterns in existing data to ascribe a value to future queries. Instead of telling an algorithm what to do, in machine learning, the algorithm figures out what to do based on the data it is fed. The data used to train a machine learning system as well as the algorithm used to classify the data, can encode social beliefs and values. These are perpetuated in the performance of the machine learning systems.

The moderation decisions of social media platforms often make international news. Some decisions can be attributed to error; machine learning systems, like every prediction system, make errors. But some decisions reflect the social values in the data and algorithms behind the model. So, what many communities find harmful may not be harmful as per the guidelines set by social media platforms. Machine learning tools can also be designed to reflect the values of those at the forefront of tackling violence, to protect those who will be at the receiving end of it. This is precisely the goal of our project.

The ML model is based on needs articulated by communities, rather than the priorities of powerful institutions. We are working to publish our methodology, annotation guidelines, datasets, and the limitations of the dataset. Our goal remains to make the models interpretable to the users of the plugin. We believe this will help raise awareness about content moderation systems as well as gender-based violence online.

Contributing

We've made a list of good first issues to get started on. You can also track the project here. Find an issue or domain that interests you and reach out to us. Learn More

Contact

For more details on this project please send an email to one of the following email IDs:

Funding:

The pilot of the project was managed by the Centre for Internet and Society and Tattle Civic Tech. It was funded by Omidyar Network India as part of their Digital Society Challenge grant. In addition to revenue from Tattle's other projects, Uli is supported by Mozilla's Digital Society Challenge.

Website

Visit the Uli website at https://uli.tattle.co.in/

uli's People

Contributors

aatmanvaidya, abhishek-nigam, bhargav-dave, d80ep08th, dennyabrain, duggalsu, gagansankhla, iajaymk, kaustubhavarma, mahalakshmijinadoss, mlkorra, plon-susk7, rishavt, siddhant-k-code, tarunima


uli's Issues

Provide offline support for approximate slur replacement

@mlkorra has written python code that uses fuzzy search to detect approximate matches for slurs - https://github.com/tattle-made/OGBV/blob/main/slur-replacement/Slur%20Replacement%20-%20SCRIPT.ipynb

This code is in Python, and one way to integrate this functionality into our Chrome plugin would be to expose it via a REST API. That, of course, would add latency and network calls from the extension.

I see a potential to incorporate this approximate slur replacement function into the extension code itself, which would let the feature work without any internet connection.

The key to achieving this would be WebAssembly. WebAssembly support is built into all modern browsers now, and there are compilers that will compile languages like Python, Go, C and C++ to WebAssembly, allowing web apps to rely not just on JavaScript but on these other languages too.
Pyodide is a similar project for Python - https://github.com/pyodide/pyodide

@mlkorra's code relies on Python and two libraries - fuzzywuzzy and Levenshtein. Levenshtein's core is written in C, so I think it should theoretically be possible to compile this code to WebAssembly and bundle it with the Chrome extension. The extension would then communicate with the WebAssembly module to get the approximate slur replacement feature, as opposed to calling a REST API hosted in the cloud (associated issue).
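Until the WebAssembly route is settled, the same idea can be sketched in plain JavaScript. The Levenshtein function below is the standard dynamic-programming edit distance; the 0.8 similarity threshold and whitespace tokenization are illustrative assumptions, not the project's tuned parameters:

```javascript
// Sketch of approximate slur matching in plain JavaScript.
// The 0.8 threshold and word splitting are illustrative assumptions.

// Classic dynamic-programming Levenshtein edit distance.
function levenshtein(a, b) {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i]);
  for (let j = 1; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Replace any word whose similarity to a listed slur meets the threshold.
function replaceApproximate(text, slurs, threshold = 0.8) {
  return text
    .split(/\s+/)
    .map((word) => {
      const matched = slurs.some((slur) => {
        const dist = levenshtein(word.toLowerCase(), slur.toLowerCase());
        const similarity = 1 - dist / Math.max(word.length, slur.length);
        return similarity >= threshold;
      });
      return matched ? "----" : word;
    })
    .join(" ");
}
```

This is a sketch of the technique, not a drop-in for fuzzywuzzy, which also offers partial and token-sort ratios that a port would need to reproduce.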

Parse DOM and extract tweets

In our current implementation we fetch all spans with a particular classname from the DOM and run the text in them through a slur replacement function. We do not keep track of individual tweets and hence can't provide tweet-level operations like archiving a tweet, or add inline buttons on every tweet to let the user take action.

The scope of this task is to parse the DOM and extract tweets and maintain some kind of a data structure that will make adding or operating on tweets easier.

We should build and document a simplified API that other features could use to provide tweet level functionality.
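A minimal sketch of what such a data structure could look like (the class name and record fields are assumptions, not the project's API):

```javascript
// Sketch of a tweet registry keyed by a stable id, so features such as
// "archive tweet" or inline buttons can look a tweet up in O(1).
// The record fields (id, text, node) are illustrative assumptions.
class TweetRegistry {
  constructor() {
    this.tweets = new Map(); // id -> { id, text, node }
  }

  // Register a tweet if it hasn't been seen; returns the stored record.
  add(id, text, node) {
    if (!this.tweets.has(id)) {
      this.tweets.set(id, { id, text, node });
    }
    return this.tweets.get(id);
  }

  get(id) {
    return this.tweets.get(id);
  }

  // Run a tweet-level operation (e.g. archive) over every known tweet.
  forEach(fn) {
    this.tweets.forEach(fn);
  }
}
```

In the extension, a MutationObserver on the timeline would feed this registry as Twitter lazily renders tweets.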

Add npm audit pre-commit hook

This is essential with multiple code contributors. For example, the current chrome-plugin/ code has a package-lock.json with lockfileVersion 2.

npm install with an older node version may give the following warning -

read-shrinkwrap This version of npm is compatible with lockfileVersion@1, but package-lock.json was generated for lockfileVersion@2. I'll try to do my best with it!

The command may finish successfully but result in installing node-modules with security vulnerabilities and modifying package-lock.json with a downgraded lockfileVersion.

This will require fixing dependency issues as flagged by npm audit.

See -
https://docs.npmjs.com/cli/v8/commands/npm-audit

	By default, the audit command will exit with a non-zero code if any vulnerability is found. It may be useful in CI environments to include the --audit-level parameter to specify the minimum vulnerability level that will cause the command to fail. This option does not filter the report output, it simply changes the command's failure threshold.

Add cypress end to end testing to Github Action

We use Cypress for end to end UI testing of the annotation-ui.
Right now I run npm run e2e to run these tests locally and then push the code to GitHub. GitHub Actions then deploys the code to the staging or production environment.

When I run npm run e2e, Cypress opens a Chrome instance and runs the tests on my machine, and I can actually see the different links being clicked. Cypress also allows running the browser in headless mode, which lets you run these tests in a CI/CD pipeline.

So I would like the test step npm run e2e to be added to the deploy-prod and deploy-staging workflows here.
The expected result is that, when one is finished developing, they just push the code to GitHub and we run the tests as part of the deploy pipeline itself, deploying to the appropriate environment only if the tests pass. This will help us avoid regressions in end to end features.

Allow unblurring blurred tweets

This task requires work on the data parsing and UI side. I'll explain the UI side of the requirement and we can figure out the parsing and storing logic backwards.

Current Feature :

Whenever the plugin encounters a slur within the text inside the article tag, it covers it with a black shade like this:
shade-slur

Proposed Improvement :

Users would like to remove the shade on a word-by-word basis. The rationale is that either they want to read the tweet properly, slur included, or the extension has covered a word incorrectly and they want to see what the word under the shade is.

I mocked two user interactions to achieve this, each with differing complexity, so we'll pick one based on the time we allot to this task.

Stateless Implementation

unblur-slur-stateless
In this implementation the shaded word can be a component that is responsible for managing its own state. It removes the shade when the user hovers over it.

Stateful Implementation

unblur-slur-stateful

This implementation involves keeping track of the tweet ID and the word's location within the tweet, so that we can record all the words a user has unblurred.
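As a sketch, the stateful bookkeeping could be as small as a map from tweet ID to the set of unblurred word positions (the names here are illustrative, not the plugin's code):

```javascript
// Sketch of per-tweet unblur state: tweet ID -> set of unblurred word
// positions. Names and shape are illustrative assumptions.
const unblurred = new Map();

function unblurWord(tweetId, wordIndex) {
  if (!unblurred.has(tweetId)) unblurred.set(tweetId, new Set());
  unblurred.get(tweetId).add(wordIndex);
}

function isUnblurred(tweetId, wordIndex) {
  return unblurred.has(tweetId) && unblurred.get(tweetId).has(wordIndex);
}
```

Persisting this map (e.g. in extension storage) would let unblur choices survive a page reload, though that raises its own privacy questions.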

Dimension values should be stored in english in the db

@tarunima I just realized that I have been storing dimension values as strings in the db; the stored string is whatever the UI returns. This is usually fine, but for the multilingual setup, if a user chooses "religion" as the option for dimension, it would get stored as "religion", "धर्म" or "மதம்", depending on the language of the user.

I have two proposed solutions.

  1. Fix it on the db level so that, regardless of language, the db stores a consistent value.
  2. Deal with it on the SQL level: any time we need to group annotations by dimension value, we'll have to use a bunch of OR clauses.
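The first solution could be as simple as mapping localized UI labels back to a canonical English key before writing to the db. A sketch follows; only the "religion" labels quoted above are real, and a full table would need one entry per dimension per supported language:

```javascript
// Sketch of option 1: normalize localized UI labels to one canonical
// value before persisting. Only the "religion" variants come from the
// issue text; all other entries would be assumptions.
const CANONICAL = {
  "religion": "religion",
  "धर्म": "religion",
  "மதம்": "religion",
};

function canonicalDimension(label) {
  const value = CANONICAL[label];
  if (!value) throw new Error(`Unknown dimension label: ${label}`);
  return value;
}
```

Throwing on unknown labels surfaces missing translations early instead of silently storing mixed-language values.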

[ogbv plugin] Accessibility

General

Currently tested on

  • Ubuntu 20.04, Orca screen reader, Gnome desktop
  • OGBV plugin - v.0.0.9

Languages tested

  • English
  • Hindi
  • Tamil

Currently NOT testing

  • Braille - Lack of hardware resource

Development and Testing
https://reactjs.org/docs/accessibility.html#development-and-testing-tools

References


TODO

Chrome Extensions

Only standard controls used

Set focus on the rightmost item in the Chrome toolbar	F10 

https://www.webnots.com/twitter-keyboard-shortcuts/

When you are on the Twitter webpage, press the ? key (Shift + /) on your keyboard to get the full list of keyboard shortcuts.

TODO

  • 'Ask friends for help' - Popup navigation issue

[Edit 1] Refer - grommet/grommet#6045

If no options are selected in the popup and the popup loses focus

modal
Whether there should be an overlay preventing interaction underneath the layer.
When Layer component is set up as a modal, it still allows keyboard focus to go behind the modal.

Expected Behavior
I would expect that when a modal is open, keyboard navigation should not leave the modal (except Esc which should close the modal). This is supported by the description of the modal attribute in the doc: "Whether there should be an overlay preventing interaction underneath the layer."
I'd expect that continuing to Tab through the modal should loop through the focusable elements inside it.

Actual Behavior
When a modal is open, tabbing allows you to leave the modal and select elements behind it.

None implemented

Font choices and text size impact how readable an extension's content is. Users with sight issues may need to increase an extension's text size. If using keyboard shortcuts, make sure they do not interfere with the zoom shortcuts built into Chrome.

As an indicator of flexibility of an extension's UI, apply the 200% test; if the text size or page zoom is increased 200%, is it still usable?

Avoid baking text into images. Users are unable to modify the size, and screen readers are unable to interpret images. Instead, opt for a styled web font, such as one of the fonts found in the Google Font API. Web fonts can scale to different sizes and can be accessed by people using screen readers.

https://www.w3.org/TR/2008/REC-WCAG20-20081211/#visual-audio-contrast-scale
Testing instructions:

  • Settings (3 dots) > Font size (+/-)

    • Extension text size remains the same
    • Keyboard navigation works properly
  • Increase text size: Settings > Appearance > Page zoom - set to 200%

    • Increases text size in extension too
    • Keyboard navigation works properly
  • No images used with text

  • Color
    Currently being modified by designers

Test all colors with Twitter color schemes

  • Default background: #FFF
  • Dim background: #15202b
  • Lights out background: #000

https://developer.chrome.com/docs/devtools/accessibility/reference/#contrast

TODO

  • Verify how to handle this
    • Chrome dev tools dom for twitter webpage buttons - 'Archive' + 'Ask friends for help' - no contrast info available

https://webaim.org/articles/contrast/
https://webdesign.tutsplus.com/articles/how-to-use-the-contrast-checker-in-chrome-devtools--cms-31504
https://developer.chrome.com/docs/extensions/mv3/a11y/#colors
https://snook.ca/technical/colour_contrast/colour.html#fg=33FF33,bg=333333
http://www.color-blindness.com/coblis-color-blindness-simulator/

Provide informative alt text for images.

https://v1.grommet.io/docs/accessibility

Custom title
Grommet components can be read by screen readers. The default textual description varies for each component. In general, this description is generic and lacks contextual information that help users to understand the surrounding data. Most Grommet components support a property called a11yTitle that enables callers to provide a better description.

Example:
Using a11yTitle for icons to explain their purpose.

<Status a11yTitle='Server Down'
  value='critical' />
For this example, the screen reader will read "Server Down, Image" instead of "Critical, Image".

Grommet

https://v1.grommet.io/docs/accessibility
https://v2.grommet.io/

  • Custom title: see the a11yTitle example above.
  • Language Support
Example
<App lang='en-US'>
  Testing
</App>
  • Skip Links
Skip links will be automatically presented when pressing tab in a given page.

React

Use aria-label and aria-labelledby for non-text elements eg where Name attribute (as shown in developer tools Accessibility tab) is ""

General

Final

[Edit 2] Restructured above content. Added Sections
[Edit 3] Added references for precommit
[Edit 4] Added reasoning for not testing Braille

Change icon colors based on theme on twitter

Right now the icons for the plugin are black, which aren't visible on Twitter's dark themes. We need a way to detect the Twitter theme in the content script and theme our tweet control UI accordingly.
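Since the three Twitter themes listed in the accessibility issue use distinct background colors (#FFF, #15202b, #000), one hedged approach is to classify the computed background color of the page body. A sketch, with the caveat that Twitter's markup and palette can change at any time:

```javascript
// Sketch: infer Twitter's theme from the body background color so the
// plugin icons can switch between light and dark variants. The RGB
// strings correspond to Twitter's default (#FFF), dim (#15202b) and
// lights-out (#000) backgrounds.
function detectTwitterTheme(backgroundColor) {
  switch (backgroundColor) {
    case "rgb(255, 255, 255)":
      return "default";
    case "rgb(21, 32, 43)": // #15202b
      return "dim";
    case "rgb(0, 0, 0)":
      return "lights-out";
    default:
      return "unknown";
  }
}

// In the content script (browser only):
// const theme = detectTwitterTheme(
//   getComputedStyle(document.body).backgroundColor
// );
```

A MutationObserver on the body's style would catch theme switches made while the page is open.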

Make it easy for admin to discuss annotation disagreements

Possible approaches :

  • Maybe show "e_twitter_id" in the UI next to the post ID.
    If this field is clickable in a way that opens the real tweet on Twitter, that makes it easy for everyone to see the original tweet.

  • Make a dedicated page at /post/:postID
    When a user visits this (possibly upon clicking the post ID within the annotator UI), it shows all the data associated with the post.

Collect list of existing resources

For data sampling for annotation, as well as to potentially include them in our training set, we need a list of existing resources for our target languages. Specifically, we need :

  • Existing datasets and papers
  • Their statistics (label distribution etc.)
  • Pre-trained models
  • If there's some novelty in how they're modifying their approach for Indic languages

Some tweets are longer than 500

I found a tweet that seems to be longer than 500 characters.
I thought Twitter had a tweet length limit of 280 characters. I'll have to reconfigure the database to support more than 500 characters, which is not a problem, but I'd like to know what the issue is here. Does Twitter support longer tweets now, and if yes, what is the character limit?

regex word boundary not working with non-English text

Task Description

This task is part of our slur replacement project. The basic requirement of the function is to take an input string and hide the problematic slurs within it.

Current Approach

I am pasting a simpler reproduction of my solution to demonstrate how things should work.

let input = "I have a cat, dog and a goat";

function hideSlurs(text) {
  return text.replace(/\b(?:cat|dog|goat)\b/gi, "----");
}

let output = hideSlurs(input);
console.log(output);

The output in this case becomes "I have a ----, ---- and a ----".
As you can see, the words cat, dog and goat are replaced by '----'.

Problem

The hideSlurs function fails with non-English words.

let input = "मेरे पास एक बिल्ली, कुत्ता और एक बकरी है";

function hideSlurs(text) {
  return text.replace(/\b(?:बिल्ली|कुत्ता|बकरी)\b/gi, "----");
}

let output = hideSlurs(input);
console.log(output);

In this case the output is the same as the input. I get similar results when trying with Tamil text.

How can we modify the hideSlurs function to make it work with non-Latin characters?

Note: although the project uses JavaScript and the code here is JavaScript too, if you can make it work in any language, that would be useful for us in debugging this.
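A likely cause, worth verifying across browsers: JavaScript's `\b` is defined in terms of ASCII word characters, so it never finds a boundary next to Devanagari or Tamil letters. One sketch of a workaround uses Unicode property escapes with lookarounds to emulate a script-agnostic word boundary (requires a modern engine with lookbehind and `\p` support):

```javascript
// \b only understands ASCII word characters, so it fails around
// Devanagari/Tamil text. Unicode property escapes (the `u` flag) plus
// lookarounds emulate a language-agnostic word boundary: the match must
// not be adjacent to a letter (\p{L}) or combining mark (\p{M}).
function hideSlurs(text) {
  return text.replace(
    /(?<![\p{L}\p{M}])(?:बिल्ली|कुत्ता|बकरी)(?![\p{L}\p{M}])/gu,
    "----"
  );
}

console.log(hideSlurs("मेरे पास एक बिल्ली, कुत्ता और एक बकरी है"));
// → "मेरे पास एक ----, ---- और एक ---- है"
```

Including `\p{M}` matters for Indic scripts: matras are combining marks, and without it a word could "end" in the middle of a syllable.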

Environment management with Gatsby

Right now the way we manage environments on the frontend (a Gatsby site) is very error-prone.
It's a simple file that needs to be manually edited depending on the target we are deploying to.

https://github.com/tattle-made/OGBV/blob/6549d4fb2893b2457a01b2450115f7abd1bd823b/annotators/annotation-ui/src/components/config.js#L1-L4

So it's very possible to not have the right value for api_endpoint. One could set the value to point to the staging server and push to production.
The other limitation is that right now only one variable is environment specific (api_endpoint), but if more such variables were added, this would be very hard to get right every time.

What is the right way to do dynamic environment management in Gatsby?
I can think of a few theoretical solutions.

  1. We could set an environment variable in the CD pipeline before we run the gatsby build command. That way the build can set the appropriate values in the config.js file depending on the environment variable.
  2. The other, slightly strange, solution I can think of is to have all possible config variables in a file and then choose the right ones based on the location.href value. This is not preferable, because ideally I'd not like the website bundle to expose data that isn't supposed to be in a given environment.

Any suggestions or stabs at solutions are welcome.
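For the first solution, Gatsby's documented convention is that environment variables prefixed with `GATSBY_` are inlined into the browser bundle at build time, so the CD pipeline can export one before running gatsby build. A sketch of what config.js might become (the variable name and the localhost fallback are assumptions):

```javascript
// Sketch of option 1: read the endpoint from a GATSBY_-prefixed
// environment variable set by the CD pipeline before `gatsby build`.
// Gatsby inlines GATSBY_* variables into the client bundle at build
// time. The variable name and fallback here are illustrative.
const config = {
  api_endpoint:
    process.env.GATSBY_API_ENDPOINT || "http://localhost:8000",
};

module.exports = config;
```

The CD step would then be something like `GATSBY_API_ENDPOINT=https://staging.example gatsby build` per environment, keeping only that environment's value in the shipped bundle.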

More than one Tweet Control UI per tweet

double-tweet-control

Occasionally, more than one tweet control UI gets injected per tweet. Although I have code that hashes the content of the tweet and uses the hash to uniquely identify it, it seems that logic isn't working here.
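One way to make injection idempotent regardless of hash collisions (two tweets with identical text hash the same) is to mark the processed DOM node itself. This is a sketch of an alternative guard, not the plugin's existing hashing code, and the attribute name is an assumption:

```javascript
// Sketch of an idempotent injection guard. Instead of (or in addition
// to) hashing tweet text, mark the tweet's DOM node directly so a
// re-run of the content script skips nodes that already carry the UI.
// The attribute name is an illustrative assumption.
const PROCESSED_ATTR = "data-uli-controls";

function injectControls(tweetNode, inject) {
  if (tweetNode.getAttribute(PROCESSED_ATTR)) return false; // already done
  inject(tweetNode);
  tweetNode.setAttribute(PROCESSED_ATTR, "true");
  return true;
}
```

Because the marker lives on the node, it also survives Twitter re-rendering sibling content without touching the tweet itself.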

Figure out how to mock the database for cypress end to end testing

This is related to #47

The current end to end tests rely on having a local database with preseeded data to function correctly. This serves its purpose because it gives me a lot of confidence that the new code is not adding bugs and works as expected with the database and all the API calls.
This way of testing is not suited for CI/CD pipelines, where we won't be able to set up a local SQL instance (or maybe we can; this needs to be investigated). So we might have to mock some DB and API access functionality within the test itself.

So this issue will involve researching how mocking is done in Cypress and how we can write such tests to include end to end testing within the CI/CD workflow itself.

Add Metadata to a tweet while archiving

Improvement Request

User should be able to add additional metadata to the tweet or conversation they are archiving. Metadata fields proposed currently are - Title and tags. We should assume more fields could be added in the future.

Archived tweets are added to the db in the posts table. The sequelize model for the same is defined here - https://github.com/tattle-made/OGBV/blob/main/browser%20extension/api-server/db/models/post.js

We need to create a new Metadata table that can store the title and tags. Title will be a string capped at 100 characters. A post can have multiple tags.

The proposed model relationships are as follows :
Post hasOne Metadata
Metadata hasMany Tag

This task has a backend and frontend component. The backend component will be done by @d80ep08th and @dennyabrain will work on the frontend component.
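A sketch of the proposed metadata shape, with the 100-character cap as a validation step (the error messages and the required-title rule are illustrative assumptions, not decided requirements):

```javascript
// Sketch of the proposed metadata shape: a title capped at 100
// characters plus a list of tags. Rules beyond the 100-character cap
// (e.g. requiring a title) are illustrative assumptions.
function validateMetadata({ title, tags }) {
  const errors = [];
  if (typeof title !== "string" || title.length === 0) {
    errors.push("title is required");
  } else if (title.length > 100) {
    errors.push("title must be at most 100 characters");
  }
  if (!Array.isArray(tags)) {
    errors.push("tags must be an array");
  }
  return { valid: errors.length === 0, errors };
}
```

Enforcing the cap in both the API layer and the Sequelize model keeps the db constraint and the user-facing error in sync.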

auto update or update an extension

Some features of the extension rely on parsing DOM and extracting data out of them. Since the underlying DOM structure can change at any point, it is essential to have some way to upgrade the plugin when that happens.

It looks like Chrome auto-updates the extension for the user when a new version is available, so that's great!

There's also a process for manually upgrading the extension if Chrome fails to do so. I found one such article.
