Niyati Bafna's Projects

100linesofcode

Let's build something productive in less than 100 Lines of Code.

character-bert

Main repository for "CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters"

cscbli

Code for the ACL 2021 paper "Combining Static Word Embedding and Contextual Representations for Bilingual Lexicon Induction"

email-clustering-on-the-enron-dataset

Project for CS-303. Uses the publicly available Enron dataset (https://www.cs.cmu.edu/~./enron/), which contains approximately 0.5 million messages collected from 150 users, to build a classification model.

embeddings-transfer-indian-languages

Transferring embeddings to low-resource Indian languages by exploiting their close relationships to higher-resource languages such as Hindi, Bangla, and Marathi.

gina

Learning a Hindi lexicon from parallel corpora. Monsoon 2018. Google Cloud NLP API.

handling-english-vpe-for-english-hindi-mt

English-Hindi machine translation systems have difficulty interpreting verb phrase ellipsis (VPE) in English, and make errors when translating sentences with VPE. We present a solution and theoretical backing for the treatment of English VPE, with the specific scope of enabling English-Hindi MT, based on an understanding of the syntactic phenomenon of verb-stranding verb phrase ellipsis (VVPE) in Hindi. We implement a rule-based system that performs the following subtasks: (1) identification of verb ellipsis in the English source sentence; (2) identification of the head of the elided verb phrase; (3) identification of the verb segment that needs to be induced at the site of ellipsis; and (4) modification of the input sentence, i.e. resolving the VPE and inducing the required verb segment. The system obtains 94.83 percent precision and 83.04 percent recall on subtask (1), tested on 3900 sentences from the BNC corpus [Leech, 1992]. This is competitive with state-of-the-art results. We measure the accuracy of subtasks (2) and (3) together, and obtain 91 percent accuracy on 200 sentences taken from the WSJ corpus [Paul and Baker, 1992]. We also carried out a manual analysis of the MT outputs for 100 sentences after passing them through our system. We set up a basic 1-5 scale for this evaluation, where 5 indicates drastic improvement, and obtained an average score of 3.55.

north-indian-dialect-modelling

Collecting data for "dialects" in the North Indian "Hindi belt". Modelling the dialect system to gain insight and to develop NLP research for low-resource languages.

political_health

Measures hate on Twitter against certain groups and compares these metrics over time.

retaining-source-terms-nmt

When translating technical material from English to Hindi, we often want to retain certain English terminology for consistency and coherence in the Hindi output. This task uses constrained decoding for English-Hindi NMT to accomplish this goal: given the source English text and a list of English terms to retain, the system should produce Hindi output that uses the required English terminology.

xorqa

This is the official repository for NAACL 2021, "XOR QA: Cross-lingual Open-Retrieval Question Answering".
