Extraction pipelines vaccine hesitancy

Repository for the Lagrange Scholarship Project about Vaccine Hesitancy - Extraction pipelines

In this project, we developed two high-precision rule-based extraction pipelines that classify text with respect to vaccination behaviors and experiences. The items we tracked are (i) adherence to the recommended or an alternative vaccination schedule and (ii) mentions of positive or negative experiences with adverse events following immunization (AEFI).

The two pipelines share the same workflow and operate at the sentence level. Each consists of a filter and a classifier. The filter identifies sentences that contain information relevant to the item under consideration, using a combination of rules based on the occurrence of certain keywords with specific syntactic dependencies. The classifier then assigns the appropriate label to each retained sentence.
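The filter-then-classify workflow described above can be sketched as follows. This is a minimal illustration, not the code of this repository: the keyword sets, the dependency labels, the label names, and the hand-written `Token` tuples (which stand in for the output of a dependency parser) are all assumptions made for the example.

```python
from typing import List, NamedTuple, Optional, Tuple

class Token(NamedTuple):
    text: str   # lower-cased word
    dep: str    # dependency label of the token
    head: int   # index of the syntactic head (the root points to itself)

# Hypothetical keyword lists and dependency labels, for illustration only.
EVENT_KEYWORDS = {"fever", "rash", "swelling"}
NEGATION_WORDS = {"no", "not", "never"}
RELEVANT_DEPS = {"dobj", "nsubj"}

def is_relevant(sentence: List[Token]) -> bool:
    """Filter: keep a sentence if a keyword occurs with one of the relevant dependencies."""
    return any(t.text in EVENT_KEYWORDS and t.dep in RELEVANT_DEPS for t in sentence)

def classify(sentence: List[Token]) -> Optional[str]:
    """Classifier: label the sentence by whether the keyword or its head is negated."""
    for i, tok in enumerate(sentence):
        if tok.text in EVENT_KEYWORDS and tok.dep in RELEVANT_DEPS:
            negated = any(
                other.text in NEGATION_WORDS and other.head in (i, tok.head)
                for other in sentence
            )
            return "no_adverse_event" if negated else "adverse_event"
    return None

def run_pipeline(sentences: List[List[Token]]) -> List[Tuple[int, str]]:
    """Apply the filter, then the classifier, and return (sentence index, label) pairs."""
    results = []
    for idx, sentence in enumerate(sentences):
        if is_relevant(sentence):
            results.append((idx, classify(sentence)))
    return results

# Hand-annotated toy parses (assumed, not produced by a real parser).
s1 = [Token("she", "nsubj", 1), Token("had", "ROOT", 1),
      Token("a", "det", 3), Token("fever", "dobj", 1)]
s2 = [Token("he", "nsubj", 3), Token("did", "aux", 3), Token("not", "neg", 3),
      Token("get", "ROOT", 3), Token("a", "det", 5), Token("rash", "dobj", 3)]
s3 = [Token("we", "nsubj", 1), Token("went", "ROOT", 1), Token("to", "prep", 1),
      Token("the", "det", 4), Token("clinic", "pobj", 2)]

labels = run_pipeline([s1, s2, s3])
```

Sentences that do not pass the filter (here, `s3`) are never classified, which is one way such a design can favor precision over recall.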

The rules of the pipelines are handcrafted and were developed by inspecting a dataset of comments related to vaccination, collected from a popular parenting forum (BabyCenter.com https://community.babycenter.com/).

Due to the Terms of Use of the forum, we cannot make the dataset of user posts and comments available. We release only the resulting interaction network.


Requirements

python   (3.7.4)

spacy    (2.2.3)
pandas   (0.25.1)
numpy    (1.17.2)
nltk     (3.4.5)
networkx (2.3)
pickle   (Python standard library)

To download the spaCy language model:

$ python -m spacy download en_core_web_sm-2.2.5 --direct

Structure of the repository

  1. Experiences_AEFI contains the keywords used to filter sentences relevant to experiences with adverse events following immunization

  2. Vaccination_schedule contains the keywords used to filter sentences relevant to vaccination scheduling

  3. data contains the interaction network

  4. output contains the results of the two pipelines

  5. test contains a list of sentences and the corresponding dependency trees. It is used to check whether the spaCy dependency parser returns the expected parse

  6. utils contains files useful for the pipelines

  • AEFI_pipeline_functions.py contains the script that defines the extraction pipeline for experiences of adverse events following immunization.

  • Dependency_tree_functions.py contains the scripts that represent the dependency parse of a sentence as a network (using the networkx library). It also provides functions to search for information by navigating the dependency tree

  • Experiences AEFI : commentclassification.ipynb is the notebook in which the pipeline of experiences of adverse reactions following immunization is applied to the sample of comments located in the data folder

  • Schedule_pipeline_functions.py contains the scripts defining the vaccination scheduling pipeline

  • Vaccination schedule : comment classification.ipynb is the notebook in which the pipeline is applied to the sample of comments located in the data folder

  • test_dependency_parsing.ipynb is the notebook in which the dependency parser is tested and compared with the expected behavior

  • text_elaboration.py contains the scripts for basic text preprocessing
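To illustrate what representing a dependency parse as a network and navigating it can look like, here is a minimal sketch. It is not the code in Dependency_tree_functions.py: it uses a plain adjacency dictionary instead of networkx so that it runs without extra dependencies, and the function names, the example sentence, and its hand-written dependency edges are assumptions made for the example.

```python
from collections import deque

def build_graph(edges):
    """Build an undirected adjacency map from (head_index, child_index, dep) triples."""
    graph = {}
    for head, child, _dep in edges:
        graph.setdefault(head, set()).add(child)
        graph.setdefault(child, set()).add(head)
    return graph

def shortest_path(graph, source, target):
    """Breadth-first search for the path between two token indices, or None if unreachable."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None

# Hand-annotated parse of a toy sentence (assumed, not produced by a real parser).
words = ["My", "son", "had", "a", "fever", "after", "the", "vaccine"]
edges = [
    (1, 0, "poss"),   # son -> My
    (2, 1, "nsubj"),  # had -> son
    (2, 4, "dobj"),   # had -> fever
    (4, 3, "det"),    # fever -> a
    (2, 5, "prep"),   # had -> after
    (5, 7, "pobj"),   # after -> vaccine
    (7, 6, "det"),    # vaccine -> the
]

graph = build_graph(edges)
path = shortest_path(graph, 4, 7)        # from "fever" to "vaccine"
path_words = [words[i] for i in path]
```

Tokens are identified by their index rather than their text so that repeated words in a sentence remain distinct nodes. With networkx, the same structure could be stored in a graph object and queried with its built-in path functions.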

Contributors

loreb92
