participapy / civic-crowdanalytics

Analytics tool that applies Natural Language Processing (NLP) and Machine Learning (ML), such as concept extraction, idea classification, and sentiment analysis to make sense of crowdsourced civic input.

License: MIT License

JavaScript 4.41% HTML 0.21% CSS 55.95% Vue 20.82% Python 18.61%
civic-tech machine-learning natural-language-processing e-participation crowdsourcing collective-intelligence

civic-crowdanalytics's Introduction

Civic CrowdAnalytics

Civic CrowdAnalytics (CCA) is a data analytics tool that applies Natural Language Processing (NLP) and Machine Learning (ML) techniques, such as concept extraction, idea classification, and sentiment analysis, to make sense of crowdsourced civic input. The tool automatically organizes contributions into executive summaries and compelling visualizations that are easy to comprehend, searchable, and interrelated. CCA is based on the scientific publication "Civic CrowdAnalytics: Making sense of crowdsourced civic input with big data tools".

Civic CrowdAnalytics features a simple user interface for submitting an unstructured dataset for analysis. The user can choose, for example, to organize ideas by predefined categories, visualize the frequency of recurring concepts, and sort related comments by sentiment. The tool displays the results in both tabular summaries and interactive visualizations, which users can search and manipulate. Users can also export the results in various formats, such as CSV, PNG, JPEG, SVG, or PDF.

Screenshots

Dashboard, categorization, and concept extraction.

Motivation

Civic technologies are currently bottlenecked by a common need for more effective processing of citizen contributions. Civic CrowdAnalytics provides a solution. By using innovative NLP and ML techniques, the tool automates the analysis and synthesis of key aspects of crowdsourced civic input. This automation will dramatically accelerate and improve the standard data management features that Civic Backoffice will also provide.

Features

In its first version the tool supports the following analytics features:

  1. Classification: This feature organizes the data into main categories and subcategories using well-known classifiers, such as Naive Bayes, Decision Tree, Random Forest, and Support Vector Machine. To train the classification algorithm, the user must first code part of the dataset by labeling main categories and subcategories, and then let one of the algorithms categorize the rest of the data. Texts written in any language supported by the NLTK library can be classified;
  2. Concept Extraction: Expressions and words are extracted from the data and displayed by frequency. Concept extraction provides lists of key terms and phrases, distributed by occurrence, which can then be further analyzed using statistical and qualitative methods. The user can specify a list of domain-specific words that should be excluded from the analysis. The tool extracts expressions of up to three words;
  3. Sentiment Analysis: The data is analyzed in terms of positive, negative, or neutral sentiment, which is assessed against established values of words and expressions. For example, words such as reduce, remove, and problem would show a negative sentiment, whereas increase, resolve, and good would show a positive sentiment. So far, CCA supports sentiment analysis of texts written in four languages: English, Spanish, Portuguese, and French. For English texts, CCA uses the VADER algorithm from NLTK. For Spanish, CCA has its own implementation based on ML-SentiCon. For the remaining languages, the feature first translates the text into English using the Python package Googletrans and then applies the VADER algorithm;
  4. Text Similarity: This feature clusters similar texts together. It tokenizes and stems the texts, uses TF-IDF to transform them into a Vector Space Model, and finally applies the K-means algorithm to group texts represented by similar TF-IDF vectors.
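
The concept extraction feature described in item 2 can be illustrated with a minimal sketch. It is not CCA's implementation: the function name extract_concepts, its parameters, and the choice to skip any expression that contains an excluded word are assumptions made for illustration, and it uses only the Python standard library rather than NLTK.

```python
import re
from collections import Counter


def extract_concepts(texts, stop_words=(), max_words=3, top=10):
    """Count expressions of 1 to max_words words, most frequent first.

    Hypothetical helper: expressions containing an excluded word are skipped.
    """
    excluded = {word.lower() for word in stop_words}
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep runs of letters (accented ones included).
        tokens = re.findall(r"[^\W\d_]+", text.lower())
        for size in range(1, max_words + 1):
            for start in range(len(tokens) - size + 1):
                expression = tokens[start:start + size]
                if excluded.isdisjoint(expression):
                    counts[" ".join(expression)] += 1
    # most_common(None) returns every expression, sorted by frequency.
    return counts.most_common(top)


if __name__ == "__main__":
    ideas = ["More bike lanes downtown", "Bike lanes are safer"]
    for expression, frequency in extract_concepts(ideas, stop_words=["are"]):
        print(frequency, expression)
```

Skipping expressions that contain an excluded word keeps every reported expression contiguous in the original text; removing excluded words before building expressions would instead join words that were never adjacent.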

Installation

Backend Installation

  1. Clone the repository git clone https://github.com/ParticipaPY/civic-crowdanalytics.git
  2. Get into the directory civic-crowdanalytics
  3. Create a virtual environment virtualenv env
  4. Activate the virtual environment source env/bin/activate
  5. Get into the directory backoffice
  6. Execute pip install -r requirements.txt to install the dependencies. If an error occurs during the installation, the likely causes are: a) the package python-dev is missing; b) the package libmysqlclient-dev is missing; c) the environment variables LC_ALL and/or LC_CTYPE are not defined or do not have a valid value
  7. Create a MySQL database. Make sure the database uses a UTF-8 collation
  8. Rename the file backoffice/backoffice/settings.py.example to backoffice/backoffice/settings.py
  9. Set the configuration parameters of the database in backoffice/backoffice/settings.py
DATABASES = {
    ...
        'NAME': '',
        'USER': '',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '',
    ...
}
  10. Run python manage.py migrate to set up the database schema
  11. Run python manage.py loaddata data.json to load configuration data
  12. Run python manage.py createsuperuser to create an admin user
  13. Start the Django server with python manage.py runserver 0:8000
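
As an illustration of the database configuration step, the following sketch shows one way the DATABASES setting might be filled in. The environment variable names (CCA_DB_NAME and so on) and the default values are hypothetical and not defined by the project; the engine string is Django's built-in MySQL backend, and the charset option is an assumption made to match the UTF-8 requirement above.

```python
import os

# Hypothetical environment variable names and defaults; adjust to your setup.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",  # Django's built-in MySQL backend
        "NAME": os.environ.get("CCA_DB_NAME", "cca"),
        "USER": os.environ.get("CCA_DB_USER", "cca"),
        "PASSWORD": os.environ.get("CCA_DB_PASSWORD", ""),
        "HOST": os.environ.get("CCA_DB_HOST", "localhost"),
        "PORT": os.environ.get("CCA_DB_PORT", "3306"),
        # Assumption: ask the MySQL driver for a UTF-8 connection charset.
        "OPTIONS": {"charset": "utf8mb4"},
    }
}
```

Reading the credentials from the environment keeps passwords out of the settings file; hard-coded values in settings.py work as well for a local installation.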

Frontend Installation

  1. Install Node.js (version higher than 0.10.32) and update npm (version higher than 2.1.8). See here for an installation guide
  2. Go to the directory civic-crowdanalytics/frontoffice
  3. Install the project's dependencies by running npm install
  4. Set the backend server URL and the Django username and password in frontoffice/src/Backend.vue
baseURL: 'http://localhost:8000/api',
username: '',
password: '',
  5. Start the local server by running npm run dev
  6. Go to http://localhost:8080 to access the tool

Technologies

Backend Technologies

  1. Django Framework
  2. Django Rest Framework
  3. MySQL database (version 5.7 or higher) and its corresponding python package

Frontend Technologies

  1. Node.js and npm
  2. Vue.js
  3. CoreUI
  4. Chart.js

civic-crowdanalytics's People

Contributors

ana-cris, cdparra, guidonu91, gvescu, jammily, joausaga, marcemmad

civic-crowdanalytics's Issues

Add CORS (Cross-Origin Resource Sharing) headers to responses

This is needed in the API because we are getting this response in the client when calling the API:
"Response to preflight request doesn’t pass access control check: No ‘Access-Control-Allow-Origin’ header is present on the requested resource"

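A minimal sketch of how the missing header could be added, written in the shape of a Django middleware. It is not the project's fix: the class name and the allowed origin (the frontend dev server address from the installation steps) are assumptions, and the sketch only decorates whatever response the view returns, so the preflight OPTIONS request must still receive a successful response from the API. The django-cors-headers package is a maintained alternative.

```python
class CorsHeadersMiddleware:
    """Django-style middleware that adds CORS headers for allowed origins."""

    # Assumption: only the local frontend dev server may call the API.
    ALLOWED_ORIGINS = {"http://localhost:8080"}

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        # Django exposes the Origin request header as META["HTTP_ORIGIN"].
        origin = request.META.get("HTTP_ORIGIN")
        if origin in self.ALLOWED_ORIGINS:
            response["Access-Control-Allow-Origin"] = origin
            response["Access-Control-Allow-Methods"] = "GET, POST, PUT, DELETE, OPTIONS"
            response["Access-Control-Allow-Headers"] = "Authorization, Content-Type"
            # Tell caches that the response depends on the Origin header.
            response["Vary"] = "Origin"
        return response
```

Echoing back only known origins, instead of sending a wildcard, keeps the API usable with credentials such as the username and password configured in the frontend.
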
New project wizard mockup

Using CoreUI elements and light theming, develop the new project wizard mockup based on the images on #6.

  • Project information tab
  • Data set and format definition tab (only mockup, no import functionality yet)

Dataset import and preview functionality

  • Import and parse datasets in CSV, XLS, XML, or JSON format.
  • Add per-column controls for changing the column format (string, number or datetime) on the fly.
  • Preview the column data.
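
The per-column format control could work along these lines. This is a sketch under assumptions: the names CONVERTERS and cast_column are hypothetical, rows are assumed to be dictionaries of strings, and datetime values are assumed to be in ISO 8601 form.

```python
from datetime import datetime

# Converters for the three column formats named in the issue.
CONVERTERS = {
    "string": str,
    "number": float,
    "datetime": datetime.fromisoformat,  # assumes values such as 2018-03-01
}


def cast_column(rows, column, fmt):
    """Return copies of the rows with one column converted to the given format.

    Cells that cannot be converted become None so the preview can flag them.
    """
    convert = CONVERTERS[fmt]
    converted = []
    for row in rows:
        new_row = dict(row)  # leave the caller's rows untouched
        try:
            new_row[column] = convert(row[column])
        except (TypeError, ValueError):
            new_row[column] = None
        converted.append(new_row)
    return converted
```

Returning copies means the user can switch a column back and forth between formats without losing the original string values.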

Add temporary function to call classification endpoint without having a configuration page

Add a temporary function to call the classification endpoint without having a configuration page. On the configuration page, the user should select the dataset column to be used as the label for the classification analysis.

With this temporary function, the column named "label" in the dataset will be used by default as the label column for the classification analysis.

Frontend upgrade

Push the latest changes from the various mockup issues, along with some of the API integration.

Add state to dataset

Add a state to each dataset (DRAFT or COMPLETED).
When a dataset is created, it has the DRAFT state.
Once all of its related attributes have been added and the project has been created, the dataset state changes to COMPLETED.
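
The state transition can be sketched as follows. The Dataset class here is a hypothetical in-memory stand-in, not the project's Django model, and the rule that completion requires at least one attribute is an assumption.

```python
from enum import Enum


class DatasetState(Enum):
    DRAFT = "draft"
    COMPLETED = "completed"


class Dataset:
    """Hypothetical in-memory stand-in for the dataset model."""

    def __init__(self, name):
        self.name = name
        self.attributes = {}
        self.state = DatasetState.DRAFT  # every new dataset starts as a draft

    def complete(self, attributes):
        """Attach the attributes and mark the dataset as completed."""
        if not attributes:
            raise ValueError("a dataset needs attributes before it can be completed")
        self.attributes = dict(attributes)
        self.state = DatasetState.COMPLETED
```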

Integrate sentiment analysis module

Create a function that:

  • Processes a dataset
  • Calls the sentiment analysis module with the correct parameters
  • Produces and saves the correct output
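
A rough sketch of such a function is shown below. The toy lexicon uses the example words from the feature description and is only a stand-in for the real sentiment module (it is neither VADER nor ML-SentiCon); the names toy_sentiment and analyze_dataset, the row format, and the CSV output are assumptions.

```python
import csv
import io
import re

# Toy lexicon taken from the example words in the feature description.
POSITIVE = {"increase", "resolve", "good"}
NEGATIVE = {"reduce", "remove", "problem"}


def toy_sentiment(text):
    """Label a text by counting positive and negative words."""
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def analyze_dataset(rows, text_column, analyzer=toy_sentiment):
    """Run the analyzer on one column of every row and return CSV text."""
    output = io.StringIO()
    writer = csv.writer(output)
    writer.writerow([text_column, "sentiment"])
    for row in rows:
        writer.writerow([row[text_column], analyzer(row[text_column])])
    return output.getvalue()
```

Passing the analyzer as a parameter lets the same function call a different module per language; the caller decides where the returned CSV text is saved.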

Create classification module

Create the first version of the classification module, with methods for training the classifier and classifying new documents
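
A module with that shape might look like the sketch below, which implements a small multinomial Naive Bayes classifier with add-one smoothing using only the standard library. It is illustrative only: the class and method names are assumptions, tokenization is a plain whitespace split, and the project itself may rely on a library classifier instead.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocabulary = set()

    def train(self, documents, labels):
        """Accumulate word and document counts from labeled documents."""
        for text, label in zip(documents, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.doc_counts[label] += 1
            self.vocabulary.update(words)

    def classify(self, text):
        """Return the most probable label, or None if nothing was trained."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            # Log prior plus the smoothed log likelihood of each word.
            score = math.log(n_docs / total_docs)
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working in log space avoids numeric underflow when documents contain many words.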

Optimize data uploading logic

Design and implement a two-step logic: first, load only a preview (up to 5 rows); then, once the column types have been decided, upload the files together with the model
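
The two steps can be sketched for a CSV file as follows. The function names and the payload layout are hypothetical; the sketch only shows that the preview reads no more than five rows and that the full file travels together with the chosen column types in the second step.

```python
import csv
import io
from itertools import islice


def preview_rows(csv_text, limit=5):
    """Step one: parse the header and at most `limit` data rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return list(islice(reader, limit))


def build_upload(csv_text, column_types):
    """Step two: bundle the full file with the column types the user chose.

    Hypothetical payload layout, not the project's API.
    """
    return {"file": csv_text, "columns": dict(column_types)}
```

Because islice stops the reader after the requested rows, the rest of a large file is never parsed during the preview.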

Refactor analysis module

Create different endpoints for each analysis (an endpoint for sentiment analysis, another endpoint for clustering, etc).
Group common code in methods.

Add status to analysis
