

Text_Sentiment_Analysis

My first experience with NLP, using the nltk library in Python

Instructions:-

To run DataFrames.py:- Delete positive_dictionary.xlsx and negative_dictionary.xlsx. DataFrames.py then uses openpyxl to open the master dictionary, extract the positive and negative words, and create these 2 workbooks in the same folder. Note: Opening the workbook takes time because the master dictionary is fairly large
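The split described above could look roughly like the sketch below. It assumes the master dictionary has a header row with `Word`, `Positive` and `Negative` columns, where a non-zero cell marks membership. Those column names and the `split_master_rows` helper are assumptions for illustration, not taken from DataFrames.py. The function works on plain row tuples, which is what openpyxl yields from `worksheet.iter_rows(values_only=True)`.

```python
def split_master_rows(rows):
    """Return (positive_words, negative_words) from master-dictionary rows.

    `rows` is an iterable of tuples whose first item is the header row.
    The column names used below are assumed, not confirmed by the project.
    """
    rows = iter(rows)
    header = [str(cell).strip().lower() for cell in next(rows)]
    word_col = header.index("word")
    pos_col = header.index("positive")
    neg_col = header.index("negative")

    positive, negative = [], []
    for row in rows:
        word = row[word_col]
        if not word:
            continue  # skip empty rows
        word = str(word).lower()
        # A non-zero, non-empty cell is treated as "word belongs to this list".
        if row[pos_col]:
            positive.append(word)
        if row[neg_col]:
            negative.append(word)
    return positive, negative


# Hypothetical sample standing in for the real workbook rows.
sample_rows = [
    ("Word", "Negative", "Positive"),
    ("ABANDON", 2009, 0),
    ("ACHIEVE", 0, 2009),
    ("TABLE", 0, 0),
]
positive_words, negative_words = split_master_rows(sample_rows)
print(positive_words, negative_words)
```

Keeping the row logic separate from the workbook handling makes it possible to check the split on a few sample rows without waiting for the large workbook to load.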

To run Main.py:- Open the "Output Data.xlsx" file, select the 'blank copy' sheet and close the file. Then run Main.py to fill the 'blank copy' sheet with the desired data

Console Output:- The output shows the program start and end, the URL from which the program is currently extracting information, the time taken to process each URL and store the desired variables, and the total time taken by the code (approx. 15-17 min in PyCharm on my machine)
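A minimal sketch of how such per-URL and total timings can be produced with the standard library is shown below. The `process_urls` name and the `process` callback are hypothetical and only stand in for whatever Main.py does with each URL.

```python
import time


def process_urls(urls, process):
    """Call `process(url)` for each URL and print per-URL and total timings."""
    timings = {}
    print("Program start")
    total_start = time.perf_counter()
    for url in urls:
        start = time.perf_counter()
        process(url)  # hypothetical per-URL work (download, clean, score)
        timings[url] = time.perf_counter() - start
        print(f"{url}: {timings[url]:.2f} s")
    total = time.perf_counter() - total_start
    print(f"Program end, total time: {total:.2f} s")
    return timings, total
```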

Contents of zip file:-
2- .ipynb Jupyter notebooks where the code was tried out and worked
2- .py Python files, one of which extracts the positive and negative word dictionaries from the master dictionary
1- .txt Text file with the stop words that are extracted for use
1- .txt Readme file with all instructions
7- .xlsx Excel workbooks: 4 workbooks containing the dictionaries of positive, negative, constraining and uncertainty words; 1 Master dictionary workbook, from which the positive and negative word dictionaries were extracted; 1 cik workbook containing the original contents of the input xlsx file; and 1 Output Data workbook, the desired file, with all required variables filled in for all the rows

About the file:-

->The desired output can be found in Output Data, with all the required variables calculated and the textual analysis complete
->The .py files are extensively commented and use easily understandable variable names, so most of the code is self-explanatory


Textual Analysis Process-

Libraries Used: pandas, openpyxl, regex, urllib, nltk, bs4 (Beautiful Soup)

PreCode:
Step1: All required functions and global variables are defined
Step2: The input and output workbooks are opened with openpyxl
Step3: Dictionaries of all stopwords, positive words, negative words etc. are built. We use a dictionary instead of a list because a dictionary lookup takes constant time on average, whereas searching a list takes linear time
Step4: Data is extracted and cleaned, and the positive, negative etc. scores are stored in global variables for further use
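The lookup choice in Step3 can be illustrated with a short sketch. The word list and the `build_lookup` and `count_matches` helpers are hypothetical; the project loads its real lists from the .xlsx and .txt files.

```python
def build_lookup(words):
    """Map each lower-cased word to True so membership tests use hashing."""
    return {word.lower(): True for word in words}


def count_matches(tokens, lookup):
    """Count tokens present in the lookup; each `in` check is O(1) on average."""
    return sum(1 for token in tokens if token.lower() in lookup)


# Hypothetical word list used only for this example.
positive_lookup = build_lookup(["good", "gain", "improve"])
tokens = ["Revenue", "gain", "was", "good", "despite", "loss"]
positive_count = count_matches(tokens, positive_lookup)
print(positive_count)  # 2
```

With a list, every `in` check would scan the list from the start, so the cost grows with both the number of tokens and the size of the word list. A Python `set` gives the same average constant-time membership test if no values need to be stored against the words.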

Cleaning Data: Data is cleaned by removing all numeric data, noise, XML markup etc. The function getGlobalVariables() extracts the data using urllib, cleans it using nltk and BeautifulSoup, and finally counts the positive, negative etc. words. The counts are stored in global variables so they can be used across functions
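The cleaning step can be sketched with the standard library alone. Note that this is a simplified stand-in: it uses `re` where the project uses BeautifulSoup and nltk, and the `clean_text` helper, the stop-word list and the sample input are all assumptions for illustration.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")     # crude stand-in for BeautifulSoup's text extraction
WORD_RE = re.compile(r"[A-Za-z]+")  # keeps alphabetic runs, dropping digits and punctuation


def clean_text(raw, stop_words):
    """Strip markup, drop numbers and punctuation, lower-case, remove stop words."""
    text = TAG_RE.sub(" ", raw)
    tokens = [token.lower() for token in WORD_RE.findall(text)]
    return [token for token in tokens if token not in stop_words]


# Hypothetical stop words; the project reads them from its stop-word .txt file.
stop_words = dict.fromkeys(["the", "of", "in", "was", "a"], True)
raw = "<p>The profit of 2019 was 15% higher</p><b>in December</b>"
clean_tokens = clean_text(raw, stop_words)
print(clean_tokens)  # ['profit', 'higher', 'december']
```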

Textual Analysis: For a given row and a list of clean data, this function performs all the required calculations and stores the output values in their respective columns.
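The README does not list the exact variables or formulas, so the following is only an illustrative sketch of this kind of calculation. The `sentiment_scores` helper and the polarity formula are assumptions (a common convention), not necessarily what Main.py computes.

```python
def sentiment_scores(tokens, positive_lookup, negative_lookup):
    """Return word counts and an assumed polarity score for cleaned tokens."""
    positive = sum(1 for token in tokens if token in positive_lookup)
    negative = sum(1 for token in tokens if token in negative_lookup)
    # Assumed formula; the small constant avoids division by zero
    # when no sentiment words are found.
    polarity = (positive - negative) / (positive + negative + 1e-6)
    return {"positive": positive, "negative": negative, "polarity": polarity}


# Hypothetical lookups and tokens for illustration only.
positive_lookup = {"gain": True, "improve": True, "profit": True}
negative_lookup = {"loss": True}
scores = sentiment_scores(
    ["profit", "gain", "loss", "improve", "report"],
    positive_lookup,
    negative_lookup,
)
print(scores)
```

In the project, each value would then be written to the matching column of the output row.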

File Opening: Sometimes the webpage denies access to the urllib request, so a recursive function called open file keeps sending requests until one is accepted. We could add more headers to the request to appear less like a bot and more like a human, but this approach also works
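A hedged sketch of such a retrying opener follows. Unlike the behaviour described above, it caps the number of attempts so that a permanently blocked URL cannot recurse forever. The `open_file` signature, the `User-Agent` value and the `opener` parameter (added so the function can be tried without network access) are assumptions, not the project's actual code.

```python
import time
import urllib.error
import urllib.request

# Hypothetical header value; any realistic browser User-Agent string would do.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; text-sentiment-analysis)"}


def open_file(url, retries=5, delay=1.0, opener=urllib.request.urlopen):
    """Return the response body as bytes, retrying when the request is refused."""
    request = urllib.request.Request(url, headers=DEFAULT_HEADERS)
    try:
        with opener(request) as response:
            return response.read()
    except urllib.error.URLError:  # also covers HTTPError, its subclass
        if retries <= 0:
            raise  # give up instead of recursing forever
        time.sleep(delay)  # brief pause before trying again
        return open_file(url, retries - 1, delay, opener)
```

Calling `open_file(url)` returns the raw page bytes, which can then be passed on to the cleaning step.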

Workbook Closing: In the end, we close the workbooks that were opened for our tasks

Main Function: It simply calls the above functions to get a clean list, then uses that list to perform the calculations and store the values in the appropriate cells of the output file

Contributors:- anhad-shrivastava
