Text_Sentiment_Analysis

My first NLP project, built with Python's nltk library.

Instructions:-

To run DataFrames.py:- Delete positive_dictionary.xlsx and negative_dictionary.xlsx. DataFrames.py uses openpyxl to open the master dictionary, extract the positive and negative words, and create these two workbooks in the same folder.
Note: Opening the workbook takes time because the master dictionary is fairly large.

To run Main.py:- Open the "Output Data.xlsx" file, select the 'blank copy' sheet, then close the file. Run Main.py to fill the 'blank copy' sheet with the desired data.

Console Output:- The console shows the program start and end, the URL currently being processed, the time taken to process each URL and store its variables, and the total run time (approximately 15-17 minutes when run in PyCharm on my machine).
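The timing output described above could be produced along these lines. This is a minimal sketch, not the code from Main.py: `process_urls` and the `process_url` callback are hypothetical names standing in for whatever function extracts, cleans, and stores the values for one URL.

```python
import time


def process_urls(urls, process_url):
    """Call process_url on each URL, printing per-URL and total elapsed time."""
    print("Program start")
    total_start = time.perf_counter()
    timings = {}
    for url in urls:
        start = time.perf_counter()
        process_url(url)  # hypothetical callback: extract, clean, store values
        timings[url] = time.perf_counter() - start
        print(f"{url}: {timings[url]:.2f} s")
    total = time.perf_counter() - total_start
    print(f"Program end, total time: {total:.2f} s")
    return timings, total
```

`time.perf_counter()` is used here because it is a monotonic clock intended for measuring intervals.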

Contents of zip file:-
2- .ipynb Jupyter notebooks where the code was tried out and worked
2- .py Python files, one of which (DataFrames.py) extracts the positive and negative word dictionaries from the master dictionary
1- .txt Text file with the stop words that are extracted for use
1- .txt Readme file with all instructions
7- .xlsx Excel workbooks: 4 workbooks containing the dictionaries of positive, negative, constraining and uncertainty words; 1 Master dictionary workbook, from which the positive and negative word dictionaries were extracted; 1 cik workbook containing the original contents of the input xlsx file; and 1 Output Data workbook, the desired output file with all required variables filled in for every row

About the file:-

->The desired output can be found in Output Data, with all the required variables calculated and the textual analysis complete
->The .py files are extensively commented and use easily understandable variable names, so most of the code is self-explanatory

Textual Analysis Process-

Libraries Used: pandas, openpyxl, regex, urllib, nltk, bs4 (Beautiful Soup)

PreCode:
Step 1: All required functions and global variables are defined.
Step 2: The input and output workbooks are opened with openpyxl.
Step 3: Dictionaries of stop words, positive words, negative words, etc. are built. A dictionary is used instead of a list because a dictionary lookup takes constant time on average, whereas searching a list takes linear time.
Step 4: The data is extracted and cleaned, and the positive, negative, etc. scores are stored in global variables for further use.
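The dictionary-versus-list point above can be illustrated with a short sketch. The helper name `build_lookup` and the sample words are assumptions made for illustration; the real word lists come from the workbooks and the stop-word text file.

```python
def build_lookup(words):
    """Map each lowercased word to True so membership tests are hash lookups."""
    return {word.strip().lower(): True for word in words if word.strip()}


# Illustrative words only, not the contents of the real dictionaries.
positive_words = build_lookup(["good", "gain", "improve"])
stop_words = build_lookup(["the", "is", "a"])

tokens = ["the", "gain", "is", "good"]
# Each `in` test is a hash lookup (constant time on average), not a list scan.
kept = [t for t in tokens if t not in stop_words]
positive_count = sum(1 for t in kept if t in positive_words)
print(kept, positive_count)
```

A Python `set` would give the same lookup behaviour; a dictionary is shown because that is what the steps describe.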

Cleaning Data: The data is cleaned by removing numeric data, noise, XML codes, etc. The function getGlobalVariables() extracts the data using urllib, cleans it using nltk and BeautifulSoup, and finally counts the positive, negative, etc. words. The counts are stored in global variables so they can be used across functions.
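As a rough illustration of the cleaning and counting idea, here is a standard-library-only sketch. It deliberately swaps in `re` and `html.unescape` for BeautifulSoup and nltk, and it returns the counts instead of writing to global variables, so it is not the body of getGlobalVariables(). The names `clean_text` and `count_sentiment` are hypothetical.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")   # HTML/XML tags
WORD_RE = re.compile(r"[a-z]+")   # alphabetic tokens only, so numbers drop out


def clean_text(raw):
    """Strip markup and entities, lowercase, and return alphabetic tokens."""
    no_tags = TAG_RE.sub(" ", raw)
    text = html.unescape(no_tags).lower()
    return WORD_RE.findall(text)


def count_sentiment(tokens, positive_words, negative_words, stop_words):
    """Return (positive count, negative count, tokens kept after stop words)."""
    kept = [t for t in tokens if t not in stop_words]
    positive = sum(1 for t in kept if t in positive_words)
    negative = sum(1 for t in kept if t in negative_words)
    return positive, negative, len(kept)


sample = "<p>Revenue improved by 12% in 2019 &amp; losses declined.</p>"
tokens = clean_text(sample)
print(tokens)
print(count_sentiment(tokens, {"improved"}, {"losses", "declined"}, {"by", "in"}))
```

A regular expression is only a crude tag stripper; a real HTML parser such as BeautifulSoup handles malformed markup far more reliably.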

Textual Analysis: For a given row and a list of clean data, this function performs all the required calculations and stores the output values in their respective columns.

File Opening: Sometimes the webpage denies access to the urllib request, so a recursive function called open file keeps sending requests until one is accepted. More headers could be added to the request to make it look less like a bot and more like a human, but the current approach also works.
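One possible variation on this retry idea is sketched below. It differs from the approach described above in two labeled ways: it uses a bounded loop instead of unbounded recursion, and it sends a User-Agent header. The function name `open_url`, the header value, and the `retries`, `delay`, and `opener` parameters are all assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request

# Assumed header value; any realistic browser User-Agent string would do.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def open_url(url, retries=5, delay=1.0, opener=urllib.request.urlopen):
    """Return the response body as bytes, retrying when the request is refused."""
    request = urllib.request.Request(url, headers=HEADERS)
    for attempt in range(1, retries + 1):
        try:
            with opener(request) as response:
                return response.read()
        except urllib.error.URLError:  # also covers HTTPError, its subclass
            if attempt == retries:
                raise  # give up instead of retrying forever
            time.sleep(delay)
```

Capping the number of attempts avoids hitting Python's recursion limit and avoids hammering a server that keeps refusing. The `opener` parameter exists only so the function can be exercised without network access.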

Workbook Closing: In the end, we close the workbooks that were opened for our tasks

Main Function: It simply calls the above functions to get a clean list and to use that list to perform the calculations and store the values in the appropriate cells of the output file

anhad-shrivastava / text_sentiment_analysis
