
sms_analysis's People

Contributors

llu13701


sms_analysis's Issues

spaCy package name has changed

The README says to download the model with: python -m spacy download en_vectors_web_lg
However, I think spaCy has changed its naming convention, according to this GitHub post.

I think the download command should be changed to: python -m spacy download en_core_web_lg

Analysis throws error "statistics.StatisticsError: variance requires at least two data points"

/usr/local/bin/python3.9 /Users/xxx/code/persoonlijk/sms_analysis/simple_stats.py
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForNextSentencePrediction: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForNextSentencePrediction from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForNextSentencePrediction from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Enter your whatapp chat filename (ending in txt): WhatsApp-chat met xxx.txt
Please enter your whatsapp name: Xxxx
try whatsapp processing
file date_format is  %d/%m/%y, %I:%M:%S %p
try messenger processing
date_format for the file is  %d/%m/%y, %I:%M:%S %p
something is wrong with the file
cleaning messanger data
finish processing

Traceback (most recent call last):
  File "/Users/xxx/code/persoonlijk/sms_analysis/simple_stats.py", line 508, in <module>
    stats_collections()
  File "/Users/xxx/code/persoonlijk/sms_analysis/simple_stats.py", line 485, in stats_collections
    generate_master_summary(pd_text)
  File "/Users/xxx/code/persoonlijk/sms_analysis/simple_stats.py", line 283, in generate_master_summary
    custom_stopwords=identify_custom_stopwords(list_of_entire_text)
  File "/Users/xxx/code/persoonlijk/sms_analysis/incoming_outgoing_msg.py", line 52, in identify_custom_stopwords
    one_stdev=statistics.mean(diff)-0.7*statistics.stdev(diff)
  File "/usr/local/Cellar/python@3.9/3.9.0_2/Frameworks/Python.framework/Versions/3.9/lib/python3.9/statistics.py", line 797, in stdev
    var = variance(data, xbar)
  File "/usr/local/Cellar/python@3.9/3.9.0_2/Frameworks/Python.framework/Versions/3.9/lib/python3.9/statistics.py", line 739, in variance
    raise StatisticsError('variance requires at least two data points')
statistics.StatisticsError: variance requires at least two data points

Process finished with exit code 1
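
The traceback shows statistics.stdev(diff) being called with fewer than two values, which the standard library rejects. A minimal sketch of one possible guard follows. The function name stopword_threshold and the fallback behavior (returning None for empty input, and treating the spread as zero for a single value) are assumptions for illustration, not the project's code.

```python
import statistics


def stopword_threshold(diff, weight=0.7):
    """Hypothetical guard around mean(diff) - weight * stdev(diff)."""
    if not diff:
        # Assumption: with no data there is no meaningful threshold.
        return None
    if len(diff) < 2:
        # statistics.stdev needs at least two points; assume zero spread here.
        return statistics.mean(diff)
    return statistics.mean(diff) - weight * statistics.stdev(diff)
```

With a guard like this, a very short chat would skip or simplify the custom stopword step instead of ending the whole run with StatisticsError.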

Broken URL in requirements.txt

Hello! The second GitHub link in the README seems to be broken.
Thanks for making this. I'm trying to wrap my head around it now.

Suggested fix for UnicodeDecodeError

If the user's conversation includes certain characters, the following error appears:

UnicodeDecodeError: 'charmap' codec can't decode byte X in position X: character maps to <undefined>

The issue stems from the input_into_list(file_name) function in preprocessing_script.py, which opens the file without specifying an encoding, so Python falls back to the platform default.

Changing a_file = open(file_name, "r") to a_file = open(file_name, "r", encoding="utf8") fixes the issue.
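
The suggested change can be sketched as below. The helper read_chat_lines is a hypothetical stand-in for input_into_list (the real function body is not shown in this issue), and the sample chat line is made up for the demo.

```python
import os
import tempfile


def read_chat_lines(file_name):
    """Hypothetical stand-in for input_into_list: read a chat export as UTF-8."""
    # An explicit encoding avoids the platform default (for example cp1252 on
    # Windows), which is what raises the 'charmap' UnicodeDecodeError.
    with open(file_name, "r", encoding="utf8") as a_file:
        return a_file.read().splitlines()


# Demo: write one line containing non-ASCII characters, then read it back.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf8", suffix=".txt", delete=False
) as tmp:
    tmp.write("01/02/21, 10:15:00 PM - Xxxx: caf\u00e9 \U0001F600\n")
    path = tmp.name

lines = read_chat_lines(path)
os.remove(path)
```

Using a with block also closes the file automatically, which the original open(...) call did not guarantee.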

Requirements.txt file?

For simpler installation, the repo should have a requirements.txt file that lists each non-standard pip module needed, so that one could simply run pip install -r requirements.txt

The file I have in mind would look like this:

pandas
matplotlib
emoji
spacy
nltk
transformers
torch

https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz#egg=en_core_web_sm
https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-2.3.1/en_vectors_web_lg-2.3.1.tar.gz#egg=en_vectors_web_lg

https://spacy.io/usage/models#models-download
