NewsFluxus

Tool for modelling change and persistence in newspaper content. For an exposition of the underlying method see Persistent News: The Information Dynamics of Nordic Newspapers and for design see News-fluxus design specification.

Publications:

  • K. L. Nielbo, R. B. Baglini, P. B. Vahlstrup, K. C. Enevoldsen, A. Bechmann, and A. Roepstorff, “News Information Decoupling: An Information Signature of Catastrophes in Legacy News Media,” arXiv:2101.02956 [cs].

Prerequisites

The steps below assume Python 3.7+ is installed. Running in a virtual environment is recommended:

$ sudo pip3 install virtualenv
$ virtualenv -p /usr/bin/python3.7 venv
$ source venv/bin/activate
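
To confirm that the activated interpreter is the one you expect, a small check like the following can help. It is a minimal sketch using only the standard library; the function names are illustrative and not part of NewsFluxus.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv or virtualenv."""
    # venv sets sys.base_prefix to the system install; legacy virtualenv
    # releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base_prefix != sys.prefix


def python_is_supported(minimum=(3, 7)) -> bool:
    """Return True when the running Python is at least `minimum`."""
    return sys.version_info[:2] >= tuple(minimum)


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    print("Python 3.7+:", python_is_supported())
    print("Inside virtual environment:", in_virtualenv())
```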

Installation

Clone the repository and install the requirements from inside it:

$ git clone https://github.com/centre-for-humanities-computing/newsFluxus.git
$ cd newsFluxus
$ pip3 install -r requirements.txt

GPU acceleration

Currently the requirements file installs torch and torchvision without support for GPU acceleration. To use your accelerator(s), comment out torch and torchvision in the requirements file, uninstall them with pip if they are already installed, and install the build that matches your CUDA version (in this example, CUDA 11.0):

$ pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio===0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
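
After reinstalling, you may want to verify that PyTorch actually sees the GPU. The sketch below uses only standard PyTorch calls (`torch.cuda.is_available`, `torch.cuda.get_device_name`); the helper name `describe_torch_device` is made up for this example and is not part of NewsFluxus.

```python
def describe_torch_device() -> str:
    """Summarize whether the installed torch build can use a CUDA device."""
    try:
        import torch
    except ImportError:
        # torch is optional for this check; report instead of failing.
        return "torch is not installed"

    if torch.cuda.is_available():
        name = torch.cuda.get_device_name(0)
        return f"CUDA available on '{name}' (torch {torch.__version__})"
    return f"CPU only (torch {torch.__version__})"


if __name__ == "__main__":
    print(describe_torch_device())
```

If this reports "CPU only" after installing a `+cu110` build, the installed driver and the CUDA version of the wheel probably do not match.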

Install Mallet

Clone and install Mallet (plus dependencies)

$ sudo apt-get install default-jdk
$ sudo apt-get install ant
$ git clone git@github.com:mimno/Mallet.git
$ cd Mallet/
$ ant

Change the path to the local Mallet installation in src/tekisuto/models/latentsemantics.py
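
Before editing the path, it can be useful to check that the build produced a usable launcher. The following is a minimal sketch, assuming the launcher sits at `bin/mallet` inside the cloned Mallet directory; `find_mallet_binary` is a hypothetical helper, not a function from this repository.

```python
import os
from pathlib import Path


def find_mallet_binary(mallet_home: str) -> Path:
    """Return the path to the Mallet launcher under `mallet_home`.

    Assumes the launcher is located at <mallet_home>/bin/mallet.
    """
    binary = Path(mallet_home).expanduser() / "bin" / "mallet"
    if not binary.is_file():
        raise FileNotFoundError(f"No Mallet launcher found at {binary}")
    if not os.access(binary, os.X_OK):
        raise PermissionError(f"Mallet launcher is not executable: {binary}")
    return binary


# Example usage (the directory name is a placeholder):
# print(find_mallet_binary("~/Mallet"))
```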

Test Mallet wrapper

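>>> # gensim.models.wrappers was removed in Gensim 4.0, so this test needs gensim < 4.
>>> from gensim.test.utils import common_corpus, common_dictionary
>>> from gensim.models.wrappers import LdaMallet

>>> # Point this at the launcher of your local Mallet build.
>>> path_to_mallet_binary = "/path/to/mallet/binary"
>>> model = LdaMallet(path_to_mallet_binary, corpus=common_corpus, num_topics=20, id2word=common_dictionary)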

Download language resources

$ python downloader.py --language <language-code>
# e.g. for Danish language resources
$ python downloader.py --language da

You will be prompted for a location to store the data; the default is fine. To find language codes, see Stanza.

Test Stanza Installation

>>> import stanza

>>> nlp = stanza.Pipeline(lang="da")
>>> doc = nlp("Rap! rap! sagde hun, og så rappede de sig alt hvad de kunne, og så til alle sider under de grønne blade, og moderen lod dem se så meget de ville, for det grønne er godt for øjnene.")
>>> doc.sentences[0].print_dependencies()

Train model and extract signal

$ bash main.sh

Or run the two steps individually:

$ python src/bow_mdl.py --dataset <path-to-dataset> --language <language-code> --bytestore <frequency-of-backup> --sourcename <name-of-dataset> --estimate "<start stop step>" --verbose <frequency-of-log>
$ python src/signal_extraction.py --model <path-to-serialized-model>
# e.g. for the Danish sample
$ python bow_mdl.py --dataset ../dat/sample.ndjson --language da --bytestore 100 --estimate "20 50 10" --sourcename sample --verbose 100
$ python src/signal_extraction.py --model mdl/da_sample_model.pcl
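
The `--estimate` argument takes a quoted "<start stop step>" string. How `bow_mdl.py` interprets it is not documented here; the sketch below assumes it describes a grid of candidate values (for instance, numbers of topics) with a half-open range like Python's `range`. The function `parse_estimate` is hypothetical and only illustrates that reading.

```python
def parse_estimate(spec: str) -> list:
    """Expand a "<start stop step>" string into a list of integers.

    Assumption: the stop value is exclusive, as in Python's range().
    """
    parts = spec.split()
    if len(parts) != 3:
        raise ValueError("expected three integers: '<start stop step>'")
    start, stop, step = (int(part) for part in parts)
    if step <= 0:
        raise ValueError("step must be a positive integer")
    return list(range(start, stop, step))


if __name__ == "__main__":
    # The value used in the Danish sample command above.
    print(parse_estimate("20 50 10"))  # [20, 30, 40]
```

Under this assumption the sample command would evaluate three candidate values; if the script treats the stop value as inclusive, 50 would be evaluated as well.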

Research use-case

Requires matplotlib.

$ python src/news_uncertainty.py --dataset mdl/da_sample_signal.json --window 7 --figure "fig"

The resulting visualizations are written to fig/.
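
The meaning of `--window` is not spelled out in this README. One plausible reading, based on the windowed relative-entropy measures commonly used for this kind of analysis, is that it sets how many preceding and following documents each document is compared with. The sketch below illustrates that idea with novelty, transience, and resonance over per-document probability distributions; it is an illustration under that assumption, not the repository's implementation, and all names in it are made up for the example.

```python
import math


def kld(p, q):
    """Kullback-Leibler divergence KLD(p || q) in bits."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi <= 0:
                raise ValueError("q must be positive wherever p is positive")
            total += pi * math.log2(pi / qi)
    return total


def novelty_transience_resonance(dists, window):
    """Return (index, novelty, transience, resonance) per document.

    novelty    = mean divergence from the `window` preceding documents
    transience = mean divergence from the `window` following documents
    resonance  = novelty - transience
    Documents without a full window on both sides are skipped.
    """
    results = []
    for i in range(window, len(dists) - window):
        novelty = sum(kld(dists[i], dists[i - d]) for d in range(1, window + 1)) / window
        transience = sum(kld(dists[i], dists[i + d]) for d in range(1, window + 1)) / window
        results.append((i, novelty, transience, novelty - transience))
    return results


if __name__ == "__main__":
    # Toy series: the distribution shifts at index 3 and then persists.
    series = [[0.5, 0.5]] * 3 + [[0.9, 0.1]] * 3
    for row in novelty_transience_resonance(series, window=1):
        print(row)
```

In the toy series, the document at index 3 differs from its predecessor but matches its successor, so it has positive novelty, zero transience, and therefore positive resonance: a change that persists.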

Contributing

  1. Fork it!
  2. Create your feature branch: git checkout -b my-new-feature
  3. Commit your changes: git commit -am 'Add some feature'
  4. Push to the branch: git push origin my-new-feature
  5. Submit a pull request 😈

Versioning

Edition   Date               Comment
v1.0      June 04 2020       Launch
v1.1      January 14 2020    New NLP pipeline

Authors

Kristoffer L. Nielbo

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Acknowledgments

Stopwords ISO for their multilingual collection of stopwords.
