nicolay-r / arelight

Granular Viewer of Sentiments Between Entities in Massively Large Documents and Collections of Texts, powered by AREkit

Home Page: https://link.springer.com/chapter/10.1007/978-3-031-56069-9_23

License: MIT License

Python 50.89% Shell 0.03% Jupyter Notebook 49.08%
deep-learning machine-learning nlp tensorflow sentiment-analysis relation-extraction brat deeppavlov arekit attitudes

arelight's Introduction

Hi I'm Nicolay! 👋

  • My personal website at github for more information about me
  • Combine it with track-and-field 🏃‍♂️, ⛷️ and 🌊🏄‍♂️

arelight's People

Contributors

guardeec, nicolay-r, trellixvulnteam

arelight's Issues

Docker version

Issues we have encountered:

  • DeepPavlov resources might not be unzipped (zipp==3.6.0 issue)
  • For pyMystem3: nlpub/pymystem3#21
  • Download AREkit data

Keep embedding locally

Since this project was previously part of AREkit, its refactoring is still incomplete.

Download script could not be executed.

Clarification on language support

When I open the readme, it is written in English, yet all screenshots of the tool show Cyrillic characters. Searching the readme for "languages" yields no results.

It would be very beneficial if you could clarify in the readme which languages are supported. I'm sure I could dig this up by reading through the research paper, but IMO that is unnecessarily complex for users coming to this repo.

`infer_bert` -- raises "Attempt to free invalid pointer" on loading and inferring tensorflow model

When running

python infer_bert.py --from-files ../data/texts-inosmi-rus/e1.txt \
    --labels-count 3 \
    --terms-per-context 50 \
    --tokens-per-context 128 \
    --text-b-type nli_m \
    -o output/brat_inference_output

I get

...
INFO:tensorflow:Restoring parameters from /content/ARElight/data/models/ra-20-srubert-large-neut-nli-pretrained-3l-finetuned/ra-20-srubert-large-neut-nli-pretrained-3l
  0%|                                                                                       | 0/1253 [00:00<?, ?opins/s]src/tcmalloc.cc:283] Attempt to free invalid pointer 0x107e00000 

and the process freezes.

Environment: Google Colab, Python 3.7, tensorflow 1.15.0, numpy 1.21.6, deeppavlov 0.11.0, arekit installed from git. Restarting the runtime doesn't help.

SynonymsCollection -- missed element results in inference script exception

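Китай все-таки намерен ввести санкционные меры против РФ и в дальнешем, Югославии.

(English: "China nevertheless intends to impose sanctions against Russia and, subsequently, Yugoslavia.")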

Causes:

 File "/media/nicolay/96ed6537-b931-4f7e-8ac4-8407527ddbf9/proj/REmarker/venv/lib/python3.6/site-packages/arekit/common/news/entities_grouping.py", line 15, in apply_core
    group_index = self.__value_to_group_id_func(entity.Value)
  File "/media/nicolay/96ed6537-b931-4f7e-8ac4-8407527ddbf9/proj/REmarker/venv/lib/python3.6/site-packages/arekit/common/synonyms.py", line 57, in get_synonym_group_index
    return self.__get_group_index(value)
  File "/media/nicolay/96ed6537-b931-4f7e-8ac4-8407527ddbf9/proj/REmarker/venv/lib/python3.6/site-packages/arekit/common/synonyms.py", line 130, in __get_group_index
    return self.__by_sid[sid]
KeyError: 'китай'

Possible solution: consider expanding the synonyms collection.
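A minimal sketch of what such an expansion could look like. The class below is a hypothetical stand-in written for illustration; it is not AREkit's SynonymsCollection and does not use its API. It assumes values are normalised by lowercasing (the traceback only shows the key 'китай', so the real normalisation may differ) and registers a new group for any unseen value instead of raising KeyError.

```python
class ExpandableSynonyms:
    """Hypothetical illustration only; not AREkit's SynonymsCollection."""

    def __init__(self, groups, read_only=False):
        self._by_sid = {}        # synonym id -> group index
        self._group_count = 0
        self._read_only = read_only
        for group in groups:
            index = self._group_count
            self._group_count += 1
            for value in group:
                self._by_sid[self._synonym_id(value)] = index

    @staticmethod
    def _synonym_id(value):
        # Assumption: lowercasing is enough to normalise a value.
        return value.strip().lower()

    def get_group_index(self, value):
        sid = self._synonym_id(value)
        if sid not in self._by_sid:
            if self._read_only:
                # Mirrors the behaviour reported in the traceback.
                raise KeyError(sid)
            # Expansion: an unseen value becomes its own new group.
            self._by_sid[sid] = self._group_count
            self._group_count += 1
        return self._by_sid[sid]


synonyms = ExpandableSynonyms([["РФ", "Россия"]])
china_index = synonyms.get_group_index("Китай")
```

With read_only=True the lookup keeps the current failing behaviour, which may be preferable when the collection must stay fixed.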

Reference to AREkit constants

  • setup constants
  • provide batch-size as a parameter

data = {"text_a": [], "text_b": [], "row_ids": []}
for row_ind, row in samples:
    # Considering unique rows only.
    if row["id"] in used_row_ids:
        continue
    data["text_a"].append(row['text_a'])
    data["text_b"].append(row['text_b'])
    data["row_ids"].append(row_ind)
    used_row_ids.add(row["id"])
batch_size = 10
for i in range(0, len(data["text_a"]), 10):
    texts_a = data["text_a"][i:i + batch_size]
    texts_b = data["text_b"][i:i + batch_size]
    row_ids = data["row_ids"][i:i + batch_size]
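One possible way to cover both items in the list above (a named constant and a batch-size parameter) is sketched here. The function names and DEFAULT_BATCH_SIZE are hypothetical and chosen for illustration; the structure of samples (pairs of row index and a dict with "id", "text_a", "text_b") is assumed from the snippet in this issue. The range step uses batch_size rather than a literal 10, so the step and the slice width cannot drift apart.

```python
DEFAULT_BATCH_SIZE = 10  # hypothetical constant name


def collect_unique_rows(samples):
    """Gather text pairs, keeping only the first row seen for each id."""
    data = {"text_a": [], "text_b": [], "row_ids": []}
    used_row_ids = set()
    for row_ind, row in samples:
        if row["id"] in used_row_ids:
            continue
        data["text_a"].append(row["text_a"])
        data["text_b"].append(row["text_b"])
        data["row_ids"].append(row_ind)
        used_row_ids.add(row["id"])
    return data


def iter_batches(data, batch_size=DEFAULT_BATCH_SIZE):
    """Yield (texts_a, texts_b, row_ids) slices of at most batch_size rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for i in range(0, len(data["text_a"]), batch_size):
        yield (data["text_a"][i:i + batch_size],
               data["text_b"][i:i + batch_size],
               data["row_ids"][i:i + batch_size])


samples = [
    (0, {"id": "a", "text_a": "x1", "text_b": "y1"}),
    (1, {"id": "a", "text_a": "x1", "text_b": "y1"}),  # duplicate id, skipped
    (2, {"id": "b", "text_a": "x2", "text_b": "y2"}),
]
batches = list(iter_batches(collect_unique_rows(samples), batch_size=1))
```

The batch size could then be exposed as a command-line argument of the inference script and passed straight through to iter_batches.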

Remove RuSentRel collection trainings

Reason: this project is dedicated to processing a single file or a list of files, so collection-based training should be excluded.

Fix readme as well.

This is not part of the 0.22.1 functionality.

Feedback -- Serialization Generalization

Can you explain a little bit what kind of data and which kind of models need to be available in a language to apply your framework?

Present limitations:

  • Focused on neural networks.
  • Frames annotation is hidden.
  • Embedding is only related to neural networks.
