NLP tool for scraping text from a corpus of PDF files, embedding the sentences in the text and finding semantically similar sentences to a given search query.

License: MIT License

Makefile 5.85% Python 94.15%

airflow-pdf2embeddings's Issues

Allen NLP doesn't work with the Analytical Platform

For anyone at the Ministry of Justice trying to install this on the Analytical Platform: by default you can't.

Allen NLP depends on jsonnet, which in turn relies on C binaries that we don't have on the platform. Installing from pip doesn't bring these dependencies in, so you need a wheel that already includes the C binaries. Even then, I'm not sure the platform will let you install such a wheel if you find or build one.

To solve this you'll need to either:

investigate ways to get jsonnet to install on the platform
replace this package's use of Allen NLP with something else
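
Before choosing between these options, it may help to confirm what is actually missing in a given environment. The following is a minimal sketch, not part of this repository: the helper name `module_available` is made up for illustration, and it assumes the jsonnet Python bindings expose a top-level module named `_jsonnet`.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "_jsonnet" is assumed to be the module installed by the jsonnet bindings;
    # "allennlp" is the package this project depends on.
    for mod in ("_jsonnet", "allennlp"):
        status = "found" if module_available(mod) else "missing"
        print(f"{mod}: {status}")
```

Note that this only checks whether the module can be located. A compiled extension that is present but fails to load would still need an actual import to surface the error.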

Pinned dependency conflict

The pinned dependencies conflict with each other, starting with SciPy 1.4.1 and PyArrow 0.16.0.

The pinned NLTK version also seems old enough to have known security vulnerabilities.

Both of these should be fixable, but quite a few of the dependencies have had a lot of updates since this was made, so it might be worth doing a full check and update of requirements.
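
As a starting point for that check, here is a rough sketch that compares `name==version` pins against what is installed in the current environment. It is an illustration under assumptions, not the project's tooling: the function names `parse_pins` and `find_mismatches` are hypothetical, and the sample pins are just the two versions mentioned above.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(lines):
    """Collect 'name==version' pins, ignoring comments, markers and blank lines."""
    pins = {}
    for raw in lines:
        # Drop trailing comments and environment markers (after ';').
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue
        name, pinned = line.split("==", 1)
        pins[name.strip().lower()] = pinned.strip()
    return pins


def find_mismatches(pins):
    """Return (name, pinned, installed) for each pin the environment doesn't match."""
    mismatches = []
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches.append((name, pinned, installed))
    return mismatches


if __name__ == "__main__":
    # Sample pins taken from the versions named in this issue.
    sample = ["scipy==1.4.1", "pyarrow==0.16.0"]
    for name, pinned, installed in find_mismatches(parse_pins(sample)):
        print(f"{name}: pinned {pinned}, installed {installed}")
```

This only reports where the environment differs from the pins. It does not resolve conflicts between the pins themselves. Running `pip check` in the environment reports installed packages with incompatible requirements, which is closer to the conflict described here.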

moj-analytical-services / airflow-pdf2embeddings
