An NLP tool that extracts text from a corpus of PDF files, embeds the sentences in that text, and finds sentences that are semantically similar to a given search query.
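As a rough illustration of the last step, here is a minimal sketch of ranking sentences against a query by similarity of their embeddings. It assumes cosine similarity as the metric, which is a common choice but not necessarily what this package does. The function names and the toy two-dimensional vectors are made up for the example; real embeddings would come from the model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def most_similar(query_vec, sentence_vecs, top_k=3):
    """Return (index, score) pairs for the top_k closest sentences."""
    scored = [
        (i, cosine_similarity(query_vec, vec))
        for i, vec in enumerate(sentence_vecs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real sentence embeddings (hypothetical data).
sentence_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_vec = [1.0, 0.0]

for index, score in most_similar(query_vec, sentence_vecs, top_k=2):
    print(index, round(score, 3))
```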
For anyone at the Ministry of Justice trying to install this on the Analytical Platform: by default you can't.
AllenNLP uses jsonnet, which in turn relies on some C binaries that we don't have on the platform. Installing from pip doesn't include these dependencies, so you would need a wheel that already bundles the C binaries. Even if you find or build one, I'm not sure the platform will let you install it.
To solve this you'll need to either:
- investigate ways to get jsonnet to install on the platform, or
- replace this package's use of AllenNLP with something else.
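Before trying either option, it may help to confirm whether jsonnet is actually missing from your environment. The sketch below checks whether a module can be found without importing it. It assumes the jsonnet Python bindings are exposed as a module named `_jsonnet` and that AllenNLP is importable as `allennlp`; treat both names as assumptions and adjust them if your install differs. The helper name `module_available` is made up for this example.

```python
import importlib.util


def module_available(name):
    """Return True if the import system can locate a top-level module."""
    return importlib.util.find_spec(name) is not None


# Module names are assumptions; change them if your environment differs.
for module_name in ("_jsonnet", "allennlp"):
    status = "found" if module_available(module_name) else "missing"
    print(f"{module_name}: {status}")
```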
The dependencies have some conflicts, starting with SciPy 1.4.1 and PyArrow 0.16.0.
The required NLTK version also seems old enough to have known security vulnerabilities.
Both of these should be fixable, but quite a few of the dependencies have had a lot of updates since this was made, so it might be worth doing a full check and update of requirements.
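As a starting point for that check, here is a minimal sketch that compares the versions named above with whatever is installed in the current environment. The `PINNED` dictionary only contains the two versions mentioned in this README; the helper names are made up for the example, and the real list of pins should come from the project's requirements.

```python
from importlib import metadata

# Versions taken from the note above; extend this from the real requirements.
PINNED = {"scipy": "1.4.1", "pyarrow": "0.16.0"}


def installed_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_to_pins(pins):
    """Map each package to a (pinned, installed) pair of versions."""
    return {
        package: (pinned, installed_version(package))
        for package, pinned in pins.items()
    }


for package, (pinned, installed) in compare_to_pins(PINNED).items():
    print(f"{package}: pinned {pinned}, installed {installed}")
```

Running `pip check` in the same environment is another way to surface conflicts between the requirements declared by installed packages.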