AIF2 is a fully language-independent NLP library, written in Java 8+.
For details, see the official AIF site.
There is also a free NLP course based on AIF.
Home Page: http://aif.io/
License: MIT License
Everything is on the wiki =)
Collect texts for 3+ languages; for each language, 3+ texts of 10-15k words each.
Each text should include a sentence counter.
If a text begins with spaces followed by a "-" character, an empty string is saved as the first token. To check this issue, use the should_get_tokens_from_text_file_with_space_in_the_begining_using_PREDEFINED_separator test for TokenSplitter.
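The leading-space behaviour described above can be reproduced with plain Java string splitting. The following is only a minimal sketch: the class and method names are hypothetical and are not part of AIF2, and it assumes a single space as the separator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not the AIF2 TokenSplitter.
class LeadingSpaceTokens {

    // Naive split: when the text starts with the separator,
    // String.split keeps an empty string as the first element.
    static List<String> naiveSplit(String text) {
        return Arrays.asList(text.split(" +"));
    }

    // Defensive split: skip empty tokens so leading separators
    // do not produce an empty first token.
    static List<String> safeSplit(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.split(" +")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "  - some text";
        System.out.println(naiveSplit(text));
        System.out.println(safeSplit(text));
    }
}
```

For the input used in main, the naive version returns an empty string as its first element, while the defensive version starts with "-".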
We need to separate RAW tests and Functional tests (http://testng.org/doc/index.html)
Just a quality check that shows how it works in each library
org.apache.commons.lang3.ArrayUtils is used in ProbabilityBasedTokenSeparatorExtractor,
but the commons-lang library is not included in the dependencies
It will be used for quality tests, etc.
Words placed on two consecutive lines of text, with no spaces between them, are split as one token.
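One way to avoid gluing words across a line break is to treat line breaks as token separators alongside spaces. The sketch below assumes whitespace-based splitting; the class and method names are hypothetical and do not come from AIF2.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the AIF2 splitter.
class LineBreakTokens {

    // Split on any run of whitespace. In Java regex, \s covers
    // space, tab, \n, \r, form feed and vertical tab, so a word at the
    // end of one line and a word at the start of the next stay separate.
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitOnWhitespace("first\nsecond third"));
    }
}
```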
Precondition:
Steps:
Expected result:
Actual result:
Always returns an empty list
to use Stanford library in unit tests
All splitters should share the same method name (split?)
We shouldn't split: it's => it s
We should split ONLY if the separator characters are at the end of a token
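The rule above can be sketched as follows: peel separator characters off the end of a token and leave any separator inside the token untouched. This is an illustration under stated assumptions, not the AIF2 implementation: the class name, the method name and the separator set are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the AIF2 splitter.
class TrailingSeparatorSplit {

    // Assumed separator set, chosen for this example only.
    private static final String SEPARATORS = ".,!?;:'";

    // Returns the token body followed by each trailing separator
    // as its own element. Separators inside the token are kept.
    static List<String> split(String token) {
        int end = token.length();
        // Walk back from the end while the last character is a separator.
        while (end > 0 && SEPARATORS.indexOf(token.charAt(end - 1)) >= 0) {
            end--;
        }
        List<String> parts = new ArrayList<>();
        if (end > 0) {
            parts.add(token.substring(0, end));
        }
        for (int i = end; i < token.length(); i++) {
            parts.add(String.valueOf(token.charAt(i)));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("it's"));
        System.out.println(split("done."));
    }
}
```

With this approach "it's" stays a single token because the apostrophe is not at the end, while "done." is separated into the word and the trailing period.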
We need to be consistent with worldwide terminology
For a good naming example, we can use: http://www.monlp.com/2012/03/13/segmenting-words-and-sentences/
So far we assume that all sentence-split characters are end-of-sentence characters. This is good enough for Alpha1 but should be fixed in Alpha2.
to use AIF2 library in unit tests
to use OpenNLP library in unit tests