Comments (4)
Good evening, and thank you for your interest in Greeb! Natural language processing is more about pragmatic trade-offs than about absolute parsing quality. At the moment, the text segmentation problem is not a high priority for me, but I promise to improve the segmentation module within two weeks. FYI, the main challenge in statistical NLP is the lack of publicly available datasets for machine learning. For instance, the authors of the paper you linked have access to НКРЯ (the Russian National Corpus), a huge linguistic dataset, but I do not :)
Also, you can always send me a patch, and I would be glad to accept it.
from greeb.
Hey hey hey. So, I have implemented a simple abbreviation recognizer in the Greeb::Parser module. You can check your texts after updating to the new RC version of the gem.
>> text = 'Первое место завоевал Н.И.Иванов, второе К.Ф.Галиев, бронзовая ' \
'медаль у М.С.Абдуллина. 20 января в 14 часов в городском ' \
'шахматно-шашечном клубе (ул.Братская,9) стартует полуфинальный ' \
'турнир по русским (64-клеточным) шашкам, принять в нём участие ' \
'приглашаются все желающие. Domain: srgazeta.ru.';
>> pp Greeb::Parser.abbrevs(text).map { |e| [e, text[e.from...e.to]] }
=begin
[[#<struct Greeb::Entity from=22, to=26, type=:abbrev>, "Н.И."],
[#<struct Greeb::Entity from=41, to=45, type=:abbrev>, "К.Ф."],
[#<struct Greeb::Entity from=73, to=77, type=:abbrev>, "М.С."]]
=end
The approach is very naive, but it is useful in many practical situations.
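For readers curious what a naive recognizer of this kind might look like, here is a minimal sketch. It is not Greeb's actual implementation: the `Span` struct, the `INITIALS` pattern, and the `naive_abbrevs` method are hypothetical names invented for illustration, and the rule (a run of single uppercase letters, each followed by a period) is an assumption that only covers initials such as "Н.И.".

```ruby
# Hypothetical span type; it only mimics the from/to/type fields visible
# in the output above and is NOT Greeb's own class.
Span = Struct.new(:from, :to, :type)

# Assumed rule: one or more single uppercase letters, each followed by a
# period, and not preceded by another letter (so "ДНК." is not matched).
INITIALS = /(?<!\p{L})(?:\p{Lu}\.)+/

# Returns character-offset spans for every match of the assumed rule.
def naive_abbrevs(text)
  spans = []
  text.scan(INITIALS) do
    match = Regexp.last_match
    spans << Span.new(match.begin(0), match.end(0), :abbrev)
  end
  spans
end

sample = 'Первое место завоевал Н.И.Иванов, второе К.Ф.Галиев.'
naive_abbrevs(sample).each do |span|
  puts "#{span.from}...#{span.to} #{sample[span.from...span.to]}"
end
```

A rule this simple misses abbreviations such as "ул." and can misfire on a sentence that ends with a single capital letter, which is the kind of trade-off the word "naive" refers to.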
from greeb.
Thanks, I'll try to add it to my project!
from greeb.
Remember that you can combine the tokenization output with the abbreviation retrieval results, as in https://github.com/ustalov/greeb/blob/master/bin/greeb. I hope you'll find this useful.
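One plausible way to do such a combination is sketched below. It does not reproduce what the linked script does: the `Span` struct, the `merge_spans` method, the hand-written token spans, and the `:letter`, `:space`, and `:punct` type names are all assumptions made for illustration. The idea is to drop every token that lies inside an abbreviation span and keep the abbreviation span in its place.

```ruby
# Hypothetical span type with the same from/to/type fields as the output
# shown earlier in this thread; it is NOT Greeb's own class.
Span = Struct.new(:from, :to, :type)

# Drops tokens fully covered by an abbreviation span, then adds the
# abbreviation spans and restores document order.
def merge_spans(tokens, abbrevs)
  kept = tokens.reject do |token|
    abbrevs.any? { |abbr| abbr.from <= token.from && token.to <= abbr.to }
  end
  (kept + abbrevs).sort_by(&:from)
end

text = 'у М.С.Абдуллина'

# Hand-written token spans standing in for a tokenizer's output.
tokens = [
  Span.new(0, 1, :letter),  # "у"
  Span.new(1, 2, :space),   # " "
  Span.new(2, 3, :letter),  # "М"
  Span.new(3, 4, :punct),   # "."
  Span.new(4, 5, :letter),  # "С"
  Span.new(5, 6, :punct),   # "."
  Span.new(6, 15, :letter)  # "Абдуллина"
]
abbrevs = [Span.new(2, 6, :abbrev)] # "М.С."

merge_spans(tokens, abbrevs).each do |span|
  puts "#{span.type}: #{text[span.from...span.to].inspect}"
end
```

With this merge, the initials come out as one `:abbrev` span, so a later sentence splitter would not treat the periods inside them as sentence boundaries.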
from greeb.