kimryan / lingua-en-sentence
split text into sentences (a Perl module)
License: Other
NAME
    Lingua::EN::Sentence - split text into sentences

SYNOPSIS
    use Lingua::EN::Sentence qw( get_sentences add_acronyms );

    add_acronyms('lt','gen');    ## adding support for 'Lt. Gen.'

    my $text = q{
    A sentence usually ends with a dot, exclamation or question mark optionally followed by a space!
    A string followed by 2 carriage returns denotes a sentence, even though it doesn't end in a dot

    Dots after single letters such as U.S.A. or in numbers like -12.34 will not cause a split
    as well as common abbreviations such as Dr. I. Smith, Ms. A.B. Jones, Apr. Calif. Esq.
    and (some text) ellipsis such as ... or . . are ignored.
    Some valid cases cannot be detected, such as the answer is X. It cannot easily be
    differentiated from the single letter-dot sequence to abbreviate a person's given name.
    Numbered points within a sentence will not cause a split 1. Like this one.
    See the code for all the rules that apply.
    This string has 7 sentences.
    };

    my $sentences = get_sentences($text);    # Get the sentences.
    my $i = 0;
    foreach my $sent (@$sentences) {
        $i++;
        print("SENTENCE $i:$sent\n");
    }

DESCRIPTION
    The Lingua::EN::Sentence module contains the function get_sentences,
    which splits text into its constituent sentences, based on a regular
    expression and a list of abbreviations (built in and given).

    Certain well-known exceptions, such as abbreviations, may cause
    incorrect segmentations. Many of them are already built into the module
    and handled. If you find words that still cause get_sentences to split
    incorrectly, you can add them with add_acronyms so that the module
    recognises them. Note that abbreviations are case sensitive, so 'Mrs.'
    is recognised but not 'mrs.'

INSTALLATION
    To install this module, type the following:

        perl Makefile.PL
        make
        make test
        make install

    or

        perl Build.PL
        ./Build
        ./Build test
        ./Build install

MAINTAINER
    This project was originated by Shlomo Yona. Currently maintained by
    Kim Ryan.
The module requires warnings 1.22 (shipped with Perl 5.19.9), which limits installation to Perl 5.20 and higher. However, it also declares a minimum Perl version of 5.10, where it cannot be installed because of the warnings version requirement.
```
use Lingua::EN::Sentence qw( get_sentences );
my $sentences = get_sentences("Donald Edmond Wahlberg Jr. (born August 17, 1969) is an American singer.");
print $sentences->[0], "\n";
# Output: Donald Edmond Wahlberg Jr.
```
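In the report above, the first sentence returned ends at "Jr.". As a rough illustration of how a list of abbreviations can suppress that kind of split, here is a minimal, self-contained sketch. It is not the module's implementation: `split_sentences` and its token-based rule are hypothetical, and it ignores the other rules the README mentions (ellipses, numbers, single-letter initials, blank lines). Like the module, it treats abbreviations as case sensitive.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; not part of Lingua::EN::Sentence.
# Splits $text into sentences, treating '.', '!' or '?' at the end of a
# whitespace-delimited token as a sentence end, unless the token is a
# known abbreviation followed by a dot.
sub split_sentences {
    my ($text, @abbreviations) = @_;

    # Case-sensitive lookup table: 'Jr' matches "Jr." but not "jr.".
    my %is_abbrev = map { $_ => 1 } @abbreviations;

    my (@sentences, @current);
    for my $token (split ' ', $text) {
        push @current, $token;

        # Only tokens ending in sentence punctuation can close a sentence.
        next unless $token =~ /^(.*)([.!?])$/;
        my ($stem, $mark) = ($1, $2);

        # A dot after a known abbreviation does not end the sentence.
        next if $mark eq '.' && $is_abbrev{$stem};

        push @sentences, join ' ', @current;
        @current = ();
    }

    # Keep any trailing text that lacks closing punctuation.
    push @sentences, join ' ', @current if @current;
    return @sentences;
}

my $text = 'Donald Edmond Wahlberg Jr. (born August 17, 1969) '
         . 'is an American singer. This is a second sentence.';

my $i = 0;
print 'SENTENCE ', ++$i, ": $_\n" for split_sentences($text, 'Jr');
```

With 'Jr' in the list the sketch returns two sentences. Without it, the same text yields three, the first being "Donald Edmond Wahlberg Jr.", which mirrors the behaviour reported above. Note that the sketch collapses runs of whitespace, so it cannot honour the "2 carriage returns" rule.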