lingua-en-sentence's Introduction

NAME

  Lingua::EN::Sentence - split text into sentences
  
SYNOPSIS

	use Lingua::EN::Sentence qw( get_sentences add_acronyms );
	
	add_acronyms('lt','gen');		## adding support for 'Lt. Gen.'
	my $text = q{
	A sentence usually ends with a dot, exclamation or question mark optionally followed by a space!
	A string followed by 2 carriage returns denotes a sentence, even though it doesn't end in a dot
	
	Dots after single letters such as U.S.A. or in numbers like -12.34 will not cause a split
	as well as common abbreviations such as Dr. I. Smith, Ms. A.B. Jones, Apr. Calif. Esq.
	and (some text) ellipsis such as ... or . . are ignored.
	Some valid cases cannot be detected, such as the answer is X. It cannot easily be
	differentiated from the single letter-dot sequence to abbreviate a person's given name.
	Numbered points within a sentence will not cause a split 1. Like this one.
	See the code for all the rules that apply.
	This string has 7 sentences.
	};
	
	my $sentences = get_sentences($text);	# Get the sentences.
	my $i = 0;				# Sentence counter.
	foreach my $sent (@$sentences)
	{
		$i++;
		print("SENTENCE $i:$sent\n");
	}


DESCRIPTION

The C<Lingua::EN::Sentence> module provides the function get_sentences, which
splits text into its constituent sentences using a regular expression and a
list of abbreviations (built in and user supplied).

Certain well-known exceptions, such as abbreviations, can cause incorrect
segmentation. Many of them are already built into the module and are handled
correctly. If you find other abbreviations that cause get_sentences to split
in the wrong place, you can add them with add_acronyms.
Note that abbreviations are case sensitive, so 'Mrs.' is recognised but not 'mrs.'.
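
The idea of combining a regular expression with a list of abbreviations can be
illustrated with a small, self-contained sketch. This is not the module's
actual implementation: split_sentences is a hypothetical helper, the rules are
far simpler than the real ones, and the placeholder character is assumed not to
occur in the input.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Lingua::EN::Sentence: a simplified
# illustration of abbreviation-aware sentence splitting.
sub split_sentences {
    my ($text, @abbreviations) = @_;

    # Placeholder character, assumed not to occur in the input text.
    my $mark = "\x01";

    # Protect the dot after each known abbreviation (case sensitive).
    for my $abbr (@abbreviations) {
        $text =~ s/\b\Q$abbr\E\./$abbr$mark/g;
    }

    # Split after '.', '!' or '?' when followed by whitespace.
    my @sentences = split /(?<=[.!?])\s+/, $text;

    # Restore the protected dots and drop empty pieces.
    s/\Q$mark\E/./g for @sentences;
    return [ grep { length } @sentences ];
}

my $sentences = split_sentences(
    'Mrs. Jones met Dr. Smith. They talked! Was it useful?',
    qw( Mrs Dr ),
);
print "SENTENCE $_: $sentences->[$_ - 1]\n" for 1 .. @$sentences;
```

Because the match in this sketch is case sensitive, 'Mrs.' is protected while
'mrs.' would still end a sentence, which mirrors the note above.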

  

INSTALLATION

To install this module, type the following:

   perl Makefile.PL
   make
   make test
   make install
   
   or
   
   perl Build.PL
   ./Build
   ./Build test
   ./Build install
   

MAINTAINER

This project was started by Shlomo Yona. It is currently maintained
by Kim Ryan.

lingua-en-sentence's People

Contributors

jraspass, kimryan, manwar

lingua-en-sentence's Issues

Jr and Sr should perhaps be added to PEOPLE array!?

	use Lingua::EN::Sentence qw( get_sentences );

	my $sentences = get_sentences("Donald Edmond Wahlberg Jr. (born August 17, 1969) is an American singer.");

	print $sentences->[0], "\n";

Output:

	Donald Edmond Wahlberg Jr.
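
Until the built-in lists change, a possible workaround is to register the
suffixes yourself with add_acronyms. The sketch below assumes that 'jr' and
'sr' are treated the same way as the 'lt' and 'gen' entries in the SYNOPSIS;
this has not been verified against the module, so check the output on your
installed version.

```perl
use strict;
use warnings;
use Lingua::EN::Sentence qw( get_sentences add_acronyms );

# Assumption: 'jr' and 'sr' behave like 'lt' and 'gen' in the SYNOPSIS,
# i.e. they make the splitter treat 'Jr.' and 'Sr.' as abbreviations.
add_acronyms('jr', 'sr');

my $text = 'Donald Edmond Wahlberg Jr. (born August 17, 1969) is an American singer.';
my $sentences = get_sentences($text);

# Print how many sentences were found, then each sentence on its own line.
print scalar(@$sentences), " sentence(s) found\n";
print "$_\n" for @$sentences;
```

If the assumption holds, the whole text should come back as a single sentence
rather than being cut after 'Jr.'.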