Parse complete sentences from HTML (local file or remote website)
This program is a proof of concept that extracts complete sentences from HTML and discards incomplete ones.
This is useful when the extracted text feeds further processing, such as sentiment analysis.
Figure 1: Sentences parsed from an SEC filing.
The engine combines regular expressions, which strip unwanted characters, with the OpenNLP English Maximum Entropy Sentence Detector, which finds sentence boundaries.
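
The two-stage idea can be sketched roughly as follows. This is not the program's actual code: the class name SentenceSketch, the cleanup patterns, and the rule for what counts as a "complete" sentence are all assumptions made for illustration. To keep the sketch runnable with the JDK alone, it swaps in java.text.BreakIterator for the OpenNLP sentence detector, so its boundary decisions will differ from the maximum entropy model.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class SentenceSketch {
    // Hypothetical cleanup patterns; the real program's expressions are not shown.
    private static final Pattern SCRIPT_STYLE =
        Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");
    // Block-level tags become line breaks so headings and list items stay separate.
    private static final Pattern BLOCK_TAG = Pattern.compile(
        "(?i)</?(p|div|br|li|ul|ol|h[1-6]|tr|td|th|table|title|body|html|head)\\b[^>]*>");
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]*>");
    // Assumed rule: starts with an uppercase letter, has at least two words,
    // and ends with . ! or ? (optionally followed by a closing quote or bracket).
    private static final Pattern COMPLETE =
        Pattern.compile("\\p{Lu}.*\\s.*[.!?][\"')\\]]*");

    /** Stage 1: regular expressions turn HTML into newline-separated text chunks. */
    static String[] toChunks(String html) {
        String text = SCRIPT_STYLE.matcher(html).replaceAll("\n");
        text = BLOCK_TAG.matcher(text).replaceAll("\n");
        text = ANY_TAG.matcher(text).replaceAll("");   // inline tags such as <b>
        // Minimal entity handling only; a real parser would decode all entities.
        text = text.replace("&nbsp;", " ").replace("&amp;", "&");
        return text.split("\n");
    }

    /** Stage 2: split each chunk into sentences and keep only the complete ones. */
    static List<String> completeSentences(String html) {
        List<String> out = new ArrayList<>();
        for (String chunk : toChunks(html)) {
            String text = chunk.replaceAll("\\s+", " ").trim();
            if (text.isEmpty()) continue;
            // Stand-in for the OpenNLP detector (assumption, see lead-in).
            BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
            it.setText(text);
            int start = it.first();
            for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
                String sentence = text.substring(start, end).trim();
                if (COMPLETE.matcher(sentence).matches()) out.add(sentence);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Annual Report</h1>"
            + "<p>Revenue increased this year. Costs were flat.</p>"
            + "<ul><li>see notes</li></ul></body></html>";
        completeSentences(html).forEach(System.out::println);
    }
}
```

In the sample input, the heading and the list item lack terminal punctuation, so only the two sentences inside the paragraph pass the filter. Treating block-level tags as chunk boundaries is a design choice of this sketch: it keeps a heading from fusing with the sentence that follows it.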