Spring Boot OpenNLP Demo

  • This Spring Boot application demonstrates how to use Apache OpenNLP for natural language processing (NLP) tasks such as language detection, sentence detection, and tokenization. The service takes HTML text, converts it to plain text, detects its language, tokenizes it, and maps the tokens to their types and IDs using data from a database.
  • I implemented this feature to improve recommendations in my end-of-studies project, so I decided to share it.

Features

  • Convert HTML text to plain text (job-offer descriptions are stored as HTML in the database)
  • Detect the language of the text
  • Detect sentences in the text
  • Tokenize the text into individual tokens
  • Map tokens to their types and IDs using data from a database
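
The first and last steps above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the project's actual code: the class and method names (`TokenMappingSketch`, `htmlToPlainText`, `mapTokens`), the `TermInfo` and `MappedToken` types, and the sample terms are all hypothetical. Regex-based tag stripping stands in for a proper HTML parser, a whitespace split stands in for the OpenNLP tokenizer, and an in-memory map stands in for the database lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class TokenMappingSketch {

    // Hypothetical holder for a known term: its type (e.g. "SKILL") and database ID.
    static final class TermInfo {
        final String type;
        final long id;

        TermInfo(String type, long id) {
            this.type = type;
            this.id = id;
        }
    }

    // Hypothetical result row: a token plus the type and ID it was matched to.
    static final class MappedToken {
        final String token;
        final String type;
        final long id;

        MappedToken(String token, String type, long id) {
            this.token = token;
            this.type = type;
            this.id = id;
        }

        @Override
        public String toString() {
            return token + " -> " + type + "#" + id;
        }
    }

    // Naive HTML-to-text conversion: drop tags, decode two common entities,
    // collapse whitespace. A real HTML parser covers far more cases.
    public static String htmlToPlainText(String html) {
        return html
                .replaceAll("<[^>]*>", " ")
                .replace("&nbsp;", " ")
                .replace("&amp;", "&")
                .replaceAll("\\s+", " ")
                .trim();
    }

    // Look up each token case-insensitively in the term table; unknown tokens are skipped.
    public static List<MappedToken> mapTokens(String[] tokens, Map<String, TermInfo> knownTerms) {
        List<MappedToken> result = new ArrayList<>();
        for (String token : tokens) {
            TermInfo info = knownTerms.get(token.toLowerCase(Locale.ROOT));
            if (info != null) {
                result.add(new MappedToken(token, info.type, info.id));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // In-memory stand-in for rows loaded from the database (sample data only).
        Map<String, TermInfo> knownTerms = Map.of(
                "java", new TermInfo("SKILL", 1L),
                "spring", new TermInfo("SKILL", 2L));

        String text = htmlToPlainText("<p>We need a <b>Java</b> and Spring developer</p>");
        // Whitespace split stands in for the OpenNLP tokenizer.
        List<MappedToken> mapped = mapTokens(text.split(" "), knownTerms);
        mapped.forEach(System.out::println);
    }
}
```

Keeping the lookup keyed on lower-cased tokens makes matching case-insensitive without touching the original token text, which is still returned as it appeared in the input.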

Getting Started

Prerequisites

To build and run this project, you will need:

  • Java 11 or higher
  • Maven
  • A database (configured in application.yml)

Configure your database connection in src/main/resources/application.yml:

    database:
      url: your_database_url
      user: your_database_username
      password: your_database_password
    

Installation

  1. Clone the repository:

    git clone https://github.com/Oussemasahbeni/spring-boot-openNLP.git
    cd spring-boot-openNLP
  2. Build the project using Maven:

    mvn clean install

Configuration

Ensure you have the necessary model files for OpenNLP:

  • langdetect-183.bin
  • Sentence and tokenizer model files for each language you want to support. For English:
    • opennlp-en-ud-ewt-sentence-1.0-1.9.3.bin
    • opennlp-en-ud-ewt-tokens-1.0-1.9.3.bin
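
A quick way to confirm the model files are in place before starting the application is a small check like the one below. It is only a sketch: the class name `ModelFileCheck` and the default directory `src/main/resources/models` are assumptions, since the README does not say where the project expects the files. Pass the real directory as the first argument.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ModelFileCheck {

    // Model file names listed in this README.
    static final List<String> REQUIRED_MODELS = List.of(
            "langdetect-183.bin",
            "opennlp-en-ud-ewt-sentence-1.0-1.9.3.bin",
            "opennlp-en-ud-ewt-tokens-1.0-1.9.3.bin");

    // Returns the required model files that are not present in the given directory.
    public static List<String> missingModels(Path modelDir) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED_MODELS) {
            if (!Files.isRegularFile(modelDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // The default path is an assumed location, not taken from the project.
        Path modelDir = Path.of(args.length > 0 ? args[0] : "src/main/resources/models");
        List<String> missing = missingModels(modelDir);
        if (missing.isEmpty()) {
            System.out.println("All OpenNLP model files found in " + modelDir);
        } else {
            System.out.println("Missing model files in " + modelDir + ": " + missing);
        }
    }
}
```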

Contributing

Contributions are welcome! Please fork the repository and create a pull request with your changes.
