athu7 / extracting-keywords-from-legal-text-documents-

You have been provided with legal text documents in Train_docs. In Train_tags, human annotators have manually created the relevant tags for the corresponding files in Train_docs. Your task is to create a solution that automatically generates tags for the legal text documents in Test_docs.

Jupyter Notebook 100.00%

Extracting-keywords-from-legal-text-documents-

1st approach (Using LSTM)

  1. Data cleaning/ Text preprocessing:

For data cleaning, I used the regular expressions library and the nltk library. A dataframe of the cleaned documents is then created for training.
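A minimal sketch of what such a cleaning step could look like, assuming the usual lowercase / strip punctuation / drop stop words pipeline. The function name `clean_document` and the tiny `STOP_WORDS` set are illustrative assumptions; the small set stands in for nltk's much longer English stop-word list so that the sketch has no external dependencies.

```python
import re

# Tiny illustrative stop-word set (assumption); nltk's English list is much longer.
STOP_WORDS = {"the", "of", "and", "to", "in", "a", "is", "that",
              "by", "for", "on", "with", "as", "be"}


def clean_document(text):
    """Lowercase, replace non-letters with spaces, and drop stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)   # digits and punctuation become spaces
    tokens = text.split()                    # split() also collapses repeated whitespace
    tokens = [t for t in tokens if t not in STOP_WORDS and len(t) > 1]
    return " ".join(tokens)


sample = "The Appellant filed Civil Appeal No. 1234 of 2010 under Section 482 of the Code."
cleaned = clean_document(sample)
print(cleaned)
```

Applying `clean_document` to every file in Train_docs and collecting the results gives the column of cleaned documents described above.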

  2. Text embedding:

First, the text is converted using one-hot encoding. An embedding layer in the network then produces the embedding vectors, which are passed on to the LSTM.
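The sketch below illustrates the idea in plain Python, assuming "one-hot encoding" here means mapping each word to an integer index by hashing and padding every document to a fixed length. `VOCAB_SIZE`, `MAX_LEN`, `EMBED_DIM` and the helper names are hypothetical. The random lookup table only mimics what an embedding layer stores before training; in a real network those vectors are learned.

```python
import hashlib
import random

VOCAB_SIZE = 5000   # hypothetical size of the hashing space
MAX_LEN = 8         # hypothetical fixed sequence length
EMBED_DIM = 4       # hypothetical embedding dimension


def word_to_index(word, vocab_size=VOCAB_SIZE):
    """Map a word to a stable integer in [1, vocab_size - 1]; 0 is reserved for padding."""
    digest = hashlib.md5(word.encode("utf-8")).hexdigest()
    return int(digest, 16) % (vocab_size - 1) + 1


def encode(text, max_len=MAX_LEN):
    """Turn cleaned text into a fixed-length list of word indices, padded with zeros in front."""
    ids = [word_to_index(w) for w in text.split()][:max_len]
    return [0] * (max_len - len(ids)) + ids


# An embedding layer is a lookup table with one vector per index.
rng = random.Random(0)
embedding_table = [[rng.uniform(-0.05, 0.05) for _ in range(EMBED_DIM)]
                   for _ in range(VOCAB_SIZE)]

encoded = encode("appellant filed civil appeal")
vectors = [embedding_table[i] for i in encoded]   # shape: MAX_LEN x EMBED_DIM
print(encoded)
```

Because the indices come from hashing, two different words can collide on the same index; a larger `VOCAB_SIZE` makes that less likely.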

  3. Model:

1st layer: Embedding layer

2nd layer: LSTM layer containing 100 cells

Output layer: Dense output layer with sigmoid activation

Loss function: BinaryCrossEntropy

Optimizer: Adam optimizer
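A sigmoid output combined with binary cross-entropy treats every tag as an independent yes/no decision, which suits documents that carry several tags at once. The sketch below works through that output stage in plain Python; it is not the notebook's code, and the tag names and raw scores are made-up values for illustration.

```python
import math


def sigmoid(z):
    """Squash a raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over all tag slots."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)          # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


TAGS = ["contract", "arbitration", "tax", "bail"]   # hypothetical tag vocabulary
logits = [2.0, -1.0, 0.5, -3.0]                     # hypothetical raw scores from the dense layer
y_true = [1, 0, 1, 0]                               # annotated tags: contract, tax

probs = [sigmoid(z) for z in logits]
loss = binary_cross_entropy(y_true, probs)
predicted = [tag for tag, p in zip(TAGS, probs) if p >= 0.5]
print(predicted, round(loss, 4))
```

The 0.5 threshold is a common default rather than something the original specifies; lowering it yields more tags per document.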

  4. Accuracy: There is some error in my implementation, so I am not getting the required accuracy. I tried different approaches, such as using DistilBERT to obtain the embeddings, changing the number of cells in the LSTM, and stacking multiple LSTM layers, but all of them gave the same result. This is why I tried the second approach, which uses pre-trained models for feature extraction.

2nd approach (Using pre-trained models)

  1. I used different pre-trained models such as RAKE, multi-rake, and KeyBERT for this approach.
  2. I have demonstrated the RAKE approach in my code.
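For readers unfamiliar with RAKE, here is a simplified, dependency-free sketch of the idea behind it: split the text into candidate phrases at stop words and punctuation, score each word by degree divided by frequency, and score each phrase by the sum of its word scores. This is an illustration only, not the notebook's code or any library's implementation; the stop-word set and function names are assumptions.

```python
import re
from collections import defaultdict

# Small illustrative stop-word set (assumption); real RAKE setups use a full list.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "by",
              "for", "on", "with", "that", "against", "was"}


def candidate_phrases(text):
    """Split text into phrases, breaking at punctuation and stop words."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]", text.lower()):
        current = []
        for word in re.findall(r"[a-z]+", fragment):
            if word in STOP_WORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_keywords(text, top_n=3):
    """Return the top_n (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)   # co-occurrences in the phrase, counting the word itself
    word_score = {w: degree[w] / freq[w] for w in freq}
    scored = {" ".join(p): sum(word_score[w] for w in p) for p in set(phrases)}
    # Sort by score (descending), then alphabetically to break ties deterministically.
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


text = ("The appellant filed a criminal appeal against the order of the high court. "
        "The high court dismissed the criminal appeal.")
keywords = rake_keywords(text)
print(keywords)
```

Longer phrases made of frequently co-occurring words rank highest, which is why RAKE tends to favor multi-word legal terms over single words.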
