athu7 / extracting-keywords-from-legal-text-documents-

You have been provided with legal text documents in Train_docs. In Train_tags, human annotators have manually created the relevant tags for the corresponding files of Train_docs. Your task is to create a solution that automatically generates tags for the legal text documents in Test_docs.

Jupyter Notebook 100.00%

Extracting-keywords-from-legal-text-documents-

1st approach (Using LSTM)

Data cleaning/ Text preprocessing:

For data cleaning I used the regular expressions library and the nltk library. The cleaned documents are stored in a dataframe that is used for training.
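The cleaning step could look roughly like the sketch below. It is not the notebook's actual code: the function name `clean_document`, the sample sentence, and the tiny inline stopword list are assumptions made so the example runs without downloads. With nltk, the stopword list would normally come from `nltk.corpus.stopwords`, and the cleaned strings would go into a dataframe column.

```python
import re

# Tiny illustrative stopword list (assumption); nltk's English stopword
# corpus would normally supply a much fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "is", "by", "for", "on"}

def clean_document(text: str) -> str:
    """Lowercase, replace non-letters with spaces, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # keep only letters and whitespace
    tokens = text.split()                   # split() also collapses repeated whitespace
    return " ".join(t for t in tokens if t not in STOPWORDS)

sample = "The Court held that the contract is void, and dismissed the appeal (2019)."
print(clean_document(sample))
# prints: court held that contract void dismissed appeal
```

Dropping digits and punctuation is a design choice that suits keyword tagging but would lose section numbers and citations, so it may need adjusting for legal text.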

Text embedding:

First the text is converted using one-hot encoding; then an embedding layer in the network produces the embedding vectors, which are passed to the LSTM.
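As a rough illustration of what reaches the embedding layer, the sketch below turns each cleaned document into a fixed-length list of integer word ids. It assumes "one-hot encoding" here means mapping each word to an integer index, as Keras's `one_hot` helper does; that helper hashes words into a fixed range, whereas this sketch swaps in a plain vocabulary lookup so the result is deterministic. The names `build_vocab` and `encode` and the value `max_len=5` are hypothetical.

```python
def build_vocab(docs):
    """Assign each distinct word an integer id, reserving 0 for padding."""
    vocab = {}
    for doc in docs:
        for word in doc.split():
            vocab.setdefault(word, len(vocab) + 1)
    return vocab

def encode(doc, vocab, max_len):
    """Map words to ids, then left-pad (or truncate from the left) to max_len."""
    ids = [vocab[w] for w in doc.split() if w in vocab]  # unknown words are skipped
    ids = ids[-max_len:]                                  # keep the last max_len tokens
    return [0] * (max_len - len(ids)) + ids               # zeros on the left as padding

docs = ["court held contract void", "appeal dismissed court"]
vocab = build_vocab(docs)
encoded = [encode(d, vocab, max_len=5) for d in docs]
print(encoded)
# prints: [[0, 1, 2, 3, 4], [0, 0, 5, 6, 1]]
```

An embedding layer then looks up a trainable vector for each id, so every document becomes a sequence of vectors of equal length.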

Model:

1st layer: Embedding layer
2nd layer: LSTM layer containing 100 cells
Output layer: Dense output layer with sigmoid activation
Loss function: BinaryCrossEntropy
Optimizer: Adam optimizer
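A sigmoid output with binary cross-entropy suggests a multi-label setup, where each output unit scores one tag independently. The sketch below works through that arithmetic in plain Python rather than reproducing the network itself. The tag names, raw scores, and the 0.5 threshold are made-up values for illustration, not taken from the notebook.

```python
import math

def sigmoid(z):
    """Squash a raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over all tag slots."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical tag vocabulary and raw scores from the dense layer for one document.
tags = ["contract", "appeal", "bail"]
logits = [2.0, -1.0, 0.0]
y_true = [1, 0, 0]  # the annotators tagged this document with "contract" only

probs = [sigmoid(z) for z in logits]
loss = binary_cross_entropy(y_true, probs)
predicted = [tag for tag, p in zip(tags, probs) if p > 0.5]
print(predicted, round(loss, 4))
# prints: ['contract'] 0.3778
```

Because each tag is scored on its own, a document can receive zero, one, or several tags, which a softmax output would not allow.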

Accuracy: There is some error in my implementation, so I am not getting the required accuracy. I tried different approaches, such as using DistilBERT to obtain the embeddings, changing the number of cells in the LSTM, and stacking multiple LSTM layers, but all gave the same result. That is why I tried the second approach, which uses pre-trained models for feature extraction.

2nd approach (Using pre-trained models)

I tried different pre-trained models and tools for this approach, such as RAKE, multi-rake, and KeyBERT.
The code demonstrates the RAKE approach.
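To show what RAKE does under the hood, here is a simplified re-implementation of the algorithm in plain Python. It is not the library code used in the notebook: the function names, the short stopword list, and the sample text are assumptions. RAKE splits text into candidate phrases at stopwords and punctuation, scores each word as degree divided by frequency, and scores a phrase as the sum of its word scores.

```python
import re
from collections import defaultdict

# Short illustrative stopword list (assumption); real RAKE packages ship fuller lists.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "to", "by", "in", "was", "against"}

def candidate_phrases(text):
    """Split text into word lists, breaking at punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r"[^a-zA-Z\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases

def rake_keywords(text, top_n=3):
    """Return the top_n candidate phrases ranked by RAKE score."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences, counting the word itself
    scores = {
        " ".join(phrase): sum(degree[w] / freq[w] for w in phrase)
        for phrase in phrases
    }
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, _ in ranked[:top_n]]

text = ("The high court dismissed the appeal. "
        "The appeal against the bail order was filed in the high court.")
print(rake_keywords(text))
# prints: ['high court dismissed', 'high court', 'bail order']
```

The degree-over-frequency score favours words that appear inside longer phrases, which is why multi-word legal terms tend to rank above single common words such as "appeal".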
