Project for the Intelligent Systems course of the EIT Digital Data Science master's programme at UPM
This project gives an overview of the basic steps for performing Natural Language Processing (NLP) with the R programming language.
In the first part of the assignment, the aim is to process a corpus found in the data folder, run a part-of-speech (POS) tagger on it, and manually check the results for some sentences. From these results, I concluded that the main error of the POS tagger was that it did not recognize proper nouns, in either singular or plural form, such as America or Americans.
In the second part, the goal is to improve the previous naive POS tagger by adding custom patterns that match certain POS tags, and to study the effect of these patterns in terms of precision and recall.
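
To make the idea of a custom pattern and its evaluation concrete, here is a minimal sketch in base R. It is not the code used in the assignment: the toy tokens, the tags, and the helper names `apply_proper_noun_pattern` and `tag_scores` are all assumptions made for illustration. The tags follow the Penn Treebank convention (`NNP` for a singular proper noun, `NNPS` for a plural one).

```r
# Hypothetical toy data: tokens, hand-checked (gold) tags, and the tags
# a naive tagger might produce when it misses proper nouns.
tokens    <- c("Americans", "love", "America", "and", "jazz")
gold      <- c("NNPS", "VBP", "NNP", "CC", "NN")
predicted <- c("NNS",  "VBP", "NN",  "CC", "NN")

# Hypothetical custom pattern: a capitalized word is retagged as a proper
# noun, plural (NNPS) if it ends in "s" and singular (NNP) otherwise.
apply_proper_noun_pattern <- function(tokens, tags) {
  capitalized <- grepl("^[A-Z][a-z]+$", tokens)
  plural      <- grepl("s$", tokens)
  tags[capitalized & plural]  <- "NNPS"
  tags[capitalized & !plural] <- "NNP"
  tags
}

# Precision and recall for a single tag, computed from true positives (tp),
# false positives (fp), and false negatives (fn).
tag_scores <- function(gold, predicted, tag) {
  tp <- sum(predicted == tag & gold == tag)
  fp <- sum(predicted == tag & gold != tag)
  fn <- sum(predicted != tag & gold == tag)
  c(precision = if (tp + fp > 0) tp / (tp + fp) else NA_real_,
    recall    = if (tp + fn > 0) tp / (tp + fn) else NA_real_)
}

improved <- apply_proper_noun_pattern(tokens, predicted)
before   <- tag_scores(gold, predicted, "NNP")
after    <- tag_scores(gold, improved, "NNP")
```

Comparing `before` and `after` shows how a pattern changes the scores for a given tag. A rule this simple would also retag any capitalized word at the start of a sentence, which is the kind of precision loss that the precision and recall comparison is meant to expose.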
The tasks were developed in the R programming language, using R Markdown documents to explain every step.
- Angel Igareta ([email protected])