Pipeline >>
1. data formation: stock-related headlines for predicting the stock index (Nifty 50), collected via web scraping.
1.1 nifty data from ( timesofindia website )
1.2 economy data from ( moneycontrol website )
1.3 mutualFund data from ( moneycontrol website )
1.4 ipo data from ( moneycontrol website )
1.5 personal_finance data from ( moneycontrol website )
1.6 business data from ( moneycontrol website )
1.7 indian_political data from ( moneycontrol website )
1.8 trends data from ( moneycontrol website )
1.9 election2024 data from ( moneycontrol website )
1.10 add the dependent target variable (y), generated dynamically with random()
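
A minimal sketch of step 1, under stated assumptions: the real scrapers and page layouts are not shown in these notes, so the `<h2>` headline tag, the sample HTML, and the helper names `extract_headlines` and `add_random_target` are all hypothetical. It uses only the standard library and parses an in-memory string instead of fetching a live page; the 0/1 label mirrors step 1.10 and assumes a binary target.

```python
import random
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the text inside every headline tag (assumed here to be <h2>)."""

    def __init__(self, tag="h2"):
        super().__init__()
        self.tag = tag
        self.headlines = []
        self._buffer = None  # holds text chunks while inside a headline tag

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag and self._buffer is not None:
            # Collapse runs of whitespace left over from the markup
            text = " ".join("".join(self._buffer).split())
            if text:
                self.headlines.append(text)
            self._buffer = None


def extract_headlines(html, tag="h2"):
    """Return the headline strings found in an HTML document."""
    parser = HeadlineParser(tag)
    parser.feed(html)
    parser.close()
    return parser.headlines


def add_random_target(headlines, seed=None):
    """Pair each headline with a random 0/1 label (step 1.10, assumed binary)."""
    rng = random.Random(seed)
    return [(headline, rng.randint(0, 1)) for headline in headlines]


# Hypothetical page fragment standing in for a scraped news page
sample_html = """
<html><body>
  <h2><a href="/a">Nifty 50 ends higher on bank gains</a></h2>
  <p>Not a headline</p>
  <h2>  IPO market sees   strong demand </h2>
</body></html>
"""

headlines = extract_headlines(sample_html)
rows = add_random_target(headlines, seed=42)
for headline, label in rows:
    print(label, headline)
```

In a real run the HTML string would come from an HTTP response for each source (1.1 to 1.9), and the tag or selector would have to match each site's actual markup.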
2. DATA-PREPROCESSING
2.1 build a new list, data_review: merge all the sentences of a row into a single string and store it in data_review (list)
2.2 insert data_review into a column, then preprocess the whole data_review (final data) with
(a) lowercasing
(b) keeping only alphabetical characters
(c) stopword removal
(d) lemmatization
2.3 separate the dependent and independent variables
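
The preprocessing steps above can be sketched as follows. This is an illustration rather than the project's code: the stopword list is a tiny hand-written sample, `naive_lemmatize` is a toy stand-in for a real lemmatizer (it only strips a plural "s"), and the names `merge_row` and `preprocess` and the sample rows are assumptions.

```python
import re

# Tiny illustrative stopword list; a real run would use a fuller one
STOPWORDS = {"a", "an", "the", "is", "are", "on", "in", "of", "to", "and", "as", "for"}


def naive_lemmatize(word):
    """Toy stand-in for a real lemmatizer: strip a trailing plural 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def merge_row(headlines):
    """Step 2.1: join all headlines of one row into a single string."""
    return " ".join(h.strip() for h in headlines if h and h.strip())


def preprocess(text, lemmatize=naive_lemmatize, stopwords=STOPWORDS):
    """Step 2.2: lowercase, keep letters only, drop stopwords, lemmatize."""
    text = text.lower()                                        # (a) lowercasing
    text = re.sub(r"[^a-z]+", " ", text)                       # (b) letters only
    tokens = [t for t in text.split() if t not in stopwords]   # (c) stopwords
    return " ".join(lemmatize(t) for t in tokens)              # (d) lemmatization


# Hypothetical rows: (list of headlines, target label)
rows = [
    (["Nifty 50 ends higher on bank gains", "IPOs see strong demand in 2024!"], 1),
    (["Markets fall as the rupee weakens"], 0),
]

data_review = [preprocess(merge_row(headlines)) for headlines, _ in rows]

# Step 2.3: independent variable (text) and dependent variable (label)
X = data_review
y = [label for _, label in rows]
print(X)
print(y)
```

Passing the lemmatizer in as a parameter keeps the sketch runnable without extra downloads while leaving room to plug in a dictionary-based lemmatizer.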
3. feature-extraction part
3.1 train-test-split (4:1)
3.2 vectorizer (count and tfidf)
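
A short sketch of the feature-extraction step, assuming scikit-learn is the library in use (the notes do not name it). The toy texts and labels are made up for illustration; `test_size=0.2` gives the 4:1 split, and each vectorizer is fitted on the training part only so that no vocabulary leaks in from the test part.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import train_test_split

# Toy, already-preprocessed headlines with made-up labels
texts = [
    "nifty end higher bank gain",
    "market fall rupee weaken",
    "ipo see strong demand",
    "mutual fund outflow rise",
    "economy grow faster expected",
    "economy slow weak demand",
    "market gain election result",
    "nifty slip global market weaken",
    "mutual fund inflow hit record",
    "ipo listing disappoint investor",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# 3.1 train-test split, 4:1 (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=0
)

# 3.2 count vectorizer: raw term counts
count_vec = CountVectorizer()
X_train_count = count_vec.fit_transform(X_train)
X_test_count = count_vec.transform(X_test)

# 3.2 tfidf vectorizer: counts reweighted by inverse document frequency
tfidf_vec = TfidfVectorizer()
X_train_tfidf = tfidf_vec.fit_transform(X_train)
X_test_tfidf = tfidf_vec.transform(X_test)

print(X_train_count.shape, X_test_count.shape)
print(X_train_tfidf.shape, X_test_tfidf.shape)
```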
4. apply NLP MODEL
4.1 logistic regression classification model
(a) using count vectorizer - accuracy 50%
(b) using tfidf vectorizer - accuracy 55%
4.2 NAIVE - BAYES MODEL
with both the count vectorizer and the tfidf vectorizer
4.3 PASSIVE AGGRESSIVE CLASSIFIER (tfidf vectorizer)
accuracy with bi-grams - 56%
accuracy with tri-grams - 60%
4.4 Random forest classifier
accuracy - 60%
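
The four models above could be compared with a sketch like the one below, assuming scikit-learn. Several details are guesses, since the notes do not state them: `MultinomialNB` as the Naive Bayes variant, `ngram_range=(1, 2)` and `(1, 3)` for the bi-gram and tri-gram runs, tfidf features for the random forest, and the toy headlines and labels. The accuracies it prints on toy data are not the figures reported above.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression, PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data standing in for the scraped, preprocessed headlines
up = [
    "nifty rallies on strong bank earnings",
    "ipo demand surges among retail investors",
    "mutual fund inflows hit record high",
    "economy grows faster than expected",
    "markets gain after election results",
]
down = [
    "nifty slips as global markets weaken",
    "ipo listing disappoints retail investors",
    "mutual fund outflows rise sharply",
    "economy slows amid weak demand",
    "markets fall before election results",
]
texts = (up + down) * 2
labels = ([1] * 5 + [0] * 5) * 2

# 4:1 split; stratify keeps both classes in the training part
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=0, stratify=labels
)

# (name, vectorizer, classifier) for each experiment in section 4
experiments = [
    ("4.1a logistic regression + count", CountVectorizer(),
     LogisticRegression(max_iter=1000)),
    ("4.1b logistic regression + tfidf", TfidfVectorizer(),
     LogisticRegression(max_iter=1000)),
    ("4.2 naive bayes + count", CountVectorizer(), MultinomialNB()),
    ("4.2 naive bayes + tfidf", TfidfVectorizer(), MultinomialNB()),
    ("4.3 passive aggressive + tfidf bi-gram", TfidfVectorizer(ngram_range=(1, 2)),
     PassiveAggressiveClassifier(max_iter=1000, random_state=0)),
    ("4.3 passive aggressive + tfidf tri-gram", TfidfVectorizer(ngram_range=(1, 3)),
     PassiveAggressiveClassifier(max_iter=1000, random_state=0)),
    ("4.4 random forest + tfidf", TfidfVectorizer(),
     RandomForestClassifier(n_estimators=100, random_state=0)),
]

results = {}
for name, vectorizer, classifier in experiments:
    # The pipeline fits the vectorizer on the training texts only
    model = make_pipeline(vectorizer, classifier)
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))

for name, accuracy in results.items():
    print(f"{name}: {accuracy:.2f}")
```

One point to keep in mind when reading the reported numbers: if the target really is generated by random() as in step 1.10 and is a balanced binary label, accuracies around 50% are what chance alone would produce, whichever model or vectorizer is used.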