capstone_deception's Introduction

Project Title

Deception detection using NLP

Author Kapil Chopra

Executive summary

Companies use various methods to detect deception in conversations. Using machine learning, computers can identify deception at a "better than chance" rate. That prediction can serve as one more data point for a human deciding which path to take. This solution can be applied to fraud detection in insurance claims, analysis of interview transcripts, written evidence such as emails, product reviews, fake news detection, hiring, criminal investigations, and transcribed public speeches.

Rationale

Information is all around us. We constantly consume information from online videos, online text, blogs, presentations, books, and human interactions. How do we know that the information we are consuming is legitimate and worth our time? Which cases should investigators spend their time on?

Research Question

Could we score all presented text and video with a reliability or deception score so that the user can focus on what matters?

Data Sources

Data sources for this capstone

https://www.kaggle.com/datasets/rtatman/deceptive-opinion-spam-corpus

https://www.kaggle.com/datasets/rmisra/news-headlines-dataset-for-sarcasm-detection

https://www.kaggle.com/code/therealsampat/fake-news-detection

Methodology

I will apply the basic NLP techniques learned in the course.

Results

LogisticRegression performed very well at predicting deception in this NLP use case.
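To make the idea of a deception score concrete, here is a minimal, dependency-free sketch of logistic regression over bag-of-words counts. It is not the notebook's code: the function names (`train`, `deception_score`), the hyperparameters, and the four toy reviews are all assumptions made up for illustration, and in practice a library implementation such as scikit-learn's `LogisticRegression` would be the usual choice.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()

def featurize(text, vocab):
    """Map a text to a bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocab]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(texts, labels, epochs=300, lr=0.1):
    """Fit logistic regression with plain batch gradient descent.

    epochs and lr are illustrative values, not tuned settings.
    """
    vocab = sorted({word for t in texts for word in tokenize(t)})
    X = [featurize(t, vocab) for t in texts]
    weights = [0.0] * len(vocab)
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(vocab)
        grad_b = 0.0
        for x, y in zip(X, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss w.r.t. the linear score
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return vocab, weights, bias

def deception_score(text, model):
    """Return the model's estimated probability that the text is deceptive."""
    vocab, weights, bias = model
    x = featurize(text, vocab)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Made-up toy data: 1 = deceptive, 0 = truthful.
texts = [
    "best hotel ever absolutely amazing perfect stay",
    "amazing amazing perfect luxury best experience ever",
    "room was clean but the elevator was slow",
    "check in took ten minutes and the room was small",
]
labels = [1, 1, 0, 0]

model = train(texts, labels)
print(deception_score("amazing perfect best", model))
print(deception_score("the room was small", model))
```

The output of `deception_score` is a probability between 0 and 1, which is one way to present the result to a human reviewer as a score rather than a hard yes/no label.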

Next steps

It's important to note that building an effective fraud detection system for reviews is an iterative process that involves continuous monitoring and refinement.

  1. Complementing the automated fraud detection techniques with manual review, user feedback, and business rules is a smart approach. Human oversight and feedback can provide valuable insights and help identify patterns that might be challenging for the automated system to detect. Additionally, incorporating business rules and domain-specific knowledge can further enhance the fraud detection capabilities.
  2. Ensemble methods, such as bagging or boosting, can improve the model's performance. By combining multiple models or predictions, ensemble methods can increase the system's robustness and accuracy. As a next step, we can experiment with different ensemble techniques to find the combination that works best for our fraud detection system.
  3. Acquiring a wider range of data, in terms of both quantity and diversity, is another priority. The more diverse and representative our training and testing data are, the better the model will be able to generalize and detect fraudulent patterns across different scenarios.
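
As an illustration of the ensemble idea in step 2, the following is a minimal sketch of bagging: each base model is trained on a bootstrap resample of the training set, and the ensemble predicts by majority vote. The base learner here (a simple per-word vote counter), the function names, and the toy reviews are all hypothetical and chosen only to keep the example short; the project itself has not committed to any particular ensemble technique.

```python
import random
from collections import Counter

def train_word_votes(texts, labels):
    """Base learner: +1 per deceptive document containing a word, -1 per truthful one."""
    votes = Counter()
    for text, label in zip(texts, labels):
        for word in set(text.lower().split()):
            votes[word] += 1 if label == 1 else -1
    return votes

def predict_word_votes(votes, text):
    """Predict 1 (deceptive) if the summed word votes are positive, else 0."""
    total = sum(votes[word] for word in set(text.lower().split()))
    return 1 if total > 0 else 0

def train_bagged(texts, labels, n_models=25, seed=0):
    """Train n_models base learners, each on a bootstrap sample (with replacement)."""
    rng = random.Random(seed)
    n = len(texts)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(
            train_word_votes([texts[i] for i in idx], [labels[i] for i in idx])
        )
    return models

def bagged_vote_fraction(models, text):
    """Fraction of base models that flag the text as deceptive."""
    return sum(predict_word_votes(m, text) for m in models) / len(models)

def predict_bagged(models, text):
    """Majority vote across the bagged base models."""
    return 1 if bagged_vote_fraction(models, text) > 0.5 else 0

# Made-up toy data: 1 = deceptive, 0 = truthful.
texts = [
    "best hotel ever absolutely amazing perfect stay",
    "amazing amazing perfect luxury best experience ever",
    "room was clean but the elevator was slow",
    "check in took ten minutes and the room was small",
]
labels = [1, 1, 0, 0]

models = train_bagged(texts, labels)
print(bagged_vote_fraction(models, "amazing perfect best"))
print(predict_bagged(models, "the room was small"))
```

The vote fraction doubles as a rough confidence signal: texts on which the base models disagree are natural candidates for the manual review described in step 1.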

Overall, by combining automated techniques with human insights, leveraging ensemble methods, and increasing the quantity and diversity of our data, we should be well equipped to build a robust and effective fraud detection system for text.

Outline of project

https://github.com/kapch1980/Capstone_Deception/blob/main/Capstone.ipynb

Contact and Further Information

Kapil Chopra
