DataBreaches

STUDY ON DATA BREACHES AND IMPACTS ON CUSTOMERS, USING R

This project analyzes data breaches in general and studies how customers react after an organization publicly announces a breach. Two data sources are used. The first is the Privacy Rights Clearinghouse dataset of the 8,638 data breaches made public since 2005. The second is Twitter users’ tweets about an organization (here, “Equifax”). Tweets posted immediately after the organization announced the breach (Day-0) were compared with tweets from the present day, and the sentiment of both sets of tweets was analyzed.

An automation function was built that can be reused to extract data and store it in a dataframe. Data can be downloaded from an online source or loaded from a local folder, and the file formats .xlsx, .xls, .csv and .json are supported. Future work on this function would add other sources such as SAS files and SQL Server connections.
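A minimal sketch of what such a loader could look like. The function name `load_data` and the dispatch on file extension are assumptions made for illustration, not the project's actual code. The Excel and JSON branches assume the `readxl` and `jsonlite` packages are installed, and the JSON branch assumes the file holds an array of records.

```r
# Hypothetical loader: reads a local path or URL into a data frame,
# choosing the reader from the file extension.
load_data <- function(src) {
  # Strip any URL query string or fragment before reading the extension.
  ext <- tolower(tools::file_ext(sub("[?#].*$", "", src)))
  is_url <- grepl("^(https?|ftp)://", src)

  # readxl needs a local file, so download remote Excel files first.
  if (is_url && ext %in% c("xls", "xlsx")) {
    tmp <- tempfile(fileext = paste0(".", ext))
    utils::download.file(src, tmp, mode = "wb", quiet = TRUE)
    src <- tmp
  }

  switch(ext,
    csv  = utils::read.csv(src, stringsAsFactors = FALSE),
    xls  = ,  # falls through to the xlsx reader
    xlsx = as.data.frame(readxl::read_excel(src)),
    json = as.data.frame(jsonlite::fromJSON(src)),
    stop("Unsupported file type: ", ext)
  )
}
```

Adding a new source (for example SAS files) would then only require one more branch in the `switch()` call.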

Data cleaning, transformation, exploration, visualization and a word cloud were carried out on the data breaches dataset. The conclusions are as follows:
i) 2016 had the highest number of individual records breached.
ii) BSO (Other Business Organizations) has the highest share of records breached by organization type.
iii) By type of breach, HACK accounts for the most records breached.
iv) For the BSO organization type, HACK is the breach type in which the highest number of records were lost, while for Govt and NGOs, CARD is the breach type with the lowest number of records.
v) A word cloud of significant words was generated from the “Description of Incidents” field.
vi) Yahoo has the highest number of records breached of all time and also suffered multiple data breaches (2 times).
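The aggregations behind conclusions such as i) to iv) can be sketched in base R. The column names (`year`, `org_type`, `breach`, `records`) and all values below are made up for illustration; the real dataset's columns and figures may differ.

```r
# Toy data with hypothetical column names and made-up values.
breaches <- data.frame(
  year     = c(2015, 2016, 2016, 2016, 2017),
  org_type = c("BSO", "BSO", "GOV", "NGO", "BSO"),
  breach   = c("HACK", "HACK", "CARD", "CARD", "HACK"),
  records  = c(1000, 50000, 200, 300, 4000)
)

# Total records breached per year, largest first.
by_year <- aggregate(records ~ year, data = breaches, FUN = sum)
by_year <- by_year[order(-by_year$records), ]

# Records broken down by organization type and type of breach.
by_org_breach <- xtabs(records ~ org_type + breach, data = breaches)

print(by_year)
print(by_org_breach)
```

The first row of `by_year` gives the year with the most records breached, and the cross-tabulation shows which breach type dominates for each organization type.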

Textual analysis was performed on customer reactions (tweets), and the two sets of tweets were compared. The Day-0 tweets were markedly more negative than the present-day tweets.
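One simple way to make such a comparison is lexicon-based scoring, sketched below in base R. This is an assumption about the general approach, not the project's actual method: the word lists are a tiny illustrative lexicon, and the tweets are invented examples rather than real data.

```r
# Tiny illustrative lexicon; a real analysis would use a published one.
positive <- c("good", "great", "safe", "trust", "thanks")
negative <- c("bad", "breach", "angry", "stolen", "worst", "hacked")

# Score = (number of positive words) - (number of negative words).
score_tweet <- function(text) {
  cleaned <- tolower(gsub("[^A-Za-z' ]", " ", text))  # drop punctuation, digits
  words <- strsplit(cleaned, "\\s+")[[1]]
  sum(words %in% positive) - sum(words %in% negative)
}

# Made-up example tweets, not real data.
day0    <- c("Equifax got hacked, my data was stolen!", "Worst breach ever")
present <- c("Equifax says my account is safe", "Thanks for the credit freeze")

mean_day0    <- mean(vapply(day0, score_tweet, numeric(1)))
mean_present <- mean(vapply(present, score_tweet, numeric(1)))
```

A lower mean score for `day0` than for `present` would correspond to the Day-0 tweets being more negative.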

Business and government leaders internationally need to change their attitudes and take the threat of data breaches seriously by improving cybersecurity measures. Consumers, who bear the brunt of all types of attacks, need to be addressed and offered remedies by the companies. Organizations’ top management will come under heavy scrutiny, healthcare will be the next big area for data breaches, and IoT has increased the risk to data security (Experian, 2015).

I would like to give special thanks to my Professor Kishen Iyengar and my guide @aparnaadiraju92 for all their help and suggestions in completing my project.

INTRODUCTION

DATA SCIENCE PROCESS

Data Science is the study of the generalizable extraction of knowledge from data (Dhar, n.d.). It draws on techniques from multiple fields, using scientific methods, algorithms and processes (“Data science,” 2018) to transform raw data into actionable insights for better decision making.

[Figure: the data science process]

A typical data science process is illustrated in the figure above. The first step in any data analysis is to extract the data. Importing data into R is straightforward because standard and custom libraries can extract data from multiple sources such as files (.xls, .xlsx, .csv, .txt), databases, JSON, SAS files and Web APIs. The extracted data can be stored in a dataframe (or a list) for analysis.

After importing the data, we need to tidy (clean) it into the required form to maintain consistency. This includes removing duplicate information, removing blanks/NAs, formatting the variables, and ensuring that each column is a variable and each row an observation. The data is also transformed: new variables can be created by calculating them from the existing data.
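The tidying and transformation steps above can be sketched in base R as follows. The data frame, its column names and its values are hypothetical and only illustrate the operations.

```r
# Made-up raw data with a duplicate row, missing values and text-typed columns.
raw <- data.frame(
  company = c("Acme", "Acme", "Globex", NA, "Initech"),
  date    = c("2016-03-01", "2016-03-01", "2017-07-15", "2015-01-10", "2016-11-30"),
  records = c("1,000", "1,000", "250", "40", NA),
  stringsAsFactors = FALSE
)

clean <- unique(raw)                      # drop exact duplicate rows
clean <- clean[complete.cases(clean), ]   # drop rows containing any NA
clean$date    <- as.Date(clean$date)      # text to Date
clean$records <- as.numeric(gsub(",", "", clean$records))  # "1,000" to 1000

# Transformation: derive a new variable from an existing one.
clean$year <- as.integer(format(clean$date, "%Y"))
```

Whether rows with missing values should be dropped or filled in depends on the analysis; dropping them is simply the shortest option to show here.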

Data exploration through visualization is the next step in the process, as it can answer the basic questions we pose about the data. Visualization helps decision makers identify patterns and guides the process of building a good model. Plots, charts and graphs are the usual ways to visualize data.
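As a small illustration, a bar chart of yearly totals can be drawn with base R graphics. The totals below are invented for the example and do not come from the project's data.

```r
# Hypothetical yearly totals, for illustration only.
totals <- c("2014" = 120, "2015" = 300, "2016" = 950, "2017" = 410)

# barplot() returns the x positions of the bars, which helps place labels.
mids <- barplot(totals,
                main = "Records breached per year (toy data)",
                xlab = "Year", ylab = "Records", col = "steelblue")
text(mids, totals, labels = totals, pos = 1)  # value just below each bar top

# The pattern the chart makes visible: which year peaks.
peak_year <- names(totals)[which.max(totals)]
```

A chart like this answers a basic question (which year had the most records breached) at a glance, before any model is built.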

The next stage in the analytical process is modeling. Models can be custom built to answer the questions precisely, and multiple statistical and other techniques are used to build them.

The last stage in a data science project is communication to the stakeholders (marketing team, senior management, public, etc.). Data scientists and analysts need to explain clearly how a conclusion was reached, through an easily understandable story that provides insights and highlights the business impact and opportunity, and finally suggest the next course of action.

DATA BREACHES

According to the International Organization for Standardization (ISO/IEC 27040), a data breach is defined as: a compromise of security that leads to the accidental or unlawful destruction, loss, alteration, unauthorized disclosure of, or access to protected data transmitted, stored or otherwise processed.

A data breach can occur by multiple means, such as credit card fraud, hacking, email phishing, insider attack, unintentional disclosure and physical loss of data. Hacking is known to be the most common technique, and financial institutions and businesses are the primary targets since they hold the highest number of user records.

Companies and institutions tend to delay telling consumers about a breach; this builds up hysteria and negative sentiment towards the company when it finally discloses the breach. In this project, we plan to measure consumer responses and study, in general, the data breaches that have occurred in the past decade.

Some of the major data breaches in the recent past are those of Yahoo (3 Billion user records), Equifax (143 Million credit records) and Target (40 Million credit and debit card records).

Contributors

mullapudirajaprashanth
