siddharthashandilya / insurance_fraud_detection

The project will have a complete end-to-end pipeline for an insurance fraud prediction model.

License: MIT License

Shell 0.03% Python 8.12% Jupyter Notebook 91.85%

insurance_fraud_detection's Introduction

The project creates a model that predicts whether the present insurance claim is fraudulent, based on the data provided in the input form.

Usage
Installation
Recommended configurations
Custom configurations
Updating
Output
Uninstallation
Contributing
Future Scope
About Author

Usage

(Back to top)

Man pages have been added. Checkout man colorls.

Installation

(Back to top)

Install git (preferably version >= 2.0) and Python (preferably version >= 3.6) (Windows). For Linux:

   sudo yum install git -y
   sudo yum install python -y

Copy the GitHub URL of the repository:

https://github.com/SiddharthaShandilya/Dementia_detection_using_AI.git

Select a directory on your local system and run:

git clone https://github.com/SiddharthaShandilya/Dementia_detection_using_AI.git

*Note for the `git clone` command: please make sure that you have a working internet connection.*

*Note for `python`: please try to use Anaconda for running the app.*

Create a separate virtual environment to avoid conflicts between Python libraries:

    python3 -m venv new-env

If you use Anaconda, use the commands below:

    conda create --prefix ./env
    conda activate ./env

To activate the virtual environment, follow the given instructions: 👉 (click here)

Install all the libraries for the application:

pip3 install -r requirements.txt

Once the environment is created, use the following commands to start:

git init
dvc init
dvc dag
dvc repro

Have a look at Recommended configurations and Custom configurations.

Custom configurations

(Back to top)

This project runs the Flask application with python3, which might not work on your system. If it does not, try one of the commands below:
```
python3 app.py   # or: python app.py
```
or
```
flask run 
```

Recommended configurations

(Back to top)

You can modify the existing code according to your needs.

Note :

Please have a look at the dvc.yaml file. It uses python3.7, so if your console runs Python through the python3 command, make sure to change all the commands in dvc.yaml accordingly.

If you need to change any file locations, edit the config file at '/config/config.yaml'. For example, the dataset location in this code points to local storage at ../dementia_dataset; this may differ in your case, so change it accordingly.

Updating

(Back to top)

Want to update to the latest version of dementia_detection? Make the required changes and send us a pull request:

git push https://github.com/SiddharthaShandilya/Dementia_detection_using_AI.git

Output

(Back to top)

For Downloading Docker Image

docker pull centos104/dementia_detection_web_app

Below are some screenshots of the web application after a successful launch.

Uninstallation

(Back to top)

Want to uninstall? No issues (sob). Please feel free to open an issue regarding how we can enhance the dementia_detection app.

ctrl + A, ctrl + shift + delete

Contributing

(Back to top)

Your contributions are always welcome! Please have a look at the contribution guidelines first. 🎉

Future Scope

(Back to top)

We are currently working on the UI and on adding a section to the web app for uploading an MRI scan to detect dementia. Feel free to improve the code or share innovative ideas.

About Author

(Back to top)

Siddhartha Shandilya

insurance_fraud_detection's People

weeshlow siddhartha-shandilya-infrrd

insurance_fraud_detection's Issues

[ID-12] Inserting new data to the established database

Once the database is created, we will move the input files from the good data folder into that database and delete the good data folder.

[ID-22] Data Clustering on Cleaned Data

[ID-8] Creating database for validated and transformed raw data

Once the blank spaces in the input raw file have been replaced with the "Null" keyword, we will insert the values into the database based on the updated schema.json file.
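A minimal sketch of this loading step is shown below. It assumes SQLite as a stand-in database and a schema.json layout with a "ColName" mapping from column names to SQL types; neither is confirmed by this issue, and the table and column names in the usage note are hypothetical.

```python
import sqlite3


def create_table_from_schema(conn, table_name, schema):
    """Create a table whose columns come from schema["ColName"] (assumed layout)."""
    columns = ", ".join(
        f'"{name}" {sql_type}' for name, sql_type in schema["ColName"].items()
    )
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table_name}" ({columns})')


def insert_rows(conn, table_name, rows):
    """Insert already-transformed rows and return how many were written."""
    if not rows:
        return 0
    placeholders = ", ".join("?" for _ in rows[0])
    conn.executemany(f'INSERT INTO "{table_name}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)
```

For example, with `schema = {"ColName": {"policy_number": "INTEGER", "insured_sex": "TEXT"}}` and an in-memory connection from `sqlite3.connect(":memory:")`, calling `create_table_from_schema(conn, "good_raw_data", schema)` followed by `insert_rows(conn, "good_raw_data", [(1, "MALE"), (2, "NULL")])` writes two rows.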

[ID-14] Perform EDA on the ingested data and clean the data for model training

Once the data ingestion pipeline is in place, we'll move on to creating a data processing pipeline that cleans the ingested data and prepares it for model training.

Data cleaning: Remove or impute missing values, handle duplicates and outliers, and address any other data quality issues.
Data transformation: Apply necessary data transformations like normalization, standardization, or encoding categorical variables.
Univariate analysis: Analyze the distribution and characteristics of individual variables using visualizations like histograms, density plots, box plots, and summary statistics.
Bivariate analysis: Examine the relationship between two variables using scatter plots, correlation coefficients, and other statistical tests.
Multivariate analysis: Analyze the relationship between three or more variables using techniques like heatmaps, cluster analysis, and principal component analysis (PCA).
Feature engineering: Create new features by combining or transforming existing ones based on insights gained from the earlier analysis.
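As one concrete instance of the data transformation step listed above, the sketch below shows min-max normalization and z-score standardization of a single numeric column using only the standard library. The function names are hypothetical and the project may well use pandas or scikit-learn for this instead.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Scale values linearly into the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def standardize(values):
    """Shift values to zero mean and scale to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

Calling `min_max_normalize([10, 20, 30])` returns `[0.0, 0.5, 1.0]`.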

[ID-17] Reformatting the Present Code using Black

[ID-24] Train Model for each cluster

[ID-3] Data Transformation - replace missing values with NULL/ ""

Once the raw data file is validated, we move on to the data transformation part.
This part involves changing the blank spaces present in the CSV file to "NULL" values.

The validated files are stored in the artifacts/data/good-data directory.
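The transformation described above can be sketched as follows. This is an illustration using the standard csv module, not the project's actual implementation; the function names are hypothetical, and it treats whitespace-only cells as blank, which is an assumption.

```python
import csv
import io


def replace_blanks_with_null(rows, null_token="NULL"):
    """Return a copy of rows where empty or whitespace-only cells become null_token."""
    return [
        [cell if cell.strip() else null_token for cell in row]
        for row in rows
    ]


def transform_csv_text(csv_text, null_token="NULL"):
    """Parse CSV text, replace blank cells, and return the transformed CSV text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    cleaned = replace_blanks_with_null(rows, null_token)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(cleaned)
    return out.getvalue()
```

For example, `transform_csv_text("a,b\n1,\n ,2\n")` returns `"a,b\n1,NULL\nNULL,2\n"`.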

Column name validation

Provide a function "column_name_validation" that matches the columns present in the input files against the provided schema.json file.
Put the function in src/data_processing/data_validation.py, inside the Raw_Data_Validation class.
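One possible shape for this check is sketched below as a standalone function. It assumes schema.json stores the expected columns under a "ColName" key, which this issue does not confirm; the return value (a flag plus the missing and unexpected names) is likewise a design assumption, and the column names in the usage note are hypothetical.

```python
def column_name_validation(header, schema):
    """Compare a CSV header against the column names listed in the schema.

    Returns (is_valid, missing, unexpected). Column order is ignored.
    """
    expected = list(schema["ColName"])  # assumed schema layout
    actual = [name.strip() for name in header]
    missing = [name for name in expected if name not in actual]
    unexpected = [name for name in actual if name not in expected]
    return (not missing and not unexpected), missing, unexpected
```

With `schema = {"ColName": {"age": "INTEGER", "policy_state": "TEXT"}}`, the header `["age", "policy_state"]` passes, while `["age", "state"]` fails and reports `policy_state` as missing and `state` as unexpected.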

[ID-6] validate missing values for whole column

This function validates if any column in the csv file has all values missing. If all the values are missing, the file is not suitable for processing. Such files are moved to bad raw data.
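A minimal sketch of this check, using only the standard library, is given below. The function name and the set of tokens treated as missing are assumptions for illustration; the project's own function may read files from disk rather than text.

```python
import csv
import io

# Tokens treated as "missing" in this sketch (an assumption, compared case-insensitively).
MISSING_TOKENS = {"", "NULL", "NA", "NAN"}


def columns_with_all_values_missing(csv_text):
    """Return the names of columns in which every data row has a missing value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    all_missing = {name: True for name in fieldnames}
    row_count = 0
    for row in reader:
        row_count += 1
        for name in fieldnames:
            value = (row.get(name) or "").strip().upper()
            if value not in MISSING_TOKENS:
                all_missing[name] = False
    if row_count == 0:
        # No data rows at all: report nothing rather than flagging every column.
        return []
    return [name for name in fieldnames if all_missing[name]]
```

A file would then be moved to bad raw data whenever this function returns a non-empty list.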

Get the value from the schema.json file and use it to match the raw file name. If the name matches, move the file to the good data folder; otherwise, move it to the bad data folder.

Created the get_value_from_schema function and the raw_file validation function inside the Raw_Data_Validation class in src/data_processing/data_validation.py.
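The routing decision can be sketched as below. The signature of get_value_from_schema, the "FileNamePattern" schema key, the example pattern, and the bad-data directory name are all hypothetical; only the good-data path appears in these issues.

```python
import re


def get_value_from_schema(schema, key):
    """Look up a key in the loaded schema, failing loudly if it is absent."""
    if key not in schema:
        raise KeyError(f"schema.json has no entry named {key!r}")
    return schema[key]


def choose_destination(
    file_name,
    schema,
    good_dir="artifacts/data/good-data",
    bad_dir="artifacts/data/bad-data",  # hypothetical directory name
):
    """Return the directory a raw file should be moved to, based on its name."""
    pattern = get_value_from_schema(schema, "FileNamePattern")  # hypothetical key
    if re.fullmatch(pattern, file_name):
        return good_dir
    return bad_dir
```

The caller would then move the file, for instance with `shutil.move`, to the returned directory. Keeping the decision separate from the move makes the rule easy to test without touching the file system.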

[ID-23] Implement code to update DB after data processing stage

[ID-15] Add readme file to the artifacts/data

Add readme file to the artifacts/data folder mentioning the link to raw data

Create a README.md file inside the data directory.
Update the README file with the link to download the raw data.
