To build a classification methodology to predict the quality of wafer sensors based on the given training data.
Manufacturer has deployed various wafers that are used in a semiconductor industry. Currently, if a wafer is faulty the entire manufacturing plant has to be stopped and each wafer has to be checked. In order to avoid shutting down all the lanes, sensors are deployed for each wafer. Sensors send signals that will help to identify the faulty wafer. A faulty wafer is labelled as -1 and intact one as +1. A machine learning algorithm needs to deployed to find the faulty wafer based on sensor signals.
courtesy: iNeuron Intelligence
Steps:
- Data Ingestion:
-
data for batch training
-
data validation
- create a schema json file for training and prediction
- sample file name "wafer_ddmmyyyy_hhmmss.csv"
- length of date stamp in file name = 8
- length of time stamp in file name = 6
- number of columns = 592
- column names and data types according to schema
- regex creation with file name syntax and compare against schema
- compare the length of date stamp and time stamp against schema
- compare the columns of file against the schema
- check if file contains all the columnns, else move to the good/bad folders
- check if any column contains only null values and move to bad files folder
- create a schema json file for training and prediction
-
data transformation
- check for missing values in all columns, replace NaN with 'NULL'
-
data insertion in db
-
- data processing
- export data from db to csv for training
- data preprocessing
- data clustering
- Model Training
- best model for each cluster
- hyperparameter tuning
- model saving
- Deployment
- Cloud setup
- Pushing app to cloud
- Application start
- Prediction
- data ingestion
- data processing
- model call for specific cluster
- prediction
- export prediction to csv
- Update config.yaml
- Update secrets.yaml [Optional]
- Update params.yaml
- Update the entity
- Update the configuration Manager in src config
- Update the components
- Update the pipeline
- Test run pipeline stage
- run tox for testing your package
- run "dvc repro" for running all the stages in pipeline