Testing ML Systems
- ML systems need to be tested more carefully than traditional software. Why? Because their rules are more loosely defined.
Key testing principles for ML: pre-deployment
- use a schema for features
- model specification test: any change to the model config needs a unit test.
- validate model quality: test for both sudden and slow degradation.
- test input feature code
- training is reproducible: e.g. fix the random seeds.
- integration test the pipeline
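The "training is reproducible" point can be sketched by fixing every seed before anything random runs (a real project would also seed numpy, torch, etc.). The helper below is a hypothetical stand-in for a training run, not code from the source:

```python
import random

def train_tiny_model(data, seed=42):
    # Hypothetical stand-in for a training run: fix the seed up
    # front so shuffling and initialisation are deterministic.
    random.seed(seed)
    random.shuffle(data)                # deterministic shuffle
    weight = random.uniform(-1.0, 1.0)  # deterministic "initialisation"
    return data, weight

run1 = train_tiny_model(list(range(10)), seed=0)
run2 = train_tiny_model(list(range(10)), seed=0)
assert run1 == run2  # same seed, identical "training" run
```

A reproducibility unit test is then just this assertion: run training twice with the same seed and require identical outputs.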
from til.
testing theory
- unit test: tests a single piece of logic or a single class
- integration test: tests assembled components
- system test: end-to-end test
How much testing?
- prioritise the code base: which parts matter most?
- what is mission critical?
- do the tests reduce uncertainty about your system?
test data schema
iris_schema = {
    'sepal length': {
        'range': {
            'min': 4.0,  # determined by looking at the dataframe .describe() output
            'max': 8.0
        },
        'dtype': float,
    },
    'sepal width': {
        'range': {
            'min': 1.0,
            'max': 5.0
        },
        'dtype': float,
    },
    'petal length': {
        'range': {
            'min': 1.0,
            'max': 7.0
        },
        'dtype': float,
    },
    'petal width': {
        'range': {
            'min': 0.1,
            'max': 3.0
        },
        'dtype': float,
    }
}
import unittest

class TestIrisInputData(unittest.TestCase):
    def setUp(self):
        # `setUp` runs before each test, ensuring that you have a
        # fresh pipeline to access in your tests. See the unittest
        # docs if you are unfamiliar with unittest.
        # https://docs.python.org/3/library/unittest.html#unittest.TestCase.setUp
        self.pipeline = SimplePipeline()
        self.pipeline.run_pipeline()

    def test_input_data_ranges(self):
        # get df max and min values for each column
        max_values = self.pipeline.frame.max()
        min_values = self.pipeline.frame.min()
        # loop over each feature (i.e. all 4 column names)
        for feature in self.pipeline.feature_names:
            # use unittest assertions to ensure the max/min values found
            # in the dataset fall within the range expected by the schema.
            self.assertTrue(max_values[feature] <= iris_schema[feature]['range']['max'])
            self.assertTrue(min_values[feature] >= iris_schema[feature]['range']['min'])

    def test_input_data_types(self):
        data_types = self.pipeline.frame.dtypes  # pandas dtypes attribute
        for feature in self.pipeline.feature_names:
            self.assertEqual(data_types[feature], iris_schema[feature]['dtype'])
- Define the schema up front, then test that the input data satisfies the min/max ranges and the data types the schema specifies.
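The tests above reference a SimplePipeline class that isn't shown here. A minimal sketch of what it might look like, assuming it only needs to expose `frame` (a DataFrame of raw features) and `feature_names` (the schema keys); the sample rows are illustrative, where a real pipeline would load the full Iris dataset:

```python
import pandas as pd

class SimplePipeline:
    # Hypothetical stand-in for the pipeline the schema tests assume.
    def __init__(self):
        self.frame = None
        self.feature_names = ['sepal length', 'sepal width',
                              'petal length', 'petal width']

    def run_pipeline(self):
        # A real pipeline would load the full dataset here
        # (e.g. sklearn.datasets.load_iris); a few rows stand in.
        self.frame = pd.DataFrame(
            [[5.1, 3.5, 1.4, 0.2],
             [6.3, 2.9, 5.6, 1.8],
             [4.6, 3.1, 1.5, 0.2]],
            columns=self.feature_names,
        )
```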
testing data engineering
import unittest

class TestIrisDataEngineering(unittest.TestCase):
    def setUp(self):
        self.pipeline = PipelineWithDataEngineering()
        self.pipeline.load_dataset()

    def test_scaler_preprocessing_brings_x_train_mean_near_zero(self):
        # Given
        # convert the dataframe to a single column with pandas stack
        original_mean = self.pipeline.X_train.stack().mean()

        # When
        self.pipeline.apply_scaler()

        # Then
        # The idea behind StandardScaler is that it will transform your data
        # to center the distribution at 0 and scale the variance at 1.
        # Therefore we test that the mean has shifted to be less than the original
        # and close to 0 using assertAlmostEqual to check to 3 decimal places:
        # https://docs.python.org/3/library/unittest.html#unittest.TestCase.assertAlmostEqual
        self.assertTrue(original_mean > self.pipeline.X_train.mean())  # X_train is a numpy array at this point.
        self.assertAlmostEqual(self.pipeline.X_train.mean(), 0.0, places=3)
        print(f'Original X train mean: {original_mean}')
        print(f'Transformed X train mean: {self.pipeline.X_train.mean()}')

    def test_scaler_preprocessing_brings_x_train_std_near_one(self):
        # When
        self.pipeline.apply_scaler()

        # Then
        # We also check that the standard deviation is close to 1
        self.assertAlmostEqual(self.pipeline.X_train.std(), 1.0, places=3)
        print(f'Transformed X train standard deviation: {self.pipeline.X_train.std()}')
- We applied a StandardScaler preprocessing step to the data.
- These tests check that the preprocessing was applied as we intended.
- It is important to split the data-preprocessing logic into pieces that are easy to test.
- A good pattern is to wrap the pipeline itself in a class, run only load_dataset in setUp, and then test each subsequent step in turn.
- That is, the preprocessing functions return nothing and operate only on the frame held inside the pipeline object. Structuring the tests this way in the bigdata-platform project would have made things much easier, too.
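A minimal sketch of what PipelineWithDataEngineering might look like under this pattern. The dataset loading is faked with random numbers, and the scaler is the standard (x - mean) / std transform, equivalent to what sklearn's StandardScaler applies by default; both are assumptions, not code from the source:

```python
import numpy as np
import pandas as pd

class PipelineWithDataEngineering:
    # Hypothetical sketch: each step mutates internal state and
    # returns nothing, so tests can run the pipeline step by step.
    def __init__(self):
        self.X_train = None

    def load_dataset(self):
        # A real pipeline would load and split the Iris data here;
        # random numbers keep the sketch self-contained.
        rng = np.random.default_rng(0)
        self.X_train = pd.DataFrame(rng.normal(5.0, 2.0, size=(100, 4)))

    def apply_scaler(self):
        # Standardise each column: (x - mean) / std, the same
        # transform StandardScaler applies with default settings.
        # After this step X_train is a plain numpy array.
        X = self.X_train.to_numpy()
        self.X_train = (X - X.mean(axis=0)) / X.std(axis=0)
```

After load_dataset the overall mean is around 5; after apply_scaler it is near 0 and the standard deviation near 1, which is exactly what the tests above assert.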