dm4ml / gate
Drift detection module for machine learning pipelines.
Home Page: https://dm4ml.github.io/gate/
License: MIT License
When embeddings drift, it would be useful to drill down into examples of embeddings that have drifted. This involves:
Interesting approach to drift detection! Could you tell me whether the partition summary for embeddings uses the same statistics as those listed below (https://dm4ml.github.io/gate/how-it-works/), or whether you take other factors into account?
coverage: The fraction of the column that has non-null values.
mean: The mean of the column.
p50: The median of the column.
num_unique_values: The number of unique values in the column.
occurrence_ratio: The count of the most frequent value divided by the total count.
p95: The 95th percentile of the column.
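To make the six definitions above concrete, here is a minimal sketch that computes them for a single numeric column using only the Python standard library. It is not taken from the gate codebase: the function names `percentile` and `summarize_column` are hypothetical, nulls are represented as `None`, the percentile uses linear interpolation, and `occurrence_ratio` divides by the number of non-null values (the list above only says "total count", so that choice is an assumption).

```python
from collections import Counter
from statistics import mean, median


def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a non-empty sorted list."""
    pos = (len(sorted_values) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def summarize_column(values):
    """Compute the six listed statistics for one column; None marks a null."""
    non_null = sorted(v for v in values if v is not None)
    if not non_null:
        raise ValueError("column has no non-null values")
    # Count of the single most frequent non-null value.
    top_count = Counter(non_null).most_common(1)[0][1]
    return {
        "coverage": len(non_null) / len(values),
        "mean": mean(non_null),
        "p50": median(non_null),
        "num_unique_values": len(set(non_null)),
        # Assumption: "total count" means the non-null count.
        "occurrence_ratio": top_count / len(non_null),
        "p95": percentile(non_null, 95),
    }


if __name__ == "__main__":
    print(summarize_column([1.0, 2.0, 2.0, None, 5.0]))
```

Whether embedding columns are summarized this way, or through some other reduction first, is exactly what the question above asks, so the sketch covers only the scalar-column case.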
I am trying to run the code for the Data Validation in Production ML Pipelines course, and I hit the following problem both on my local machine and on the Modal remote. I suspect that the combination of the latest Python version and pyarrow causes it.
running build_ext
creating /tmp/pip-install-y2x7mnua/pyarrow_11f173a8029a4b4aafed72e11e381502/build/temp.linux-x86_64-cpython-312
-- Running cmake for PyArrow
cmake -DCMAKE_INSTALL_PREFIX=/tmp/pip-install-y2x7mnua/pyarrow_11f173a8029a4b4aafed72e11e381502/build/lib.linux-x86_64-cpython-312/pyarrow -DPYTHON_EXECUTABLE=/usr/local/bin/python -DPython3_EXECUTABLE=/usr/local/bin/python -DPYARROW_CXXFLAGS= -DPYARROW_BUILD_CUDA=off -DPYARROW_BUILD_SUBSTRAIT=off -DPYARROW_BUILD_FLIGHT=off -DPYARROW_BUILD_GANDIVA=off -DPYARROW_BUILD_DATASET=off -DPYARROW_BUILD_ORC=off -DPYARROW_BUILD_PARQUET=off -DPYARROW_BUILD_PARQUET_ENCRYPTION=off -DPYARROW_BUILD_PLASMA=off -DPYARROW_BUILD_GCS=off -DPYARROW_BUILD_S3=off -DPYARROW_BUILD_HDFS=off -DPYARROW_USE_TENSORFLOW=off -DPYARROW_BUNDLE_ARROW_CPP=off -DPYARROW_BUNDLE_BOOST=off -DPYARROW_BUNDLE_CYTHON_CPP=off -DPYARROW_BUNDLE_PLASMA_EXECUTABLE=on -DPYARROW_GENERATE_COVERAGE=off -DPYARROW_BOOST_USE_SHARED=on -DPYARROW_PARQUET_USE_SHARED=on -DCMAKE_BUILD_TYPE=release /tmp/pip-install-y2x7mnua/pyarrow_11f173a8029a4b4aafed72e11e381502
error: command 'cmake' failed: No such file or directory
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
Changing the pinned pyarrow version might solve the problem: during installation I noticed that it required a version >= 11.0.0 and < 12.0.0.
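The log above shows two things that can be checked locally: the build directory is named `cpython-312`, and the build stops because `cmake` is not found. The sketch below tests for both conditions. The function name `diagnose` and the hint wording are hypothetical, and the reading that pip fell back to a source build because no prebuilt wheel matched is an inference from the log, not something the log states.

```python
import shutil
import sys


def diagnose(py_version, cmake_path):
    """Return hints for the failure pattern seen in the log above."""
    hints = []
    if cmake_path is None:
        # The log ends with: command 'cmake' failed: No such file or directory
        hints.append("cmake is not on PATH, so a pyarrow source build cannot start")
    if tuple(py_version[:2]) >= (3, 12):
        # Assumption: a source build was attempted because no prebuilt
        # wheel in the pinned pyarrow range matched this interpreter.
        hints.append(
            "Python 3.12+ detected; pip may be building pyarrow from source "
            "because no wheel in the pinned range matches this interpreter"
        )
    return hints


if __name__ == "__main__":
    for hint in diagnose(sys.version_info, shutil.which("cmake")):
        print(hint)
```

If both hints print, the two likely ways out are an older interpreter for which the pinned pyarrow range ships wheels, or a relaxed pin; which one suits the course material is for the maintainers to decide.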
Currently, there are a fixed number of clusters of embeddings identified per partition. We want to:
For optimization