
tencent / fast-causal-inference


A high-performance causal inference (statistical model) computing library built on OLAP engines. It addresses the performance bottleneck that existing statistical model libraries (R/Python) hit on big data.

License: Other

Shell 0.10% Python 3.94% C++ 3.88% Dockerfile 0.02% Kotlin 0.14% FreeMarker 0.11% Java 91.46% HTML 0.16% Batchfile 0.02% PigLatin 0.01% Ruby 0.01% SCSS 0.15% JavaScript 0.01% Makefile 0.01% CMake 0.01%

fast-causal-inference's Introduction

Fast-Causal-Inference


Introduction

Fast Causal Inference is Tencent's first open-source causal inference project. It is a high-performance causal inference (statistical model) computing library built on OLAP engines. It removes the performance bottleneck that existing statistical model libraries (R/Python) hit on big data, and runs causal inference over massive datasets in seconds or less. It also lowers the barrier to using statistical models by exposing them through SQL, which makes them easy to use in production environments. It currently supports causal analysis for WeChat-Search, WeChat-Video-Account, and other businesses, greatly improving the efficiency of data scientists.

Main advantages of the project:

  1. Second-level and sub-second causal inference on massive data
    Built on the vectorized OLAP execution engines ClickHouse/StarRocks, whose speed makes for a better user experience.
  2. Basic operators, high-order causal inference operators, and upper-level application packaging
    Supports ttest, OLS, Lasso, tree-based models, matching, bootstrap, DML, etc.
  3. Minimalist SQL usage
    The SQLGateway WebServer lowers the barrier to using statistical models through SQL. It offers a minimalist SQL interface on the upper layer and transparently performs engine-specific SQL expansion and optimization.
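
To illustrate why an estimator such as OLS can map well onto an aggregation-oriented engine, here is a minimal sketch in plain Python. It relies on the standard fact that OLS coefficients depend on the data only through the sums X'X and X'y, so partial sums can be computed per data chunk and merged. The function names (`sufficient_stats`, `merge_stats`, `solve_linear`) are hypothetical and chosen for this example; this is not the project's implementation or API.

```python
def sufficient_stats(rows):
    """Accumulate X'X and X'y over (features, y) rows, adding an intercept column."""
    xtx, xty = None, None
    for features, y in rows:
        x = [1.0] + list(features)  # prepend the intercept term
        if xtx is None:
            p = len(x)
            xtx = [[0.0] * p for _ in range(p)]
            xty = [0.0] * p
        for i, xi in enumerate(x):
            xty[i] += xi * y
            for j, xj in enumerate(x):
                xtx[i][j] += xi * xj
    return xtx, xty


def merge_stats(a, b):
    """Combine the partial sums from two chunks by element-wise addition."""
    (xtx_a, xty_a), (xtx_b, xty_b) = a, b
    xtx = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(xtx_a, xtx_b)]
    xty = [u + v for u, v in zip(xty_a, xty_b)]
    return xtx, xty


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


# Toy data following y = 1 + 2x, split into two chunks as if stored on two shards.
rows = [((float(x),), 1.0 + 2.0 * x) for x in range(5)]
part_a = sufficient_stats(rows[:2])
part_b = sufficient_stats(rows[2:])
beta = solve_linear(*merge_stats(part_a, part_b))
print(beta)  # [intercept, slope]
```

Because the per-chunk work is a single pass of sums, only the small p-by-p system has to be solved in one place, regardless of the number of rows.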

The first version already supports the following features:

Basic causal inference tools

  1. ttest based on the delta method, with CUPED support
  2. OLS: sub-second on 100 million rows of data
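
As a rough illustration of the first item, the sketch below shows the textbook delta-method variance for a ratio metric (mean of a numerator over mean of a denominator), a two-sample t statistic built from it, and the standard CUPED adjustment using a pre-experiment covariate. All function names are hypothetical and the code is plain Python for readability; it does not reflect how the library computes these quantities in SQL.

```python
import math


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance; cov(a, a) is the sample variance."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def ratio_delta_var(num, den):
    """First-order delta-method variance of mean(num) / mean(den)."""
    n = len(num)
    mu_n, mu_d = mean(num), mean(den)
    var_n, var_d, c = cov(num, num), cov(den, den), cov(num, den)
    return (var_n / mu_d**2
            - 2 * mu_n * c / mu_d**3
            + mu_n**2 * var_d / mu_d**4) / n


def delta_ttest(num_t, den_t, num_c, den_c):
    """Difference of two ratio metrics, its standard error, and the t statistic."""
    diff = mean(num_t) / mean(den_t) - mean(num_c) / mean(den_c)
    se = math.sqrt(ratio_delta_var(num_t, den_t) + ratio_delta_var(num_c, den_c))
    return diff, se, diff / se


def cuped_adjust(y, x_pre):
    """CUPED: subtract the part of y explained by a pre-experiment covariate."""
    theta = cov(y, x_pre) / cov(x_pre, x_pre)
    mx = mean(x_pre)
    return [yi - theta * (xi - mx) for yi, xi in zip(y, x_pre)]


# Toy per-user data: clicks (numerator) and page views (denominator).
clicks_t, views_t = [3, 5, 2, 6, 4], [10, 12, 8, 15, 11]
clicks_c, views_c = [2, 4, 2, 5, 3], [10, 13, 9, 14, 10]
print(delta_ttest(clicks_t, views_t, clicks_c, views_c))
```

The CUPED-adjusted values keep the same mean as the original metric while reducing variance when the covariate is correlated with it, so they can be fed into the same test.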

Advanced causal inference tools

  1. OLS-based IV, WLS, and other GLS, DID, synthetic control, CUPED, mediation are incubating
  2. uplift: minute-level computation on tens of millions of rows
  3. Data simulation frameworks such as bootstrap/permutation are being developed to handle variance estimation where no closed-form solution exists
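
The idea behind the bootstrap item can be sketched as follows: when a statistic has no convenient variance formula (the median is used here as an example), resample the data with replacement many times and take the spread of the recomputed statistic as its standard error. The function name `bootstrap_se` and its parameters are hypothetical, and this single-machine version only illustrates the general technique, not the framework under development.

```python
import random
import statistics


def bootstrap_se(data, statistic, n_resamples=1000, seed=0):
    """Estimate the standard error of `statistic` by resampling with replacement."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    n = len(data)
    estimates = [
        statistic(rng.choices(data, k=n))  # one bootstrap resample of size n
        for _ in range(n_resamples)
    ]
    return statistics.stdev(estimates)


# Example: standard error of the median of a small sample.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
print(bootstrap_se(sample, statistics.median))
```

A permutation test follows the same pattern, except that group labels are shuffled instead of rows being resampled.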

Project application:

Already supports multiple businesses within WeChat, such as WeChat-Video-Account and WeChat-Search.

Project open source address

github: https://github.com/Tencent/fast-causal-inference

Getting started

Preconditions
  1. Docker must be installed and running on the machine
    • Linux:

      • CentOS:

        yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
        yum install docker-ce
        systemctl start docker

      • Ubuntu:

        sudo apt-get install docker-ce

      • Verify the Docker service status:

        systemctl status docker

      • Install docker-compose, the container orchestration tool:

        pip3 install --upgrade pip && pip3 install docker-compose

    • MacOS:
      See https://docs.docker.com/desktop/install/mac-install/. Download the .dmg package, double-click to install it, and make sure the Docker service is running.
      Add Docker to PATH:

      echo 'export PATH="/Applications/Docker.app/Contents/Resources/bin:$PATH"' >> ~/.bash_profile && . ~/.bash_profile

    • Verify that Docker is running (any platform):

      docker ps

One-Click Deployment:

git clone https://github.com/Tencent/fast-causal-inference
cd fast-causal-inference && sh bin/deploy.sh

Once deployment finishes, open http://127.0.0.1 in a browser.

To start causal analysis, refer to the built-in demo.ipynb.

fast-causal-inference's People

Contributors

lw779861797-commits, huangyanyanyan, fffffffhhhhhhh, jixxiong, fhbai


fast-causal-inference's Issues

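Spark support

Hello! Have you considered adding support for Spark? If so, would you consider Scala?

No StarRocks demo found

Thanks for open-sourcing this. The introduction says:
“Based on the vectorized OLAP execution engine ClickHouse/StarRocks, the speed is more conducive to the ultimate user experience”
However, the actual demo only shows ClickHouse configuration. Is StarRocks supported yet, and could a demo be added?

Spark

Is there a jar package that can be imported directly into Spark, along with documentation on using the SQL?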
