Giter VIP home page Giter VIP logo

rfcde's Introduction

RFCDE

This repository provides an implementation of random forests designed for conditional density estimation (https://arxiv.org/abs/1804.05753). R and python packages are available. For installation details and package-specific documentation see the subdirectories r and python. Both languages use a common C++ library which can be found in the cpp subdirectory.

Photo-Z Example

We apply RFCDE to photometric redshift estimation for the LSST DESC DC-1. For members of the LSST DESC, you can find information on obtaining the data here.

import numpy as np
import pandas as pd
import rfcde

# Read in data
def process_data(feature_file, has_z=False):
    """Processes buzzard data"""
    df = pd.read_table(feature_file, sep=" ")
    df["ug"] = df["u"] - df["g"]

    df.assign(ug = df.u - df.g,
              gr = df.g - df.r,
              ri = df.r - df.i,
              iz = df.i - df.z,
              zy = df.z - df.y,
              ug_err = np.sqrt(df['u.err'] ** 2 + df['g.err'] ** 2),
              gr_err = np.sqrt(df['g.err'] ** 2 + df['r.err'] ** 2),
              ri_err = np.sqrt(df['r.err'] ** 2 + df['i.err'] ** 2),
              iz_err = np.sqrt(df['i.err'] ** 2 + df['z.err'] ** 2),
              zy_err = np.sqrt(df['z.err'] ** 2 + df['y.err'] ** 2))

    if has_z:
        z = df.redshift.as_matrix()
        df.drop('redshift', axis=1, inplace=True)
    else:
        z = None

    return df.as_matrix(), z

x_train, z_train = process_data('buzzard_spec_witherrors_mass.txt', has_z=True)
x_test, _ = process_data('buzzard_phot_witherrors_mass.txt', has_z=False)

# Fit the model
n_trees = 1000
mtry = 4
node_size = 20
n_basis = 31

forest = rfcde.RFCDE(n_trees=n_trees, mtry=mtry, node_size=node_size, n_basis=n_basis)
forest.train(x_train, z_train)

# Make predictions
bandwidth = 0.005
n_grid = 200
z_grid = np.linspace(0, 2, n_grid)
density = forest.predict(x_test, z_grid, bandwidth)

fRFCDE

Functional RFCDE (fRFCDE) is a variant of RFCDE which can efficiently handle functional input (https://arxiv.org/abs/1906.07177). In this variant, functional covariates are grouped together according to a Poisson process with parameter λ. It is included within the r and python package and the parameter λ can be set as follows:

import numpy as np
import rfcde

# Parameters
n_trees = 1000     # Number of trees in the forest
mtry = 4           # Number of variables to potentially split at in each node
node_size = 20     # Smallest node size
n_basis = 15       # Number of basis functions
bandwidth = 0.2    # Kernel bandwith - used for prediction only
lambda_param = 10  # Poisson Process parameter

# Fit the model
functional_forest = rfcde.RFCDE(n_trees=n_trees, mtry=mtry, node_size=node_size, 
                                n_basis=n_basis)
functional_forest.train(x_train, y_train, flamba=lambda_param)

# ... Same as RFCDE for prediction ...
library(RFCDE)

# Parameters
n_trees <- 1000     # Number of trees in the forest
mtry <- 4           # Number of variables to potentially split at in each node
node_size <- 20     # Smallest node size
n_basis <- 15       # Number of basis functions
bandwidth <- 0.2    # Kernel bandwith - used for prediction only
lambda_param <- 10  # Poisson Process parameter

# Fit the model
functional_forest <- RFCDE::RFCDE(x_train, y_train, n_trees = n_trees, mtry = mtry, 
                                  node_size = node_size, n_basis = n_basis, 
                                  flambda = lambda_param)

# ... Same as RFCDE for prediction ...

Citation

@article{pospisil2018rfcde,
  title={RFCDE: Random Forests for Conditional Density Estimation},
  author={Pospisil, Taylor and Lee, Ann B},
  journal={arXiv preprint arXiv:1804.05753},
  year={2018}
}
@article{pospisil2019(f)rfcde,
title={(f)RFCDE: Random Forests for Conditional Density Estimation and Functional Data},
author={Pospisil, Taylor and Lee, Ann B},
journal={arXiv preprint arXiv:1906.07177},
year={2019}
}

Troubleshooting (Python)

  1. Make sure that statsmodels is updated at the latest version - there an issue with version 0.8.0 in which datetools is not correctly imported;
  2. There might be issues installing on Mac OS Mojave, as there are known issues with XCode 10.X (this Stack Overflow article gives a more in-depth explanations). If the pip installation does not work, try
    • Set the global variable export MACOSX_DEPLOYMENT_TARGET=10.X (where 10.X is the OS version - for Mojave is 10.14), and then re-run the installation
    • Include CFLAGS='-stdlib=libstdc++' before pip install command, so CFLAGS='-stdlib=libstdc++' pip install rfcde

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.