clhs_py's Introduction

cLHS: Conditioned Latin Hypercube Sampling

This Python package is based on the Conditioned Latin Hypercube Sampling (cLHS) method of Minasny & McBratney (2006). It follows some of the code from the R package clhs of Roudier et al.

  • It attempts to create a Latin Hypercube sample by selecting only from input data.
  • It uses simulated annealing to force the sampling to converge more rapidly.
  • It allows for setting a stopping criterion on the objective function described in Minasny & McBratney (2006).
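As a rough illustration of the objective function that the stopping criterion refers to, the sketch below computes only the continuous-variable term: each variable is cut into as many equal-probability strata as there are samples, and the score sums how far each stratum's sample count is from one. The names `strata_edges` and `continuous_objective` are hypothetical and this is not the package's implementation; the full objective in Minasny & McBratney (2006) also includes terms for categorical variables and correlations, which are left out here.

```python
import numpy as np


def strata_edges(data, num_samples):
    """Quantile edges that split each column into num_samples strata."""
    probs = np.linspace(0.0, 1.0, num_samples + 1)
    # Result has shape (num_samples + 1, n_variables).
    return np.quantile(data, probs, axis=0)


def continuous_objective(data, sample_idx):
    """Sum over variables and strata of |samples in stratum - 1|.

    Hypothetical helper: a score of 0 means every stratum of every
    variable holds exactly one sampled point (a Latin hypercube).
    """
    sample = data[sample_idx]
    edges = strata_edges(data, len(sample_idx))
    total = 0
    for j in range(data.shape[1]):
        counts, _ = np.histogram(sample[:, j], bins=edges[:, j])
        total += np.abs(counts - 1).sum()
    return int(total)


# Toy data: two identical columns holding 0, 1, ..., 99.
grid = np.column_stack([np.arange(100.0), np.arange(100.0)])

spread_idx = np.arange(5, 100, 10)   # one point per stratum
clustered_idx = np.arange(10)        # all points in the lowest stratum

print(continuous_objective(grid, spread_idx))
print(continuous_objective(grid, clustered_idx))
```

The evenly spread sample scores lower than the clustered one, which is the direction the annealing search pushes in.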

You can reproduce the Jupyter notebook example on Binder.

Please check the online documentation for more information.

clhs_py's People

Contributors

wagoner47, zhonghua-zheng, jiaweizhuang

clhs_py's Issues

Does not work if the variables are only categorical

If all the variables are categorical, the clhs function fails because it still looks for continuous variables.

import itertools

import pandas as pd
import clhs as cl

# define the range and maximum value
values = range(0, 7)
max_value = 14

# generate all possible combinations of 7 integers from the defined range
combinations = itertools.product(values, repeat=7)

# keep the combinations whose sum is at most the maximum value
input_space = [list(c) for c in combinations if sum(c) <= max_value]

columns = ["X1", "X2", "X3", "X4", "X5", "X6", "X7"]
df_input_space = pd.DataFrame(input_space, columns=columns)

# convert every value to a string so that all variables are categorical
# (replaces the element-wise loop, which relied on chained assignment)
df_input_space = df_input_space.astype(str)

num_sample = 30

sampled_points=cl.clhs(df_input_space[["X1","X2","X3","X4","X5","X6","X7"]], num_sample, max_iterations=1000)

I get the following error message:

cLHS:  0%|          |0/1000 [Elapsed time: 0.0017247200012207031, ETA: 0, ?it/s]
Traceback (most recent call last):

  File "<ipython-input-20-cbd43074fe60>", line 40, in <module>
    sampled_points=cl.clhs(df_input_space[["X1","X2","X3","X4","X5","X6","X7"]], num_sample, max_iterations=1000)

  File "/home/antonis/anaconda3/lib/python3.8/site-packages/clhs/clhs.py", line 744, in clhs
    sample_indices, remaining_indices = resample(

  File "/home/antonis/anaconda3/lib/python3.8/site-packages/clhs/clhs.py", line 555, in resample
    return resample_worst(

  File "/home/antonis/anaconda3/lib/python3.8/site-packages/clhs/clhs.py", line 500, in resample_worst
    continuous_objective_values[idx_not_included], return_inverse=True)

IndexError: index 0 is out of bounds for axis 0 with size 0

Is this by design? Or is it a bug?
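One way to check what kind of columns a DataFrame holds before calling clhs is to list them by dtype with pandas. The sketch below assumes, as the report suggests but the source does not confirm, that the package treats numeric columns as continuous and everything else as categorical; `split_by_dtype` is a hypothetical helper, not part of clhs.

```python
import pandas as pd


def split_by_dtype(df):
    """Return (numeric column names, non-numeric column names)."""
    numeric = df.select_dtypes(include="number").columns.tolist()
    other = df.select_dtypes(exclude="number").columns.tolist()
    return numeric, other


df_numeric = pd.DataFrame({"X1": [0, 1, 2], "X2": [3, 4, 5]})
df_strings = df_numeric.astype(str)  # same values, stored as strings

print(split_by_dtype(df_numeric))
print(split_by_dtype(df_strings))
```

If the first list comes back empty, the input has no numeric columns, which matches the situation described in this issue.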

Numpy Error

Hi,

Not sure this is still being supported, but I was thrilled to find your python implementation of the R package clhs as it's exceedingly useful in my current work. And R is painful for the uninitiated. I've tested the library by following your example (https://clhs-py.readthedocs.io/en/latest/notebooks/quickstart.html)

I've downloaded a stack of soil attributes from GEE (https://developers.google.com/earth-engine/datasets/catalog/CSIRO_SLGA) and used gdalwarp to cut out pixels outside my shapefile for all 18 layers and then used georasters to convert them to pandas.

My initial goal was to use the 18 layers (AWC, Total N and SOC for 0-5,5-15,15-30,30-60,60-100) separately, but when that didn't work I just took the mean of the 3 variables over the 0-100cm profile as a test. This also didn't work. Strangely I can't see any difference in our dataframes (after checking the xarray tutorial data), except mine will have discontinuities in x and y as some cells were cut as outside the shapefile. Example dataframe (dfOut) attached and code below.

num_sample=50
sampled=clhs.clhs(dfOut[['AWC_0-100','NTO_0-100', 'SOC_0-100']],
                  num_sample, max_iterations=1000)
clhs_sample=dfOut.iloc[sampled["sample_indices"]]

Attachment: df_testing_clhs.csv

Error below (looks like nothing has been added to the continuous_objective_values object):

IndexError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_21692/2054753940.py in
      4 # cLHS
      5 sampled=clhs.clhs(dfOut[['AWC_0-100','NTO_0-100', 'SOC_0-100']],
----> 6                   num_sample, max_iterations=1000)
      7 clhs_sample=dfOut.iloc[sampled["sample_indices"]]

~\Anaconda3\envs\py3_geo\lib\site-packages\clhs\clhs.py in clhs(predictors, num_samples, good_mask, include, max_iterations, objective_func_limit, initial_temp, temp_decrease, cycle_length, p, weights, progress, random_state, **progress_kwargs)
    746             previous_results["remaining_indices"],
    747             previous_results["obj_continuous"],
--> 748             include)
    749         x_continuous = continuous_predictors.iloc[sample_indices].copy()
    750         x_categorical = categorical_predictors.iloc[sample_indices].copy()

~\Anaconda3\envs\py3_geo\lib\site-packages\clhs\clhs.py in resample(p, sampled_indices, remaining_indices, continuous_objective_values, include)
    555         return resample_worst(
    556             continuous_objective_values, sampled_indices, remaining_indices,
--> 557             include)
    558
    559

~\Anaconda3\envs\py3_geo\lib\site-packages\clhs\clhs.py in resample_worst(continuous_objective_values, sampled_indices, remaining_indices, include)
    498         np.isin(new_sampled_indices, include_, assume_unique=True, invert=True)]
    499     unique_obj_vals, inverse_indices = np.unique(
--> 500         continuous_objective_values[idx_not_included], return_inverse=True)
    501     idx_worst = idx_not_included[
    502         inverse_indices[(inverse_indices == unique_obj_vals.size - 1)]]

IndexError: index 0 is out of bounds for axis 0 with size 0

Thanks for your time
Ian
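The reading that `continuous_objective_values` is empty fits the error text. The NumPy-only sketch below reproduces the same message by indexing an empty array the way line 500 of the traceback does. The two arrays are stand-ins chosen for illustration; what the package actually holds in them at that point is not known from the report.

```python
import numpy as np

# Stand-ins for the two names in the traceback (illustrative values only).
continuous_objective_values = np.array([])  # assumed empty, as the report suggests
idx_not_included = np.array([0, 1, 2])      # positions of the sampled points

message = None
try:
    np.unique(continuous_objective_values[idx_not_included], return_inverse=True)
except IndexError as err:
    message = str(err)
    print(message)
```

The exception is raised by the indexing step, before `np.unique` runs, so any input that leaves the array empty would end in this same traceback.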

`bins` must increase monotonically, when an array

Hello,

Thank you for creating this implementation of cLHS in Python.

I ran it on my computer and got an error. It happens when the number of samples is high and the cumulative distribution of my data stays at the same value over a long range.
After some investigation I found that the get_strata function does not always return a monotonic list of quantiles, which raises an error when that list is used in the count_matrix function. I am not sure why this happens, but rounding the quantiles list seems to fix the problem.

BR,
Alexandre
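The error text in the title is the one `numpy.histogram` raises when an array of bin edges decreases anywhere. The sketch below reproduces it with hand-made edges that drop by one floating-point step, then shows the rounding fix from the report and a running-maximum alternative. The edge values are invented for illustration; that get_strata produces edges like these is an assumption, not something the report confirms.

```python
import numpy as np

data = np.array([0.1, 0.3, 0.3, 0.3, 0.3, 0.9])

# Hypothetical strata edges: 0.1 + 0.2 is a hair above 0.3 in floating
# point, so the edge that follows it is slightly smaller.
edges = np.array([0.0, 0.3, 0.1 + 0.2, 0.3, 1.0])

try:
    np.histogram(data, bins=edges)
    raised = False
except ValueError as err:
    raised = True
    print(err)

# Fix from the report: round the edges so near-equal values become equal.
rounded = np.round(edges, 10)
counts, _ = np.histogram(data, bins=rounded)

# Alternative: force the edges to be non-decreasing with a running maximum.
monotone = np.maximum.accumulate(edges)
counts_alt, _ = np.histogram(data, bins=monotone)

print(counts, counts_alt)
```

Repeated edges are accepted by `numpy.histogram`; the zero-width bins they create simply count no points, so only a strict decrease triggers the error.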

A bug when running clhs with dataframe

cLHS:  0%|          |0/10000 [Elapsed time: 0, ETA: 0, ?it/s]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-33-d013e88c7467> in <module>
----> 1 clhs.clhs(df_test[feature_ls].values, 10)

/opt/miniconda/envs/py3/lib/python3.6/site-packages/clhs/clhs.py in clhs(predictors, num_samples, good_mask, include, max_iterations, objective_func_limit, initial_temp, temp_decrease, cycle_length, p, weights, progress, random_state, **progress_kwargs)
    746             previous_results["remaining_indices"],
    747             previous_results["obj_continuous"],
--> 748             include)
    749         x_continuous = continuous_predictors.iloc[sample_indices].copy()
    750         x_categorical = categorical_predictors.iloc[sample_indices].copy()

/opt/miniconda/envs/py3/lib/python3.6/site-packages/clhs/clhs.py in resample(p, sampled_indices, remaining_indices, continuous_objective_values, include)
    555         return resample_worst(
    556             continuous_objective_values, sampled_indices, remaining_indices,
--> 557             include)
    558 
    559 

/opt/miniconda/envs/py3/lib/python3.6/site-packages/clhs/clhs.py in resample_worst(continuous_objective_values, sampled_indices, remaining_indices, include)
    498         np.isin(new_sampled_indices, include_, assume_unique=True, invert=True)]
    499     unique_obj_vals, inverse_indices = np.unique(
--> 500         continuous_objective_values[idx_not_included], return_inverse=True)
    501     idx_worst = idx_not_included[
    502         inverse_indices[(inverse_indices == unique_obj_vals.size - 1)]]

IndexError: index 0 is out of bounds for axis 0 with size 0
