wagoner47 / clhs_py

Conditioned Latin Hypercube Sampling in Python

Home Page: https://clhs-py.readthedocs.io/

License: Other

Python 100.00%

clhs_py's Introduction

cLHS: Conditioned Latin Hypercube Sampling

This Python package is based on the Conditioned Latin Hypercube Sampling (cLHS) method of Minasny & McBratney (2006). It follows parts of the code of the R package clhs by Roudier et al.

It attempts to create a Latin hypercube sample by selecting only from the input data.
It uses simulated annealing to make the sampling converge more rapidly.
It allows setting a stopping criterion on the objective function described in Minasny & McBratney (2006).
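To make the description above concrete, here is a minimal sketch of the idea for continuous variables only, written in plain NumPy rather than against clhs_py's API. It assumes the continuous objective terms of Minasny & McBratney (2006): O1, how far each variable's sample is from having exactly one point per quantile stratum, and O3, the absolute difference between the sample and full-data correlation matrices (the categorical term O2 and the term weights are omitted). Each proposal swaps one sampled row with an unsampled row and is accepted by the Metropolis rule; the temperature is multiplied by a cooling factor every cycle iterations, and the loop stops early once the objective reaches objective_limit. All names here are illustrative. clhs_py's clhs() takes similarly named arguments (max_iterations, objective_func_limit, initial_temp, temp_decrease, cycle_length), but this is not its implementation; judging from the tracebacks in the issues below, it also sometimes replaces the worst-fitting sample (resample_worst, controlled by p), which this sketch leaves out.

```python
import numpy as np


def strata_edges(x, n):
    """Quantile edges splitting column x into n equal-probability strata."""
    return np.quantile(x, np.linspace(0.0, 1.0, n + 1))


def objective(data, idx, edges, corr_full):
    """O1 + O3 (continuous terms only, unweighted) for the rows in idx."""
    sample = data[idx]
    o1 = 0.0
    for j in range(data.shape[1]):
        # number of sampled points falling in each quantile stratum of variable j
        counts, _ = np.histogram(sample[:, j], bins=edges[j])
        o1 += np.abs(counts - 1).sum()  # ideal: exactly one point per stratum
    # how far the sample's correlation structure is from the full data's
    o3 = np.abs(np.corrcoef(sample, rowvar=False) - corr_full).sum()
    return o1 + o3


def clhs_sketch(data, n, iterations=2000, temp=1.0, cooling=0.95,
                cycle=10, objective_limit=None, seed=0):
    """Illustrative cLHS by simulated annealing; not clhs_py's implementation."""
    rng = np.random.default_rng(seed)
    edges = [strata_edges(data[:, j], n) for j in range(data.shape[1])]
    corr_full = np.corrcoef(data, rowvar=False)
    idx = rng.choice(len(data), size=n, replace=False)
    remaining = np.setdiff1d(np.arange(len(data)), idx)
    obj = objective(data, idx, edges, corr_full)
    for it in range(iterations):
        if objective_limit is not None and obj <= objective_limit:
            break  # stopping criterion on the objective function
        new_idx, new_rem = idx.copy(), remaining.copy()
        # propose: swap one sampled row with one unsampled row
        a, b = rng.integers(n), rng.integers(len(new_rem))
        new_idx[a], new_rem[b] = new_rem[b], new_idx[a]
        new_obj = objective(data, new_idx, edges, corr_full)
        delta = new_obj - obj
        # Metropolis rule: keep improvements, sometimes keep a worse sample
        if delta <= 0 or rng.random() < np.exp(-delta / temp):
            idx, remaining, obj = new_idx, new_rem, new_obj
        if (it + 1) % cycle == 0:
            temp *= cooling  # cool down once per cycle
    return np.sort(idx), obj


rng = np.random.default_rng(1)
data = rng.normal(size=(500, 3))  # toy data: 500 rows, 3 continuous variables
idx, obj = clhs_sketch(data, n=20, iterations=1000)
print(idx, obj)
```

Accepting some worse proposals while the temperature is high lets the search escape poor local configurations; as the temperature falls, the loop behaves more and more like a greedy search.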

You can reproduce the Jupyter notebook example on Binder.

Please check the online documentation for more information.


clhs_py's Issues

Does not work if the variables are only categorical

If all of the variables are categorical, the clhs function fails because it still expects to find continuous variables.

import itertools

import pandas as pd
import clhs as cl

# define the range of each variable and the maximum allowed sum
values = range(0, 7)
max_value = 14

# generate all possible combinations of 7 integers from the defined range
combinations = itertools.product(values, repeat=7)

# keep only the combinations whose sum is at most the maximum value
input_space = [list(c) for c in combinations if sum(c) <= max_value]

columns = ["X1", "X2", "X3", "X4", "X5", "X6", "X7"]
df_input_space = pd.DataFrame(input_space, columns=columns)

# convert every column to strings so that all variables are categorical
# (replaces the element-by-element chained assignment df[j][i] = ...)
df_input_space = df_input_space.astype(str)

num_sample = 30
sampled_points = cl.clhs(df_input_space[columns], num_sample, max_iterations=1000)

I get the following error message:

cLHS:  0%|          |0/1000 [Elapsed time: 0.0017247200012207031, ETA: 0, ?it/s]
Traceback (most recent call last):

  File "<ipython-input-20-cbd43074fe60>", line 40, in <module>
    sampled_points=cl.clhs(df_input_space[["X1","X2","X3","X4","X5","X6","X7"]], num_sample, max_iterations=1000)

  File "/home/antonis/anaconda3/lib/python3.8/site-packages/clhs/clhs.py", line 744, in clhs
    sample_indices, remaining_indices = resample(

  File "/home/antonis/anaconda3/lib/python3.8/site-packages/clhs/clhs.py", line 555, in resample
    return resample_worst(

  File "/home/antonis/anaconda3/lib/python3.8/site-packages/clhs/clhs.py", line 500, in resample_worst
    continuous_objective_values[idx_not_included], return_inverse=True)

IndexError: index 0 is out of bounds for axis 0 with size 0
Is this by design? Or is it a bug?
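The last frame of the traceback indexes continuous_objective_values with idx_not_included. With only categorical predictors that array is presumably empty (an inference from the error message, not something the maintainer confirmed), and NumPy raises exactly this IndexError for any index into a size-0 array. The sketch below reproduces the failure with plain NumPy and shows one possible guard. pick_worst is a hypothetical helper, not part of clhs_py, and the random fallback is just one choice of behavior.

```python
import numpy as np


def pick_worst(objective_values, candidates, rng):
    """Hypothetical helper (not part of clhs_py): return the candidate with the
    largest objective value, or a random candidate when there are no continuous
    objective values at all (e.g. only categorical predictors)."""
    objective_values = np.asarray(objective_values, dtype=float)
    candidates = np.asarray(candidates, dtype=int)
    if objective_values.size == 0:
        # nothing to rank on: fall back to a uniformly random choice
        return int(rng.choice(candidates))
    return int(candidates[np.argmax(objective_values[candidates])])


# Reproduce the failure mode: fancy-indexing an empty array
empty = np.array([])
try:
    empty[np.array([0])]
except IndexError as err:
    print("IndexError:", err)

rng = np.random.default_rng(0)
print(pick_worst([], [0, 1, 2], rng))                # random choice among 0, 1, 2
print(pick_worst([0.2, 0.9, 0.5], [0, 1, 2], rng))   # 1
```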

Numpy Error

Hi,

I'm not sure this is still being supported, but I was thrilled to find your Python implementation of the R package clhs, as it's exceedingly useful in my current work (and R is painful for the uninitiated). I've tested the library by following your example (https://clhs-py.readthedocs.io/en/latest/notebooks/quickstart.html).

I downloaded a stack of soil attributes from GEE (https://developers.google.com/earth-engine/datasets/catalog/CSIRO_SLGA), used gdalwarp to clip all 18 layers to my shapefile (removing the pixels outside it), and then used georasters to convert them to pandas.

My initial goal was to use the 18 layers (AWC, total N and SOC at 0-5, 5-15, 15-30, 30-60 and 60-100 cm) separately, but when that didn't work I took the mean of each of the 3 variables over the 0-100 cm profile as a test. This also didn't work. Strangely, I can't see any difference between my dataframe and the one in your example (I checked against the xarray tutorial data), except that mine has gaps in x and y because cells outside the shapefile were removed. An example dataframe (dfOut) is attached, and the code is below.

import clhs

num_sample = 50
sampled = clhs.clhs(dfOut[['AWC_0-100', 'NTO_0-100', 'SOC_0-100']],
                    num_sample, max_iterations=1000)
clhs_sample = dfOut.iloc[sampled["sample_indices"]]
df_testing_clhs.csv

The error is below (it looks like nothing has been added to the continuous_objective_values object):

IndexError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_21692/2054753940.py in <module>
      4 # cLHS
      5 sampled=clhs.clhs(dfOut[['AWC_0-100','NTO_0-100', 'SOC_0-100']],
----> 6 num_sample, max_iterations=1000)
      7 clhs_sample=dfOut.iloc[sampled["sample_indices"]]

~\Anaconda3\envs\py3_geo\lib\site-packages\clhs\clhs.py in clhs(predictors, num_samples, good_mask, include, max_iterations, objective_func_limit, initial_temp, temp_decrease, cycle_length, p, weights, progress, random_state, **progress_kwargs)
    746             previous_results["remaining_indices"],
    747             previous_results["obj_continuous"],
--> 748             include)
    749         x_continuous = continuous_predictors.iloc[sample_indices].copy()
    750         x_categorical = categorical_predictors.iloc[sample_indices].copy()

~\Anaconda3\envs\py3_geo\lib\site-packages\clhs\clhs.py in resample(p, sampled_indices, remaining_indices, continuous_objective_values, include)
    555         return resample_worst(
    556             continuous_objective_values, sampled_indices, remaining_indices,
--> 557             include)
    558 
    559 

~\Anaconda3\envs\py3_geo\lib\site-packages\clhs\clhs.py in resample_worst(continuous_objective_values, sampled_indices, remaining_indices, include)
    498         np.isin(new_sampled_indices, include_, assume_unique=True, invert=True)]
    499     unique_obj_vals, inverse_indices = np.unique(
--> 500         continuous_objective_values[idx_not_included], return_inverse=True)
    501     idx_worst = idx_not_included[
    502         inverse_indices[(inverse_indices == unique_obj_vals.size - 1)]]

IndexError: index 0 is out of bounds for axis 0 with size 0

Thanks for your time
Ian
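Because the same IndexError shows up here with columns that look continuous, one plausible cause (an assumption; the thread does not confirm it, and it presumes clhs_py separates continuous from categorical predictors by dtype) is that the raster-derived columns arrive with a non-numeric dtype, or contain NaN for masked cells. A quick pre-flight check in plain pandas might look like this. check_predictors is a hypothetical helper, and the toy frame only borrows the column names from the report.

```python
import numpy as np
import pandas as pd


def check_predictors(df, columns):
    """Hypothetical pre-flight check before calling clhs.clhs().

    Reports columns that are not numeric, coerces every column to numbers
    (unparseable entries become NaN), and drops rows with missing values,
    e.g. masked raster cells.
    """
    non_numeric = [c for c in columns if not pd.api.types.is_numeric_dtype(df[c])]
    cleaned = df[columns].apply(pd.to_numeric, errors="coerce")
    n_dropped = int(cleaned.isna().any(axis=1).sum())
    cleaned = cleaned.dropna()
    return cleaned, non_numeric, n_dropped


# Toy frame standing in for dfOut: one column arrives as strings, one cell is NaN
df = pd.DataFrame({
    "AWC_0-100": ["0.12", "0.15", "0.11"],
    "NTO_0-100": [0.08, np.nan, 0.07],
    "SOC_0-100": [1.2, 1.4, 1.1],
})
cleaned, non_numeric, n_dropped = check_predictors(df, list(df.columns))
print(non_numeric)  # ['AWC_0-100']
print(n_dropped)    # 1
```

If rows are dropped, the positional indices returned for the cleaned frame should be mapped back through cleaned.iloc[...] rather than dfOut.iloc[...]; the resulting index labels still match the original frame.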

Remove if __name__ == "__main__"

To make the module more user-friendly, remove the if __name__ == "__main__" section from clhs.py, which starts at:

clhs_py/clhs.py

Line 786 in 8307715

if __name__ == "__main__":

`bins` must increase monotonically, when an array

Hello,

Thank you for creating this implementation of cLHS in Python.

I ran it on my computer and got an error. It happens when the number of samples is high and the cumulative distribution of my data stays at the same value over a long range.
I did a bit of research and found that the get_strata function does not always return a monotonically increasing list of quantiles, which raises an error when that list is used in the count_matrix function. I am not sure why this is the case, but rounding the quantiles seems to fix the problem.
BR,
Alexandre
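NumPy's np.histogram raises exactly this message when an array of bin edges contains an edge smaller than the one before it; equal consecutive edges are accepted. Besides the rounding suggested above, another option, swapped in here for illustration and not what clhs_py does, is to force the edges to be non-decreasing with a running maximum. make_monotonic is a hypothetical helper, and the edge values are made up to mimic floating-point noise around tied quantiles.

```python
import numpy as np


def make_monotonic(edges):
    """Hypothetical helper: replace each edge by the running maximum so the
    edges never decrease; tiny backward steps caused by floating-point noise
    collapse into zero-width strata."""
    return np.maximum.accumulate(np.asarray(edges, dtype=float))


data = np.array([0.5, 1.0, 1.5])
# made-up edges with a tiny backward step, as can happen with many tied values
edges = np.array([0.0, 1.0, 1.0000000000000002, 1.0, 2.0])

try:
    np.histogram(data, bins=edges)
except ValueError as err:
    print(err)  # `bins` must increase monotonically, when an array

counts, fixed = np.histogram(data, bins=make_monotonic(edges))
print(counts)  # [1 1 0 1]
```

Unlike rounding, the running maximum never changes edges that are already in order; it only lifts the ones that step backwards.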

A bug when running clhs with dataframe

cLHS:  0%|          |0/10000 [Elapsed time: 0, ETA: 0, ?it/s]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-33-d013e88c7467> in <module>
----> 1 clhs.clhs(df_test[feature_ls].values, 10)

/opt/miniconda/envs/py3/lib/python3.6/site-packages/clhs/clhs.py in clhs(predictors, num_samples, good_mask, include, max_iterations, objective_func_limit, initial_temp, temp_decrease, cycle_length, p, weights, progress, random_state, **progress_kwargs)
    746             previous_results["remaining_indices"],
    747             previous_results["obj_continuous"],
--> 748             include)
    749         x_continuous = continuous_predictors.iloc[sample_indices].copy()
    750         x_categorical = categorical_predictors.iloc[sample_indices].copy()

/opt/miniconda/envs/py3/lib/python3.6/site-packages/clhs/clhs.py in resample(p, sampled_indices, remaining_indices, continuous_objective_values, include)
    555         return resample_worst(
    556             continuous_objective_values, sampled_indices, remaining_indices,
--> 557             include)
    558 
    559 

/opt/miniconda/envs/py3/lib/python3.6/site-packages/clhs/clhs.py in resample_worst(continuous_objective_values, sampled_indices, remaining_indices, include)
    498         np.isin(new_sampled_indices, include_, assume_unique=True, invert=True)]
    499     unique_obj_vals, inverse_indices = np.unique(
--> 500         continuous_objective_values[idx_not_included], return_inverse=True)
    501     idx_worst = idx_not_included[
    502         inverse_indices[(inverse_indices == unique_obj_vals.size - 1)]]

IndexError: index 0 is out of bounds for axis 0 with size 0
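The call above passes df_test[feature_ls].values, a NumPy array, rather than the DataFrame itself. The report does not show the dtypes, so this is only a guess at the cause, but one plain-pandas check worth making is whether that conversion collapses everything to object, which happens as soon as a single selected column is non-numeric. df_test and feature_ls below are toy stand-ins.

```python
import numpy as np
import pandas as pd

# toy stand-ins for the reporter's df_test / feature_ls (hypothetical data)
df_test = pd.DataFrame({"a": [1.0, 2.0], "b": [3, 4], "c": ["x", "y"]})
feature_ls = ["a", "b", "c"]

arr = df_test[feature_ls].values
print(arr.dtype)  # object: a single string column makes the whole array object

numeric_only = df_test.select_dtypes(include="number")
print(numeric_only.columns.tolist())  # ['a', 'b']
print(numeric_only.values.dtype)      # float64: int and float upcast to float
```

Passing the DataFrame (or a numeric-only selection of it) also keeps the per-column dtypes and names that a bare array loses.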
