sundar0989 / xuniverse
xverse (XuniVerse) is a collection of transformers for feature engineering and feature selection
License: MIT License
Is it possible to use a continuous outcome variable?
Can someone please explain how each of the techniques used decides whether or not to assign a vote?
Is there a threshold on the feature importances or coefficients above which a feature is given a vote?
Thank you
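For readers with the same question: one common way to build a voting selector is to let each technique rank the features and give one vote to every feature in that technique's top k, so the cut-off is a rank rather than a fixed importance or coefficient threshold. The sketch below assumes that scheme; it is not taken from the xverse source, and the helper `count_votes`, the technique names and the scores are hypothetical.

```python
from collections import Counter


def count_votes(scores_by_technique, k):
    """Give one vote per technique to each feature in its top k.

    scores_by_technique maps a technique name to {feature: score}, where a
    larger score means a more important feature (use absolute values for
    coefficients).
    """
    votes = Counter()
    for scores in scores_by_technique.values():
        # Rank features by score, highest first, and keep the top k
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        votes.update(top_k)
    # Features that never reached a top-k list get an explicit 0
    all_features = {f for scores in scores_by_technique.values() for f in scores}
    return {f: votes.get(f, 0) for f in sorted(all_features)}


# Hypothetical importance scores from three techniques
scores = {
    "information_value": {"age": 0.30, "fare": 0.45, "sex": 0.60, "embarked": 0.05},
    "random_forest": {"age": 0.40, "fare": 0.35, "sex": 0.15, "embarked": 0.10},
    "l1_coefficients": {"age": 0.00, "fare": 0.05, "sex": 0.90, "embarked": 0.10},
}
votes = count_votes(scores, k=2)
print(votes)  # {'age': 1, 'embarked': 1, 'fare': 2, 'sex': 2}
```

Under this scheme the "threshold" question becomes a question of how k is chosen for each technique.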
I am trying to perform feature selection using xverse's VotingSelector on the Titanic dataset, which is a binary classification problem. The dataset also contains categorical variables, which I have one-hot encoded. I keep getting this error. I am using xverse version 1.0.5 and Python 3.6. Kindly help.
It would be nice to be able to pass n_jobs=-1 to, for example, the VotingSelector so that it does not run on a single core only.
I can work around this problem by passing in the entire dataframe along with y, but I'm not sure whether this is expected behavior. The name X implies passing the dataframe without y.
UnboundLocalError: local variable 'bins_X_grouped' referenced before assignment
import numpy as np
import pandas as pd
from xverse.transformer import MonotonicBinning

df = pd.DataFrame({'x1': list(range(100)),
                   'x2': list(range(100)),
                   'x3': list(range(100)),
                   'y': list(range(100))})
df['y'] = np.where(df['x1'] > 50, 1, 0)

clf = MonotonicBinning()
X = df[['x1', 'x2', 'x3']]
y = df[['y']]
if not isinstance(X, pd.DataFrame):
    print("Not a dataframe")
else:
    print("Is a DataFrame")
clf.fit(X, y)
UnboundLocalError Traceback (most recent call last)
in
17 print("Is a DataFrame")
18
---> 19 clf.fit(X, y)
~/opt/anaconda3/lib/python3.7/site-packages/xverse/transformer/_binning.py in fit(self, X, y)
120
121 #apply the monotonic train function on dataset
--> 122 fit_X.apply(lambda x: self.train(x, y), axis=0)
123 return self
124
~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
7766 kwds=kwds,
7767 )
-> 7768 return op.get_result()
7769
7770 def applymap(self, func, na_action: Optional[str] = None) -> DataFrame:
~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/apply.py in get_result(self)
183 return self.apply_raw()
184
--> 185 return self.apply_standard()
186
187 def apply_empty_result(self):
~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/apply.py in apply_standard(self)
274
275 def apply_standard(self):
--> 276 results, res_index = self.apply_series_generator()
277
278 # wrap results
~/opt/anaconda3/lib/python3.7/site-packages/pandas/core/apply.py in apply_series_generator(self)
288 for i, v in enumerate(series_gen):
289 # ignore SettingWithCopy here in case the user mutates
--> 290 results[i] = self.f(v)
291 if isinstance(results[i], ABCSeries):
292 # If we have a view on v, we need to make a copy because
~/opt/anaconda3/lib/python3.7/site-packages/xverse/transformer/_binning.py in (x)
120
121 #apply the monotonic train function on dataset
--> 122 fit_X.apply(lambda x: self.train(x, y), axis=0)
123 return self
124
~/opt/anaconda3/lib/python3.7/site-packages/xverse/transformer/_binning.py in train(self, X, y)
172 We still want our code to produce bins.
173 """
--> 174 if len(bins_X_grouped) == 1:
175 bins = algos.quantile(X, np.linspace(0, 1, force_bins)) #creates a new binnning based on forced bins
176 if len(np.unique(bins)) == 2:
UnboundLocalError: local variable 'bins_X_grouped' referenced before assignment
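The traceback alone does not show why bins_X_grouped was never set. As a generic Python illustration only (not the xverse source), this error appears when a local variable is assigned solely inside a try block that fails on every attempt and is then read after the loop. Initializing the variable first and checking it turns the crash into a message that keeps the underlying exception. The functions below are hypothetical.

```python
def failing_step(values):
    """Hypothetical binning step that always fails."""
    raise AttributeError("simulated failure inside the binning step")


def bins_unsafe(values, attempts=3):
    for _ in range(attempts):
        try:
            grouped = failing_step(values)
        except Exception:
            pass  # the error is swallowed, so grouped is never assigned
    return len(grouped)  # raises UnboundLocalError


def bins_safe(values, attempts=3):
    grouped = None  # initialize before the loop
    last_error = None
    for _ in range(attempts):
        try:
            grouped = failing_step(values)
            break
        except Exception as exc:
            last_error = exc
    if grouped is None:
        # Surface the real failure instead of an UnboundLocalError
        raise RuntimeError("binning failed on every attempt") from last_error
    return len(grouped)


try:
    bins_unsafe([1, 2, 3])
except UnboundLocalError as exc:
    print("unsafe:", type(exc).__name__)
try:
    bins_safe([1, 2, 3])
except RuntimeError as exc:
    print("safe:", exc, "| cause:", repr(exc.__cause__))
```

With the second pattern, whatever actually failed inside the loop is visible as the cause of the error.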
Hi,
Custom binning is not working; kindly advise.
Could you add a parameter to control the minimum share of training samples in each category?
When a category's share is very small, the binning becomes meaningless.
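Until such a parameter exists, one workaround is to pre-process the column before binning and merge categories whose share of the training rows falls below a chosen minimum. A minimal sketch, assuming a pandas Series of categories; the helper `merge_rare_categories`, its `min_share` parameter and the "OTHER" label are hypothetical and not part of xverse.

```python
import pandas as pd


def merge_rare_categories(series, min_share=0.05, other_label="OTHER"):
    """Replace categories whose share of rows is below min_share with other_label."""
    shares = series.value_counts(normalize=True)  # NaN is excluded by default
    rare = shares[shares < min_share].index
    # Keep values that are not rare; replace the rest with other_label
    return series.where(~series.isin(rare), other_label)


train = pd.Series(["a"] * 50 + ["b"] * 45 + ["c"] * 3 + ["d"] * 2)
merged = merge_rare_categories(train, min_share=0.05)
print(merged.value_counts())
```

Here "c" (3%) and "d" (2%) are merged into a single "OTHER" category with 5 rows. The rare set should be computed on the training data only and then reused for validation or test data.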
ValueError: The input data must be pandas dataframe. But the input provided is <class 'str'>
I tried to debug it but have been unable to figure out the reason behind this. Here's some sample code that reproduces it:
import numpy as np
import pandas as pd
from xverse.transformer import MonotonicBinning

df = pd.DataFrame({'x1': list(range(100)),
                   'x2': list(range(100)),
                   'y': list(range(100))})
df['y'] = np.where(df['x1'] > 50, 1, 0)

clf = MonotonicBinning()
X = df[['x1', 'x2']]
y = df[['y']]
if not isinstance(X, pd.DataFrame):
    print("Not a dataframe")
else:
    print("Is a DataFrame")
clf.fit(X, y)
ValueError Traceback (most recent call last)
in
10 print("Is a DataFrame")
11
---> 12 clf.fit(X, y)
~/opt/anaconda3/lib/python3.7/site-packages/xverse/transformer/_binning.py in fit(self, X, y)
76
77 #check datatype of X
---> 78 self.check_datatype(X)
79
80 #The length of X and Y should be equal
~/opt/anaconda3/lib/python3.7/site-packages/xverse/transformer/_binning.py in check_datatype(self, X)
62
63 if not isinstance(X, pd.DataFrame):
---> 64 raise ValueError("The input data must be pandas dataframe. But the input provided is " + str(type(X)))
65 return self
66
ValueError: The input data must be pandas dataframe. But the input provided is <class 'str'>
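One plausible explanation, offered as an assumption rather than a confirmed diagnosis: if fit() first tries to unpack X as an (X, y) pair for pipeline support, a DataFrame with exactly two columns unpacks successfully, because iterating over a DataFrame yields its column labels. X and y then become the strings 'x1' and 'x2', which would match the <class 'str'> in the message, while the three-column example above cannot be unpacked that way. The snippet only demonstrates the pandas behavior and a stricter unpacking helper; `unpack_xy` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"x1": range(5), "x2": range(5)})

# Iterating over a DataFrame yields its column labels, so a
# two-column frame "unpacks" into two strings.
first, second = df
print(type(first), first, second)  # <class 'str'> x1 x2


def unpack_xy(X, y=None):
    """Hypothetical helper: only unpack X when it really is an (X, y) tuple."""
    if isinstance(X, tuple) and len(X) == 2:
        return X
    return X, y


X_out, y_out = unpack_xy(df, df["x1"])
print(type(X_out).__name__)  # DataFrame
```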
Hi Sundar,
After running the following code:
from xverse.transformer import MonotonicBinning
clf = MonotonicBinning()
clf.fit(X, y)
I am getting the error AttributeError: module 'pandas.core.algorithms' has no attribute 'quantile'. Any help would be appreciated.
Thanks
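pandas.core.algorithms is a private pandas module, and, as the error shows, it has no quantile function in the pandas version being used, so code calling algos.quantile breaks. Installing an older pandas is one option. Another, sketched here as an assumption and not as a fix from the xverse maintainers, is to compute linearly interpolated sample quantiles that ignore NaN values with NumPy, which should correspond to what algos.quantile returned; verify on your own data. The name `quantile_edges` is hypothetical.

```python
import numpy as np


def quantile_edges(x, n_edges):
    """Return n_edges evenly spaced sample quantiles of x, ignoring NaN.

    Hypothetical stand-in for algos.quantile(x, np.linspace(0, 1, n_edges)).
    """
    x = np.asarray(x, dtype=float)
    return np.nanquantile(x, np.linspace(0, 1, n_edges))


edges = quantile_edges(list(range(101)) + [np.nan], 5)
print(edges)  # expected edges: 0, 25, 50, 75, 100
```

Where the library calls algos.quantile(X, np.linspace(0, 1, force_bins)), as in the traceback of the earlier issue, the equivalent call would be quantile_edges(X, force_bins).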
TypeError Traceback (most recent call last)
in
----> 1 clf.transform(X).head()
~/.conda/envs/most-reg/lib/python3.7/site-packages/xverse/transformer/_woe.py in transform(self, X, y)
308 Estimator has to be fitted to apply transformations.")
309
--> 310 outX[new_column_name] = tempX.replace(self.woe_bins[original_column_name])
311
312 #transformed dataframe
~/.local/lib/python3.7/site-packages/pandas/core/series.py in replace(self, to_replace, value, inplace, limit, regex, method)
4567 limit=limit,
4568 regex=regex,
-> 4569 method=method,
4570 )
4571
~/.local/lib/python3.7/site-packages/pandas/core/generic.py in replace(self, to_replace, value, inplace, limit, regex, method)
6490
6491 return self.replace(
-> 6492 to_replace, value, inplace=inplace, limit=limit, regex=regex
6493 )
6494 else:
~/.local/lib/python3.7/site-packages/pandas/core/series.py in replace(self, to_replace, value, inplace, limit, regex, method)
4567 limit=limit,
4568 regex=regex,
-> 4569 method=method,
4570 )
4571
~/.local/lib/python3.7/site-packages/pandas/core/generic.py in replace(self, to_replace, value, inplace, limit, regex, method)
6536 dest_list=value,
6537 inplace=inplace,
-> 6538 regex=regex,
6539 )
6540
~/.local/lib/python3.7/site-packages/pandas/core/internals/managers.py in replace_list(self, src_list, dest_list, inplace, regex)
612 mask = ~isna(values)
613
--> 614 masks = [comp(s, mask, regex) for s in src_list]
615
616 result_blocks = []
~/.local/lib/python3.7/site-packages/pandas/core/internals/managers.py in (.0)
612 mask = ~isna(values)
613
--> 614 masks = [comp(s, mask, regex) for s in src_list]
615
616 result_blocks = []
~/.local/lib/python3.7/site-packages/pandas/core/internals/managers.py in comp(s, mask, regex)
606
607 s = com.maybe_box_datetimelike(s)
--> 608 return _compare_or_regex_search(values, s, regex, mask)
609
610 # Calculate the mask once, prior to the call of comp
~/.local/lib/python3.7/site-packages/pandas/core/internals/managers.py in _compare_or_regex_search(a, b, regex, mask)
1966 result = tmp
1967
-> 1968 _check_comparison_types(result, a, b)
1969 return result
1970
~/.local/lib/python3.7/site-packages/pandas/core/internals/managers.py in _check_comparison_types(result, a, b)
1934
1935 raise TypeError(
-> 1936 f"Cannot compare types {repr(type_names[0])} and {repr(type_names[1])}"
1937 )
1938
TypeError: Cannot compare types 'ndarray(dtype=object)' and 'Interval'
pandas version: 1.1.0
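The failure happens inside Series.replace when the replacement mapping is keyed by pandas Interval objects, which, judging by the error message, is what self.woe_bins[original_column_name] holds for binned columns. As a possible workaround, and an assumption rather than the library's fix, the same mapping can be applied with an explicit dictionary lookup instead of replace. The bin edges and WoE values below are made up for illustration.

```python
import numpy as np
import pandas as pd

x = pd.Series([1, 4, 6, 9, 2, 7])
binned = pd.cut(x, bins=[0, 5, 10])  # categorical Series of Interval bins

# Illustrative WoE value per bin, keyed by the Interval categories themselves
woe_bins = dict(zip(binned.cat.categories, [-0.5, 0.8]))

# Look each Interval up directly instead of calling binned.replace(woe_bins);
# values that fall outside every bin (NaN) map to NaN.
woe = pd.Series([woe_bins.get(iv, np.nan) for iv in binned], index=x.index)
print(woe.tolist())  # [-0.5, -0.5, 0.8, 0.8, -0.5, 0.8]
```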
How should the bin borders and the number of bins be chosen?
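A common heuristic, similar in spirit to monotonic binning but not necessarily what xverse implements, is to start from a maximum number of equal-frequency (quantile) bins and remove one bin at a time until the event rate moves in one direction only across the bins. The sketch assumes a numeric feature and a binary target; the function name, the max_bins default and the non-strict monotonicity rule are assumptions for illustration.

```python
import numpy as np


def monotonic_quantile_bins(x, y, max_bins=10, min_bins=2):
    """Return quantile bin edges whose per-bin event rate is monotonic.

    Starts with max_bins equal-frequency bins and removes one bin at a time.
    Falls back to min_bins bins if no finer split is monotonic.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    for n_bins in range(max_bins, min_bins - 1, -1):
        edges = np.unique(np.quantile(x, np.linspace(0, 1, n_bins + 1)))
        if len(edges) < 3:
            continue  # fewer than two distinct bins
        # Bin index for each value, using interior edges only: 0 .. len(edges) - 2
        idx = np.digitize(x, edges[1:-1], right=True)
        counts = np.bincount(idx, minlength=len(edges) - 1)
        if np.any(counts == 0):
            continue  # skip splits that leave a bin empty
        rates = np.array([y[idx == b].mean() for b in range(len(edges) - 1)])
        diffs = np.diff(rates)
        if np.all(diffs >= 0) or np.all(diffs <= 0):
            return edges
    return np.unique(np.quantile(x, np.linspace(0, 1, min_bins + 1)))


x = np.arange(100)
y = (x > 50).astype(int)
edges = monotonic_quantile_bins(x, y, max_bins=10)
print(len(edges) - 1)  # 10
```

Because the event rate here only rises with x, all ten bins are kept; a U-shaped relationship would instead be reduced to fewer, coarser bins.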
How is this better than, or different from, https://github.com/airysen/caimcaim?
It would be better to add a random_state parameter to VotingSelector for ExtraTreesClassifier and RandomForestClassifier, so that results can be reproduced.