Comments (3)
Similar problem with FPS.
import numpy as np
from skcosmo.selection import FPS
X = np.ones((10,3))
n_features = 1
fs = FPS.FeatureFPS(X)
fs.select(n_features)
print("X.shape", X.shape)
print("n_features", n_features)
print("len(fs.idx)", len(fs.idx))
print("len(np.unique(fs.idx))", len(np.unique(fs.idx)))
print()
n_features = 3
fs = FPS.FeatureFPS(X)
fs.select(n_features)
print("X.shape", X.shape)
print("n_features", n_features)
print("len(fs.idx)", len(fs.idx))
print("len(np.unique(fs.idx))", len(np.unique(fs.idx)))
Output
X.shape (10, 3)
n_features 1
len(fs.idx) 1
len(np.unique(fs.idx)) 1
X.shape (10, 3)
n_features 3
len(fs.idx) 2
len(np.unique(fs.idx)) 2
FPS only selects features until the distance is close to zero and then stops. This could be the intended default behaviour of FPS, but it is not clear from the Python docstrings or the variable names, and it is currently inconsistent with CUR. I would rather have "select exactly this many features, even if some of them are poor" as the default behaviour, since PCA also does it this way while still giving me an option to select the number of features in a smart way.
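To make the two behaviours concrete, here is a minimal NumPy sketch of farthest point sampling over feature columns. It is not the skcosmo implementation: the function `fps_features` and its `stop_at_tol` and `first` parameters are hypothetical names chosen for illustration, and the exact number of features a real release returns may differ from this sketch.

```python
import numpy as np


def fps_features(X, n_select, tol=1e-12, stop_at_tol=True, first=0):
    """Hypothetical FPS over the columns of X (illustration only)."""
    X = np.asarray(X, dtype=float)
    n_total = X.shape[1]
    selected = [first]
    # Squared Euclidean distance of every column to its closest selected column.
    min_dist = np.sum((X - X[:, [first]]) ** 2, axis=0)
    while len(selected) < min(n_select, n_total):
        # Mask already selected columns so they can never be picked again.
        min_dist[selected] = -np.inf
        candidate = int(np.argmax(min_dist))
        # Early-stopping variant: give up once nothing is farther away than tol.
        if stop_at_tol and min_dist[candidate] <= tol:
            break
        selected.append(candidate)
        new_dist = np.sum((X - X[:, [candidate]]) ** 2, axis=0)
        min_dist = np.minimum(min_dist, new_dist)
    return np.array(selected)


X = np.ones((10, 3))
print(len(fps_features(X, 3, stop_at_tol=True)))   # stops early: fewer than 3
print(len(fps_features(X, 3, stop_at_tol=False)))  # returns exactly 3
```

With `stop_at_tol=False` the caller always gets the requested number of distinct columns, which is the default I am asking for; the early-stopping variant would then be an explicit opt-in.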
I also had the issue that one index appeared twice in fs.idx, but this was in an older version of skcosmo, so it could already have been fixed.
from scikit-matter.
Also, somewhere in FPS an index check is wrong, so that one index is included twice. I haven't been able to reproduce this in a small script, but here is a snippet from a larger part of the code:
fs = skcosmo.selection.FPS.FeatureFPS(features, tol=1e-50)
fs.select(self.n_features)
self.selected_idx_ = fs.idx
print("len(self.selected_idx_)",len(self.selected_idx_))
print("len(np.unique(self.selected_idx_))",len(np.unique(self.selected_idx_)))
Output
len(self.selected_idx_) 391
len(np.unique(self.selected_idx_)) 390
EDIT1: I just checked: it is the first selected feature that is included twice in the indices, once at the beginning and once at the end. I think this happens because at that point the distance is equal to 0, but an index is still added one last time before breaking out of the for loop over n_select.
EDIT2: Code example in which one index is included twice.
import numpy as np
from skcosmo.selection import FPS
X = np.array([[0, 100, 100, 100]])
n_features = 3
fs = FPS.FeatureFPS(X, idxs=[0])
fs.select(n_features)
print("X.shape", X.shape)
print("n_features", n_features)
print("len(fs.idx)", len(fs.idx))
print("len(np.unique(fs.idx))", len(np.unique(fs.idx)))
With output
X.shape (1, 4)
n_features 3
len(fs.idx) 3
len(np.unique(fs.idx)) 2
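One plausible way such a duplicate can arise is sketched below. This is an assumption on my side, not something verified against the skcosmo source: `naive_fps` is a hypothetical helper that never masks already selected columns, so once all remaining distances are 0, `np.argmax` falls back to the first index, which is the initially selected one.

```python
import numpy as np


def naive_fps(X, n_select, first=0):
    """Hypothetical FPS without masking of selected columns (illustration only)."""
    X = np.asarray(X, dtype=float)
    selected = [first]
    # Squared distance of every column to its closest selected column.
    min_dist = np.sum((X - X[:, [first]]) ** 2, axis=0)
    for _ in range(n_select - 1):
        # With all distances equal to 0, argmax returns index 0 again.
        candidate = int(np.argmax(min_dist))
        selected.append(candidate)
        new_dist = np.sum((X - X[:, [candidate]]) ** 2, axis=0)
        min_dist = np.minimum(min_dist, new_dist)
    return selected


X = np.array([[0, 100, 100, 100]])
idx = naive_fps(X, 3)
print("idx", idx)
print("len(idx)", len(idx))
print("len(np.unique(idx))", len(np.unique(idx)))
```

In this sketch the first index shows up at the beginning and at the end of the selection, matching the lengths reported above. Masking selected columns, or breaking before the append when the best distance is 0, avoids the repeat.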
Addressed in #58