mljs / pca
Principal component analysis
Home Page: https://mljs.github.io/pca/
License: MIT License
I'm getting an error when calling new PCA(Matrix(dataset)):
TypeError: Class constructors cannot be invoked without 'new'
It should be new PCA(new Matrix(dataset))
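A minimal sketch of why the error occurs, using a hypothetical stand-in class rather than the real ml-matrix Matrix: ES2015 classes can only be instantiated with new, so calling one like a plain function throws a TypeError.

```javascript
// Hypothetical stand-in for a Matrix class; not the real ml-matrix implementation.
class Matrix {
  constructor(rows) {
    this.rows = rows;
  }
}

const dataset = [[1, 2], [3, 4]];

// Calling a class without `new` throws a TypeError.
let caught = null;
try {
  Matrix(dataset);
} catch (err) {
  caught = err;
}
console.log(caught instanceof TypeError); // true

// With `new`, construction succeeds.
const matrix = new Matrix(dataset);
console.log(matrix.rows.length); // 2
```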
What engine (node, iojs, babel) is this package targeting?
I ran getExplainedVariance on my dataset:
const pca = new PCA(students);
console.log(pca.getExplainedVariance());
It worked fine, and I got a list of variances back - however, how do I tell which variance corresponds to which feature?
I don't understand: don't we choose the number of components for our PCA?
See how scikit-learn does it:
http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
A PCA enables dimensionality reduction, so we have to choose our new dimensionality.
When I construct the PCA object from a dataset and run getExplainedVariance() I get a number[] of variance values, sorted in descending order. I'd like to know which feature relates to each variance value. I could do this if the report specified the original index of the feature in the sample or allowed me to pass in labels.
I've read the API documentation and pulled down the source code, but I really haven't found much explanation there. Am I missing how to do this? I'd be happy to help improve documentation or add this code if you agree it's missing and necessary.
When the package is installed, it pushes to docs, which throws an error in Travis CI: fatal: empty ident name (for <travis@testing-worker-linux-docker-d81d9b9d-3397-linux-15.prod.travis-ci.org>) not allowed
Before you start, please have a look at:
Thank you for ml-pca, we are using it in a browser project and so far looks great. One issue we are having, however, is that the JS files in ml-pca and ml-matrix use ES6 features within a CJS file. When using JSPM/SystemJS in the browser this means that the files are detected as CJS and not transpiled.
Here is a relevant SystemJS issue: systemjs/systemjs#811
On your end I can see three approaches:
Trying to visualize eigenvectors of a point set:
I compared the results with numeric.js using this code:
http://davywybiral.blogspot.co.uk/2012/11/numeric-javascript.html
For instance check out how this simple ruby library works.
https://github.com/gbuesing/pca
It outputs the dataset in the desired number of dimensions, ready to graph. Is this easy with your library and I am just missing something obvious?
Thanks!
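As a rough sketch of what "outputting the dataset in the desired number of dimensions" involves, the snippet below centers the data and projects each sample onto the first k principal directions. The function name projectOntoComponents and the hard-coded directions are illustrative assumptions, not part of ml-pca; in practice the directions would come from a fitted PCA, and another issue on this page reports obtaining the projected scores with pca.predict(dataset).

```javascript
// Illustrative helper (not part of ml-pca): project samples onto the
// first k principal directions.
// data: array of samples, each an array of d numbers.
// directions: array of unit-length direction vectors (each of length d),
//             ordered by decreasing explained variance.
function projectOntoComponents(data, directions, k) {
  const d = data[0].length;
  // Column means, used to center the data.
  const means = new Array(d).fill(0);
  for (const row of data) {
    for (let j = 0; j < d; j++) means[j] += row[j] / data.length;
  }
  return data.map((row) =>
    directions.slice(0, k).map((dir) =>
      dir.reduce((sum, w, j) => sum + w * (row[j] - means[j]), 0)
    )
  );
}

// Toy 3-D points: large spread along x, small spread along y, none along z.
// Their covariance matrix is diagonal, so the principal directions are
// simply the coordinate axes (x first, then y).
const points = [
  [0, 1, 5],
  [2, 0, 5],
  [4, 0, 5],
  [6, 1, 5],
];
const axes = [
  [1, 0, 0],
  [0, 1, 0],
  [0, 0, 1],
];

// Keep two dimensions: each sample becomes an [x, y] pair ready to plot.
const scores2d = projectOntoComponents(points, axes, 2);
console.log(scores2d);
```

Each row of scores2d is one sample in the reduced space, so the pairs can be handed directly to any scatter-plot library.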
From the documentation, I understand that specifying nComponents only works with options.method = 'NIPALS'.
However
const options = { method: 'NIPALS', nCompNIPALS: 2 };
const pca = new PCA(dataset, options);
const embedding = pca.predict(dataset, options);
always returns arrays of NaNs ([NaN, NaN]). I've tried it with 'SVD' and there are no NaNs, but the other methods don't support nComponents. So how do you use this for dimensionality reduction?
It would be preferable if it could skip features with zero variance instead of crashing.
I have written some code to do the check myself, but it feels very inefficient. Suggestions for improvement are welcome.
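// Work feature-wise: after transposing, each row holds one feature.
const features = new Matrix(input).transpose();
const kept = [];
features.forEach((vec) => {
const mean = this.mean(vec);
const variance = this.variance(vec, mean);
if (variance > 1e-7) {
// Standardize only features with non-negligible variance.
kept.push(this.standardize(vec, mean, variance));
}
// Features with (near-)zero variance are skipped.
});
// Transpose back so that rows are samples again.
const scaled = new Matrix(kept).transpose();
// Already scaled above, so disable the PCA's own centering and scaling
// to avoid the division by zero caused by zero-variance features.
const pca = new Stat.PCA(scaled, {mean: false, scale: false});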
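One possible alternative, sketched in plain JavaScript without any library: compute each column's mean and variance directly on the row-major input, then keep and standardize only the columns whose variance exceeds a threshold. This avoids both transposes. The function name and its return shape are assumptions for illustration; the 1e-7 threshold is carried over from the snippet above, and returning the kept column indices is an added convenience for mapping results back to the original features.

```javascript
// Illustrative helper: drop near-constant columns and standardize the rest.
// rows: array of samples (at least two), each an array of numbers of equal length.
function standardizeNonConstantColumns(rows, epsilon = 1e-7) {
  const n = rows.length;
  const d = rows[0].length;
  const means = new Array(d).fill(0);
  const variances = new Array(d).fill(0);

  for (const row of rows) {
    for (let j = 0; j < d; j++) means[j] += row[j] / n;
  }
  for (const row of rows) {
    for (let j = 0; j < d; j++) {
      // Sample variance (divides by n - 1).
      variances[j] += (row[j] - means[j]) ** 2 / (n - 1);
    }
  }

  // Indices of the columns that carry enough variance to keep.
  const kept = [];
  for (let j = 0; j < d; j++) {
    if (variances[j] > epsilon) kept.push(j);
  }

  // Center and scale only the kept columns.
  const data = rows.map((row) =>
    kept.map((j) => (row[j] - means[j]) / Math.sqrt(variances[j]))
  );
  return { data, kept };
}

const input = [
  [1, 7, 10],
  [2, 7, 20],
  [3, 7, 30],
];
const { data, kept } = standardizeNonConstantColumns(input);
console.log(kept); // [0, 2]: the constant middle column is dropped
console.log(data);
```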
I have a dataset of M samples, each with N features. Each sample is an N-dimensional vector of hashes representing the fingerprint of an audio file. I would like to run PCA on it using this library. Assuming my dataset has size M x N, how do I run it through this library?
Hi, sorry in advance if my question is stupid, as I'm not an expert in statistics.
I'm trying to implement a SOM algorithm, and the best way to initialize the neurons' vectors is with PCA.
I'm trying to understand the behavior of the generated eigenvectors but can't figure out why I get this in my unit tests:
const dataSetSize = 10;
const numDimensions1 = 3;
const numDimensions2 = 11;
// Matrix 1 generation
const data = _.range(0, dataSetSize).map(vec => _.range(0, numDimensions1).map(val => Math.random()));
// Matrix 2 generation
const data2 = _.range(0, dataSetSize).map(vec => _.range(0, numDimensions2).map(val => Math.random()));
const pca = new PCA(data, {
center: true,
scale: false,
});
const pca2 = new PCA(data2, {
center: true,
scale: false,
});
describe('eigenvectors:', () => {
const eigenvectors = pca.getEigenvectors();
const eigenvectors2 = pca2.getEigenvectors();
it('should have as many eigenvectors as the number of dimensions in the dataset', () => {
assert.strictEqual(eigenvectors.length, numDimensions1);
assert.strictEqual(eigenvectors2.length, numDimensions2);
});
it('an eigenvector should have as many dimensions as a vector from the dataset', () => {
assert.strictEqual(eigenvectors[0].length, numDimensions1);
// FAIL:
// assert.strictEqual(eigenvectors2[0].length, numDimensions2);
});
});
Hello, first of all I want to give my thanks for the existence of this library. I don't have a lot of experience with multivariate analysis so forgive me if this is a dumb question but I was wondering if it's possible to plot the data generated from the analysis? For example, if you take a look at this Python Plotly graph, they are plotting using the data generated from sklearn. Is this something that is achievable with the data generated from this library? Thank you.
The documentation is published but can be improved (before release maybe):
Usage:
const pca = new PCA([vector]);
const result = pca.predict([vector], { nComponents: 2 });
...where vector is an array with 1,536 elements, à la an OpenAI embedding, e.g. const vector = [0.00728, -0.0181, 0.014, ....]
The goal is to reduce/project the 1,536-element vector into 2d space for graphing related vectors.
However, when I call pca.predict, it throws this error:
(PID 60093) 2023-07-03T22:51:51.304Z [ERROR] script-cli.js:127: RangeError: Submatrix indices are out of range
at checkRange (/Users/josiahbryan/devel/rubber/backend/node_modules/ml-matrix/matrix.js:1038:11)
at Matrix.subMatrix (/Users/josiahbryan/devel/rubber/backend/node_modules/ml-matrix/matrix.js:2455:5)
at PCA.predict (/Users/josiahbryan/devel/rubber/backend/node_modules/ml-pca/lib/pca.js:123:28)
[snipped]
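One plausible reading of this trace, offered as an assumption rather than a confirmed diagnosis, is that the model was fitted on a single sample. After centering, n samples with d features yield at most min(n - 1, d) components with non-zero variance, so one vector alone cannot support a 2-D projection. The hypothetical helper below computes that bound so a requested number of components can be checked before fitting.

```javascript
// Hypothetical helper (not part of ml-pca): upper bound on the number of
// principal components with non-zero variance after centering.
function maxInformativeComponents(dataset) {
  const nSamples = dataset.length;
  const nFeatures = nSamples > 0 ? dataset[0].length : 0;
  return Math.max(0, Math.min(nSamples - 1, nFeatures));
}

// A single 1,536-element vector, as in the report above.
const single = [new Array(1536).fill(0.01)];
console.log(maxInformativeComponents(single)); // 0

// Three such vectors allow at most two informative components.
const three = [
  new Array(1536).fill(0.01),
  new Array(1536).fill(0.02),
  new Array(1536).fill(0.03),
];
console.log(maxInformativeComponents(three)); // 2
```

Under this reading, fitting on the whole collection of related vectors and only then projecting each one would give the model enough samples to define two directions.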
A getScores() method would be useful.
To do this I used:
let scores = pca.predict(dataset)
It took me several attempts to get what I needed (the scores), though admittedly I'm not a statistician. Even a mention of the scores in the documentation would help those who are not in the field, like me.
Check the view
And use as data
1,1
Hello,
When using new PCA(dataset[]), regardless of the order of the values in my observations (for instance dataset = [[1, 2], [100, 2000]] or dataset = [[2, 1], [2000, 100]]), the method getExplainedVariance always seems to return the same array of values. Is there a way to know which column/feature each score refers to?
Thank you!
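A small sketch may help here. It computes the PCA of a two-feature dataset by hand, using the closed-form eigen-decomposition of the 2x2 covariance matrix, with no library involved. The helper name pca2d is made up for illustration. Each explained-variance value belongs to a principal component, which is a weighted mix of all features, so swapping columns leaves the variances unchanged. What does change is the component's direction vector, whose weights show how much each original feature contributes.

```javascript
// Illustrative 2-feature PCA via the closed-form eigen-decomposition of a
// symmetric 2x2 covariance matrix. Assumes at least two non-identical samples.
function pca2d(data) {
  const n = data.length;
  const meanX = data.reduce((s, r) => s + r[0], 0) / n;
  const meanY = data.reduce((s, r) => s + r[1], 0) / n;
  let sxx = 0;
  let syy = 0;
  let sxy = 0;
  for (const [x, y] of data) {
    sxx += (x - meanX) ** 2 / (n - 1);
    syy += (y - meanY) ** 2 / (n - 1);
    sxy += ((x - meanX) * (y - meanY)) / (n - 1);
  }
  // Eigenvalues of [[sxx, sxy], [sxy, syy]], largest first.
  const half = (sxx + syy) / 2;
  const root = Math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2);
  const eigenvalues = [half + root, Math.max(0, half - root)];
  const total = eigenvalues[0] + eigenvalues[1];
  // Direction of the first component: weights on feature 0 and feature 1.
  let direction;
  if (sxy !== 0) {
    direction = [sxy, eigenvalues[0] - sxx];
  } else {
    direction = sxx >= syy ? [1, 0] : [0, 1];
  }
  const norm = Math.hypot(direction[0], direction[1]);
  return {
    explainedVariance: eigenvalues.map((v) => v / total),
    firstDirection: direction.map((w) => w / norm),
  };
}

const a = pca2d([[1, 2], [100, 2000]]);
const b = pca2d([[2, 1], [2000, 100]]);

// Same explained variance for both column orders...
console.log(a.explainedVariance, b.explainedVariance);
// ...but the weights of the first component swap along with the columns.
console.log(a.firstDirection, b.firstDirection);
```

In the first dataset the larger weight sits on the second feature, and in the swapped dataset on the first, which is the kind of information the question is after.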