
generate-dataset's Introduction

ml-generate-dataset

Installation

npm install --save ml-generate-dataset

Example

var generateDataset = require('ml-generate-dataset');
/*
 * These options are used to create the dataset. They let you define several classes whose markers are
 * the elements with different distributions between the classes. Note the element with index 1: its
 * distribution has a mean of 9.4 in the first class and 10.3 in the second.
 */
var options = {
    keepDataClass: true,
    keepCompositionMatrix: true,
    dummyMatrix: true,
    seed: 22,
    classes: [
        {
            nbSample: 500,
            elements: [
                {
                    index: 0,
                    distribution: {
                        name: 'normal',
                        parameters: {
                            mean: 9.4,
                            standardDesviation: 0.1
                        }
                    }
                },
                {
                    index: 1,
                    distribution: {
                        name: 'normal',
                        parameters: {
                            mean: 9.4,
                            standardDesviation: 0.1
                        }
                    }
                },
                {
                    index: 2,
                    distribution: {
                        name: 'normal',
                        parameters: {
                            mean: 9.4,
                            standardDesviation: 0.1
                        }
                    }
                }
            ]
        },
        {
            nbSample: 500,
            elements: [
                {
                    index: 0,
                    distribution: {
                        name: 'normal',
                        parameters: {
                            mean: 9.4,
                            standardDesviation: 0.1
                        }
                    }
                },
                {
                    index: 1,
                    distribution: {
                        name: 'normal',
                        parameters: {
                            mean: 10.3,
                            standardDesviation: 0.15
                        }
                    }
                },
                {
                    index: 2,
                    distribution: {
                        name: 'normal',
                        parameters: {
                            mean: 9.4,
                            standardDesviation: 0.1
                        }
                    }
                }
            ]
        }
    ]
};
// The pureElements matrix can hold any kind of data, such as NMR or IR spectra.
var pureElements = [
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 0]
];
/* The rows of the pureElements matrix are linearly combined. Given:
 * var pureElements = [
 *  element A,
 *  element B,
 *  element C,
 *      .
 *      .
 *      .
 *  ];
 *  each sample of the dataset is  AA = aA + bB + cC + ...
 *  and the composition matrix contains those percentages (a, b, c, ...)
 */
var dataset = generateDataset(pureElements, options);
// dataset is now an object holding the dataset, the composition matrix and the dataClass matrix, which you can use for statistical analysis and debugging
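
The linear combination described in the comment above can be sketched in a few lines. This is only an illustration of the arithmetic, not the package's internal code: `mixSample` is a hypothetical helper, and the coefficients are taken from the class means used in the example.

```javascript
// Hypothetical helper, not part of ml-generate-dataset: builds one sample as
// a linear combination of the rows of pureElements (AA = aA + bB + cC + ...).
function mixSample(pureElements, coefficients) {
    const nbPoints = pureElements[0].length;
    const sample = new Array(nbPoints).fill(0);
    for (let i = 0; i < pureElements.length; i++) {
        for (let j = 0; j < nbPoints; j++) {
            sample[j] += coefficients[i] * pureElements[i][j];
        }
    }
    return sample;
}

const pureElements = [
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 0]
];

// One row of a composition matrix: a = 9.4, b = 10.3, c = 9.4
const sample = mixSample(pureElements, [9.4, 10.3, 9.4]);
console.log(sample); // [0, 9.4, 0, 9.4, 0, 10.3, 0]
```

Each row of the composition matrix plays the role of `coefficients` here, so multiplying the whole composition matrix by `pureElements` gives the full dataset.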

License

MIT

generate-dataset's People

Contributors

jeffersonh44, jobo322, josoriom, lpatiny, targos

Forkers

jobo322

generate-dataset's Issues

format of the config file

In my opinion, the config file is too complex to handle large cases. If we have a system with 1000 molecules, the file will become very big.

The variable pureElements should come first and may also be a filename (when imported, it should be possible to read a CSV file with each element on a new line).
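
As a rough sketch of the CSV import suggested here, the text of such a file could be turned into a matrix as follows. `parsePureElements` is a hypothetical name, and the comma separator is an assumption; nothing here exists in the package yet.

```javascript
// Hypothetical helper: parses CSV text in which each line is one pure element.
// Assumes comma-separated numeric values and ignores empty lines.
function parsePureElements(csvText) {
    return csvText
        .split(/\r?\n/)
        .map((line) => line.trim())
        .filter((line) => line.length > 0)
        .map((line) => line.split(',').map(Number));
}

const csvText = '0,0,0,1,0,0,0\n0,0,0,0,0,1,0\n0,1,0,0,0,0,0\n';
const pureElementsFromCSV = parsePureElements(csvText);
console.log(pureElementsFromCSV.length); // 3
```

In a real script the text would typically come from `fs.readFileSync(filename, 'utf8')`.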

Then, because we don't want to fill in information for 1000 molecules, we should be able to define a default behaviour.

// The pureElements matrix can hold any kind of data, such as NMR or IR spectra.
var pureElements = [
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 0]
];

// This vector is optional. It allows tuning the composition if it is not already in pureElements. Each row of pureElements will be multiplied by the corresponding element of meanComposition (min 0, max 100).
var meanComposition = [10, 15, 70]; // one entry per row of pureElements


var options = {
    keepDataClass: true,
    keepCompositionMatrix: true,
    dummyMatrix: true,
    seed: 22,
    defaultBehavior: {
        distribution: {
            name: ['normal', 'normal', 'normal'], // one for each class
            parameters: {
                standardDeviation: [0.1, 0.1, 0.2], // one for each class
                meanType: 'sd/diff/absolute', // one of 'sd', 'diff' or 'absolute'
                mean: [0, -0.1, 0.1] // if 'sd' then defined as X times the sd, if 'diff' then defined as a difference with respect to meanComposition, if 'absolute' then overwrites the meanComposition entry.
            }
        }
    },
    classes: [
        {
            nbSample: 500,
            elements: [
                {
                    index: 0,
                    distribution: {
                        name: 'normal',
                        parameters: {
                            meanType: 'sd/diff/absolute',
                            mean: 9.4, 
                            standardDeviation: 0.1
                        }
                    }
                },
                {
                    index: 1,
                    distribution: {
                        name: 'normal',
                        parameters: {
                            meanType: 'sd/diff/absolute',
                            mean: 9.4,
                            standardDeviation: 0.1
                        }
                    }
                },
                {
                    index: 2,
                    distribution: {
                        name: 'normal',
                        parameters: {
                            meanType: 'sd/diff/absolute',
                            mean: 9.4,
                            standardDeviation: 0.1
                        }
                    }
                }
            ]
        },
        {
            nbSample: 500,
            elements: [
                {
                    index: 0,
                    distribution: {
                        name: 'normal',
                        parameters: {
                            mean: 9.4,
                            standardDeviation: 0.1
                        }
                    }
                },
                {
                    index: 1,
                    distribution: {
                        name: 'normal',
                        parameters: {
                            mean: 10.3,
                            standardDeviation: 0.15
                        }
                    }
                },
                {
                    index: 2,
                    distribution: {
                        name: 'normal',
                        parameters: {
                            mean: 9.4,
                            standardDeviation: 0.1
                        }
                    }
                }
            ]
        },
        {
            nbSample: 400,
            elements: [
                {
                    index: 0,
                    distribution: {
                        name: 'normal',
                        parameters: {
                            mean: 9.4,
                            standardDeviation: 0.1
                        }
                    }
                },
                {
                    index: 1,
                    distribution: {
                        name: 'normal',
                        parameters: {
                            mean: 11,
                            standardDeviation: 0.15
                        }
                    }
                },
                {
                    index: 2,
                    distribution: {
                        name: 'normal',
                        parameters: {
                            mean: 9.4,
                            standardDeviation: 0.1
                        }
                    }
                }
            ]
        }
    ]
};
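
To make the proposed `meanType` option more concrete, here is one possible reading of it as code. `resolveMean` and its argument names are hypothetical. In particular, treating 'sd' as an offset of X standard deviations from the `meanComposition` entry is an assumption, since the proposal only says "X times the sd".

```javascript
// Hypothetical helper illustrating the proposed meanType option.
// baseMean stands for the corresponding meanComposition entry.
function resolveMean(meanType, mean, standardDeviation, baseMean) {
    switch (meanType) {
        case 'sd':
            // Assumption: offset from baseMean, expressed in standard deviations
            return baseMean + mean * standardDeviation;
        case 'diff':
            // Difference with respect to the meanComposition entry
            return baseMean + mean;
        case 'absolute':
            // Overwrites the meanComposition entry
            return mean;
        default:
            throw new Error('Unknown meanType: ' + meanType);
    }
}

console.log(resolveMean('sd', 2, 0.5, 10)); // 11
console.log(resolveMean('diff', -1, 0.5, 10)); // 9
console.log(resolveMean('absolute', 9.4, 0.5, 10)); // 9.4
```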

command line

I hope to be able to use the npm package with something like this:

node generate.js -dataType=nmr fromSDF=pathtosdf outputType=csv outputPrefix='blablah'
node generate.js -dataType=nmr fromCSV=pathtocsv outputType=csv outputPrefix='blablah'
node generate.js -dataType=nmr fromCSV=pathtocsv outputType=jcamp outputPrefix='blablah'

The output should contain the class vector, the composition matrix and the dataset.
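
A minimal sketch of how a script such as generate.js might read those `key=value` arguments is shown below. `parseArgs` is a hypothetical helper, and the option names are simply the ones from the commands above; no such command line interface exists yet.

```javascript
// Hypothetical helper: turns tokens such as '-dataType=nmr' or
// 'fromCSV=pathtocsv' into an object of options.
function parseArgs(argv) {
    const result = {};
    for (const token of argv) {
        const separator = token.indexOf('=');
        if (separator === -1) continue; // ignore tokens without a value
        const key = token.slice(0, separator).replace(/^-+/, '');
        result[key] = token.slice(separator + 1);
    }
    return result;
}

// In a real script the tokens would come from process.argv.slice(2)
const parsedOptions = parseArgs([
    '-dataType=nmr',
    'fromCSV=pathtocsv',
    'outputType=csv',
    'outputPrefix=blablah'
]);
console.log(parsedOptions.outputType); // csv
```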

random noise

We should be able to add random noise to the rows of the compositionMatrix after the "mixing" is done. This would simulate the experimental noise present when data are acquired. The random noise is the same for all classes, since it depends only on the experiments and not on the sample preparation.
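
One way this could look, as a sketch only: Gaussian noise with a single standard deviation for all classes is added to every value of an already mixed matrix. The names `addNoise`, `gaussian` and `mulberry32` are not part of the package. The seeded generator is the small mulberry32 PRNG and the normal deviates come from the Box-Muller transform, both chosen here for illustration.

```javascript
// Small seeded PRNG (mulberry32) returning values in [0, 1)
function mulberry32(seed) {
    let a = seed >>> 0;
    return function () {
        a = (a + 0x6d2b79f5) | 0;
        let t = Math.imul(a ^ (a >>> 15), 1 | a);
        t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
        return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
    };
}

// Standard normal deviate using the Box-Muller transform
function gaussian(random) {
    const u1 = 1 - random(); // in (0, 1], keeps Math.log finite
    const u2 = random();
    return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Hypothetical helper: returns a noisy copy of the matrix, leaving the input
// untouched. The same standardDeviation is used for every row and class.
function addNoise(matrix, standardDeviation, seed) {
    const random = mulberry32(seed);
    return matrix.map((row) =>
        row.map((value) => value + standardDeviation * gaussian(random))
    );
}

const mixed = [
    [0, 9.4, 0, 9.4, 0, 9.4, 0],
    [0, 9.4, 0, 9.4, 0, 10.3, 0]
];
const noisy = addNoise(mixed, 0.05, 22);
console.log(noisy.length, noisy[0].length); // 2 7
```

Because the generator is seeded, calling `addNoise` twice with the same seed gives the same noisy matrix, which keeps generated datasets reproducible.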

compositionMatrix

We should be able to output a composition matrix as CSV that can be used to multiply the pureElements matrix outside the npm package.
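
A minimal sketch of such an export follows, assuming the composition matrix is a plain array of arrays of numbers. `toCSV` is a hypothetical helper, not an existing function of the package.

```javascript
// Hypothetical helper: serializes a matrix to CSV, one row per line
function toCSV(matrix) {
    return matrix.map((row) => row.join(',')).join('\n');
}

const compositionMatrix = [
    [9.4, 9.4, 9.4],
    [9.4, 10.3, 9.4]
];
console.log(toCSV(compositionMatrix));
// 9.4,9.4,9.4
// 9.4,10.3,9.4
```

The resulting string could then be written to disk with `fs.writeFileSync` and loaded by any external tool to multiply the pureElements matrix.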
