
LArMachineLearningData

Instructions for creating an SVM model, e.g. the Pfo Characterisation SVM model

Using the implementation in SvmPfoCharacterisationAlgorithm

  0. Separate your sample of events into two subsamples, one for training and one for testing (not necessarily 50% each, but both representative of the spectrum of events for your problem).

  1. Run over your training subsample with the option TrainingSetMode = true. To do this, edit the corresponding PandoraSettings file (e.g. LArReco/scripts/uboone/PandoraSettings_MicroBooNE_Neutrino.xml) and provide the output training file name, for example:

    <algorithm type = "LArSvmPfoCharacterisation">
        ...
        <TrainingSetMode>true</TrainingSetMode>
        <TrainingOutputFileName>OUTPUT_NAME</TrainingOutputFileName>
        ...
    </algorithm>

This will create a text file (note that .txt will be appended to the OUTPUT_NAME provided) with the features calculated for the events in your input file. You can see an example in the file SVM_training_data_pfocharacterisation_example.txt, which contains lines like:

    05/30/17_16:10:06,98.7135,0.00741252,0.00355123,0.00949093,0.00658028,0.0324565,0.0307609,0.000491683,1

Each line lists the features, starting with a timestamp and finishing with the true value to train on for later classification. In this example, the true value is 1 for a track and 0 for a shower, and the features are the variables computed by the tools:

    <FeatureTools>
        <tool type = "LArLinearFitFeatureTool"/>
        <tool type = "LArShowerFitFeatureTool"/>
        <tool type = "LArVertexDistanceFeatureTool"/>
    </FeatureTools>

which are added in this order: 1) straight line length, 2) mean of the difference with the straight line, 3) sigma (standard deviation) of the difference with the straight line, 4) dTdL width, 5) max gap length, 6) RMS of the linear fit, 7) shower fit width, 8) vertex distance. Except for the first feature (straight line length), the others are normalised (divided by the straight line length) if the option RatioVariables = true. The implementation of the tools and the available variables can be found in TrackShowerIdFeatureTool. An illustrative sketch of how such a training file can be parsed is shown below.
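
For illustration only (this snippet is not part of the repository), a minimal sketch of how a training file in the format described above could be read in Python; the function name and the use of numpy are assumptions:

    import numpy as np

    def load_training_file(path):
        """Read a Pandora SVM training text file: timestamp, feature values, true class label."""
        features, labels = [], []
        with open(path) as training_file:
            for line in training_file:
                tokens = line.strip().split(",")
                if len(tokens) < 3:
                    continue  # skip empty or malformed lines
                # tokens[0] is the timestamp; the last token is the true class (1 = track, 0 = shower)
                features.append([float(value) for value in tokens[1:-1]])
                labels.append(int(float(tokens[-1])))
        return np.array(features), np.array(labels)

    # Example usage, with the example file shipped in this repository:
    # X, y = load_training_file("SVM_training_data_pfocharacterisation_example.txt")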

*** Note: the next two steps are specific to the rbf (radial basis function) kernel option. For other kernels, check the scikit-learn documentation.

  2. Use the python script rbf_gridsearch_test.py to search for the optimal values of C and gamma for your training data. Edit rbf_gridsearch_test.py and give the text file produced in step 1 as trainingFile. This script performs a grid search, which is time and memory intensive, so consider sampling your training data accordingly (for example, SVM_training_data_pfocharacterisation_example.txt contains 1000 training examples randomly selected from the entire training data for this step). At the end, the script reports "The best parameters are C: and gamma: " with a given score. The score is a measure of the classification performance; for example, in the track-shower characterisation it would be: ntracks * tracks_eff + nshowers * showers_eff

The python script also produces a plot with indicative values of the score across the searched grid. Check that the score surface is smooth and that the chosen grid was large enough to find a reliable best score; otherwise, if the selected point lies at an edge of the grid, consider extending the grid limits and running this step again. An illustrative sketch of such a grid search is shown below.
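
For reference, a minimal sketch of what such a grid search typically looks like with scikit-learn; the function name, grid ranges, cross-validation choice and feature scaling below are illustrative assumptions, not a copy of rbf_gridsearch_test.py:

    import numpy as np
    from sklearn.model_selection import GridSearchCV, StratifiedKFold
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # X, y: feature matrix and labels, e.g. from the parsing sketch above,
    # down-sampled to ~1000 examples to keep the grid search tractable.
    def run_rbf_grid_search(X, y):
        scaler = StandardScaler().fit(X)      # scale the features before training the SVM
        param_grid = {
            "C": np.logspace(-2, 3, 6),       # assumed search range
            "gamma": np.logspace(-4, 1, 6),   # assumed search range
        }
        grid = GridSearchCV(
            SVC(kernel="rbf"),
            param_grid,
            cv=StratifiedKFold(n_splits=5),
            n_jobs=-1,
        )
        grid.fit(scaler.transform(X), y)
        print("The best parameters are C: %g and gamma: %g with a score of %.3f"
              % (grid.best_params_["C"], grid.best_params_["gamma"], grid.best_score_))
        return grid.best_params_

    # best_params = run_rbf_grid_search(X, y)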

  3. With the values of C and gamma obtained in the previous step, run example.py. Edit it to set C and gamma, and give the appropriate trainingFile name. This step is less time and memory consuming, so the input data can be scaled up (for example, using 100k training examples). It will give another score, which should be checked against the one from the previous step. If it is very different, it could mean that the training examples sampled in step 2 were not representative enough of the entire training data, and you might consider running again from step 2 with a larger training sample. The output of step 3 will be a .xml file (as well as a .pkl file) with the model, i.e. the support vectors, to be used afterwards for solving the problem on your testing data (separated in step 0). That .xml file is the input to be given to the algorithm using it, like in this case:
    <algorithm type = "LArSvmPfoCharacterisation">
        ...
        <SvmFileName>PandoraSvm_PfoCharacterisation_MicroBooNE_mcc7.xml</SvmFileName>
        <SvmName>FinalPfoCharacterisation</SvmName>
        ...
    </algorithm>

The SvmName is the name written into the output .xml file, and can be changed in example.py. An illustrative sketch of this final training step is shown below.
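
For orientation, a minimal sketch of the final training step under the same assumptions; the actual example.py in this repository also writes the Pandora-readable .xml model, which is not reproduced here, and the output file name below is a placeholder:

    import pickle
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # X_train, y_train: the full (or 100k-sampled) training data;
    # C and gamma: the values found by the grid search in step 2.
    def train_final_svm(X_train, y_train, C, gamma,
                        model_path="PandoraSvm_PfoCharacterisation.pkl"):
        scaler = StandardScaler().fit(X_train)
        model = SVC(kernel="rbf", C=C, gamma=gamma)
        model.fit(scaler.transform(X_train), y_train)
        print("Training score: %.3f" % model.score(scaler.transform(X_train), y_train))
        # Save the trained model and the scaler to a .pkl file for later use.
        with open(model_path, "wb") as model_file:
            pickle.dump({"scaler": scaler, "model": model}, model_file)
        return model, scaler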
