Nitinguptadu

Breast cancer detection

Problem 1) The data contains outliers in several columns.
Problem 2) The data is left-skewed in several columns.
Problem 3) The data contains zero or missing values in several columns.
Problem 4) The data is unbalanced: Cancer ("0") has 234 samples and Not Cancer ("1") has 42 (a ratio of roughly 5:1).

Machine Learning Prediction Results

I applied 6 different machine learning models to both the unbalanced and the balanced data.

Machine learning models

Random Forest, Naive Bayes, SVM, KNN, Logistic Regression, XGBoost

Data Balance Techniques

Downsampling techniques: NearMiss

Upsampling techniques: 1) SMOTETomek 2) RandomOverSampler
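To make the idea behind random oversampling concrete, here is a minimal standard-library sketch. It is not the code used in this project and not the imbalanced-learn implementation; the helper name random_oversample and the placeholder feature values are assumptions made for illustration. Only the class counts (234 and 42) come from the problem description above.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority rows until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Indices of the original rows that carry this label.
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            j = rng.choice(idx)
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out


# Toy data with the reported class counts: 234 rows of class 0, 42 of class 1.
y = [0] * 234 + [1] * 42
X = [[float(i)] for i in range(len(y))]  # placeholder single feature (hypothetical)

X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)
```

After the call, both classes have 234 rows. Resampling is usually applied to the training split only, so that duplicated rows do not leak into the evaluation data.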

The last slide shows the results of all machine learning models on the unbalanced and the balanced data.

Red marks the best result among the models and techniques tried, ranked by F1 score.

Blue marks the second-best result among the models and techniques tried, ranked by F1 score.
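The following sketch illustrates why F1 score is a more informative ranking criterion than plain accuracy on data this unbalanced. The confusion counts are hypothetical, not results from this project, and treating the 42-sample class as the positive class is an assumption made for the example.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion counts,
    returning 0.0 wherever a denominator would be zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical model A: always predicts the majority class.
# It gets all 234 majority rows right and all 42 minority rows wrong.
accuracy_a = 234 / (234 + 42)
_, _, f1_a = precision_recall_f1(tp=0, fp=0, fn=42)

# Hypothetical model B: finds 30 of the 42 minority rows with 10 false alarms.
precision_b, recall_b, f1_b = precision_recall_f1(tp=30, fp=10, fn=12)
```

Model A reaches about 85% accuracy while its F1 score is 0, because it never identifies a minority sample. Model B's F1 score (about 0.73) reflects both its precision and its recall on the minority class.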

Nitin's Projects

tsfresh

Feature extraction settings

When starting a new data science project involving time series, you probably want to start by extracting a comprehensive set of features. Later you can identify which features are relevant for the task at hand. In the final stages, you probably want to fine-tune the parameters of the features to fine-tune your models.

You can do all of this with tsfresh, so you need to know how to control which features tsfresh calculates and how to adjust their parameters. This section explains both.

For the lazy: Just let me calculate some features

To calculate a comprehensive set of features, call the tsfresh.extract_features() function without passing a default_fc_parameters or kind_to_fc_parameters object. This uses the default options, which run all feature calculators in this package with what we think are sane default parameters.

For the advanced: How do I set the parameters for all kinds of time series?

After digging deeper into your data, you may want to calculate more of a certain type of feature and less of another. To do that with tsfresh, use a custom settings object:

    >>> from tsfresh.feature_extraction import ComprehensiveFCParameters
    >>> settings = ComprehensiveFCParameters()
    >>> # Set here the options of the settings object as shown in the paragraphs below
    >>> # ...
    >>> from tsfresh.feature_extraction import extract_features
    >>> extract_features(df, default_fc_parameters=settings)

default_fc_parameters is expected to be a dictionary that maps feature calculator names (the function names you can find in the tsfresh.feature_extraction.feature_calculators file) to a list of dictionaries, each holding the parameters (as key-value pairs) with which the function will be called. Every function-parameter combination in this dict is called during the extraction and produces one feature. If the function does not take any parameters, set the value to None. For example,

    fc_parameters = {
        "length": None,
        "large_standard_deviation": [{"r": 0.05}, {"r": 0.1}]
    }

will produce three features: one by calling the tsfresh.feature_extraction.feature_calculators.length() function without any parameters, and two by calling tsfresh.feature_extraction.feature_calculators.large_standard_deviation() with r = 0.05 and r = 0.1.

So you control which features are extracted by adding or removing keys or parameters in this dict. It is as easy as that. If you decide not to calculate the length feature, delete it from the dictionary:

    del fc_parameters["length"]

Now only the two other features are calculated.

For convenience, three dictionaries are predefined and can be used right away:

  • tsfresh.feature_extraction.settings.ComprehensiveFCParameters: includes all features without parameters and all features with parameters, each with different parameter combinations. This is the default for extract_features if you do not hand in a default_fc_parameters at all.

  • tsfresh.feature_extraction.settings.MinimalFCParameters: includes only a handful of features and can be used for quick tests. The features which have the "minimal" attribute are used here.

  • tsfresh.feature_extraction.settings.EfficientFCParameters: mostly the same features as in tsfresh.feature_extraction.settings.ComprehensiveFCParameters, but without features which are marked with the "high_comp_cost" attribute. This can be used if runtime performance plays a major role.

Theoretically, you could calculate an unlimited number of features with tsfresh by adding entry after entry to the dictionary.

For the ambitious: How do I set the parameters for different types of time series?

It is also possible to control the features to be extracted for the different kinds of time series individually. You can do so by passing another dictionary to the extract function as the kind_to_fc_parameters = {"kind": fc_parameters} parameter. This dict must be a mapping from kind names (as strings) to fc_parameters objects, which you would normally pass as an argument to the default_fc_parameters parameter.

So, for example, using

    kind_to_fc_parameters = {
        "temperature": {"mean": None},
        "pressure": {"maximum": None, "minimum": None}
    }

will extract the "mean" feature of the "temperature" time series and the "minimum" and "maximum" of the "pressure" time series.

The kind_to_fc_parameters argument partly overrides default_fc_parameters. If you include a kind name in kind_to_fc_parameters, its value is used for that kind. Other kinds still use default_fc_parameters.

A handy trick: Do I really have to create the dictionary by hand?

Not necessarily. Let's assume you have a DataFrame of tsfresh features, and by using feature selection algorithms you find out that only a subgroup of features is relevant. For this case we provide the tsfresh.feature_extraction.settings.from_columns() function, which constructs the kind_to_fc_parameters dictionary from the column names of the filtered feature matrix so that only relevant features are extracted.

This can save a huge amount of time because you avoid calculating unnecessary features. Let's illustrate that with an example:

    from tsfresh import extract_features
    from tsfresh.feature_extraction.settings import from_columns

    # X_tsfresh contains the extracted tsfresh features
    X_tsfresh = extract_features(...)

    # which are now filtered to only contain relevant features
    X_tsfresh_filtered = some_feature_selection(X_tsfresh, y, ....)

    # we can easily construct the corresponding settings object
    kind_to_fc_parameters = from_columns(X_tsfresh_filtered)

This constructs the kind_to_fc_parameters dictionary that corresponds to the features and parameters (!) of the tsfresh features that were kept by the some_feature_selection feature selection algorithm.
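The rules described above (one feature per parameter combination, and per-kind settings taking precedence over the defaults) can be sketched in plain Python. This is an illustration of the documented behaviour only, not tsfresh's internal code; the helpers count_features and settings_for_kind and the "humidity" kind are hypothetical names introduced for the example.

```python
def count_features(fc_parameters):
    """Count the features a settings dict would produce per time series:
    one for a calculator mapped to None, one per parameter dict otherwise."""
    return sum(
        1 if params is None else len(params)
        for params in fc_parameters.values()
    )


def settings_for_kind(kind, default_fc_parameters, kind_to_fc_parameters=None):
    """Pick the settings that apply to one kind of time series:
    an entry in kind_to_fc_parameters wins, otherwise the default is used."""
    if kind_to_fc_parameters and kind in kind_to_fc_parameters:
        return kind_to_fc_parameters[kind]
    return default_fc_parameters


fc_parameters = {
    "length": None,
    "large_standard_deviation": [{"r": 0.05}, {"r": 0.1}],
}
n_before = count_features(fc_parameters)  # length + two large_standard_deviation variants

del fc_parameters["length"]
n_after = count_features(fc_parameters)   # only the two large_standard_deviation variants

kind_to_fc_parameters = {
    "temperature": {"mean": None},
    "pressure": {"maximum": None, "minimum": None},
}
# "temperature" has its own entry, so the per-kind settings apply.
temperature_settings = settings_for_kind("temperature", fc_parameters, kind_to_fc_parameters)
# "humidity" (hypothetical kind) has no entry, so it falls back to the defaults.
humidity_settings = settings_for_kind("humidity", fc_parameters, kind_to_fc_parameters)
```

Counting features this way before running an extraction gives a rough sense of how large the resulting feature matrix will be for each kind.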

visualizing-decision-trees-random-forest-with-python

How to Visualize Decision Trees using Matplotlib

How to Visualize Decision Trees using Graphviz (what is Graphviz, how to install it on Mac and Windows, and how to use it to visualize decision trees)

How to Visualize Individual Decision Trees from Bagged Trees or Random Forests
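As a rough sketch of the last topic, the snippet below pulls one tree out of a fitted random forest and exports it as Graphviz DOT source using scikit-learn. It is not taken from the repository; the iris dataset, the forest size and the depth limit are assumptions chosen to keep the example small.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_graphviz

iris = load_iris()

# Small forest with shallow trees so the exported graph stays readable.
forest = RandomForestClassifier(n_estimators=5, max_depth=3, random_state=0)
forest.fit(iris.data, iris.target)

# A fitted forest keeps its individual decision trees in estimators_.
first_tree = forest.estimators_[0]

# With out_file=None, export_graphviz returns the DOT source as a string.
dot_source = export_graphviz(
    first_tree,
    out_file=None,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    filled=True,
)
```

The resulting string can be written to a .dot file and rendered with the Graphviz dot command. For a Matplotlib-only route, sklearn.tree.plot_tree accepts the same fitted tree.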

yolo-v3-

YOLOv3 network implemented from scratch in PyTorch
