Giter VIP home page Giter VIP logo

bigml-swift's Introduction

BigML Swift Bindings

In this repository you'll find an open source Swift library that gives you a simple binding to interact with BigML <https://bigml.com>.

BigML makes machine learning easy by taking care of the details required to add data-driven decisions and predictive power to your company. Unlike other machine learning services, BigML creates beautiful predictive models that can be easily understood and interacted with.

The BigML @(language) bindings allow you to interact with BigML.io <https://bigml.io/>, the API for BigML. You can use it to easily create, retrieve, list, update, and delete BigML resources (i.e., sources, datasets, models and, predictions, and many more <https://bigml.com/developers/>). Additionally, they also provide a few ML algorithms that can be run locally, i.e. offline, such as to make a prediction from a model, calculate an anomaly score, etc.

This module is licensed under the Apache License, Version 2.0.

Support

Please report problems and bugs to our BigML.io issue tracker.

Discussions about the different bindings take place in the general BigML mailing list.

Requirements

bigml-swift is compatible with Swift 2.0 and requires Xcode 7.0+.

Importing the module

To use BigML Swift SDK you can drag the bigml-swift folder on to your Xcode project.

Then, when you need to call a method from bigml-swift inside a file of yours, put the following directive on top of it:

@import bigmlSwift

Authentication

All the requests to BigML.io must be authenticated using your username and API key and are always transmitted over HTTPS.

Knowing that, connecting to BigML is a breeze. You just need to execute:

let api = BMLConnector(username:"your-username-here",
    apiKey:"your-api-key-here")

Quick Start

Imagine that you want to use this csv file containing the Iris flower dataset_ to predict the species of a flower whose sepal length is 5 and whose sepal width is 2.5. A preview of the dataset is shown below. It has 4 numeric fields: sepal length, sepal width, petal length, petal width and a categorical field: species. By default, BigML considers the last field in the dataset as the objective field (i.e., the field that you want to generate predictions for).

sepal length,sepal width,petal length,petal width,species
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
...
5.8,2.7,3.9,1.2,Iris-versicolor
6.0,2.7,5.1,1.6,Iris-versicolor
5.4,3.0,4.5,1.5,Iris-versicolor
...
6.8,3.0,5.5,2.1,Iris-virginica
5.7,2.5,5.0,2.0,Iris-virginica
5.8,2.8,5.1,2.4,Iris-virginica

You can easily generate a prediction following these steps::

let api = BMLConnector(username:"your-username-here",
    apiKey:"your-api-key-here")

let source = BMLMinimalResource(name:"My Data source",
    type:BMLResourceType.File,
    uuid:"./tests/data/iris.csv")

// All requests are asynchronous and have a completion block
api!.createResource(BMLResourceType.Source,
     name: "testCreateDatasource",
     from: source) { (resource, error) -> Void in

         let myDatasource = resource!
         api!.createResource(BMLResourceType.Dataset,
           name: "My first dataset",
           from: myDatasource) { (resource, error) -> Void in

              let myModel = resource!
              api!.createResource(BMLResourceType.Model,
                name: "My first model",
                from: myModel) { (resource, error) -> Void in

                   let pModel = Model(jsonModel: resource!.jsonDefinition)
                   let prediction = pModel.predict([
                        "sepal width": 3.15,
                        "petal length": 4.07,
                        "petal width": 1.51],
                     options: ["byName" : true])

                   print("Final Prediction: \(prediction)")
                }
           }
     }
}

Dataset

If you want to get some basic statistics for each field you can retrieve the fields from the dataset as follows to get a dictionary keyed by field id:

api.getResource(BMLResourceType.Dataset,
   uuid: datasetUuid) { (resource, error) -> Void in
      print_r(resource!.jsonDefinition["fields"])
}

The field filtering options are also available using a query string expression, for instance:

api.listResources(BMLResourceType.Dataset,
   filters: ["limit" : "5"]) {  (resource, error) -> Void in
      //-- process resource
}

limits the number of fields that will be included in dataset to 20.

Model

One of the greatest things about BigML is that the models that it generates for you are fully white-boxed. To get the explicit tree-like predictive model for the example above:

api.getResource(BMLResourceType.Model,
   uuid: modelUuid) { (resource, error) -> Void in
      //-- process resource
}

Again, filtering options are also available using a query string expression, for instance:

api.listResources(BMLResourceType.Model,
   filters: ["limit" : "5"]) {  (resource, error) -> Void in
      //-- process resource
}

limits the number of fields that will be included in model to 5.

Projects

A special kind of resource is project. Projects are repositories for resources, intended to fulfill organizational purposes. Each project can contain any other kind of resource, but the project that a certain resource belongs to is determined by the one used in the source they are generated from. Thus, when a source is created and assigned a certain project_id, the rest of resources generated from this source will remain in this project.

The REST calls to manage the project resemble the ones used to manage the rest of resources. When you create a project:

api!.createResource(BMLResourceType.Project,
     name: "my first project") { (resource, error) -> Void in
        //-- process resource
}

the resulting resource is similar to the rest of resources, although shorter::

{'code': 201,
 'resource': 'project/54a1bd0958a27e3c4c0002f0',
 'location': 'http://bigml.io/andromeda/project/54a1bd0958a27e3c4c0002f0',
 'object': {'category': 0,
            'updated': '2014-12-29T20:43:53.060045',
            'resource': 'project/54a1bd0958a27e3c4c0002f0',
            'name': 'my first project',
            'created': '2014-12-29T20:43:53.060013',
            'tags': [],
            'private': True,
            'dev': None,
            'description': ''},
 'error': None}

and you can use its project id to get, update or delete it:

Important: Deleting a non-empty project will also delete all resources assigned to it, so please be extra-careful when doing it.

Creating sources

To create a source from a local data file, you can use the create_source method. The only required parameter is the path to the data file (or file-like object). You can use a second optional parameter to specify any of the options for source creation described in the BigML API documentation.

Here’s a sample invocation::

let source = BMLMinimalResource(name:"My Data source",
    type:BMLResourceType.File,
    uuid:"./tests/data/iris.csv")

api!.createResource(BMLResourceType.Source,
     name: "testCreateDatasource",
     options: ["source_parser" : ["header" : false, "missing_tokens" : ["x"]]]
     from: source) { (resource, error) -> Void in
        //-- process resource
     }

or you may want to create a source from a file in a remote location::

api!.createResource(BMLResourceType.Source,
     name: "testCreateDatasource",
     options: ["remote" : "s3://bigml-public/csv/iris.csv"]) {
     (resource, error) -> Void in
        //-- process resource
     }

Creating datasets

Once you have created a source, you can create a dataset. The only required argument to create a dataset is a source id. You can add any of the optional arguments accepted by BigML and documented in the Datasets section of the Developer’s documentation <https://bigml.com/developers/datasets>.

For example, to create a dataset named “my dataset” with the first 1024 bytes of a source, you can execute the following call:

api!.createResource(BMLResourceType.Dataset,
     name: "testCreateDataset",
     options: ["size" : 1024]
     from: source) { (resource, error) -> Void in
        //-- process resource
     }

You can also extract samples from an existing dataset and generate a new one with them with the following call:

api!.createResource(BMLResourceType.Dataset,
     name: "testCloneDataset",
     options: ["sample_rate" : 0.8],
     from: originDataset) { (resource, error) -> Void in
        //-- process resource
     }

Creating models

Once you have created a dataset you can create a model from it. If you don’t select one, the model will use the last field of the dataset as objective field. The only required argument to create a model is a dataset id. You can also include in the request all the additional arguments accepted by BigML and documented in the Models section of the Developer’s documentation.

For example, to create a model only including the first two fields and the first 10 instances in the dataset, you can use the following invocation::

api!.createResource(BMLResourceType.Model,
     name: "testCreateModel",
     options: ["name" : "my model",
         "input_fields" : ["000000", "000001"],
         "range" : [1, 10]],
     from: dataset) { (resource, error) -> Void in
              //-- process resource
}

the model is scheduled for creation.

Creating clusters

If your dataset has no fields showing the objective information to predict for the training data, you can still build a cluster that will group similar data around some automatically chosen points (centroids). Again, the only required argument to create a cluster is the dataset id. You can also include in the request all the additional arguments accepted by BigML and documented in the Clusters section of the Developer’s documentation <https://bigml.com/developers/clusters>.

Let’s create a cluster from a given dataset:

api!.createResource(BMLResourceType.Cluster,
     name: "testCreateCluster",
     options: ["k" : 5],
     from: dataset) { (resource, error) -> Void in
              //-- process resource
}

that will create a cluster with 5 centroids.

Creating anomaly detectors

If your problem is finding the anomalous data in your dataset, you can build an anomaly detector, that will use iforest to single out the anomalous records. Again, the only required argument to create an anomaly detector is the dataset id. You can also include in the request all the additional arguments accepted by BigML and documented in the Anomaly detectors section of the Developer’s documentation <https://bigml.com/developers/anomalies>_.

Let’s create an anomaly detector from a given dataset:

api!.createResource(BMLResourceType.Anomaly,
     name: "testCreateAnomaly",
     from: dataset) { (resource, error) -> Void in
              //-- process resource
}

Creating associations

To find relations between the field values you can create an association discovery resource. The only required argument to create an association is a dataset id. You can also include in the request all the additional arguments accepted by BigML and documented in the Association section of the Developer's documentation_.

For example, to create an association only including the first two fields and the first 10 instances in the dataset, you can use the following invocation:

api!.createResource(BMLResourceType.Association,
     name: "testCreateAssociation",
     options: ["input_fields" : ["000000", "000001"],
               "range" : [1, 10]],
     from: dataset) { (resource, error) -> Void in
              //-- process resource
}

Associations can also be created from lists of datasets. Just use the list of ids as the first argument in the api call:

api!.createResource(BMLResourceType.Association,
     name: "testCreateAssociation",
     options: ["input_fields" : ["000000", "000001"],
               "range" : [1, 10]],
     from: dataset) { (resource, error) -> Void in
              //-- process resource
}

Creating predictions

You can now use the model resource identifier together with some input parameters to ask for predictions, using the create_prediction method. You can also give the prediction a name:

api!.createResource(BMLResourceType.Prediction,
     name: "testCreatePrediction",
     options: ["input_data" : ["sepal length" : 5,
               "sepal width" : 2.5]],
     from: model) { (resource, error) -> Void in
              //-- process resource
}

Creating centroids

To obtain the centroid associated to new input data, you can now use the create_centroid method. Give the method a cluster identifier and the input data to obtain the centroid. You can also give the centroid predicition a name:

api!.createResource(BMLResourceType.Centroid,
     name: "testCreatePrediction",
     options: [ "input_data" : ["pregnancies" : 0,
                                            "plasma glucose" : 118,
                                            "blood pressure" : 84,
                                            "triceps skin thickness" : 47,
                                            "insulin" : 230,
                                            "bmi" : 45.8,
                                            "diabetes pedigree" : 0.551,
                                            "age" : 31,
                                            "diabetes" : "true"]],
     from: cluster) { (resource, error) -> Void in
              //-- process resource
}

bigml-swift's People

Contributors

sdesimone avatar

Stargazers

 avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.