azure-machinelearning-datascience's Introduction

Azure-MachineLearning-DataScience

NOTE: This content is no longer maintained. Visit the Azure Machine Learning Notebook project for sample Jupyter notebooks for ML and deep learning with Azure Machine Learning.

This repository contains walkthroughs, templates, and documentation related to Machine Learning & Data Science services and platforms on Azure. Services and platforms include Data Science Virtual Machine, Azure ML, HDInsight, Microsoft R Server, SQL Server, Azure Data Lake, etc. It also hosts materials related to the Team Data Science Process (TDSP, https://aka.ms/tdsp).

There are also materials from tutorials we have delivered at various conferences including KDD, Strata etc., using the above services and platforms.

For walkthroughs and templates, the primary documentation is on Microsoft documentation sites, with links back to this GitHub repository for the templates, code, etc.

NOTE:

Any screenshots of RStudio are from the Open Source Edition.

azure-machinelearning-datascience's People

Contributors

brettcannon, coromt, crwilcox, deguhath, girishnathan, gopitk, hangzh-msft, inchiosa, jeannt, mezmicrosoft, michhar, msolhab, paulshealy1, pechyony, rloutlaw, sfahad46, sunliangms, tigerfisher, vapaunic, wguo123, xue001

azure-machinelearning-datascience's Issues

Problem in installing sparklyr on R server in HDinsight - Azure

Good morning,

I am currently using R server on HDinsight in Azure.

I am asking for help because I cannot install sparklyr. When I execute devtools::install_github("rstudio/sparklyr"), I get the following error message:

ERROR: dependencies ‘tibble’, ‘rprojroot’ are not available for package ‘sparklyr’

  • removing ‘/home/etignone/R/x86_64-pc-linux-gnu-library/3.2/sparklyr’

Unfortunately, I cannot install either ‘tibble’ or ‘rprojroot’. Do you have any suggestions?

Thank you very much,

Edoardo Tignone

Database query as input does not seem to work on non-default endpoints as Request-Response?

Hi. I'm not sure if this is the correct repository to post to, but it looked the closest.

I'm experiencing what appears to be a bug with the endpoints exposed by Azure ML. I've created a very simple experiment that takes a db query as input to an "Import Data"-Module:
[Screenshot: model]

When deployed as a web service (classic), it works with both Request-Response and batch calls. However, if I create a new endpoint for the web service, Request-Response no longer works, and I get the following error. Note that batch requests do work on this new endpoint.

[Screenshot: capture]

I'm not entirely sure if there's something I'm missing or if there's just a bug here, but it seems to be the latter. Any input would be greatly appreciated.

(Corresponding SO thread here: http://stackoverflow.com/questions/41595002/azure-machine-learning-endpoint-sql-access-fails-works-in-experiement)

JupyterHub extension does not deploy

Attempting to deploy the DSVM for Linux with the JupyterHub extension at https://github.com/Azure/Azure-MachineLearning-DataScience/tree/master/Data-Science-Virtual-Machine/Linux/extensions/JupyterHub does not succeed.

Clicking on the "Deploy to Azure" link and filling out the information on the first page results in the following error:

The template deployment 'a001b153-1426-4670-8d9c-3857dfc160f8' is not valid according to the validation procedure. The tracking id is 'cc596c6e-8cdc-4adb-8939-cdd7801b6ab2'. See inner errors for details. Please see https://aka.ms/arm-deploy for usage details.

ByodService doesn't compile cleanly when opened in VS 2015 Enterprise

The build fails with the errors listed below. This is only a summary, since there are many related errors.

Updating the Nuget reference to Microsoft.AspNet.WebApi from the installed version of 5.2.2 to 5.2.3 solves the issue.

The following packages are updated, all from 5.2.2 to 5.2.3:
Microsoft.AspNet.WebApi
Microsoft.AspNet.WebApi.Client
Microsoft.AspNet.WebApi.Core
Microsoft.AspNet.WebApi.WebHost

1>------ Rebuild All started: Project: ByodService, Configuration: Debug Any CPU ------
1>C:\Program Files (x86)\MSBuild\14.0\bin\Microsoft.Common.CurrentVersion.targets(1819,5): warning MSB3245: Could not resolve this reference. Could not locate the assembly "System.Net.Http.Formatting". Check to make sure the assembly exists on disk. If this reference is required by your code, you may get compilation errors.
1>C:\Program Files (x86)\MSBuild\14.0\bin\Microsoft.Common.CurrentVersion.targets(1819,5): warning MSB3245: Could not resolve this reference. Could not locate the assembly "System.Web.Http". Check to make sure the assembly exists on disk. If this reference is required by your code, you may get compilation errors.
1>C:\Program Files (x86)\MSBuild\14.0\bin\Microsoft.Common.CurrentVersion.targets(1819,5): warning MSB3245: Could not resolve this reference. Could not locate the assembly "System.Web.Http.WebHost". Check to make sure the assembly exists on disk. If this reference is required by your code, you may get compilation errors.
1>E:\git\Azure-MachineLearning-DataScience\Apps\ByodService\App_Start\WebApiConfig.cs(6,23,6,30): error CS0234: The type or namespace name 'Hosting' does not exist in the namespace 'System.Web.Http' (are you missing an assembly reference?)
1>E:\git\Azure-MachineLearning-DataScience\Apps\ByodService\App_Start\WebApiConfig.cs(7,23,7,30): error CS0234: The type or namespace name 'WebHost' does not exist in the namespace 'System.Web.Http' (are you missing an assembly reference?)
1>E:\git\Azure-MachineLearning-DataScience\Apps\ByodService\AuthorizeApiKeyAttribute.cs(9,23,9,34): error CS0234: The type or namespace name 'Controllers' does not exist in the namespace 'System.Web.Http' (are you missing an assembly reference?)
1>E:\git\Azure-MachineLearning-DataScience\Apps\ByodService\AuthorizeApiKeyAttribute.cs(10,23,10,30): error CS0234: The type or namespace name 'Filters' does not exist in the namespace 'System.Web.Http' (are you missing an assembly reference?)
1>E:\git\Azure-MachineLearning-DataScience\Apps\ByodService\AzureMLClient\ClientExtensions.cs(6,23,6,33): error CS0234: The type or namespace name 'Formatting' does not exist in the namespace 'System.Net.Http' (are you missing an assembly reference?)
1>E:\git\Azure-MachineLearning-DataScience\Apps\ByodService\OdataRestApiBuilder.cs(10,23,10,33): error CS0234: The type or namespace name 'Formatting' does not exist in the namespace 'System.Net.Http' (are you missing an assembly reference?)
1>E:\git\Azure-MachineLearning-DataScience\Apps\ByodService\OdataRestApiBuilder.cs(13,23,13,34): error CS0234: The type or namespace name 'Controllers' does not exist in the namespace 'System.Web.Http' (are you missing an assembly reference?)
1>E:\git\Azure-MachineLearning-DataScience\Apps\ByodService\App_Start\WebApiConfig.cs(46,43,46,70): error CS0246: The type or namespace name 'WebHostBufferPolicySelector' could not be found (are you missing a using directive or an assembly reference?)
1>E:\git\Azure-MachineLearning-DataScience\Apps\ByodService\App_Start\WebApiConfig.cs(14,37,14,54): error CS0246: The type or namespace name 'HttpConfiguration' could not be found (are you missing a using directive or an assembly reference?)
1>E:\git\Azure-MachineLearning-DataScience\Apps\ByodService\App_Start\WebApiConfig.cs(48,30,48,52): error CS0115: 'NoBufferPolicySelector.UseBufferedInputStream(object)': no suitable method found to override

Data Science VM not available in Azure CSP subscriptions

We provide Azure to customers as a Cloud Solution Partner (CSP). They are very interested in both the Windows and Linux Data Science VMs. However, it seems to be impossible to deploy the images to CSP subscriptions.
This is a real pain point for us, and I cannot think of a reason why Microsoft would not allow this.
When I try to deploy from the Azure Marketplace to a CSP subscription, it says "Not allowed".
Since all our data sources are in this subscription, it would make no sense for us to create the DSVM in another one.

Looking forward to any help or suggestions here.
Wlodek

azureml module is unstable

When I import

from azureml.train.automl import AutoMLRunx

it sometimes works and sometimes fails with ImportError: cannot import name 'AutoMLRunx'.

The same happens when reading a file from blob storage: sometimes the code runs successfully, and sometimes the blob read itself fails with the error "pipeline is broken".

All the modules are behaving inconsistently.

Plotly python package is broken in current conda version in Azure ML

Hi guys,

Firstly, I have to say that your Azure ML platform is just awesome and your decision to include Jupyter support is quite useful, thanks a lot for that!

Next, I wish to highlight a small issue that, in my humble opinion, is related to the Anaconda version that runs in the Docker instances. When the plotly package is imported in a Jupyter notebook, the following error occurs:

import plotly
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-3-c27a4132ad2e> in <module>()
  ----> 1 import plotly

/home/nbuser/anaconda3_23/lib/python3.4/site-packages/plotly/__init__.py in <module>()
     29 from __future__ import absolute_import
     30 
---> 31 from plotly import (plotly, dashboard_objs, graph_objs, grid_objs, tools,
     32                     utils, session, offline, colors)
     33 from plotly.version import __version__

/home/nbuser/anaconda3_23/lib/python3.4/site-packages/plotly/plotly/__init__.py in <module>()
      8 
      9 """
---> 10 from . plotly import (
     11     sign_in,
     12     update_plot_options,

/home/nbuser/anaconda3_23/lib/python3.4/site-packages/plotly/plotly/plotly.py in <module>()
     27 from requests.compat import json as _json
     28 
---> 29 from plotly import exceptions, files, session, tools, utils
     30 from plotly.api import v1, v2
     31 from plotly.plotly import chunked_requests

/home/nbuser/anaconda3_23/lib/python3.4/site-packages/plotly/tools.py in <module>()
     58 
     59 ipython_core_display = optional_imports.get_module('IPython.core.display')
---> 60 matplotlylib = optional_imports.get_module('plotly.matplotlylib')
     61 sage_salvus = optional_imports.get_module('sage_salvus')
     62 

/home/nbuser/anaconda3_23/lib/python3.4/site-packages/plotly/optional_imports.py in get_module(name)
     21     if name not in _not_importable:
     22         try:
---> 23             return import_module(name)
     24         except ImportError:
     25             _not_importable.add(name)

/home/nbuser/anaconda3_23/lib/python3.4/importlib/__init__.py in import_module(name, package)
    107                 break
    108             level += 1
--> 109     return _bootstrap._gcd_import(name[level:], package, level)
    110 
    111 

/home/nbuser/anaconda3_23/lib/python3.4/site-packages/plotly/matplotlylib/__init__.py in <module>()
     12 from __future__ import absolute_import
     13 
---> 14 from plotly.matplotlylib.renderer import PlotlyRenderer
     15 from plotly.matplotlylib.mplexporter import Exporter

/home/nbuser/anaconda3_23/lib/python3.4/site-packages/plotly/matplotlylib/renderer.py in <module>()
     11 import warnings
     12 
---> 13 import plotly.graph_objs as go
     14 from plotly.matplotlylib.mplexporter import Renderer
     15 from plotly.matplotlylib import mpltools

/home/nbuser/anaconda3_23/lib/python3.4/site-packages/plotly/graph_objs/__init__.py in <module>()
     12 from __future__ import absolute_import
     13 
---> 14 from plotly.graph_objs.graph_objs import *  # this is protected with __all__

/home/nbuser/anaconda3_23/lib/python3.4/site-packages/plotly/graph_objs/graph_objs.py in <module>()
     32 import six
     33 
---> 34 from plotly import exceptions, graph_reference
     35 from plotly.graph_objs import graph_objs_tools
     36 

/home/nbuser/anaconda3_23/lib/python3.4/site-packages/plotly/graph_reference.py in <module>()
    230 
    231 
--> 232 @utils.memoize()
    233 def _get_valid_attributes(object_name, parent_object_names):
    234     attributes = get_attributes_dicts(object_name, parent_object_names)

/home/nbuser/anaconda3_23/lib/python3.4/site-packages/plotly/utils.py in memoize(maxsize)
    490         return result
    491 
--> 492     return decorator(_memoize)

/home/nbuser/anaconda3_23/lib/python3.4/site-packages/decorator.py in decorator(caller, func)
    226             callerfunc = caller
    227             doc = caller.__doc__
--> 228             fun = getfullargspec(callerfunc).args[0]  # first arg
    229         else:  # assume caller is an object with a __call__ method
    230             name = caller.__class__.__name__.lower()

IndexError: list index out of range
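The last frame of the traceback takes the first named positional parameter of the function it was given. The sketch below uses only the standard library to show how that expression raises the same IndexError when the function declares only variadic parameters. `_memoize_like` is a hypothetical stand-in, not plotly's code, and whether this is the actual cause in this environment is an assumption based on the traceback alone.

```python
import inspect


def _memoize_like(*all_args, **kwargs):
    """Hypothetical stand-in for a wrapper that takes only *args/**kwargs."""
    return None


def named_caller(func, *args, **kwargs):
    """A caller with a named first parameter, for comparison."""
    return func(*args, **kwargs)


def first_positional_name(func):
    """Mirror the expression shown in the last traceback frame."""
    return inspect.getfullargspec(func).args[0]


spec = inspect.getfullargspec(_memoize_like)
# spec.args holds only named positional parameters, so it is empty here;
# the variadic names are reported separately in spec.varargs and spec.varkw.
try:
    first_positional_name(_memoize_like)
    outcome = "no error"
except IndexError as exc:
    outcome = "IndexError: " + str(exc)

print(outcome)
print(first_positional_name(named_caller))
```

If this reading is right, the failure depends on which version of the decorator package is installed alongside plotly, so comparing the two installed versions would be a reasonable first check.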

The Criteo Dataset is not Accessible

Trying to run

CREATE DATABASE IF NOT EXISTS criteo;
DROP TABLE IF EXISTS criteo.criteo_count;
CREATE TABLE criteo.criteo_count (
col1 string,col2 double,col3 double,col4 double,col5 double,col6 double,col7 double,col8 double,col9 double,col10 double,col11 double,col12 double,col13 double,col14 double,col15 string,col16 string,col17 string,col18 string,col19 string,col20 string,col21 string,col22 string,col23 string,col24 string,col25 string,col26 string,col27 string,col28 string,col29 string,col30 string,col31 string,col32 string,col33 string,col34 string,col35 string,col36 string,col37 string,col38 string,col39 string,col40 string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION 'wasb://criteo@azuremlsampleexperiments.blob.core.windows.net/raw/count';

and getting
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.fs.azure.AzureException org.apache.hadoop.fs.azure.AzureException: Container criteo in account azuremlsampleexperiments.blob.core.windows.net not found, and we can't create it using anoynomous credentials, and no credentials found for them in the configuration.)
Am I missing something in the tutorial?
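The error message names a container (criteo) and a storage account, which matches the usual `wasb://<container>@<account>.blob.core.windows.net/<path>` layout. As a minimal sketch, the helper below splits such a URI so the container and account can be checked before running the Hive statement. `parse_wasb_uri` is a hypothetical helper, and the example URI is assembled from the names in the error message, so treat it as an assumption.

```python
from urllib.parse import urlsplit


def parse_wasb_uri(uri):
    """Split a wasb:// URI into (container, account_host, blob_path)."""
    parts = urlsplit(uri)
    if parts.scheme not in ("wasb", "wasbs"):
        raise ValueError("not a wasb URI: " + uri)
    # The part before '@' is the container; it is None when omitted.
    return parts.username, parts.hostname, parts.path.lstrip("/")


# Assumed URI, built from the container and account named in the error message.
container, account, path = parse_wasb_uri(
    "wasb://criteo@azuremlsampleexperiments.blob.core.windows.net/raw/count"
)
print(container, account, path)

# A URI without the '<container>@' part yields no container at all.
missing_container = parse_wasb_uri(
    "wasb://azuremlsampleexperiments.blob.core.windows.net/raw/count"
)[0]
print(missing_container)
```

Parsing only confirms which container and account the statement targets. Whether that container still exists and allows anonymous reads is a separate question that this sketch cannot answer.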

Both JupyterLab and R-studio server have authentication errors

Hi,
I set up a DSVM (Ubuntu) and logged into JupyterLab. I got an 'Insecure Connection' message in my browser with the following text:

my-dsvm.westeurope.cloudapp.azure.com:8000 uses an invalid security certificate. The certificate is not trusted because it is self-signed. The certificate is not valid for the name my-dsvm-r.westeurope.cloudapp.azure.com. Error code: SEC_ERROR_UNKNOWN_ISSUER

On RStudio Server, I get a message saying 'The connection is not secure. Login entered here may be compromised' and a link to this page:
https://support.mozilla.org/en-US/kb/insecure-password-warning-firefox?as=u&utm_source=inproduct

Is there an option to use RStudio Server with https instead of http?

I'm using Firefox but this occurs on Chrome as well.

Thanks! Keep up the good work.
Omri

Web Service Input and Output?

Do you have a sample curl request for calling the Azure API? I know it would be a POST request, but I don't know how to send the input parameters.
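As a rough sketch of what such a request might look like, the code below builds (but does not send) a POST request in the shape that classic Azure ML request-response services have typically used: a JSON body with `Inputs` and `GlobalParameters`, and a bearer token in the `Authorization` header. The URL, API key, input name `input1`, and column names are all placeholders and assumptions. The API help page of the deployed web service is the authoritative source for the real values and the exact body schema.

```python
import json
import urllib.request

# Placeholders (assumptions): copy the real values from the service's API help page.
URL = (
    "https://REGION.services.azureml.net/workspaces/WORKSPACE_ID"
    "/services/SERVICE_ID/execute?api-version=2.0&details=true"
)
API_KEY = "YOUR_API_KEY"


def build_request(url, api_key, column_names, rows):
    """Build a POST request carrying the input rows as a JSON body."""
    body = {
        # "input1" is an assumed input name; use the one your service defines.
        "Inputs": {"input1": {"ColumnNames": column_names, "Values": rows}},
        "GlobalParameters": {},
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


req = build_request(URL, API_KEY, ["col1", "col2"], [["1", "2"]])
print(req.get_method())
print(req.data.decode("utf-8"))
# To actually call the service: urllib.request.urlopen(req).read()
```

The same URL, headers, and JSON body carry over to curl by passing them with `-X POST`, `-H`, and `-d`.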

H2O performance comparison has major flaws

Hi, I saw some of the benchmarks blogged about here from a recent Strata presentation slide deck.

There are major flaws in your benchmarking of H2O:

The point of using H2O's Sparkling Water (and rsparkling if you are using R) is to interact with data already in the Spark cluster. When you have data on disk, then you should be using the h2o.importFile() function (to do a parallel read from disk into the H2O cluster) and the h2o package for modeling. There is no need to use rsparkling at all.

Loading from disk into Spark, then from Spark into H2O, is an unnecessary step, and doing so misrepresents the computational efficiency of H2O compared to the other tools in this benchmark. In the interest of honest and accurate benchmarking practices, it would be great if you could revise the benchmark to reflect this. If you have any questions on how to do this, please let me know.

All you need to do is load the data from disk using h2o.importFile() and then execute these rows of the benchmark. You can also compute performance directly in H2O using h2o.performance() rather than generating predicted values using h2o.predict(). There is nothing wrong with generating the predictions and calculating performance metrics manually; it is just faster to use H2O's h2o.performance() function. To write the predictions back to disk most efficiently, use the h2o.exportFile() function.

Where is source code for Microsoft implementation for Machine Learning Algorithms?

Where is the source code for the standard Machine Learning algorithms, like what you can use in Java with WEKA? I am a .NET developer and a graduate student. I would like to use Microsoft tools for both software development and algorithm research in Machine Learning, but I have not seen where, if anywhere, the source code is for your standard Machine Learning algorithms (C4.5, I3, Forest, SVM, K-Means, Knn, etc.). I think this is a completely valid question, especially in light of open source software releases by other major companies such as Google.

Jupyter Notebook / Hub non-functional upon deployment

After deploying a DSVM for Linux via the Azure portal, I remoted into the machine via the x2go client. On the desktop, I double-clicked the Jupyter icon, which took me to https://localhost:8000/hub/login. The login screen appeared, but after logging in I was greeted by a 500 Internal Server Error.

The same issue occurs when trying to access the VM remotely (via Chrome on my local desktop).
