
synapse's Introduction

page_type: sample
languages: csharp
products: dotnet
description: Samples for Azure Synapse Analytics
urlFragment: update-this-to-unique-url-stub

Samples for Azure Synapse Analytics

Resources

Contents

Scenario-based Samples

Tweet Analysis

Shows .NET for Apache Spark and the shared metadata experience between Spark-created tables and SQL.

ADF to Synapse Migration Tool

The ADF to Synapse Migration Tool (currently PowerShell scripts) enables you to migrate Azure Data Factory pipelines, datasets, linked services, integration runtimes, and triggers to a Synapse Analytics workspace.

Contributing

This project welcomes contributions and suggestions. See the Contributor's guide.

synapse's People

Contributors

bamurtaugh, charithcaldera, chugugrace, coolswatish, fonsecasergio, hristinajilova, jocapc, jovanpop-msft, kaiyuezhou, krutikasheth1029, laserljy, lijing29, matt1883, microsoftopensource, mikerys, mlevin19, nelgson, niharikadutta, nirav2, rapoth, roalexan, rodrigossz, ruixinxu, saveenr-msft, shunderpooch, silanwang, snehagunda, tomtal, yaelschuster, yifansongms


synapse's Issues

ModuleNotFoundError: No module named 'org.apache.spark.sql.SqlAnalyticsConnector'

Hi there,
I was following the notebook.

Code:


%%pyspark 

import org.apache.spark.sql.SqlAnalyticsConnector._
import com.microsoft.spark.sqlanalytics.utils.Constants
spark_read = spark.read.sqlanalytics("Built-in.dbo.LogisticsPostalAddress")
spark_read.show(5, truncate = false)

Output:
ModuleNotFoundError: No module named 'org.apache.spark.sql.SqlAnalyticsConnector' (screenshot)

Note: I am using Azure Synapse Notebooks. Shouldn't this module already be installed in the notebook, and if not, how can I install it?
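
For context, import org.apache.spark.sql.SqlAnalyticsConnector._ is Scala syntax; inside a %%pyspark cell Python parses it as an import of a Python module named org.apache..., which is why the ModuleNotFoundError appears. A minimal sketch of the Python-side alternative, assuming a Spark 3 pool where the pre-installed dedicated SQL pool connector exposes synapsesql() to PySpark (the table name is reused from the snippet above):

%%pyspark
# Sketch, assuming a Synapse Spark 3 pool: the pre-installed connector patches
# DataFrameReader with synapsesql(), so no Scala-style import is needed.
df = spark.read.synapsesql("Built-in.dbo.LogisticsPostalAddress")
df.show(5, truncate=False)  # Python False, not Scala's false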

Spark Job in Synapse cannot be viewed in monitoring portal - Error Message is Fetching Failed

It is rare and intermittent, but there are times when the monitoring portal in Azure Synapse misbehaves and will not show the details of a completed Spark job. Instead, it displays an error message that says "Fetching Failed" (screenshot).


I have not yet found a pattern or explanation. I reported the problem to CSS support but they are not yet familiar with the error. I suspect it is a timeout on an internal resource, like a spark history server or something like that.

I realize that some parts of the Synapse platform are proprietary, but it borrows significantly from OSS Spark. Does anyone have an idea what might take so long when retrieving the UI for a completed Livy batch? Is it Azure storage accounts performing badly, or a Spark history server? Is there any reason why they wouldn't wait indefinitely (e.g. ten minutes) for a response? Whenever this happens, the UI seems to fail after a short period of time (only ~60 seconds or so). I haven't found any other patterns. As you can see above, the error message is nothing more than a small tooltip shown in the upper right of the screen; when I shared it with CSS they weren't able to provide any additional guidance or explanation. So I'm hoping there are Synapse users on Stack Overflow who have encountered this.

Side note: when things are working properly, the Spark job is presented with the related jobs/stages/tasks/logs (screenshot).

Write to existing Synapse table

From Synapse's Apache Spark pool, is it possible to write to an existing internal/external table? All the examples are about creating a new table and loading data into it. Even though the Synapse pipeline runs on Spark, how does it manage to select/update? To get the same set of features through PySpark/Scala, do we need to switch to Databricks instead of the Apache Spark pool that comes with Synapse?
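
For what it's worth, appending to an existing table is possible from a Synapse Spark pool with the standard Spark writer; a minimal sketch, with hypothetical table and pool names (the synapsesql path additionally assumes the Spark 3 dedicated SQL pool connector, which supports Append and Overwrite save modes):

# Assuming an existing DataFrame, e.g.:
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# Append to an existing Spark metastore (internal) table.
df.write.mode("append").saveAsTable("mydb.existing_table")

# Append to an existing dedicated SQL pool table (assumes the pre-installed
# Spark 3 connector, which exposes synapsesql() to PySpark).
df.write.mode("append").synapsesql("mypool.dbo.existing_table")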

File Format "NativeParquet" is never used in SampleDB.sql

NativeParquet is implemented as a file format, but when the view parquet.YellowTaxi is created, it refers to FORMAT='PARQUET'.

There does not seem to be a reason for it to exist, and it confuses readers who are trying to get up to speed with the platform.
Is there any benefit to creating a custom file format rather than referring to 'PARQUET'?

Secure string parameter is taking the default value `**********` when running the Fabric Notebook from pipeline

The secure string parameter takes the masked value ********** during the pipeline run, even though a proper default value was given in the pipeline parameters section.

See the parameter length in the screenshot.

Thus, the notebook also receives the same masked value.

When the value is re-entered at pipeline start, the original value is used.

The same scenario works fine in a normal Synapse notebook and pipeline.

Creating Workspace in DevOps with SPN leaves workspace inaccessible

I have created ARM template/parameter files that deploy the workspace. After deploying with an SPN in DevOps, I get the error "You need permission to access workspace" when trying to access the workspace. I am configured as the Active Directory admin on the resource and as Owner of the resource in Access Control. Firewall rules allow all IP addresses.

Deploying the exact same ARM template/parameter files through the Azure portal leaves the workspace in the correct state, and I am able to access the studio as expected.

The SPN that deploys the ARM template has been granted the Subscription Owner role.

I cannot detect any difference in what has been deployed.

Is there a known issue with deploying Synapse with DevOps / SPN?

Pipeline migration not working

When running the importADFtoSynapseTool.ps1 script, all of the resources except the pipelines get migrated into the Synapse workspace. The script reports that all four pipelines were successfully migrated, but they are nowhere to be found in the workspace. I also get the warning below, but I'm unsure whether it actually affects the migration.

Self-Hosted (Linked) Integration Runtime with the following name will be filtered and will NOT be migrated: corporatesynergiIR

ADF to Synapse Pipeline migration powershell bug

The PowerShell script for ADF to Synapse pipeline migration has a problem with garbled characters when resource definitions contain non-English multibyte characters.

https://github.com/Azure-Samples/Synapse/blob/main/Pipelines/ImportADFtoSynapse/importADFtoSynapseTool.ps1

Please make the following modifications to avoid the JSON character encoding issue.

$uri = "$destUri/$resourceType/$($_.name)?api-version=$($config.SynapseWorkspace.apiVersion)";
$jsonBody = ConvertTo-Json $_ -Depth 30
$jsonBody = [Text.Encoding]::UTF8.GetBytes($jsonBody)
$name = $_.name

Please add this line, which passes the JSON body as UTF-8 bytes so the REST call does not re-encode it with the default character set:

$jsonBody = [Text.Encoding]::UTF8.GetBytes($jsonBody)

Thanks.

This request is not authorized to perform this operation using this permission

Hi,

I'm trying to load the data into Spark DataFrames but hit the following issue (screenshot):

Operation failed: "This request is not authorized to perform this operation using this permission.", 403, HEAD

I've added my AAD account as Contributor on the ADLS Gen2 : NOK
I've added the Synapse workspace as Contributor on the ADLS Gen2 : NOK
I've added my account as Contributor on the subscription : NOK
I've added the Synapse workspace as Contributor on the subscription : NOK

Many thanks for your help!

ONNX conversion in notebook not working

The ONNX conversion in the notebook tutorial-predict-nyc-taxi-tips-onnx.ipynb is not working. I get the error "ValueError: You passed in an iterable attribute but I cannot figure out its applicable type."
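
For reference, this ValueError usually points to a version mismatch between the sklearn-to-ONNX converter and the installed onnx package (an assumption here, since the notebook may use a different converter). A minimal standalone conversion sketch using skl2onnx, with a hypothetical model and input shape:

# Hypothetical minimal sklearn -> ONNX conversion; if this raises the same
# ValueError, pin skl2onnx and onnx to mutually compatible versions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

X = np.random.rand(20, 4).astype(np.float32)
y = (X[:, 0] > 0.5).astype(int)
model = LogisticRegression().fit(X, y)

onnx_model = convert_sklearn(model, initial_types=[("input", FloatTensorType([None, 4]))])
with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())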

[Scala][Azure Synapse]: Using an MSI access token, unable to retrieve data from an Azure SQL Server database table

The linked service is configured for the SQL database through a system-assigned identity, as shown in the screenshot.

Below is the notebook code (in Scala) in Azure Synapse:

import com.microsoft.azure.synapse.tokenlibrary.TokenLibrary
import java.util.Properties

val jdbcHostname = ".sql.azuresynapse.net"
val jdbcPort = 1433
val jdbcDatabase = ""

// Create the JDBC URL without passing in the user and password parameters.
val jdbcUrl = s"jdbc:sqlserver://${jdbcHostname}:${jdbcPort};database=${jdbcDatabase}"

// Create a Properties() object to hold the parameters.
val connectionProperties = new Properties()

// Driver that can also be observed in the log when using the 'native' Synapse SQL way.
val driverClass = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
connectionProperties.setProperty("Driver", driverClass)

// Get an access token for the linked service ("samplesqllink"), which authenticates with the workspace Managed Identity
connectionProperties.setProperty("accessToken", mssparkutils.credentials.getConnectionStringOrCreds("samplesqllink"))

// Define your query
val pushdown_query = "(select top 10 ID from dbo.tblEmployees) data_alias"
val df = spark.read.jdbc(url=jdbcUrl, table=pushdown_query, properties=connectionProperties)
display(df)

Running this, I get the following error:

com.microsoft.sqlserver.jdbc.SQLServerException: Login failed for user ''. ClientConnectionId:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/

But with a user name and password it works fine.

Please help; did I miss anything?
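
For comparison, a PySpark sketch of the same pattern (server and database names are placeholders; the linked service name is reused from the Scala snippet above). One hedged observation: "Login failed for user ''" often means the workspace managed identity has not been created as a user in the target database, e.g. with CREATE USER [<workspace-name>] FROM EXTERNAL PROVIDER, which is worth checking first.

# PySpark equivalent sketch; accessToken is passed as a JDBC connection property.
from notebookutils import mssparkutils

token = mssparkutils.credentials.getConnectionStringOrCreds("samplesqllink")
df = (spark.read
      .format("jdbc")
      .option("url", "jdbc:sqlserver://<server>.sql.azuresynapse.net:1433;database=<db>")
      .option("dbtable", "(select top 10 ID from dbo.tblEmployees) data_alias")
      .option("accessToken", token)
      .load())
df.show()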

Datetime on pyspark.pandas not working correctly

ps.date_range(start='1/1/2018', periods=5, freq='M') returns the months out of order (it looks as though it is skipping a month):

DatetimeIndex(['2018-01-31', '2018-03-31', '2018-05-31', '2018-02-28', '2018-04-30'], dtype='datetime64[ns]', freq=None)
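
One workaround sketch, under the assumption that pandas-on-Spark computes the range across partitions and therefore does not guarantee pandas' ordering: build the index with plain pandas and convert it afterwards.

# Build the range with plain pandas (ordered), then convert to pandas-on-Spark.
import pandas as pd
import pyspark.pandas as ps

idx = ps.from_pandas(pd.date_range(start='1/1/2018', periods=5, freq='M'))
print(idx)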

[Synapse - Apache Spark job definition] - Python sample to copy local files to WASB

Hi, it would be nice to have sample code for copying files from local directories to WASB.

I am developing a Python script in a Spark job definition, and I would like to copy a local file already created in the temp directory to WASB.

I am trying to use mssparkutils (Maybe this is my error):

local_file = '/tmp/a.txt'
azure_file = 'wasbs://<container>@<storage_account>.blob.core.windows.net/outputs/a.txt'
from notebookutils import mssparkutils
mssparkutils.fs.cp(local_file, azure_file, True)

But, I get error:

Current content of Temp folder:
   /tmp/eea_discodata_task-9c3df798c1ff11ebaedc000d3ab6dc78.log.json
   /tmp/ca-certificates.tmp.3cqtU8
Traceback (most recent call last):
  File "datapipelineapp.py", line 179, in <module>
    main()
  File "datapipelineapp.py", line 158, in main
    mssparkutils.fs.mv(file_rs, dataset_fn, True)
  File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/notebookutils/mssparkutils/fs.py", line 12, in mv
    return fs.mv(src, dest, create_path, overwrite)
  File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/notebookutils/mssparkutils/handlers/fsHandler.py", line 56, in mv
    return self.fsutils.mv(src, dest, create_path, overwrite)
  File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
  File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:mssparkutils.fs.mv.
: java.io.FileNotFoundException: /tmp/eea_discodata_task-9c3df798c1ff11ebaedc000d3ab6dc78.log.json
	at com.microsoft.spark.notebook.msutils.impl.MSFsUtilsImpl.mv(MSFsUtilsImpl.scala:228)
	at mssparkutils.fs$.mv(fs.scala:20)
	at mssparkutils.fs.mv(fs.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)

I see that the dbutils package is not available in this context; it is used in Databricks environments.
Thanks in advance.
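
One thing worth checking, as a sketch under the assumption that mssparkutils.fs resolves bare paths against the default distributed filesystem rather than the driver's local disk: prefix the local path with the file: scheme.

# Copy a driver-local file to blob storage; a bare path like /tmp/a.txt is
# resolved against the default ABFS/WASB filesystem, so use file: explicitly.
from notebookutils import mssparkutils

local_file = "file:/tmp/a.txt"
azure_file = "wasbs://<container>@<storage_account>.blob.core.windows.net/outputs/a.txt"
mssparkutils.fs.cp(local_file, azure_file, True)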

Malformed URL Links - https://github.com/Azure-Samples/Synapse/tree/master/Notebooks/PySpark/Synapse%20Link%20for%20Cosmos%20DB%20samples/IoT

01-CosmosDBSynapseStreamIngestion: Ingest streaming data into Azure Cosmos DB collection using Structured Streaming - 404
02-CosmosDBSynapseBatchIngestion: Ingest Batch data into Azure Cosmos DB collection using Azure Synapse Spark - 404
03-CosmosDBSynapseJoins: Perform Joins and aggregations across Azure Cosmos DB collections using Azure Synapse Link - 404
04-CosmosDBSynapseML: Perform Anomaly Detection using Azure Synapse Link and Azure Cognitive Services on Synapse Spark (MMLSpark) - 404

Correct URL links are on this page: https://github.com/Azure-Samples/Synapse/tree/master/Notebooks/PySpark/Synapse%20Link%20for%20Cosmos%20DB%20samples

Synapse Intelligent Cache is not working

Environment:
- Spark 3.2
- Synapse premium
- Intelligent caching enabled (screenshot)

Initial content of the CSV file (screenshot).

Read successful (screenshot).

Added more content to the file later (screenshot).

The same result was shown even after running df1.take(100) several times, instead of the newly added records (screenshot).

The issue was resolved by rerunning spark.read.
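
To make the report easier to reproduce, a minimal sketch of the sequence described above (storage path and schema are hypothetical):

# Repro sketch: the first read warms the cache; a later take() on the same
# DataFrame kept serving the old snapshot in the behavior reported above.
path = "abfss://data@<storage_account>.dfs.core.windows.net/input.csv"
df1 = spark.read.option("header", True).csv(path)
df1.take(100)                                       # initial rows

# ...more rows are appended to input.csv externally...

df1.take(100)                                       # still the old snapshot
df1 = spark.read.option("header", True).csv(path)   # re-reading refreshes it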

It would be nice to get up-to-date, step-by-step instructions for the .NET example

Just downloaded and unpacked "sample files for dotnet.zip" from under Synapse/Spark/DotNET/.

I had a slight hope, on the off chance, that it might contain something like instructions and a working solution file for Visual Studio (the Community edition would work perfectly well).

Hope this is a good enough description of the issue. Please let me know; I would be happy to describe it in more detail.

The information on environment construction is incorrect

Hello,

The content of the "Let's get the environment ready" section in the following document appears to be incorrect.

Creating requirements.txt as instructed and applying it to the Spark pool does not install pymongo. Also, pymongo does not appear in the library list shown in the document's output. (A quick check is sketched below.)
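
For what it's worth, a quick way to verify from a notebook cell whether the pool actually picked up the package:

# An ImportError here confirms pymongo was not installed on the pool.
import pymongo
print(pymongo.version)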

I'm asking a similar question on Microsoft Q&A and am currently investigating the cause.

Please confirm and investigate.

Thanks,
