
mslearn-fabric's Introduction

Get started with Microsoft Fabric

This repo contains the instructions and assets required to complete the exercises in the Get started with Microsoft Fabric learning path on Microsoft Learn.

Reporting issues

If you encounter any problems in the exercises, please report them as issues in this repo.

Note: While we welcome feedback on both the exercise content and the Fabric service itself, we can only support issues relating to the exercise content in this repo. Please report any technical issue or feedback relating to the Microsoft Fabric service itself at https://support.fabric.microsoft.com/support/.

mslearn-fabric's People

Contributors

afelix-95, andrewdp23, angierudduck, ephiax20, graememalcolm, julianepadrao, leestott, madiepev, markfromtheinternet, mdemenis, mihai-ac, mscalopez, mycaule, nachoalonsoportillo, qasimkhan5x, shannonlindsay, thejamesherring, theresa-i, wolf0nfire


mslearn-fabric's Issues

Additional Spaces

Module: Analyze data with Apache Spark

Lab/Demo: 02

Task: Load data into a dataframe

Step: 09

Description of issue

This block has the wrong code class: all the other Python blocks use language-Python, but this one uses language-python.

Repro steps:

  1. This is how it shows up in the HTML, which is not how it is written in the markdown:
    <pre><code class="language-python"> from pyspark.sql.types import *

 orderSchema = StructType([
    StructField("SalesOrderNumber", StringType()),
    StructField("SalesOrderLineNumber", IntegerType()),
    StructField("OrderDate", DateType()),
    StructField("CustomerName", StringType()),
    StructField("Email", StringType()),
    StructField("Item", StringType()),
    StructField("Quantity", IntegerType()),
    StructField("UnitPrice", FloatType()),
    StructField("Tax", FloatType())
    ])

 df = spark.read.format("csv").schema(orderSchema).load("Files/orders/*.csv")
 display(df)
</code></pre>

mslearn-fabric/Instructions/Labs 03-delta-lake

Module: 00

Lab/Demo: 03

Task: Use delta tables for streaming data

Step: 1

Step: 4

Description of issue

[screenshot]

A forward slash needs to be concatenated before the txt filename so that the file goes into Files/streaming.

Currently, the inputPath variable is prepended to the filename with no separator, so the file ends up in the root of Files.
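A minimal sketch of the suggested fix, assuming the lab's cell writes files with mssparkutils.fs.put and an inputPath variable as described (the file name and content here are hypothetical):

from notebookutils import mssparkutils  # available in Fabric notebooks

inputPath = 'Files/streaming'

# Without a separator, inputPath + "data.txt" becomes "Files/streamingdata.txt",
# i.e. a file in the root of Files. Concatenating a forward slash before the
# filename puts the file into Files/streaming as intended:
mssparkutils.fs.put(inputPath + "/data.txt", "hypothetical content", True)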

Repro steps:

  1. Run step 1 and Step 4
  2. Look at root Files
  3. See txt files in root Files.

Real-Time Analytics exercise: error message using abfss to create KQL db

Module: Get started with Real-Time Analytics in Microsoft Fabric

Lab/Demo: Microsoft Learn Exercise

Task: Create a KQL database

Step: Paste the ABFS path to your sales.csv file, which you copied to the clipboard previously

*** This seems related to (but distinct from) a previous issue: Lab 7 Error importing from OneLake #18 ***

Description of issue
Using the abfss:// path did not work; it generated an error about being in the wrong format. I used the https:// path instead, which worked.

I also created a new "sales" sub-folder under Files in the Lakehouse for this exercise; I don't know if that was strictly speaking necessary, but it seemed consistent with prior exercises.

Repro steps:

  1. Real-time Analytics
  2. Home page
  3. select KQL database
  4. Get data
  5. OneLake
  6. Source in wizard - generates error about file path format when using abfss:// string copied from sales.csv property

Code issues in the lab 08d-data-science-batch

Lab: 08d-data-science-batch

Task: Apply the model to generate predictions

Step: 01

Description of issues:

table_name is not initialized in the code snippet, which causes an error when running the code cell. Moreover, df is passed as a parameter in the line df_test = model.transform(df) instead of df_test.

Solution (sketched below):

  1. Initialize table_name using table_name = "diabetes_test" and add it before the line of code df_test = spark.read.format("delta").load(f"Tables/{table_name}").
  2. Replace df_test = model.transform(df) with df_test = model.transform(df_test).
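Taken together, a minimal sketch of the corrected cell (based on the solution above; model is assumed to be the model trained earlier in the lab):

table_name = "diabetes_test"  # was missing, causing an error
df_test = spark.read.format("delta").load(f"Tables/{table_name}")

# Apply the trained model to the test dataframe (df was passed before)
df_test = model.transform(df_test)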

ingest pipeline needs an update

Module: 00

Lab/Demo: 00

Task: 00

Step: 00

Description of issue
Authentication type: Basic does not allow going to the next step when leaving the user name and password blank.
Repro steps:

Authentication kind: Anonymous not available

Lab/Demo: 4

Task: Create a pipeline

Description of issue

When creating the pipeline, the Authentication kind drop-down only has Basic as an option, which requires a username and password combination. There is no Anonymous option available.
[screenshot]

I tested this several times, and the result was the same.

Organize Fabric Lakehouse - Lab - Step 8 Error

Module: Organize a Fabric lakehouse using medallion architecture design

Lab/Demo: Exercise - Organize your Fabric lakehouse using a medallion architecture

Task: Transform data for gold layer

Step: 8

Description of issue: the display output references the old data frame (i.e., from step 5), not the data frame we produced in this step. This does not affect usability (i.e., the code is not 'broken'; it does not produce an error).

Repro steps:

  1. Load Lab
  2. Work through Step 8
  3. Note that the output is not the dfdimCustomer_silver dataframe we just built

[screenshot]

Update Needed: Lakehouse Lab 01 Materials

Module: 00

Lab/Demo: 01

Task: 00

Step: 00

Description of issue:
The Lakehouse Lab 01 materials in the Microsoft Fabric Labs repository require an update. Specifically, the images need to be updated, and the Lakehouse Explorer file names have changed.

Repro steps:

  1. Navigate to the Lakehouse Lab 01 materials in the repository.
  2. Observe outdated images, material descriptions, and file name discrepancies (for example, in Lakehouse Explorer).

This update is crucial for a seamless learning experience.

Lab 5 has outdated instructions and screenshots

Module: Ingest Data with Dataflows Gen2 in Microsoft Fabric

Lab/Demo: 05

Task: Add data destination for Dataflow

Step: 1 & 2

It says monthno and OrderDate cannot be saved because of their data types. However, OrderDate is already a date/time (probably derived implicitly when the custom column was created), so there is no issue with it.
In step 2, it tells you to cancel and go set the data types. However, the data type is now included in the screen, so there is no need to cancel; you can set it directly when creating the destination.

[screenshot]

Relationship schema incorrect in diagram

Module:

Lab/Demo: 14 Create and explore a semantic model

Task: Create relationships between tables

Step: 6

Description of issue
The instructions to create relationships state "Create relationships to the Trip fact table from the Geography and Weather dimensions, repeating the step above. Also ensure these relationships are One to many, with one occurrence of the key in the dimension table, and many in the fact table."

But the diagram underneath shows the relationship between Trip and Weather as Many:1.

Repro steps:

Lab 15 Image pathing is invalid

Lab/Demo: 15

Task: 00

Step: 00

Description of issue:

  • Invalid pathing for image
    [screenshot]

Repro steps:

  1. change pathing structure of all images

15 Work with model relationships

Module: 15 Work with model relationships

Lab/Demo: 15

Task: Add another date table

Step: 07

Description of issue:
When choosing Close & Apply in Power Query, this error message pops up for 2 tables, Sales and Sales Order: "The key didn't match any rows in the table".

[screenshot]

Investigating deeper, I found that the 2 queries (Sales and Sales Order) refer to vFactSales.
[screenshot]

I have downloaded the AdventureWorksDW2022 database but didn't find any vFactSales view or table. Please provide a backup of the database.

Unable to "Load to Tables" in Fabric tutorial using the provided sales.csv file

Module: mslearn-fabric

Lab/Demo: 01

Task: Load file data into a table

Step: 03

Description of issue
Update: This can be disregarded. After doing some data cleanup on the data provided in the tutorial, I was able to load to a table. The link in the tutorial opened raw data in the browser window which, when put into Excel, created a bad .csv due to the commas used in the Item field. I would recommend that the tutorial provide a download link to a clean .csv file.

Working through the mslearn-fabric tutorial on the "Load file data into a table" step on this page (https://microsoftlearning.github.io/mslearn-fabric/Instructions/Labs/01-lakehouse.html), I get a 400 error. It is possible that this is due to my changing some of the default tenant permissions on initial setup. I couldn't find a way to reset those to the default permissions. If that is a possible cause, can anyone point me to a way to reset the permissions or a list of the defaults? If not, then I really don't know where to start. Thanks for any help.

Repro steps:

  1. Multiple attempts over a couple of hours.

Lab8 and pandas loading generated code

Lab 8

Description of issue

Repro steps:

  1. Copy churn.csv to a subfolder named data/
  2. Load the churn.csv file with pandas in a notebook at this step
  3. FileNotFoundError
import pandas as pd
# Load data into pandas DataFrame from "/lakehouse/default/" + "Files/churn.csv"
df = pd.read_csv("/lakehouse/default/" + "Files/churn.csv")
display(df)

Generated code missed the subfolder data/
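For reference, a minimal sketch of what the corrected cell would look like in this scenario (the data/ subfolder comes from repro step 1):

import pandas as pd
# Load data into pandas DataFrame, including the data/ subfolder in the path
df = pd.read_csv("/lakehouse/default/Files/data/churn.csv")
display(df)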

Leading spaces in code blocks

Module: Any

for instance https://microsoftlearning.github.io/mslearn-fabric/Instructions/Labs/04-ingest-pipeline.html#create-a-notebook

Description of issue

Each line of the code blocks starts with a leading space. When you copy this and paste it into a notebook, the notebook complains (not errors, but annoying warnings) and so do I. (See screenshot at the bottom.)
You know what kind of people are working with this; you can't put them through this kind of eyesore.
I kindly ask you to remove these spaces at the beginning of the lines, to reduce overall stress levels.
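In the meantime, a reader-side workaround is to strip the shared leading whitespace from the pasted text before running it, for example with Python's textwrap.dedent (the pasted string below is illustrative):

import textwrap

pasted = " import pandas as pd\n df = pd.read_csv('Files/churn.csv')\n"
print(textwrap.dedent(pasted))  # removes the common leading space from each line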

Screenshot of the spaces:
[screenshot]

The warning in the notebook:
[screenshot]

DWH load: Carson Butler is (not) the (only) top customer for the Bike category

06a-data-warehouse-load
Step: Run analytical queries

One of the queries has following note:
Note: The results of this query show the top customer for each of the categories: Bike, Helmet, and Gloves, based on their total sales. For example, Carson Butler is the top customer for the Bike category.

However, this is not entirely true: there are actually three top customers for the Bike category, so the comment is confusing when learners see a different name:

[screenshot]

Here's a screenshot when looking only at the Bike category:

[screenshot]

Lab 7 Error importing from OneLake

Lab/Demo: 07

Task: Create a KQL database

Step: 3

Description of issue

When reaching the Schema part, an error pops up and I cannot continue.
This happens when using both URL and ABFS links. I tested in separate workspaces with newly created environments, and the issue persists.

[screenshot: ingest data error]

Here is the error:

Couldn't infer file schema. Error: Request is invalid and cannot be processed: { "error": { "code": "BadRequest", "message": "Request is invalid and cannot be executed.", "@type": "Kusto.Cloud.Platform.Storage.PersistentStorage.PersistentStoragePathInvalidHostnameException", "@message": "Persistent storage path 'msit-onelake.dfs.fabric.microsoft.com' is not a valid storage endpoint. (operation 'EnsureValidHostname')", "@context": { "timestamp": "2023-05-09T09:04:58.2799952Z", "serviceAlias": "KUTRIDENTHOSTERMSWCUS.TRD-SBD7ZAD44D7A82TP2X", "machineName": "KEngine00000Z", "processName": "Kusto.Engine", "processId": 6104, "threadId": 6288, "clientRequestId": "KusTrident;01b48bf8-6574-4a32-8eaf-83cefcd58e89", "activityId": "0fde9657-112b-470b-ab7b-4384a182b97c",

To be able to continue with the exercise, I used the option to add a file instead of OneLake.

Lab 6 - optimisations are enabled by default, so the cell code needs to disable them first.

Module: 00

Lab/Demo: 06

Task: Transform and load data to a Delta table

Step: 01

Description of issue
Optimisations used to be disabled by default but are now enabled by default. This cell needs to include code to disable them so that the two tables are different (one optimised and the other not).

# Disable V-Order
spark.conf.set("spark.sql.parquet.vorder.enabled", "false")

# Disable automatic Delta optimized write
spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", "false")

Otherwise, running the lab as it is results in two tables that are both optimised.
Repro steps:

Connections from old deleted workspaces still show up

Lab/Demo: 04

Task: Create a pipeline

Step: 4

Description of issue

Repro steps: When creating a new connection for a pipeline, old connections created in already-deleted workspaces still show up.
As seen in this screenshot, I created some of them when I ran the review in March, while others came from previous runs made the same day. All the workspaces were deleted after I finished the lab.

[screenshot]

I don't know if this is standard behavior, or if these connections will disappear some time after deleting the workspace, and whether the instructions should be updated to reflect this.

05-dataflows-gen2: few mistakes

Lab: 05-dataflows-gen2

  • Renaming a flow goes through the three-dots menu (not right-click).
  • It asks you to remove or not save a .pbix file that was never created.

Lab 03b - "Free user cannot apply model changes" error when trying to add relationships

Module: Create a medallion architecture in a Microsoft Fabric lakehouse

Lab/Demo: 03b

Task: Create a dataset

Step: 03

Description of issue
When trying to add the table relationships, I get an error:
Free User cannot apply model changes, upgrade to PowerBI Pro.

I verified the "data model settings" change in the earlier portion of the lab was applied.

https://learn.microsoft.com/en-us/fabric/get-started/fabric-trial#users-who-are-new-to-power-bi
I set this up on a new account, with no Power BI license, associated with an account I created in my existing Azure subscription. I followed these steps to enable my Fabric trial...so I would expect to have a Power BI Free license.

https://learn.microsoft.com/en-us/power-bi/consumer/end-user-features#feature-list
This table shows that I should be able to edit datasets in my workspace with a Free license.

It's not clear if the Power BI license & the Fabric license are separate, or if a (Fabric) Trial license should have the same functionality as a (Power BI) Free license.

If this is a service-side bug & not the expected behavior in Fabric, please let me know and I'll report it via their support channel.

Repro steps:

  1. Follow steps in this section: https://microsoftlearning.github.io/mslearn-fabric/Instructions/Labs/03b-medallion-lakehouse.html#create-a-dataset
  2. Hit error when trying to add any relationship between dimension & fact tables

Lab 4: Pipeline containing notebook fails with error 2010

Module: 00

Lab/Demo: 04

Task: Modify the pipeline

Step: 07

While trying to run the pipeline containing a notebook, I get an error on the notebook activity saying "2010 User configuration issue. Hit unexpected exception, please retry." I tried retrying and refreshing the browser page. I tried removing the parameter specified in step 6; same error. I select the notebook from a drop-down list, so there is not much else I can edit. The notebook runs fine when run from the notebook editor.
[screenshot]

Repro steps:

  1. Follow lab tutorial until Modify the pipeline step 7

FactSalesOrder vs FactOrderSales

06-data-warehouse.md
Step: Define a data model

Incorrect:
FactOrderSales.CustomerKey → DimCustomer.CustomerKey
FactOrderSales.SalesOrderDateKey → DimDate.DateKey

Correct:
FactSalesOrder.CustomerKey → DimCustomer.CustomerKey
FactSalesOrder.SalesOrderDateKey → DimDate.DateKey

[screenshot]

Sensitivity level requirement

Lab/Demo: 6

Task: Create a Warehouse

Step: In the Data Warehouse home page, create a new Warehouse with a name of your choice. Don't specify a sensitivity level.

Description of issue

There is no option to create a new warehouse without specifying a sensitivity level. A sensitivity level needs to be selected in order to continue with the creation of the warehouse.

[screenshot]

Lab 03b - sales - SCHEMA_NOT_FOUND

Module: Organize a Fabric lakehouse using medallion architecture design

Lab/Demo: 03b - Create a medallion architecture in a Microsoft Fabric lakehouse

Task: Transform data and load to silver Delta table

Step: 09

Description of issue
Cell execution fails

---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
Cell In[17], line 21
      3 from pyspark.sql.types import *
      4 from delta.tables import *
      6 DeltaTable.createIfNotExists(spark) \
      7     .tableName("sales.sales_silver") \
      8     .addColumn("SalesOrderNumber", StringType()) \
      9     .addColumn("SalesOrderLineNumber", IntegerType()) \
     10     .addColumn("OrderDate", DateType()) \
     11     .addColumn("CustomerName", StringType()) \
     12     .addColumn("Email", StringType()) \
     13     .addColumn("Item", StringType()) \
     14     .addColumn("Quantity", IntegerType()) \
     15     .addColumn("UnitPrice", FloatType()) \
     16     .addColumn("Tax", FloatType()) \
     17     .addColumn("FileName", StringType()) \
     18     .addColumn("IsFlagged", BooleanType()) \
     19     .addColumn("CreatedTS", DateType()) \
     20     .addColumn("ModifiedTS", DateType()) \
---> 21     .execute()

File /usr/hdp/current/spark3-client/jars/delta-core_2.12-2.4.0.8.jar/delta/tables.py:1330, in DeltaTableBuilder.execute(self)
   1321 @since(1.0)  # type: ignore[arg-type]
   1322 def execute(self) -> DeltaTable:
   1323     """
   1324     Execute Table Creation.
   1325 
   (...)
   1328     .. note:: Evolving
   1329     """
-> 1330     jdt = self._jbuilder.execute()
   1331     return DeltaTable(self._spark, jdt)

File ~/cluster-env/trident_env/lib/python3.10/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:175, in capture_sql_exception.<locals>.deco(*a, **kw)
    171 converted = convert_exception(e.java_exception)
    172 if not isinstance(converted, UnknownException):
    173     # Hide where the exception came from that shows a non-Pythonic
    174     # JVM exception message.
--> 175     raise converted from None
    176 else:
    177     raise

AnalysisException: [SCHEMA_NOT_FOUND] The schema `sales` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a catalog, verify the current_schema() output, or qualify the name with the correct catalog.
To tolerate the error on drop use DROP SCHEMA IF EXISTS.

Repro steps:

  1. Follow Lab steps
  2. Run cell
  3. Error generated
  4. Remove the schema from tableName (see the sketch below)
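A minimal sketch of that workaround, with the schema qualifier dropped from the table name (spark is the notebook's session; the column list is abbreviated here, and the lab's cell defines the full set):

from pyspark.sql.types import StringType, IntegerType
from delta.tables import DeltaTable

# Create the table without the sales. schema qualifier
DeltaTable.createIfNotExists(spark) \
    .tableName("sales_silver") \
    .addColumn("SalesOrderNumber", StringType()) \
    .addColumn("SalesOrderLineNumber", IntegerType()) \
    .execute()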

Task: Transform data for gold layer

Step: 03

Description of issue
Cell execution fails

---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
Cell In[8], line 2
      1 # Load data to the dataframe as a starting point to create the gold layer
----> 2 df = spark.read.table("Sales.sales_silver")

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py:471, in DataFrameReader.table(self, tableName)
    437 def table(self, tableName: str) -> "DataFrame":
    438     """Returns the specified table as a :class:`DataFrame`.
    439 
    440     .. versionadded:: 1.4.0
   (...)
    469     >>> _ = spark.sql("DROP TABLE tblA")
    470     """
--> 471     return self._df(self._jreader.table(tableName))

File ~/cluster-env/trident_env/lib/python3.10/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:175, in capture_sql_exception.<locals>.deco(*a, **kw)
    171 converted = convert_exception(e.java_exception)
    172 if not isinstance(converted, UnknownException):
    173     # Hide where the exception came from that shows a non-Pythonic
    174     # JVM exception message.
--> 175     raise converted from None
    176 else:
    177     raise

AnalysisException: [TABLE_OR_VIEW_NOT_FOUND] The table or view `Sales`.`sales_silver` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.;
'UnresolvedRelation [Sales, sales_silver], [], false

Repro steps:

  1. Follow Lab steps
  2. Run cell
  3. Error generated
  4. Remove the schema from tableName (see the sketch below)
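The same workaround applies here; a one-line sketch with the schema qualifier dropped:

# Read the silver table without the Sales. schema qualifier
df = spark.read.table("sales_silver")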

Step: 04

Description of issue
Cell execution fails

---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
Cell In[14], line 13
      2 from delta.tables import*
      4 # Define the schema for the dimdate_gold table
      5 DeltaTable.createIfNotExists(spark) \
      6     .tableName("sales.dimdate_gold") \
      7     .addColumn("OrderDate", DateType()) \
      8     .addColumn("Day", IntegerType()) \
      9     .addColumn("Month", IntegerType()) \
     10     .addColumn("Year", IntegerType()) \
     11     .addColumn("mmmyyyy", StringType()) \
     12     .addColumn("yyyymm", StringType()) \
---> 13     .execute()

File /usr/hdp/current/spark3-client/jars/delta-core_2.12-2.4.0.8.jar/delta/tables.py:1330, in DeltaTableBuilder.execute(self)
   1321 @since(1.0)  # type: ignore[arg-type]
   1322 def execute(self) -> DeltaTable:
   1323     """
   1324     Execute Table Creation.
   1325 
   (...)
   1328     .. note:: Evolving
   1329     """
-> 1330     jdt = self._jbuilder.execute()
   1331     return DeltaTable(self._spark, jdt)

File ~/cluster-env/trident_env/lib/python3.10/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:175, in capture_sql_exception.<locals>.deco(*a, **kw)
    171 converted = convert_exception(e.java_exception)
    172 if not isinstance(converted, UnknownException):
    173     # Hide where the exception came from that shows a non-Pythonic
    174     # JVM exception message.
--> 175     raise converted from None
    176 else:
    177     raise

AnalysisException: [SCHEMA_NOT_FOUND] The schema `sales` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a catalog, verify the current_schema() output, or qualify the name with the correct catalog.
To tolerate the error on drop use DROP SCHEMA IF EXISTS.

Repro steps:

  1. Follow Lab steps
  2. Run cell
  3. Error generated
  4. Remove schema from tableName

Lab 17 Image pathing invalid

Lab/Demo: 17

Task: 00

Step: 00

Description of issue

  • pathing issue
    invalid:
    [screenshot]

    valid:
    [screenshot]

Repro steps:

  1. edit pathing of image

Lab 4 - Delete Data activity doesn't have option to select Workspace

Module: Use Data Factory pipelines in Microsoft Fabric

Lab/Demo: 04

Task: Modify the pipeline

Step: 02

Description of issue
https://microsoftlearning.github.io/mslearn-fabric/Instructions/Labs/04-ingest-pipeline.html
I have a trial tenant with several other workspaces in it. I followed the instructions within this one to create a new workspace. I followed the steps in order.

I'm attempting to add a Delete data activity to the pipeline. But the options for the "Source" tab listed in the lab don't match what I see in the UI.

[screenshot]

If I try to create a new connection, it only gives me options for various file stores (ADLS gen 2, S3, Blob, etc.). There's no mechanism to connect to a Lakehouse.

The "copy data" task is correctly configured with the Destination and works.

[screenshot]

Not clear if this is a temporary service issue or if something has changed related to the configuration of this activity.

Steps described in "Ingest data with a pipeline in Microsoft Fabric" don't seem to work

Module: 00

Lab/Demo: Ingest data with a pipeline in Microsoft Fabric

Task: Create a pipeline

Step: 12

Link: https://microsoftlearning.github.io/mslearn-fabric/Instructions/Labs/04-ingest-pipeline.html#create-a-pipeline

Description of issue
Pipeline won't run. I got the following error message:
{ "errorCode": "2200", "message": "Table action 'Append' is not supported for this connector: 'Lakehouse' reason: 'Invalid format DelimitedTextFormat' ", "failureType": "UserError", "target": "Copy_52w", "details": [] }

I got it working by going to the "Mapping" settings of the pipeline and clicking "Import schemas".
I would like to suggest that, in case this is intended behaviour, this step be added to the course.

06c Monitor a data warehouse in Microsoft Fabric

Module: 06c Monitor a data warehouse in Microsoft Fabric

Lab/Demo: 06c

Task: Explore query insights

Step: 00

Description of issue
Before steps 2-6, consider adding an instruction to wait for 5 minutes before continuing; otherwise nothing will show in the results.

Typo in Lab 2 - Analyze Spark

Module: 00

Lab/Demo: 02

Task: Use the seaborn Library

Step: 05

Description of issue:
We are creating a line plot, but the comments indicate that it is a bar chart.

Repro steps:

  1. Copy code from step 5
  2. Notice that the comment indicates that it is a bar chart, when it is actually a line chart
    Attachment: Mistakes in the DP-600 labs.docx

14 Create and explore a semantic model

Module: 14 Create and explore a semantic model

Lab/Demo: 14

Task: Create a data warehouse and load sample data

Step: 5

Description of issue
Cannot create a new semantic model; this error message appears:
"We couldn't create the see through model".

[screenshot]

SQL Server Primary Key Constraint Syntax Error and Invalid Object Name in MS Fabric Tutorial

Module: 00

Lab/Demo: 00

Task: 00

Step: 00

I encountered a syntax error in the T-SQL script provided in the Microsoft Fabric tutorial for data warehousing in the SQL data warehouse. The error pertains to the use of NOT ENFORCED in the primary key constraint syntax, which is not recognized by SQL Server.

Additionally, the script references an object [ExternalData].[dbo].[staging_sales] which does not exist, leading to an 'Invalid object name' error.

Repro steps:

  1. Execute the provided T-SQL script to create schemas, tables, and a view.
  2. Encounter syntax error on NOT ENFORCED clause when adding a primary key constraint.
  3. Encounter 'Invalid object name' error when creating the view [Sales].[Staging_Sales].

The errors were resolved by:

  • Removing the NOT ENFORCED clause from the primary key constraint definition, as it is not valid syntax in SQL Server.
  • Correcting the object name to match the actual Lakehouse name used in the exercise instead of [ExternalData].

Here is the link to the exercise where I encountered the issue:
MS Fabric Tutorial Exercise

Please update the tutorial to correct the syntax and ensure the object names are clearly explained to match the user's environment setup.

03b Medallion Lakehouse Transform Data and Load missing run cell step for upsert

The step-by-step instructions always include a "Run the cell to execute the code..." instruction, but Step 12 of Transform data and load to silver Delta table in module 03b "Create a medallion architecture in a Microsoft Fabric lakehouse" does not include it.

Running the cell is an important part of the section as it populates the silver table in preparation for the following sections.

Lab 14 Image pathing is invalid

Lab/Demo: 14

Task: 00

Step: 00

Description of issue:

  • Image pathing is incorrect, using ".." compared to other labs that use a single period.
    Example:

Invalid:
[screenshot]

Valid:
[screenshot]

Repro steps:

  1. change pathing

Lab 08c - Copying from GitHub pages adds space

Module: 00

Lab/Demo: 08c

Task: All

Step: All

Description of issue
When you copy code from the GitHub Pages site into the Fabric notebook editor, it adds a space at the beginning of each line, so the Python code syntax is no longer correct.

The issue does not happen if we copy code from the GitHub repository.

Repro steps:

  1. Go to GitHub Pages site
  2. Copy any code block
  3. Open a new notebook in Fabric
  4. Paste the content on a code cell

Need more guidance on which fields to use when creating relationships between tables

14-create-a-star-schema-model
Step: Create relationships between tables

It is not mentioned which fields should be selected for creating the relationships

"Create relationships to the Trip fact table from the Geography and Weather dimensions, repeating the step above. Also ensure these relationships are One to many, with one occurance of the key in the dimension table, and many in the fact table."

For the Geography table: is it DropoffGeographyId or PickupGeographyId?
For the Weather table: is it DateID?

Lab 12 - All steps in 'SELECT' example have the same number

Module: Query data from a Kusto Query database in Microsoft Fabric

Lab/Demo: 12 Query data in KQL Database

Task: SELECT data from our sample dataset using KQL

Description of issue
https://microsoftlearning.github.io/mslearn-fabric/Instructions/Labs/12-query-data-in-kql-database.html

The markdown version of the page looks correct, but the HTML version of this page shows "1." for all of the numbered steps in this section and the ones below it.

I made changes in a private branch to make the formatting more consistent with other labs:

line break
4 spaces - start code block
3 spaces before each line of code
4 spaces - end code block
line break

But there were still some blocks, like 'order by' and 'where', that only had one step, and they didn't display correctly until a number was added to the text block. I investigated, but it's not clear to me why this would be required; that's why I didn't submit a PR for it.

LP03: Lakehouse creation fails with trial capacity located in Central US

  • Learning Path 03: Use delta tables in Apache Spark
  • Task: Create a lakehouse and upload data
  • step 1

https://github.com/MicrosoftLearning/mslearn-fabric/blob/7513f155c43635665f488ce6353146490d9f744b/Instructions/Labs/03-delta-lake.md#create-a-lakehouse-and-upload-data

When the trial capacity provisioned in previous steps is located in the Central US region, a lakehouse cannot be created at the above-referenced step. The message quoted below is displayed.

Upgrade to a paid Microsoft Fabric capacity
To work with Lakehouse, this workspace needs to use a Fabric enhanced
capacity. You can purchase a fabric capacity on the Azure portal using your
Azure subscription. Learn more.

[screenshots]

Creating the lakehouse with a trial capacity in the North Central US region succeeds.

Per the below Microsoft doc, the Central US region is not a supported Fabric region.

https://learn.microsoft.com/en-us/fabric/admin/region-availability

The below document suggests that the "home region" viewable through the "About Microsoft Fabric" dialog should determine the location of the trial capacity. However, we observed that despite this home region being set to North Central US, trials still sometimes provision in Central US.

https://learn.microsoft.com/en-us/fabric/get-started/fabric-trial#considerations-and-limitations

[screenshots]

Per this comment in another issue, there is a known Fabric issue relating to the Central US region. We are unaware of the details of this issue, but it may be relevant here.

#73 (comment)

Central US trial assignment appears to be somewhat random. We do not know of a method to explicitly set the location of a trial to be used with this lab. Avoiding Central US should prevent the issue, if a means of doing so exists.

Repro steps:

  1. Complete tasks previous to "Create a lakehouse and upload data".
  2. On the Fabric home page, navigate to Settings -> Admin portal -> Capacity settings -> Trial.
  3. Check the region listed for the active trial capacity. If it is set to the Central US location, continue with "Create a lakehouse and upload data" to observe lakehouse creation failure. If set to another location, restart the lab with a fresh environment and repeat repro steps as needed until a Central US location is assigned.

Lab 06a invalid column name in query

Module: Load data into a Microsoft Fabric data warehouse

Lab/Demo: 06a Load data into a warehouse in Microsoft Fabric

Task: Run analytical queries

Step: 1

https://github.com/MicrosoftLearning/mslearn-fabric/blob/main/Instructions/Labs/06a-data-warehouse-load.md#run-analytical-queries

Description of issue

In the query

SELECT c.CustomerName, SUM(s.UnitPrice * s.Quantity) AS TotalSales
FROM Sales.Fact_Sales s
JOIN Sales.Dim_Customer c
ON s.SalesOrderNumber = c.SalesOrderNumber
WHERE YEAR(s.OrderDate) = 2021
GROUP BY c.CustomerName
ORDER BY TotalSales DESC;

the join column should be CustomerID (i.e., ON s.CustomerID = c.CustomerID); Dim_Customer does not have a SalesOrderNumber column.

Repro steps:

  1. Follow the lab instructions.

Organize Fabric Lakehouse - Lab - Step 14 Error

Module: Organize a Fabric lakehouse using medallion architecture design

Lab/Demo: Exercise - Organize your Fabric lakehouse using a medallion architecture

Task: Transform data for gold layer

Step: 14

Description of issue
Error when running the script from the training (see attached image). The solution was to unindent lines 5 through 24.

Repro steps:

  1. Work until Step 14
  2. Run Step 14
  3. See Error
  4. Solution: unindent lines 5 through 24

[screenshot]

Fabric Tutorial loading sales csv file doesn't work

Following this tutorial, there are a number of issues loading the sales csv file from the lake to a table.
Tutorial: https://microsoftlearning.github.io/mslearn-fabric/Instructions/Labs/01-lakehouse.html

Keep getting the following error when trying to load the file to table:

org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Create database for GMLakehouse002 is not permitted using Apache Spark in Microsoft Fabric. )

Have tried creating new Fabric Workspace, editing the csv before import, smaller files etc.
The file loads ok into the Lake but every time load to Tables fails with the error above.

Lab 08C Unexpected Indent

Module: Train and track machine learning models with MLflow in Microsoft Fabric

Lab/Demo: 08C

Task: Load data into a dataframe

Step: 01

Description of issue

There is an unexpected indent in the first code block that causes an error when attempting to run. See screenshot.

[screenshot]

Repro steps:

  1. Navigate to https://microsoftlearning.github.io/mslearn-fabric/Instructions/Labs/08c-data-science-train.html
  2. Copy the Load data into a dataframe - step. 1 code.
  3. Paste Code to notebook and attempt to run.

#10-ingest-notebooks.md Optimize Delta table writes task does not make sense

Module: Ingest data with Spark and Microsoft Fabric notebooks

Lab/Demo: 10 - Ingest data with Spark and Microsoft Fabric notebooks

Task: Optimize Delta table writes

Step: 00

Link to Lab Instructions: https://github.com/MicrosoftLearning/mslearn-fabric/blob/main/Instructions/Labs/10-ingest-notebooks.md#optimize-delta-table-writes

In my opinion, the task for optimizing Delta table writes doesn't make sense.
The results of the "Create a Fabric notebook and load external data" steps are cached, so executing the same steps again will be a lot faster no matter what Spark config settings we set.
Besides that, the Spark config settings the task wants to show are activated by default anyway: https://learn.microsoft.com/en-us/fabric/data-engineering/delta-optimization-and-v-order?tabs=pyspark
So the code in Optimize Delta table writes ends up doing exactly the same thing as the code from "Create a Fabric notebook and load external data".

Instead, it would probably make more sense to compare the already-optimized code from "Create a Fabric notebook and load external data" with a version where we change the Spark config optimization parameters to false after restarting the session, so the results are comparable.

Besides that, my understanding of the two Spark config optimization parameters is that writing takes ~15% longer when they are activated, so the statement "Now, take note of the run times for both code blocks. Your times will vary, but you can see a clear performance boost with the optimized code." is not true; the optimized code should even take longer. Only when we read the data should we see the performance boost.
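A rough sketch of the comparison proposed above, assuming a dataframe df has already been loaded and that the session is restarted between the two runs to avoid cached results (the table paths are hypothetical):

import time

# Run A: defaults (V-Order and optimized write are enabled by default in Fabric)
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")
spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", "true")
start = time.time()
df.write.format("delta").mode("overwrite").save("Tables/sales_optimized")
print(f"Optimized write took {time.time() - start:.1f}s")

# Run B: both optimizations disabled (restart the session before this run)
spark.conf.set("spark.sql.parquet.vorder.enabled", "false")
spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", "false")
start = time.time()
df.write.format("delta").mode("overwrite").save("Tables/sales_unoptimized")
print(f"Unoptimized write took {time.time() - start:.1f}s")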

Create a Lakehouse - There is no Synapse Data Engineering

Module: 01 Create a Lakehouse

Lab/Demo: 01

Task: 01

Step: 01

Description of issue

On the Microsoft Fabric home page, select Synapse Data Engineering.

There is no Synapse Data Engineering on the Microsoft Fabric home page.

image

Repro steps:

  1. Activate a Microsoft Fabric trial
  2. Create a workspace
    1. verify that workspace is a Fabric workspace
      1. The workspace title is followed by a diamond icon ("Fabric Content" appears when hovering over the diamond)
      2. The license in workspace settings is a Fabric license (... > Workspace settings > Premium)
        • attempted two license types, unable to access with either:
          • license is "Trial"
          • license is a "Fabric capacity" with a Fabric capacity created in Azure
    2. Zone: Central US (Iowa)
      1. Central US doesn't appear to be listed, but North Central and South Central are: https://learn.microsoft.com/en-us/fabric/admin/region-availability
