spark-monitoring's Introduction

Monitoring Azure Databricks in an Azure Log Analytics Workspace

This repository extends the core monitoring functionality of Azure Databricks to send streaming query event information to Azure Log Analytics. It has the following directory structure:

/src
  /spark-jobs
  /spark-listeners-loganalytics
  /spark-listeners
  /pom.xml

The spark-jobs directory contains a sample Spark application that demonstrates how to implement a Spark application metric counter.

The spark-listeners-loganalytics and spark-listeners directories contain the code for building the two JAR files that are deployed to the Databricks cluster. The spark-listeners directory includes a scripts directory that contains a cluster node initialization script to copy the JAR files from a staging directory in the Azure Databricks file system to execution nodes.

The pom.xml file is the main Maven project object model build file for the entire project.

Build the Azure Databricks monitoring library and configure an Azure Databricks cluster

Before you begin, ensure you have the following prerequisites in place:

Build the Azure Databricks monitoring library

To build the Azure Databricks monitoring library, follow these steps:

  1. Import the Maven project object model file, pom.xml, located in the /src folder into your project. This imports three projects:
       • spark-jobs
       • spark-listeners
       • spark-listeners-loganalytics
  2. Execute the Maven package build phase in your Java IDE to build the JAR file for each of these three projects:

       Project                         JAR file
       spark-jobs                      spark-jobs-1.0-SNAPSHOT.jar
       spark-listeners                 spark-listeners-1.0-SNAPSHOT.jar
       spark-listeners-loganalytics    spark-listeners-loganalytics-1.0-SNAPSHOT.jar

  3. Use the Azure Databricks CLI to create a directory named dbfs:/databricks/monitoring-staging:
       dbfs mkdirs dbfs:/databricks/monitoring-staging
  4. Open the /src/spark-listeners/scripts/listeners.sh script file and add your Log Analytics Workspace ID and Key to the lines below:
       export LOG_ANALYTICS_WORKSPACE_ID=
       export LOG_ANALYTICS_WORKSPACE_KEY=
  5. Use the Azure Databricks CLI to copy /src/spark-listeners/scripts/listeners.sh to the directory created in step 3:
       dbfs cp <local path to listeners.sh> dbfs:/databricks/monitoring-staging/listeners.sh
  6. Use the Azure Databricks CLI to copy /src/spark-listeners/scripts/metrics.properties to the directory created in step 3:
       dbfs cp <local path to metrics.properties> dbfs:/databricks/monitoring-staging/metrics.properties
  7. Use the Azure Databricks CLI to copy spark-listeners-1.0-SNAPSHOT.jar and spark-listeners-loganalytics-1.0-SNAPSHOT.jar, which were built in step 2, to the directory created in step 3:
       dbfs cp <local path to spark-listeners-1.0-SNAPSHOT.jar> dbfs:/databricks/monitoring-staging/spark-listeners-1.0-SNAPSHOT.jar
       dbfs cp <local path to spark-listeners-loganalytics-1.0-SNAPSHOT.jar> dbfs:/databricks/monitoring-staging/spark-listeners-loganalytics-1.0-SNAPSHOT.jar
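
The staging steps above (creating the DBFS directory, then copying the init script, the metrics properties file, and the two listener JARs) can be collected into one script. What follows is only a sketch under stated assumptions: the helper names run and stage_monitoring_files are hypothetical, the script assumes Maven wrote each JAR to the default <project>/target/ directory, and it assumes listeners.sh has already been edited with your workspace ID and key. It defaults to a dry run that only prints the dbfs commands.

```shell
#!/usr/bin/env bash
# Sketch: stage the monitoring files on DBFS.
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to run them.
set -eu

STAGING_DIR="dbfs:/databricks/monitoring-staging"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# stage_monitoring_files <local path to the /src folder>
# Hypothetical helper. Assumption: each JAR sits in <project>/target/,
# which is Maven's default output directory.
stage_monitoring_files() {
  local src="$1"
  local project jar

  run dbfs mkdirs "$STAGING_DIR"
  run dbfs cp "$src/spark-listeners/scripts/listeners.sh" "$STAGING_DIR/listeners.sh"
  run dbfs cp "$src/spark-listeners/scripts/metrics.properties" "$STAGING_DIR/metrics.properties"

  for project in spark-listeners spark-listeners-loganalytics; do
    jar="${project}-1.0-SNAPSHOT.jar"
    run dbfs cp "$src/$project/target/$jar" "$STAGING_DIR/$jar"
  done
}

stage_monitoring_files "${1:-./src}"
```

Run the script once as-is and check that the printed paths match your build output, then rerun it with DRY_RUN=0 to perform the copies.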

Create and configure the Azure Databricks cluster

To create and configure the Azure Databricks cluster, follow these steps:

  1. Navigate to your Azure Databricks workspace in the Azure Portal.
  2. On the home page, click "new cluster".
  3. Choose a name for your cluster and enter it in the "cluster name" text box.
  4. In the "Databricks Runtime Version" dropdown, select 5.0 or later (includes Apache Spark 2.4.0, Scala 2.11).
  5. Under "Advanced Options", click on the "Init Scripts" tab. Go to the last line under the "Init Scripts" section. Under the "destination" dropdown, select "DBFS". Enter "dbfs:/databricks/monitoring-staging/listeners.sh" in the text box. Click the "add" button.
  6. Click the "create cluster" button to create the cluster. Next, click on the "start" button to start the cluster.
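
The same cluster settings can also be written down as a cluster specification file instead of being entered in the portal. The sketch below is not part of this repository. It assumes the legacy Databricks CLI (the one that provides the dbfs command used earlier) and its databricks clusters create --json-file command. The cluster name, node type, and worker count are placeholder values, and "5.0.x-scala2.11" is assumed to be the runtime key for Databricks Runtime 5.0 in your workspace. The script only writes the file and prints the command; it does not create a cluster.

```shell
#!/usr/bin/env bash
# Sketch: write a cluster spec that registers listeners.sh as a DBFS init script.
set -eu

SPEC_FILE="${SPEC_FILE:-cluster-spec.json}"

# Assumed placeholder values: cluster_name, node_type_id, and num_workers.
# The init script destination is the staging path used in the steps above.
cat > "$SPEC_FILE" <<'EOF'
{
  "cluster_name": "monitored-cluster",
  "spark_version": "5.0.x-scala2.11",
  "node_type_id": "Standard_DS3_v2",
  "num_workers": 2,
  "init_scripts": [
    { "dbfs": { "destination": "dbfs:/databricks/monitoring-staging/listeners.sh" } }
  ]
}
EOF

echo "Wrote $SPEC_FILE. To create the cluster, run:"
echo "  databricks clusters create --json-file $SPEC_FILE"
```

Review the generated file against the options available in your workspace before passing it to the CLI.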

More information

For more information about using this library to monitor Azure Databricks, see Monitoring Azure Databricks.
