turbot / steampipe-plugin-databricks


Use SQL to instantly query Databricks resources. Open source CLI. No DB required.

Home Page: https://hub.steampipe.io/plugins/turbot/databricks

License: Apache License 2.0

Languages: Makefile 0.05%, PLSQL 0.94%, Go 99.01%
Topics: databricks, postgresql, sql, steampipe, steampipe-plugin, hacktoberfest, postgresql-fdw, backup, etl, sqlite, zero-etl

steampipe-plugin-databricks's Introduction


Databricks Plugin for Steampipe

Use SQL to query clusters, jobs, users, and more from Databricks.

Quick start

Install

Download and install the latest Databricks plugin:

steampipe plugin install databricks

Configure your credentials and config file.

Configure your account details in ~/.steampipe/config/databricks.spc:

connection "databricks" {
  plugin = "databricks"

  # A connection profile specified within .databrickscfg to use instead of DEFAULT.
  # This can also be set via the `DATABRICKS_CONFIG_PROFILE` environment variable.
  # profile = "databricks-dev"

  # The target Databricks account ID.
  # This can also be set via the `DATABRICKS_ACCOUNT_ID` environment variable.
  # See Locate your account ID: https://docs.databricks.com/administration-guide/account-settings/index.html#account-id.
  # account_id = "abcdd0f81-9be0-4425-9e29-3a7d96782373"

  # The target Databricks account SCIM token.
  # See: https://docs.databricks.com/administration-guide/account-settings/index.html#generate-a-scim-token
  # This can also be set via the `DATABRICKS_TOKEN` environment variable.
  # account_token = "dsapi5c72c067b40df73ccb6be3b085d3ba"

  # The target Databricks account console URL, which is typically https://accounts.cloud.databricks.com.
  # This can also be set via the `DATABRICKS_HOST` environment variable.
  # account_host = "https://accounts.cloud.databricks.com/"

  # The target Databricks workspace Personal Access Token.
  # This can also be set via the `DATABRICKS_TOKEN` environment variable.
  # See: https://docs.databricks.com/dev-tools/auth.html#databricks-personal-access-tokens-for-users
  # workspace_token = "dapia865b9d1d41389ed883455032d090ee"

  # The target Databricks workspace URL.
  # See https://docs.databricks.com/workspace/workspace-details.html#workspace-url
  # This can also be set via the `DATABRICKS_HOST` environment variable.
  # workspace_host = "https://dbc-a1b2c3d4-e6f7.cloud.databricks.com"

  # The Databricks username part of basic authentication. Only possible when Host is *.cloud.databricks.com (AWS).
  # This can also be set via the `DATABRICKS_USERNAME` environment variable.
  # username = "[email protected]"

  # The Databricks password part of basic authentication. Only possible when Host is *.cloud.databricks.com (AWS).
  # This can also be set via the `DATABRICKS_PASSWORD` environment variable.
  # password = "password"

  # A non-default location of the Databricks CLI credentials file.
  # This can also be set via the `DATABRICKS_CONFIG_FILE` environment variable.
  # config_file_path = "/Users/username/.databrickscfg"
  
  # OAuth client ID of a service principal.
  # This can also be set via the `DATABRICKS_CLIENT_ID` environment variable.
  # client_id = "123-456-789"

  # OAuth client secret of a service principal.
  # This can also be set via the `DATABRICKS_CLIENT_SECRET` environment variable.
  # client_secret = "dose1234567789abcde"
}

Or through environment variables:

export DATABRICKS_CONFIG_PROFILE=user1-test
export DATABRICKS_TOKEN=dsapi5c72c067b40df73ccb6be3b085d3ba
export DATABRICKS_HOST=https://accounts.cloud.databricks.com
export DATABRICKS_ACCOUNT_ID=abcdd0f81-9be0-4425-9e29-3a7d96782373
export DATABRICKS_USERNAME=[email protected]
export DATABRICKS_PASSWORD=password
export DATABRICKS_CLIENT_ID=123-456-789
export DATABRICKS_CLIENT_SECRET=dose1234567789abcde

Run steampipe:

steampipe query

List details of your Databricks clusters:

select
  cluster_id,
  title,
  cluster_source,
  creator_user_name,
  driver_node_type_id,
  node_type_id,
  state,
  start_time
from
  databricks_compute_cluster;
+----------------------+--------------------------------+----------------+-------------------+---------------------+--------------+------------+---------------------------+
| cluster_id           | title                          | cluster_source | creator_user_name | driver_node_type_id | node_type_id | state      | start_time                |
+----------------------+--------------------------------+----------------+-------------------+---------------------+--------------+------------+---------------------------+
| 1234-141524-10b6dv2h | [default]basic-starter-cluster | "API"          | [email protected]   | i3.xlarge           | i3.xlarge    | TERMINATED | 2023-07-21T19:45:24+05:30 |
| 1234-061816-mvns8mxz | test-cluster-for-ml            | "UI"           | [email protected]   | i3.xlarge           | i3.xlarge    | TERMINATED | 2023-07-28T11:48:16+05:30 |
+----------------------+--------------------------------+----------------+-------------------+---------------------+--------------+------------+---------------------------+
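Results can be filtered and aggregated with standard SQL. A minimal sketch, assuming only the table and columns already shown in the output above, that counts clusters per state:

```sql
-- Count clusters in each lifecycle state (e.g. RUNNING, TERMINATED)
select
  state,
  count(*) as cluster_count
from
  databricks_compute_cluster
group by
  state
order by
  cluster_count desc;
```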

Engines

This plugin is available for the following engines:

Steampipe: The Steampipe CLI exposes APIs and services as a high-performance relational database, giving you the ability to write SQL-based queries to explore dynamic data. Mods extend Steampipe's capabilities with dashboards, reports, and controls built with simple HCL. The Steampipe CLI is a turnkey solution that includes its own Postgres database, plugin management, and mod support.

Postgres FDW: Steampipe Postgres FDWs are native Postgres Foreign Data Wrappers that translate APIs to foreign tables. Unlike the Steampipe CLI, which ships with its own Postgres server instance, the Steampipe Postgres FDWs can be installed in any supported Postgres database version.

SQLite Extension: Steampipe SQLite Extensions provide SQLite virtual tables that translate your queries into API calls, transparently fetching information from your API or service as you request it.

Export: Steampipe Plugin Exporters provide a flexible mechanism for exporting information from cloud services and APIs. Each exporter is a stand-alone binary that allows you to extract data using Steampipe plugins without a database.

Turbot Pipes: Turbot Pipes is the only intelligence, automation & security platform built specifically for DevOps. Pipes provides hosted Steampipe database instances, shared dashboards, snapshots, and more.

Developing

Prerequisites:

Clone:

git clone https://github.com/turbot/steampipe-plugin-databricks.git
cd steampipe-plugin-databricks

Build, which automatically installs the new version to your ~/.steampipe/plugins directory:

make

Configure the plugin:

cp config/* ~/.steampipe/config
vi ~/.steampipe/config/databricks.spc

Try it!

steampipe query
> .inspect databricks

Further reading:

Open Source & Contributing

This repository is published under the Apache 2.0 (source code) and CC BY-NC-ND (docs) licenses. Please see our code of conduct. We look forward to collaborating with you!

Steampipe is a product produced from this open source software, exclusively by Turbot HQ, Inc. It is distributed under our commercial terms. Others are allowed to make their own distribution of the software, but cannot use any of the Turbot trademarks, cloud services, etc. You can learn more in our Open Source FAQ.

Get Involved

Join #steampipe on Slack →

Want to help but don't know where to start? Pick up one of the help wanted issues:

steampipe-plugin-databricks's People

Contributors

dependabot[bot], karanpopat, khushboo9024, madhushreeray30, misraved, nfx, rinzool


Forkers

rinzool nfx

steampipe-plugin-databricks's Issues

Add initial tables

Add initial tables:

  • databricks_account_budget
  • databricks_account_credential
  • databricks_account_custom_app_integration
  • databricks_account_user
  • databricks_account_encryption_keys
  • databricks_account_group
  • databricks_account_ip_access_lists
  • databricks_account_log_delivery
  • databricks_account_metastore_assignments
  • databricks_account_metastore
  • databricks_account_network
  • databricks_account_published_app_integration
  • databricks_account_storage
  • databricks_account_vpc_endpoint
  • databricks_account_workspace
  • databricks_workspace_alert
  • databricks_workspace_catalog
  • databricks_workspace_cluster
  • databricks_workspace_cluster_policy
  • databricks_workspace_connection
  • databricks_workspace_current_user
  • databricks_workspace_dashboard
  • databricks_workspace_data_source
  • databricks_workspace_dbfs
  • databricks_workspace_experiment
  • databricks_workspace_external_location
  • databricks_workspace_function
  • databricks_workspace_git_credential
  • databricks_workspace_global_init_script
  • databricks_workspace_group
  • databricks_workspace_instance_pool
  • databricks_workspace_instance_profile
  • databricks_workspace_ip_access_list
  • databricks_workspace_job
  • databricks_workspace_job_run
  • databricks_workspace_metastore
  • databricks_workspace_model
  • databricks_workspace_pipeline
  • databricks_workspace_pipeline_event
  • databricks_workspace_pipeline_update
  • databricks_workspace_user
  • databricks_workspace_webhook
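Once implemented, each of these would be queryable like any other Steampipe table. A hedged sketch against the proposed databricks_workspace_job name (both the table and its columns are assumptions based on this proposal, not a released schema):

```sql
-- Hypothetical: list jobs with their creators, assuming job_id, name,
-- and creator_user_name columns exist on the proposed table
select
  job_id,
  name,
  creator_user_name
from
  databricks_workspace_job;
```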

Allow OAuth login in config

Is your feature request related to a problem? Please describe.
No

Describe the solution you'd like
I would like the option to use OAuth authentication.
This way we can use the Databricks plugin in automated dashboards without relying on static tokens.

The Databricks Go SDK allows using a ClientID / ClientSecret: https://pkg.go.dev/github.com/databricks/[email protected]/config#Config

Describe alternatives you've considered

Additional context
I have just started developing the feature in this PR: #6
It should be quite simple 👌

Listing job runs errors if the job_id doesn't exist

Describe the bug
When querying the databricks_job_run table with a condition on job_id, an error is thrown if the job_id does not exist.

This can be annoying when querying an aggregated connection with a condition like job_id IN (...).

Steampipe version (steampipe -v)
v0.21.2

Plugin version (steampipe plugin list)
v0.3.0

To reproduce
Steps to reproduce the behavior (please include relevant code and/or commands).
Run a query with a job_id that does not exist, such as:

> select * from databricks.databricks_job_run where job_id = 42

Error: databricks_prod_videoland: Job 42 does not exist. (SQLSTATE HV000)

Expected behavior
If the job does not exist we expect to have no results instead

> select * from databricks.databricks_job_run where run_id = 42
+--------+----------+----------------+------------------+-------------------+----------+--------------------+--------+---------------+-------------------------+--------------+--------------+----------+----------------+------------+---------+----------->
| run_id | run_name | attempt_number | cleanup_duration | creator_user_name | end_time | execution_duration | job_id | number_in_job | original_attempt_run_id | run_duration | run_page_url | run_type | setup_duration | start_time | trigger | cluster_in>
+--------+----------+----------------+------------------+-------------------+----------+--------------------+--------+---------------+-------------------------+--------------+--------------+----------+----------------+------------+---------+----------->
+--------+----------+----------------+------------------+-------------------+----------+--------------------+--------+---------------+-------------------------+--------------+--------------+----------+----------------+------------+---------+----------->

Additional context
This is caused by the Databricks SDK. If we provide a JobId (done here for that table), the Databricks API returns an error if the job does not exist.

A possible solution could be to return nil, nil when this exact error is received.

databricks_iam_account_user & databricks_iam_account_group Errors

Describe the bug
The databricks_iam_account_user and databricks_iam_account_group tables generate errors as shown in the attached screenshot.

Steampipe version (steampipe -v)
v0.20.10

Plugin version (steampipe plugin list)
hub.steampipe.io/plugins/turbot/databricks@latest | 0.0.1

To reproduce
I did not do anything out of the ordinary; I simply ran
select * from databricks_all.databricks_iam_account_user
and
select * from databricks_all.databricks_iam_account_group
Expected behavior
I would expect these tables to work the same way all other tables do. I have Account Admin privileges, so I would not expect privileges to be the issue here.

