build-tools

This directory contains source for the various build tools used during curriculum development within the Education department at Databricks.

  • bdc: Build Databricks Course: This is the main build tool. See the bdc README for full details.
  • gendbc: Create Databricks DBC files from the command line. See the gendbc README for full details.
  • master_parse: The master notebook parse tool and module. See the master_parse README for full details.
  • course: An optional curriculum workflow management tool that sits on top of bdc. There's no README for course. Just install it (or symlink to it), and run course help.

Unless you're actually developing the build tools, you'll probably never run master_parse or gendbc manually; bdc will do that for you.

Prerequisites

  • Ensure that you have a Python 2 environment (preferably, an activated virtual environment).
  • Ensure that you have a Java 7 or Java 8 JDK and that java is in your path. Java 9 is not supported.
  • gendbc will be installed in $HOME/local/bin. Make sure $HOME/local/bin is in your path, or your builds will fail.
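You can check these prerequisites with a small script. The sketch below is not part of the build tools. Its function names (`java_major_version`, `directory_on_path`, `check_prerequisites`) are hypothetical, and it assumes that `java -version` prints a line containing `version "..."`, as Oracle and OpenJDK builds do. It is written to run under both Python 2.7 and Python 3.

```python
import os
import re
import subprocess
import sys


def java_major_version(version_text):
    """Return the major Java version from `java -version` output, or None.

    Handles the old "1.x" scheme (1.8.0_181 -> 8) and the newer
    scheme used from Java 9 onward (9.0.4 -> 9, 11.0.2 -> 11).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_text)
    if match is None:
        return None
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def directory_on_path(directory, path_value):
    """Return True if `directory` is one of the entries in a PATH-style string."""
    wanted = os.path.normpath(os.path.expanduser(directory))
    return any(os.path.normpath(os.path.expanduser(entry)) == wanted
               for entry in path_value.split(os.pathsep) if entry)


def check_prerequisites():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[0] != 2:
        problems.append("bdc needs Python 2, but this is Python %d.%d"
                        % tuple(sys.version_info[:2]))
    try:
        # `java -version` writes to stderr, so merge stderr into the captured output.
        raw = subprocess.check_output(["java", "-version"], stderr=subprocess.STDOUT)
        major = java_major_version(raw.decode("utf-8", "replace"))
        if major not in (7, 8):
            problems.append("Java 7 or 8 is required, but found version %r" % (major,))
    except (OSError, subprocess.CalledProcessError):
        problems.append("could not run `java`; is it installed and on your PATH?")
    local_bin = os.path.join(os.path.expanduser("~"), "local", "bin")
    if not directory_on_path(local_bin, os.environ.get("PATH", "")):
        problems.append("%s is not on your PATH (gendbc is installed there)" % local_bin)
    return problems
```

To use it, call `check_prerequisites()` from the Python you plan to build with and print each string it returns. An empty list means the Python, Java, and PATH checks all passed.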

Installing the Build Tools

Using Docker

One of the simplest ways to set your build environment up is to use Docker. See the README in the docker directory for details on creating and updating a Docker-based build tool environment.

Installing the Build Tools Manually

Virtual Python Environment

bdc is currently limited to Python 2.

While it is possible to build the courseware by installing the necessary software in the system-installed (or Homebrew-installed) Python, it is not recommended. It's much better to run the build from a dedicated Python virtual environment. This document describes how to do that. If you want to use the system version of Python, you're on your own (because it's riskier).

Install pip

You'll have to install pip, if it isn't already installed. First, download get-pip.py from here: https://pip.pypa.io/en/stable/installing/

Once you have get-pip.py, install pip.

  • If you're on Linux, run this command: sudo /usr/bin/python get-pip.py
  • If you're on a Mac and not using Homebrew: sudo /usr/bin/python get-pip.py
  • If you're on a Mac and using a Homebrew-installed Python: /usr/local/bin/python get-pip.py
  • If you're on Windows and you used the standard installer: C:\Python27\python get-pip.py

Install virtualenv

  • Linux: sudo pip install virtualenv
  • Mac and not using Homebrew: sudo pip install virtualenv
  • Mac with Homebrew-installed Python: /usr/local/bin/pip install virtualenv
  • Windows: C:\Python27\Scripts\pip install virtualenv

Create a virtual environment

Create a virtual Python environment for the build. You can call it anything you want, and you can create it anywhere you want. Let's assume you'll call it dbbuild and put it in your home directory. Here's how to create the virtual environment.

From a command window, assuming you're in your home directory:

  • Linux or Mac: virtualenv dbbuild
  • Windows: C:\Python27\Scripts\virtualenv dbbuild

Activate the virtual environment

Once you have created the virtual Python environment, you'll need to activate it. You must activate the environment in every new command window you open.

(For complete details on using virtualenv, see https://github.com/pypa/virtualenv.)

  • Linux or Mac: . $HOME/dbbuild/bin/activate
  • Windows: dbbuild\Scripts\activate.bat
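To confirm that activation worked, you can ask the interpreter itself. This is a best-effort sketch with hypothetical function names, not a feature of virtualenv. It relies on three assumptions: the activate scripts export `VIRTUAL_ENV`; older virtualenv releases set `sys.real_prefix`; and venv-style environments make `sys.base_prefix` differ from `sys.prefix`.

```python
import os
import sys


def active_virtualenv():
    """Return the environment directory exported by the activate script, or None."""
    return os.environ.get("VIRTUAL_ENV")


def running_in_virtualenv():
    """Best-effort check that the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases (typical for Python 2) record the original
    # interpreter's prefix in sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv-style environments leave sys.base_prefix pointing at the base
    # installation while sys.prefix points at the environment.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix
```

Open a new command window and run these checks. If `active_virtualenv()` returns `None`, or `running_in_virtualenv()` returns `False`, run the activate command again before building.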

Installing the Tools

The course tool

If you're using course, which helps you automate your workflow, start by installing that tool. The easiest solution:

  • Choose a directory that is already in your path (e.g., $HOME/bin, /usr/local/bin)

  • cd to that directory

  • Create a symbolic link to course in that directory:

$ ln -s /path/to/repos/build-tooling/course .

Installing the rest of the build tools with course

If you're using course, you can just type:

course install-tools

to install and update the build tools. It will also install databricks-cli for you.

NOTE: course install-tools does not work for Docker-based installations. See Using Docker if you're using a Docker-based setup.

Installing the build tools manually

If you have never installed the tools in your virtual Python environment, run this command:

pip install git+https://github.com/databricks-edu/build-tooling

If you have installed the tools before, run:

pip install --upgrade git+https://github.com/databricks-edu/build-tooling

Either command installs:

  • bdc
  • master_parse
  • gendbc
  • databricks-cli

It'll take a few minutes, but it will download and install all four pieces.
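After installation, you can confirm that the commands are reachable by searching the PATH for them. This is a sketch, not part of the tools. The command names in `EXPECTED_COMMANDS` are assumptions: `databricks` is the command that databricks-cli installs, and the others come from the tool names above. The lookup only looks for executable files with exactly these names, so on Windows you may need to add extensions such as `.exe`.

```python
import os

# Hypothetical list: adjust these names to whatever the installer
# actually places on your PATH.
EXPECTED_COMMANDS = ["bdc", "master_parse", "gendbc", "databricks"]


def find_on_path(name, path_value):
    """Return the full path of an executable file called `name`, or None."""
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_commands(names, path_value=None):
    """Return the names in `names` that cannot be found on the PATH."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    return [name for name in names if find_on_path(name, path_value) is None]
```

Run `missing_commands(EXPECTED_COMMANDS)` inside the activated environment; it should return an empty list. If only `"gendbc"` is missing, `$HOME/local/bin` is probably not on your PATH.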

NOTICE

  • This software is copyright © 2017-2018 Databricks, Inc., and is released under the Apache License, version 2.0. See LICENSE.txt for details.
  • Databricks cannot support this software for you. We use it internally, and we have released it as open source, for use by those who are interested in building similar kinds of Databricks notebook-based curriculum. But this software does not constitute an official Databricks product, and it is subject to change without notice.
