box's Introduction

Box: A Next-Generation Builder for Docker Images

Join the chat at https://gitter.im/box-builder/Lobby

Box is a builder for docker that gives you the power of mruby, a limited, embeddable ruby. It allows for notions of conditionals, loops, and data structures for use within your builder plan. If you've written a Dockerfile before, writing a box build plan is easy.

Box Build Plans are Programs

Exploit this! Use functions! Set variables and constants!

Run the following plan with:

GOLANG_VERSION=1.7.5 box <plan file>

from "ubuntu"

# this function will create a new layer running the command inside the
# function, installing the required package.
def install_package(pkg)
  run "apt-get install '#{pkg}' -y"
end

run "apt-get update"
install_package "curl" # `run "apt-get install curl -y"`

# get the local environment's setting for GOLANG_VERSION, and set it here:
go_version = getenv("GOLANG_VERSION")
run %Q[curl -sSL \
    https://storage.googleapis.com/golang/go#{go_version}.linux-amd64.tar.gz \
    | tar -xvz -C /usr/local]

Powered by mruby

Box uses the mruby programming language to get a solid language syntax, functions, variables and more. However, mruby is not a fully featured Ruby such as MRI: it contains almost no standard library functionality, offers only the basic types, and permits no I/O operations outside of the box DSL.

You can however:

  • Define classes, functions, variables and constants
  • Access the environment through the getenv box function (which can also be omitted if you don't want people to use it)
  • Retrieve the contents of container files with read
  • import libraries (also written in mruby) to re-use common build plan components.

Tagging and Image Editing

You can tag images mid-plan to create multiple images, each subsets (or supersets, depending on how you look at it) of each other.

Additionally, you can use functions like after, skip, and flatten to manipulate images in ways you may not have considered:

from :ubuntu
skip do
  run "apt-get update"
  run "apt-get install curl -y"
  run "curl -sSL -O https://github.com/box-builder/box/releases/download/v0.4.2/box_0.4.2_amd64.deb"
  tag :downloaded
end

run "dpkg -i box*.deb"
after do
  flatten
  tag :installed
end

And more!

Box supports all the standard docker build commands, such as user and env, plus a few new ones:

  • with_user and inside temporarily scope commands to a specific user or working directory respectively, allowing you to avoid nasty patterns like cd foo && thing.
  • debug drop-in statement: drops you to a container in the middle of a build where you place the call.

REPL (Shell)

REPL is short for "read eval print loop" and is just a fancy way of saying this thing has readline support and a shell history. Check the thing out by invoking box repl or box shell.

Install

Using the Homebrew Tap

brew tap box-builder/box && brew install box-builder/box/box

Advanced Use

The documentation is the best resource for learning the different verbs and functions. However, check out our own build plan for box for an example of how to use different predicates, functions, and verbs to get everything you need out of it.

Development Instructions

  • Requires: compiler, bison, flex, and libgpgme, libdevmapper, btrfs headers.
  • go get -d github.com/box-builder/box && cd $GOPATH/src/github.com/box-builder/box
  • To build on the host (create a dev environment):
    • make
  • To build a docker image for your dev environment (needed for test and release builds):
    • make build
  • If you have a dev environment:
    • make test
  • To do a release build:
    • VERSION=<version> make release

box's People

Contributors

capoferro, cmaujean, errordeveloper, gitter-badger, invisiblehermit, shift, unclejack

box's Issues

debugger mode

@raggi gave me this idea over twitter.

Basically, we would have this debug or similar verb within the statement that would actually perform a docker attach with a bourne (or maybe changeable?) shell for inspection purposes. This layer would then be committed after exit and the build would continue. If it exits non-zero, it will abort the build like it always does.

Example:

from "debian"
run "echo foo > bar"
debug # shell started, stdin attached, build suspended
run "another command"

I think this could be done trivially by partitioning how the run command works a bit more, the attach+exec are currently coupled: https://github.com/erikh/box/blob/master/builder/executor/docker/docker.go#L254-L263 is the code. This way the run code could largely be re-used as long as we appropriately attach stdin in the debug case.

I also think a flag to enable/disable the verb would be useful, but I think that befits another ticket.

REPL only preserves @ and $ variables

> docker run -v /var/run:/var/run -ti erikh/box:master repl
box> a = 1

box> puts a
+++ Error: undefined method 'a' for main
box> a = 1 ; puts a
1

box> $a = 1

box> puts $a
1

box> @a = 1

box> puts @a
1

box> ^D

Feature: template keyword

This keyword would accept a source filename, target filename, and a map of parameters. The parameters would be included into a go or erb template (torn on this; go would be much simpler and erb would be more familiar to devops-y types) which would then be copied into the container.

example:

template "hosts.tmpl", "/etc/hosts", hosts: ["127.0.0.1 localhost"]
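For the erb option, the rendering step might look roughly like the sketch below. It is written for regular Ruby (MRI), since mruby ships no ERB, so box itself would need its own implementation. The helper name render_template and the template text are hypothetical, and copying the result into the container is left out.

```ruby
require "erb"

# Hypothetical helper: render ERB template source with a hash of parameters.
# Each key in params becomes a local variable inside the template.
def render_template(source, params)
  ERB.new(source, trim_mode: "-").result_with_hash(params)
end

# Stand-in for the contents of hosts.tmpl (an assumption for illustration).
HOSTS_TMPL = <<~TMPL
  <%- hosts.each do |entry| -%>
  <%= entry %>
  <%- end -%>
TMPL

rendered = render_template(HOSTS_TMPL, hosts: ["127.0.0.1 localhost"])
puts rendered
```

A Go text/template version would follow the same shape: parse the source file, execute it with the parameter map, then copy the output to the target path.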

Flatten on top

I'd like to be able to flatten the layers created by box as a layer on top of the base image in the from clause.

This would allow tenancy with multiple services using the same base image without worker nodes needing to download the entire contents of the base image over and over again with each new release of the container. This would be entirely opt-in by means of explicitly telling flatten to do this.

EX:

from "centos:7"

# install some packages, etc
# set up a service to run

flatten :onTop

feature: flatten with scope

WIP, this'll probably be edited a few times.

Right now flatten does a few things wrong:

  • it merges the from image with the rest of the images; that's probably not what people want most of the time (although the option should be added in at some point)
  • it cannot scope what is flattened and what is not, e.g., flatten two run statements into one, but leave the next copy statement alone.
  • it cannot influence the cache, and any post-flattening commands will have their cache invalidated.

Proposal:

Surface-wise, we need to extend the flatten statement itself.

flatten "cacheKey" do
  run "foo"
  run "bar"
end

The cache key can be supplied literally as an argument. A cache key of some sort (it could be derived from the environment, local files, other containers' files, etc. to ensure cache validation occurs safely) could influence the cache in a way that is safe across multiple invocations of each run, without corrupting or otherwise poorly influencing the cache. This, however, is completely driven by the end user and could lead to some very bad abuses, consistency problems in their internal build systems, etc. Maybe this should be a separate block statement so it can be filtered without losing flatten functionality.

Block mode is shown here which would allow all the statements inside the flatten to be flattened. Blocks should not be required and it is assumed that flatten without a block will flatten everything to the container before the run statement.
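As one illustration of deriving such a cache key from the environment and local files, here is a minimal sketch in regular Ruby. The cache_key helper and its inputs are hypothetical and not part of box; mruby has no Digest library, so a real implementation would live in the builder itself.

```ruby
require "digest"

# Hypothetical helper: derive a stable cache key from environment values and
# file contents. Inputs are sorted so the key does not depend on hash order.
def cache_key(env_values, file_contents)
  digest = Digest::SHA256.new
  env_values.sort.each { |name, value| digest << "env:#{name}=#{value}\0" }
  file_contents.sort.each { |path, data| digest << "file:#{path}\0" << data << "\0" }
  digest.hexdigest
end

key = cache_key(
  { "GOLANG_VERSION" => "1.7.5" },
  { "Gemfile.lock" => "example contents" }
)
puts key
```

Any change to an input value or file body produces a different key, which is what would let a flatten block invalidate its cache only when its declared inputs change.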

Internals-wise, we need to do three things:

  • solve #33
  • implement a layer recording device, since 99% of this is here already I don't think it'll take too much effort.
  • flatten hooks into new executor calls: FlattenAll() and FlattenLast(n int) which flattens the entire run or the last n elements in the run respectively.

All of the above should satisfy our flatten needs. The full flatten can have the from omitted for free if we track only the layers we created.

Multi-Build mode

Multi-Build mode is a feature I'm working on that will just let you specify multiple build plans at the same time with the same working dir and it will build them all at once.

Right now I have a small prototype in the multi-mode branch. It does not organize output and is a very naive approach to it, but it does build multiple images at once.

It should probably be extended to do a few things beyond what box already does:

  • One line per build, with a count of how many instructions it's been through (I don't think we'll have a count of the total instructions, but this is better than nothing). No statement output.
  • Disable debug
  • Failed builds should not terminate the program but instead indicate on their build line. At the end of all containers building, it should error.

It'd look something like this:

$ box build.rb build2.rb build3.rb

* build.rb: 42 statements processed
* build2.rb: 7 statements processed
* build3.rb: error: error message here!
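The failure handling described above can be sketched in plain Ruby. This is not the prototype from the multi-mode branch: the plans here are hypothetical lambdas standing in for real builds, and each returns the number of statements it processed or raises on failure.

```ruby
# Hypothetical stand-ins for build plans.
PLANS = {
  "build.rb"  => -> { 42 },
  "build2.rb" => -> { 7 },
  "build3.rb" => -> { raise "error message here!" },
}

# Run every plan in its own thread. A failure is recorded on that plan's
# line instead of aborting the other builds.
def run_all(plans)
  threads = plans.map do |name, plan|
    Thread.new do
      begin
        [name, :ok, "#{plan.call} statements processed"]
      rescue StandardError => e
        [name, :failed, "error: #{e.message}"]
      end
    end
  end
  threads.map(&:value)
end

results = run_all(PLANS)
results.each { |name, _status, line| puts "* #{name}: #{line}" }

# Only after every build has finished is the overall run marked as failed.
exit_code = results.any? { |_name, status, _line| status == :failed } ? 1 : 0
```

A real implementation would pass exit_code to exit and would redraw one line per build as it progresses rather than printing once at the end.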

Reusability

Essentially, one of the major limitations of Dockerfile for me has been that the only way to achieve reusability is to either write a shell script or create an image. I cannot see any example of loading helper methods from another file. Does it work just like you'd expect in Ruby, or would I have to do something special? (I wouldn't need gems, although it would be nice to be able to share stuff, but local libraries would be a great option.)

cannot add files into a volume

> docker inspect wordpress
...
            "Image": "sha256:4bc7f70a42b0858b60fc0e780fabb387c8934faf4bf626876b40e7e1a665485d",
            "Volumes": {
                "/var/www/html": {}
            },
            "WorkingDir": "/var/www/html",
...
> cat wordpress-import.rb
from "wordpress"

inside "/tmp" do
  run "mkdir -p site"
  run "touch site/foo"
end

run "cp -vr /tmp/site /srv/site"
run "cp -vr /tmp/site site"

debug

tag "wordpress-import"
> make wordpress-import
docker run --rm -ti \
	  -v /Users/ilya/Code/wordepress/src/github.com/weaveworks/wordepress/wordpress.local/tools:/Users/ilya/Code/wordepress/src/github.com/weaveworks/wordepress/wordpress.local/tools \
	  -v /var/run/docker.sock:/var/run/docker.sock \
	  -w /Users/ilya/Code/wordepress/src/github.com/weaveworks/wordepress/wordpress.local/tools \
	    erikh/box:latest wordpress-import.rb
+++ Execute: from wordpress
+++ Execute: inside /tmp
+++ Execute: run mkdir -p site
+++ Cache hit: using "sha256:8dc8c36d6ac030bc14db3f799547637592b75971fdf27638de3a308ce6279dc2"
+++ Execute: run touch site/foo
+++ Cache hit: using "sha256:93e26e0bf7d923190a3d37d7e8d7b686922f059657e83594c80f8ab2bdd2ee30"
+++ Execute: run cp -vr /tmp/site /srv/site
------ BEGIN OUTPUT ------
'/tmp/site' -> '/srv/site'
'/tmp/site/foo' -> '/srv/site/foo'
------- END OUTPUT -------
+++ Execute: run cp -vr /tmp/site site
------ BEGIN OUTPUT ------
'/tmp/site' -> 'site'
'/tmp/site/foo' -> 'site/foo'
------- END OUTPUT -------
+++ Execute: debug 
root@b5ba1083dad3:/var/www/html# ls site
ls: cannot access site: No such file or directory
root@b5ba1083dad3:/var/www/html# ls -la
total 8
drwxr-xr-x 2 www-data www-data 4096 Dec  1 14:51 .
drwxr-xr-x 4 root     root     4096 Nov  8 23:22 ..
root@b5ba1083dad3:/var/www/html# ls -la /srv/site
total 8
drwxr-xr-x 2 root root 4096 Dec  1 14:51 .
drwxr-xr-x 3 root root 4096 Dec  1 14:51 ..
-rw-r--r-- 1 root root    0 Dec  1 14:51 foo
root@b5ba1083dad3:/var/www/html#

...I happen to hit https://github.com/erikh/box/issues/54 every other time with this plan.

flatten copies everything back in as root

Related to #5

The cause of this is again moby/moby#21651 but we're doing something different here that we can work around by using other APIs.

This is potentially possible to hack around by using ContainerExport, tar it back up, attach it to an image manifest, and upload it with ImageImport.

I'm working on an impl of this now, but it may take a bit. Jotting it down here in case anyone sees the problem and wants to report on it.

without-tty mode

If a tty isn't present, the output from run, etc may not be very nice. If a tty is not present we should:

  • display more useful text in from invocation
  • run option should not be passed TTY flags.

We should also make this a runtime flag.

copy can overwrite a directory with a file

> cat apache.rb 
from "wordpress:php7.0-apache"

inside "/etc/apache2" do
  copy "apache2.conf", "./"
end

> docker run --rm -ti -v /Users/ilya/Code/wordepress/src/github.com/weaveworks/wordepress/wordpress.local/tools:/Users/ilya/Code/wordepress/src/github.com/weaveworks/wordepress/wordpress.local/tools -v /var/run:/var/run -w /Users/ilya/Code/wordepress/src/github.com/weaveworks/wordepress/wordpress.local/tools erikh/box@sha256:2e25cc0af3935b535308b6da8fddc13f206eeca1f81aa92a493876bfd2a87b93 apache.rb
+++ Execute: from wordpress:php7.0-apache
+++ Execute: inside /etc/apache2
+++ Execute: copy apache2.conf, ./
+++ Eval Response: sha256:d8ee465090213abc4e2e55c589083568d542c6cf195c82ca01f169ad7034faa3
+++ Finish: d8ee465090213abc4e2e55c589083568d542c6cf195c82ca01f169ad7034faa3

> docker run -ti d8ee465090213abc4e2e55c589083568d542c6cf195c82ca01f169ad7034faa3 ls -la /etc/apache2
-rw-r--r-- 1 root root 1119 Dec  6 14:39 /etc/apache2

> docker run -ti d8ee465090213abc4e2e55c589083568d542c6cf195c82ca01f169ad7034faa3 ls -la /etc/apache2/
ls: cannot access /etc/apache2/: Not a directory

Google Cloud Builder

Google Cloud Builder would be a good candidate for a fully-remote builder/executor implementation. It supports async and parallel modes, but it's probably easier to start with a serial implementation.
I'm still learning how GCB works in another project and will update once I have more info.

build from host machine (cm tool)

basically, this would allow your build plan to generate an image based on your host filesystem. Presumably this would only work on linux hosts.

Not sure how possible this is, but I'd like to try.

from "scratch" doesn't work

from "scratch"

copy "glue", "/glue"

entrypoint "/glue"
tag "xena/glue"

$ box box.rb
[box.rb] +++ Execute: from scratch
[box.rb] !!! Error: Error response from daemon: 'scratch' is a reserved name

progress meters for i/o with docker

Pulls are covered because of the API, but actual downloads of images and uploads of them to docker are currently silent and take some time.

run dockerfiles

The dockerfile parser is well segmented from the rest of the docker codebase. Extract the parser and apply it to box's executor interface (docker integration only for now).

Omit statements that we don't support, like VOLUME.

build images with runc

This would enable the exploration of different ideas around the build process. It would also make it possible to produce images for runc.

copy should respect user and with_user statements

Basically, copy always copies as root right now. If we getuid/getgid a username inside the container right before copying it, we can modify the tar data at just the right time, allowing this to happen appropriately.

This is a fair bit of work.

Way to remove build tooling.

Many containers build code in some fashion during their construction. However, for convenience reasons build tools are often left inside the resulting container image. This is not ideal. It would be nice if box provided a convenient way to remove the excess layers.

Some ways of accomplishing this that I can see are:

  1. Build an artifact in one container but move that layer to be based on a different image at the end -- effectively omitting intermediate layers.
  2. Allow a way to flatten an image with a whitelist directory. E.g. "Flatten all these layers, and only keep files from this particular directory."
  3. Allow some layers to be specified as ephemeral. These layers could be removed at the end of a build with parents relinked appropriately by box.

REPL mode

REPL mode that lets you enter commands one at a time and commits as-you-go. This would also include a few new verbs:

  • reset <sha> reset to a known sha
  • rewind reset to the last sha
  • inspect shows data about the current layer
  • shell gets a shell in a fresh container at the current layer

certain entities are not resetting the exec config

in particular, the combination of run and user behaves this way:

run "useradd -m -s /bin/bash terraform"
user "terraform"

Not sure why yet. It could be related to this image not having a cmd or entrypoint and some inheritance issues are happening.

panics on string argument extraction

This is related to a design flaw in go-mruby which needs to be patched to change the automatic conversion functions to return errors when conversions cannot occur. Right now, it dereferences a null pointer.

This will take some work and I will reference the relevant issues here, I just wanted to make a note of it for users who are new to the platform.

copy "." doesn't tolerate symlinks

I've seen this:

[Boxfile] !!! Error: Rel: can't make . relative to ../../examples

...and it was in my project, so I've worked around it.
But my project has vendor/github.com/docker/docker and that resulted in:

[Boxfile] !!! Error: Rel: can't make . relative to ../../../contrib/init/sysvinit-debian/docker.default

I've tried using ignore_list, but it didn't seem to work, so I ended-up deleting those symlinks manually.

I'm looking into fixing this.

flaky errors with `debug`

When using the debug command, I see this once in a while:

+++ Run Error: read unix @->/var/run/docker.sock: use of closed network connection

!!! Error: Could not remove intermediate container "11f3be63b86811762ceb4084c2344816364d78a1afd59b7d129f0def2c3c222e": Error response from daemon: No such container: 11f3be63b86811762ceb4084c2344816364d78a1afd59b7d129f0def2c3c222e
make: *** [wordpress-import] Error 1

Sometimes it only complains about the socket and doesn't complain about removing intermediate container:

+++ Run Error: write unix @->/var/run/docker.sock: use of closed network connection

+++ Tagged: wordpress-import

+++ Eval Response: sha256:1178c61d6a163aff1ca54b3b22fc2f134e75474c9ac3e97cd414f7bb38599482
+++ Finish: 1178c61d6a163aff1ca54b3b22fc2f134e75474c9ac3e97cd414f7bb38599482

It happens either before I get the debug shell, or once I exit it.

errors encountered during mkdocs related install

[/dev/stdin] +++ Execute: run pip -q install mkdocs mkdocs-bootswatch
[/dev/stdin] ------ BEGIN OUTPUT ------
Compiling /tmp/pip-build-1v7jJr/Jinja2/jinja2/asyncfilters.py ...
  File "/tmp/pip-build-1v7jJr/Jinja2/jinja2/asyncfilters.py", line 7
    async def auto_to_seq(value):
            ^
SyntaxError: invalid syntax

Compiling /tmp/pip-build-1v7jJr/Jinja2/jinja2/asyncsupport.py ...
  File "/tmp/pip-build-1v7jJr/Jinja2/jinja2/asyncsupport.py", line 22
    async def concat_async(async_gen):
            ^
SyntaxError: invalid syntax

Copy progress is overlapping

Right now the copy progress meter will overrun the line, repeating output in a very annoying fashion on every interval until the copy is finished.

The problem is that we aren't counting the amount of text in plain text (as opposed to ansi colored) for calculating the length of the string, instead we are counting escape codes and that is bad.

A pure fix would be to get the logger to yield this information instead of parsing it out, but tracking it might not be worth the trouble too. Dunno.
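The parsing approach amounts to stripping the escape codes before measuring. A minimal sketch in Ruby, assuming the logger only emits SGR color sequences (other escape sequences would need a broader pattern); visible_length is a hypothetical helper, not box's actual code:

```ruby
# Matches ANSI SGR color sequences such as "\e[31m" or "\e[1;32m".
ANSI_SGR = /\e\[[0-9;]*m/

# Length of the text as it appears on screen, ignoring color codes.
def visible_length(text)
  text.gsub(ANSI_SGR, "").length
end

line = "\e[32m+++\e[0m Copying: 42%"
raw_length = line.length            # counts the escape codes too
shown_length = visible_length(line) # counts only what the terminal displays
puts "raw=#{raw_length} shown=#{shown_length}"
```

Padding or truncating the progress line against shown_length rather than raw_length keeps it from wrapping past the terminal width.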

label verb

Would be great to be able to set labels, e.g. org.label-schema.vcs-url and org.label-schema.vcs-ref are quite useful for implementing CD etc.

One option to consider is that labels may be passed to tag, as you might want to associate different sets of labels with different tags. It'd make a lot of sense with how we can already set tags as we progress through a build plan.

def build_base_image
end

def enable feature
end

build_base_image

rev = getenv("GIT_COMMIT")
features = getenv("BUILD_FEATURES").split(",")

labels = {
  "org.label-schema.vcs-ref" => rev
}

tag "foo:minimal-#{rev}", with_labels: labels

features.each do |feature|
  enable feature
  labels["org.example.features.#{feature}"] = true
end

tag "foo:full-#{rev}", with_labels: labels

verb/func filtering at the commandline

Command line option to remove verbs/funcs from the builder so that they cannot be used and will abort the build. This allows users to tune their level of comfort with some of the power that box provides.

new statement: setexec

setexec takes a key/value pair with two potential elements:

  • entrypoint: a description of the entrypoint in array form
  • cmd: a description of the cmd in array form

This is to prevent races in statements where the entrypoint and cmd could trump each other out of existence. This behavior is really confusing. This statement puts them both into one, allowing the user to set them carefully at the same time.

Remove stdin default

Right now, if a filename is not provided it listens to stdin for instructions.

In practice, this is ineffective. We'll implement a repl instead per #13 which will solve this problem for most people; the rest can use /dev/stdin or equivalent.

add post-build hooks

This would make it possible to tag the image after the build.

after { tag "foo" }

cmd gets set to one of the commands...

With this plan:

from "wordpress"

inside "/tmp" do
  %w(code files).each { |t| copy "./pantheon-data/#{t}.tar.gz", "./pantheon-data/#{t}.tar.gz" }
  run "mkdir -p /srv/site/wp-content/uploads"
  run "tar xzf pantheon-data/code.tar.gz -C /srv/site --strip-components=1"
  run "tar xzf pantheon-data/files.tar.gz -C /srv/site/wp-content/uploads --strip-components=1"
  run "chown www-data -R /srv/site"
  run "rm -rfv pantheon-data"
end

inside "/etc/apache2/sites-available" do
  run "sed 's|\\(DocumentRoot\\ \\)/var/www/html|\\1/srv/site|' -i 000-default.conf default-ssl.conf"
end

workdir "/srv/site"

flatten

tag "wordpress-snapshot"

I somehow end-up with "Cmd": [ "rm -rfv pantheon-data" ]...

export function

The export function would replace the tag verb (which does not actually create a layer). It would be responsible for all image authoring outside of docker commit; additionally:

  • saving to file
  • tagging images
  • converting images to other formats that are not docker (e.g., OCI)

Something like this:

export type: "docker", tag: "foo"
# or
export type: "oci", tag: "foo-oci", file: "foo-oci.tar.gz"

Again, this would replace the tag verb and the tag verb would be rendered obsolete in 0.6 or so.

Please comment! I would love to hear feedback specifically on this feature.

Fixups: Integration test suite

Either directly with go or use something like bats. We need to evaluate:

  • tty mode
  • debug mode (when available)
  • all flags
  • running in various execution contexts

copy: do not allow traversing above the working directory

Right now if you specify a path, it will be coerced into a relative path. If you specify a relative path with a higher level directory as its target such as .., it will traverse and copy everything underneath it into the container.

I don't really think this is the best idea.
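One way to reject such paths is to resolve the source against the working directory and refuse anything that lands outside it. A minimal sketch in Ruby; inside_workdir? is a hypothetical helper, not box's code, and it does not resolve symlinks, which a real check would also need to consider.

```ruby
# Hypothetical check: does `path`, resolved against `workdir`, stay inside it?
# File.expand_path normalizes ".." segments but does not follow symlinks.
def inside_workdir?(workdir, path)
  root = File.expand_path(workdir)
  target = File.expand_path(path, root)
  # Compare against the root plus a separator so "/work2" is not treated
  # as being inside "/work".
  prefix = root.end_with?(File::SEPARATOR) ? root : root + File::SEPARATOR
  target == root || target.start_with?(prefix)
end

puts inside_workdir?("/work", "src/main.go") # a path below the working dir
puts inside_workdir?("/work", "..")          # the parent directory
```

With a check like this, copy could abort the build with an error instead of silently copying everything under the parent directory.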
