docker-hackathon

Docker Hackathon: "Data Science" on index.docker.io

I am collecting (most of the work) and analyzing (most of the fun) stats related to index.docker.io.

This would be a little easier if I could infiltrate index.docker.io, but that would be less fun.

Since I won't have an app to show and my commits will look cryptic, I will document what I am doing.

Step 1: Gathering repo names

I am using the index's search API for this.

Specifically, my search terms are increasingly long sequences of letters. I spun up an EC2 instance and ran scripts/kiyoto/search.sh with different search terms.
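The term generation and the merging of results can be sketched in Python. This assumes "increasingly long" means every lowercase combination of length 1, then length 2, and so on; the actual term list used by search.sh isn't shown here. `search` is a hypothetical stand-in for one query against the search API that returns repo names; the real endpoint and response format are not reproduced.

```python
from itertools import product
from string import ascii_lowercase


def search_terms(max_len, alphabet=ascii_lowercase):
    """Yield every string over `alphabet` of length 1..max_len, shortest first."""
    for length in range(1, max_len + 1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


def collect_repo_names(search, max_len):
    """Run every term through `search` (hypothetical: term -> iterable of repo
    names) and return the distinct repo names, sorted."""
    seen = set()
    for term in search_terms(max_len):
        # A set drops repos that several terms return more than once.
        seen.update(search(term))
    return sorted(seen)
```

The number of terms grows as 26^n, so the maximum length has to stay small; de-duplicating into a set keeps the repo count honest when overlapping terms hit the same repos.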

Step 2: Gathering metadata about the repos

Once you have a reasonable number of repo names, the next step is to fetch their metadata. Andy@Docker pointed me to this nifty tool for pulling metadata.

Actually, I ended up modifying the original, learning to read and write some Go for the first time along the way. My fork is here.

6/8/2014 1145h: Done collecting the data. There are 13,475 repositories with 11,251 distinct image ids. The data is in data/updates_repo_names.json and data/tree_image_ids.json. The script "inverts" the response of the Docker API, since we want to know which image id has the most child images.
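The "inversion" can be sketched as follows. This assumes the API data has already been reduced to a child-to-parent map (image id to parent id, with None for a root); that shape is an assumption, since the format of data/tree_image_ids.json isn't shown here. Inverting it gives a parent-to-children map, from which the image with the most direct children falls out.

```python
from collections import defaultdict


def invert_parents(parent_of):
    """Turn a child -> parent map into a parent -> [children] map.
    Roots (parent None) contribute no edge."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        if parent is not None:
            children[parent].append(child)
    return dict(children)


def most_children(children):
    """Return (image_id, count) for the image with the most direct children.
    Raises ValueError if `children` is empty."""
    parent = max(children, key=lambda p: len(children[p]))
    return parent, len(children[parent])
```

Counting only direct children is one reading of "most child images"; counting all descendants would instead require walking each subtree.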

Step 3: Webapp for the genealogy

Here is a minimal webapp that lets you traverse the image id tree.
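The webapp itself isn't reproduced here. As a rough sketch of what traversing the tree involves, assuming a parent-to-children map like the one obtained by inverting the API response, a depth-first walk from one root yields every descendant together with its depth. The function name `walk` is hypothetical.

```python
def walk(children, root):
    """Yield (depth, image_id) for `root` and all its descendants,
    depth-first in pre-order. `children` maps parent id -> [child ids]."""
    stack = [(0, root)]
    while stack:
        depth, image = stack.pop()
        yield depth, image
        # Push children in reverse so they are visited in their listed order.
        for child in reversed(children.get(image, [])):
            stack.append((depth + 1, child))


if __name__ == "__main__":
    tree = {"base": ["py", "node"], "py": ["app"]}
    for depth, image in walk(tree, "base"):
        print("  " * depth + image)
```

An explicit stack avoids Python's recursion limit on long image chains.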

Some interesting data (as of 6/8/2014)

  • 13,475 repos found
  • 101,477 image ids found
  • 792 "root" image ids
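Figures like these reduce to simple counts once the data is in hand. A minimal sketch, assuming a child-to-parent map (a hypothetical shape, as above): it counts distinct image ids and "root" ids, meaning ids with no known parent. Treating an id whose parent was never fetched as a root is a choice made here, not necessarily how the 792 figure was computed.

```python
def tree_stats(parent_of):
    """Return (distinct image ids, root ids) for a child -> parent map."""
    # Every id that appears either as a child or as someone's parent.
    all_ids = set(parent_of) | {p for p in parent_of.values() if p is not None}
    # A root has no known parent: its parent is None, or it never appeared as a key.
    roots = {i for i in all_ids if parent_of.get(i) is None}
    return len(all_ids), len(roots)
```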
