Big Data with Spark HATS

This Hands-on Advanced Tutorial Session (HATS) is presented by the LPC to demonstrate a CMS analysis using Apache Spark, Spark-ROOT, Histogrammar, and Matplotlib. After an introduction to Spark and the programming paradigm it brings with it, students will learn some basic building blocks and then combine them to perform a basic measurement of the Z-boson mass using CMS data recorded in 2016.
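
To give a rough idea of the quantity behind that measurement, here is a minimal sketch of a dimuon invariant-mass calculation in plain Python. It is not taken from the tutorial notebooks: the function name `dimuon_mass` and the example kinematics are hypothetical, and the muons are treated as massless, which is a common approximation at these energies.

```python
import math


def dimuon_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass (GeV) of two muons, neglecting the muon mass.

    Uses m^2 = 2 * pt1 * pt2 * (cosh(eta1 - eta2) - cos(phi1 - phi2)).
    """
    m_squared = 2.0 * pt1 * pt2 * (
        math.cosh(eta1 - eta2) - math.cos(phi1 - phi2)
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(m_squared, 0.0))


# Hypothetical example: two back-to-back muons with 45 GeV of transverse
# momentum each, both at eta = 0.
mass = dimuon_mass(45.0, 0.0, 0.0, 45.0, 0.0, math.pi)
print(f"dimuon mass: {mass:.1f} GeV")
```

In a Spark-based analysis, a function like this would typically be applied to every selected muon pair and the results filled into a histogram, where the Z boson shows up as a peak near 91 GeV.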

Getting Started

Students of the HATS will be given access to Vanderbilt's Jupyter instance using their GitHub username. The Jupyter instance contains this repository and all necessary software, preconfigured.

Pre-Exercises

It is critical that each student complete the pre-exercises the day before the tutorial, so that any technical or login issues can be cleared up beforehand. To perform the pre-exercises, connect to Jupyter. You will first need to log in to GitHub and authorize Jupyter to authenticate you (don't worry, GitHub doesn't transfer your password, just a secret authentication token). GitHub will present this as a request to give me, PerilousApricot, access to your credentials.

Once you've given Jupyter permission to authenticate, click "Start My Server" to start your Jupyter instance.

Once your server starts, you'll be placed into the Jupyter file browser. Then, navigate to

spark-hats/notebooks/00-preexercise.ipynb

to begin the pre-exercise.

Accessing this Tutorial in Jupyter

Once you are logged in to Jupyter, navigate to the spark-hats directory and open the notebook named Start-Here.ipynb.

Built With

  • Jupyter - Interactive Python notebook interface
  • Apache Spark - Fast and general engine for large-scale data processing
  • Spark-ROOT - Scala-based ROOT/IO interface to Spark
  • Histogrammar - Functional histogramming framework, optimized for Spark
  • Matplotlib - Python plotting library

Authors

Acknowledgments

  • The LPC Distinguished Researcher Program (link) - Support for the author
  • Advanced Computing Center for Research and Education (ACCRE) (link) - Host facility and sysadmin support
  • The Diana-HEP project (link) - Interoperability and compatibility libraries
  • Vanderbilt Trans Institutional Program (TIPs) Award (link) - Big Data hardware seed funding
