thurber

Apache Beam and Google Cloud Dataflow on Clojure. The walkthrough explains everything.

This is alpha software. Watch release notes carefully. API subject to mood swings.

Quickstart

  1. Clone & cd into this repository.
  2. lein repl
  3. Copy & paste:
(ns try-thurber
  (:require [thurber :as th]
            [thurber.sugar :refer :all]))

(->
  (th/create-pipeline)

  (th/apply!
    (read-text-file
      "demo/word_count/lorem.txt")
    (th/fn* extract-words [sentence]
      (remove empty? (.split sentence "[^\\p{L}]+")))
    (count-per-element)
    (th/fn* format-as-text
      [[k v]] (format "%s: %d" k v))
    (log-sink))

  (th/run-pipeline!))

Output:

...
INFO thurber - extremely: 1
INFO thurber - undertakes: 1
INFO thurber - pleasure: 7
INFO thurber - you: 2
...
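
To see what the pipeline computes without running Beam at all, the same steps can be sketched in plain Clojure over an in-memory collection. This is only an illustration: word-count-lines is a hypothetical name, and frequencies stands in for count-per-element. It is not how thurber executes the pipeline.

```clojure
;; Same splitting logic as extract-words in the quickstart:
;; split on runs of non-letters and drop empty strings.
(defn extract-words [sentence]
  (remove empty? (.split sentence "[^\\p{L}]+")))

;; Same formatting logic as format-as-text in the quickstart.
(defn format-as-text [[k v]]
  (format "%s: %d" k v))

;; Hypothetical in-memory stand-in for
;; read-text-file -> extract-words -> count-per-element -> format-as-text.
(defn word-count-lines [lines]
  (->> lines
       (mapcat extract-words)
       frequencies
       (map format-as-text)))

;; Returns one "word: count" string per distinct word; order is not guaranteed.
(word-count-lines ["you take pleasure" "pleasure, you"])
```

Because each step is an ordinary function, it can be evaluated on its own at the REPL before being placed in a pipeline.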

Project Goals

  • Enable Clojure
    • Bring Clojure's powerful, expressive toolkit (destructuring, immutability, REPL, async tools, etc.) to Apache Beam.
  • REPL Oriented
    • Functions are idiomatic, pure Clojure functions by default. (E.g., lazy sequences are supported, so emitting output elements one at a time is optional.)
    • Develop and test pipelines incrementally from the REPL.
    • Evaluate/learn Beam semantics (windowing, triggering) interactively.
  • Avoid Macros
    • Limit macro infection. Most thurber constructions are macro-less; use of any thurber macro construction (like inline functions) is optional.
  • AOT Nothing
    • Fully dynamic experience. Reload namespaces at whim. thurber's dependencies on specific Beam, Clojure, etc. versions are completely dynamic/floatable. No forced AOT'd dependencies.
  • No Lock-in
    • Pipelines can be composed of Clojure and Java transforms. Incrementally refactor your pipeline to Clojure or back to Java.
  • Not Afraid of Java Interop
    • Wherever Clojure's Java Interop is performant and works cleanly with Beam's fluent API, encourage it; facade/sugar functions are simple to create and left to your own domain-specific implementations.
  • Completeness
    • Support all Beam capabilities (Transforms, State & Timers, Side Inputs, Output Tags, etc.)
  • Performance
    • Be finely tuned for data streaming.
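
As a rough illustration of the REPL Oriented goal above, a stream function can be an ordinary pure function that returns a lazy sequence and is tested with no pipeline at all. The name word-pairs is hypothetical and not part of thurber, and how thurber consumes the returned sequence is not shown here.

```clojure
;; Hypothetical helper (not part of thurber): pairs each element with its
;; successor. map returns a lazy sequence, so nothing is computed until the
;; result is consumed.
(defn word-pairs [words]
  (map vector words (rest words)))

;; Evaluate directly at the REPL:
(word-pairs ["a" "b" "c"])
;; => (["a" "b"] ["b" "c"])

;; Laziness means it also works on an unbounded input.
(take 2 (word-pairs (range)))
;; => ([0 1] [1 2])
```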

Documentation

Demos

Each namespace in the demo/ source directory is a pipeline written in Clojure using thurber. Comments in the source highlight salient aspects of thurber usage.

Along with the code walkthrough, these are the best way to learn thurber's API, and they serve as recipes for various scenarios (use of tags, side inputs, windowing, combining, Beam's State API, etc.).

To execute a demo, start a REPL and evaluate (demo!) from within the respective namespace.

Word Count

The word_count package contains ports of Beam's Word Count Examples to Clojure/thurber.

Mobile Gaming Examples

Beam's Mobile Gaming Examples have been ported to Clojure using thurber.

These are fully functional ports. They require deployment to GCP Dataflow.

Make It Fast

First make your pipeline work. Then make it fast.

Streaming/big data implies hot code paths. Use Clojure type hints liberally within your stream functions.
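
A minimal sketch of what a type hint changes, using the word-splitting function from the quickstart. The name extract-words-reflective is hypothetical and exists only for contrast. Enabling *warn-on-reflection* makes the compiler report interop calls it cannot resolve at compile time.

```clojure
;; Evaluate at the REPL (or put at the top of a source file) to have the
;; compiler warn about reflective interop calls.
(set! *warn-on-reflection* true)

;; No hint: the compiler cannot tell which .split to call, so the method is
;; looked up by reflection each time the function runs (and a warning is
;; printed when this form is compiled).
(defn extract-words-reflective [sentence]
  (remove empty? (.split sentence "[^\\p{L}]+")))

;; With a ^String hint the call compiles to a direct String.split invocation.
(defn extract-words [^String sentence]
  (remove empty? (.split sentence "[^\\p{L}]+")))

;; Both versions return the same words; only the call overhead differs.
(extract-words "pleasure, you")
;; => ("pleasure" "you")
```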

If deploying to GCP, use Dataflow profiling to zero in on areas to optimize.

License

Copyright © 2020 Aaron Dixon

Like Clojure, distributed under the Eclipse Public License.
