About Stratio Sparkta

Since Aryabhatta invented zero, mathematicians such as John von Neumann have been in pursuit of efficient counting, and architects have constantly built systems that compute counts quicker. In this age of social media, where hundreds of thousands of events take place every second, we were inspired by Twitter's Rainbird project to develop a distributed aggregation engine with these high-level features:

  • Pure Spark
  • No coding required, only declarative aggregation workflows
  • Data continuously streamed in and processed in near real time
  • Ready to use, plug and play
  • Flexible workflows (input, output, parsers, etc...)
  • High performance
  • Scalable
  • Business Activity Monitoring
  • Visualization

Strataconf London 2015 slideshare

Introduction

Social media and networking sites are part of the fabric of everyday life, changing the way the world shares and accesses information. The overwhelming amount of information gathered not only from messages, updates and images but also readings from sensors, GPS signals and many other sources was the origin of a (big) technological revolution.

This vast amount of data allows us to learn from the users and explore our own world.

We can follow the evolution of a topic, an event or even an incident in real time just by exploring aggregated data.

But beyond cool visualizations, there are some core services delivered in real-time, using aggregated data to answer common questions in the fastest way.

These services are the heart of the business behind the nice logos of digital companies.

Examples include site traffic, user engagement monitoring, service health, APIs, internal monitoring platforms and real-time dashboards.

Aggregated data feeds directly to end users, publishers, and advertisers, among others.

With Sparkta we want to help you start delivering real-time services. Real-time monitoring is nice to have, but your company also needs to work the way digital companies do: rethinking existing processes to deliver them faster and better, and creating new opportunities for competitive advantage.

Features

  • Highly business-project oriented
  • Multiple applications
  • Cubes
    • Time-based
    • Secondly, minutely, hourly, daily, monthly, yearly...
    • Hierarchical
    • GeoRange: Areas with different sizes (rectangles)
    • Flexible definition of aggregation policies (JSON, web app)
  • Operators:
    • Max, min, count, sum, range
    • Average, median
    • Stdev, variance, count distinct
    • Last value
    • Full-text search
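
To make the idea of a time-based cube more concrete, here is a minimal sketch in plain Python. It is not Sparkta's API and does not use Spark: the function `aggregate`, the `GRANULARITIES` and `OPERATORS` tables, and the event fields (`timestamp`, `hashtag`, `value`) are all hypothetical names chosen for illustration. It groups events by a dimension and a time bucket, then applies a few of the operators listed above to each group. It works on a small in-memory batch, whereas Sparkta processes a continuous stream.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median

# Hypothetical granularities: truncate a timestamp to the start of its bucket.
GRANULARITIES = {
    "minute": lambda t: t.replace(second=0, microsecond=0),
    "hour": lambda t: t.replace(minute=0, second=0, microsecond=0),
    "day": lambda t: t.replace(hour=0, minute=0, second=0, microsecond=0),
}

# Hypothetical operator table; the names mirror some operators in the list above.
OPERATORS = {
    "count": len,
    "sum": sum,
    "max": max,
    "min": min,
    "range": lambda xs: max(xs) - min(xs),
    "avg": mean,
    "median": median,
    "count_distinct": lambda xs: len(set(xs)),
    "last_value": lambda xs: xs[-1],
}


def aggregate(events, dimension, granularity, operators):
    """Group events by (dimension value, time bucket) and apply each operator."""
    truncate = GRANULARITIES[granularity]
    cells = defaultdict(list)
    for event in events:
        key = (event[dimension], truncate(event["timestamp"]))
        cells[key].append(event["value"])
    return {
        key: {name: OPERATORS[name](values) for name in operators}
        for key, values in cells.items()
    }


# Illustrative events (made up for this sketch).
events = [
    {"timestamp": datetime(2015, 5, 5, 10, 0, 12), "hashtag": "spark", "value": 3},
    {"timestamp": datetime(2015, 5, 5, 10, 0, 48), "hashtag": "spark", "value": 7},
    {"timestamp": datetime(2015, 5, 5, 10, 1, 5), "hashtag": "spark", "value": 1},
    {"timestamp": datetime(2015, 5, 5, 10, 1, 30), "hashtag": "kafka", "value": 4},
]

cube = aggregate(events, "hashtag", "minute", ["count", "sum", "max", "range"])
for (hashtag, bucket), metrics in sorted(cube.items()):
    print(hashtag, bucket.isoformat(), metrics)
```

Changing the `granularity` argument from `"minute"` to `"hour"` or `"day"` merges the same events into coarser buckets, which is the essence of the secondly/minutely/hourly hierarchy described in the feature list.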

Architecture

Sparkta overview

Architecture

Key technologies

Input/Outputs

Inputs

  • Twitter
  • Kafka
  • Flume
  • RabbitMQ
  • Socket

Outputs

  • MongoDB
  • Cassandra
  • ElasticSearch
  • Redis
  • Spark's DataFrames Outputs
  • PrintOut
  • CSV
  • Parquet

Build

You can generate rpm and deb packages by running:

mvn clean package -Ppackage

Note: the following programs must be installed in order to build these packages.

On a Debian distribution:

  • fakeroot
  • dpkg-dev
  • rpm

On a CentOS distribution:

  • fakeroot
  • dpkg-dev
  • rpmdevtools

Sandbox

Documentation

