
stratosphere.github.io's Introduction

Repository Has Moved

Stratosphere is now an Apache incubator project and has been renamed to Apache Flink.

This repository will not be maintained anymore.

Please move to the following GitHub repository:

git clone https://github.com/apache/incubator-flink.git

Thanks!


If you have an existing clone of the old Stratosphere repository, you can update your remote to point to the new repository:

git remote set-url origin https://github.com/apache/incubator-flink.git

stratosphere.github.io's People

Contributors

aalexandrov, aljoscha, andrehacker, arheinlaender, asteriosk, dimalabs, fhueske, filiphaase, ktzoumas, lauritzthamsen, mariemayadi, markus-h, moewex, physikerwelt, pims, qmlmoon, rmetzger, sarathsomana, sdudoladov, skunert, tillrohrmann, tommy-neubert, tongr, twalthr

stratosphere.github.io's Issues

Make it clearer that Stratosphere is not built upon Hadoop

The website does not emphasize enough that Stratosphere has its own MapReduce runtime and does not use Hadoop's MapReduce.

At the beginning, I also thought that Stratosphere uses Hadoop.

Some statements like

"It combines the strengths of MapReduce/Hadoop with powerful programming abstractions in Java" on the first page.

and

"Stratosphere for Hadoop 1/Hadoop 2" in the download section are very misleading.

In my opinion in the download section we should completely leave out the term "Hadoop" and use "HDFS" instead.

Maybe we can prevent questions like the most recent one:
https://groups.google.com/forum/#!topic/stratosphere-dev/-WSxxtsdCSo

Add description how to use SNAPSHOT quickstarts

The snapshot quickstarts are only available in the Sonatype repository, not in Maven Central. To use the 0.5-SNAPSHOT quickstarts directly, one needs to add the Sonatype repository to the list of known repositories.

The website currently does not describe how to do so.

Text for Introduction

I cannot make a pull request for this section, because I am on mobile internet and cannot afford to clone the stratosphere.github.io repository. I have pasted my text below. Sorry Ufuk, for causing additional work.

Introduction

Analysis programs in Stratosphere are regular Java programs that implement transformations on data sets (e.g., filtering, mapping, joining, grouping). The data sets are initially created from certain sources (e.g., by reading files, or from collections). The results are returned by sinks, which may for example write the data to (distributed) files, or print it to the command line. The sections on the program skeleton and transformations show the general template of a program and describe the available transformations.

Stratosphere programs can run in a variety of contexts, for example locally as standalone programs, locally embedded in other programs, or on clusters of many machines (see [program skeleton] for how to define different environments). All programs are executed lazily: When the program is run and a transformation method on a data set is invoked, it only creates a specific transformation operation. That transformation operation is executed only once program execution is triggered on the environment. Whether the program is executed locally or on a cluster depends on the environment of the program.

In contrast to Stratosphere's Record API, the Java API is strongly typed: All data sets and transformations accept typed elements rather than generic records. This allows typing errors to be caught very early and supports safe refactoring of programs.
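The lazy, typed execution model described above can be illustrated with a small toy model. This is only a sketch in plain Java: the classes `Env` and `DataSet` and the methods `fromElements`, `filter`, `map`, `collectInto`, and `execute` are hypothetical stand-ins written for this illustration, not the actual Stratosphere classes or signatures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Toy model of lazy, strongly typed data set transformations (NOT the Stratosphere API). */
public class LazySketch {

    /** Hypothetical environment: it only records sinks until execute() is called. */
    static final class Env {
        private final List<Runnable> sinks = new ArrayList<>();

        /** Source: creates a data set from a collection. */
        <T> DataSet<T> fromElements(List<T> elements) {
            return new DataSet<>(this, () -> new ArrayList<>(elements));
        }

        /** Triggers execution of everything that has been declared so far. */
        void execute() {
            for (Runnable sink : sinks) {
                sink.run();
            }
        }
    }

    /** Hypothetical typed data set: each transformation wraps the previous one without running it. */
    static final class DataSet<T> {
        private final Env env;
        private final Supplier<List<T>> plan;

        DataSet(Env env, Supplier<List<T>> plan) {
            this.env = env;
            this.plan = plan;
        }

        DataSet<T> filter(Predicate<T> predicate) {
            return new DataSet<>(env, () -> {
                List<T> out = new ArrayList<>();
                for (T element : plan.get()) {
                    if (predicate.test(element)) {
                        out.add(element);
                    }
                }
                return out;
            });
        }

        <R> DataSet<R> map(Function<T, R> function) {
            return new DataSet<>(env, () -> {
                List<R> out = new ArrayList<>();
                for (T element : plan.get()) {
                    out.add(function.apply(element));
                }
                return out;
            });
        }

        /** Sink: registers the output step; nothing is computed here. */
        void collectInto(List<T> target) {
            env.sinks.add(() -> target.addAll(plan.get()));
        }
    }

    static List<Integer> demo() {
        Env env = new Env();
        List<Integer> result = new ArrayList<>();
        env.fromElements(Arrays.asList("a", "bb", "ccc"))
           .filter(s -> s.length() > 1)   // DataSet<String>
           .map(String::length)           // DataSet<Integer>, checked at compile time
           .collectInto(result);
        if (!result.isEmpty()) {
            throw new IllegalStateException("transformations ran before execute()");
        }
        env.execute();                    // only now do source, filter, map, and sink run
        return result;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch, passing a `List<String>` to `collectInto` after the `map` step would be rejected by the compiler, which is the kind of early typing error the paragraph above refers to.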

Inconveniences with navbar

I think more people will have noticed the following issue with the navbar on the left:
[Screenshot of the navbar, taken 2014-04-02 at 14:32:03]

It still kinda works, but I find it inconvenient. The problem also occurred before, but wasn't so bad. I suspect that now almost everybody working on a laptop will experience it... especially since people will likely look into the Java API.

The questions are the following:

  • Did you experience the problem? How inconvenient is this for you?
  • Should we move the subnavigation to the top of each article (as seen here)? This would also allow us to have more fine-grained sub-bullets when needed (e.g. data transformations => map, reduce, etc.). But we would lose the option to easily jump around in an article, as you would have to go back to the top every time.

GitHub page improvements

Just got this response from GitHub after pushing a small update:

The page build completed successfully, but returned the following warning:

GitHub Pages recently underwent some improvements (https://github.com/blog/1715-faster-more-awesome-github-pages) to make your site faster and more awesome, but we've noticed that stratosphere.eu isn't properly configured to take advantage of these new features. While your site will continue to work just fine, updating your domain's configuration offers some additional speed and performance benefits. Instructions on updating your site's IP address can be found at https://help.github.com/articles/setting-up-a-custom-domain-with-github-pages#step-2-configure-dns-records, and of course, you can always get in touch with a human at [email protected]. For the more technical minded folks who want to skip the help docs: your site's DNS records are pointed to a deprecated IP address.

For information on troubleshooting Jekyll see:

https://help.github.com/articles/using-jekyll-with-pages#troubleshooting

If you have any questions please contact us at https://github.com/contact.

Answer how Stratosphere compares to Apache Spark

This message from our mailing list, posted by @fhueske might be a good skeleton:

Similar to Spark, Stratosphere is a complete data processing system, i.e., it has a programming API, a program compiler (optimizer), and its own execution runtime.
It is also an alternative to Hadoop MapReduce and in several design points quite similar to Spark:

  • Programs are executed as DAGs
  • Higher-level programming primitives (compared to Hadoop MR)
  • APIs in Scala and Java
  • Reads data from external data stores (has no data storage of its own), e.g., HDFS, S3, RDBMS, ...

However, Stratosphere is also different in some aspects:

  • Database-inspired processing using pipelining, gradually going to disk if memory is not sufficient (hybrid hash joins, external sorts)
  • Sophisticated cost-based optimizer choosing execution strategies (broadcasting vs. partitioning, sort vs. hash joins, ...)
  • Implemented in Java (in contrast to Spark which uses Scala)
  • No intermediate result materialization in memory (this is on the roadmap)

Stratosphere and Spark are best seen as alternatives.
We do not build on any of Spark's components, as we have our own programming API and execution engine.

Text for Packaging a Program Section

## Program Packaging & Distributed Execution

As described in the program skeleton section, Stratosphere programs can be executed on clusters (or local mini clusters) by using the RemoteEnvironment. Alternatively, programs can be packaged into JAR files (Java Archives) for execution. Packaging the program is a prerequisite to executing it through the [command line interface](link to CLI docs) or the [web client](link to web client docs).

Packaging Programs

To support execution from a packaged JAR file via the command line interface or the web client, a program must use the environment obtained by ExecutionEnvironment.getExecutionEnvironment(). This environment will act as the cluster's environment when the JAR is submitted to the command line interface or the web client. If the Stratosphere program is invoked differently than through these interfaces, the environment will act like a local environment.

To package the program, simply export all involved classes as a JAR file. The JAR file's manifest must point to the class that contains the program's entry point (the class with the public static void main(String[]) method). The simplest way to do this is by putting the main-class entry into the manifest (such as main-class: eu.stratosphere.example.MyProgram). The main-class attribute is the same one that is used by the Java Virtual Machine to find the main method when executing a JAR file through the command java -jar pathToTheJarFile. Most IDEs offer to include that attribute automatically when exporting JAR files.
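For readers who want to see what the manifest entry amounts to, here is a minimal sketch that uses only the JDK's java.util.jar classes. It writes an in-memory JAR whose manifest carries a main-class attribute and reads the attribute back. The class name `ManifestSketch` and its helper methods are made up for this illustration; a real export would also add the program's class files to the archive.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

/** Sketch: setting and reading the main-class manifest attribute with the JDK only. */
public class ManifestSketch {

    /** Builds an otherwise empty in-memory JAR whose manifest names the entry point class. */
    static byte[] buildJar(String mainClass) {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Manifest-Version has to be present; without it the JDK does not
        // write the other main attributes.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.MAIN_CLASS, mainClass);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(bytes, manifest)) {
            // A real export would add the program's .class files here.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Returns the main-class attribute of the JAR, or null if the manifest has none. */
    static String readMainClass(byte[] jarBytes) {
        try (JarInputStream jar = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            Manifest manifest = jar.getManifest();
            if (manifest == null) {
                return null;
            }
            // Attribute names are matched case-insensitively, so "main-class" also works.
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] jar = buildJar("eu.stratosphere.example.MyProgram");
        System.out.println(readMainClass(jar));
    }
}
```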

Packaging Programs through Plans

The Java API additionally supports packaging programs as Plans. This method resembles the way that the Record API and Scala API package programs. Instead of defining a program in the main method and calling execute() on the environment, plan packaging returns the Program Plan, which is a description of the program's data flow. To do that, the program must implement the eu.stratosphere.api.common.Program interface, defining the getPlan(String...) method. The strings passed to that method are the command line arguments. The program's plan can be created from the environment via the ExecutionEnvironment#createProgramPlan() method. When packaging the program's plan, the JAR manifest must point to the class implementing the eu.stratosphere.api.common.Program interface, instead of the class with the main method.

Summary

The overall procedure to invoke a packaged program is as follows:

  1. The JAR's manifest is searched for a main-class or program-class attribute. If both attributes are found, the program-class attribute takes precedence over the main-class attribute. Both the command line client and the web client support a parameter to pass the entry point class name manually for cases where the JAR manifest contains neither attribute.
  2. If the entry point class implements the eu.stratosphere.api.common.Program interface, the system calls getPlan(String...) to obtain the program plan and then executes that plan. The getPlan(String...) method was the only possible way of defining a program in the Record API and is also supported in the new Java API.
  3. If the entry point class does not implement the eu.stratosphere.api.common.Program interface, the system will invoke the class's main method.
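The three steps above can be sketched in plain Java as follows. This is an illustration of the described procedure, not the clients' actual code: the nested `Program` interface is a hypothetical stand-in for eu.stratosphere.api.common.Program (whose getPlan returns a plan object rather than a String), and `resolveEntryPoint`, `invoke`, `PlanProgram`, and `MainProgram` are names invented here. Treating the manually passed class name purely as a fallback is also an assumption.

```java
import java.util.jar.Attributes;
import java.util.jar.Manifest;

/** Sketch of the entry point lookup described above (illustrative, not Stratosphere code). */
public class EntryPointSketch {

    /** Hypothetical stand-in for eu.stratosphere.api.common.Program. */
    public interface Program {
        String getPlan(String... args);
    }

    /** Example entry point that is packaged as a plan. */
    public static class PlanProgram implements Program {
        @Override
        public String getPlan(String... args) {
            return "wordcount";
        }
    }

    /** Example entry point with a regular main method. */
    public static class MainProgram {
        public static void main(String[] args) {
            // A real program would define and execute its data flow here.
        }
    }

    /** Step 1: program-class wins over main-class; the manual name is used if neither is set. */
    static String resolveEntryPoint(Manifest manifest, String manualClassName) {
        Attributes attrs = manifest.getMainAttributes();
        String programClass = attrs.getValue("program-class");
        if (programClass != null) {
            return programClass;
        }
        String mainClass = attrs.getValue("main-class");
        if (mainClass != null) {
            return mainClass;
        }
        return manualClassName; // assumption: only consulted as a fallback
    }

    /** Steps 2 and 3: obtain the plan from a Program, otherwise call the main method. */
    static String invoke(Class<?> entryPoint, String... args) {
        try {
            if (Program.class.isAssignableFrom(entryPoint)) {
                Program program = (Program) entryPoint.getDeclaredConstructor().newInstance();
                return "plan:" + program.getPlan(args);
            }
            entryPoint.getMethod("main", String[].class).invoke(null, (Object) args);
            return "main";
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot invoke entry point " + entryPoint, e);
        }
    }

    public static void main(String[] args) {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().putValue("main-class", "a.Main");
        manifest.getMainAttributes().putValue("program-class", "a.Prog");
        System.out.println(resolveEntryPoint(manifest, null));
        System.out.println(invoke(PlanProgram.class));
        System.out.println(invoke(MainProgram.class));
    }
}
```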

Quickstart broken?

$ bin/stratosphere run \
    --jarfile ./examples/stratosphere-java-examples-0.5-WordCount.jar \
    --arguments 1 file://`pwd`/hamlet.txt file://`pwd`/wordcount-result.txt

should be

$ bin/stratosphere run \
    --jarfile ./examples/stratosphere-java-examples-0.5-WordCount.jar \
    file://`pwd`/hamlet.txt file://`pwd`/wordcount-result.txt

?
