stratosphere / stratosphere.github.io
This repository hosts the stratosphere.eu website.
Home Page: stratosphere.eu
I cannot make a pull request for this section, because I am on mobile internet and cannot afford to clone the stratosphere.github.io
repository. I have pasted my text below. Sorry Ufuk, for causing additional work.
Analysis programs in Stratosphere are regular Java programs that implement transformations on data sets (e.g., filtering, mapping, joining, grouping). The data sets are initially created from certain sources (e.g., by reading files, or from collections). Results are returned via sinks, which may for example write the data to (distributed) files or print it to the command line. The sections on the program skeleton and transformations show the general template of a program and describe the available transformations.
Stratosphere programs can run in a variety of contexts, for example locally as standalone programs, locally embedded in other programs, or on clusters of many machines (see [program skeleton] for how to define different environments). All programs are executed lazily: when the program runs and a transformation method is invoked on a data set, the call only creates a transformation operation. That operation is executed only once program execution is triggered on the environment. Whether the program is executed locally or on a cluster depends on the environment of the program.
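The lazy behaviour described above can be illustrated with a small toy model. This is only a sketch of the idea and not the Stratosphere API: `LazyPipeline`, `fromElements`, `filter`, `map`, and `execute` are hypothetical names made up for this example. Each transformation call only records work, and nothing runs until `execute()` is called.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Toy model of lazy execution. This is NOT the Stratosphere API;
// LazyPipeline and all of its methods are hypothetical.
final class LazyPipeline<T> {
    // The deferred computation that produces this data set when asked.
    private final Supplier<List<T>> plan;

    private LazyPipeline(Supplier<List<T>> plan) {
        this.plan = plan;
    }

    // "Source": wraps an in-memory collection without touching it yet.
    static <T> LazyPipeline<T> fromElements(List<T> elements) {
        return new LazyPipeline<T>(() -> new ArrayList<T>(elements));
    }

    // Records a filter step; the predicate is not evaluated here.
    LazyPipeline<T> filter(Predicate<T> keep) {
        return new LazyPipeline<T>(() -> {
            List<T> out = new ArrayList<T>();
            for (T element : plan.get()) {
                if (keep.test(element)) {
                    out.add(element);
                }
            }
            return out;
        });
    }

    // Records a map step; the function is not applied here.
    <R> LazyPipeline<R> map(Function<T, R> fn) {
        return new LazyPipeline<R>(() -> {
            List<R> out = new ArrayList<R>();
            for (T element : plan.get()) {
                out.add(fn.apply(element));
            }
            return out;
        });
    }

    // Triggers execution of all recorded steps.
    List<T> execute() {
        return plan.get();
    }

    public static void main(String[] args) {
        AtomicInteger predicateCalls = new AtomicInteger();
        LazyPipeline<Integer> lengths = LazyPipeline
            .fromElements(Arrays.asList("to", "be", "or", "not"))
            .filter(w -> { predicateCalls.incrementAndGet(); return w.length() == 2; })
            .map(String::length);

        System.out.println(predicateCalls.get()); // prints 0: nothing has run yet
        System.out.println(lengths.execute());    // prints [2, 2, 2]
    }
}
```

In the real system the recorded operations are handed to the environment rather than evaluated in place, so this model only captures the "define first, run on trigger" aspect.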
In contrast to Stratosphere's Record API, the Java API is strongly typed: all data sets and transformations accept typed elements rather than generic records. This makes it possible to catch typing errors very early and supports safe refactoring of programs.
http://stratosphere.eu/quickstart/java.html
the curl ... .sh url is wrong
docs/0.5/programming_guides/hadoop_compatability.html
are not finished yet.
The new Java programming guide puts the sub navigation at the top of the page. I think other pages should consider doing the same.
This message from our mailing list, posted by @fhueske might be a good skeleton:
Similar to Spark, Stratosphere is a complete data processing system, i.e., it has a programming API, a program compiler (optimizer), and its own execution runtime.
It is also an alternative for Hadoop MapReduce and in several design points quite similar to Spark:
However, Stratosphere is also different in some aspects:
Stratosphere and Spark are better seen as alternatives.
We do not build on any of Spark's components, as we have our own programming API and execution engine.
The website does not emphasize enough that Stratosphere has its own MapReduce runtime and does not use Hadoop's MapReduce.
At the beginning I also thought that Stratosphere uses Hadoop.
Some statements like
"It combines the strengths of MapReduce/Hadoop with powerful programming abstractions in Java" on the first page.
and
"Stratosphere for Hadoop 1/Hadoop 2" in the download section are very misleading.
In my opinion in the download section we should completely leave out the term "Hadoop" and use "HDFS" instead.
Maybe we can prevent questions like the most recent one:
https://groups.google.com/forum/#!topic/stratosphere-dev/-WSxxtsdCSo
$ bin/stratosphere run \
    --jarfile ./examples/stratosphere-java-examples-0.5-WordCount.jar \
    --arguments 1 file://`pwd`/hamlet.txt file://`pwd`/wordcount-result.txt
should be
$ bin/stratosphere run \
    --jarfile ./examples/stratosphere-java-examples-0.5-WordCount.jar \
    file://`pwd`/hamlet.txt file://`pwd`/wordcount-result.txt
?
I think you don't have to specify the "run" command anymore:
http://stratosphere.eu/docs/0.5/program_execution/cli_client.html
The example section needs to be updated for the new Java API and refactored Java examples.
We need to update the example documentation on the website for this.
We should also link from the API documentation to examples that show the API features in action.
In: http://stratosphere.eu/docs/0.5/programming_guides/java.html
The following links in the page are showing a 404 error. Please check.
http://stratosphere.eu/docs/0.4/program_execution/local_executer.html
http://stratosphere.eu/docs/0.4/program_execution/remote_executer.html
Just got this response from GitHub after pushing a small update:
The page build completed successfully, but returned the following warning:
GitHub Pages recently underwent some improvements (https://github.com/blog/1715-faster-more-awesome-github-pages) to make your site faster and more awesome, but we've noticed that stratosphere.eu isn't properly configured to take advantage of these new features. While your site will continue to work just fine, updating your domain's configuration offers some additional speed and performance benefits. Instructions on updating your site's IP address can be found at https://help.github.com/articles/setting-up-a-custom-domain-with-github-pages#step-2-configure-dns-records, and of course, you can always get in touch with a human at [email protected]. For the more technical minded folks who want to skip the help docs: your site's DNS records are pointed to a deprecated IP address.
For information on troubleshooting Jekyll see:
https://help.github.com/articles/using-jekyll-with-pages#troubleshooting
If you have any questions please contact us at https://github.com/contact.
The Delta Iteration documentation in the Scala API should be updated.
It only says:
"This is tad bit prototypical right now. Please contact us through one of the channels here if you are interested in working with it."
We should at least remove the link to the contact page and add links to the general iteration documentation and the Delta Iteration Scala Example.
Please mention that the environments and data sets are in eu.stratosphere.api.java.
I think more people will have noticed the following issue with the navbar on the left:
It still kind of works, but I find it inconvenient. The problem also occurred before, but it wasn't as bad. I suspect that now almost everybody working on a laptop will experience it, especially since people will likely look into the Java API.
The questions are the following:
The snapshot quickstarts are only available in the Sonatype repository, not in Maven Central. To use the 0.5-SNAPSHOT quickstarts directly, one needs to add the Sonatype repository to the known repositories.
The website currently does not describe how to do so.
As described in the program skeleton section, Stratosphere programs can be executed on clusters (or local mini clusters) by using the RemoteEnvironment. Alternatively, programs can be packaged into JAR files (Java Archives) for execution. Packaging the program is a prerequisite to executing it through the [command line interface](link to CLI docs) or the [web client](link to web client docs).
To support execution from a packaged JAR file via the command line interface or the web client, a program must use the environment obtained by ExecutionEnvironment.getExecutionEnvironment(). This environment will act as the cluster's environment when the JAR is submitted to the command line interface or the web client. If the Stratosphere program is invoked differently than through these interfaces, the environment will act like a local environment.
To package the program, simply export all involved classes as a JAR file. The JAR file's manifest must point to the class that contains the program's entry point (the class with the public static void main(String[]) method). The simplest way to do this is by putting the main-class entry into the manifest (such as main-class: eu.stratosphere.example.MyProgram). The main-class attribute is the same one that the Java Virtual Machine uses to find the main method when executing a JAR file through the command java -jar pathToTheJarFile. Most IDEs offer to include that attribute automatically when exporting JAR files.
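For readers who want to see what the manifest entry looks like programmatically, here is a minimal sketch that uses only the JDK's `java.util.jar` classes. It writes a JAR (in memory) whose manifest carries a main-class attribute and then reads that attribute back. The class name `ManifestSketch` and its helper methods are made up for this example; a real build would also add the program's class files to the archive.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Sketch only: ManifestSketch and its methods are hypothetical helpers,
// built on the standard java.util.jar API.
final class ManifestSketch {

    // Builds an in-memory JAR that contains nothing but a manifest
    // pointing at the given entry point class.
    static byte[] buildJar(String mainClass) {
        Manifest manifest = new Manifest();
        Attributes attributes = manifest.getMainAttributes();
        // Without Manifest-Version the main attributes are not written out.
        attributes.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attributes.put(Attributes.Name.MAIN_CLASS, mainClass);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(bytes, manifest)) {
            // A real build would add class files here via putNextEntry(...).
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the main-class attribute back, or returns null if there is none.
    static String readMainClass(byte[] jarBytes) {
        try (JarInputStream jar = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            Manifest manifest = jar.getManifest();
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] jar = buildJar("eu.stratosphere.example.MyProgram");
        System.out.println(readMainClass(jar)); // prints eu.stratosphere.example.MyProgram
    }
}
```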
The Java API additionally supports packaging programs as Plans. This method resembles the way that the Record API and Scala API package programs. Instead of defining a program in the main method and calling execute() on the environment, plan packaging returns the Program Plan, which is a description of the program's data flow. To do that, the program must implement the eu.stratosphere.api.common.Program interface, defining the getPlan(String...) method. The strings passed to that method are the command line arguments. The program's plan can be created from the environment via the ExecutionEnvironment#createProgramPlan() method. When packaging the program's plan, the JAR manifest must point to the class implementing the eu.stratosphere.api.common.Program interface, instead of the class with the main method.
The overall procedure to invoke a packaged program is as follows:
- If the entry point class implements the eu.stratosphere.api.common.Program interface, then the system calls the getPlan(String...) method to obtain the program plan and executes that plan. The getPlan(String...) method was the only possible way of defining a program in the Record API and is also supported in the new Java API.
- If the entry point class does not implement the eu.stratosphere.api.common.Program interface, the system will invoke the class' main method.
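The dispatch between the two packaging styles can be sketched with plain Java reflection. This is an illustration under assumptions, not the actual client code: `PlanProgram` is a hypothetical stand-in for eu.stratosphere.api.common.Program (its getPlan returns a String here so that the example stays self-contained), and `EntryPointSketch`, `PlanBased`, and `MainBased` are made-up names.

```java
import java.lang.reflect.Method;

// Sketch only: none of these types belong to Stratosphere.
final class EntryPointSketch {

    // Hypothetical stand-in for eu.stratosphere.api.common.Program.
    // The real getPlan(String...) returns a plan object, not a String.
    interface PlanProgram {
        String getPlan(String... args);
    }

    // A program packaged as a plan.
    static final class PlanBased implements PlanProgram {
        @Override
        public String getPlan(String... args) {
            return "source -> sink (" + args.length + " args)";
        }
    }

    // A program packaged with a regular main method.
    static final class MainBased {
        public static void main(String[] args) {
            System.out.println("main invoked with " + args.length + " args");
        }
    }

    // Decides how to run the entry point class and reports which path was taken.
    static String invoke(Class<?> entryPoint, String... args) {
        try {
            if (PlanProgram.class.isAssignableFrom(entryPoint)) {
                // Plan packaging: instantiate the class and ask it for its plan.
                PlanProgram program =
                    (PlanProgram) entryPoint.getDeclaredConstructor().newInstance();
                return "getPlan: " + program.getPlan(args);
            }
            // Regular packaging: call the static main(String[]) method.
            Method main = entryPoint.getMethod("main", String[].class);
            main.invoke(null, (Object) args);
            return "main";
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot invoke " + entryPoint.getName(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(invoke(PlanBased.class, "in", "out"));
        System.out.println(invoke(MainBased.class, "in", "out"));
    }
}
```

Checking for the interface first mirrors the order in the description above: the main method is only used as the fallback when no plan can be obtained.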
I can help with the required Jekyll code, but I have no clue how to nicely integrate it.
Can someone please confirm that the Debian package is updated for release-0.5?
docs/0.5/program_execution/cluster_execution.html
is not finished yet