## Program Packaging & Distributed Execution
As described in the program skeleton section, Stratosphere programs can be executed on clusters (or local mini clusters) by using the `RemoteEnvironment`. Alternatively, programs can be packaged into JAR files (Java Archives) for execution. Packaging a program is a prerequisite for executing it through the [command line interface](link to CLI docs) or the [web client](link to web client docs).
### Packaging Programs
To support execution from a packaged JAR file via the command line interface or the web client, a program must use the environment obtained by `ExecutionEnvironment.getExecutionEnvironment()`. This environment will act as the cluster's environment when the JAR is submitted to the command line interface or the web client. If the Stratosphere program is invoked differently than through these interfaces, the environment will act like a local environment.
To package the program, simply export all involved classes as a JAR file. The JAR file's manifest must point to the class that contains the program's entry point (the class with the `public static void main(String[])` method). The simplest way to do this is by putting the main-class entry into the manifest (such as `main-class: eu.stratosphere.example.MyProgram`). The main-class attribute is the same one that the Java Virtual Machine uses to find the main method when executing a JAR file through the command `java -jar pathToTheJarFile`. Most IDEs offer to include that attribute automatically when exporting JAR files.
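For readers who build the JAR programmatically rather than through an IDE export, the manifest entry can be sketched with the JDK's `java.util.jar` classes. This is a minimal illustration, not part of Stratosphere: the class name `ManifestSketch` and its helper methods are hypothetical, and the JAR is kept in memory and contains only the manifest, whereas a real export would also add the program's class files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical helper class, used only for illustration.
public class ManifestSketch {

    /** Builds a manifest whose main-class attribute names the entry point class. */
    public static Manifest manifestFor(String entryPointClass) {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Without Manifest-Version, the JDK does not write the other main attributes.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.MAIN_CLASS, entryPointClass);
        return manifest;
    }

    /** Writes an in-memory JAR that holds only the manifest. */
    public static byte[] writeJar(Manifest manifest) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(bytes, manifest)) {
            // A real export would add the program's .class files here
            // with jar.putNextEntry(...) and jar.write(...).
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the main-class attribute back; attribute names are case-insensitive. */
    public static String readMainClass(byte[] jarBytes) {
        try (JarInputStream jar = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            return jar.getManifest().getMainAttributes().getValue("main-class");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] jar = writeJar(manifestFor("eu.stratosphere.example.MyProgram"));
        System.out.println(readMainClass(jar));
    }
}
```

The read-back step mirrors what a client has to do before it can run the program: open the JAR, load the manifest, and look up the entry point attribute.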
### Packaging Programs through Plans
The Java API additionally supports packaging programs as Plans. This method resembles the way that the Record API and Scala API package programs. Instead of defining a program in the main method and calling `execute()` on the environment, plan packaging returns the Program Plan, which is a description of the program's data flow. To do that, the program must implement the `eu.stratosphere.api.common.Program` interface, defining the `getPlan(String...)` method. The strings passed to that method are the command line arguments. The program's plan can be created from the environment via the `ExecutionEnvironment#createProgramPlan()` method. When packaging the program's plan, the JAR manifest must point to the class implementing the `eu.stratosphere.api.common.Program` interface, instead of the class with the main method.
### Summary
The overall procedure to invoke a packaged program is as follows:
- The JAR's manifest is searched for a main-class or program-class attribute. If both attributes are found, the program-class attribute takes precedence over the main-class attribute. Both the command line client and the web client support a parameter to pass the entry point class name manually for cases where the JAR manifest contains neither attribute.
- If the entry point class implements the `eu.stratosphere.api.common.Program` interface, the system calls the `getPlan(String...)` method to obtain the program plan and then executes that plan. The `getPlan(String...)` method was the only possible way of defining a program in the Record API and is also supported in the new Java API.
- If the entry point class does not implement the `eu.stratosphere.api.common.Program` interface, the system will invoke the class's main method.
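
The procedure above can be sketched in plain Java. This is an illustrative reading of the steps, not the clients' actual implementation. The nested `Program` interface is a hypothetical stand-in for `eu.stratosphere.api.common.Program` (with `Object` in place of the real plan type) so that the example compiles without Stratosphere on the classpath, and `EntryPointSketch`, `resolveEntryPoint`, `invocationMode`, and the `com.example` class names are invented for the example. The sketch also assumes that the manually passed class name is consulted only when the manifest has neither attribute.

```java
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical sketch of the entry point lookup; not Stratosphere code.
public class EntryPointSketch {

    /** Stand-in for eu.stratosphere.api.common.Program (assumption: simplified return type). */
    public interface Program {
        Object getPlan(String... args);
    }

    /** The two ways a packaged program can be invoked. */
    public enum Mode { PLAN, MAIN_METHOD }

    /** Picks the entry point class name: program-class first, then main-class, then the manual name. */
    public static String resolveEntryPoint(Manifest manifest, String manualClassName) {
        Attributes attrs = manifest.getMainAttributes();
        String programClass = attrs.getValue("program-class");
        if (programClass != null) {
            return programClass; // takes precedence over main-class
        }
        String mainClass = attrs.getValue("main-class");
        if (mainClass != null) {
            return mainClass;
        }
        if (manualClassName != null) {
            return manualClassName; // manifest has neither attribute
        }
        throw new IllegalArgumentException("No entry point class found");
    }

    /** Decides whether to call getPlan(String...) or the class's main method. */
    public static Mode invocationMode(Class<?> entryPoint) {
        return Program.class.isAssignableFrom(entryPoint) ? Mode.PLAN : Mode.MAIN_METHOD;
    }

    /** Example entry point packaged as a plan. */
    public static class PlanProgram implements Program {
        @Override
        public Object getPlan(String... args) {
            return "plan built from " + args.length + " argument(s)";
        }
    }

    /** Example entry point with a regular main method. */
    public static class MainProgram {
        public static void main(String[] args) {
            // A real program would build its data flow and call execute() here.
        }
    }

    public static void main(String[] args) {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        attrs.putValue("Manifest-Version", "1.0");
        attrs.putValue("main-class", "com.example.MainProgram");
        attrs.putValue("program-class", "com.example.PlanProgram");

        System.out.println(resolveEntryPoint(manifest, null));
        System.out.println(invocationMode(PlanProgram.class));
        System.out.println(invocationMode(MainProgram.class));
    }
}
```

Checking the interface with `isAssignableFrom` before instantiating anything keeps the decision between the two invocation paths separate from actually running the program.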