
spark-ec2's Introduction

Please note: spark-ec2 is no longer under active development and the project has been archived. All the existing code, PRs and issues are still accessible but are now read-only. If you're looking for a similar tool that is under active development, we recommend you take a look at Flintrock.

EC2 Cluster Setup for Apache Spark

spark-ec2 allows you to launch, manage and shut down Apache Spark [1] clusters on Amazon EC2. It automatically sets up Apache Spark and HDFS on the cluster for you. This guide describes how to use spark-ec2 to launch clusters, how to run jobs on them, and how to shut them down. It assumes you've already signed up for an EC2 account on the Amazon Web Services site.

spark-ec2 is designed to manage multiple named clusters. You can launch a new cluster (telling the script its size and giving it a name), shut down an existing cluster, or log into a cluster. Each cluster is identified by placing its machines into EC2 security groups whose names are derived from the name of the cluster. For example, a cluster named test will contain a master node in a security group called test-master, and a number of slave nodes in a security group called test-slaves. The spark-ec2 script will create these security groups for you based on the cluster name you request. You can also use them to identify machines belonging to each cluster in the Amazon EC2 Console.

[1] Apache, Apache Spark, and Spark are trademarks of the Apache Software Foundation.

Before You Start

  • Create an Amazon EC2 key pair for yourself. This can be done by logging into your Amazon Web Services account through the AWS console, clicking Key Pairs on the left sidebar, and creating and downloading a key. Make sure that you set the permissions for the private key file to 600 (i.e. only you can read and write it) so that ssh will work.
  • Whenever you want to use the spark-ec2 script, set the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to your Amazon EC2 access key ID and secret access key. These can be obtained from the AWS homepage by clicking Account > Security Credentials > Access Credentials.
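
    For example, on a Unix shell (the key file name and credential values below are placeholders):

    chmod 600 awskey.pem
    export AWS_ACCESS_KEY_ID=<your-access-key-id>
    export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>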

Launching a Cluster

  • Go into the ec2 directory in the release of Apache Spark you downloaded.

  • Run ./spark-ec2 -k <keypair> -i <key-file> -s <num-slaves> launch <cluster-name>, where <keypair> is the name of your EC2 key pair (the name you gave it when you created it), <key-file> is the private key file for your key pair, <num-slaves> is the number of slave nodes to launch (try 1 at first), and <cluster-name> is the name to give to your cluster.

    For example:

    export AWS_SECRET_ACCESS_KEY=AaBbCcDdEeFGgHhIiJjKkLlMmNnOoPpQqRrSsTtU
    export AWS_ACCESS_KEY_ID=ABCDEFG1234567890123
    ./spark-ec2 --key-pair=awskey --identity-file=awskey.pem --region=us-west-1 --zone=us-west-1a launch my-spark-cluster
  • After everything launches, check that the cluster scheduler is up and sees all the slaves by going to its web UI, whose address will be printed at the end of the script (typically http://<master-hostname>:8080).

You can also run ./spark-ec2 --help to see more usage options. The following options are worth pointing out; a combined example follows the list:

  • --instance-type=<instance-type> can be used to specify an EC2 instance type to use. For now, the script only supports 64-bit instance types, and the default type is m3.large (which has 2 cores and 7.5 GB RAM). Refer to the Amazon pages about EC2 instance types and EC2 pricing for information about other instance types.
  • --region=<ec2-region> specifies an EC2 region in which to launch instances. The default region is us-east-1.
  • --zone=<ec2-zone> can be used to specify an EC2 availability zone to launch instances in. Sometimes, you will get an error because there is not enough capacity in one zone, and you should try to launch in another.
  • --ebs-vol-size=<GB> will attach an EBS volume with a given amount of space to each node so that you can have a persistent HDFS cluster on your nodes across cluster restarts (see below).
  • --spot-price=<price> will launch the worker nodes as Spot Instances, bidding for the given maximum price (in dollars).
  • --spark-version=<version> will pre-load the cluster with the specified version of Spark. The <version> can be a version number (e.g. "0.7.3") or a specific git hash. By default, a recent version will be used.
  • --spark-git-repo=<repository url> will let you run a custom version of Spark that is built from the given git repository. By default, the Apache GitHub mirror will be used. When using a custom Spark version, --spark-version must be set to a git commit hash, such as 317e114, instead of a version number.
  • If one of your launches fails (e.g. because you did not have the right permissions on your private key file), you can run launch with the --resume option to restart the setup process on an existing cluster.
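
For example, a launch that combines several of these options might look like the following; the key pair, sizes, and Spot bid below are placeholders, so adjust them to your needs:

    ./spark-ec2 --key-pair=awskey --identity-file=awskey.pem \
      --region=us-west-1 --zone=us-west-1a \
      --instance-type=m3.xlarge --ebs-vol-size=50 --spot-price=0.10 \
      -s 5 launch my-spark-cluster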

Launching a Cluster in a VPC

  • Run ./spark-ec2 -k <keypair> -i <key-file> -s <num-slaves> --vpc-id=<vpc-id> --subnet-id=<subnet-id> launch <cluster-name>, where <keypair> is the name of your EC2 key pair (the name you gave it when you created it), <key-file> is the private key file for your key pair, <num-slaves> is the number of slave nodes to launch (try 1 at first), <vpc-id> is the ID of your VPC, <subnet-id> is the ID of your subnet, and <cluster-name> is the name to give to your cluster.

    For example:

    export AWS_SECRET_ACCESS_KEY=AaBbCcDdEeFGgHhIiJjKkLlMmNnOoPpQqRrSsTtU
    export AWS_ACCESS_KEY_ID=ABCDEFG1234567890123
    ./spark-ec2 --key-pair=awskey --identity-file=awskey.pem --region=us-west-1 --zone=us-west-1a --vpc-id=vpc-a28d24c7 --subnet-id=subnet-4eb27b39 --spark-version=1.1.0 launch my-spark-cluster

Running Applications

  • Go into the ec2 directory in the release of Spark you downloaded.
  • Run ./spark-ec2 -k <keypair> -i <key-file> login <cluster-name> to SSH into the cluster, where <keypair> and <key-file> are as above. (This is just for convenience; you could also use the EC2 console.)
  • To deploy code or data within your cluster, you can log in and use the provided script ~/spark-ec2/copy-dir, which, given a directory path, RSYNCs it to the same location on all the slaves.
  • If your application needs to access large datasets, the fastest way to do that is to load them from Amazon S3 or an Amazon EBS device into an instance of the Hadoop Distributed File System (HDFS) on your nodes. The spark-ec2 script already sets up an HDFS instance for you. It's installed in /root/ephemeral-hdfs, and can be accessed using the bin/hadoop script in that directory (see the example after this list). Note that the data in this HDFS goes away when you stop and restart a machine.
  • There is also a persistent HDFS instance in /root/persistent-hdfs that will keep data across cluster restarts. Typically each node has relatively little space for persistent data (about 3 GB), but you can use the --ebs-vol-size option to spark-ec2 to attach a persistent EBS volume to each node for storing the persistent HDFS.
  • Finally, if you get errors while running your application, look at the slaves' logs for that application inside the scheduler work directory (/root/spark/work). You can also view the status of the cluster using the web UI: http://<master-hostname>:8080.
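
For example, after logging into the master you might distribute an application directory to the slaves and stage a dataset from S3 into the ephemeral HDFS; the directory, bucket, and paths below are placeholders:

    ~/spark-ec2/copy-dir /root/my-app
    /root/ephemeral-hdfs/bin/hadoop fs -mkdir /data
    /root/ephemeral-hdfs/bin/hadoop fs -cp s3n://my-bucket/dataset/ /data/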

Configuration

You can edit /root/spark/conf/spark-env.sh on each machine to set Spark configuration options, such as JVM options. This file needs to be copied to every machine to reflect the change. The easiest way to do this is to use a script we provide called copy-dir. First edit your spark-env.sh file on the master, then run ~/spark-ec2/copy-dir /root/spark/conf to RSYNC it to all the workers.
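
For example, to raise the memory available to each worker (the 4g value is only illustrative), you could run the following on the master:

    echo 'export SPARK_WORKER_MEMORY=4g' >> /root/spark/conf/spark-env.sh
    ~/spark-ec2/copy-dir /root/spark/conf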

The configuration guide describes the available configuration options.

Terminating a Cluster

Note that there is no way to recover data on EC2 nodes after shutting them down! Make sure you have copied everything important off the nodes before stopping them.

  • Go into the ec2 directory in the release of Spark you downloaded.
  • Run ./spark-ec2 destroy <cluster-name>.

Pausing and Restarting Clusters

The spark-ec2 script also supports pausing a cluster. In this case, the VMs are stopped but not terminated, so they lose all data on ephemeral disks but keep the data in their root partitions and their persistent-hdfs. Stopped machines will not incur EC2 instance charges, but will continue to incur charges for EBS storage.

  • To stop one of your clusters, go into the ec2 directory and run ./spark-ec2 --region=<ec2-region> stop <cluster-name>.
  • To restart it later, run ./spark-ec2 -i <key-file> --region=<ec2-region> start <cluster-name>.
  • To ultimately destroy the cluster and stop consuming EBS space, run ./spark-ec2 --region=<ec2-region> destroy <cluster-name> as described in the previous section.
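
For example, using the region and cluster name from the earlier launch example:

    ./spark-ec2 --region=us-west-1 stop my-spark-cluster
    ./spark-ec2 -i awskey.pem --region=us-west-1 start my-spark-cluster
    ./spark-ec2 --region=us-west-1 destroy my-spark-cluster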

Limitations

  • Support for "cluster compute" nodes is limited -- there's no way to specify a locality group. However, you can launch slave nodes in your <clusterName>-slaves group manually and then use spark-ec2 launch --resume to start a cluster with them.

If you have a patch or suggestion for one of these limitations, feel free to contribute it!

Accessing Data in S3

Spark's file interface allows it to process data in Amazon S3 using the same URI formats that are supported for Hadoop. You can specify a path in S3 as input through a URI of the form s3n://<bucket>/path. To provide AWS credentials for S3 access, launch the Spark cluster with the option --copy-aws-credentials. Full instructions on S3 access using the Hadoop input libraries can be found on the Hadoop S3 page.

In addition to using a single input file, you can also use a directory of files as input by simply giving the path to the directory.
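
For example, you could launch a cluster that forwards your AWS credentials and then read a bucket directly; the bucket name and path below are placeholders:

    ./spark-ec2 --key-pair=awskey --identity-file=awskey.pem \
      --copy-aws-credentials -s 1 launch my-spark-cluster
    # on the cluster, s3n:// paths can then be used as input, e.g.:
    /root/ephemeral-hdfs/bin/hadoop fs -ls s3n://my-bucket/logs/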

This repository contains the set of scripts used to set up a Spark cluster on EC2. These scripts are intended to be used with the default Spark AMI and are not expected to work on other AMIs. If you wish to start a cluster using Spark, please refer to http://spark-project.org/docs/latest/ec2-scripts.html

spark-ec2 Internals

The Spark cluster setup is guided by the values set in ec2-variables.sh. setup.sh first performs basic operations like enabling SSH across machines and mounting ephemeral drives, and also creates the files /root/spark-ec2/masters and /root/spark-ec2/slaves. After that, every module listed in MODULES is initialized.
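
For example, once setup.sh has run you can inspect the generated host lists on the master (each file contains one hostname per line):

    cat /root/spark-ec2/masters
    cat /root/spark-ec2/slaves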

To add a new module, you will need to do the following:

  1. Create a directory with the module's name.

  2. Optionally add a file named init.sh. This is called before templates are configured and can be used to install any prerequisites.

  3. Add any files that need to be configured based on the cluster setup to templates/. The path of the file determines where the configured file will be copied to. Right now, the set of variables that can be used in a template is (an illustrative template snippet appears after this numbered list):

    {{master_list}}
    {{active_master}}
    {{slave_list}}
    {{zoo_list}}
    {{cluster_url}}
    {{hdfs_data_dirs}}
    {{mapred_local_dirs}}
    {{spark_local_dirs}}
    {{spark_worker_mem}}
    {{spark_worker_instances}}
    {{spark_worker_cores}}
    {{spark_master_opts}}
    

You can add new variables by modifying deploy_templates.py.

  4. Add a file named setup.sh to launch any services on the master/slaves. This is called after the templates have been configured. You can use the environment variable $SLAVES to get a list of slave hostnames and /root/spark-ec2/copy-dir to sync a directory across machines (a sketch of such a script also follows this list).

  5. Modify spark_ec2.py to add your module to the list of enabled modules.
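
As an illustration of steps 3 and 4, consider a hypothetical module named mymodule; the file names, paths, and contents below are made up for this sketch. A template file that deploy_templates.py would fill in might look like:

    # templates/root/mymodule/conf/mymodule-env.sh
    export MASTER_URL="{{cluster_url}}"
    export MYMODULE_WORKER_CORES="{{spark_worker_cores}}"

And a minimal setup.sh that syncs the module directory to the slaves and starts a (hypothetical) service on each one:

    #!/bin/bash
    # mymodule/setup.sh -- runs on the master after templates are configured
    /root/spark-ec2/copy-dir /root/mymodule
    for slave in $SLAVES; do
      ssh -o StrictHostKeyChecking=no root@$slave "/root/mymodule/bin/start-worker.sh" &
    done
    wait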

spark-ec2's People

Contributors

andrewor14, armisael, codingcat, danosipov, darabos, dselivanov, etrain, haoyuan, harveyfeng, hyviquel, jey, joshrosen, koaning, laserson, mateiz, meawoppl, mengxr, nchammas, pwendell, robhudson, rxin, saurfang, serialx, shivaram, tdas, tkunicki, tomerk, voukka

spark-ec2's Issues

Cannot run tasks on two different nodes

Hi all,
I am creating an EC2 cluster using the 2.0 branch.
The cluster is created with 4 cores.
Once created, I am connecting to each slave and kicking off exactly the same application with the following command:

[root@ip-172-31-4-154 bin]$ ./spark-submit --master spark://ec2-54-186-158-159.us-west-2.compute.amazonaws.com:7077 --executor-cores 1 /root/pyscripts/dataprocessing_Sample.py file:///root/pyscripts/tree_addhealth.csv

But the second app is kept waiting, even though only 2 of the 4 cores are in use. I am getting this in the logs:

17/02/18 21:00:57 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
17/02/18 21:01:12 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Could you please advise why? I can provide as much information as you need .....

kr
marco

How can I launch Spark 2.0.1 and Hadoop 2.7?

Right now it seems the latest Hadoop version in spark-ec2 is 2.4, but on the Spark download page it goes up to 2.7, and those builds are also available in AWS S3 at http://s3.amazonaws.com/spark-related-packages. So the question is: how can such a Spark cluster be launched, and if it is not supported, is there any workaround?

--hadoop-major-version=HADOOP_MAJOR_VERSION
Major version of Hadoop. Valid options are 1 (Hadoop
1.0.4), 2 (CDH 4.2.0), yarn (Hadoop 2.4.0) (default:
yarn)

VPC/Subnet requirements not documented

I don't believe that all of the requirements to use spark-ec2 with an existing VPC & subnet are documented. If I create a VPC & subnet, then use those to run spark-ec2 as documented here, I get an error ending with

Waiting for cluster to enter 'ssh-ready' state...........
Error:
 Failed to determine hostname of Instance:i-0909d1ed1af09cd09.
Please check that you provided --private-ips if necessary

More info on SO at http://stackoverflow.com/questions/42654336/how-do-i-resolve-failed-to-determine-hostname-of-instance-error-using-spark-ec

question about dependency updates

I am using branch-2.0.0 and I see it is using:
Spark 2.0,
Scala 2.10.6.
java 1.7 (jre)
java 1.6 (jdk)
aws cli (0.? I forget which)

I spent a little bit of time trying to update the java (1.8), scala (2.11), and aws (1.10) libraries but didn't have much success after the upgrades...
Just thought I'd log the request, and also see if anyone else had success with this.

S3 Unable to unmarshall response

Anyone have experience with this kind of error?

16/11/15 00:42:56 WARN TaskSetManager: Lost task 4.0 in stage 2.0 (TID 266, 10.9.248.105): com.amazonaws.AmazonClientException: Unable to unmarshall response (Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler). Response Code: 200, Response Text: OK at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:738) at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:399) at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232) at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528) at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3480) at com.amazonaws.services.s3.Ama

Spark 2.1 Supported?

Is Spark 2.1 supported? It is not in the VALID_SPARK_VERSIONS in branch-2.0. What changes are needed for spark_ec2.py to support Spark 2.1?

cluster setup error: unknown spark version

I faced the following issue while running ./spark-ec2 --key-pair=<> --identity-file=<> --region=us-west --instance-type=t2.micro -s 2 launch test-cluster:

[...]
Initializing spark
--2016-07-28 03:58:47--  http://s3.amazonaws.com/spark-related-packages/spark-1.6.2-bin-hadoop1.tgz
Resolving s3.amazonaws.com (s3.amazonaws.com)... 54.231.40.74
Connecting to s3.amazonaws.com (s3.amazonaws.com)|54.231.40.74|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2016-07-28 03:58:47 ERROR 404: Not Found.

ERROR: Unknown Spark version
spark/init.sh: line 137: return: -1: invalid option
return: usage: return [n]
Unpacking Spark
tar (child): spark-*.tgz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
rm: cannot remove `spark-*.tgz': No such file or directory
mv: missing destination file operand after `spark'
Try `mv --help' for more information.
[...]

hadoop minor version not supported

It would be excellent to have support for specifying a specific hadoop minor version. Spark itself is distributed for 2.4.0 and 2.6.0, but there is no way to currently use 2.6.0 here.

Using spark_version='1.6.2' results in partial installation (?)

Specifically, I get these messages during launch of the cluster, and these files are indeed not in place once the cluster starts up:

./spark-ec2/spark-standalone/setup.sh: line 22: /root/spark/bin/stop-all.sh: No such file or directory
./spark-ec2/spark-standalone/setup.sh: line 27: /root/spark/bin/start-master.sh: No such file or directory    

Indeed, no spark web interface on port 8080 either

Documentation incorrect regarding missing "ec2" directory

The documentation appears to be incorrect in at least the branch-1.6 & branch-2.0 branches.
At https://github.com/amplab/spark-ec2#launching-a-cluster, the doc says "Go into the ec2 directory in the release of Apache Spark you downloaded." Problem is, there is no ec2 directory in the Spark distribution.

http://stackoverflow.com/a/38882774/969237 says "Download the official ec2 directory as detailed in the Spark 2.0.0 documentation." (in Edit 2). Problem is, the official Spark documentation (now at 2.1), at http://spark.apache.org/docs/latest/, links to https://github.com/amplab/spark-ec2, which takes me right back here. No help.

I suspect that what was formerly the ec2 directory in an Apache Spark distribution is now the root directory of https://github.com/amplab/spark-ec2, but I'm not familiar enough with this stuff to know.

Please update the documentation so that I can follow the installation instructions.

Easy way to start cluster with Java 1.8?

Scripts work great, but EC2 machines have Java 1.7 on them rather than Java 1.8.

I can see #12 adds some functionality related to this, but it's not clear to me exactly how to take advantage.

Do I need to build a new AMI and pass this on the command line (--ami) rather than using the versions defaulted from ami-list in the repo? Is the exact process to build this new AMI documented anywhere? I can see the create_image.sh script, but its usage isn't entirely clear to me.

Thanks!
Adam

Is there a handy Hadoop cluster running?

With a Spark cluster launched and started with the ec2 script, is there a Hadoop cluster ready to go, or an easy command to start one? The reason I am asking is that I would like to enable logging and save the logs in a Hadoop FS that is accessible to the Spark workers.

ephemeral-hdfs does not work with hadoop-major-version yarn

When creating a cluster using --hadoop-major-version yarn, I noticed that HDFS does not function.
I relaunched the cluster using --hadoop-major-version 2 and HDFS works fine.

root@... $ /root/ephemeral-hdfs/bin/hadoop fs -cp "s3n://[redacted]" /data

OpenJDK 64-Bit Server VM warning: You have loaded library /root/ephemeral-hdfs/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
2015-10-01 17:58:43,640 WARN  [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-10-01 17:58:44,525 INFO  [main] s3native.NativeS3FileSystem (NativeS3FileSystem.java:open(561)) - Opening 's3n://[redacted].avro' for reading
2015-10-01 17:58:44,696 WARN  [Thread-4] hdfs.DFSClient (DFSOutputStream.java:run(627)) - DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /data._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.
        at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1430)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2684)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

        at org.apache.hadoop.ipc.Client.call(Client.java:1410)
        at org.apache.hadoop.ipc.Client.call(Client.java:1363)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
        at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
cp: File /data._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.

SPARK_HOME?

I am trying to connect to the cluster using sparklyr but I need to define an environment variable: SPARK_HOME.

I cannot find any clue as to what SPARK_HOME should be.
/root/spark looks like it contains just config files.

Also, the startup script says that everything has finished and the Spark standalone cluster is started, but the default UI at 8080 isn't working.

Branch-1.6 missing latest spark stable, and wrong DEFAULT_SPARK_EC2_BRANCH set

The spark_ec2.py script in branch-1.6 is missing spark versions 1.6.1 and 1.6.2. Furthermore, the DEFAULT_SPARK_EC2_BRANCH is set to branch-1.5 inside the branch-1.6.

I forked the project and fixed the issues I was seeing, and verified it by successfully launching a cluster running 1.6.2. Then I made a pull request last week (#37) for the changes I made, but it seems like nobody has looked at it yet. It would be great to either have that merged or some other fix to be committed so users can run the latest stable version of spark on EC2 clusters.

Missing Spark 1.5.1 / Hadoop 1 binary

Prior to Spark 1.5.1 there was a pre-built version of Spark, packaged with Hadoop 1, stored in http://s3.amazonaws.com/spark-related-packages - e.g. for Spark 1.5.0 we had spark-1.5.0-bin-hadoop1.tgz. However, this no longer seems to be the case. I'm not sure if this is an accidental omission or whether it reflects a change in policy, although I suspect the former since the Scala 2.11 version spark-1.5.1-bin-hadoop1-scala2.11.tgz exists.

Either way, since Hadoop 1 is the default, it would be good if there was a warning or some kind of documentation that gave a hint in this direction - otherwise you have to keep a very close eye on the setup script to notice the 404.

Simple spark-ec2 launch fails

Hi,

I'm trying to set up a simple Spark cluster with 1 slave (using branch-2.0).
I use the following commands:

export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
./spark-ec2 --region=us-west-1 -k key_name -i ~/.ssh/my.key -s 1 launch es5

I tried several times and I always end up with this log:

/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ip-yyy.us-west-1.compute.internal
************************************************************/
Persistent HDFS installed, won't start by default...
[timing] persistent-hdfs setup:  00h 00m 05s
Setting up mapreduce
Pseudo-terminal will not be allocated because stdin is not a terminal.
RSYNC'ing /root/mapreduce/conf to slaves...
ec2-xxx.us-west-1.compute.amazonaws.com
[timing] mapreduce setup:  00h 00m 01s
Setting up spark-standalone
RSYNC'ing /root/spark/conf to slaves...
ec2-xxx.us-west-1.compute.amazonaws.com
RSYNC'ing /root/spark-ec2 to slaves...
ec2-xxx.us-west-1.compute.amazonaws.com
ec2-xxx.us-west-1.compute.amazonaws.com: no org.apache.spark.deploy.worker.Worker to stop
no org.apache.spark.deploy.master.Master to stop
starting org.apache.spark.deploy.master.Master, logging to /root/spark/logs/spark-root-org.apache.spark.deploy.master.Master-1-ip-yyy.us-west-1.compute.internal.out
ec2-xxx.us-west-1.compute.amazonaws.com: starting org.apache.spark.deploy.worker.Worker, logging to /root/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-ip-zzz.us-west-1.compute.internal.out
[timing] spark-standalone setup:  00h 00m 30s
Setting up rstudio
spark-ec2/setup.sh: line 110: ./rstudio/setup.sh: No such file or directory
[timing] rstudio setup:  00h 00m 01s
Setting up ganglia
RSYNC'ing /etc/ganglia to slaves...
ec2-xxx.us-west-1.compute.amazonaws.com
Shutting down GANGLIA gmond:                               [FAILED]
Starting GANGLIA gmond:                                    [  OK  ]
Shutting down GANGLIA gmond:                               [FAILED]
Starting GANGLIA gmond:                                    [  OK  ]
Connection to ec2-xxx.us-west-1.compute.amazonaws.com closed.
Shutting down GANGLIA gmetad:                              [FAILED]
Starting GANGLIA gmetad:                                   [  OK  ]
Stopping httpd:                                            [FAILED]
Starting httpd: httpd: Syntax error on line 154 of /etc/httpd/conf/httpd.conf: Cannot load /etc/httpd/modules/mod_authz_core.so into server: /etc/httpd/modules/mod_authz_core.so: cannot open shared object file: No such file or directory
                                                           [FAILED]
[timing] ganglia setup:  00h 00m 01s
Connection to ec2-xxx-us-west-1.compute.amazonaws.com closed.
Spark standalone cluster started at http://ec2-xxx-us-west-1.compute.amazonaws.com:8080
Ganglia started at http://ec2-xxx-us-west-1.compute.amazonaws.com:5080/ganglia
Done!

In particular, I noticed these errors:

spark-ec2/setup.sh: line 110: ./rstudio/setup.sh: No such file or directory
Shutting down GANGLIA gmond:                               [FAILED]
Starting GANGLIA gmond:                                    [  OK  ]
Shutting down GANGLIA gmond:                               [FAILED]
Starting GANGLIA gmond:                                    [  OK  ]
Starting httpd: httpd: Syntax error on line 154 of /etc/httpd/conf/httpd.conf: Cannot load /etc/httpd/modules/mod_authz_core.so into server: /etc/httpd/modules/mod_authz_core.so: cannot open shared object file: No such file or directory

So lots of errors; is there anything I'm doing wrong?
(I also tried branch-1.6 and eu-west, with the same result.)

Cheers & thanks for your work

Simple setup fails. - Duplicate (hence closed)

First attempt at trying to get a spark cluster running. Ran the following command:

./spark-ec2 --instance-type=t1.micro --region=us-west-2 --zone=us-west-2c --ebs-vol-size=8 -k myeky -i ~/.ssh/mykey.pem -s 1 --vpc-id=vpc-myvpc launch microcluster

In the end I cannot connect to the Spark dashboard. I was trying to figure out what's going on, and while going through the output I landed on this:

--2016-11-02 04:15:11--  http://s3.amazonaws.com/spark-related-packages/spark-1.6.2-bin-hadoop1.tgz
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.1.51
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.1.51|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2016-11-02 04:15:11 ERROR 404: Not Found.

ERROR: Unknown Spark version
spark/init.sh: line 137: return: -1: invalid option
return: usage: return [n]
Unpacking Spark
tar (child): spark-*.tgz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
rm: cannot remove `spark-*.tgz': No such file or directory
mv: missing destination file operand after `spark'
Try `mv --help' for more information.
[timing] spark init:  00h 00m 01s

duplicate of #43

Add eu-central-1 AMI images

Hey everybody,

due to company policy restrictions, we are only allowed to use the eu-central-1 AWS region -- unfortunately, the spark-ec2 AMI has not been added there...

Would be really great for me if you could add it!

All the best,
Sebastian

How to launch Spark 2.0.1 clusters with spark-ec2 scripts?

When I upgraded to Spark 2.0, I took the 2.0 branch from this repository and it allowed me to spin up Spark-2.0 clusters on Amazon EC2. However, there does not seem to be 2.0.1 in any of the valid versions in any of the branches. Can you help me?

branch-2.0 should use scala 2.11

http://spark.apache.org/downloads.html says:

Note: Starting version 2.0, Spark is built with Scala 2.11 by default. Scala 2.10 users should download the Spark source package and build with Scala 2.10 support.

It appears as though branch-2.0 still uses scala 2.10, as evidenced from these lines from the log (produced from a spark-ec2 ... launch ... invocation):

Initializing scala
Unpacking Scala
--2017-03-07 23:53:26--  http://s3.amazonaws.com/spark-related-packages/scala-2.10.3.tgz
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.225.123
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.225.123|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 30531249 (29M) [application/x-compressed]
Saving to: ‘scala-2.10.3.tgz’

100%[===========================================================================================>] 30,531,249  2.47MB/s   in 12s

2017-03-07 23:53:39 (2.39 MB/s) - ‘scala-2.10.3.tgz’ saved [30531249/30531249]

I'm still trying to determine whether this is the root cause of the errors I'm seeing when attempting to run run-example --master ... SparkPi 10, which look like the following (note the java.io.InvalidClassException messages below).

In any case, however, it still seems as though scala 2.11 should be the version installed on master & slave.

17/03/07 20:02:58 INFO spark.SparkContext: Running Spark version 2.0.2
17/03/07 20:02:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/03/07 20:02:59 INFO spark.SecurityManager: Changing view acls to: matthew
17/03/07 20:02:59 INFO spark.SecurityManager: Changing modify acls to: matthew
17/03/07 20:02:59 INFO spark.SecurityManager: Changing view acls groups to:
17/03/07 20:02:59 INFO spark.SecurityManager: Changing modify acls groups to:
17/03/07 20:02:59 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(matthew); groups with view permissions: Set(); users  with modify permissions: Set(matthew); groups with modify permissions: Set()
17/03/07 20:02:59 INFO util.Utils: Successfully started service 'sparkDriver' on port 51946.
17/03/07 20:02:59 INFO spark.SparkEnv: Registering MapOutputTracker
17/03/07 20:02:59 INFO spark.SparkEnv: Registering BlockManagerMaster
17/03/07 20:02:59 INFO storage.DiskBlockManager: Created local directory at /private/var/folders/8c/4kr7cmf109b4778xj0sxct8w0000gn/T/blockmgr-f990ffdf-481d-463c-9e6f-ee7b328bc85c
17/03/07 20:02:59 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
17/03/07 20:02:59 INFO spark.SparkEnv: Registering OutputCommitCoordinator
17/03/07 20:03:00 INFO util.log: Logging initialized @2630ms
17/03/07 20:03:00 INFO server.Server: jetty-9.2.z-SNAPSHOT
17/03/07 20:03:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@52045dbe{/jobs,null,AVAILABLE}
17/03/07 20:03:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@674658f7{/jobs/json,null,AVAILABLE}
17/03/07 20:03:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5c8eee0f{/jobs/job,null,AVAILABLE}
17/03/07 20:03:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@565b064f{/jobs/job/json,null,AVAILABLE}
17/03/07 20:03:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@26425897{/stages,null,AVAILABLE}
17/03/07 20:03:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@73163d48{/stages/json,null,AVAILABLE}
17/03/07 20:03:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@58c34bb3{/stages/stage,null,AVAILABLE}
17/03/07 20:03:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@56a4479a{/stages/stage/json,null,AVAILABLE}
17/03/07 20:03:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@62163b39{/stages/pool,null,AVAILABLE}
17/03/07 20:03:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@20a8a64e{/stages/pool/json,null,AVAILABLE}
17/03/07 20:03:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@62f4ff3b{/storage,null,AVAILABLE}
17/03/07 20:03:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1698fc68{/storage/json,null,AVAILABLE}
17/03/07 20:03:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4504d271{/storage/rdd,null,AVAILABLE}
17/03/07 20:03:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@207b8649{/storage/rdd/json,null,AVAILABLE}
17/03/07 20:03:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@65b3a85a{/environment,null,AVAILABLE}
17/03/07 20:03:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@34997338{/environment/json,null,AVAILABLE}
17/03/07 20:03:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@57eda880{/executors,null,AVAILABLE}
17/03/07 20:03:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2b5825fa{/executors/json,null,AVAILABLE}
17/03/07 20:03:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@53d1b9b3{/executors/threadDump,null,AVAILABLE}
17/03/07 20:03:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2cae1042{/executors/threadDump/json,null,AVAILABLE}
17/03/07 20:03:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@163d04ff{/static,null,AVAILABLE}
17/03/07 20:03:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7c209437{/,null,AVAILABLE}
17/03/07 20:03:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2262b621{/api,null,AVAILABLE}
17/03/07 20:03:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7e928e2f{/stages/stage/kill,null,AVAILABLE}
17/03/07 20:03:00 INFO server.ServerConnector: Started ServerConnector@4678a2eb{HTTP/1.1}{0.0.0.0:4040}
17/03/07 20:03:00 INFO server.Server: Started @2810ms
17/03/07 20:03:00 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
17/03/07 20:03:00 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.86.165:4040
17/03/07 20:03:00 INFO spark.SparkContext: Added JAR file:/Users/matthew/Documents/github/SciSpike/smartcity-cluster/spark-2.0.2/./examples/jars/scopt_2.11-3.3.0.jar at spark://192.168.86.165:51946/jars/scopt_2.11-3.3.0.jar with timestamp 1488938580346
17/03/07 20:03:00 INFO spark.SparkContext: Added JAR file:/Users/matthew/Documents/github/SciSpike/smartcity-cluster/spark-2.0.2/./examples/jars/spark-examples_2.11-2.0.2.jar at spark://192.168.86.165:51946/jars/spark-examples_2.11-2.0.2.jar with timestamp 1488938580347
17/03/07 20:03:00 INFO client.StandaloneAppClient$ClientEndpoint: Connecting to master spark://ec2-52-55-118-26.compute-1.amazonaws.com:7077...
17/03/07 20:03:00 INFO client.TransportClientFactory: Successfully created connection to ec2-52-55-118-26.compute-1.amazonaws.com/52.55.118.26:7077 after 91 ms (0 ms spent in bootstraps)
17/03/07 20:03:00 INFO cluster.StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20170308020300-0004
17/03/07 20:03:00 INFO client.StandaloneAppClient$ClientEndpoint: Executor added: app-20170308020300-0004/0 on worker-20170308013313-172.31.47.189-42583 (172.31.47.189:42583) with 2 cores
17/03/07 20:03:00 INFO cluster.StandaloneSchedulerBackend: Granted executor ID app-20170308020300-0004/0 on hostPort 172.31.47.189:42583 with 2 cores, 1024.0 MB RAM
17/03/07 20:03:00 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51948.
17/03/07 20:03:00 INFO netty.NettyBlockTransferService: Server created on 192.168.86.165:51948
17/03/07 20:03:00 ERROR server.TransportRequestHandler: Error while invoking RpcHandler#receive() for one-way message.
java.io.InvalidClassException: org.apache.spark.deploy.DeployMessages$ExecutorUpdated; local class incompatible: stream classdesc serialVersionUID = 3598161183190952796, local class serialVersionUID = 1654279024112373855
	at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:616)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1630)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:108)
	at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1$$anonfun$apply$1.apply(NettyRpcEnv.scala:259)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
	at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:308)
	at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1.apply(NettyRpcEnv.scala:258)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
	at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:257)
	at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:578)
	at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:570)
	at org.apache.spark.network.server.TransportRequestHandler.processOneWayMessage(TransportRequestHandler.java:180)
	at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
	at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:119)
	at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
	at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
	at java.lang.Thread.run(Thread.java:745)
17/03/07 20:03:00 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.86.165, 51948)
17/03/07 20:03:00 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.86.165:51948 with 366.3 MB RAM, BlockManagerId(driver, 192.168.86.165, 51948)
17/03/07 20:03:00 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.86.165, 51948)
17/03/07 20:03:01 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7c18432b{/metrics/json,null,AVAILABLE}
17/03/07 20:03:01 INFO cluster.StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
17/03/07 20:03:01 WARN spark.SparkContext: Use an existing SparkContext, some configuration may not take effect.
17/03/07 20:03:01 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@46cf05f7{/SQL,null,AVAILABLE}
17/03/07 20:03:01 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7cd1ac19{/SQL/json,null,AVAILABLE}
17/03/07 20:03:01 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3a80515c{/SQL/execution,null,AVAILABLE}
17/03/07 20:03:01 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1c807b1d{/SQL/execution/json,null,AVAILABLE}
17/03/07 20:03:01 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@c8b96ec{/static/sql,null,AVAILABLE}
17/03/07 20:03:01 INFO internal.SharedState: Warehouse path is 'file:/Users/matthew/Documents/github/SciSpike/smartcity-cluster/spark-2.0.2/spark-warehouse'.
17/03/07 20:03:01 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:38
17/03/07 20:03:01 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 10 output partitions
17/03/07 20:03:01 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
17/03/07 20:03:01 INFO scheduler.DAGScheduler: Parents of final stage: List()
17/03/07 20:03:01 INFO scheduler.DAGScheduler: Missing parents: List()
17/03/07 20:03:01 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
17/03/07 20:03:01 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1832.0 B, free 366.3 MB)
17/03/07 20:03:01 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1169.0 B, free 366.3 MB)
17/03/07 20:03:01 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.86.165:51948 (size: 1169.0 B, free: 366.3 MB)
17/03/07 20:03:01 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1012
17/03/07 20:03:02 INFO scheduler.DAGScheduler: Submitting 10 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34)
17/03/07 20:03:02 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 10 tasks
17/03/07 20:03:17 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/03/07 20:03:32 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/03/07 20:03:47 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/03/07 20:04:02 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/03/07 20:04:17 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/03/07 20:04:32 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/03/07 20:04:47 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/03/07 20:05:02 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/03/07 20:05:03 ERROR server.TransportRequestHandler: Error while invoking RpcHandler#receive() for one-way message.
java.io.InvalidClassException: org.apache.spark.deploy.DeployMessages$ExecutorUpdated; local class incompatible: stream classdesc serialVersionUID = 3598161183190952796, local class serialVersionUID = 1654279024112373855
	at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:616)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1630)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:108)
	at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1$$anonfun$apply$1.apply(NettyRpcEnv.scala:259)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
	at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:308)
	at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1.apply(NettyRpcEnv.scala:258)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
	at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:257)
	at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:578)
	at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:570)
	at org.apache.spark.network.server.TransportRequestHandler.processOneWayMessage(TransportRequestHandler.java:180)
	at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
	at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:119)
	at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
	at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
	at java.lang.Thread.run(Thread.java:745)
17/03/07 20:05:03 INFO client.StandaloneAppClient$ClientEndpoint: Executor added: app-20170308020300-0004/1 on worker-20170308013313-172.31.47.189-42583 (172.31.47.189:42583) with 2 cores
17/03/07 20:05:03 INFO cluster.StandaloneSchedulerBackend: Granted executor ID app-20170308020300-0004/1 on hostPort 172.31.47.189:42583 with 2 cores, 1024.0 MB RAM
17/03/07 20:05:03 ERROR server.TransportRequestHandler: Error while invoking RpcHandler#receive() for one-way message.
java.io.InvalidClassException: org.apache.spark.deploy.DeployMessages$ExecutorUpdated; local class incompatible: stream classdesc serialVersionUID = 3598161183190952796, local class serialVersionUID = 1654279024112373855
	at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:616)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1630)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:108)
	at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1$$anonfun$apply$1.apply(NettyRpcEnv.scala:259)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
	at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:308)
	at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1.apply(NettyRpcEnv.scala:258)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
	at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:257)
	at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:578)
	at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:570)
	at org.apache.spark.network.server.TransportRequestHandler.processOneWayMessage(TransportRequestHandler.java:180)
	at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
	at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:119)
	at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
	at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
	at java.lang.Thread.run(Thread.java:745)
17/03/07 20:05:17 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/03/07 20:05:32 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/03/07 20:05:47 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

pssh installation fails on the default master AMI (yum 403 / PYCURL ERROR 22)

I am using spark-1.6.1-prebuilt-with-hadoop-2.6 on macOS, and I am using the spark-ec2 script to launch a cluster in an Amazon VPC.

The setup.sh script (the first thing run on the master after launch) uses pssh and tries to install it via 'yum install -y pssh'. This step always fails on the master AMI that the script uses by default, because the package cannot be found in the repo mirrors: the request hits a 403 (PYCURL ERROR 22).

Logging into the master and trying it manually does not work either. I tried yum install -y python-pip and hit the same issue. I also tried editing epel.repo as suggested in the Amazon Linux AMI FAQ, but it didn't help; the change may be getting overridden.

For now, I have changed the script to not use pssh as a workaround, but I would like to understand and fix the root cause.
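
For reference, a minimal sketch of such a fallback (not part of spark-ec2; the slave-list path is an assumption): install pssh if possible, otherwise run the same command over a plain ssh loop.

#!/usr/bin/env bash
# Hypothetical fallback: if pssh cannot be installed from the yum mirrors,
# run the given command on every slave with a plain ssh loop instead.
SLAVES_FILE=/root/spark-ec2/slaves   # assumed location of the slave host list
CMD="$*"

if yum install -y pssh > /dev/null 2>&1; then
  pssh -h "$SLAVES_FILE" -t 0 "$CMD"
else
  echo "pssh unavailable; falling back to a plain ssh loop" >&2
  while read -r slave; do
    ssh -o StrictHostKeyChecking=no "root@$slave" "$CMD"
  done < "$SLAVES_FILE"
fi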

Feature request: Scala 2.11 support

There is work underway to support multiple Scala versions, but so far only Scala 2.10.3 is supported. Is there interest in and/or a roadmap for Scala 2.11 support?

EC2 setup does not work for any user but root

Hi,

I was trying to use the spark-ec2 script from Spark to create a new Spark cluster with a user other than root (--user=ec2-user). Unfortunately, the part of the script that copies the templates onto the target machines fails because it tries to rsync /etc/* and /root/*.

This is the full traceback:

rsync: recv_generator: mkdir "/root/spark-ec2" failed: Permission denied (13)
*** Skipping any contents from this failed directory ***

sent 95 bytes received 17 bytes 224.00 bytes/sec
total size is 1444 speedup is 12.89
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]
Traceback (most recent call last):
File "/home/ec2-user/spark-1.4.0/ec2/spark_ec2.py", line 1455, in
main()
File "/home/ec2-user/spark-1.4.0/ec2/spark_ec2.py", line 1447, in main
real_main()
File "/home/ec2-user/spark-1.4.0/ec2/spark_ec2.py", line 1283, in real_main
setup_cluster(conn, master_nodes, slave_nodes, opts, True)
File "/home/ec2-user/spark-1.4.0/ec2/spark_ec2.py", line 785, in setup_cluster
modules=modules
File "/home/ec2-user/spark-1.4.0/ec2/spark_ec2.py", line 1049, in deploy_files
subprocess.check_call(command)
File "/usr/lib64/python2.7/subprocess.py", line 540, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['rsync', '-rv', '-e', 'ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /home/ec2-user/.ssh/sparkclusterkey_us_east.pem', '/tmp/tmpT4Iw54/', u'ec2-user@<master-hostname>:/']' returned non-zero exit status 23

Is there a workaround for this? I want to improve security of our operations by avoiding user root on the instances.
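
One possible direction, as a sketch only (spark-ec2 does not do this today as far as I can tell): have the deploy step escalate on the remote side with rsync's --rsync-path option, since ec2-user can typically sudo without a password on Amazon Linux AMIs. The manual equivalent of the failing command would look roughly like this (key file, temp directory and hostname are placeholders):

# Hypothetical manual re-run of the deploy rsync, escalating with sudo on the
# remote side so that /root and /etc are writable for a non-root login user.
rsync -rv \
  -e 'ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i <key-file>' \
  --rsync-path='sudo rsync' \
  /tmp/<deploy-dir>/ ec2-user@<master-hostname>:/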

Mailing List: is there a mailing list for this project

Hi,
is there a mailing list for this project? Or at least a few email addresses I can contact when I find issues?
I'd rather not waste your time by creating issues just to get someone to assist me.
Kind regards,
Marco

set temp dir on workers (SPARK_WORKER_DIR)

Hi,

I am running Spark 1.6 with an extra disk attached (--ebs-vol-size) and I am trying to make the Spark workers write to that disk. Spark seems to have the option SPARK_WORKER_DIR for this, but it does not seem to be picked up in this setup (I set it in /root/spark/conf/spark-env.sh). Is there a way to achieve this?

(SPARK_LOCAL_DIRS works on the master node though)
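
For what it's worth, a sketch of how this is often handled on spark-ec2 clusters; the paths below assume the standard layout (Spark under /root/spark, the copy-dir helper under /root/spark-ec2) and that the EBS volume is mounted at /vol (check df for the real mount point):

# Point the workers' scratch space at the attached EBS volume (mount point assumed).
echo 'export SPARK_WORKER_DIR=/vol/spark-work' >> /root/spark/conf/spark-env.sh

# Push the updated conf directory to all slaves (copy-dir ships with spark-ec2).
/root/spark-ec2/copy-dir /root/spark/conf

# Restart the workers so the new setting takes effect.
/root/spark/sbin/stop-slaves.sh
/root/spark/sbin/start-slaves.sh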

Running Low on Storage when Building Specific Spark Version

Hi,

I am having trouble creating a Spark cluster with a custom Spark version.
I am running:

ec2/spark-ec2 --key-pair=<key-name> --identity-file=<key-file> --region=eu-west-1 --zone=eu-west-1a --vpc-id=<vpc-id>  --subnet-id=<subnet-id> --copy-aws-credentials --hadoop-major-version=2 --instance-profile-name=<instance-profile-name> --slaves=1 -v 4f894dd6906311cb57add6757690069a18078783 launch cluster_test

-v points to a specific git commit with the given hash (e.g. Spark version 1.5.1).

When the cluster nodes are started, Spark is cloned from git (into the /root folder) and built. After a while, the script stops because of "no space left on device" warnings.
When I log into the master and check the space left:

>df
Filesystem           1K-blocks      Used  Available Use% Mounted on
/dev/xvda1             8256952   6693968    1479128  82% /
tmpfs                  3816808         0    3816808   0% /dev/shm
/dev/xvdb            433455904   1252616  410184984   1% /mnt
/dev/xvdf            433455904    203012  411234588   1% /mnt2

So there are about 1.4 GB left on the device, but when I try to download a big file, it fails again with the "no space left on device" message.

I realised that the inodes are the limiting factor here:

df -i
Filesystem            Inodes   IUsed    IFree IUse% Mounted on
/dev/xvda1            524288  524288        0  100% /
tmpfs                 954202       1   954201    1% /dev/shm
/dev/xvdb           27525120      12 27525108    1% /mnt
/dev/xvdf           27525120      11 27525109    1% /mnt2

Can someone help me increase the root disk volume? It might be good to increase the default volume size so that Spark can be built.
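
As far as I know there is no spark-ec2 flag that grows the root volume itself, so one hedged workaround is to do the git clone and build on the large ephemeral disk (/mnt in the df output above), which has plenty of inodes, and link the result back. A rough sketch; the /root/spark target is an assumption about where the rest of the setup expects the build:

# Clone and build on the big ephemeral disk instead of the inode-starved root volume.
mkdir -p /mnt/spark-build
cd /mnt/spark-build
git clone https://github.com/apache/spark.git
cd spark
git checkout 4f894dd6906311cb57add6757690069a18078783   # the commit passed via -v

# Link the build back to the location the rest of the setup expects (assumed).
rm -rf /root/spark
ln -s /mnt/spark-build/spark /root/spark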

Default spark version not found on s3

I just ran the following:

./spark-ec2 -k dev-spark-cluster -i dev-spark-cluster.pem -s 2 --instance-type=m3.2xlarge --region=us-east-1 --zone=us-east-1a launch dev-pdna-spark-cluster

on a clone of the 1.6 branch, and got the following error:

Initializing spark
--2016-09-21 12:58:29--  http://s3.amazonaws.com/spark-related-packages/spark-1.6.2-bin-hadoop1.tgz
Resolving s3.amazonaws.com (s3.amazonaws.com)... 54.231.10.144
Connecting to s3.amazonaws.com (s3.amazonaws.com)|54.231.10.144|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2016-09-21 12:58:29 ERROR 404: Not Found.

ERROR: Unknown Spark version
spark/init.sh: line 137: return: -1: invalid option
return: usage: return [n]
Unpacking Spark
tar (child): spark-*.tgz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
rm: cannot remove `spark-*.tgz': No such file or directory
mv: missing destination file operand after `spark'
Try `mv --help' for more information.
[timing] spark init:  00h 00m 00s

I'm guessing that I can fix this by specifying a Spark version that does exist, but I'm filing this because the docs state that the default should be a valid version.
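
Until the default is fixed, a workaround sketch is to check what the spark-related-packages bucket actually contains and then pin --spark-version (and, if needed, --hadoop-major-version) to a combination that exists. The exact mapping from --hadoop-major-version to a package suffix depends on the branch, so treat the flags below as an example rather than a recipe:

# The bucket has historically allowed anonymous listing; grep the XML for 1.6.x packages.
curl -s 'http://s3.amazonaws.com/spark-related-packages/' | grep -o 'spark-1\.6\.[^<]*'

# Relaunch, pinning a version/Hadoop combination that appears in that listing.
./spark-ec2 -k dev-spark-cluster -i dev-spark-cluster.pem -s 2 \
  --instance-type=m3.2xlarge --region=us-east-1 --zone=us-east-1a \
  --spark-version=1.6.2 --hadoop-major-version=yarn \
  launch dev-pdna-spark-cluster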

Making a node running spark-ec2 the master node

The title pretty much says everything. I want to be able to use spark-ec2 to (in a single command):

  1. Create a set of slaves
  2. Launch Spark on the node spark-ec2 is running on
  3. Launch Spark on the slaves
  4. Register each slave with the master

If this isn't already implemented (I've taken a quick glance through the source and it doesn't seem like it is), I'll probably go about doing it myself.

It seems like this is a good place to start (line 702 of spark_ec2.py, in launch()).

# Launch or resume masters
if existing_masters:
    print("Starting master...")
    for inst in existing_masters:
        if inst.state not in ["shutting-down", "terminated"]:
            inst.start()
    master_nodes = existing_masters

Ideally, all I'd have to do is supply the current node as the master, then leave the rest of the script in its current state. Further modifications could be made when stopping the cluster (i.e. ensuring we don't also stop the master). @hyviquel and @rxin, how does this sound?
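
For reference, the node running spark-ec2 can identify itself through the EC2 instance metadata service, which a change along these lines could use to register the local machine as the master rather than launching a new one. A minimal illustration using the standard EC2 metadata endpoints; none of this is existing spark-ec2 behaviour:

# Ask the instance metadata service who "this" node is.
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
PUBLIC_DNS=$(curl -s http://169.254.169.254/latest/meta-data/public-hostname)

echo "Would register $INSTANCE_ID ($PUBLIC_DNS) as the master instead of launching one."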

UnknownHostException

Submitting a job:
/asidev/spark-latest/bin/spark-submit --class com.cxxxxx.ABC --master spark://ip-10-100-111-111.ec2.internal:7077

This results in the following exception. The servers are in a private VPC.

Running Spark using the REST application submission protocol.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/10/28 12:30:53 INFO RestSubmissionClient: Submitting a request to launch an application in spark://ip-10-100-111-111.ec2.internal:7077.
Exception in thread "main" java.net.UnknownHostException: ip-10-100-111-111.ec2.internal
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at java.net.Socket.connect(Socket.java:538)
at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1202)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:966)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1316)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1291)
at org.apache.spark.deploy.rest.RestSubmissionClient.org$apache$spark$deploy$rest$RestSubmissionClient$$postJson(RestSubmissionClient.scala:214)
at org.apache.spark.deploy.rest.RestSubmissionClient$$anonfun$createSubmission$3.apply(RestSubmissionClient.scala:89)
at org.apache.spark.deploy.rest.RestSubmissionClient$$anonfun$createSubmission$3.apply(RestSubmissionClient.scala:85)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at org.apache.spark.deploy.rest.RestSubmissionClient.createSubmission(RestSubmissionClient.scala:85)
at org.apache.spark.deploy.rest.RestSubmissionClient$.run(RestSubmissionClient.scala:417)
at org.apache.spark.deploy.rest.RestSubmissionClient$.main(RestSubmissionClient.scala:430)
at org.apache.spark.deploy.rest.RestSubmissionClient.main(RestSubmissionClient.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:199)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Fingerprint check

Hello, I'm trying out your tool on the 2.0 branch using OS X 10.10.5.

The machines are created in EC2, but then the script is unable to proceed with the rest of the provisioning (I get the error below repeatedly).
However, if I manually ssh into the machines, the script unblocks and proceeds. If I had to guess, I would say it's related to the host authenticity/fingerprint prompt.


Warning: SSH connection error. (This could be temporary.)
Host: ec2-54-69-107-29.us-west-2.compute.amazonaws.com
SSH return code: 255
SSH output: ssh: connect to host ec2-54-69-107-29.us-west-2.compute.amazonaws.com port 22: Connection refused


spark-ec2 scripts with spark-2.0.0-preview

spark-2.0.0-preview doesn't have the spark-ec2 scripts shipped with it.
I tried tweaking the spark-ec2 scripts from older releases to accommodate spark-2.0.0-preview, but couldn't get those to work.

Here's the relevant email thread from the user/dev mailing lists:

Shivaram Venkataraman [email protected] via spark.apache.org
1:06 PM (16 minutes ago)

Can you open an issue on https://github.com/amplab/spark-ec2 ? I think we should be able to escape the version string and pass the 2.0.0-preview through the scripts.

Shivaram

On Tue, Jun 14, 2016 at 12:07 PM, Sunil Kumar
[email protected] wrote:

Hi,

The spark-ec2 scripts are missing from spark-2.0.0-preview. Is there a workaround available? I tried to change the ec2 scripts to accommodate spark-2.0.0. If I call the release spark-2.0.0-preview, it barfs because the command line argument --spark-version=spark-2.0.0-preview gets translated to spark-2.0.0-preiew (-v is taken as a switch). If I call the release spark-2.0.0, it can't find it on AWS, since it looks for http://s3.amazonaws.com/spark-related-packages/spark-2.0.0-bin-hadoop2.4.tgz instead of http://s3.amazonaws.com/spark-related-packages/spark-2.0.0-preview-bin-hadoop2.4.tgz

Any ideas on how to make this work? How can I tweak/hack the code to look for spark-2.0.0-preview in spark-related-packages?

thanks
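
Until the version string is escaped properly, one hedged workaround is to bypass the S3 package lookup entirely and build from a git commit, the same way the custom-version launch earlier on this page does: point --spark-git-repo at the Apache repository and pass the commit that the v2.0.0-preview tag points to via -v (the hash below is a placeholder):

# Build Spark from source at a given commit instead of fetching a prebuilt
# package from spark-related-packages.
./spark-ec2 --key-pair=<keypair> --identity-file=<key-file> -s 1 \
  --spark-git-repo=https://github.com/apache/spark \
  -v <commit-hash-of-v2.0.0-preview> \
  launch my-spark-cluster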

Hive context

Hi,

I have created a cluster using branch-1.6. When I try to create a HiveContext I get:

("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o332))

Would it be possible to use a Spark jar where Hive is enabled?
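
A sketch of the rebuild that the error message asks for, run on the master and assuming the cluster was built from source under /root/spark; if you launched from a prebuilt -bin- package, picking a package that bundles Hive avoids the rebuild entirely:

# Rebuild the Spark assembly with Hive support, as the error message suggests.
cd /root/spark
export SPARK_HIVE=true
build/sbt assembly

# Push the rebuilt assembly to the slaves (copy-dir is assumed to be on the master).
/root/spark-ec2/copy-dir /root/spark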

AMI IDs up to date?

Just checking whether it's known if the AMI IDs in this repo are up to date. I have tried locating the AMI IDs listed here for us-west-1 and us-west-2, but I could not find the same IDs on AWS. The main spark-ec2 setup still loads the AMIs from this git repo.

Should I just point the script at an Amazon Linux AMI ID?

Thanks

Connection to web UI broken for 1.6.2.

After the cluster is launched, I tried to connect to the web UI at:
http://<master-hostname>:8080
but it does not work. I checked the security groups, and it seems port 8080 is correctly open to everywhere.
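
A quick way to narrow this down is to test the UI from the master itself: if it answers locally but not from outside, the problem is in the network path or security group; if it does not answer at all, the master process is probably down. A hedged sketch (log path assumes the standard /root/spark layout):

# Check whether the master web UI answers locally on the master node.
ssh -i <key-file> root@<master-hostname> 'curl -sI http://localhost:8080 | head -n 1'

# If it does not, look at the standalone master's log for the reason.
ssh -i <key-file> root@<master-hostname> 'tail -n 50 /root/spark/logs/*Master*.out'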

Exception: pyspark does not support any application options

So I launched an EC2 cluster via spark-ec2 and it worked well, until I tried to launch an IPython session and got: "Exception in thread "main" java.lang.IllegalArgumentException: pyspark does not support any application options." The same launch command worked just a few days ago and fails now. Any solutions for that? Thanks.

Error when using a more up-to-date AMI

Hi all,
I am trying to launch an EC2 cluster using a more up-to-date AMI: ami-c928c1a9.
Here's my command:

root@9f2c58d4fbe6:/spark-ec2# ./spark-ec2 -k ec2AccessKey -i ec2AccessKey.pem -s 2 --ami=ami-c928c1a9 --region us-west-2 launch MMTestCluster4

I am launching this from a Docker container running Ubuntu 16.06, and I am getting this exception:

Connection to ec2-54-187-145-15.us-west-2.compute.amazonaws.com closed.
Deploying files to master...
Warning: Permanently added 'ec2-54-187-145-15.us-west-2.compute.amazonaws.com,54.187.145.15' (ECDSA) to the list of known hosts.
protocol version mismatch -- is your shell clean?
(see the rsync man page for an explanation)
rsync error: protocol incompatibility (code 2) at compat.c(176) [sender=3.1.1]
Traceback (most recent call last):
File "./spark_ec2.py", line 1534, in
main()
File "./spark_ec2.py", line 1526, in main
real_main()
File "./spark_ec2.py", line 1362, in real_main
setup_cluster(conn, master_nodes, slave_nodes, opts, True)
File "./spark_ec2.py", line 846, in setup_cluster
modules=modules
File "./spark_ec2.py", line 1121, in deploy_files
subprocess.check_call(command)
File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['rsync', '-rv', '-e', 'ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i ec2AccessKey.pem', '/tmp/tmp

I am willing to help sort out this issue, as I am skilled in Python and I am a user of the Scala/Python AWS APIs.
Please give me some hints / starting points, and also, if possible, a test environment, as it's going to cost me loads of money to keep creating large instances (and then destroying them) on my AWS account.

Thanks in advance and regards,
Marco

How to retain conf across spark cluster stop/start via spark-ec2

Some conf changes require a cluster restart to take effect, say SPARK_WORKER_OPTS; but when stopping/starting the Spark cluster via spark-ec2, it seems to re-run the cluster setup and flush all of the conf. So, is there a way to keep the conf across a stop/start via spark-ec2? By the way, somehow I cannot stop and start the cluster via Spark's stop-all/start-all scripts.
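
I am not certain of the exact restart behaviour, but since spark-ec2 re-deploys its configuration templates on start, one hedged approach is to keep your conf change in a fork of this repository's templates and point the launcher at that fork, so every re-setup re-applies the change instead of wiping it. Roughly (the template path inside the fork is an assumption about the repo layout):

# After committing the change to templates/root/spark/conf/spark-env.sh in a fork,
# start the cluster against that fork so re-setup re-applies the setting.
./spark-ec2 -k <keypair> -i <key-file> \
  --spark-ec2-git-repo=https://github.com/<your-fork>/spark-ec2 \
  --spark-ec2-git-branch=<your-branch> \
  start <cluster-name>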

Errors when deploying spark on EC2 with specified AMI-ID

Hi guys,

I was just trying to deploy Spark on EC2 with the following command:

./spark-ec2 --key-pair=key_spark_oregon --identity-file=key_spark_oregon.pem -a ami-9abea4fb --instance-type=t2.micro --region=us-west-2 --zone=us-west-2a launch my-spark-cluster

So I used the AMI "ami-9abea4fb" here, which is an Ubuntu HVM image, and then I got the following output:

Setting up security groups...
Searching for existing cluster my-spark-cluster in region us-west-2...
Launching instances...
Launched 1 slave in us-west-2a, regid = r-10c232c9
Launched master in us-west-2a, regid = r-11c232c8
Waiting for AWS to propagate instance metadata...
Waiting for cluster to enter 'ssh-ready' state.........
Cluster is now in 'ssh-ready' state. Waited 213 seconds.
Generating cluster's SSH key on master...
Warning: Permanently added 'ec2-52-39-228-214.us-west-2.compute.amazonaws.com,52.39.228.214' (ECDSA) to the list of known hosts.
Please login as the user "ubuntu" rather than the user "root".

Connection to ec2-52-39-228-214.us-west-2.compute.amazonaws.com closed.
Warning: Permanently added 'ec2-52-39-228-214.us-west-2.compute.amazonaws.com,52.39.228.214' (ECDSA) to the list of known hosts.
Transferring cluster's SSH key to slaves...
ec2-52-39-221-224.us-west-2.compute.amazonaws.com
Warning: Permanently added 'ec2-52-39-221-224.us-west-2.compute.amazonaws.com,52.39.221.224' (ECDSA) to the list of known hosts.
Please login as the user "ubuntu" rather than the user "root".

Cloning spark-ec2 scripts from https://github.com/amplab/spark-ec2/tree/branch-1.5 on master...
Warning: Permanently added 'ec2-52-39-228-214.us-west-2.compute.amazonaws.com,52.39.228.214' (ECDSA) to the list of known hosts.
Please login as the user "ubuntu" rather than the user "root".

Connection to ec2-52-39-228-214.us-west-2.compute.amazonaws.com closed.
Deploying files to master...
Warning: Permanently added 'ec2-52-39-228-214.us-west-2.compute.amazonaws.com,52.39.228.214' (ECDSA) to the list of known hosts.
protocol version mismatch -- is your shell clean?
(see the rsync man page for an explanation)
rsync error: protocol incompatibility (code 2) at compat.c(174) [sender=3.1.0]
Traceback (most recent call last):
File "./spark_ec2.py", line 1528, in
main()
File "./spark_ec2.py", line 1520, in main
real_main()
File "./spark_ec2.py", line 1356, in real_main
setup_cluster(conn, master_nodes, slave_nodes, opts, True)
File "./spark_ec2.py", line 844, in setup_cluster
modules=modules
File "./spark_ec2.py", line 1119, in deploy_files
subprocess.check_call(command)
File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['rsync', '-rv', '-e', 'ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i key_spark_oregon.pem', '/tmp/tmpmC48WQ/', u'root@<master-hostname>:/']' returned non-zero exit status 2

Why do I get this error? Is there somebody who could help me on this? Thanks a lot.
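
The "protocol version mismatch -- is your shell clean?" message from rsync generally means the remote side printed something before rsync started; in the log above that is almost certainly the "Please login as the user "ubuntu"" banner, because the stock spark-ec2 setup logs in as root while Ubuntu AMIs refuse root logins. The standard rsync troubleshooting check makes this visible (hostname is a placeholder):

# Per the rsync documentation: any bytes captured here mean the remote login is
# not "clean", and rsync's protocol negotiation will fail with exactly this error.
ssh -o StrictHostKeyChecking=no -i key_spark_oregon.pem \
    root@<master-hostname> /bin/true > out.dat
od -c out.dat   # should be empty for rsync to work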
