Comments (8)
Can you please provide the following so we can further troubleshoot:
- Which user account (vagrant or root) was the command executed as?
- Can you provide the exact command that was used?
- Which version of the vagrant image do you have? You can tell when you run vagrant up if it's the latest. You can also check by running "vagrant box list". Do you have 0.0.5 or something else?
Thanks
from spark-kafka-cassandra-applying-lambda-architecture.
Hi,
I just checked the version of the box and it is not the latest; it is 0.0.5:
==> default: A newer version of the box 'aalkilani/spark-kafka-cassandra-applying-lambda-architecture' is available! You currently
==> default: have version '0.0.5'. The latest is version '0.0.6'. Run
==> default: vagrant box update
to update.
Sorry, I just did a git clone today and assumed the vagrant box was the latest. I will run vagrant box update and try again.
The user is vagrant
The command issued is the submit for the first yarn job example:
./bin/spark-submit --master yarn --deploy-mode cluster --class batch.BatchJob /vagrant/spark-lambda-1.0-SNAPSHOT-shaded.jar
Thanks!
Great. Give 0.0.6 a try; however, even if you're on 0.0.5, everything should still work. The point of the updated scripts is to make the transition seamless even if you're on an older box, so we don't constantly ask you to download a large image. It seems like this is a legitimate problem when updating from 0.0.5. I'm looking into it.
Having said that, there are advantages to moving to the newer image. Everything should be streamlined and any issues reported earlier will have been addressed. Please let me know how 0.0.6 works for you.
Thanks
I upgraded to 0.0.6 successfully, but when I run vagrant up, the box gets stuck while starting, with these messages:
==> default: Running 'pre-boot' VM customizations...
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
default: SSH address: 127.0.0.1:2222
default: SSH username: vagrant
default: SSH auth method: private key
default: Warning: Authentication failure. Retrying...
default: Warning: Authentication failure. Retrying...
default: Warning: Authentication failure. Retrying...
default: Warning: Authentication failure. Retrying...
.
.
At the end it shows the message:
Timed out while waiting for the machine to boot. This means that
Vagrant was unable to communicate with the guest machine within
the configured ("config.vm.boot_timeout" value) time period.
Afterwards I am able to connect with vagrant ssh (password: vagrant), but when trying spark submit it fails because the resource manager is not up.
Thanks
Hi,
I removed the box and the git folder to start from scratch, ran git clone again, and then ran vagrant up, which downloaded the box, this time 0.0.6.
Then I ran vagrant ssh, moved to the /vagrant directory and executed fixes.sh.
After that I tried the spark-submit command again. The job seems to be running now, but I got an exception:
User class threw exception: java.lang.IllegalArgumentException: Pathname /lambda-pluralsight:9000/lambda/batch1 from hdfs://lambda-pluralsight:9000/lambda-pluralsight:9000/lambda/batch1 is not a valid DFS filename.
java.lang.IllegalArgumentException: Pathname /lambda-pluralsight:9000/lambda/batch1 from hdfs://lambda-pluralsight:9000/lambda-pluralsight:9000/lambda/batch1 is not a valid DFS filename.
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:197)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:106)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
The offending line is:
activityByProduct.write.partitionBy("timestamp_hour").mode(SaveMode.Append).parquet("hdfs:///lambda-pluralsight:9000/lambda/batch1")
First I removed the namenode URL, but then I got multiple exceptions from containers exiting with error -1. I assumed this was due to HDFS permissions, so I ended up using the path "lambda/batch1". The job created the files under the /user/vagrant directory in HDFS. However, the resource manager keeps crashing during job execution.
@dmcarba would you mind pointing out exactly which clip from the course you're trying to run, so I can use the same code and setup? Are you running a spark-submit or going through the IDE? Or perhaps Zeppelin?
Note that, from the exception, there seems to be something off with the path used:
hdfs://lambda-pluralsight:9000/lambda-pluralsight:9000/lambda/batch1
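One possible reading of that doubled path, as a minimal sketch: with three slashes after `hdfs:`, the URI has an empty authority, so `lambda-pluralsight:9000` becomes the first segment of the path instead of the host and port. The example below uses only `java.net.URI` from the JDK. It assumes Hadoop splits the string in the same way for this case, which the thread does not confirm, and `HdfsUriCheck` and `parts` are hypothetical names used only for illustration.

```scala
import java.net.URI

object HdfsUriCheck {
  // Splits a URI string into (host, port, path).
  // java.net.URI reports host = null and port = -1 when there is no authority.
  def parts(s: String): (String, Int, String) = {
    val u = new URI(s)
    (u.getHost, u.getPort, u.getPath)
  }

  def main(args: Array[String]): Unit = {
    // Three slashes: empty authority, so the host:port text lands in the path.
    val tripleSlash = parts("hdfs:///lambda-pluralsight:9000/lambda/batch1")
    // Two slashes: host and port are parsed as the authority.
    val doubleSlash = parts("hdfs://lambda-pluralsight:9000/lambda/batch1")

    println(s"triple slash: host=${tripleSlash._1} port=${tripleSlash._2} path=${tripleSlash._3}")
    println(s"double slash: host=${doubleSlash._1} port=${doubleSlash._2} path=${doubleSlash._3}")
  }
}
```

If that reading is right, a path with no authority is resolved against the cluster's default filesystem, which would be consistent with the exception showing the namenode address followed by `/lambda-pluralsight:9000/lambda/batch1`.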
Also, regarding the fixes script: did you run it as the root user? The script is executed correctly if you run
vagrant provision
You don't need to run it yourself; vagrant provision will handle it for you.
Thanks
I used the vagrant provision command as you suggested and increased the VM memory to 8192 in the Vagrantfile.
I am testing the BatchJob, the first yarn example in lesson 2, using the spark-submit command.
I also had to change the parquet destination path in the code, removing the namenode URL, from
"hdfs:///lambda-pluralsight:9000/lambda/batch1"
to
"hdfs:///lambda/batch1"
Now the spark job finishes correctly, and all the parquet files are created in the expected path.
I think this issue can be closed.
Thank you for your support!
I have now set sensible defaults for Spark, so it should work even with the constrained 4GB image, but if you have the luxury of going up to 8GB by updating the Vagrantfile, that's also great. Closing this. Thanks for the feedback.