
Comments (8)

aalkilani commented on July 20, 2024

Can you please provide the following so we can further troubleshoot:

  1. Which user account (vagrant or root) was the command executed as?
  2. Can you provide the exact command that was used?
  3. Which version of the Vagrant image do you have? You can tell whether it's the latest when you run vagrant up, or by running "vagrant box list". Do you have 0.0.5 or something else?

Thanks

from spark-kafka-cassandra-applying-lambda-architecture.

dmcarba commented on July 20, 2024

Hi,

I just checked the version of the box, and it is not the latest; it is 0.0.5:

==> default: A newer version of the box 'aalkilani/spark-kafka-cassandra-applying-lambda-architecture' is available! You currently
==> default: have version '0.0.5'. The latest is version '0.0.6'. Run
==> default: vagrant box update to update.

Sorry, I just did a git clone today and assumed the Vagrant box was the latest. I will run vagrant box update and try again.

The user is vagrant

The command issued is the spark-submit for the first YARN job example:

./bin/spark-submit --master yarn --deploy-mode cluster --class batch.BatchJob /vagrant/spark-lambda-1.0-SNAPSHOT-shaded.jar

Thanks!


aalkilani commented on July 20, 2024

Great. Give 0.0.6 a try; however, even if you're on 0.0.5, everything should still work. The point of the updated scripts is to make the transition seamless even on an older box, so we don't constantly ask you to download a large image. That said, this looks like a legitimate problem when updating from 0.0.5, and I'm looking into it.

Having said that, there are advantages to moving to the newer image. Everything should be streamlined, and any issues reported earlier will have been addressed. Please let me know how 0.0.6 works for you.

Thanks


dmcarba commented on July 20, 2024

I upgraded to 0.0.6 successfully, but when running vagrant up, the box gets stuck during startup with these messages:

==> default: Running 'pre-boot' VM customizations...
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
default: SSH address: 127.0.0.1:2222
default: SSH username: vagrant
default: SSH auth method: private key
default: Warning: Authentication failure. Retrying...
default: Warning: Authentication failure. Retrying...
default: Warning: Authentication failure. Retrying...
default: Warning: Authentication failure. Retrying...
.
.

At the end it shows the message:

Timed out while waiting for the machine to boot. This means that
Vagrant was unable to communicate with the guest machine within
the configured ("config.vm.boot_timeout" value) time period.

Afterwards I am able to connect with vagrant ssh (password: vagrant), but spark-submit then fails because the ResourceManager is not up.

Thanks


dmcarba commented on July 20, 2024

Hi,

I removed the box and the git folder to start from scratch, ran git clone again, then vagrant up, which downloaded the box, this time 0.0.6.

Then I ran vagrant ssh, moved to the /vagrant directory, and executed fixes.sh.

After that I tried the spark-submit command again. The job seems to run now, but I got an exception:

User class threw exception: java.lang.IllegalArgumentException: Pathname /lambda-pluralsight:9000/lambda/batch1 from hdfs://lambda-pluralsight:9000/lambda-pluralsight:9000/lambda/batch1 is not a valid DFS filename.
java.lang.IllegalArgumentException: Pathname /lambda-pluralsight:9000/lambda/batch1 from hdfs://lambda-pluralsight:9000/lambda-pluralsight:9000/lambda/batch1 is not a valid DFS filename.
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:197)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:106)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)

The offending line is:

activityByProduct.write.partitionBy("timestamp_hour").mode(SaveMode.Append).parquet("hdfs:///lambda-pluralsight:9000/lambda/batch1")

First I removed the namenode URL, but then I got multiple exceptions from containers exiting with error -1. I assumed it was due to HDFS permissions, so I ended up using the relative path "lambda/batch1". The job created the files under the /user/vagrant directory in HDFS. However, the ResourceManager keeps crashing during job executions.


aalkilani commented on July 20, 2024

@dmcarba would you mind pointing out exactly which clip from the course you're trying to run, so I can use the same code and setup? Are you running spark-submit, or going through the IDE? Perhaps in Zeppelin?

Note from the exception there seems to be something off with the path used:
hdfs://lambda-pluralsight:9000/lambda-pluralsight:9000/lambda/batch1
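The three slashes after "hdfs:" are the likely cause: with an empty authority, the "host:port" portion is parsed as part of the path, which then gets prefixed with the default filesystem and produces the doubled path above. A minimal sketch of that parsing, using Python's urllib.parse purely as an illustration (java.net.URI, which Hadoop uses, splits the authority the same way):

```python
from urllib.parse import urlparse

# Three slashes after "hdfs:" mean an EMPTY authority, so the
# "host:port" part is swallowed into the path component.
bad = urlparse("hdfs:///lambda-pluralsight:9000/lambda/batch1")
print(bad.netloc)  # ''
print(bad.path)    # '/lambda-pluralsight:9000/lambda/batch1'

# With two slashes, the namenode address is parsed as the authority
# and the path is what the filesystem actually sees.
good = urlparse("hdfs://lambda-pluralsight:9000/lambda/batch1")
print(good.netloc)  # 'lambda-pluralsight:9000'
print(good.path)    # '/lambda/batch1'
```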

Also, regarding the fixes script: did you run it as the root user? The script is executed correctly if you run

vagrant provision

You don't need to run it yourself; vagrant provision will handle it for you.

Thanks


dmcarba commented on July 20, 2024

I used the vagrant provision command as you suggested and increased the VM memory to 8192 MB in the Vagrantfile.
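For reference, the relevant Vagrantfile changes might look like this. This is a sketch, not the course's actual file: it assumes the VirtualBox provider, and the boot_timeout value is an illustrative choice addressing the earlier "Timed out while waiting for the machine to boot" message.

```ruby
# Vagrantfile (sketch; assumes the VirtualBox provider)
Vagrant.configure("2") do |config|
  config.vm.box = "aalkilani/spark-kafka-cassandra-applying-lambda-architecture"

  # Give the guest more time before Vagrant reports a boot timeout
  # (the default is 300 seconds).
  config.vm.boot_timeout = 600

  config.vm.provider "virtualbox" do |vb|
    # 8192 MB gives YARN's ResourceManager and the Spark job more headroom.
    vb.memory = 8192
  end
end
```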

I am testing the BatchJob, the first YARN example in lesson 2, using the spark-submit command.

I also had to change the Parquet destination path in the code, removing the namenode URL, from
"hdfs:///lambda-pluralsight:9000/lambda/batch1"
to
"hdfs:///lambda/batch1"

Now the Spark job finishes correctly, and all the Parquet files are created in the expected path.

I think this issue can be closed.

Thank you for your support!


aalkilani commented on July 20, 2024

I have set sensible defaults for Spark now, so it should work even with the constrained 4 GB image, but if you have the luxury of going up to 8 GB by updating the Vagrantfile, that's also great. Closing this. Thanks for the feedback.

