spark-jupyter-aws's People

Contributors

nchammas, piercingdan

spark-jupyter-aws's Issues

s3 access problem

When I run this code in my Jupyter notebook:
iris_raw_RDD = sc.textFile('s3n://BucketName/iris_data.csv')
iris_raw_RDD.take(5)

the following error occurs:
An error occurred while calling o71.partitions.
: java.io.IOException: No FileSystem for scheme: s3n

Do you know how I can fix this problem?

Potential simplifications to the guide

Hey @PiercingDan! Thanks for writing up this guide and featuring Flintrock in it prominently. I have a couple of suggestions that may help simplify the guide.

  1. From Using your own AMI:

    Note that it is important that since flintrock is designed to install and configure Spark every time, it is important that you delete the Spark folder and other files before saving your AMI or you will encounter errors with flintrock.

    Actually, you can tell Flintrock not to install Spark as follows:

    flintrock launch my-cluster --no-install-spark
    

    You can do the same for HDFS, though Flintrock by default does not install HDFS so you'd only do that to override a configuration in Flintrock's config.yaml.

  2. Most, if not all, of the setup code can be captured in a script and deployed automatically using a combination of Flintrock's run-command and copy-file commands. Have you considered using them?

    For example, you can capture your setup code in a script called piercingdan-quickstart.sh and then deploy it to the cluster as follows:

    flintrock copy-file my-cluster ./piercingdan-quickstart.sh /tmp/
    flintrock run-command my-cluster 'chmod u+x /tmp/piercingdan-quickstart.sh'
    flintrock run-command my-cluster '/tmp/piercingdan-quickstart.sh'
    

    If you host the script on GitHub, you can even do away with copy-file and download the script directly from GitHub onto the cluster with run-command. You can also use the --master-only option if what you're doing doesn't need to hit the whole cluster.

    Another alternative is to use --ec2-user-data, but I recommend using that only if you're comfortable with EC2.

    By capturing this work in a script that can easily be deployed, you can save your readers from having to create and maintain their own AMIs which, in my view, is a big pain, especially if people are changing things from time to time or working in multiple regions.

    Finally, you can also just flintrock stop your cluster if the cost of the EBS root drives is acceptable. That will eliminate the cost of the running instances and leave behind a cluster that's ready to use with a quick flintrock start.
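As a rough sketch of the GitHub-hosted variant described above, the download-and-run step could be collapsed into a single run-command call. The URL below is a placeholder rather than a real location, the build_remote_cmd helper is made up for illustration, and the sketch assumes curl is available on the cluster nodes. It prints the command by default and only calls Flintrock when RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: the names and URL below are illustrative assumptions.
set -eu

CLUSTER="my-cluster"
# Hypothetical raw URL; replace it with wherever the script is actually hosted.
SCRIPT_URL="https://raw.githubusercontent.com/example-user/example-repo/master/piercingdan-quickstart.sh"
REMOTE_PATH="/tmp/piercingdan-quickstart.sh"

# Hypothetical helper: builds the command each node runs
# (download the script, make it executable, execute it).
# $1 = script URL, $2 = destination path on the node.
build_remote_cmd() {
    printf "curl -fsSL '%s' -o '%s' && chmod u+x '%s' && '%s'" "$1" "$2" "$2" "$2"
}

REMOTE_CMD="$(build_remote_cmd "$SCRIPT_URL" "$REMOTE_PATH")"

if [ "${RUN:-0}" = "1" ]; then
    # Add --master-only here if the setup does not need to hit the whole cluster.
    flintrock run-command "$CLUSTER" "$REMOTE_CMD"
else
    # Dry run: show what would be sent to the cluster.
    echo "flintrock run-command $CLUSTER \"$REMOTE_CMD\""
fi
```

Running the script as-is only prints the Flintrock invocation, which makes it easy to check the quoting before pointing it at a live cluster.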

Misformatted text

It looks like you meant to put a code block around this section:

[ec2-user@privateipaddress]$ lsblk
NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda    202:0    0  10G  0 disk
└─xvda1 202:1    0  10G  0 part /
xvdf    202:80   0  30G  0 disk
└─xvdf1 202:81   0  30G  0 part

then close the code block for the rest of the list:

  1. Make a mount point sudo mkdir /oldvol, then mount the attached volume sudo mount /dev/xvdf1 /oldvol.

or it didn't close properly. I don't have a nice environment where I can pull and submit a PR, so hopefully you can give it a quick fix.
