datasciencetoolbox's Introduction

Data Science Toolbox

License: MIT

If you're a data scientist, installing all the software you need can be quite involved. The goal of the Data Science Toolbox is to provide a virtual environment that will enable you to start doing data science in a matter of minutes.

The Data Science Toolbox is currently being revived for the upcoming second edition of Data Science at the Command Line. At the moment there's only a basic Docker image (datasciencetoolbox/dsatcl2e), which is based on Ubuntu 20.04 and includes tools such as:

  • jq
  • xmlstarlet
  • GNU parallel
  • xsv
  • pup
  • vowpal wabbit

Under the hood, this project employs Packer, Ansible, and Docker. We'll soon add support for other platforms such as Vagrant, VirtualBox, VMware, and AWS. Expect many breaking changes in the coming months as we're learning this on the fly. Stay tuned.
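
As a rough sketch of how the Docker image named above might be started: the image name comes from this README, while the helper name `dst_run`, the `DRY_RUN` switch, and the `/data` mount point are illustrative assumptions rather than part of the project.

```shell
# Image name as given in the README.
DST_IMAGE="datasciencetoolbox/dsatcl2e"

# Start an interactive, throwaway container with a host directory mounted
# at /data (the mount point is an assumption). With DRY_RUN set, only print
# the command instead of running it.
dst_run() {
    local dir="${1:-$PWD}"
    set -- docker run --rm -it -v "$dir:/data" "$DST_IMAGE"
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Calling `dst_run` with no argument mounts the current directory; `DRY_RUN=1 dst_run /tmp/work` only shows the command that would be executed.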

License

The Data Science Toolbox is licensed under the MIT License.

datasciencetoolbox's People

Contributors

jeroenjanssens, ncarchedi

datasciencetoolbox's Issues

Notebook location

The notebooks directory should be located in /vagrant (which is mapped to the current directory on the host system) instead of /home/vagrant.

No data from Dutch Radio

The link provided in Chapter 3, Section 3.5, no longer leads to an xlsx file. The book shows the expected output, but the web page (https://datascienceatthecommandline.com/2e/chapter-3-obtaining-data.html#converting-microsoft-excel-spreadsheets-to-csv) shows an error:

Let’s demonstrate in2csv using a spreadsheet that contains the 2000 most popular songs according to an annual Dutch marathon radio program Top 2000. To extract its data, you invoke in2csv as follows:

$ curl "https://www.nporadio2.nl/data/download/TOP-2000-2020.xlsx" > top2000.xlsx
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 41744  100 41744    0     0  46024      0 --:--:-- --:--:-- --:--:-- 46228
$ in2csv top2000.xlsx | tee top2000.csv | trim
BadZipFile: File is not a zip file

Who is Danny Vera? The most popular song is supposed to be Bohemian Rhapsody, of course. Well, at least Queen appears plenty of times in the Top 2000 so I can’t really complain:

$ csvgrep top2000.csv --columns ARTIEST --regex '^Queen$' | csvlook -I
StopIteration:
StopIteration:
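
A BadZipFile error suggests that the downloaded file is not a real .xlsx file, since .xlsx is a ZIP container and ZIP data starts with the bytes "PK". The following is a minimal sketch of a check that could be run before calling in2csv; the function name `looks_like_xlsx` is hypothetical, and the check only inspects the first two bytes, so it is a sanity test rather than full validation.

```shell
# Succeed if the file starts with the ZIP signature "PK" (0x50 0x4B).
# Fails for HTML error pages, empty files, and missing files.
looks_like_xlsx() {
    [ "$(head -c 2 "$1" 2>/dev/null)" = "PK" ]
}

# Possible usage (file name taken from the issue above):
#   if looks_like_xlsx top2000.xlsx; then
#       in2csv top2000.xlsx > top2000.csv
#   else
#       echo "top2000.xlsx is not an Excel file; the URL may have changed" >&2
#   fi
```

If the check fails, inspecting the file with `head top2000.xlsx` usually shows whether the server returned an HTML page instead of the spreadsheet.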

Attempting command dst setup base results in dst: command not found

Hi,
I am following the instructions on Windows 10 to set up the IPython notebook, but I can't because of this error:
vagrant@data-science-toolbox:~$ dst setup base
No command 'dst' found, did you mean:
 Command 'tst' from package 'pvm-examples' (universe)
 Command 'dpt' from package 'pkg-perl-tools' (universe)
 Command 'dsh' from package 'dsh' (universe)
 Command 'gst' from package 'gnu-smalltalk' (universe)
 Command 'ds9' from package 'saods9' (universe)
 Command 'dt' from package 'ditrack' (universe)
 Command 'dat' from package 'liballegro4-dev' (universe)
 Command 'dsc' from package 'dsc-statistics-collector' (universe)
 Command 'dist' from package 'nmh' (universe)
 Command 'dot' from package 'graphviz' (main)
dst: command not found
How can I overcome it?

Code for the redefined csvlook has to be entered on one line

For anybody else having a problem with the redefined csvlook in chapter 2.3.7 Managing Output:
After pasting it into the terminal, before executing, you have to delete all the newlines so that

csvlook () {
        /usr/bin/csvlook "$@" | trim | sed 's/- | -/──┼──/g;s/| -/├──/g;s/- |/──
┤/;s/|/│/g;2s/-/─/g'
}

becomes
csvlook () { /usr/bin/csvlook "$@" | trim | sed 's/- | -/──┼──/g;s/| -/├──/g;s/- |/──┤/;s/|/│/g;2s/-/─/g'; }
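
An alternative to deleting every newline, sketched under the assumption that the only harmful line break is the one that lands inside the quoted sed script: pass each substitution as its own `-e` expression, so the definition can stay on several lines. The helper name `box_rules` is hypothetical; `/usr/bin/csvlook` and `trim` are the commands from the snippet above and are assumed to be installed.

```shell
# Apply the box-drawing substitutions from the issue, one sed expression per
# line, so that no line break can end up inside a single s/// command.
box_rules() {
    sed -e 's/- | -/──┼──/g' \
        -e 's/| -/├──/g' \
        -e 's/- |/──┤/' \
        -e 's/|/│/g' \
        -e '2s/-/─/g'
}

# Multi-line definition of the wrapper, equivalent in intent to the
# one-line version.
csvlook() {
    /usr/bin/csvlook "$@" | trim | box_rules
}
```

Because `box_rules` only reads standard input, it can be tried on any pipe-delimited text without csvkit being installed.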

How do I add beaker notebook?

I tried adding Beaker Notebook to the toolbox and adjusted the Vagrantfile to say:

config.vm.network "forwarded_port", guest: 8888, host: 8888
config.vm.network "forwarded_port", guest: 8800, host: 8800
config.vm.network "forwarded_port", guest: 8801, host: 8801

But when I try to run Beaker Notebook, I get a connection refused error when I try to bring up localhost in the browser. How should I write the file to connect to beaker-notebook?

IPython 2.0

Can the version of IPython bundled with this box be upgraded to 2.0?

Password problem

Hey, small issue, but I'm not able to reset the forgotten password for the IPython notebook created using dst.
How do I change the IPython password?

Thanks

Incorrect Cloud information

On your Cloud setup page, in the instructions for setting up a Security Group, you have the following:

"These settings cannot be changed once the EC2 instance is running."

This is incorrect. I routinely change the port settings on running instances.

Is it possible to create an image that uses the GPU and more resources from the host?

Hi, I just wonder whether I can modify the Dockerfile to pass through the GPU and more resources to make EDA and ETL faster, or whether you already have an image like this in the registry. What I'm struggling with is that csvkit sometimes fails to parse or sniff files due to memory restrictions. I updated the Docker image, but it is still not as fast as my host, even for simple heads. On the other hand, to put it simply, leveraging my modest NVIDIA GPU in the image could make processing in Jupyter even faster. Thanks.

Error on vagrant up

I've installed a couple of other Vagrant boxes and haven't run into this.

I do:

vagrant init data-science-toolbox/dst

I get a Vagrantfile. Then

vagrant up

And I get:

Bringing machine 'default' up with 'virtualbox' provider...
There are errors in the configuration of this machine. Please fix
the following errors and try again:

vm:

  • The box 'data-science-toolbox/dst' could not be found.
