storytelling-with-data's Introduction

Storytelling with Data

Welcome! This repository contains course materials for the Dartmouth course Storytelling with Data (PSYC 81.09). The syllabus may be found here. Feel free to follow along with the course materials (whether you are officially enrolled in the course or just visiting!), submit comments and suggestions, etc. An outline of the course materials, including links to lecture and discussion videos and assignments, may be found here. A YouTube playlist of students' data stories for the 2020W term may be found here, a playlist for the 2021S term may be found here, and a playlist for the 2022W term may be found here.

Dartmouth student instructions

If you are officially enrolled in this course as a Dartmouth student, please sign up for access to the course's Slack workspace (you need to join using your @dartmouth.edu email address). You can ask questions and get help with all aspects of the course via Slack. You'll also submit your first two assignments using Slack.

A note about this Open Course

This course is taught as an Open Course, meaning that the course is designed from the ground up to be shareable and accessible to anyone. To that end, all code for this course should be written in Python and organized in a Jupyter notebook. Any data you analyze must be shareable with all other students in the course, and ideally it should be shareable with the public. All code and other student-generated materials will be shared publicly.

Getting help

Data science is a tricky, rewarding, and often frustrating business. Luckily for us data scientists, there are many places to get help! Examples include:

  • Google-- searchable portal to all of human knowledge. Most Internet things are reachable through here, and it's a great place to start your search. You can often find code that other people have written that solves a problem similar to the one you're working on, or a tutorial that teaches you how to solve a particular class of problems.
  • ChatGPT-- a free-to-use AI system that can help generate code, come up with project ideas, flesh out or clean up stories, and more. For a similar Dartmouth-hosted (more private, and still useful!) tool, check out Dartmouth Chat. (Note: you'll need a Dartmouth NetID to access it.)
  • Stack Overflow-- an open and searchable web forum for asking and answering questions about a wide variety of technical topics.
  • Wikipedia-- community-curated encyclopedia. Wikipedia is a good resource for learning about the background of a technique, looking up equations, etc. It's not a good source for tutorials.
  • Slack-- course chatroom for Dartmouth students. A good place to ask questions, post ideas, etc., to other members of the class.
  • My lab also maintains a public repository of tutorials on a variety of topics here.
  • The last (but hopefully not least) option if you're feeling stuck, unhappy with how things are progressing, looking for fun new ideas to revitalize your project and get you interested in science again, etc. is to reach out to me. If you're a Dartmouth person, you can attend my regular office hours on Zoom, email me, or message me on Slack.
  • Important-- chances are good that if you're feeling lost, you're not the only one! If you learn something useful, please share it via Slack or by opening a GitHub issue.

Where to find nice datasets

In today's "Big Data" world, there is an abundance of high-quality, free datasets to enjoy and explore. Below is a short list of websites that are great resources for data (each contains links to many datasets):

storytelling-with-data's People

Contributors

akiahw, alexwells-22, amaragordon, andrewheusser, annemarija, bohanmeng, camartin95, coachharney, elisabrosera, emilyapp2, fanruishao, i-laya, igleonaitis, ishaliu, jeremymanning, jjanelee97, karimkhalil-byte, kearadennehy, lillianzhao, maddyrlee, mksong17, ngreenstein, nhwang1325, paxtonfitzpatrick, philiplindsay, rky19, saruulijile2, scottstuart11, shinarjain, slenihan55


storytelling-with-data's Issues

Set up a gallery or media sharing account for the class

To consider: would it be valuable to have some sort of shareable gallery for the examples produced by the class?

Pros:

  • Shareable thing that students could show off, e.g. to prospective employers
  • Anyone could easily peek at and/or build on projects that the class is working on
  • Might be a nice way to promote the class and/or doing "open science"

Cons:

  • Privacy concerns?
  • Some minimal level of documentation will be needed to make the resource useful, and not all projects currently meet that minimum

UVLT (SIP) data brainstorm from class

Ideas:

  • Prediction (regression, deep learning) about donors and volunteers
  • Data cleaning (multiple entries per person)
  • PDFs --> convert anything useful into computer-readable formats
  • Get more data? How do marketing efforts differ with income?
  • Confounds and how to deal with them
  • Do donors come from specific towns?
  • How many people are actually donating?
  • Where do most donations come from?
  • Donation amounts by various demographics or characteristics

Libraries or approaches to explore:

  • regression (sklearn)
  • deep learning
  • 3d plots
  • images in ads (google cloud image processing?)
  • synthetic data, simulations
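Two of the ideas above (merging multiple entries per person, then asking where most donations come from) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (`name`, `town`, `amount`), and the helper functions are all hypothetical and do not reflect the actual UVLT data. It uses only the standard library so that it runs anywhere; a pandas `groupby` would be a more typical choice in a notebook.

```python
from collections import defaultdict

# Hypothetical records, invented purely for illustration (not real UVLT data).
records = [
    {"name": "Alex Doe", "town": "Hanover", "amount": 50.0},
    {"name": "alex doe ", "town": "Hanover", "amount": 25.0},  # duplicate entry
    {"name": "Sam Roe", "town": "Lyme", "amount": 100.0},
    {"name": "Pat Poe", "town": "Hanover", "amount": 10.0},
]


def normalize_name(name):
    """Lowercase a name and collapse whitespace so duplicate entries match."""
    return " ".join(name.lower().split())


def merge_entries(rows):
    """Combine multiple entries per person, summing their donation amounts."""
    merged = {}
    for row in rows:
        key = normalize_name(row["name"])
        if key not in merged:
            # Assumption: the first town seen for a person is kept.
            merged[key] = {"name": key, "town": row["town"], "amount": 0.0}
        merged[key]["amount"] += row["amount"]
    return list(merged.values())


def totals_by_town(donors):
    """Sum donation amounts for each town."""
    totals = defaultdict(float)
    for donor in donors:
        totals[donor["town"]] += donor["amount"]
    return dict(totals)


donors = merge_entries(records)
town_totals = totals_by_town(donors)
```

Real donor records would likely need fuzzier matching than lowercasing and trimming (nicknames, typos, shared addresses), so the normalization step is the part most worth revisiting.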

unable to push to GitHub from jupyter-dev

I've been getting the "This version can load notebook formats or earlier." error whenever I try to open notebooks on the main jupyterhub server, so I've been using jupyter-dev instead for the last week or so. I went to push some commits just now and got this error:

[f0028ph@polaris-old storytelling-with-data]$ git push
error: The requested URL returned error: 403 Forbidden while accessing https://github.com/paxtonfitzpatrick/storytelling-with-data.git/info/refs

fatal: HTTP request failed

Following the link brings you to a page that says

Please upgrade your git client.
GitHub.com no longer supports git over dumb-http: https://github.com/blog/809-git-dumb-http-transport-to-be-turned-off-in-90-days

This sounds like it might be the problem I'm having, since jupyter-dev also has git version 1.7.1 installed https://ask.cronweekly.com/d/18-github-com-no-longer-supports-git-over-dumb-http

I tried changing the remote URL to ssh (git@github.com:paxtonfitzpatrick/storytelling-with-data.git) but that didn't work either and now I'm getting a different error:

[f0028ph@polaris-old storytelling-with-data]$ git push
Permission denied (publickey).
fatal: The remote end hung up unexpectedly

Would it be possible to upgrade the version of git on jupyter-dev to >=1.7.9?
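As a rough way to check whether a given machine meets the version asked for above, the installed git version can be compared against a minimum from Python. This is a minimal sketch: the 1.7.9 threshold is taken from the request in this issue, and the helper names (`parse_git_version`, `installed_git_version`, `is_new_enough`) are hypothetical.

```python
import re
import subprocess

# Threshold taken from the request in this issue; adjust as needed.
MIN_GIT_VERSION = (1, 7, 9)


def parse_git_version(text):
    """Extract (major, minor, patch) from output such as 'git version 1.7.1'."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def installed_git_version():
    """Ask the local git client for its version (requires git on the PATH)."""
    result = subprocess.run(
        ["git", "--version"], capture_output=True, text=True, check=True
    )
    return parse_git_version(result.stdout)


def is_new_enough(version, minimum=MIN_GIT_VERSION):
    """Tuples compare element by element, so (1, 7, 1) < (1, 7, 9)."""
    return version >= minimum


if __name__ == "__main__":
    version = installed_git_version()
    print(version, "OK" if is_new_enough(version) else "too old")
```

Comparing tuples of integers avoids the pitfalls of comparing version strings directly, where "1.10" would sort before "1.7".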

Losing internet connection with JupyterHub open in browser locks you out

If you disconnect from the internet (e.g., eduroam going down momentarily, changing WiFi networks, letting computer go to sleep) while signed into JupyterHub, the connection to the server is lost, which is expected.

However, if you refresh the page or go to jupyter.dartmouth.edu again, you're automatically redirected to your signed-in account, at https://jupyter.dartmouth.edu/user/kiewit%5C<NetID>/tree? and get the error message:
404: Not Found. You are requesting a page that does not exist.

From this page, you're unable to sign out and sign back in to reset the connection.

The way I've gotten around this is to open jupyter.dartmouth.edu in an incognito window, sign in there, then refresh my signed-in page. Is there a better way to do this, or a way to keep it from happening?

Set up nbgrader (potentially)

For future iterations of the course, nbgrader could be a cleaner way for students to submit assignments.

Pros:

  • No need to use GitHub (as far as I can tell...), or at least no need to submit pull requests
  • Course assignments not shared with full class (not sure if this is a pro or con; students may want to use each other's code)
  • clean way of organizing assignments
  • potentially would make it clearer which parts of assignments students should fill in (e.g. instructions and assignments become a single notebook)

Cons:

  • Doesn't build familiarity with GitHub, which is used later in the course and in "industry" settings
  • Less sharing
  • Less of an "open" course, e.g. for external participants
  • Potentially another point of failure for the course; would need a more stable JupyterHub solution

look into JupyterHub

A JupyterHub server might be much more convenient than having each student install and maintain a Docker instance. Everyone would then be able to access a common Docker image through their web browser.

Mysteries to solve and/or deal with elegantly:

  • How would code management and collaborations work?
  • Is there an easy way we could support collaborative coding?
  • Could all materials produced be shared and accessed via git?
  • Could we host this on Discovery? Or an Amazon Cloud or Google Cloud computer? (Need to look into pricing for the Amazon/Google options)
  • Where would datasets live?

Look into other server options

We currently use jupyter.dartmouth.edu. This seems to work better than having each student install and run things on their computer through Docker. Other potentially nice solutions:

  • get an antsle server (more control, but more work to maintain)
  • build a Google Cloud or AWS cloud kubernetes cluster (more control, potentially expensive...but maybe there's a way to implement cheaply)
  • Google Colaboratory (less setup, cheap, but need to re-install custom packages every time a notebook is opened/run)
  • Could remove the GitHub part of the course and just have students download the relevant files, create a zip folder, and submit them via Slack or Canvas (but then old code wouldn't become part of the course)
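The Colaboratory drawback noted in the list above (re-installing custom packages each time a notebook is opened) could be softened by a first cell that installs only what is missing. The following is a sketch rather than a tested course setup; `missing_packages` and `ensure_installed` are hypothetical helper names, and it assumes `pip` is available in the notebook's Python environment.

```python
import importlib.util
import subprocess
import sys


def missing_packages(import_names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


def ensure_installed(requirements):
    """Install any missing packages with pip.

    `requirements` maps an import name to the name pip uses for it,
    since the two can differ. Returns the import names that were missing.
    """
    missing = missing_packages(list(requirements))
    if missing:
        pip_names = [requirements[name] for name in missing]
        # Use the running interpreter's pip so packages land in the right place.
        subprocess.check_call([sys.executable, "-m", "pip", "install", *pip_names])
    return missing
```

A notebook's first cell might then call something like `ensure_installed({"sklearn": "scikit-learn"})`, which does nothing when the package is already present.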

🤔
