povertyaction / github-training

IPA learning resources for GitHub

License: MIT License

Language: Stata (100%)

github-training's Issues

Incorporate Maria's advice into the code check guide

See #8. We didn't end up incorporating Maria's advice into the code check guide. She wrote:

I worked by myself on a code check, and I had many different do-files that corresponded to different stages of cleaning several modules (e.g., stage1_module1.do, stage2_module1.do, stage1_module2.do, etc.). What was helpful (and I didn't start off doing this) was to commit every time I completed the code check on one of the do-files and note the changes I made in the commit message. That way, it was easy to cross-check do-files for the same module/stage of cleaning across survey rounds. The changes that I had made to one round likely applied to another, and this way they were easy to see and implement. What I had done before was to check all of the do-files in a particular subfolder (e.g., round1/stage1/...) and then commit. This was perhaps more efficient in the moment, but then I had to do some digging to find the changes I made to individual files (so in the end, I didn't save time).

Let's add it!
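A minimal shell sketch of Maria's workflow, assuming a throwaway repository and hypothetical do-file names, contents, and commit messages: each do-file gets its own commit as soon as its code check is finished, with the fix described in the message, so `git log -- <file>` later shows exactly the changes made to that one file.

```shell
#!/bin/sh
set -e
# Throwaway repo; file names and commit messages are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Code Checker"
git config user.email "checker@example.org"

# The do-files under review, as received.
printf 'use raw, clear\n' > stage1_module1.do
printf 'use raw, clear\n' > stage1_module2.do
git add .
git commit -q -m "Add do-files for code check"

# Finish checking ONE do-file, then commit it on its own,
# describing the fix in the commit message.
printf 'use raw, clear\ndrop if missing(hhid)\n' > stage1_module1.do
git add stage1_module1.do
git commit -q -m "Code check stage1_module1.do: drop obs with missing hhid"

# Next do-file, next commit.
printf 'use raw, clear\nisid hhid\n' > stage1_module2.do
git add stage1_module2.do
git commit -q -m "Code check stage1_module2.do: assert hhid uniquely identifies obs"

# Later (e.g., when checking the same module in another round):
# list only the commits that touched this one file.
git log --oneline -- stage1_module1.do
```

The same per-file history is available in GitExtensions or on GitHub by viewing a single file's history; the point is that one commit per checked file keeps that history readable.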

Train on GitExtensions "Reset all changes" button

When we introduce GitExtensions, we should be clear about the "Reset all changes" and "Reset unstaged changes" buttons in the commit window. These apply to every file with working changes, not just the one currently selected in the window.
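For training, it may help to contrast those buttons with discarding changes to a single, explicitly named file on the command line. A minimal sketch, assuming a throwaway repository with two hypothetical do-files: `git status --short` lists every file with working changes (the whole set the reset buttons act on), and `git checkout -- clean.do` discards the edits in `clean.do` only. Git 2.23 and later also offer `git restore clean.do` for the same thing.

```shell
#!/bin/sh
set -e
# Throwaway repo with two hypothetical do-files.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Trainee"
git config user.email "trainee@example.org"
printf 'v1\n' > clean.do
printf 'v1\n' > analysis.do
git add .
git commit -q -m "Initial commit"

# Edit both files.
printf 'v2\n' > clean.do
printf 'v2\n' > analysis.do

# Every file listed here has working changes.
git status --short

# Discard the edits in ONE named file only.
git checkout -- clean.do

# analysis.do keeps its edit.
git status --short
```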

Interactive Training

Let's incorporate an example of how to run (or follow along with) an interactive intro-to-GitHub training, perhaps modeled on this class:
http://training.github.com/web/free-classes/

See http://training.github.com/ for the materials behind the class.

Resize images in user guide

The image sizes didn't transfer well from the Word version of the GitHub User Guide. @hdiamondpollock, would you mind tackling this?

Add best practices for dissociating private/binary data from public/version-controlled code

One issue I struggle with is what to do with large datasets. Git isn't designed to diff large binary files, and GitHub has a hard limit of 100 MB/file and 1 GB/repo. Right now I add raw-ish data to the repo, on the assumption that I won't modify it, and put processed data in a temp data folder that is .gitignored. Even my raw-ish data gets me dangerously close to GitHub's limit, though.
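A minimal shell sketch of that setup, assuming hypothetical folder names (`data/raw` tracked, `data/temp` ignored) and a throwaway repository. `git check-ignore` confirms which files Git will skip, and the hypothetical `check_sizes` helper lists files Git would pick up that exceed a byte limit, defaulting to 100 MiB (104,857,600 bytes), the per-file limit GitHub documents, so oversized files can be caught before GitHub rejects a push.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical layout: raw data tracked, processed data in data/temp ignored.
mkdir -p data/raw data/temp
printf 'data/temp/\n' > .gitignore
printf 'hhid,income\n1,100\n' > data/raw/survey.csv
printf 'hhid,income,log_income\n1,100,4.6\n' > data/temp/survey_clean.csv

# Show which .gitignore rule matches the processed file.
git check-ignore -v data/temp/survey_clean.csv

# The raw file matches no rule, so it would be committed.
if ! git check-ignore -q data/raw/survey.csv; then
  echo "data/raw/survey.csv will be tracked"
fi

# Hypothetical helper: list tracked files and untracked-but-not-ignored
# files larger than a byte limit (default: 100 MiB).
check_sizes() {
  limit=${1:-104857600}
  git ls-files --cached --others --exclude-standard |
    while IFS= read -r f; do
      size=$(wc -c < "$f" | tr -d ' ')
      if [ "$size" -gt "$limit" ]; then
        printf 'too large for GitHub: %s (%s bytes)\n' "$f" "$size"
      fi
    done
}

# Prints nothing here: no file comes close to the limit.
check_sizes
```

Because `check_sizes` relies on `git ls-files --exclude-standard`, anything already covered by .gitignore (like `data/temp/`) is skipped, which matches the workflow above.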

For some projects, the datasets are so large that GitHub is right out, so we store the datasets on Dropbox and manually (yeuch!) sync them with .gitignored data folders in the repo. (Referring to a Dropbox folder from within Stata code is tricky to do in a portable way, and storing the repo in Dropbox would be a disaster.)
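One way to take the hard-coded path (and some of the manual work) out of that sync, sketched under assumptions: each user sets a hypothetical environment variable, `PROJECT_DROPBOX_DATA`, to their own Dropbox data folder, and a small hypothetical `sync_data` function copies its contents into a .gitignored folder in the repo. The demo uses temporary folders in place of Dropbox and the repo.

```shell
#!/bin/sh
set -e

# Hypothetical helper: copy the contents of the folder named by
# PROJECT_DROPBOX_DATA into a (gitignored) folder inside the repo.
# Stops with an error message if the variable is unset or empty.
sync_data() {
  src=${PROJECT_DROPBOX_DATA:?set PROJECT_DROPBOX_DATA to your Dropbox data folder}
  dest=$1
  mkdir -p "$dest"
  # One-way copy: overwrites files of the same name, never deletes anything.
  cp -R "$src/." "$dest/"
}

# Demo: temporary folders stand in for Dropbox and for the repo.
demo=$(mktemp -d)
mkdir -p "$demo/Dropbox/project-data" "$demo/repo"
printf 'hhid\n1\n' > "$demo/Dropbox/project-data/survey.csv"
git -C "$demo/repo" init -q
printf 'data/large/\n' > "$demo/repo/.gitignore"

# Each user would set this once, e.g. in their shell profile.
PROJECT_DROPBOX_DATA="$demo/Dropbox/project-data"
export PROJECT_DROPBOX_DATA

sync_data "$demo/repo/data/large"
ls "$demo/repo/data/large"
```

On the Stata side, the same variable could be read with `local dropbox : environment PROJECT_DROPBOX_DATA`, keeping user-specific paths out of the do-files; whether Stata sees the variable may depend on how Stata is launched.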

I'm also divided on whether to store data as DTAs or CSVs. Some practitioners recommend using CSVs as far into the pipeline as possible, because they are plain text, so they are portable and diffable. On the other hand, DTAs have features like labels and notes, which are desirable.
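To make the "diffable" point concrete, a minimal sketch with a throwaway repository: `survey.csv` is plain text, while `survey.dta` here is a hypothetical stand-in made of a few raw bytes (not a real Stata file); Git's heuristic treats a file with NUL bytes near its start as binary. After one value changes in each file, `git diff --numstat` reports added/deleted line counts for the CSV but only `-` for the binary file, and `git diff survey.csv` shows the exact line that changed.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Analyst"
git config user.email "analyst@example.org"

# Plain-text CSV, plus a few raw bytes standing in for a binary .dta
# (NUL bytes make Git treat the file as binary; not a real Stata file).
printf 'hhid,income\n1,100\n2,200\n' > survey.csv
printf '\000\001\002' > survey.dta
git add .
git commit -q -m "Add data"

# Change one value in each file.
printf 'hhid,income\n1,100\n2,250\n' > survey.csv
printf '\000\001\003' > survey.dta

# CSV: "1 added, 1 deleted" line counts; binary file: "-" only.
git diff --numstat

# The CSV diff shows exactly which observation changed.
git diff survey.csv
```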

Advice would be appreciated.

Additional External Resources

Look through the GitHub chatter page.

Example: advanced Git branching, http://pcottle.github.io/learnGitBranching/

GitHub in terms of Dropbox/Box

Not sure if this would be useful, and maybe it would feed into another intro guide, but I think it'd be nice to create a resource like "GitHub in Dropbox/Box terms." Of course, GitHub differs from Dropbox/Box in key ways, but it also has similarities (files exist in the cloud, users download them locally, etc.). Explaining GitHub in terms of tools we all know well might help.

Write up list of Common GH Pitfalls

E.g., if you make changes directly on GitHub online and then try to push local changes to the server, the push will be rejected with an error. You first have to pull the online changes in, then push.
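The pitfall can be reproduced locally for a demo. A minimal sketch, assuming a throwaway bare repository standing in for GitHub and a second clone (`web`) playing the role of an edit made on the GitHub website: the first push from `local` is rejected, and pulling (here as a merge) before pushing again succeeds.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# A bare repository stands in for the repo on GitHub (hypothetical setup).
git init -q --bare remote.git

# Seed it with one commit on a branch called "main".
git init -q seed
cd seed
git config user.name "Seed"
git config user.email "seed@example.org"
printf 'v1\n' > README.md
git add README.md
git commit -q -m "Initial commit"
git push -q "$work/remote.git" HEAD:refs/heads/main
cd "$work"

# "local" is your computer; "web" plays the role of github.com.
git clone -q -b main "$work/remote.git" local
git clone -q -b main "$work/remote.git" web

# 1. Someone edits README.md directly on GitHub.
cd "$work/web"
git config user.name "Web Editor"
git config user.email "web@example.org"
printf 'v1\nedited online\n' > README.md
git commit -q -am "Edit README on GitHub"
git push -q origin main

# 2. Meanwhile you commit locally and try to push.
cd "$work/local"
git config user.name "Local User"
git config user.email "local@example.org"
printf 'local notes\n' > notes.md
git add notes.md
git commit -q -m "Add notes locally"

if git push -q origin main 2>/dev/null; then
  echo "unexpected: push succeeded"
else
  echo "push rejected: the remote has commits you don't have yet"
fi

# 3. Pull the online change in (merging it), then push again.
git pull -q --no-rebase --no-edit origin main
git push -q origin main
```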

Potentially include this in the User Guide or another doc.

Additional FAQ Ideas

How to scrub a repo of PII. From Matt's email [Oct 31, 2014]:

The process of scrubbing a repo's history of a file that shouldn't have been pushed...

Read everything on this page:
https://help.github.com/articles/remove-sensitive-data

I went with BFG over git-filter-branch: http://rtyley.github.io/bfg-repo-cleaner/. It was pretty easy, and I'd recommend it, though in some cases rebasing may be sufficient. A couple of comments on BFG:

Download the .jar file and run it through Java in the shell:

java -jar bfg.jar

This didn't work for me from the Git shell; I had to run cmd.exe. To run the java command, you need the JRE installed (you probably have it already), and the shell needs to be able to find it. That last part may require you to update your %PATH% environment variable or to specify the full path to java.exe (rather than just "java").

To see all the BFG options:

java -jar bfg.jar --help

When you run BFG, specify the --private flag.

Last but not least, remember to think about cached views and pull requests (see the first link above). For instance, even after your clean history is pushed, old files may still be accessible to anyone who can specify their SHA. You'll need to contact GitHub support to take down cached views, etc.
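Before rewriting history, it may help trainees to see why deleting the file in a new commit is not enough. A minimal sketch, using a throwaway repository and a hypothetical `pii.csv` with made-up contents: after `git rm`, the file is gone from the working tree, yet `git log --all -- pii.csv` still finds it and `git show` still prints its old contents. Rerunning that `git log` check after the rewrite is a quick way to confirm the file is gone from every branch (it should print nothing). Removing the file from the latest commit first also matters for BFG, which by default leaves the contents of your latest commit untouched. The BFG commands in the final comments follow its documented usage but are illustrative only and are not run here.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "RA"
git config user.email "ra@example.org"

# Hypothetical file with identifying information, committed by mistake.
printf 'name,phone\nJane Doe,555-0100\n' > pii.csv
printf 'use survey, clear\n' > clean.do
git add .
git commit -q -m "Add cleaning code (and pii.csv by mistake)"

# Deleting the file in a new commit removes it from the latest snapshot only.
git rm -q pii.csv
git commit -q -m "Remove pii.csv"

# Every commit, on any branch, that touched the file.
# After a successful history rewrite this should print nothing.
git log --all --oneline -- pii.csv

# The old contents are still recoverable from the first (root) commit.
first=$(git rev-list --max-parents=0 HEAD)
git show "$first:pii.csv"

# Not run here: the rewrite on a mirror clone, roughly
#   java -jar bfg.jar --delete-files pii.csv --private my-repo.git
#   cd my-repo.git
#   git reflog expire --expire=now --all && git gc --prune=now --aggressive
```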
