povertyaction / github-training
IPA learning resources for GitHub
License: MIT License
See #8. We didn't end up incorporating Maria's advice into the code check guide. She wrote:
I worked by myself on a code check, and I had many different do-files that corresponded to different stages of cleaning several modules (e.g., stage1_module1.do, stage2_module1.do, stage1_module2.do, etc.). What was helpful (and I didn't start off doing this) was to commit every time I completed the code check on one of the do-files and note the changes I made in the commit message. That way, it was easy to cross-check do-files for the same module/stage of cleaning across survey rounds. The changes that I had made to one round likely applied to another, and this way they were easy to see and implement. What I had done before was to check all of the do-files in a particular subfolder (e.g., round1/stage1/...) and then commit. This was perhaps more efficient in the moment, but then I had to do some digging to find the changes I made to individual files, so in the end I didn't save time.
Let's add it!
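As a rough illustration of Maria's one-commit-per-do-file habit, here is a minimal Python sketch. The function name `commit_checked_file` and the injectable `run` argument are hypothetical and exist only for this example; the two git commands it issues (`git add` and `git commit -m`) are standard. It assumes it is run from inside the repository.

```python
import subprocess
from typing import Callable, List, Sequence


def _run_git(cmd: Sequence[str]) -> None:
    """Default runner: execute the command and raise if git reports an error."""
    subprocess.run(list(cmd), check=True)


def commit_checked_file(
    path: str,
    message: str,
    run: Callable[[Sequence[str]], object] = _run_git,
) -> List[List[str]]:
    """Stage and commit a single file (hypothetical helper).

    Returns the git commands that were passed to `run`, in order.
    """
    commands = [
        # Stage only the do-file that was just code-checked.
        ["git", "add", "--", path],
        # Commit only that path, with a message describing the changes.
        ["git", "commit", "-m", message, "--", path],
    ]
    for cmd in commands:
        run(cmd)
    return commands
```

For instance, `commit_checked_file("stage1_module1.do", "Code check stage1_module1: describe the changes here")` would produce one commit per checked do-file, so the changes to each file can be found later in the log. Passing a different `run` makes it possible to preview the commands without touching the repo.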
When we introduce GitExtensions, we should be clear about the "Reset all changes" and "Reset unstaged changes" buttons in the commit window. These apply to every file among the working changes, not just the file that is currently selected and shown in the window.
Let's incorporate an example of how to go about running/following an interactive intro to GitHub training, perhaps building off the model from this class:
http://training.github.com/web/free-classes/
See http://training.github.com/ for the materials behind the class.
The image sizes didn't transfer well from the Word version of the GitHub User Guide. @hdiamondpollock, would you mind tackling this?
One issue I struggle with is what to do with large datasets. Git is not designed for diffing large binary files, and GitHub rejects files larger than 100 MB and recommends keeping a repo under about 1 GB. Right now I add raw-ish data to the repo on the assumption that I won't modify it, and keep processed data in a temp data folder that is .gitignored. Even my raw-ish data gets me dangerously close to GitHub's limit, though.
For some projects, the datasets are so large that GitHub is right out, so we store the datasets on Dropbox and manually (yeuch!) sync them with .gitignored data folders in the repo. (Referring to a Dropbox folder from within Stata code is tricky to do in a portable way, and storing the repo in Dropbox would be a disaster.)
I'm also divided on whether to store data as DTAs or CSVs. Some practitioners recommend using CSVs as far into the pipeline as possible, because they are plain text and therefore portable and diffable. On the other hand, DTAs have features like labels and notes, which are desirable.
Advice would be appreciated.
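One way to catch oversized files before they are committed is to scan the working folder for anything above the per-file limit mentioned above. The following is a minimal Python sketch, not an established tool: the function name `find_large_files` and the constant `GITHUB_FILE_LIMIT_BYTES` are made up for this example, and counting 100 MB as 100 × 1024 × 1024 bytes is an assumption.

```python
import os
from typing import Iterator, Tuple

# Assumption: the 100 MB per-file limit is counted as 100 * 1024 * 1024 bytes.
GITHUB_FILE_LIMIT_BYTES = 100 * 1024 * 1024


def find_large_files(
    root: str, limit_bytes: int = GITHUB_FILE_LIMIT_BYTES
) -> Iterator[Tuple[str, int]]:
    """Yield (path, size_in_bytes) for each file under `root` larger than `limit_bytes`."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Git's own internal folder; its contents are not working files.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            if os.path.islink(full_path):
                continue  # Ignore symlinks rather than following them.
            size = os.path.getsize(full_path)
            if size > limit_bytes:
                yield full_path, size


if __name__ == "__main__":
    for path, size in find_large_files("."):
        print(f"{size / (1024 * 1024):.1f} MB  {path}")
```

Run from the top of the repo, this lists candidates for a .gitignored data folder. Passing a lower `limit_bytes` gives an earlier warning for files that are approaching the limit.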
Look through the GitHub chatter page.
Example: advanced git branching http://pcottle.github.io/learnGitBranching/
Not sure if this would be useful, and maybe this would feed into another intro guide, but I think it'd be nice to create a resource like "GitHub in Dropbox/Box terms." GitHub differs from Dropbox/Box in key ways, of course, but it also has similarities (files exist in the cloud, users download them locally, etc.). Explaining GitHub in terms of tools we all know well might help.
For example, if you make changes directly on GitHub online and then try to push local changes to the server, the push will fail with an error. You would first have to pull the remote changes in, then push.
Potentially include in User Guide, or other doc.
How to scrub repo of PII. Matt email [Oct 31, 2014]:
The process of scrubbing a repo's history of a file that shouldn't
have been pushed...
Read everything on this page:
https://help.github.com/articles/remove-sensitive-data
I went with BFG over git-filter-branch:
http://rtyley.github.io/bfg-repo-cleaner/. It was pretty easy, and I'd
recommend it. Though in some cases rebasing may be sufficient? A
couple of comments on BFG:
java -jar bfg.jar
This didn't work for me from the Git shell; I had to run cmd.exe. To
run the java command, you have to have the JRE installed (you probably
do already), and you need the shell to be able to find it. That last
part may require you to update your %PATH% environment variable or
specify the full filename of java.exe (rather than just "java").
java -jar bfg.jar --help
Last but not least, remember to think about cached views and pull
requests (see the first link above). For instance, even if your clean
history is pushed, old files may still be accessible if the user can
specify their SHA. You'll need to contact GitHub support to take down
cached views, etc.
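Matt's point about the shell needing to find java can be checked up front. Below is a minimal Python sketch under stated assumptions: the helper name `bfg_command` is hypothetical, and `--help` is the only BFG argument used because it is the only one that appears in the email. `shutil.which` is the standard-library way to look a program up on the PATH.

```python
import shutil
from typing import List, Optional, Sequence


def bfg_command(
    jar_path: str, bfg_args: Sequence[str], java: Optional[str] = None
) -> List[str]:
    """Build the `java -jar bfg.jar ...` command line (hypothetical helper).

    If `java` is not given, look it up on the PATH; fail with a clear
    message if it cannot be found.
    """
    java_exe = java or shutil.which("java")
    if java_exe is None:
        raise FileNotFoundError(
            "java was not found on the PATH; update PATH or pass the "
            "full filename of the java executable"
        )
    return [java_exe, "-jar", jar_path, *bfg_args]


if __name__ == "__main__":
    import subprocess

    # Equivalent to running: java -jar bfg.jar --help
    subprocess.run(bfg_command("bfg.jar", ["--help"]), check=True)
```

If java is installed but not on the PATH, passing its full filename as `java` mirrors the workaround in the email of specifying the full filename of java.exe.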