
github-training's Issues

Write up list of Common GH Pitfalls

For example, making changes directly on GitHub online and then pushing local changes to the server will result in an error, because the remote now has a commit that the local clone lacks. You first have to pull those changes in, then push.

Potentially include in User Guide, or other doc.
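
The pull-then-push pitfall above can be reproduced without touching GitHub. This is only a sketch under assumptions: a local bare repository stands in for GitHub, a second clone named "web" stands in for an edit made in the browser, and the file names and commit messages are made up for illustration.

```shell
#!/bin/sh
# Simulation only: remote.git plays the role of the GitHub repository and
# the "web" clone plays the role of an edit made online on github.com.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init --quiet --bare "$work/remote.git"

# First commit, made and pushed from the local clone.
git clone --quiet "$work/remote.git" "$work/local" 2>/dev/null
cd "$work/local"
echo "v1" > analysis.do
git add analysis.do
git commit --quiet -m "Add analysis.do"
git push --quiet -u origin HEAD

# An "online" edit lands on the remote without the local clone knowing.
git clone --quiet "$work/remote.git" "$work/web"
cd "$work/web"
echo "notes" > README.md
git add README.md
git commit --quiet -m "Edit README online"
git push --quiet

# Back in the local clone: commit, then try to push without pulling first.
cd "$work/local"
echo "v2" > analysis.do
git commit --quiet -am "Update analysis.do"
if git push --quiet 2>/dev/null; then
    first_push=accepted
else
    first_push=rejected   # the remote has a commit we do not have yet
fi

# The fix: pull the remote change in (as a merge), then push again.
git pull --quiet --no-rebase --no-edit
git push --quiet
second_push=accepted
echo "first push: $first_push, second push: $second_push"
```

The first push is refused because the remote branch is ahead of the local one; after the pull merges the online edit, the same push goes through.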

GitHub in terms of Dropbox/Box

Not sure if this would be useful, and maybe this would feed into another intro guide, but I think it'd be nice to create a resource like "GitHub in Dropbox/Box terms." GitHub has key differences with Dropbox/Box of course, but it also has similarities (files exist in cloud, users download them locally, etc.). Explaining GitHub in terms of tools we all know well might help.

Additional FAQ Ideas

How to scrub repo of PII. Matt email [Oct 31, 2014]:

The process of scrubbing a repo's history of a file that shouldn't
have been pushed...

Read everything on this page:
https://help.github.com/articles/remove-sensitive-data

I went with BFG over git-filter-branch:
http://rtyley.github.io/bfg-repo-cleaner/. It was pretty easy, and I'd
recommend it. Though in some cases rebasing may be sufficient? A
couple of comments on BFG:

  1. Download the .jar file and run it through Java in the shell:

java -jar bfg.jar

This didn't work for me from the Git shell; I had to run cmd.exe. To
run the java command, you have to have the JRE installed (you probably
do already), and you need the shell to be able to find it. That last
part may require you to update your %PATH% environment variable or
specify the full filename of java.exe (rather than just "java").

  2. To see all the BFG options:

java -jar bfg.jar --help

  3. When you run BFG, specify the --private flag.

Last but not least, remember to think about cached views and pull
requests (see the first link above). For instance, even if your clean
history is pushed, old files may still be accessible if the user can
specify their SHA. You'll need to contact GitHub support to take down
cached views, etc.
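
The steps in the email could be strung together roughly as follows. This is a hedged sketch, not the procedure Matt actually ran: it assumes a POSIX shell rather than cmd.exe, that bfg.jar sits in the current directory, and that java is on the PATH. The helper name scrub_file, the working directory scrub-work.git, and both arguments are hypothetical. BFG leaves the contents of the latest commit alone by default, so the file should already have been deleted in an ordinary commit before this is run.

```shell
# Hypothetical helper: scrub_file <repo-url> <file-name>
# Assumes ./bfg.jar exists and that java and git are on the PATH.
scrub_file() {
    repo_url=$1
    file_name=$2
    bfg_jar=$PWD/bfg.jar

    # Work on a fresh mirror clone rather than on a working copy.
    git clone --mirror "$repo_url" scrub-work.git || return 1

    # Rewrite history, dropping every file with this name. --private keeps
    # the old commit ids out of the rewritten commit messages.
    java -jar "$bfg_jar" --private --delete-files "$file_name" scrub-work.git || return 1

    # Expire old references and prune the now-unreachable objects.
    ( cd scrub-work.git &&
      git reflog expire --expire=now --all &&
      git gc --prune=now --aggressive ) || return 1

    echo "Inspect scrub-work.git, then run 'git push' inside it to publish the rewritten history."
}
```

A call would look like `scrub_file <repo-url> <file-name>`. The push is deliberately left as a manual step, since rewriting published history affects every collaborator's clone, and the cached views mentioned above still need a request to GitHub support.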

Train on GitExtensions "Reset all changes" button

When we introduce GitExtensions, we should be clear about the "Reset all changes" and "Reset unstaged changes" buttons in the commit window. These apply to all the files among the working changes, not just the one currently in view (not just the one that's selected in the window).

Add best practices for dissociating private/binary data from public/version-controlled code

One issue I struggle with is what I should do with large datasets. Git is not wired up to do large binary diffs, and GitHub has a hard limit of 100 MB/file and 1 GB/repo. Right now I add raw-ish data to the repo under the assumption that I won't modify it, and processed data in a temp data folder that is .gitignored. Even my raw-ish data gets me dangerously close to GitHub's limit, though.
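
One way to catch oversized files before they reach a commit is to scan the working tree for anything above the per-file limit. The sketch below is an assumption-laden helper, not an established tool: the name find_large_files is made up, and the default threshold of 104857600 bytes (100 MiB) is one reading of the 100 MB figure quoted above.

```shell
# Hypothetical helper: find_large_files [dir] [limit-in-bytes]
# Prints every regular file under dir that is larger than the limit,
# skipping anything inside a .git directory.
find_large_files() {
    dir=${1:-.}
    limit_bytes=${2:-104857600}   # assumed threshold: 100 MiB
    find "$dir" -name .git -prune -o -type f -size +"${limit_bytes}c" -print
}
```

Running `find_large_files .` from the repo root lists candidates to move into a .gitignored data folder before committing.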

For some projects, the datasets are so large that GitHub is right out, so we store the datasets on Dropbox and manually (yeuch!) sync them with .gitignored data folders in the repo. (Referring to a Dropbox folder from within Stata code is tricky to do in a portable way, and storing the repo in Dropbox would be a disaster.)

I'm also divided on whether to store data as DTAs or CSVs. Some practitioners recommend using CSVs as far into the pipeline as possible, because they are plain text and therefore portable and diffable. On the other hand, DTAs have features like labels and notes, which are desirable.

Advice would be appreciated.

Incorporate Maria's advice into the code check guide

See #8. We didn't end up incorporating Maria's advice into the code check guide. She wrote:

I worked by myself on a code-check, and I had many different do-files that corresponded to different stages of cleaning several modules (i.e., stage1_module1.do, stage2_module1.do, stage1_module2.do, etc.). What was helpful (and I didn't start off doing this) was to commit every time I completed the code-check on one of the do-files and note the changes I made in the commit message. That way, it was easy to cross-check do-files for the same module/stage of cleaning across survey rounds. The changes that I had made to one round likely applied to another, and this way they were easy to see and implement. What I had done prior was to check all of the do-files in a particular subfolder (i.e., round1/stage1/...) and then commit. This was perhaps more efficient in the moment, but then I had to do some digging to find the changes I made to individual files (so in the end, I didn't save time).

Let's add it!
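
For the guide, Maria's one-commit-per-do-file habit might be illustrated like this. It is a sketch in a throwaway repository: the round1/round2 layout, the file contents, and the commit messages are invented for the example and are not taken from her project.

```shell
#!/bin/sh
# Throwaway repository with one do-file per survey round (hypothetical layout).
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init --quiet
mkdir round1 round2
echo "* original cleaning code" > round1/stage1_module1.do
echo "* original cleaning code" > round2/stage1_module1.do
git add .
git commit --quiet -m "Add do-files before code check"

# Finish checking one do-file, then commit that file alone and say what changed.
echo "* fix applied during code check" >> round1/stage1_module1.do
git commit --quiet -m "round1 stage1_module1: fix found in code check" -- round1/stage1_module1.do

# The same fix applies to the next round; again commit only that file.
echo "* fix applied during code check" >> round2/stage1_module1.do
git commit --quiet -m "round2 stage1_module1: fix found in code check" -- round2/stage1_module1.do

# Later: list what was done to this module across rounds.
history=$(git log --format=%s -- round1/stage1_module1.do round2/stage1_module1.do)
echo "$history"
```

Because each commit touches a single do-file, `git log -- <path>` shows that file's changes without any digging through a large subfolder-wide commit.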
