Git is a version control system (VCS) for tracking changes to projects. Version control systems are also called revision control systems or source code management (SCM) systems. These projects can be large-scale programs like the Linux kernel, but they can also be smaller-scale projects like your own R development, homework assignments, papers, or thesis. There are many other VCSs available (Subversion and Mercurial are currently used extensively in open source projects) but Git is one of the easier ones to set up. It is also well supported by the GitHub ecosystem, and UI has recently set up its own GitHub service.

Getting and Installing the Git Client

Installing the Command Line Tools

Git is available on the CSG-managed Linux workstations.

You can also install Git on your own Linux system. Git is available for installation as a package within most Linux distributions and can be installed that way (e.g. using yum or dnf on Red Hat systems or apt-get on Debian systems, or their graphical interfaces). You can also install from source code.

Instructions for installing on Windows or macOS are available at http://happygitwithr.com/.

Other Interfaces

There is a graphical interface git-gui that may be useful.

A number of programming editors and Integrated Development Environments (IDEs) include Git support. Several Emacs Git modes are available. RStudio has integrated Git support.

Using Git

Before You Start

Before starting to use Git you should first tell Git who you are so that you can be identified when you make contributions to projects. This can be done by specifying your name and email address.

git config --global user.name "Your Name Comes Here"
git config --global user.email your_email@yourdomain.example.com

These two commands only need to be executed once. The information you provide is stored in a .gitconfig file in your home directory and is used to label your actions in the Git log.

If you have spaces in the name you provide then be sure to enclose the name in quotation marks.
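To check what Git has recorded, you can read the settings back. The following is a minimal sketch rather than part of the setup itself: it points HOME at a scratch directory (an assumption made here so the example does not overwrite your real .gitconfig), and the name and email are placeholders. Run it in a fresh shell or as a script.

```shell
# Use a throwaway home directory so the real ~/.gitconfig is left alone.
# (Assumption for this sketch only; drop these two lines to configure your real account.)
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME

# Placeholder identity; the quotes keep the space inside the name.
git config --global user.name "Your Name"
git config --global user.email your_email@yourdomain.example.com

# Read the values back from the global configuration.
configured_name=$(git config --global --get user.name)
configured_email=$(git config --global --get user.email)
echo "$configured_name <$configured_email>"
```

If either value comes back empty, the corresponding git config command has not been run yet.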

Starting a Project with Git

You can create a directory for your new project using the mkdir command.

mkdir myproject

or you can work with a project directory that already exists. You can enter that directory with

cd myproject

Once you are in the project directory, you need to initialize your Git repository with the command git init so that Git knows that it needs to track changes here.

luke@nokomis ~/myproject% git init
Initialized empty Git repository in /home/luke/myproject/.git/

If you type ls -a you will see a directory named .git has been added. This directory stores all of the history information and other configuration data. Don’t touch this directory.
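The steps above can be tried safely in a scratch location. This is a minimal sketch; the temporary directory and the variable names are assumptions made for illustration.

```shell
# Work in a scratch location so nothing existing is touched.
workdir=$(mktemp -d)
mkdir "$workdir/myproject"
cd "$workdir/myproject"

# Create the repository; -q suppresses the "Initialized ..." message.
git init -q

# The hidden .git directory now sits inside the (still empty) project.
ls -a

# Ask Git whether the current directory is inside a work tree.
inside=$(git rev-parse --is-inside-work-tree)
echo "$inside"
```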

Adding Files

Now try editing a file. For example, let’s create a file named code.R containing some R code. You might add the lines

x <- rnorm(100)
hist(x)

Once you are finished editing the file, you can return to Git and type git status:

luke@nokomis ~/myproject% git status
# On branch master
#
# Initial commit
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       code.R
nothing added to commit but untracked files present (use "git add" to track)

Notice that the code.R file is listed under Untracked files. This is because you need to tell Git that you want to track the changes in this file. This can be done using git add code.R:

luke@nokomis ~/myproject% git add code.R 

Running git status again after this command produces

luke@nokomis ~/myproject% git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#
#       new file: code.R
#

Now code.R is listed under Changes to be committed and is considered a new file.
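The same untracked-to-staged transition can be checked from a script. A minimal sketch, assuming a scratch repository: it uses git status --porcelain, a compact, script-friendly form of the status output that is not used elsewhere in this introduction.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q

# Create the example file from the text.
printf 'x <- rnorm(100)\nhist(x)\n' > code.R

# "??" marks a file that Git sees but does not track yet.
before=$(git status --porcelain)
echo "$before"

git add code.R

# "A" in the first column marks a new file staged for the next commit.
after=$(git status --porcelain)
echo "$after"
```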

Now that you have added your new code file you can commit the change using git commit. The git commit command requires that you provide a short message about what the changes are, and this can be done using the -m switch. If you do not use this switch, Git will open an editor session for you to enter a message. Make sure you write something informative (but concise) or else you won’t have any idea what you did when you look at it later.

luke@nokomis ~/myproject% git commit -m "Initial code for plotting histogram"
Created initial commit 7425153: Initial code for plotting histogram
 1 files changed, 2 insertions(+), 0 deletions(-)
 create mode 100644 code.R

After committing the change you can run git status. It should say something like

luke@nokomis ~/myproject% git status
# On branch master
nothing to commit (working directory clean)

Notice that the first line says On branch master. The master branch is considered the main line of development. It is possible to have other branches of development but we will skip that for now.
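The add-and-commit sequence can be rehearsed end to end in a scratch repository. This is a sketch under stated assumptions: the identity is a placeholder set only for the scratch repository, and the name of the initial branch may be master or something else depending on your Git configuration, so the script asks Git for it instead of assuming.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q
# Identity for this scratch repository only (placeholder values).
git config user.name "Your Name"
git config user.email your_email@yourdomain.example.com

printf 'x <- rnorm(100)\nhist(x)\n' > code.R
git add code.R
git commit -q -m "Initial code for plotting histogram"

# One commit is now recorded and nothing is left to commit.
commit_count=$(git rev-list --count HEAD)
pending=$(git status --porcelain)
branch=$(git symbolic-ref --short HEAD)
echo "commits: $commit_count, branch: $branch"
```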

Suppose you want to change the code in your code.R file and print a summary of the data instead of plotting a histogram. You can load the file in your favorite editor and add the line

summary(x)

and delete the line

hist(x)

Once you are done editing the file, you can save/close it and run git diff to see a summary of the changes.

luke@nokomis ~/myproject% git diff
diff --git a/code.R b/code.R
index de54497..32a52f4 100644
--- a/code.R
+++ b/code.R
@@ -1,2 +1,2 @@
 x <- rnorm(100)
-hist(x)
+summary(x)

The output shown here is in the unified diff format, which shows which lines were added or deleted. If a line has a - in front of it, that line was removed. If a line has a + in front of it, that line was added. A changed line appears as a removal followed by an addition. Here you have removed the hist(x) line and added the summary(x) line.
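To see the - and + markers in isolation, the diff can be filtered down to just the changed lines. A minimal sketch in a scratch repository; the grep pattern and the placeholder identity are assumptions made for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git config user.name "Your Name"
git config user.email your_email@yourdomain.example.com

printf 'x <- rnorm(100)\nhist(x)\n' > code.R
git add code.R
git commit -q -m "Initial code for plotting histogram"

# Replace hist(x) with summary(x), as in the text.
printf 'x <- rnorm(100)\nsummary(x)\n' > code.R

# Keep only the changed lines: "-" lines were removed, "+" lines were added.
# The pattern skips the "---" and "+++" file header lines.
changes=$(git diff --no-color | grep '^[-+][^-+]')
echo "$changes"
```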

In order to commit this change to the history, you need to add the file using git add again and then commit it with a commit message.

luke@nokomis ~/myproject% git add code.R
luke@nokomis ~/myproject% git commit -m "Do summary instead of histogram"
Created commit 6d8dafe: Do summary instead of histogram
 1 files changed, 1 insertions(+), 1 deletions(-)

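Since the combination of git add followed by git commit is so common there is a shortcut: git commit -a. This stages every tracked file that has been modified or deleted and then commits; a new file still has to be added with git add first.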
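The reach of git commit -a is easy to check: it picks up changes to files Git already tracks, but not brand-new files. A minimal sketch in a scratch repository; notes.txt is a hypothetical file introduced only for this illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git config user.name "Your Name"
git config user.email your_email@yourdomain.example.com

printf 'x <- rnorm(100)\nhist(x)\n' > code.R
git add code.R
git commit -q -m "Initial code for plotting histogram"

# Change a tracked file and create a brand-new, untracked one.
printf 'x <- rnorm(100)\nsummary(x)\n' > code.R
echo "scratch notes" > notes.txt

# -a stages modified and deleted tracked files, then commits.
git commit -q -a -m "Do summary instead of histogram"

# code.R was committed; notes.txt is still untracked because it was never added.
leftover=$(git status --porcelain)
echo "$leftover"
```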

Now there are two revisions in your project history. You can see the complete project history by using the git log command:

luke@nokomis ~/myproject% git log
commit 6d8dafe72a198ed63d11be8592c39bcd14179a6b
Author: Luke Tierney <luke@stat.uiowa.edu>
Date:   Wed Jan 21 07:15:18 2009 -0600

    Do summary instead of histogram

commit 7425153349ff276d89731ac8092597e7dc67520d
Author: Luke Tierney <luke@stat.uiowa.edu>
Date:   Wed Jan 21 07:07:22 2009 -0600

    Initial code for plotting histogram

The git log command gives you the commit identifier, the author of the commit, the date of the commit, and the short message that you provided with each commit.

Reverting a Commit

Now suppose you decide that the summary is not that useful and that you’d rather do the histogram like you had it before. You can resolve this situation with the git revert command. Notice in the Git log that the most recent commit has identifier 6d8dafe72a198ed63d11be8592c39bcd14179a6b (NOTE: this identifier string will be different on your computer!). If you want to reverse the change that this commit introduced you can run

git revert --no-edit 6d8dafe

and the code.R file will be reverted back to the version just before that commit. Note that you do not have to type in the entire identifier string at the command line; any prefix that is unique within the repository will suffice. Usually, the first 7 characters are more than enough. Now when you run git log you get

luke@nokomis ~/myproject% git log
commit bd21cc08168a38a86931416b701b3c248d0c36f7
Author: Luke Tierney <luke@stat.uiowa.edu>
Date:   Wed Jan 21 07:21:34 2009 -0600

    Revert "Do summary instead of histogram"
    
    This reverts commit 6d8dafe72a198ed63d11be8592c39bcd14179a6b.

commit 6d8dafe72a198ed63d11be8592c39bcd14179a6b
Author: Luke Tierney <luke@stat.uiowa.edu>
Date:   Wed Jan 21 07:15:18 2009 -0600

    Do summary instead of histogram

commit 7425153349ff276d89731ac8092597e7dc67520d
Author: Luke Tierney <luke@stat.uiowa.edu>
Date:   Wed Jan 21 07:07:22 2009 -0600

    Initial code for plotting histogram

Notice that history is not erased, but the revert is officially in the log.
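The revert can be rehearsed in a scratch repository. In this sketch the commit to undo is named HEAD (the most recent commit) instead of an identifier, since identifiers differ from one repository to the next; the identity is a placeholder.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git config user.name "Your Name"
git config user.email your_email@yourdomain.example.com

printf 'x <- rnorm(100)\nhist(x)\n' > code.R
git add code.R
git commit -q -m "Initial code for plotting histogram"

printf 'x <- rnorm(100)\nsummary(x)\n' > code.R
git commit -q -a -m "Do summary instead of histogram"

# Undo the most recent commit by adding a new commit that reverses it.
git revert --no-edit HEAD >/dev/null

# The file is back to the histogram version, and the log has grown to three commits.
cat code.R
commit_count=$(git rev-list --count HEAD)
echo "commits: $commit_count"
```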

Using a Remote Repository

For collaboration or backup it is useful to have your local repository linked to a remote repository, for example one on a GitHub server.

It is easiest to start with a repository created on the GitHub server. Cloning this repository creates a new local repository that is linked to the remote one you cloned.

Using a sample repository in https://github.uiowa.edu/STAT4580-Spring-2017/STAT4580 you can clone and enter the repository with

git clone https://github.uiowa.edu/STAT4580-Spring-2017/STAT4580.git STAT4580
cd STAT4580

You can check the remote repository you are connected to with

git remote -v

Now you can edit and add files and commit your changes. Once you are done, you can push your changes with

git push

Before you continue working you should pull any changes you might have pushed from another computer or that might have been pushed by a collaborator:

git pull

It is also possible to connect a repository you created locally with a remote repository, but starting with the remote one is simpler.
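The clone, push, and pull cycle can be tried without a server. The following is a sketch under loud assumptions: a local bare repository named remote.git stands in for the GitHub server, the directories first and second stand in for two computers, and the identity is a placeholder supplied through environment variables.

```shell
work=$(mktemp -d)
cd "$work"

# Placeholder identity, set through the environment so every scratch clone can commit.
export GIT_AUTHOR_NAME="Your Name" GIT_AUTHOR_EMAIL=your_email@yourdomain.example.com
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# A bare repository plays the role of the remote server.
git init -q --bare remote.git

# First working copy: commit and push.
git clone -q remote.git first
cd first
echo 'x <- rnorm(100)' > code.R
git add code.R
git commit -q -m "Initial code"
# The remote starts out empty, so name the remote and branch explicitly.
git push -q origin HEAD
cd ..

# Second working copy, for example on another computer.
git clone -q remote.git second

# More work is pushed from the first copy.
cd first
echo 'hist(x)' >> code.R
git commit -q -a -m "Plot a histogram"
git push -q origin HEAD
cd ..

# The second copy catches up before continuing.
cd second
git pull -q --ff-only
pulled_commits=$(git rev-list --count HEAD)
echo "second copy now has $pulled_commits commits"
```

Cloning the empty remote prints a warning that the repository is empty; that is expected here.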

Getting Help

You can read the help page for a specific command by calling git help. For example, if you wanted to read the help page for git status, you could call

git help status

Inserting help in between git and the command name will retrieve the help page for that command.

Summary of Essential Git Commands

  • git status: check status and see what has changed
  • git add: add a changed file or a new file to be committed
  • git diff: see the changes between the current version of a file and the version of the file most recently committed
  • git commit: commit changes to the history
  • git log: show the history for a project
  • git revert: undo a change introduced by a specific commit
  • git clone: clone a remote repository
  • git pull: pull changes from a remote repository
  • git push: push changes to a remote repository

Collaboration with Git

Some ways for collaborating using Git:

Branching and Merging

With Git you can create different branches of development in addition to the master branch that is created by default when you call git init. These side branches can be used to test out new ideas or to explore past history.

For example, suppose you wanted to rewind the state of the repository to a previous commit, but you don’t want to revert all of the commits after that point to get to that previous commit (in other words, you want to preserve the project history as is). The simplest thing to do is to create a new branch at the point in the history that you’re interested in exploring.

Creating a new branch can be done with the git checkout command. First take a look at the git log to see where in the history you want to create a new branch.

luke@nokomis ~/myproject% git log
commit bd21cc08168a38a86931416b701b3c248d0c36f7
Author: Luke Tierney <luke@stat.uiowa.edu>
Date:   Wed Jan 21 07:21:34 2009 -0600

    Revert "Do summary instead of histogram"
    
    This reverts commit 6d8dafe72a198ed63d11be8592c39bcd14179a6b.

commit 6d8dafe72a198ed63d11be8592c39bcd14179a6b
Author: Luke Tierney <luke@stat.uiowa.edu>
Date:   Wed Jan 21 07:15:18 2009 -0600

    Do summary instead of histogram

commit 7425153349ff276d89731ac8092597e7dc67520d
Author: Luke Tierney <luke@stat.uiowa.edu>
Date:   Wed Jan 21 07:07:22 2009 -0600

    Initial code for plotting histogram

Suppose we want to go back and see the state of the project after the second commit. We can create a new branch starting at the commit with identifier 6d8dafe...:

luke@nokomis ~/myproject% git checkout -b test 6d8dafe
Switched to a new branch "test"

Note again the short version of the identifier. Now you are on the test branch. Running git log gives

luke@nokomis ~/myproject% git log
commit 6d8dafe72a198ed63d11be8592c39bcd14179a6b
Author: Luke Tierney <luke@stat.uiowa.edu>
Date:   Wed Jan 21 07:15:18 2009 -0600

    Do summary instead of histogram

commit 7425153349ff276d89731ac8092597e7dc67520d
Author: Luke Tierney <luke@stat.uiowa.edu>
Date:   Wed Jan 21 07:07:22 2009 -0600

    Initial code for plotting histogram
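Creating a branch at an earlier point can be reproduced in a scratch repository. In this sketch the starting point is written as HEAD~1 (one commit before the tip) instead of an identifier, because identifiers differ between repositories; the identity is a placeholder, and the main branch name is read from Git instead of being assumed.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git config user.name "Your Name"
git config user.email your_email@yourdomain.example.com

printf 'x <- rnorm(100)\nhist(x)\n' > code.R
git add code.R
git commit -q -m "Initial code for plotting histogram"
printf 'x <- rnorm(100)\nsummary(x)\n' > code.R
git commit -q -a -m "Do summary instead of histogram"

# Remember the name of the main branch (master or otherwise).
main_branch=$(git symbolic-ref --short HEAD)

# Create and switch to a branch named "test" that starts one commit before the tip.
git checkout -q -b test HEAD~1

current=$(git symbolic-ref --short HEAD)
test_commits=$(git rev-list --count HEAD)
main_commits=$(git rev-list --count "$main_branch")
echo "on $current: $test_commits commit(s); $main_branch has $main_commits"
```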

Now that you are on this branch, you can do some work. Create a text file doc.txt containing

This file documents the code for this project

After creating the doc.txt file git status gives

luke@nokomis ~/myproject% git status
# On branch test
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       doc.txt
nothing added to commit but untracked files present (use "git add" to track)

You can add the file using git add and then commit it to the project using git commit as before:

luke@nokomis ~/myproject% git add doc.txt 
luke@nokomis ~/myproject% git commit -m "Add documentation file"
Created commit e9dd8b1: Add documentation file
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 doc.txt

Running git log now gives

luke@nokomis ~/myproject% git log
commit e9dd8b1dd106e514088e53c1ae922fe59bd995b2
Author: Luke Tierney <luke@stat.uiowa.edu>
Date:   Wed Jan 21 07:53:26 2009 -0600

    Add documentation file

commit 6d8dafe72a198ed63d11be8592c39bcd14179a6b
Author: Luke Tierney <luke@stat.uiowa.edu>
Date:   Wed Jan 21 07:15:18 2009 -0600

    Do summary instead of histogram

commit 7425153349ff276d89731ac8092597e7dc67520d
Author: Luke Tierney <luke@stat.uiowa.edu>
Date:   Wed Jan 21 07:07:22 2009 -0600

    Initial code for plotting histogram

But wait! What about the history that you have on the “master” branch? It’s still there, but you cannot see it while you are on the “test” branch. To switch back to the “master” branch you can call git checkout.

luke@nokomis ~/myproject% git checkout master
Switched to branch "master"

and run git log again.

A useful tool for viewing the branch structure of a Git repository is gitk. Running gitk with the --all switch indicates that gitk should show all branches.
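If gitk is not available, git log can draw a text rendering of the same structure. A minimal sketch in a scratch repository with one side branch; the commits and the placeholder identity are assumptions made for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git config user.name "Your Name"
git config user.email your_email@yourdomain.example.com

printf 'x <- rnorm(100)\nhist(x)\n' > code.R
git add code.R
git commit -q -m "Initial code for plotting histogram"
main_branch=$(git symbolic-ref --short HEAD)

# One commit on a side branch...
git checkout -q -b test
echo "This file documents the code for this project" > doc.txt
git add doc.txt
git commit -q -m "Add documentation file"

# ...and one more on the main branch, so the two histories diverge.
git checkout -q "$main_branch"
echo 'summary(x)' >> code.R
git commit -q -a -m "Also print a summary"

# --all includes every branch; --graph draws the branch structure as text.
graph=$(git log --oneline --graph --all)
echo "$graph"
all_commits=$(git rev-list --count --all)
```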

Suppose you feel the work done on the test branch is in fact useful and you want to merge that into your master branch. You can call git merge to merge two branches together. You need to do this from a checkout of the master branch; you can check the branch you are on with git branch:

luke@nokomis ~/myproject% git branch
* master
  test

The asterisk indicates you are on the master branch, so you can do the merge:

luke@nokomis ~/myproject% git merge test
Merge made by recursive.
 doc.txt |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 doc.txt

Running git log gives us the history of the two branches merged together with a merge commit.

luke@nokomis ~/myproject% git log
commit 8d4705f0a90d710c8f5361114ae83273724ad8bd
Merge: bd21cc0... e9dd8b1...
Author: Luke Tierney <luke@stat.uiowa.edu>
Date:   Wed Jan 21 08:06:57 2009 -0600

    Merge branch 'test'

commit e9dd8b1dd106e514088e53c1ae922fe59bd995b2
Author: Luke Tierney <luke@stat.uiowa.edu>
Date:   Wed Jan 21 07:53:26 2009 -0600

    Add documentation file

commit bd21cc08168a38a86931416b701b3c248d0c36f7
Author: Luke Tierney <luke@stat.uiowa.edu>
Date:   Wed Jan 21 07:21:34 2009 -0600

    Revert "Do summary instead of histogram"
    
    This reverts commit 6d8dafe72a198ed63d11be8592c39bcd14179a6b.

commit 6d8dafe72a198ed63d11be8592c39bcd14179a6b
Author: Luke Tierney <luke@stat.uiowa.edu>
Date:   Wed Jan 21 07:15:18 2009 -0600

    Do summary instead of histogram

commit 7425153349ff276d89731ac8092597e7dc67520d
Author: Luke Tierney <luke@stat.uiowa.edu>
Date:   Wed Jan 21 07:07:22 2009 -0600

    Initial code for plotting histogram
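A merge like the one above can be reproduced in a scratch repository. In this sketch both branches gain a commit after they split, so Git has to create a merge commit; the commits and the placeholder identity are assumptions made for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git config user.name "Your Name"
git config user.email your_email@yourdomain.example.com

printf 'x <- rnorm(100)\nhist(x)\n' > code.R
git add code.R
git commit -q -m "Initial code for plotting histogram"
main_branch=$(git symbolic-ref --short HEAD)

# Work on the side branch.
git checkout -q -b test
echo "This file documents the code for this project" > doc.txt
git add doc.txt
git commit -q -m "Add documentation file"

# Meanwhile, work continues on the main branch.
git checkout -q "$main_branch"
echo 'summary(x)' >> code.R
git commit -q -a -m "Also print a summary"

# Merge test into the main branch; --no-edit accepts the default merge message.
git merge -q --no-edit test

# rev-list prints the merge commit followed by its parents:
# one identifier plus two parents gives three words.
words=$(git rev-list --parents -n 1 HEAD | wc -w)
echo "identifiers on the merge line: $words"
```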

Because we merged the test branch into the master branch we no longer need it. A branch can be deleted with the -d switch to git branch.

luke@nokomis ~/myproject% git branch -d test
Deleted branch test.
luke@nokomis ~/myproject% git branch
* master

Be careful when deleting branches. The -d switch refuses to delete a branch that has not been merged; if you force the deletion with the -D switch instead, you lose the history that exists only on that branch.
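The difference between a safe and a forced delete can be seen in a scratch repository. A minimal sketch: the test branch carries one unmerged commit, so git branch -d declines and only git branch -D removes it. The variable names and placeholder identity are assumptions made for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q
git config user.name "Your Name"
git config user.email your_email@yourdomain.example.com

printf 'x <- rnorm(100)\nhist(x)\n' > code.R
git add code.R
git commit -q -m "Initial code for plotting histogram"
main_branch=$(git symbolic-ref --short HEAD)

# Put one commit on a side branch and leave it unmerged.
git checkout -q -b test
echo "This file documents the code for this project" > doc.txt
git add doc.txt
git commit -q -m "Add documentation file"
git checkout -q "$main_branch"

# -d refuses because "test" has a commit that was never merged.
if git branch -d test 2>/dev/null; then
    safe_delete=deleted
else
    safe_delete=refused
fi
echo "git branch -d: $safe_delete"

# -D forces the deletion; the unmerged commit is no longer on any branch.
git branch -q -D test
remaining=$(git branch --list test)
```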

Some Useful Resources

Acknowledgements

This introduction is adapted from Roger Peng’s Bare Bones Git Intro for Biostat 776.