git checkout a GitHub Pull Request

Sometimes, reviewing code is easier done with the full diff in front of you. Web UIs like GitHub, Phabricator, or Stash all have their strengths, but often times I find myself opening up my editor and following along. This allows me to better understand the scope of the changes being made, and lets me navigate using an interface I'm more confident in—making me feel more productive. I can then easily jump around the code base, allowing me to understand what effects their diff will have on other code, look up things I'm not 100% sure about, and so on.

As it turns out, GitHub provides an abstraction to Pull Requests via a refs/pull/ object. If you're not familiar with Git internals, a ref is simply a file in .git/refs, which points to a commit hash.

The process is detailed in this GitHub Help page. Essentially, all you need to do is:

# The PR ID of the PR you're looking to checkout.
# 
# Example: "https://github.com/<user>/<repo>/pull/<id>"
PR_ID=42

# Fetch the ref and create the branch pr-${PR_ID}
git fetch origin pull/"${PR_ID}"/head:pr-"${PR_ID}"

# Now checkout the branch
git checkout pr-"${PR_ID}"

You can use this to do some neat stuff besides using it to browse the contents of a Pull Request. Anything that you would normally script on your local repository, you can now script—even with Pull Requests made from a GitHub fork! Pull Requests can now just be treated as a normal branch.

For example, recently I had a potential use case on a project I was working on, where we were introducing a code formatter requirement (think clang-format, black, gofmt, rustfmt, etc.). I won't spend time arguing about the merits of having (or not using) a formatter here—the point here is to illustrate one way refs/pull/ can be used.

Adding a code formatter, and subsequently enforcing it on a code base, typically requires that all the code would be formatted in a giant patch. This could have been performed incrementally on each diff, but our tooling wasn't advanced enough to do so. As there were non-trivial PRs open, we also didn't want to cause undue churn and introduce a bunch of merge conflicts. This not only wastes everybody's time, but results in some annoyance as developers are forced to resolve all these conflicts. Thus, we decided to keep things simple and only format files that weren't currently modified in any open PRs.

On the CI servers, in order to ensure that files weren't being "unformatted" after the giant format patch, we introduced checks that would fail the CI job if files were being committed that would result in a diff should the formatter run again. This would remind the developer to format their code before requesting a Code Review.

However, for the files that were currently modified by a PR, the idea was to relax the formatting requirements. Once those PRs were merged, the plan was to incrementally enable the CI checks on those files when they wouldn't cause conflicts, until we reached 100% coverage. We could take advantage of the ability to exclude certain files from the formatters, and so we needed to generate an exclusion list of all the files currently modified by an open PR.

You could do it using a bash script that looks something like this:

#!/bin/bash

# A list of Active PR IDs, which can be fetched via
# GitHub's API (or a JavaScript selector on the page)
ACTIVE_PRS=(42 121 311)
# The output file to write to
MODIFIED_FILES_OUTPUT="exclusion_list.txt"

# Truncate the file before writing to it
echo "" >| "${MODIFIED_FILES_OUTPUT}"

for PR in "${ACTIVE_PRS[@]}"; do
  git fetch origin pull/"${PR}"/head:pr-"${PR}"
  git diff --name-only develop...pr-"${PR}" | tee -a "${MODIFIED_FILES_OUTPUT}"
done

# Dedupe the modified files
sort "${MODIFIED_FILES_OUTPUT}" -o "${MODIFIED_FILES_OUTPUT}" -u

This takes advantage of the ... diff syntax, which finds the the last common commit with the first commit given to the command, and then diffs the second commit against that.

It's basically equivalent performing the following:

git diff $(git-merge-base A B) B

As files in the exclusion list were formatted, they could be removed, thus enabling the CI checks for it. Once the exclusion list reached 0 files, then we knew that we had successfully formatted all the files in the repository.

In the end, we just ended up biting the bullet and doing a giant reformat over a holiday (with lots of warning beforehand). But if you ever need to do something like this, perhaps this might be useful to avoid people tearing their hair out.