Staff Software Engineer at GitHub working on Git.
The open source Git project just released Git 2.45 with features and bug fixes from over 96 contributors, 38 of them new. We last caught up with you on the latest in Git back when 2.44 was released.
To celebrate this most recent release, here is GitHub’s look at some of the most interesting features and changes introduced since last time.
Preliminary reftable support
Git 2.45 introduces preliminary support for a new reference storage backend called “reftable,” promising faster lookups, reads, and writes for repositories with any number of references.
If you’re unfamiliar with our previous coverage of the new reftable format, don’t worry, this post will bring you up to speed (and then some!). But if you just want to play around with the new reference backend, you can initialize a new repository with --ref-format=reftable like so:
$ git init --ref-format=reftable /path/to/repo
Initialized empty Git repository in /path/to/repo/.git
$ cd /path/to/repo
$ git commit --allow-empty -m 'hello reftable!'
[main (root-commit) 2eb0810] hello reftable!
$ ls -1 .git/reftable/
0x000000000001-0x000000000002-565c6bf0.ref
tables.list
$ cat .git/reftable/tables.list
0x000000000001-0x000000000002-565c6bf0.ref
With that out of the way, let’s jump into the details. If you’re new to this series, or didn’t catch our initial coverage of the reftable feature, don’t worry, here’s a refresher. When we talk about references in Git, we’re referring to the branches and tags that make up your repository. In essence, a reference is nothing more than a name (like refs/heads/my-feature, or refs/tags/v1.0.0) and the object ID of the thing that reference points at.
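To make the "name plus object ID" idea concrete, here is a minimal sketch that builds a throwaway repository and prints every reference next to the object it points at. The temporary path, the author identity, and the branch and tag names are placeholders chosen for illustration.

```shell
# Create a throwaway repository (path and identity are placeholders).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/main
git -c user.name='A U Thor' -c user.email=author@example.com \
	commit -q --allow-empty -m 'initial commit'

# Add one more branch and a lightweight tag pointing at the same commit.
git branch my-feature
git tag v1.0.0

# Print each reference as "<name> <object ID>".
refs=$(git for-each-ref --format='%(refname) %(objectname)')
printf '%s\n' "$refs"
```

All three references here should resolve to the same commit, since the branch and the tag were both created from it.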
Git has historically stored references in your repository in one of two ways: either “loose” as a file inside of $GIT_DIR/refs (like $GIT_DIR/refs/heads/my-feature) or “packed” as an entry inside of the file at $GIT_DIR/packed-refs.
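The difference between the two layouts can be seen by packing a loose reference by hand. This is only a sketch: it assumes the repository uses the traditional files backend (the default at the time of writing), and the path, identity, and tag name are placeholders.

```shell
# Create a throwaway repository with one branch and one tag.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/main
git -c user.name='A U Thor' -c user.email=author@example.com \
	commit -q --allow-empty -m 'initial commit'
git tag v1.0.0

# Freshly created references start out "loose", one file per reference.
ls "$repo/.git/refs/tags"

# Move every reference into the single packed-refs file.
git pack-refs --all

# The loose file is gone, and the tag is now a line in packed-refs.
cat "$repo/.git/packed-refs"
```

After packing, any later update to the tag would be written as a new loose file again, which is why repositories usually hold a mix of both.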
For most repositories today, the existing reference backend works fine. For repositories with a truly gigantic number of references, however, the existing backend has some growing pains. For instance, storing a large number of references as “loose” can lead to directories with a large number of entries (slowing down lookups within that directory) and/or inode exhaustion. Likewise, storing all references in a single packed-refs file can become expensive to maintain, as even a small reference update incurs the significant I/O cost of rewriting the entire packed-refs file.
That’s where the reftable format comes in. Reftable is an entirely new format for storing Git references. Instead of storing loose references, or constantly updating a large packed-refs file, reftable implements a binary format for storing references that promises to achieve:
- Near constant-time lookup for individual references, and near constant-time verification that a given object ID is referred to by at least one reference.
- Efficient lookup of entire reference namespaces through prefix compression.
- Atomic reference updates that scale with the size of the reference update, not the number of overall references.
The reftable format is incredibly detailed (curious readers can learn more by reading the original specification), but here’s a high-level overview. A repository can have any number of reftables (stored as *.ref files), each of which is organized into variable-sized blocks. Blocks can store information about a collection of references, refer to the contents of other blocks when storing references across a collection of blocks, and more.
The format is designed to both (a) take up a minimal amount of space (by storing reference names with prefix compression) and (b) support fast lookups, even when reading the .ref file(s) from a cold cache.
Most importantly, the reftable format supports multiple *.ref files, meaning that each reference update transaction can be processed individually without having to modify existing *.ref files. The format also defines a separate compaction process, which “merges” a range of adjacent *.ref files together into a single *.ref file to maintain read performance.
The reftable format was originally designed by Shawn Pearce for use in JGit to better support the large number of references stored by Gerrit. Back in our Highlights from Git 2.35 post, we covered that an implementation of the reftable format had landed in Git. In that version, Git did not yet know how to use the new reftable code in conjunction with its existing reference backend system, meaning that you couldn’t yet create repositories that store references using reftable.
In Git 2.45, support for a reftable-powered storage backend has been integrated into Git’s generic reference backend system, meaning that you can play with reftable on your own repository by running:
$ git init --ref-format=reftable /path/to/repo
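If you are not sure whether your installed Git is new enough, a small guard like the following can help. It is a sketch rather than an official recipe: it relies on git rev-parse --show-ref-format to report the backend in use, and the temporary path and fallback message are placeholders.

```shell
# Try to create a reftable-backed repository in a temporary directory.
repo=$(mktemp -d)
if git init -q --ref-format=reftable "$repo" 2>/dev/null
then
	# Ask Git which reference backend the new repository uses.
	ref_format=$(git -C "$repo" rev-parse --show-ref-format)
else
	# Older versions reject the option or the "reftable" value.
	ref_format='unsupported by this Git version'
fi
echo "$ref_format"
```

On Git 2.45 or newer this should print reftable; on older versions the fallback message is printed instead.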
Preliminary support for SHA-1 and SHA-256 interoperability
Returning readers of this series will be familiar with our ongoing coverage of the Git project’s hash function transition. If you’re new around here, or need a refresher, don’t worry!
Git identifies objects (the blobs, trees, commits, and tags that make up your repository) by a hash of their contents. Since its inception, Git has used the SHA-1 hash function to hash and identify objects in a repository.
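As a rough illustration of what "a hash of their contents" means, the sketch below recomputes a blob's object ID by hand and compares it with Git's own answer. It assumes Git is hashing with SHA-1 (its default), that GNU coreutils' sha1sum is installed, and it uses the same "Hello, world!" content as the demo later in this section.

```shell
content='Hello, world!'

# Byte length of the content, including the trailing newline.
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')

# Git hashes a short header, "blob <size>" plus a NUL byte, followed by the content.
manual_oid=$( { printf 'blob %s\0' "$size"; printf '%s\n' "$content"; } |
	sha1sum | cut -d ' ' -f 1)

# Ask Git for the ID it would assign to the same content.
git_oid=$(printf '%s\n' "$content" | git hash-object --stdin)

echo "manual: $manual_oid"
echo "git:    $git_oid"
```

The two values should match. Trees, commits, and tags are identified the same way, with a different type name in the header.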
However, the SHA-1 function has known collision attacks (e.g., SHAttered and Shambles), meaning that a sufficiently motivated attacker can generate a colliding pair of SHA-1 inputs, which have the same SHA-1 hash despite containing different contents. (Many providers, like GitHub, use a SHA-1 implementation that detects and rejects inputs that contain the telltale signs of being part of a colliding pair attack. For more details, see our post, SHA-1 collision detection on GitHub.com.)
Around this time, the Git project began discussing a plan to transition from SHA-1 to a more secure hash function that was not susceptible to the same chosen-prefix attacks. The project decided on SHA-256 as the successor to Git’s use of SHA-1 and work on supporting the new hash function began in earnest. In Git 2.29 (released in October 2020), Git gained experimental support for using SHA-256 instead of SHA-1 in specially-configured repositories. That feature was declared no longer experimental in Git 2.42 (released in August 2023).
One of the goals of the hash function transition was to introduce support for repositories to interoperate between SHA-1 and SHA-256, meaning that repositories could in theory use one hash function locally, while pushing to another repository that uses a different hash function.
Git 2.45 introduces experimental preliminary support for limited interoperability between SHA-1 and SHA-256. To do this, Git 2.45 introduces a new concept called the “compatibility” object format, and allows you to refer to objects by either their given hash, or their “compatibility” hash. An object’s compatibility hash is the hash of an object as it would have been written under the compatibility hash function.
To give you a better sense of how this new feature works, here’s a short demo. To start, we’ll initialize a repository in SHA-256 mode, and declare that SHA-1 is our compatibility hash function:
$ git init --object-format=sha256 /path/to/repo
Initialized empty Git repository in /path/to/repo/.git
$ cd /path/to/repo
$ git config extensions.compatObjectFormat sha1
Then, we can create a simple commit with a single file (README) whose contents are “Hello, world!”:
$ echo 'Hello, world!' >README
$ git add README
$ git commit -m "initial commit"
[main (root-commit) 74dcba4] initial commit
Author: A U Thor <author@example.com>
1 file changed, 1 insertion(+)
create mode 100644 README
Now, we can ask Git to show us the contents of the commit object we just created with cat-file. As we’d expect, the hashes of the commit object and of its root tree are both computed using SHA-256:
$ git rev-parse HEAD | git cat-file --batch
74dcba4f8f941a65a44fdd92f0bd6a093ad78960710ac32dbd4c032df66fe5c6 commit 202
tree ace45d916e870ce0fadbb8fc579218d01361da4159d1e2b5949f176b1f743280
author A U Thor <author@example.com> 1713990043 -0400
committer C O Mitter <committer@example.com> 1713990043 -0400
initial commit
But we can also tell git rev-parse to output any object IDs using the compatibility hash function, allowing us to ask for the SHA-1 object ID of that same commit object. When we print its contents out using cat-file, its root tree OID is a different value (starting with 7dd4941980 instead of ace45d916e), this time computed using SHA-1 instead of SHA-256:
$ git rev-parse --output-object-format=sha1 HEAD
2a4f4a2182686157a2dc887c46693c988c912533
$ git rev-parse --output-object-format=sha1 HEAD | git cat-file --batch
2a4f4a2182686157a2dc887c46693c988c912533 commit 178
tree 7dd49419807b37a3afd2f040891a64d69abb8df1
author A U Thor <author@example.com> 1713990043 -0400
committer C O Mitter <committer@example.com> 1713990043 -0400
initial commit
Support for this new feature is still considered experimental, and many features may not work quite as you expect them to. There is still much work ahead for full interoperability between SHA-1 and SHA-256 repositories, but this release delivers an important first step towards full interoperability support.
- If you’ve ever scripted around your repository, then you have no doubt used git rev-list to list commits or objects reachable from some set of inputs. rev-list can also come in handy when trying to diagnose repository corruption, including investigating missing objects.
In the past, you might have used something like git rev-list --missing=print to gather a list of objects which are reachable from your inputs, but are missing from the local repository. But what if there are missing objects at the tips of your reachability query itself? For instance, if the tip of some branch or tag is corrupt, then you’re stuck:
$ git rev-parse HEAD | tr 'a-f1-9' '1-9a-f' >.git/refs/heads/missing
$ git rev-list --missing=print --all | grep '^?'
fatal: bad object refs/heads/missing
Here, Git won’t let you continue, since one of the inputs to the reachability query itself (refs/heads/missing, via --all) is missing. This can make debugging missing objects in the reachable parts of your history more difficult than necessary.
But with Git 2.45, you can debug missing objects even when the tips of your reachability query are themselves missing, like so:
$ git rev-list --missing=print --all | grep '^?'
?70678e7afeacdcba1242793c3d3d28916a2fd152
- One of Git’s lesser-known features is “reference logs,” or “reflogs” for short. These reference logs are extremely useful when asking questions about the history of some reference, such as: “what was main pointing at two weeks ago?” or “where was I before I started this rebase?”.
Each reference has its own corresponding reflog, and you can use the git reflog command to see the reflog for HEAD, or for an arbitrary reference by running git reflog refs/heads/some/branch.
If you want to see what branches have corresponding reflogs, you could look at the contents of .git/logs like so:
$ find .git/logs/refs/heads -type f | cut -d '/' -f 3-
But what if you’re using reftable? In that case, the reflogs are stored in a binary format, leaving tools like find out of your reach.
Git 2.45 introduced a new sub-command git reflog list to show which references have corresponding reflogs available to them, regardless of whether or not you are using reftable.
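Here is a short sketch of the new sub-command in a throwaway repository. The path and identity are placeholders, and the fallback message is only there so that the snippet degrades gracefully on versions of Git older than 2.45, where git reflog list is not understood.

```shell
# Create a throwaway repository with a single commit on main.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/main
git -c user.name='A U Thor' -c user.email=author@example.com \
	commit -q --allow-empty -m 'initial commit'

# List every reference that has a reflog, whichever backend is in use.
reflogs=$(git reflog list 2>/dev/null) ||
	reflogs='git reflog list needs Git 2.45 or newer'
printf '%s\n' "$reflogs"
```

On a new enough Git, the output should include refs/heads/main, since the commit above updated that branch.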
- If you’ve ever looked closely at Git’s diff output, you might have noticed the prefixes a/ and b/ used before file paths to indicate the before and after versions of each file, like so:
$ git diff HEAD^ -- GIT-VERSION-GEN
diff --git a/GIT-VERSION-GEN b/GIT-VERSION-GEN
index dabd2b5b89..c92f98b3db 100755
--- a/GIT-VERSION-GEN
+++ b/GIT-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=GIT-VERSION-FILE
-DEF_VER=v2.45.0-rc0
+DEF_VER=v2.45.0-rc1
 
 LF='
 '
In Git 2.45, you can now configure alternative prefixes by setting the diff.srcPrefix and diff.dstPrefix configuration options. This can come in handy if you want to make clear which side is which (by setting them to something like “before” and “after,” respectively). Or if you’re viewing the output in your terminal, and your terminal supports hyperlinking to paths, you could change the prefix to ./ to allow you to click on filepaths within a diff output.
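As a sketch of how those two options might be set, the snippet below configures them in a throwaway repository and prints the first line of the resulting diff. The path, identity, and file contents are placeholders. The prefixes are used verbatim, so the trailing slash is part of the value.

```shell
# Create a throwaway repository with one committed file.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
echo 'DEF_VER=v2.45.0-rc0' >GIT-VERSION-GEN
git add GIT-VERSION-GEN
git -c user.name='A U Thor' -c user.email=author@example.com \
	commit -q -m 'rc0'

# Make an uncommitted change so there is something to diff.
echo 'DEF_VER=v2.45.0-rc1' >GIT-VERSION-GEN

# Repository-local settings that replace the default a/ and b/ prefixes.
git config diff.srcPrefix 'before/'
git config diff.dstPrefix 'after/'

diff_header=$(git diff -- GIT-VERSION-GEN | head -n 1)
echo "$diff_header"
```

With Git 2.45 or newer the header should mention before/GIT-VERSION-GEN and after/GIT-VERSION-GEN. Older versions ignore both settings and keep printing a/ and b/.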
- When writing a commit message, Git will open your editor with a mostly blank file containing some instructions, like so:
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# On branch main
# Your branch is up to date with 'origin/main'.
Since 2013, Git has supported customizing the comment character to be something other than the default #. This can come in handy, for instance, if you’re trying to refer to a GitHub issue by its numeric shorthand (e.g. #12345). If you write #12345 at the beginning of a line in your commit message, Git will treat the entire line as a comment and ignore it.
In Git 2.45, Git allows not just any single ASCII character, but any arbitrary multi-byte character or even an arbitrary string. Now, you can customize your commit message template by setting core.commentString (or core.commentChar, the two are synonyms for one another) to your heart’s content.
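To see the effect of a custom comment character without opening an editor, the sketch below pipes a sample message through git stripspace --strip-comments, which removes comment lines according to the same setting. The message text is made up for illustration, and a single-character value is used so that the snippet also runs on older versions of Git.

```shell
# A made-up message: an issue reference on the first line, a comment on the second.
message=$(printf '%s\n' '#12345: fix the frobnicator' '; this line is a comment' |
	git -c core.commentChar=';' stripspace --strip-comments)

# Only the line starting with ';' is dropped; the '#12345' line survives.
printf '%s\n' "$message"
```

With Git 2.45 or newer, the same idea should carry over to a multi-character value set through core.commentString.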
- Speaking of comments, git config learned a new option to help document your .gitconfig file. The .gitconfig file format allows for comments beginning with a # character, meaning that everything following that # until the next newline will be ignored.
The git config command gained a new --comment option, which allows specifying an optional comment to leave at the end of the newly configured line, like so:
$ git config --comment 'to show the merge base' merge.conflictStyle diff3
$ tail -n 2 .git/config
[merge]
	conflictStyle = diff3 # to show the merge base
This can be helpful when tweaking some of Git’s more esoteric settings to try and remember why you picked a particular value.
- Sometimes when you are rebasing or cherry-picking a series of commits, one or more of those commits become “empty” (i.e., because they contain a subset of changes that have already landed on your branch).
When rebasing, you can use the --empty option to specify how to handle these commits. --empty supports a few options: “drop” (to ignore those commits), “keep” (to keep empty commits), or “stop” which will halt the rebase and ask for your input on how to proceed.
Despite its similarity to git rebase, git cherry-pick never had an equivalent option to --empty. That meant that if you were cherry-picking a long sequence of commits, some of which became empty, you’d have to type either git cherry-pick --skip (to drop the empty commit), or git commit --allow-empty (to keep the empty commit).
In Git 2.45, git cherry-pick learned the same --empty option from git rebase, meaning that you can specify the behavior once at the beginning of your cherry-pick operation, instead of having to specify the same thing each time you encounter an empty commit.
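The following sketch sets up the situation described above: a commit on a topic branch whose change has already landed on main, so that cherry-picking it produces an empty commit. The path, identity, branch names, and file contents are placeholders, and the fallback message covers versions of Git older than 2.45, which reject the option.

```shell
# Create a throwaway repository with a base commit on main.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/main
git config user.name 'A U Thor'
git config user.email author@example.com
git commit -q --allow-empty -m 'base'

# Add README on a topic branch.
git checkout -q -b topic
echo 'Hello, world!' >README
git add README
git commit -q -m 'add README'

# Land the identical change on main independently.
git checkout -q main
echo 'Hello, world!' >README
git add README
git commit -q -m 'add README (already landed)'

# Cherry-picking the topic commit would now be empty, so ask Git to drop it.
before=$(git rev-list --count HEAD)
git cherry-pick --empty=drop topic ||
	echo 'git cherry-pick --empty needs Git 2.45 or newer'
after=$(git rev-list --count HEAD)
echo "commits before: $before, after: $after"
```

Because the redundant commit is dropped, main should have the same number of commits before and after the cherry-pick.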
The rest of the iceberg
That’s just a sample of changes from the latest release. For more, check out the release notes for 2.45, or any previous version in the Git repository.