Skip to content

Fix handling of pseudorefs with different refdb backends #7111

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

pks-gitlab
Copy link

We have a bunch of references that we treat like pseudo-refs. Those
references are (sometimes) read and written by going to the filesystem
directly, at other times they are read and written via the refdb. This
works alright with the "files" ref storage format given that any root
reference never gets packed into the "packed-refs" file, and thus they
would always be accessible a loose ref if present.

The behaviour is wrong though when considering alternate backends like
the "reftable" backend. All references except for pseudo-refs must be
read via the backend, and that includes root refs.

Historically this part of Git has been ill-defined, and it wasn't quite
clear which refs are considered pseudo-refs in the first place. This was
clarified in 6fd80375640 (Documentation/glossary: redefine pseudorefs as
special refs, 2024-05-15): there only are two pseudorefs, "FETCH_HEAD"
and "MERGE_HEAD". The reason why those two references are considered
special is that they may contain additional data that doesn't fit into
the normal reference format.

In any case, our current handling of a couple of root references is
broken in this new world. This pull request fixes most of it, with the only exception being how we initialize the refdb. That part is handled in #7102.

pks-t added 9 commits July 18, 2025 14:06
We have a bunch of references that we treat like pseudo-refs. Those
references are (sometimes) read and written by going to the filesystem
directly, at other times they are read and written via the refdb. This
works alright with the "files" ref storage format given that any root
reference never gets packed into the "packed-refs" file, and thus they
would always be accessible a loose ref if present.

The behaviour is wrong though when considering alternate backends like
the "reftable" backend. All references except for pseudo-refs must be
read via the backend, and that includes root refs.

Historically this part of Git has been ill-defined, and it wasn't quite
clear which refs are considered pseudo-refs in the first place. This was
clarified in 6fd80375640 (Documentation/glossary: redefine pseudorefs as
special refs, 2024-05-15): there only are two pseudorefs, "FETCH_HEAD"
and "MERGE_HEAD". The reason why those two references are considered
special is that they may contain additional data that doesn't fit into
the normal reference format.

In any case, our current handling of a couple of root references is
broken in this new world.

Fix this for "ORIG_HEAD" by exclusively going through the refdb to read
and write that reference. Rename the define accordingly to clarify that
it is a reference and not a file.
Fix handling of "REVERT_HEAD" by exclusively reading and writing it via
the reference database.
Fix handling of "CHERRY_PICK_HEAD" by exclusively reading and writing it
via the reference database.
The GIT_STASH_FILE define contains the path to the stash reference.
While we know that this used to be a file with the "files" backend, it's
not a standalone file with the "reftable" backend anymore.

Rename the macro to GIT_STASH_REF to indicate that this is a proper ref.
Introduce the GIT_HEAD_REF define so that we can clearly indicate that
we're talking about the "HEAD" reference and not necessarily a file.
Note that there still are a couple of places where GIT_HEAD_FILE is
being used:

  - `git_repository_create_head()`: This function is used to create HEAD
    when initializing a new repository. This should get fixed eventually
    so that we create HEAD via the refdb, but this is a more involved
    refactoring that will be handled in a separate patch series.

  - `repo_init_head()`: Likewise.

  - `conditional_match_onbranch()`: This function is used to decide
    whether or not an `includeIf.onbranch` condition matches. This will
    be fixed in subsequent commits.

Other than that there shouldn't be any more references to GIT_HEAD_FILE.
When initializing the "files" refdb we read a couple of values from the
Git configuration. Unfortunately, this causes a chicken-and-egg problem
when reading configuration with "includeIf.onbranch" conditionals: we
need to instantiate the refdb to evaluate the condition, but we need to
read the configuration to initialize the refdb.

We currently work around the issue by reading the "HEAD" file directly
when evaluating the conditional include. But while that works with the
"files" backend, any other backends that store "HEAD" anywhere else will
break.

Prepare for a fix by deferring reading the configuration. We really only
need to be able to execute `git_refdb_lookup()`, so all we need to
ensure is that we can look up a branch without triggering any config
reads.
With the preceding commit we have refactored the "files" backend so that
it can be both instantiated and used to look up a reference without
reading any configuration. With this change in place we don't cause
infinite recursion anymore when using the refdb to evaluate "onbranch"
conditions.

Refactor the code to use the refdb to look up "HEAD". Note that we
cannot use `git_reference_lookup()` here, as that function itself tries
to normalize reference names, which in turn involves reading the Git
configuration. So instead, we use the lower-level `git_refdb_lookup()`
function, as we don't need the normalization anyway.
Expose a function to read a loose reference. This function will be used
in a subsequent commit to read pseudo-refs on the generic refdb layer.
Regardless of which reference storage format is used, pseudorefs will
always be looked up via the filesystem as loose refs. This is because
pseudorefs do not strictly follow the reference format and may contain
additional metadata that is not present in a normal reference.

We don't honor this in `git_reference_lookup()` though but instead defer
to the refdb to read such references. This obviously works just fine
with the "files" backend, but any other backend would have to grow
custom logic to handle reading pseudorefs.

Refactor `git_reference_lookup_resolved()` so that it knows to always
read pseudorefs as loose references. This allows refdb implementations
to not care about pseudoref handling at all.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants