hound-search/hound

Hound's git usage is bad in multiple ways. Solutions inside

Open

#249 opened on Apr 15, 2017

bug, help wanted

Description

There are multiple bugs in how hound uses Git, and it often does bad things for no reason. There are also multiple outstanding issues & PRs which try to build on this bad behavior rather than fixing the underlying problems.

Some of those issues/PRs are:

  • #207 filed by me to stop using --depth, added in 2e9ceb9e5f327a37a9d091648497a48250ee130c by @kellegous (and we should use --single-branch as noted by @TobiX).
  • PR #221 fixed an issue with using "pull" but introduced another issue by hardcoding the master branch.
  • PR #248 by @havocbane to add the ability to specify a branch, rather than master.
  • #228 appears to be an issue with the hardcoding of origin/master, maybe.

So, looking at what the git driver does:

  • clone: git clone --depth 1
  • update: git fetch --prune --no-tags --depth 1 origin +master:remotes/origin/master && git reset --hard origin/master
  • state: git rev-parse HEAD

There are three problems here. First, it shouldn't use --depth, as noted in my #207. Second, it hardcodes the master branch. Third, --no-tags on the fetch, combined with how the clone works, doesn't do what the author intended: we clone all the tags initially but then never update them, so e.g. gc/repack still has to maintain all those stale tags.

I've hacked houndd locally to use a git wrapper script which fixes up its bad git usage. It works like this:

  • clone:

    git clone --single-branch && git --git-dir=/.git config remote.origin.tagOpt --no-tags && git --git-dir=/.git tag -l | xargs git --git-dir=/.git tag -d

This way we clone whatever branch HEAD points to on the remote side, e.g. master, trunk or whatever. Right after the clone we delete all the tags. They won't be fetched again because of the --no-tags tagOpt.
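The clone step described above could be sketched as a shell function roughly like this. It's only an illustration: the function name hound_clone and its two parameters are made up for this example and are not part of hound, and it deletes tags in a loop rather than via xargs.

```sh
# Sketch of the proposed clone step. The function name and both
# parameters are hypothetical, not part of hound.
#   $1: remote URL    $2: target directory for the working copy
hound_clone() {
    # Clone only the branch the remote HEAD points to, with full history.
    git clone --single-branch "$1" "$2" || return 1
    # Record --no-tags so that later plain "git fetch" calls skip tags.
    git --git-dir="$2/.git" config remote.origin.tagOpt --no-tags || return 1
    # Delete the tags the initial clone brought in. The loop body never
    # runs when the repository has no tags.
    git --git-dir="$2/.git" tag -l | while IFS= read -r tag; do
        git --git-dir="$2/.git" tag -d "$tag"
    done
}
```

It would be called as hound_clone followed by the repo URL and the target directory. Afterwards git for-each-ref in that directory should list no refs/tags entries.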

  • update:

    git fetch && git reset --hard @{u}

There's no reason to supply any arguments to fetch, since the remote configuration written by the clone (the fetch refspec and tagOpt) takes care of all that. As noted in #207 we shouldn't use the inefficient --depth=1 either, and there's no need for --no-tags since it's in our config at this point.

We then reset to @{u}, the upstream of the checked-out branch, not a hardcoded master. This works whatever the HEAD branch is.
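As a rough sketch, the update step amounts to the following. Again the function name hound_update and its parameter are hypothetical, and it assumes the working copy was cloned as described above so that the branch has an upstream configured.

```sh
# Sketch of the proposed update step. The function name and parameter
# are hypothetical, not part of hound.
#   $1: path to an existing working copy
# The body runs in a subshell so the caller's directory is unchanged.
hound_update() (
    cd "$1" || exit 1
    # No arguments: remote, refspec and tagOpt all come from .git/config.
    git fetch || exit 1
    # @{u} is the upstream of the checked-out branch, so this works for
    # master, trunk, or any other default branch.
    git reset --hard '@{u}'
)
```

Since nothing here names a branch, the same function should work unchanged for every repo regardless of what its main branch is called.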

  • state: Unchanged, nothing wrong with git rev-parse HEAD.

The wrapper script I'm using is below. It's slightly more complex than the above because it works both before & after d99d1db. The insteadOf line is specific to my site: for reasons I won't go into, I'm munging the repo targets.

#!/usr/bin/perl
use strict;
use warnings;
use autodie qw(:all);

my $orig_args = "@ARGV";
my $args = $orig_args;

# Because of https://github.com/etsy/hound/issues/207, and
# --single-branch is what we actually want.
$args =~ s/clone.*?\K--depth 1/--single-branch/;
# We also need to handle pull, see
# https://github.com/etsy/hound/commit/d99d1db
$args =~ s[(?=^(?:clone|pull|fetch))][-c "url.http://git.example.com/git/.insteadOf=ssh://git.example.com/gitroot/" ];
$args =~ s/pull$/fetch/;
# ... and handle the new bad fetch & reset commands
$args =~ s/ fetch\K.*//; # No need to give fetch *any* args
$args =~ s/^reset\K.*/ --hard \@{u}/;

# sudo tail -f /var/log/messages | grep hound-gitwrapper
system(
    "logger",
    "-t",
    "hound-gitwrapper",
    "git with args <$args>" . ($args ne $orig_args ? " (munged from <$orig_args>)" : ""),
);

system "/usr/bin/git $args";

# NOTE: I am intentionally not using /usr/bin/git here, but git, so
# this gets fed into this same script again for syslogging!
if ($orig_args =~ /^clone /) {
    my ($repo_path) = $args =~ m[ (vcs-[0-9a-f]+)];

    system "git --git-dir=$repo_path/.git config remote.origin.tagOpt --no-tags";
    # Will succeed if there are no tags since -l will return an empty list
    system "git --git-dir=$repo_path/.git tag -l | xargs /usr/bin/git --git-dir=$repo_path/.git tag -d";
} elsif ($orig_args eq 'pull') {
    system "git reset --hard \@{u}";
}

I'm running houndd via supervisor with environment = PATH="/usr/lib/houndd/bin:/usr/bin", and I've dropped this script in as git in /usr/lib/houndd/bin. This works for me: it fixes the bug with not cloning repos whose main branch isn't master, and it reduces load on our git server by not using --depth=1. Running for-each-ref on all the repos in the data dir shows that only the main branch ref is being maintained:

$ find . -name '.git' -exec git --git-dir={} for-each-ref \;|grep -v remotes|awk '{print $3}'|sort|uniq -c|sort -nr
    254 refs/heads/master
      2 refs/heads/trunk
      1 refs/heads/frunk

I don't have the desire or the Go skills to easily patch git.go, and I need to maintain this wrapper anyway because I'm doing some further magic (dispatching to load-balanced git slaves) which won't ever get upstreamed. But I wanted to file this to show what the solution is to almost all the complaints people have with git & hound in the aforementioned issues.
