The Grumpy Troll

Ramblings of a grumpy troll.

Git Aliases & Shell

Today I took a look at one particular git repository’s configuration and saw something slightly off in the configuration for a credential helper, dating from an old experiment with AWS CodeCommit. I decided to dig deeper to figure out what the actual rules are for shell commands inside git configuration files.

This side-diversion took a bit longer than expected. It’s a Sunday. Ah well. I’ve seen too much cargo-culted incorrect information online, so it was time to figure out an accurate answer. If you see Leaning Toothpick Syndrome markedly increasing the count of backslashes, then a little suspicion is warranted.

In fairness to those who are confused, including me before today: the git logic tries to implement magic “do the right thing” fix-ups at a level beneath the GIT_TRACE reporting, so that it’s pretty hidden until you either read the source or use a strace(1) style of tool. This makes the simple cases right, even when they’re wrong, but makes the fancier cases obtuse.

I decided to focus on looking into how [alias] rules are handled, on the (correct) assumption that it’s the same mechanisms involved.

For clarity: avoid cleverness inside configuration files!

If the steps stated here matter, then the complexity really should be moved out of ~/.gitconfig and into a separate helper script. When you start having to worry about where strings are parsed and how they’re handled in which situations, the correct response is to back away slowly, not breaking eye-contact, and create a very short script to handle things instead.
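As a rough sketch of what "a very short script" can look like: the name git-each-arg and the scratch directory are made up for illustration. Git will try an executable called git-NAME on your PATH when asked to run git NAME, so a file like this avoids config-file quoting entirely.

```shell
# Sketch only: write a hypothetical helper named git-each-arg into a scratch
# directory. In real use it would live in a directory already on your PATH.
dir=$(mktemp -d)
cat > "$dir/git-each-arg" <<'EOF'
#!/bin/sh
# Print each argument on its own line; no config-file quoting involved.
for x in "$@"; do
  printf ': {%s}\n' "$x"
done
EOF
chmod +x "$dir/git-each-arg"

# Run it directly here; with its directory on PATH, "git each-arg foo 'bar baz'"
# would end up executing the same file.
"$dir/git-each-arg" foo 'bar baz'
```

The script sees its arguments exactly as the invoker typed them, with only one layer of shell parsing to think about.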

Notwithstanding correctness, sometimes you need to know how to tame the beast.

git config documentation

Let’s start by reading the actual documentation; this is found in git-config(5):

alias.*
Command aliases for the git(1) command wrapper - e.g. after defining alias.last = cat-file commit HEAD, the invocation git last is equivalent to git cat-file commit HEAD. To avoid confusion and troubles with script usage, aliases that hide existing Git commands are ignored. Arguments are split by spaces, the usual shell quoting and escaping is supported. A quote pair or a backslash can be used to quote them.

If the alias expansion is prefixed with an exclamation point, it will be treated as a shell command. For example, defining alias.new = !gitk --all --not ORIG_HEAD, the invocation git new is equivalent to running the shell command gitk --all --not ORIG_HEAD. Note that shell commands will be executed from the top-level directory of a repository, which may not necessarily be the current directory. GIT_PREFIX is set as returned by running git rev-parse --show-prefix from the original current directory. See git-rev-parse(1).

That’s a decent start but things go wrong mysteriously when we try to do something non-trivial. A simple for-loop appears to fail at the "$@" stage, but why?

Also under “Syntax”:

The following escape sequences (beside \" and \\) are recognized: \n for newline character (NL), \t for horizontal tabulation (HT, TAB) and \b for backspace (BS). Other char escape sequences (including octal escape sequences) are invalid.

Shell side-digression

Something which will become important below is a quirky corner of POSIX shell handling. Despite POSIX giving us the convention of -- to terminate option processing and treat following parameters as non-option entries, when you invoke sh -c FOO -- arg1 you are not using -- to terminate options.

Instead, per the specification, in -c handling, the first non-option parameter is the string providing the input to be parsed for shell commands (here FOO) and the next parameter provides the name of the command! It’s argv[0] pass-through. Thus sh -c FOO -- arg1 will invoke a shell, with $0 equal to --, $1 of arg1 and then parse and handle FOO.

After the first non-option, no other strings are examined to see if they might be options; there is no permutation.

Thus sh -c FOO BAR -u does not risk -u telling the shell to treat unset variable references as errors. Instead, FOO is invoked in a context where argv=["BAR", "-u"].
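This is easy to check for yourself. A minimal sketch, where show_args is a hypothetical helper name of mine and the child shell is whatever sh resolves to on your system:

```shell
# show_args (hypothetical) runs a child shell whose command string prints $0,
# the number of positional parameters, and then each parameter in turn.
show_args() {
  sh -c 'printf "0=%s #=%s\n" "$0" "$#"; for a in "$@"; do printf "arg=%s\n" "$a"; done' "$@"
}

show_args -- arg1   # "--" lands in $0; it does not terminate option parsing
show_args BAR -u    # "-u" is just $1; it is not treated as a shell option
```

In both calls the word immediately after the command string becomes $0 and everything after that is a positional parameter.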

What actually happens with git

Git’s config parser and the shell-or-other invocation are the two layers to worry about.

The config parser uses both ; and # as comment markers, so an unquoted ; will end the value right there, with everything after it discarded as a comment. Given that ; is a sub-list terminator in shell syntax, this is a little unfortunate.

A shell is not necessarily used when an alias is treated as a shell command. That’s very careful documentation wording, above. “it will be treated as a shell command” does not mean that a shell will be used, merely that it’s a shell command to be handled. Git will often resort to using a shell, but it’s not a commitment to do so.

Git parses the configuration file handling basic stripping of comments and handling of the \n substitutions and quoting, before the alias mechanisms come into play.

Git can split a string into separate fields, for invoking as a command, without needing to go near a shell to do so. The alias.c:split_cmdline() function handles this, splitting on whitespace while not splitting within quoted strings. It handles single and double-quotes, with double-quotes supporting a backslash escape purely for avoiding breaking out of quoted state. No other escapes are supported, but the configuration parsing has already handled some. You can have:

[alias]
  foo = !"bar\nbaz"

and the value stored for the alias foo will be:

!bar
baz

The quote handling in split_cmdline() is instead for quotes nested inside the original quotes; this is what lets us write:

[alias]
  foo = "!printf '%s\\n' first \"sec ond\" third"

and invoke:

% GIT_TRACE=true git foo
20:50:18.430074 git.c:654               trace: exec: git-foo
20:50:18.430779 run-command.c:637       trace: run_command: git-foo
20:50:18.432151 run-command.c:637       trace: run_command: 'printf '\''%s\n'\'' first "sec ond" third'
first
sec ond
third

Note here that we used \\ to avoid having the git config parser handle the \n. A strace(1) shows us:

% strace -ff git foo
[...]
[pid 16724] execve("/bin/sh", ["/bin/sh", "-c", "printf '%s\\n' first \"sec ond\" th"..., "printf '%s\\n' first \"sec ond\" th"...], [/* 62 vars */] <unfinished ...>
[...]
% strace -ff sh -c "printf '%s\n'" first 'sec ond' third
execve("/bin/sh", ["sh", "-c", "printf '%s\\n'", "first", "sec ond", "third"], [/* 61 vars */]) = 0

Here, the sequence \\n is simply how strace is showing that \n as a two-character sequence is the string being passed through.

Returning to Git’s codebase: Without an exclamation mark, split_cmdline() is used and the results put into the current process’s argv[] vector after the initial git, and argument processing effectively then restarts, letting git decide again what should be done.

Without an exclamation mark, that’s it. All done, quick and clean and simple. Everything after here assumes that the entry starts with an exclamation mark.

What the exclamation mark means is really “run this entry as a sub-command now, and exit”. This is found in git.c:handle_alias(). The invocation is then run_command() on a struct with .use_shell set. The entire string of the alias value is passed into this as a single string, no whitespace handling. split_cmdline() does not apply.

But just having .use_shell set still doesn’t mean that a shell is used. What it means is that run-command.c:prepare_shell_cmd() will be used to construct the shell command, which might mean that a shell will be used.

What prepare_shell_cmd() does is look to see if the value of the command might be a single word without any special characters. Those special characters are backtick itself or any of: |&;<>()$\"' \t\n*?[#~=%

Since a whitespace is in the list of characters, simply having two words is enough to trigger invoking the shell.
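To make the rule concrete, here is a sketch of the same test written as a shell function. needs_shell is my own hypothetical name, not anything from git, and the character list is the one given above, reading the backslash in it as a literal character.

```shell
# needs_shell (hypothetical) restates the check described above; it is not
# git's code. Exit status 0: the string contains a listed character, so a
# shell would be used. Exit status 1: a plain single word.
needs_shell() {
  tab=$(printf '\t')
  nl='
'
  case $1 in
    *'|'* | *'&'* | *';'* | *'<'* | *'>'* | *'('* | *')'* | *'$'* | *'`'*) return 0 ;;
    *'\'* | *'"'* | *"'"* | *' '* | *"$tab"* | *"$nl"*) return 0 ;;
    *'*'* | *'?'* | *'['* | *'#'* | *'~'* | *'='* | *'%'*) return 0 ;;
    *) return 1 ;;
  esac
}

for candidate in 'gitk' 'gitk --all' '/usr/local/bin/helper' 'a;b'; do
  if needs_shell "$candidate"; then
    printf '%s -> shell\n' "$candidate"
  else
    printf '%s -> no shell needed\n' "$candidate"
  fi
done
```

Only the bare single words come out as not needing a shell; the space in gitk --all is enough to flip the answer.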

Once the shell is invoked, the exact form invoked depends upon whether or not the invoker of the command provided parameters on the command-line.

In all cases, the shell is invoked with at least four parameters in argv. The first two are ["sh", "-c"]. The third will be the string from the configuration, if and only if no parameters were supplied to git by the invoker. If parameters were provided, then instead the third parameter for the shell is modified, by adding the five characters "$@" to the end of it. (Five: SPACE, QUOTATION MARK, DOLLAR SIGN, COMMERCIAL AT, QUOTATION MARK.) Git tries to be clever and assumes that you’ll want the arguments available at the end of whatever string is given.

This works for simple commands. But for a text which tries to handle arguments itself, it’s a hindrance to be worked around.

The fourth of the always-present parameters for the shell is the text from the alias definition again, repeated so that it becomes the shell’s $0 (the command name).

Any elements in the new shell’s argv after that are passed through from the invoker of git.
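Putting those pieces together, the argv layout can be imitated from a plain shell. This is a sketch of the behaviour as described here, not a copy of git's logic; run_alias is a hypothetical name, and its first parameter is the alias text without the leading exclamation mark.

```shell
# run_alias (hypothetical) imitates the argv layout described above.
# $1 is the alias text; any further parameters stand in for whatever the
# user typed after the alias name.
run_alias() {
  body=$1
  shift
  if [ "$#" -gt 0 ]; then
    # Parameters supplied: the command string gains a trailing SPACE "$@".
    sh -c "$body \"\$@\"" "$body" "$@"
  else
    # No parameters: the command string is passed through unchanged.
    sh -c "$body" "$body"
  fi
}

run_alias 'printf "<%s>\n"' one 'two three'
run_alias 'printf "%s args\n" "$#"'
```

The first call shows why the appended "$@" is handy for simple commands: the arguments arrive intact, embedded space and all.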

What this means for us

Handling the auto-inserted "$@" is simple enough, once you know that it needs to be handled: simply end your alias definition with a shell comment character, the octothorpe # (Unicode NUMBER SIGN).

These are correct alias definitions for ~/.gitconfig:

[alias]
  wibble = "!set -x\necho $#\nfor x in \"$@\"; do echo \": {$x}\"; done #"
  wobble = "!for x in \"$@\"; do echo \": {$x}\"; done #"

Here, the git configuration file parsing has resulted in the list of aliases containing an entry for wibble and one for wobble, where the stored strings are:

!set -x
echo $#
for x in "$@"; do echo ": {$x}"; done #

and the same but without the first two lines.

We can invoke git wibble:

% git wibble
+ echo 0
0
% git wibble foo 'bar baz'
+ echo 2
2
+ for x in '"$@"'
+ echo ': {foo}'
: {foo}
+ for x in '"$@"'
+ echo ': {bar baz}'
: {bar baz}

Those two invocations using git wobble instead (to make this a little more condensed and easier to scan) would have resulted in these argv arrays being processed (using single-quotes for strings to avoid introducing backslashes):

first = [
  'sh',
  '-c',
  'for x in "$@"; do echo ": {$x}"; done #',
  'for x in "$@"; do echo ": {$x}"; done #',  # ignored, shell $0
  NULL,
  ]

second = [
  'sh',
  '-c',
  'for x in "$@"; do echo ": {$x}"; done # "$@"',
  'for x in "$@"; do echo ": {$x}"; done #',  # ignored, shell $0
  'foo',      # $1
  'bar baz',  # $2
  NULL,
  ]
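To see why the trailing # matters, the second argv can be replayed by hand with and without it. A sketch only, calling sh directly with the same layout as the arrays above rather than going through git:

```shell
# Illustrative only: these sh -c calls mimic the argv layout shown above.
body='for x in "$@"; do echo ": {$x}"; done'

# No trailing "#": the appended "$@" follows "done", which the shell rejects.
if ! sh -c "$body \"\$@\"" "$body" foo 'bar baz' 2>/dev/null; then
  echo "rejected by the shell"
fi

# Trailing "#": the appended "$@" is commented out and the loop runs.
sh -c "$body # \"\$@\"" "$body #" foo 'bar baz'
```

The first form is the mysterious failure at the "$@" stage; the second prints one line per argument.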

Credential Helpers

The AWS documentation currently up at https://docs.aws.amazon.com/codecommit/latest/userguide/setting-up-https-unixes.html has you run:

git config --global credential.helper '!aws codecommit credential-helper $@'

I’ve not used CodeCommit in a while and am not setting it up again now to confirm, but I believe this is wrong. The same mechanisms are in play for how git invokes commands (except that without an exclamation mark, if not given a complete path, the command tried for "foo" will be "git credential-foo"), so you’ll end up with git invoking:

['sh', '-c',
 'aws codecommit credential-helper $@ "$@"',
 'aws codecommit credential-helper $@',  # ignored, shell $0
 'get']

and the shell then invoking:

['aws', 'codecommit', 'credential-helper', 'get', 'get']

Clearly the aws codecommit credential-helper is ignoring extraneous parameters, and relying upon the available sub-commands being any of ["get", "store", "erase"], none of which contain whitespace, so $@ degenerating to $* here is harmless.
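The shell side of that can be checked without touching AWS at all. A sketch, assuming the argv layout given above; it prints the words instead of running aws, and "helper" is just a stand-in for the ignored $0:

```shell
# Same layout as the example above, but printing the words rather than
# running anything; "helper" stands in for the ignored $0.
sh -c 'printf "[%s]" aws codecommit credential-helper $@ "$@"; echo' helper get

# Unquoted $@ is split on whitespace; quoted "$@" keeps each parameter whole.
sh -c 'set -- $@;   echo "unquoted: $#"' helper get 'two words'
sh -c 'set -- "$@"; echo "quoted: $#"'   helper get 'two words'
```

With a single whitespace-free operation such as get, the unquoted form makes no difference beyond the duplication; a parameter containing a space would be split by it.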

Conclusion

It’s easier than expected to have arbitrary shell in a git alias; you just need to know about the undocumented, implicit, sometimes-added "$@". That still doesn’t mean you should do so.

-The Grumpy Troll

Categories: git shell internals aws