Don't tug on that, you never know what it might be attached to (2016)

Recorded: Nov. 29, 2025, 1:08 a.m.

The Universe of Discourse : Don't tug on that, you never know what it might be attached to

Mark Dominus (陶敏修)
mjd@pobox.com

Fri, 01 Jul 2016

Don't tug on that, you never know what it might be attached to

This is a story about a very interesting bug that I tracked down
yesterday. It was causing a bad effect very far from where the bug
actually was.
emacsclient
The emacs text editor comes with a separate utility, called
emacsclient, which can communicate with the main editor process and
tell it to open files for editing. You have your main emacs
running. Then somewhere else you run the command
emacsclient some-files...

and it sends the main emacs a message that you want to edit
some-files. Emacs gets the message and pops up new windows for editing
those files. When you're done editing some-files you tell Emacs, by
typing C-x # or something, and it communicates back to emacsclient
that the editing is done, and emacsclient exits.
This was more important in the olden days when Emacs was big and
bloated and took a long time to start up. (They used to joke that
“Emacs” was an abbreviation for “Eight Megs And Constantly Swapping”.
Eight megs!) But even today it's still useful, say from shell scripts
that need to run an editor.
Here's the reason I was running it. I have a very nice shell script,
called also, that does something like this:

Interpret command-line arguments as patterns
Find files matching those patterns
Present a menu of the files
Wait for me to select files of interest
Run emacsclient on the selected files

It is essentially a wrapper around
menupick,
a menu-picking utility I wrote which has seen use as a component of
several other tools.
I can type
also Wizard

in the shell and get a menu of the files related to the wizard, select
the ones I actually want to edit, and they show up in Emacs. This is
more convenient than using Emacs itself to find and open them. I use
it many times a day.
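The five steps listed above can be sketched in shell. This is only an illustration, not the author's script: the real also and menupick are not shown in this article, so the function names find_candidates and also_sketch and the PICKER and EDITOR_CMD variables are hypothetical stand-ins.

```shell
# Hypothetical sketch of an also-like wrapper; not the author's actual script.

# List files under the current directory whose path contains any of the
# given patterns (the "interpret arguments" and "find files" steps).
find_candidates() {
    for pat in "$@"; do
        find . -type f -path "*${pat}*"
    done | sort -u
}

# Pipe the candidates through a picker, then hand the chosen files to
# the editor command.
# PICKER stands in for a menu program such as menupick; it is assumed to
# read candidate paths on stdin and print the chosen ones on stdout.
also_sketch() {
    find_candidates "$@" |
        ${PICKER:-cat} |
        xargs ${EDITOR_CMD:-emacsclient}
}
```

With the default PICKER of cat, every matching file is passed straight through. The sketch does not handle file names containing whitespace, and it does not guard against an empty selection.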
Or rather, I did until this week, when it suddenly stopped working.
Everything ran fine until the execution of emacsclient, which would
fail, saying:
emacsclient: can't find socket; have you started the server?

(A socket is a facility that enables interprocess communication, in
this case between emacs and emacsclient.)
This message is familiar. It usually means that I have forgotten to
tell Emacs to start listening for emacsclient, by running M-x
server-start. (I should have Emacs do this when it starts up, but I
don't. Why not? I'm not sure.) So the first time it happened I went
to Emacs and ran M-x server-start. Emacs announced that it had
started the server, so I reran also. And the same thing happened.
emacsclient: can't find socket; have you started the server?

Finding the socket
So the first question is: why can't emacsclient find the socket?
And this resolves naturally into two subquestions: where is the
socket, and where is emacsclient looking?
The second one is easily answered; I ran strace emacsclient (hi
Julia!) and saw that the last interesting thing emacsclient did
before emitting the error message was
stat("/mnt/tmp/emacs2017/server", 0x7ffd90ec4d40) = -1 ENOENT (No such file or directory)

which means it's looking for the socket at /mnt/tmp/emacs2017/server
but didn't find it there.
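When strace output is long, it can help to pull out only the paths that a program looked for and failed to find. The following is a minimal sketch under the assumption that strace prints one call per line in the format shown above; the function name enoent_paths is made up for illustration.

```shell
# Read strace output on stdin and print the first quoted path argument
# of every system call that failed with ENOENT.
enoent_paths() {
    sed -n 's/^[a-z0-9_]*([^"]*"\([^"]*\)".*= -1 ENOENT.*/\1/p'
}

# Possible usage (strace writes its trace to stderr):
#   strace emacsclient some-file 2>&1 | enoent_paths
```

Lines that carry a prefix, such as the process IDs strace adds when following child processes, would not match this pattern as written.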
The question of where Emacs actually put the socket file was a little
trickier. I did not run Emacs under strace because I felt sure that
the output would be voluminous and it would be tedious to grovel over
it.
I don't exactly remember now how I figured this out, but I think now
that I probably made an educated guess, something like: emacsclient
is looking in /mnt/tmp; this seems unusual. I would expect the
socket to be under /tmp. Maybe it is under /tmp? So I looked
under /tmp and there it was, in /tmp/emacs2017/server:
srwx------ 1 mjd mjd 0 Jun 27 11:43 /tmp/emacs2017/server

(The s at the beginning there means that the file is a “Unix-domain
socket”. A socket is an endpoint for interprocess communication. The
most familiar sort is a TCP socket, which has a TCP address, and which
enables communication over the internet. But since ancient times Unix
has also supported Unix-domain sockets, which enable communication
between two processes on the same machine. Instead of TCP addresses,
such sockets are addressed using paths in the filesystem, in this case
/tmp/emacs2017/server. When the server creates such a socket, it
appears in the filesystem as a special type of file, as here.)
I confirmed that this was the correct file by typing M-x
server-force-delete in Emacs; this immediately caused
/tmp/emacs2017/server to disappear. Similarly M-x server-start
made it reappear.
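The same check can be done from the shell, whose test command has a -S operator that is true only for sockets. A small sketch, with a hypothetical function name:

```shell
# Report what, if anything, exists at a path where a socket is expected.
socket_status() {
    if [ -S "$1" ]; then
        echo "socket"
    elif [ -e "$1" ]; then
        echo "exists but is not a socket"
    else
        echo "missing"
    fi
}

# Possible usage with the two paths from this story:
#   socket_status /tmp/emacs2017/server
#   socket_status /mnt/tmp/emacs2017/server
```

In the situation described here, the first path would be reported as a socket and the second as missing.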
Why the disagreement?
Now the question is: Why is emacsclient looking for the socket under
/mnt/tmp when Emacs is putting it in /tmp? They used to
rendezvous properly; what has gone wrong? I recalled that there was
some environment variable for controlling where temporary files are
put, so I did
env | grep mnt

to see if anything relevant turned up. And sure enough there was:
TMPDIR=/mnt/tmp

When programs want to create temporary files and directories, they normally do it in /tmp. But
if there is a TMPDIR setting, they use that directory instead. This
explained why emacsclient was looking for
/mnt/tmp/emacs2017/server. And the explanation for why Emacs itself
was creating the socket in /tmp seemed clear: Emacs was failing to
honor the TMPDIR setting.
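The path each side uses can be reconstructed from the pieces seen so far: the temporary directory, the string emacs followed by the numeric user ID, and the name server. A minimal sketch, assuming exactly that layout (it matches the paths in this article; other Emacs versions or configurations may put the socket elsewhere). The function name is hypothetical.

```shell
# emacs_socket_path [TMPDIR_VALUE] [UID]
# Print the socket path implied by a temporary directory and user ID.
# An empty or missing TMPDIR_VALUE falls back to /tmp; a missing UID
# falls back to the current user's ID.
emacs_socket_path() {
    printf '%s/emacs%s/server\n' "${1:-/tmp}" "${2:-$(id -u)}"
}

# Possible usage:
#   emacs_socket_path "$TMPDIR"    # what a process with TMPDIR set would use
#   emacs_socket_path ""           # what a process without TMPDIR would use
```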
With this clear explanation in hand, I began to report the bug in
Emacs, using M-x report-emacs-bug. (The folks in the #emacs IRC
channel on Freenode suggested this. I had a bad
experience last time I tried
#emacs, and then people mocked me for even trying to get useful
information out of IRC. But this time it went pretty well.)
Emacs popped up a buffer with full version information and invited me
to write down the steps to reproduce the problem. So I wrote down
% export TMPDIR=/mnt/tmp
% emacs

and as I did that I ran those commands in the shell.
Then I wrote
In Emacs:
M-x getenv TMPDIR
(emacs claims there is no such variable)

and I did that in Emacs also. But instead of claiming there was no
such variable, Emacs cheerfully informed me that the value of TMPDIR
was /mnt/tmp.
(There is an important lesson here! To submit a bug report, you find
a minimal demonstration. But then you also try the minimal
demonstration exactly as you reported it. Because of what just
happened! Had I sent off that bug report, I would have wasted
everyone else's time, and even worse, I would have looked like a
fool.)
My minimal demonstration did not demonstrate. Something else was
going on.
Why no TMPDIR?
This was a head-scratcher. All I could think of was that
emacsclient and Emacs were somehow getting different environments,
one with the TMPDIR setting and one without. Maybe I had run them
from different shells, and only one of the shells had the setting?
I got on a sidetrack at this point to find out why TMPDIR was set in
the first place; I didn't think I had set it. I looked for it in
/etc/profile, which holds the default Bash startup instructions, but it
wasn't there. But I also noticed an /etc/profile.d which seemed
relevant. (I saw later that the /etc/profile contained instructions
to load everything under /etc/profile.d.) And when I grepped for
TMPDIR in the profile.d files, I found that it was being set by
/etc/profile.d/ziprecruiter_environment.sh, which the sysadmins had
installed. So that mystery at least was cleared up.
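A search like this one can be wrapped in a small helper. This is a sketch, not the command actually used; the name who_sets is made up, and it relies on the recursive -r option that GNU and BSD grep provide.

```shell
# who_sets VAR PATH...: print the files under the given paths that
# contain an assignment to VAR, with or without a leading "export".
who_sets() {
    var=$1
    shift
    grep -rlE "(^|[[:space:];])(export[[:space:]]+)?${var}=" "$@" 2>/dev/null
}

# Possible usage:
#   who_sets TMPDIR /etc/profile /etc/profile.d
```

It only finds literal assignments; a variable set indirectly, for example through eval or a sourced file outside the searched paths, would be missed.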
That got me on a second sidetrack, looking through our Git history for
recent changes involving TMPDIR. There weren't any, so that was a
dead end.
I was still puzzled about why Emacs sometimes got the TMPDIR setting
and sometimes not. That's when I realized that my original Emacs
process, the one that had failed to rendezvous with emacsclient,
had not been started in the usual way. Instead of simply running
emacs, I had run
git re-edit

which invokes Git, which then runs
/home/mjd/bin/git-re-edit

which is a Perl program I wrote that does a bunch of stuff to figure
out which files I was editing recently and then execs emacs to edit
them some more. So there are several programs here that could be
tampering with the environment and removing the TMPDIR setting.
To more accurately point the finger of blame, I put some diagnostics
into the git-re-edit program to have it print out the value of
TMPDIR. Indeed, git-re-edit reported that TMPDIR was unset.
Clearly, the culprit was Git, which must have been removing TMPDIR
from the environment before invoking my Perl program.
Who is stripping the environment?
To confirm this conclusion, I created a tiny shell script,
/home/mjd/bin/git-env, which simply printed out the environment, and
then I ran git env, which tells Git to find git-env and run it.
If the environment it printed were to omit TMPDIR, I would know Git
was the culprit. But TMPDIR was in the output.
So I created a Perl version of git-env, called git-perlenv, which
did the same thing, and I ran it via git perlenv. And this time
TMPDIR was not in the output. I ran diff on the outputs of git
env and git perlenv and they were identical—except that git
perlenv was missing TMPDIR.
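The comparison can be reduced to just the variable names that go missing. A sketch with a hypothetical helper, vars_lost, which assumes each command prints its environment one NAME=value per line:

```shell
# vars_lost CMD_A CMD_B: each argument is a shell command that prints
# its environment in NAME=value form. Print the variable names that
# appear in CMD_A's output but not in CMD_B's.
vars_lost() {
    a=$(mktemp) || return 1
    b=$(mktemp) || return 1
    sh -c "$1" | cut -d= -f1 | sort -u > "$a"
    sh -c "$2" | cut -d= -f1 | sort -u > "$b"
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Possible usage, given git-env and git-perlenv scripts like those
# described above:
#   vars_lost 'git env' 'git perlenv'
```

Values that span several lines are not parsed properly, but because both outputs are treated the same way they usually cancel out.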
So it was Perl's fault! And I verified this by running perl
/home/mjd/bin/git-re-edit directly, without involving Git at all.
The diagnostics I had put in reported that TMPDIR was unset.
WTF Perl?
At this point I tried getting rid of git-re-edit itself, and ran the
one-line program
perl -le 'print $ENV{TMPDIR}'

which simply runs Perl and tells it to print out the value of the
TMPDIR environment variable. It should print /mnt/tmp, but instead
it printed the empty string. This is a smoking gun, and Perl no
longer has anywhere to hide.
The mystery is not cleared up, however. Why was Perl doing this?
Surely not a bug; someone else would have noticed such an obvious bug
sometime in the past 25 years. And it only failed for TMPDIR, not
for other variables. For example
FOO=bar perl -le 'print $ENV{FOO}'

printed out bar as one would expect. This was weird: how could
Perl's environment handling be broken for just the TMPDIR variable?
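The same experiment can be repeated for any variable and any program that is able to print its environment. A minimal sketch; env_survives is a hypothetical name, and the marker value is arbitrary.

```shell
# env_survives VAR CMD [ARG...]: set VAR to a marker value, run CMD
# (which must print its environment in NAME=value form), and succeed
# only if the marker is present in what CMD prints.
env_survives() {
    var=$1
    shift
    env "$var=probe-value" "$@" | grep -qxF "$var=probe-value"
}

# Possible usage:
#   env_survives FOO    perl -e 'print "$_=$ENV{$_}\n" for keys %ENV' && echo "FOO kept"
#   env_survives TMPDIR perl -e 'print "$_=$ENV{$_}\n" for keys %ENV' || echo "TMPDIR lost"
```

On a system behaving like the one in this story, the first check would succeed and the second would fail.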
At this point I got Rik Signes and Frew Schmidt to look at it with
me. They confirmed that the problem was not in Perl generally, but
just in this Perl. Perl on other systems did not display this
behavior.
I looked in the output of perl -V, which says what version of Perl
you are using and which patches have been applied, and wasted a lot of
time looking into
CVE-2016-2381,
which seemed relevant. But it turned out to be a red herring.
Working around the problem, 1.
While all this was going on I was looking for a workaround. Finding
one is at least as important as actually tracking down the problem
because ultimately I am paid to do something other than figure out why
Perl is losing TMPDIR. Having a workaround in hand means that when
I get sick and tired of looking into the underlying problem I can
abandon it instantly instead of having to push onward.
The first workaround I found was to not use the Unix-domain socket.
Emacs has an option to use a TCP socket instead, which is useful on
systems that do not support Unix-domain sockets, such as non-Unix
systems. (I am told that some do still exist.)
You set the server-use-tcp variable to a true value, and when you
start the server, Emacs creates a TCP socket and writes a description
of it into a “server file”, usually ~/.emacs.d/server/server. Then
when you run emacsclient you tell it to connect to the socket that
is described in the file, with
emacsclient --server-file=~/.emacs.d/server/server

or by setting the EMACS_SERVER_FILE environment variable. I tried
this, and it worked, once I figured out the thing about
server-use-tcp and what a “server file” was. (I had misunderstood
at first, and thought that “server file” meant the Unix-domain socket
itself, and I tried to get emacsclient to use the right one by
setting EMACS_SERVER_FILE, which didn't work at all. The resulting
error message was obscure enough to lead me to IRC to ask about it.)
Working around the problem, 2.
I spent quite a while looking for an environment variable analogous to
EMACS_SERVER_FILE to tell emacsclient where the Unix-domain socket
was. But while there is a --socket-name command-line argument to
control this, there is inexplicably no environment variable. I hacked
my also command (responsible for running emacsclient) to look for
an environment variable named EMACS_SERVER_SOCKET, and to pass its
value to emacsclient --socket-name if there was one. (It probably
would have been better to write a wrapper for emacsclient, but I
didn't.) Then I put
EMACS_SERVER_SOCKET=$TMPDIR/emacs$(id -u)/server

in my Bash profile, which effectively solved the problem. This set
EMACS_SERVER_SOCKET to /mnt/tmp/emacs2017/server whenever I
started a new shell. When I ran also it would notice the setting
and pass it along to emacsclient with --socket-name, to tell
emacsclient to look in the right place. Having set this up I could
forget all about the original problem if I wanted to.
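The wrapper alternative mentioned above might look something like the following. It is a sketch, not what was actually done: the function name ec and the EMACSCLIENT override variable are invented here, while EMACS_SERVER_SOCKET and --socket-name are the variable and option already described.

```shell
# ec FILE...: run emacsclient, adding --socket-name when the
# EMACS_SERVER_SOCKET environment variable is set and non-empty.
# EMACSCLIENT (hypothetical) lets the client program be swapped out,
# which is handy for trying the wrapper without a running Emacs.
ec() {
    if [ -n "${EMACS_SERVER_SOCKET:-}" ]; then
        "${EMACSCLIENT:-emacsclient}" --socket-name="$EMACS_SERVER_SOCKET" "$@"
    else
        "${EMACSCLIENT:-emacsclient}" "$@"
    fi
}
```

A wrapper like this keeps the socket logic in one place, so that every script that runs the editor picks it up without being modified individually.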
But but but WHY?
But why was Perl removing TMPDIR from the environment? I didn't
figure out the answer to this; Frew took it to the #p5p IRC channel
on perl.org, where the answer was eventually tracked down by Matthew
Horsfall and Zefram.
The answer turned out to be quite subtle. One of the classic attacks
that can be mounted against a process with elevated privileges is as
follows. Suppose you know that the program is going to write to a
temporary file. So you set TMPDIR beforehand and trick it into
writing in the wrong place, possibly overwriting or destroying
something important.
When a program is loaded into a process, the dynamic loader does the
loading. To protect against this attack, the loader checks to see if
the program it is going to run has elevated privileges, say because it
is setuid, and if so it sanitizes the process’ environment to prevent
the attack. Among other things, it removes TMPDIR from the
environment.
I hadn't thought of exactly this, but I had thought of something like
it: If Perl detects that it is running setuid, it enables
a secure mode which, among other things, sanitizes the environment.
For example, it ignores the PERL5LIB environment variable that
normally tells it where to look for loadable modules, and instead
loads modules only from a few compiled-in trustworthy directories. I
had checked early on to see if this was causing the TMPDIR problem,
but the perl executable was not setuid and Perl was not running in
secure mode.
But Linux supports a feature called “capabilities”, which is a sort of
partial superuser privilege. You can give a program some of the
superuser's capabilities without giving away the keys to the whole
kingdom. Our systems were configured to give perl one extra
capability, of binding to low-numbered TCP ports, which is normally
permitted only to the superuser. And when the dynamic loader ran
perl, it saw this additional capability and removed TMPDIR from
the environment for safety.
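One way to check whether a program is likely to be treated as privileged is to look at its setuid and setgid bits and at its file capabilities. The sketch below assumes the getcap utility from libcap is available; the function name elevation_report is hypothetical, and the exact text getcap prints varies between libcap versions.

```shell
# elevation_report PATH: print the reasons the loader might treat the
# program at PATH as privileged: setuid bit, setgid bit, and file
# capabilities as reported by getcap (if it is installed).
elevation_report() {
    if [ -u "$1" ]; then echo "setuid: yes"; else echo "setuid: no"; fi
    if [ -g "$1" ]; then echo "setgid: yes"; else echo "setgid: no"; fi
    if command -v getcap >/dev/null 2>&1; then
        caps=$(getcap "$1" 2>/dev/null)
        echo "file capabilities: ${caps:-none}"
    else
        echo "file capabilities: unknown (getcap not installed)"
    fi
}

# Possible usage:
#   elevation_report "$(command -v perl)"
```

On a machine configured like the one in this story, the last line would be expected to mention a capability for binding low-numbered ports (CAP_NET_BIND_SERVICE). On some systems getcap lives in a directory such as /sbin that is not on an ordinary user's PATH.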

This is why Emacs had the TMPDIR setting when run from the command
line, but not when run via git-re-edit.

Until this came up, I had not even been aware that the “capabilities”
feature existed.
A red herring
There was one more delightful confusion on the way to this happy
ending. When Frew found out that it was just the Perl on my
development machine that was misbehaving, he tried logging into his
own, nearly identical development machine to see if it misbehaved in
the same way. It did, but when he ran a system update to update Perl,
the problem went away. He told me this would fix the problem on my
machine. But I reported that I had updated my system a few hours
before, so there was nothing to update!
The elevated capabilities theory explained this also. When Frew
updated his system, the new Perl was installed without the elevated
capability feature, so the dynamic loader did not remove TMPDIR from
the environment.
When I had updated my system earlier, the same thing happened. But
as soon as the update was complete, I reloaded my system configuration, which
reinstated the capability setting. Frew hadn't done this.
Summary

The system configuration gave perl a special capability
so the dynamic loader sanitized its environment
so that when perl ran emacs,
the Emacs process didn't have the TMPDIR environment setting
which caused Emacs to create its listening socket in the usual place
but because emacsclient did get the setting, it looked in the wrong place

Conclusion
This computer stuff is amazingly complicated. I don't know how anyone
gets anything done.
[ Addendum 20160709: Frew Schmidt has written up the same
incident,
but covers different ground than I do. ]
[ Addendum 20160709: A Hacker News comment asks what changed to cause
the problem? Why was Perl losing TMPDIR this week but not the week
before? Frew and I don't know! ]


This document details a perplexing technical issue in which `emacsclient`, run from the author's `also` shell script, consistently failed to connect to a running Emacs even though the Emacs server had been started. The issue was ultimately traced to the system's dynamic loader, which was sanitizing the environment of the Perl interpreter because Perl had been granted an extra Linux capability.

Mark Dominus, the author, describes the initial symptoms: `emacsclient` failed to find the socket, even after `M-x server-start` was run within Emacs. Restarting the server made no difference, so the investigation focused on where `emacsclient` was looking for the socket and where Emacs was actually creating it.

The investigation involved using `strace` to monitor `emacsclient`'s actions and examining environment variables. `emacsclient` was looking under `/mnt/tmp` because `TMPDIR` was set to that directory, while Emacs had created its socket under `/tmp`. The key observation was that the Emacs process had been started via `git re-edit`, which runs the author's Perl program `git-re-edit`, and that `TMPDIR` was already missing from the environment by the time that Perl program ran.

The system's dynamic loader, as a security measure, removes `TMPDIR` (among other things) from a process's environment when the program being loaded has elevated privileges, for example because it is setuid. This protects privileged programs from being tricked into writing temporary files in a location chosen by an attacker. Here the same mechanism caused the problem: the loader treated `perl` as privileged and stripped `TMPDIR`, so the Emacs launched by `git-re-edit` created its socket under `/tmp`, while `emacsclient`, which still had `TMPDIR=/mnt/tmp`, looked under `/mnt/tmp`.

More precisely, `perl` was not setuid. The system configuration had granted it one extra Linux 'capability', the ability to bind to low-numbered TCP ports, which is normally restricted to the superuser. The dynamic loader saw this capability and sanitized the environment accordingly. The author also chronicles a red herring involving system updates: updating Perl installed a binary without the capability, which made the problem vanish on a colleague's machine, but on the author's machine reloading the system configuration reinstated the capability. What changed to trigger the problem that particular week remained unknown.

The key takeaway is a demonstration of a security mechanism's impact on seemingly simple configurations. The author's detailed explanation highlights the importance of understanding the interactions between dynamic loading, environment variables, and security capabilities within complex software environments, underscoring the surprising and subtle ways in which these elements can coalesce to create unexpected problems. The document concludes with a characteristically understated acknowledgement of the complexity of computer systems.