This post is about the following hypothetical scenario: Alice is suspected to be a spy from a foreign country, and so to be fluent in the language spoken in that country, called language L. Alice claims to be an innocent citizen with no knowledge whatsoever of language L. Of course, this is just what a spy would say!
You are in charge of interrogating Alice, and of figuring out whether she is fluent in L or not. How do you do it?
The easy case is if you are lucky and Alice inadvertently outs herself as an L speaker. For instance, she may accidentally use an L word (when cursing, counting out loud, etc.).
Your task is also very easy if Alice has a thick L accent in the language that you share with her.
But what if Alice is perfectly prepared, perfectly fluent in your language,
and you cannot count on her to make a spontaneous mistake?
Of course we want to avoid mistreating Alice (she might be innocent), and we can assume that she will reasonably cooperate with the interrogator (like an innocently accused citizen who wants to prove their innocence).
Here are some ideas for solving this problem. Thanks to louis, Ted, olasd, and Tito for contributing some.
Some of the solutions here are already mentioned in a TVtropes entry with a wider scope, but maybe there are other possibilities.
Brain imaging
With access to enough technology, there is a solution which is
probably foolproof and requires no cooperation from Alice.
Just put her in a neuroimaging machine, and have her listen to spoken recordings
in language L. By comparing with a language that she knows, and with a language
that she doesn't, it should be possible to detect whether her brain is making
sense of what she hears or not. (I'm no expert in neuroscience though, so I
cannot promise this would work.)
In what follows, I consider this to be cheating: I assume we don't have
out-of-band access to Alice's brain, and that we must test her via normal
sensory interfaces.
The Stroop test
The Stroop test is the following
task: you are presented with a sequence of color names which are themselves
written in a different color (e.g., it could contain the word "blue" but written
in red), and you must read the sequence of the colors in which the words are
written (not read the words themselves). This turns out to be difficult, with
people often messing up and reading the words instead of naming the colors. But
of course the Stroop effect only works if the person knows the language in which
the words are written.
So if Alice's performance on this task is different between color words in
language L and nonsense words, then you can find out that she is familiar with language L.
The Stroop test may in fact have been used for this purpose historically, as discussed in this skeptics.SE question.
Alice may be able to evade detection by deliberately slowing down on nonsense
words to try to match her performance on L words -- and in particular to make
sure that she never ever translates an L word, which would be a dead giveaway.
However, I would assume that maintaining exactly the same performance (and same
error profile) between genuine mistakes (on nonsense words) and contrived
mistakes (on words of L) would be very challenging for Alice, especially if you
collect precise timings on her performance that she herself does not have access
to.
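To make the comparison of timings a bit more concrete, here is a minimal sketch in Python of one way to check whether two sets of response times differ: a permutation test on the difference of their means. The function name, the choice of test, and the sample timings are all assumptions made for illustration; they are not taken from any real protocol or data.

```python
import random
from statistics import mean


def permutation_p_value(times_l, times_nonsense, n_resamples=2000, seed=0):
    """Estimate a two-sided p-value for the difference of mean response times.

    If Alice does not know L, the labels "L word" and "nonsense word" should
    be interchangeable, so randomly reshuffling them should often produce a
    difference at least as large as the one observed.
    """
    rng = random.Random(seed)
    observed = abs(mean(times_l) - mean(times_nonsense))
    pooled = list(times_l) + list(times_nonsense)
    n_l = len(times_l)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_l]) - mean(pooled[n_l:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so that the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


# Made-up response times in seconds, for illustration only.
l_times = [0.90, 0.95, 1.00, 1.05, 1.10, 0.92, 0.98, 1.02, 1.08, 0.97]
nonsense_times = [0.60, 0.65, 0.70, 0.62, 0.68, 0.66, 0.64, 0.71, 0.63, 0.67]

print(permutation_p_value(l_times, nonsense_times))
```

A small p-value suggests that Alice treats the two classes of words differently. It does not say why, so in practice one would also want to run the same test on people known not to speak L, to check that the nonsense words are a fair baseline.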
Timing attacks
A generalization of this idea is to measure Alice's performance on other tasks
involving both L words and nonsense words, and to see whether her performance
differs between the two classes. For instance, test Alice's ability to memorize short
phrases, some of which are made of nonsense words and some of which are
well-formed sentences in L. Of course if Alice is fluent in L the second task
would be far easier for her.
I guess one can come up with similar tasks, e.g., if L uses an ideogram writing
system, I would expect that it would be easier, say, to find differences between
two figures, or find occurrences of a figure in another figure, or other such
tasks, whenever the figures used in the task are genuine L ideograms as opposed
to similar-looking but nonsense ideograms.
This is a bit like a timing attack in computer security.
It also somewhat resembles the implicit association test, though I find it
difficult to adapt this specific test to a task for Alice that would also make sense if Alice has
no knowledge of L whatsoever.
Following verbal instructions
Another idea is to have Alice play a game where she must follow instructions as
quickly as possible. The catch is that some of the instructions are nonsense
instructions, and some are instructions in language L. If Alice spontaneously
reacts to one of the language L instructions, then she is unmasked. Playing
this correctly if you understand language L is a bit like playing Simon
Says, whereas of course it is
not especially difficult if you do not know language L. Again, Alice may
be able to avoid detection by deliberately slowing down.
A similar idea appears in the TVtropes entry mentioned above, about a
possibly apocryphal practice by the British of shouting "Achtung!" to identify
German spies. Of course variations of this idea can work if you are not
overtly interrogating Alice but watching her without her knowledge, which I
would also consider cheating.
A related idea: insert words from L into the conversation, using them as loanwords, and see whether Alice understands one of them (i.e., forgets to pretend she doesn't). A related strategy (asking a question in French in the middle of an interview in Spanish) was used by Ladislas de Hoyos to identify Klaus Barbie while he posed as the non-French-speaking Klaus Altmann.
Watermarked language
Another idea hidden in plain sight: how about giving Alice some classes in language L,
and seeing how she performs? If she doesn't react like a real beginner
would, in particular if she uses just one word that you hadn't already taught her, then
she is unmasked.
There is a meaner variation on this idea, which requires much more
preparation. You could design and teach Alice a constructed language L', which is very similar to L except that it is "watermarked" in many small ways that are difficult to remember, e.g., the orthography of some words is subtly different, some words have been exchanged, etc. If Alice is a spy, you would expect her to mess up at least some of the time with errors influenced by L. By contrast, if Alice is innocent, the errors she makes would not be correlated with L.
This is not a very practical solution, and also I don't know whether it would
work in practice. Still, if it does, I find it interesting that knowing
something (L) may be a handicap in properly learning something else (L').
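As a rough illustration of how one might score the watermark test, here is a small Python sketch that counts how many of Alice's errors in L' coincide exactly with the original L form. The function name and the toy vocabulary are invented for this example (they belong to no real language), and an exact-match rule is only the simplest possible notion of an error "influenced by L".

```python
def count_l_influenced_errors(watermarked, original_l, answers):
    """Return (number of errors, number of errors equal to the L form).

    watermarked: prompt -> correct word in the taught language L'
    original_l:  prompt -> corresponding word in the real language L
    answers:     prompt -> what Alice actually produced
    """
    errors = 0
    matching_l = 0
    for prompt, answer in answers.items():
        if answer == watermarked[prompt]:
            continue  # correct L' answer, nothing to learn from it
        errors += 1
        if answer == original_l[prompt]:
            matching_l += 1  # the mistake is exactly the un-watermarked word
    return errors, matching_l


# Toy vocabulary, invented for illustration only.
watermarked = {"house": "talmec", "water": "sorvi", "bread": "pundal", "night": "kerro"}
original_l = {"house": "talmek", "water": "sorva", "bread": "pandul", "night": "kero"}
answers = {"house": "talmek", "water": "sorvi", "bread": "pandul", "night": "kirro"}

print(count_l_influenced_errors(watermarked, original_l, answers))
```

An innocent learner should only rarely land on the exact L form by chance, so a high share of matching errors would be suspicious. How high is too high would have to be calibrated on learners known not to speak L.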
An explicit solution
To finish with an entirely different idea, another way is to watch for a
physiological response: read some erotic literature in language L to Alice and
see if she becomes excited. Ironically, this very low-tech solution
comes from a science fiction short story: I'm in Marsport Without Hilda, by Isaac Asimov.
Other involuntary reactions to language could also work: laughter
(with jokes), disgust (with gruesome descriptions), anger (with
insults), etc.