Alan Turing famously proposed that an artificial intelligence system might be considered sentient if it could convince a human that it was. On this basis, the former Google researcher Blake Lemoine recently achieved notoriety (and unemployment) by claiming that the language model LaMDA had achieved sentience. However, the Turing Test does not take into account the gullibility of the human judge, and in particular the pathetic fallacy. In this light, and inspired by the parrot-in-the-cage paradigm of Chomsky (1956) and Chomsky and Miller (1963), I propose that only a parrot in the cage would be able to distinguish between a Turing-Test-winning artificial intelligence and a mere statistical learner. The parrot is the subject of much interest in linguistic research, as Chomsky (1965, ch. 9) and Pinker and Bloom (1990) describe. Whether parrots or dolphins are closest to humans in terms of their communicative abilities has been a matter of some speculation.
While there is no evidence that dolphins possess a grammar per se, this has not deterred several ethically questionable experiments. Pinker and Bloom (1990, p. 87) argue that parrots are simply repeating sounds, though humans typically do not recognise this and instead take the sound as a meaningful production of a word. In a similar manner, a generative language model would assign meanings to its random stream of words and accept them as meaningful productions; the more complex the model, the more such meaningful productions it will assign. A Turing Test judge would have no way of differentiating between this and a sentient parrot. While linguistic theory posits that humans possess innate mechanisms which allow them to understand and produce language, the random stream of a parrot's vocalisations has no particular structure to which a human could adapt. The Turing Test judge is in effect treating the output of the parrot as if it were able to create grammatically correct sentences in response to a communicative situation in the manner of humans, when in fact it makes no sense to a human. This is an example of the pathetic fallacy.
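To make the analogy concrete, a "mere statistical learner" in the sense used above can be as simple as a bigram Markov chain: it emits a stream of words by sampling successors observed in training data, with no meaning attached at any point; any meaning is supplied by the human reader. The sketch below is a hypothetical illustration of this minimal case (the corpus and function names are my own invention, not drawn from any of the works cited):

```python
import random

def train_bigrams(corpus):
    """Build a bigram table: for each word, the words observed to follow it."""
    words = corpus.split()
    table = {}
    for w1, w2 in zip(words, words[1:]):
        table.setdefault(w1, []).append(w2)
    return table

def babble(table, start, n=10, seed=0):
    """Emit a stream of words by sampling successors; no meaning is involved."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n - 1):
        successors = table.get(out[-1])
        if not successors:  # dead end: no observed successor
            break
        out.append(rng.choice(successors))
    return " ".join(out)

# A toy corpus; the output is locally plausible but grounded in nothing.
table = train_bigrams("polly wants a cracker polly wants a treat")
print(babble(table, "polly", n=6))
```

The output is fluent-looking precisely because it recombines attested word sequences, which is what invites the judge's pathetic fallacy; scaling up the table (or replacing it with a neural model) changes the fluency, not the absence of grounding.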
Looking further ahead, NLP researchers must begin to consider not just how to evaluate their models, but whether evaluation is even an appropriate question to ask.