How to Spot Fabricated Data

Tim Pulju
Rice University

As the managing editor of a linguistics journal, I have frequent opportunity to review submissions authored by linguists who are, in a word, gullible. I refer particularly to those field linguists who have been taken in by elicited data whose preposterous nature should be evident. Apparently, many field linguists do not realize that native speakers in many cultures make fun of outsiders by teaching the language to such outsiders incorrectly. Thus, the gullible student, having learned that the correct way to say "Where is the bathroom?" in Yogad is Yu igungku ay atannang, will be entirely confused when this utterance elicits either no response, or at best an "Oh. So what?"

You might think that linguists would examine their data carefully enough to be able to spot deliberate misinformation, but alas, experience shows that this is not always the case. We at SpecGram have always managed to catch such errors, mostly due to the brilliant work of associate editors and other members of the editorial board. However, other reputable journals have not been so careful. For instance, consider the following example from Hawaiian (I have omitted the citation to avoid embarrassing the authors and editors):

   Aia i 'Aiea i 'au'au a'e ai ia i'a i 'ai 'ia e a'u
   'It was in 'Aiea that the fish I ate had been swimming.'

This sentence is obviously phonologically ridiculous. No one would ever say such a phonologically ridiculous thing except in jest.

Even worse is the Maori sentence below:

   i aueeauee a au i aua auaa
   'Those herrings made me scream and scream.'

This utterance has twenty vowels and no consonants, which is plain silly. But not only is it phonologically ridiculous, it is also semantically preposterous. 'Those herrings made me scream and scream'? Come on, already.

Then there are grammatically ridiculous sentences. Once I was writing a book on Nez Perce, and pretty late in the process I discovered a sentence which didn't fit into my grammar. That is, my grammatical rules prohibited such sentences from occurring in the language, but here was one staring me in the face. Luckily, I realized that this was an obvious example of a fabricated sentence which I was obliged to eliminate from the data. Since then, I have become very adept at eliminating grammatically ridiculous sentences from my data. To my surprise, it usually turns out that about 70% of the corpus is grammatically ridiculous and should be excised.

I hope the above will be useful to field linguists. In closing, let me state that although I have limited myself to fabricated utterances in this article, it stands to reason that there are also fabricated languages. For example, Bella Coola, a so-called Salishan language, is obviously phonetically ridiculous through and through. I don't believe that anyone speaks Bella Coola natively, and I urge linguists to stop being taken in by those wily inhabitants of British Columbia. The same goes for anyone out there who still believes in French.

(Acknowledgments: I would like to thank Keith Slater for working with me on the early stages of this article. Also, readers might be interested in reading Aya Katz's poem "The Tribesman" in SpecGram CXLVII.1. Although that poem and this article were developed independently of each other, they are thematically very similar.)

