Another Bunch of Things You Didn’t Know You Didn’t Know—Madalena Cruz-Ferreira SpecGram Vol CLVIII, No 1 Contents The Fictional Foundations of Natural Language Processing—Book Announcement from Psammeticus Press

Palindromic Passivization

Overcoming the Computational Cost of
So-Called “Center-Embedding” Passives

Albrecht Brechtal,
Fortuna de Sadamente,
and Jonathan van der Meer

Co-Vice-Presidentials of Biolinguisticity at the
Autonomous Xenobiological Linguistics Entity (AXLE)

[Editor’s Note: This is the final installment in the long-running “Center Embedding” saga that has unfolded in the pages of Speculative Grammarian. We believe that Brechtal, de Sadamente, and van der Meer have given the definitive analysis of this phenomenon. On the off chance that we are mistaken, future submissions on this topic will likely be referred to Linguistica ad nauseam, Linguistica ad infinitum, or Linguistica ad hominem as appropriate. —Eds.]

Introductory Remarks

Not that many years ago, armchair was almost exclusively used as a pejorative modifier, signifying a theoretical remoteness divorced from practical experience, coupled with criticism delivered in hindsight. But in the 21st century, a researcher can gather several lifetimes’ worth of raw data, accumulated knowledge, and even digital wisdom electronically—all from the comfort of an armchair. Stores of information too vast to exist in the physical world can be munged by tiny Perl scripts into PhDs. One of the first of our acquaintance to do so was Annasusanna Awkward, PhD, of Universiteit Gnutötung in Germany. Dr. Awkward showed that the comparative frequencies of the letters e, n, and t in undergraduates’ email correspondence are inversely correlated with the emotional intensity of the message being transmitted, while the frequencies of c, f, k, and u are positively correlated. How novel!

Serious research now draws conclusions based on the methodologically opaque results of proprietary search engine algorithms. Electronic communication has shrunk the size of the globe to a few tenths of a light second. An armchair—accompanied by a laptop computer with a high-speed wireless internet connection—is the perfect place for the intellectually intense but physically mild activity of linguistic research.

The realization that we had been doing things wrong came to us as we sat in the Nakna Ankan (“Naked Duck”) pub in Kivik, Sweden—where we had been studying various xenolinguistic aspects of the famous Kungagraven—and we received a text message from Annasusanna announcing she’d finished all of her dissertation research after a long night of slinging code and wandering the internet.

All of this is by way of introduction to say that we will be following Küçük (2008), who was in turn following Slater (2006)—thus creating a conga line of efficient example acquisition—in that most of our data was collected electronically from around the world, much of it in translation, via text messaging, Skype, email, or robotic carrier pigeon—all from a rented research cabin in Wassamassaw, South Carolina.

The Failure of the Center-Embedding Hypothesis

While the results are not conclusively definitive, they are definitely conclusive: Küçük must have been [+high] when she claimed that the phenomenon she was analyzing was commonly called Palinilap Cimordromic across the various languages in question. Our in-house meta-meta-search engine, which searches and statistically correlates the results from the various meta-search engines on the internet, found -0.002 instances (±14.22, p<0.5%) of Palinilap Cimordromic. So we shan’t be bringing that up again.

Similarly, all of the hypotheses and theses presented thus far (Küçük 2008, Palin 2008, Drome 2008, M.Adam 2008, Gladstone-Chamberlain 2009—synopses of which cannot be provided succinctly enough in these meager parentheses (see referencees below)), with their emphases on center embedding, are worthy only of diagnoses of neuroses in their authors, who clearly suffer emotional crises—and intellectual paralyses—as they (perhaps fail to) realize that the fundamental bases of their analyses are inadequate to provide the necessary oases of predictive comprehensibility along any of several relevant axes. That is, center embedding is neither necessary nor sufficient to explain Küçük’s data, nor is Küçük’s data necessary or sufficient to require center embedding as an explanation. This of course obviates the need for any of the further center-embeddingcentric explanations offered by the other authors.

Comprehensible center-embedding (or at least center-embedding–like) structures can and do exist with two or even three levels, with limited frequency, in certain languages. A German example, from Rambow (1992), serves as an existence proof:

  1. Weil (ich ((das Fahrrad zu reparieren) versprochen) habe).
    Because I had promised to fix the bike.

However, it is clear that massively center-embedded language is untenable for humans. Yreka Bakery of Egello College famously reported (Bakery 2006) on a case of “highly contagious” “pathological center embedding”. However, the follow-up reporting on that episode did not receive nearly the widespread coverage of the original story. The contagious strain of center embedding did not spread very far from the original department in which it emerged, thus failing to bring about the widely anticipated Linguistic Apocalypse. As reported by Bakersman (2006), the computational cost of the pathological center embedding caused radically decreased glucose levels in the brain, eventually causing symptoms ranging from intense fatigue to aphasia to coma—side effects that ultimately limited the spread of the disease.

Vastly more devastating results of forbidden experiments in center embedding are detailed by Watson’s 1973 journalistic exposé, The Embedding, and corroborated by later revelations from actual participants in the atrocities: children forced to learn a thoroughly center-embedded language suffer mental breakdowns, and one even dies during center-embedding–induced seizures (Soles 1973); aliens and world governments conspire to abduct and murder their own citizens (Ph’theri and Zwingler 1988); tactical nuclear weapons are deployed against indigenous peoples, who practice ritual infanticidal cannibalism to save themselves (Darriand 1975); massive cover-ups occur all around.

The inescapable conclusion is that extensive center embedding, in unaltered humans, leads to cognitive and communicative breakdown, and thus does not provide a tenable explanation for the productive (if sometimes structurally complex) examples Küçük provides.

More Languages, More Data

Küçük’s analysis focused largely on business-related discourse in the past tense in E, Ere, Erre, Malayalam, Mam, Manam, Mum, and Mutum. Following the advice of The Center for Center Embedding (2009), we have collected additional data, in additional tenses, used in additional contexts, in the additional languages Efe, Éwé, Laal, Mekem, Monom, Noon, Salas, Solos, and Tennet. To that list we have additionally added additional data from Ala’ala, Eme-Eme, Ibubi, Kalak, Kök, Nauruan, Ñuñ, Oruro, Ososo, Payap, Qazaq, Siis, Tommot, and Yaqay. We especially note the presence of Mekem in the list, of which Boas wrote decades ago (early 20th C., reported in 2009):

Future linguists no doubt will discover other languages of this character, though as far as we know today Mekem is unique in its internally-reflexive construction. I dare not publish these observations today, but posterity will bear witness to the human capacity for self-reversal to which Mekem transparently testifies.

Let us now consider several additional examples, most of them collected using the aforementioned Slater Method™. Each example below is presented (in translation, as it was elicited) along with notes on the underlying source languages from which it was collected (all occurred naturally in multiple languages) and a free translation using an English passive construction.

  1. mistakes made mistakes
    Mistakes were made.
    [Kök, Mekem, Mutum, Noon]

  2. unknowns unknown the unknew the unknown unknowns
    The unknown unknowns were (actively) unknown.
    [Ere, Manam, Monom, Salas, Siis]

  3. excuses rejected excuses
    Excuses were rejected.
    [Mum, Ososo, Yaqay]

  4. manner unscientific an in data the collected the data in an unscientific manner
    The data was collected in an unscientific manner.
    [Ñuñ, Oruro, Qazaq, Tommot]

  5. scrutiny to analysis the subject not did not subject the analysis to scrutiny
    The analysis was not subjected to scrutiny.
    [Erre, Ibubi, Kalak, Mam, Tennet]

  6. print to papers several rushed several papers to print
    Several papers were rushed to print.
    [Ala’ala, E, Éwé, Solos]

  7. uncritically theory the publicized the theory uncritically
    The theory was publicized uncritically.
    [Efe, Eme-Eme, Malayalam, Payap]

  8. blame avoid to construction passive the used the passive construction to avoid blame
    The passive construction was used to avoid blame.
    [Ibubi, Laal, Mum, Nauruan, Oruro, Payap, Qazaq, Tommot]

  9. investigations conducting are conducting investigations
    Investigations are being conducted.
    [Éwé, Salas]

  10. culprits finding are finding culprits
    Culprits are being found.
    [Eme-Eme, Kök, Mutum]

  11. blame assigning is assigning blame
    Blame is being assigned.
    [Nauruan, Noon, Solos, Yaqay]

  12. jobs lose will lose jobs
    Jobs will be lost.
    [Ala’ala, Ere, Erre, Mam, Mekem, Monom, Ososo]

  13. careers ruin will ruin careers
    Careers will be ruined.
    [E, Efe, Kalak, Laal, Ñuñ]

  14. embarrassment future avoided have will have avoided future embarrassment
    Future embarrassment will have been avoided.
    [Malayalam, Manam, Siis, Tennet]

We also received a tidbit of unsolicited but intriguing data from the standard dialect of Mongolian, Khalkha (Thompson 2009):

  1. /xadgalagdax/
    to be preserved

As shall become clear later, this data point may indicate that the underlying phonological representation of the name “Khalkha” is akin to “Kha-l-kha”, which is thus perhaps best written with a newly designed, psychologically real syllabary.

An additional bit of unsolicited data (Onesimus 2010) that we didn’t know what to make of is included here for completeness. Onesimus claims the data is from a language called “Egaugnal Language”, spoken “on Earth”, and that it is somehow relevant to our understanding of the matter at hand:

  1. /mædəm aɪm ædəm/
    Drab as a fool.
    [Egaugnal Language]

  2. /ə mæn ə plæn ə kənæl pænəmɔ/
    Aloof as a bard.
    [Egaugnal Language]

The Solution

The solution—that is, the correct analysis (the unintelligible data from Onesimus notwithstanding)—is clearly that the constructions are not center-embedded, but rather palindromic. While the computational (as well as apparent emotional and human) cost of unrestricted center embedding is too high to bear, generating palindromic structures is a simple parlor trick—on par with cold reading, instantly calculating the number of letters in the words of a sentence, or extracting cube roots from ten-digit numbers—available to anyone with normal intelligence who is willing to put in a little practice.
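The parlor trick can be sketched in a few lines of Python. This is a minimal sketch under an assumption inferred from the numbered examples above rather than from any stated rule: the first word of the active phrase serves as the pivot, and the remaining words are mirrored in front of it. The function name `palindromize` is ours, for illustration only.

```python
def palindromize(phrase: str) -> str:
    """Mirror the words that follow the first word around that first word.

    Assumption (not stated in the article): the first word is the pivot.
    """
    words = phrase.split()
    if not words:
        return ""
    # Words after the pivot, in reverse order, followed by the whole phrase.
    mirrored = list(reversed(words[1:]))
    return " ".join(mirrored + words)


print(palindromize("made mistakes"))
# mistakes made mistakes
print(palindromize("will have avoided future embarrassment"))
# embarrassment future avoided have will have avoided future embarrassment
```

Under this assumption the cost grows only linearly with the length of the phrase, which is at least consistent with the claim that a little practice suffices.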

Interestingly, Küçük’s data includes both phrase-level and word-level palindromes, while our more recent data shows only word-level palindromic structures. We have not seen, nor seen any evidence for, morpheme-, syllable-, or phoneme-level palindromic structures. We hypothesize that the agent responsible for these palindromic structures is unable to cross the morphosyntactic boundary, a hypothesis with potentially important ramifications we are unable to explore at this time.
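The distinction between word-level and finer-grained palindromes can be made concrete with a rough check. The sketch below is illustrative only: the function names are hypothetical, and the letter-level test is an orthographic stand-in for a phoneme-level one (it ignores case, spaces, and punctuation), which is an assumption on our part.

```python
def is_word_palindrome(text: str) -> bool:
    """True if the sequence of whitespace-separated words reads the same reversed."""
    words = text.lower().split()
    return words == words[::-1]


def is_letter_palindrome(text: str) -> bool:
    """True if the letters alone (case, spaces, punctuation ignored) read the same reversed."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    return letters == letters[::-1]


example = "mistakes made mistakes"
print(is_word_palindrome(example))    # True
print(is_letter_palindrome(example))  # False
```

The example passes the word-level test and fails the letter-level one, which matches the pattern reported for the more recent data.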

The Root Cause

Building on the themes in our recent work and the work of close colleagues (Brechtal 2005, de Sadamente 2005, van der Meer 2005; Watanabanabe 2005), we hypothesized that the cause of the palindromic passives might be viral in nature. Since we did not have physical access to our informants, we were unable to immediately and directly confirm our suspicion via biological sampling. We were, however, able to have samples from several palindromic speakers sent by overnight courier to our lab for analysis. The results were shocking and clear: a viral agent is the root cause of the changes we are seeing.

This most recently isolated virus seems to be the result of a mutation in an older, more common virus that is endemic in most human populations and, unexpectedly, many species of cetaceans. The only symptom of the older virus is an urge to read lists of palindromes to unwitting passersby. The recent mutation has rendered the virus more virulent in many areas, and has led to the palindromic symptoms reported by Küçük.

Analysis of samples taken by the CDC during the outbreak reported on by Bakery (2006) indicates that a series of mutations in the older palindromic virus (and possible crossover with at least two strains of avian flu virus, NA1V A and U1F) led to the true center embedding variant that so taxed its hosts.

We interpret Boas’s historical comments about Mekem in a multi-faceted way. First, if a virological infection explains Mekem’s “internally-reflexive construction”, why was it already evident in Boas’s time? There is not enough biolinguistic data to be completely certain, but one aspect is, surely, the long-standing historical trope of Europeans bringing infection to the indigenous people of the Americas, who often have lessened immunity to such novel pathogens. Another aspect, which many may yet find controversial, is that it may be the case that certain synaptic (not syntactic!) patterns induced by phonological—and in modern cases, possibly orthographic—symmetries of the name of a language make its speakers more susceptible—not to the infection itself, but to the neurobiological and linguochemical effects of the infection.

Chemolinguistically speaking, the virus is a marvel of symmetry in both form and function. The DNA, which has been fully sequenced, is palindromic as well! The method of action of the virus is to modify the sentence molecules underlying linguistic structures (see Sershen 2004) by merging them with mirror isomers of themselves, creating chemically palindromic structures, which naturally give rise to palindromic syntax. (See Figure 1.)

Figure 1. Artist’s rendering of the virus in action, modifying sentence molecules with mirror isomers of the underlying linguochemical syntax.
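In molecular biology, a DNA sequence is conventionally called palindromic when it equals its own reverse complement. The text does not say whether that sense or a plain left-to-right mirror is meant, so the sketch below checks both. The function names are hypothetical, and the sample sequences (including GAATTC, the EcoRI recognition site) are illustrative; none is taken from the sequenced virus.

```python
# Watson-Crick base pairing: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence given as A/C/G/T letters."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_biological_palindrome(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


def is_literal_palindrome(seq: str) -> bool:
    """True if the sequence reads the same left-to-right and right-to-left."""
    s = seq.upper()
    return s == s[::-1]


print(is_biological_palindrome("GAATTC"), is_literal_palindrome("GAATTC"))  # True False
print(is_biological_palindrome("GATTAG"), is_literal_palindrome("GATTAG"))  # False True
```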

Conclusory Remarks

In addition to reconciling all of Küçük’s data, explaining the symptoms of the infected in Bakery’s reports of contagious pathological center embedding, and validating the theory of Sershen’s biological basis of universal grammar, this synthesis serves as some of the strongest, albeit somewhat indirect, evidence to support the idea that language (in toto and in particular features) is the consequence of an infectious agent from outer space. The infection model also explains many aspects of language contact, the spread of so-called “areal features”, the way idiolects can change in a new linguistic environment, and even the virulent strain of cognitive monolingualism Dan Everett (1983 and 2005) has reported among speakers of Pirahã along with the notorious difficulty of learning that language. This is an INCREDIBLY POWERFUL MODEL. [But see especially Pill 2008 —Eds.]

