A Lexical Corpus of Alphabet Soups
Reed Steiner
Most linguists opt to use lexical corpora compiled from boring texts, such as newspapers, Wikipedia, Adam Sandler screenplays, and erotic Chomsky/Bloomfield fanfiction. However, not one lexical corpus contains the many natural constructions found within one can of Campbell’s Chicken Alphabet Soup. While a few middle-aged Wisconsin parents have taken hours from their dead-end 9-to-5 work-from-home “careers” to measure letter frequencies in a can of soup, exactly zero professional linguists have examined the soup. That leaves an entire body of language completely unreferenced. In order to advance our understanding of the English language (because, let’s face it: English is the only language that matters to linguists), linguists must study alphabet soup.
To get us started in the right direction, our research team has assembled the Soup Corpus for University Members, or SCUM. SCUM is the world’s first lexical corpus consisting solely of words and sentences found in cans of soup, and it has proven to be a valuable tool for linguists everywhere.
Since many scholars have expressed concern that SCUM will contain only random, unnatural constructions, we have gone out of our way to make SCUM all-natural. To avoid unnatural constructions tainting our precious SCUM, our team of unpaid research assistants (whose names are listed in 1pt font deep within the footnotes with a light grey text color so no one can see them) poured nearly five thousand cans of soup onto the floor and recorded only the words that formed once the soup settled. The shorter students were given snorkels to check for words stuck to the floor, and any research assistants caught trying to eat the words immediately received a failing grade. Thus, all the data is completely natural and untainted by student interference. Since each word formed naturally as the soup was poured, you can rest assured that all constructions in SCUM are natural, not fabricated.
The beta website is currently only available to select universities, although our goal is to make this corpus available to linguists across the globe to improve international understanding of soup-based linguistics. We want to put SCUM in the hands of linguists everywhere.
Our team also keeps its eyes turned toward the future. The very same research assistants are now examining Scrabble tiles, although they keep dripping soup all over the “Q” tiles. Though this Scrabble corpus may take some more time, our team is confident that these untapped sources of natural language will revolutionize our understanding of English.