Unrelenting Things You Didn’t Know You Didn’t Know—Madalena Cruz-Ferreira SpecGram Vol CLXVIII, No 1 Contents The Speculative Grammarian Essential Guide to Linguistics—A Review—Don Boozer

A Final Word on Fame, Formulæ, and Linguistics

Jonathan van der Meer and Lagâri Hasan Çelebi
Center for Computational Bioinformatics and Linguistics

While we enjoyed reading the recent articles by Slater (“Strings and Things: A Unificational Meta-Theory for All Linguistics”, SpecGram CLXVII.2, 2013) and Colden (“On the Quantum Nature of Linguistic Fame”, SpecGram CLXVII.3, 2013) because they bring much needed attention to matters of fame, formulæ, and linguistics that are near and dear to our hearts, we were sorely disappointed in the timing and direction of their efforts. As a result, we have a couple of important messages for Slater and Colden: first, “Way to scoop us! We’ve been working on this exact formula for a long time!” and, secondly, “Too bad you got it all wrong!”

Slater’s meta-theory is merely about publication and theories, and thus concerns little-l linguistics, not big-L Linguistics. It is also woefully incomplete, hence the title of his paper should more properly have been “Parts of a Semi-Unificational Quasi-Meta-Theory for Most linguistics”, but we’ll let that slide for now, though we cannot fail to mention that we agree in spirit with Colden’s “shaved monkey” critique, even if we did not fully understand the substance of his disparagement in any great detail, as is often the case with Colden. (See Colden 2012g, 2011d, 2010b, 2009c, 2009x, 2008a, 2007, and 1994zz for more examples.)

Of course, given Slater’s general lack of energy for his work (see fn. 3), we suggest he either take up a less demanding academic pursuit (like rocket science or neurosurgery) or take up an addiction to energy drinks before his next outing. Colden acquits himself little better, though his ad hominem style is breathtaking and unparalleled. (See Colden 2012f, 2011c, 2010a, 2009b, 2009w, 2008q, 2007, and 1994zzz for more examples.) Nonetheless, quantum indeterminacy is the last refuge of scoundrels and linguocosmologists.

Our model of linguistic fame is significantly more nuanced, much better researched, supported by remarkably more evidence, appreciably more elegant, markedly more explanatory, and (as a result of our work on this topic stretching back to the 1980s) both materially more tripendicular and notably more bitchin’ than any other contenders.

So far, the “literature” on this subject has produced two formulæ, in which ΔT represents a change in fame accrued to a theory or theoretician, ΔΘ and ΔE represent measures of the amount of data involved, Ω represents real-world applicability, and ћ is the number of actual speakers of the language in question. These two paragons of mathematical linguistitude are:

(ΔT) (ΔE) ≥ ћ

Both attempt to capture what’s really going on here, but they miss several key ingredients that contribute to a more nuanced understanding of linguistic fame:

FA: The current level of fame of the most famous author on a paper. Fame accrues in a Zipfian manner, with the most famous garnering more fame merely for having spoken, regardless of other merits. You can estimate the fame of an author by applying the final fame formula iteratively to all of the author’s publications over their career.

N: The number of authors on a paper. More sharing leads to less fame; in particular, the key relation is to the inverse of √N. Lone genius is the best approach, but one co-author won’t drag you down too much, especially if that co-author is significantly more famous than you. (Specifically, √2 ≈ 1.41421 times as famous as you.) Having many co-authors should thus be avoided.

IF: The impact factor of the journal in which the paper is published. More impactful fame is more impactful.

δJ: The distance between the field of the journal in which the paper is published and the field of linguistics. Any linguist can publish in a linguistics journal, but only an überlinguist can publish a linguistics paper in a biology journal. The number of steps required to traverse from linguistics to the discipline in question in any reasonable academic ontology will do as a distance metric here.

sC·i: The “sciencey” factor of a journal field. These factors have to be determined empirically, so some sample values are provided here. Linguistics has been normalized to 1.

41.9     Computer Science
0.0001   Literary studies
0.00     Post Modern Literary Criticism

χg: The xerox-generation of the work. In the case of samizdat-style unpublished papers that have nonetheless gained wide circulation, the number of times a paper has been re-copied and distributed is an indicator of importance and fame. For works originally mimeographed and still in circulation after 2007, χg approaches infinity.

τf: The trendiness, at the time of publication, of the theoretical framework in which the theory is couched. Shiny new frameworks always garner more attention than older, better understood frameworks in which refutations are more easily formulated. Measured in the standard onomastic unit of Kaytlynns.

β⦿g: The bogosity of the theory. The actual theoretical worth of the theory has some impact on its fame. The effect is small, but measurable. Bogosity is measured in the standard A.I. unit of µLenats.

Θ: The amount of data used in the publication (called E by Colden). There are actually two relevant measures, Θ, the number of morphemes in distinct glossed examples, and ΔΘ, the number of morphemes in new distinct glossed examples. New data is much more dangerous to a theory and thus to fame, because it is subject to new interpretation and discussion by those who would seek to refute you; also, no one really re-reads old examples.

ћ: The number of speakers of the language of the example data provided. Also to prevent division-by-zero discontinuities, we note that ћ is not the number of current speakers, but rather the number of speakers at the height of the language’s political and social influence in Western Culture. So ћ is fairly high for Latin, even though no one speaks it natively today.

ε: Slater and Colden both fail to take into account that Θ can be zero. While it takes an exceptionally bold linguist and a slightly drunk editor for it to happen, such data-free papers have been published, and have incurred significant increases in fame when the β⦿g factor was kept in check. As we were saying, we add a small stabilizing factor, ε, to prevent division-by-zero discontinuities. We have not fully determined the empirical or theoretical value of ε, though 0.000001 works relatively well in most cases where Θ or ΔΘ is zero. We believe that ε may actually vary by author, and may be inversely proportional to FA.

Ω: The real-world applicability factor, as suggested by Slater. However, Slater failed to provide either standard units or empirically determined reference values. We provide the latter below.

73.9   Can be used by fieldworkers while taking field notes
11.8   Can be used to write a reference grammar
3.47   Can be used to do graduate homework problems
2.16   Can be used to do undergrad homework problems
1.55   Can be explained in a textbook aimed at undergrads
1.04   Can be explained in a textbook aimed at grad students
0.53   Cannot be understood by professional linguists who do not have a background in statistics
0.32   Cannot be understood by professional linguists who do not have a background in computer science
0.21   Cannot be understood by professional linguists who are not authors on the paper
0.10   Cannot be understood by professional linguists who have not recently imbibed mind-altering substances
0.01   Cannot be understood by the author of the paper two hours after writing it

Combining all of these factors, we arrive at the following equation for change in fame as the result of publication:

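$$\Delta T = \left[\left(F_A \cdot \sqrt{N}^{\,-1}\right) + \tau_f\right] \cdot \max\left[\left(\mathit{IF} \cdot \delta_J\right)^{sC{\cdot}i},\ \chi_g!\right] \cdot \left[\left(\Delta\Theta + \varepsilon\right) + \left(\sqrt{\Theta} + \varepsilon\right)^{-\sqrt{\Omega}}\right] \cdot \left[\log_{13}(ћ)\right]^{-1} - \left(\beta{\odot}g\right)^{1/83.6}$$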

Note that ! here indicates the factorial operator, not the imperative mood or an interjection. So, “6!” isn’t “SIX!”, but rather 6 × 5 × 4 × 3 × 2 × 1.
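For readers who would rather compute their fame than contemplate it, the equation can be evaluated term by term. The following is only a minimal Python sketch under stated assumptions: the function name `delta_fame` and all parameter names are ours for illustration, the exponent −√Ω is read as applying only to (√Θ + ε), 1/83.6 is read as an exponent on the bogosity, and χg is assumed to be a non-negative integer so that its factorial is defined.

```python
import math


def delta_fame(f_a, n_authors, tau_f, impact_factor, delta_j, sci,
               chi_g, delta_theta, theta, omega, speakers, bogosity,
               eps=1e-6):
    """Evaluate the change-in-fame equation one bracket at a time.

    All names are illustrative; `speakers` must be greater than 1,
    because log_13(1) = 0 would make the speaker term undefined.
    """
    # [(F_A * 1/sqrt(N)) + tau_f]: author fame shared among N authors, plus trendiness
    author_term = f_a / math.sqrt(n_authors) + tau_f
    # max[(IF * delta_J)^sci, chi_g!]: journal venue versus xerox generation
    venue_term = max((impact_factor * delta_j) ** sci, math.factorial(chi_g))
    # [(dTheta + eps) + (sqrt(Theta) + eps)^(-sqrt(Omega))]: the data term
    data_term = (delta_theta + eps) + (math.sqrt(theta) + eps) ** (-math.sqrt(omega))
    # [log_13(speakers)]^(-1): fewer speakers, more fame
    speaker_term = 1.0 / math.log(speakers, 13)
    # bogosity^(1/83.6): the small but measurable penalty
    penalty = bogosity ** (1 / 83.6)
    return author_term * venue_term * data_term * speaker_term - penalty


if __name__ == "__main__":
    # Hypothetical inputs chosen only to exercise every term.
    solo = delta_fame(1.0, 1, 0.5, 2.0, 1, 1.0, 0, 0, 1, 1.0, 13, 0.0)
    duo = delta_fame(1.0, 2, 0.5, 2.0, 1, 1.0, 0, 0, 1, 1.0, 13, 0.0)
    print(solo, duo)
```

With every other input held fixed, adding a co-author shrinks only the first bracket, which is the 1/√N sharing effect described for N.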

From this equation, we can see that a non-famous author (FA = 0) cannot accrue any appreciable fame without riding some trend, that publishing outside linguistics is very fame-inducing, that less data is better data, and that unknown languages with few speakers are better sources of fame. Interestingly, an infamous author (FA < 0) can increase their infamy in the same way and by the same measure as a famous author would increase their fame.

There are some limitations with our model. We have not dealt with artificial languages and their impact on ћ (though it is of little concern at present). We have not yet incorporated a term for the effect of equations on linguistic fame since the actual occurrence of papers with equations is rare, though the effect is known to be strongly positive. Nor have we fully modeled the effects of non-linguist authors, a more pressing problem as researchers from the natural sciences keep intruding into linguistics. Fortunately, all of the papers written by non-linguists, such as those by Atkinson, et al., have bogosity readings in the range of googols of googols of yottaLenats, and thus have no impact within linguistics itself.

Also of note, fame decays over time, a factor ignored entirely by Slater and Colden. The formula for the residual fame (without subsequent bolstering publication) is as follows:

$$T(t) = T_0 \, e^{-\lambda t}$$

where t is measured in issues of all linguistics journals published since the article in question first appeared, and λ is a decay constant that is particular to a given sub-field of linguistics. Lower numbers indicate slower decay, and our experimentally determined values for some choice fields are:

0.012   Syntax
0.079   Computational Linguistics
4.533   Historical Linguistics
5.011   Comparative Linguistics
50.13   Discourse Analysis
318.9   Applied Linguistics
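To get a feel for these constants, the decay formula can be evaluated directly. The sketch below is illustrative only: the names `DECAY`, `residual_fame`, and `half_life` are ours, and the half-life (ln 2 / λ) is the standard consequence of exponential decay rather than a quantity we report above.

```python
import math

# Decay constants lambda from the table above, per journal issue published.
DECAY = {
    "Syntax": 0.012,
    "Computational Linguistics": 0.079,
    "Historical Linguistics": 4.533,
    "Comparative Linguistics": 5.011,
    "Discourse Analysis": 50.13,
    "Applied Linguistics": 318.9,
}


def residual_fame(t0, field, issues):
    """T(t) = T0 * exp(-lambda * t), with t counted in journal issues."""
    return t0 * math.exp(-DECAY[field] * issues)


def half_life(field):
    """Number of issues after which half the fame remains: ln(2) / lambda."""
    return math.log(2) / DECAY[field]


if __name__ == "__main__":
    for field in DECAY:
        print(f"{field}: half-life of {half_life(field):.4g} issues")
```

On these figures a syntactician's fame halves only after several dozen issues have appeared, while in the fastest-decaying fields it is essentially gone before the next issue ships.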

As we come to the end of this paper, we realize that we may have been too harsh on Slater and Colden, since their first-mover advantage in this interesting corner of linguistics is negated by the fact that the trendiness of the topic at hand (τf) has significantly increased of late, bolstering our own ΔT from this paper, while their own T(t) continues to fade. Thanks, guys!
