Files in the Wild
File naming practices as indicators
of academic introversion/extraversion

Heyókȟa Wakȟáŋ, Ph.D.
Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu University
New Zealand

An unfortunately uncommonly studied component of the human linguistic capability is speaker personality. It is a difficult-to-quantify, squishy concept, even more so than semantics or pragmatics, but nonetheless vitally important in a practical sense (unlike, say, semantics or pragmatics). Despite these difficulties, some minimal headway has been made in understanding the linguistic role of “Big Five” factors, such as the effect of extraversion on discourse, introversion on lack of discourse, openness to experience on L2 acquisition, conscientiousness on ability to learn Lojban, agreeableness on linguistic persuasion, and neuroticism on prescriptivist tendencies.

In this paper I take a multidimensional meta-approach: rather than looking again at the personality-related content of papers in psycholinguistics (or, Phonē forfend!, actual data), I investigate here the personality-related form of papers written by psycho¹ linguists. In particular, I hypothesize that the names of non-web files made available online for download (such as .pdf, .doc, and .rtf files) are semi-linguistic indicators of the author’s introversion/extraversion focus with respect to themselves and the field in which they work.

The intuition behind this hypothesis is simple enough. Files are meant to be downloaded to a reader’s computer, where they usually join a large unsorted mess on the reader’s virtual desktop (analogous to the large unsorted mess on most readers’ physical desktops). Files named so as to make sense in the context of the author’s files, such as “lacus09.doc” (probably the only article that author wrote for the 2009 LACUS conference), are likely indicative of academic introversion, self-centeredness, and egotism. Files named so as to make sense in the context of the reader’s downloads, such as “file-naming-practices_2010.pdf”, are likely indicative of academic extraversion, selflessness, and altruism.
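As a minimal sketch of the reader-focused convention, the hypothetical helper below builds a name in the style of “file-naming-practices_2010.pdf” from a title, a year, and an extension. The function name and the lowercase, hyphen-then-underscore scheme are assumptions modeled on that single example, not a published standard.

```python
import re

def reader_friendly_name(title: str, year: int, ext: str) -> str:
    """Build a reader-focused file name such as 'file-naming-practices_2010.pdf'.

    Hypothetical scheme: lowercase title words joined by hyphens, then an
    underscore, the year, and the extension.
    """
    # Keep only ASCII letters and digits; punctuation and accented letters are dropped
    words = re.findall(r"[a-z0-9]+", title.lower())
    # Accept the extension with or without a leading dot
    return f"{'-'.join(words)}_{year}.{ext.lstrip('.')}"

print(reader_friendly_name("File Naming Practices", 2010, "pdf"))
# file-naming-practices_2010.pdf
```

The hyphens keep the title readable without relying on spaces, which some download tools and shells handle awkwardly; the underscore sets the date apart from the title words.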

A multi-faceted doubly-regressed principal component analysis of file name data (gathered from the internet by an automated web crawler and seven wage-slave undergrads) and personality data (gathered by the author and any one of several nubile assistants who are not his wife, via personal interaction with various linguists at conferences and departmental cocktail parties) revealed four primary eigenvectors in file-attribute space along which “author-focused” file naming practices correlate highly with certain “uncorrected personality traits” in the authors.
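The paper does not spell out what makes its analysis “multi-faceted” or “doubly-regressed”, so the following is only a sketch of ordinary principal component analysis. It assumes a table with one row per author and one numeric column per file-name or personality feature. Components are found by power iteration on the sample covariance matrix, with deflation after each one, and are returned largest variance first. The function names are hypothetical; a real analysis would more likely call a linear-algebra library’s eigendecomposition or SVD.

```python
import math
import random

def _matvec(m, v):
    """Multiply a square matrix m (list of rows) by a vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def covariance(rows):
    """Sample covariance matrix (n - 1 denominator) of the columns of rows."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    centred = [[x - m for x, m in zip(row, means)] for row in rows]
    d = len(means)
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
            for i in range(d)]

def principal_components(rows, k, iters=1000, seed=0):
    """Return the top-k (variance, unit loading vector) pairs, largest first.

    Power iteration finds the leading eigenvector of the covariance matrix;
    deflation (subtracting lam * v v^T) then exposes the next one.
    """
    cov = covariance(rows)
    d = len(cov)
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        # Random unit starting vector (unlikely to be orthogonal to the target)
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm0 = math.sqrt(sum(x * x for x in v))
        v = [x / norm0 for x in v]
        for _ in range(iters):
            w = _matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left to explain
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, _matvec(cov, v)))  # Rayleigh quotient
        components.append((lam, v))
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return components

comps = principal_components([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]], 2)
print(comps)
```

On these perfectly correlated toy rows, the first component should have variance 25/3 (about 8.33) with loadings proportional to (1, 2)/√5, up to sign, and the second should have variance of essentially zero: one direction carries all of the information.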

I’ve given these eigenvectors mnemonic names and subjective characteristic descriptions, so as to appropriately pad the length of this paper. The features are listed in order of magnitude in the PCA.²

AR: Aggressive Reductionism, characterized by the use of very short title elements, including idiosyncratic abbreviations, and a lack of dates; also includes gratuitous use of Greek letters, and having a shoe size of less than 7.

SE: Standards Exclusion, characterized by the use of highly proprietary (.doc), archaic (.wp), or technically complex (.tex) formats for papers, over more common and less annoying formats, like .pdf; also includes using the incorrect file suffix (such as .rtf for a .doc file), or excessive margins (greater than one inch).

HO: Hyphen Obviation, characterized by the lack of spaces or hyphens between words, or even the CommonCourtesy of CamelCase; also includes putting footnotes in particularly unreadable or small fonts, and having creepily long second toes.

LE: Letter Exclusion, characterized by the use of mostly or only numbers rather than words in file names, making it difficult for readers to associate the content of the file with the name; also includes the inability to produce snappy footnotes, and a fondness for elderberries.
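The file-name halves of these four traits can be approximated from a name alone with crude heuristics. In the sketch below, the function name, the 10-character cutoff for AR, the four-digit-year test, and the more-than-half-digits rule for LE are all hypothetical choices, not the paper’s coding scheme. The non-file-name components (shoe size, second toes, footnote fonts, elderberries) and the incorrect-suffix part of SE cannot be read off a file name and are left out.

```python
import os
import re

# Hypothetical thresholds and format list
AR_MAX_STEM_LEN = 10
SE_EXTENSIONS = {"doc", "wp", "tex"}
YEAR = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")  # a standalone four-digit year

def file_name_traits(name: str) -> dict:
    """Flag the AR, SE, HO, and LE file-name traits with simple heuristics."""
    stem, ext = os.path.splitext(name)
    ext = ext.lstrip(".").lower()
    digits = sum(c.isdigit() for c in stem)
    alnum = sum(c.isalnum() for c in stem)
    return {
        # AR: a short stem with no four-digit year in it
        "AR": len(stem) < AR_MAX_STEM_LEN and not YEAR.search(stem),
        # SE: a proprietary, archaic, or technically complex format
        "SE": ext in SE_EXTENSIONS,
        # HO: no spaces, underscores, dots, or hyphens, and no CamelCase either
        "HO": not re.search(r"[\s_.\-]", stem) and not re.search(r"[a-z][A-Z]", stem),
        # LE: more than half of the alphanumeric characters are digits
        "LE": alnum > 0 and digits / alnum > 0.5,
    }

for name in ["lacus09.doc", "file-naming-practices_2010.pdf"]:
    print(name, file_name_traits(name))
```

Under these rules “lacus09.doc” trips AR, SE, and HO, while “file-naming-practices_2010.pdf” trips none; a purely numeric name such as “20100412.pdf” would add LE.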

In conclusion, the intuitions behind the hypothesis have been borne out.³ After all, would anyone tolerate a reprint that merely said “LACUS 2009” across the top of the page, rather than indicating the title, author, date, and journal information? Of course not. File names are necessarily more restrictive as a medium, but to make no effort at all indicates a lack of respect for one’s intellectual colleagues and academic peers.


1 Or, as some in the U.S. prefer to be called, “Sociopathic Americans”.

2 For those not familiar with principal component analysis, well, you’re pretty much screwed if you are looking to a footnote to explain it, but the basic idea is to find a smaller number of uncorrelated features (each composed of weighted portions of the original, possibly dependent, features) that provide a new, non-redundant co-ordinate system for your feature space. The composition of the new features need not be particularly sensible: a perfectly fine principal component of the new feature space could be something like 5% “body hair grossness level” + 13% “tie color quality rating” + 58% “schadenfreude score” + 317% “cheese-o-philia percentile”. Perfectly fine.

3 Though I have to admit that the significant number of relevant foot- and footnote-related factors is surprising.

