Slate.com

The Definition of a Dictionary

2015-01-13

By the high-gloss, high-tech standards of 21st-century corporate life, the headquarters of America’s premier dictionary publisher is an unusual place. Merriam-Webster Inc. is housed in a two-story brick building in Springfield, Massachusetts, that, if not for the bas-relief dictionary and company name above the front door, could pass for an old elementary school. There’s a broad central staircase, along with dowdy conference rooms, linoleum floors, creaky wooden doors, and even some hospital-green and cafeteria-yellow walls. The decor is a mishmash of stately oak desks from the 1940s and gray cubicles from the 1990s.

On the first floor are business offices and company artifacts. A glass case down an echoey hallway contains Noah Webster’s first lexicographic effort, A Compendious Dictionary of the English Language, published in 1806, and his 1828 follow-up, An American Dictionary of the English Language, which, with 70,000 entries, rivaled Samuel Johnson’s great British book. The second floor is home to about 40 definers, etymologists, pronouncers, daters, and typists, plus the most comprehensive extant repository of the history of American English: 16 million 3-by-5-inch slips of paper, known as citations, crammed into alphabetized drawers in rows of chest-high metal filing cabinets. “The essential value of the company is inside those drawers,” says Peter Sokolowski, a Merriam editor. “It’s irreplaceable.” The citation files are supposed to be fireproof. But with no sprinkler system installed—an accidental soaking would cause serious damage—no one wants to find out if they really are.

Merriam’s president and publisher, John M. Morse, admits that the company should probably move to a more modern space. But the 75-year-old building was paid off long ago—cost of construction: $200,000—and running a reference publisher in western Massachusetts is a lot cheaper than doing so in Boston or New York. Employees, whose work tends to the monastic, also like the bucolic region, if not struggling downtown Springfield. In any case, it’s fitting that this iconic American brand, created by one of the nation’s first political thinkers and intellectual entrepreneurs, remains loyal to its past. While Noah Webster was a Connecticut man, the company has been in Springfield since brothers George and Charles Merriam acquired the rights to the dictionary after Webster died in 1843.

Now Merriam-Webster is pushing into the future by making an audacious nod to its past. More than half a century after it was published, the company’s landmark book—Webster’s Third New International Dictionary, Unabridged, known in lexicographic circles as Webster’s Third, W3, the Unabridged, or the Third—is getting an overhaul. The Third is a behemoth—4 inches thick, 13½ pounds, 2,700 pages—that falls like a crashing wave when opened. A fourth edition, by contrast, might never exist as a physical object. This latest revision, a project Merriam-Webster hopes will secure its dominance in the tenuous business of commercial lexicography if not ensure its future survival, is happening entirely online.

On its face, this might sound like a terrible plan. Merriam has tasked the majority of its employees with rewriting a book that likely won’t generate revenue the old-fashioned way, through hardcover sales. The project involves the subscription-only Unabridged site, not Merriam’s free online dictionary, which is based on its smaller desktop book, Merriam-Webster’s Collegiate Dictionary. So there’s no guarantee it will find enough customers willing to pay $29.95 a year to turn a profit. Plus, the work could take decades to complete. By the time the Third gets close to being a Fourth, it’s not clear how people will use a dictionary, or even what a dictionary will be.

But while the Internet has upended a publishing model that dates to Robert Cawdrey’s 1604 A Table Alphabeticall, it also has strengthened the feeling among lexicographers that the public cares deeply about language—and that there is still a place for the dictionary. For Merriam specifically, the potential of digital lexicography, a belief that people crave guidance and trust authority, and its own historical place in American letters have combined to convince it of the wisdom of rolling the dice and redoing the Third.

“Creating a new Unabridged Dictionary gives us the opportunity to revisit the biggest questions of all,” Morse says. “What is it that ought to be said and shown about the words in the dictionary? What should we talk about when we talk about words? The Unabridged provides the platform to present the fullest explication of words and hence the opportunity to say what it is that ought to be said. And the answer shifts from generation to generation.”

That might come off as highfalutin, and possibly self-serving. Merriam isn’t the first dictionary company to update its signature reference work or to go digital-only with it, and the Unabridged isn’t even close to the biggest lexicographic resource out there; Oxford University Press has been issuing quarterly online revisions of its mammoth Oxford English Dictionary since 2000. But while the OED has a handful of lexicographers writing definitions in New York, and the legendary tome is revered for its comprehensive historical approach—the most recent printed edition ran to 20 volumes—it’s ultimately an English-as-in-England work. Merriam-Webster’s Unabridged is distinctly American, the seminal sourcebook not only for English as it is written and spoken in the United States but also for the history of lexicography in the United States.

“Language,” Noah Webster wrote in the preface to his American Dictionary, “is the expression of ideas; and if the people of one country cannot preserve an identity of ideas, they cannot retain an identity of language.” Almost 200 years later, the descendant of the company Webster founded is the last fully staffed American dictionary-maker standing. And its unabridged dictionary is its crown jewel.

“Within certain boundaries, we get to reinvent what the dictionary is,” Morse says. “The opportunity does not come often, so it’s vitally important that we seize it.”

“I was working on twerk,” Emily Brewster whispers in her tidy cubicle on the Merriam building’s second floor. A warren of desks stacked with papers and shelves groaning with books, the Editorial Floor, as it is officially known, could pass for a newspaper newsroom. But one thing is missing: noise. Into the 1990s, talking was all but banned and staffers communicated by writing on pink citation slips, even to make lunch plans. Silence still reigns; like most of her editorial colleagues, Brewster doesn’t have a phone on her desk.

Now, on this summer afternoon, the Merriam associate editor is moving on to upcycle, specifically the noun form: “an upward trend in business activity.” That’s the quick, first-draft definition she’s typed into an Excel spreadsheet titled New Words, a document in which Merriam staffers suggest and track possible dictionary additions. Brewster is collecting examples of nounal usage, from Merriam’s electronic citation files and from the database Nexis. (The electronic system was created in 1983 but paper citations, or “cits,” were also generated until 2009.) Another definer has noted a verb form, “to recycle materials into a product of higher intrinsic value.”

Brewster copies and pastes quotations that use the word, plus the source and date, into a Microsoft Word document. There’s a 1981 Associated Press story about airline stocks, a 1984 Business Wire release about insurance brokers, a 1997 Houston Chronicle piece on mortgage rates. As she tinkers with the definition, I ask whether she wants to avoid upward in the definiens (aka the definition) because the definiendum (the expression being defined) is itself an up word. “What I want is the most easily comprehended wording,” she says. “It is less than ideal to use upward in a word like upcycle. But upward trend is a phrase that people really understand and isn’t overly complicated.”

I suggest “a period of increased economic activity.” She politely ignores me. Then I point out that the upcycle examples she has culled all refer to financial products like stocks and insurance, indicating that it might be a narrow Wall Street term rather than a broad economic one. “How about ‘an upward trend in the value of financial instruments’?” I ask. “That’s very good,” Brewster replies. “Can I use that? We can put your initials in the dictionary. Which nobody gets.” (Look out, Noah, here I come.)

Brewster wears narrow black eyeglasses and has excellent posture. At 41, she is one of three younger Merriam editors with a word-nerd cult following thanks to their tweets about objectless prepositions, the history of scuttlebutt, and why people were looking up satire last week, as well as videos on flat adverbs and weird plurals. Brewster is a general definer, meaning she doesn’t have a specialty. This is the first time she has been assigned to New Words, which is “kind of just the dream job of a lexicographer,” she says.

Brewster picks five or six potential newbies at a time, collects citations, and types up sample definitions and notes on possible “senses,” or distinct meanings, of each word or phrase. She lets more complex candidates “percolate” for days, weeks, or even months before either writing a final definition or deciding that a word isn’t worthy of entry just yet. (Brewster’s editor, Stephen Perrault, who has Merriam’s best title, director of defining, makes the final call on whether or not a New Word makes the Unabridged.) Vocal fry is on Brewster’s to-define list, as are voice actor and voice cast. She’s also getting started on a new sense of the verb troll; while the revised Unabridged definition already includes “to antagonize (others) online,” Brewster is seeing growing usage in an offline sense, too.

The New Words file contains about 1,700 nominees for word-dom. But it isn’t the sum and substance of the Unabridged revision. Merriam plans to re-examine and when necessary—and it’s usually necessary—rewrite each of more than 476,000 entries from the most recent printing of the Third, in 2002, when the original 1961 edition, plus its seven addenda, was first made available online. The current project began with a few staff members in 2009 and has since ramped up. Merriam has issued three updates, which it is calling “Releases”: 4,800 new or revised entries plus new senses of existing entries in January 2013 (including aftermarket and cyberbullying), 2,800 last March (bad hair day and badassery), and 2,750 in October (superfood and vuvuzela). Release 4 is due in the spring.

Work on the Third took more than a decade, an editorial staff of 70, and scores of consultants; the book lists 202 experts in fields from Maori etymology to pavement construction to colorimetry to knots. The Third wasn’t done until every definition was completed. But a dictionary isn’t primarily a book anymore. It’s a database. The deadline to update is never, or always, because a digital dictionary can be updated continuously and because language evolves continuously. For the Unabridged project, Merriam is assigning some work in alphabetical order, but it also is adding and updating in several other ways:

1) Revising 1961 definitions that are outdated, in some cases embarrassingly so. Autism, defined in the Third as “absorption in self-centered subjective mental activity,” is now “a developmental disorder” with a detailed clinical explanation. Marriage has a new subsense for same-sex marriage. Menopause is no longer “called also change of life.” With Release 4, the first sense of homosexuality will change from “atypical sexuality” to “sexual attraction or the tendency to direct sexual desire toward another of the same sex.” Other constant targets: entries in the Third containing language that might offend—from the use of he, him, his, and himself as gender-neutral pronouns to the inclusion of words like Negroid, half-breed, and mongoloid in definitions or example quotations.

2) Redoing groups of words by subject matter. When I visited, Mark Stevens, who heads Merriam’s general reference publishing (everything that isn’t a strict dictionary) but who has an expertise in music, had completed scores of jazz, rock, ethnic, and classical music terms, and was plowing through contemporary pop. Associate editor and general definer Kory Stamper has been revising religious terms on and off for three years.

3) Editing entries randomly, even if it means falling down lexicographic rabbit holes. Senior editor for life science Joan Narmontas came across arborvitae, thought it needed work, and wound up tinkering with 81 additional trees and shrubs, from bog pine to cryptomeria to thuja. “I knew I shouldn’t be doing it,” she says. “But I also knew if I didn’t do it now, I would never do it.” It took her about a week to spruce up all that flora.

4) Adding the new words. Merriam is including about 100 of them per release. New words are central to any dictionary revision because they reflect changes in the language and because they generate news stories that lead to page views and subscriptions. People want to know whether ollie or booty call has made “the dictionary.” (In the case of the online Unabridged, yes, both have.)

5) Importing entries and senses from five decades of annual updates to the Collegiate dictionary. There are thousands of them—the editors aren’t even sure how many. Merriam is doing similar work with other sources like its Medical Dictionary. Some recent additions from the Collegiate: catnapper, debeard, deep throat, eureka moment. And from the Medical: caffeinism, essential tremor, forensic odontology, music therapy.

6) Revising definitions of the most looked-up words. Merriam keeps track of every word looked up on the free online dictionary, which is based on the 11th edition of the Collegiate, published in 2003. In 2010, the company began posting the Top 25 daily, weekly, and all-time lookups. The most in-demand words tend to have meanings that are complicated, nuanced, or misunderstood. According to Merriam’s spreadsheet tally, pragmatic has been No. 1 since 2010, with almost 30 percent more hits than second-place disposition.

The problems with the reader favorites are typical of many definitions in the Third: misleading orders of meaning, old-fashioned language, dated or inadequate examples, the absence of senses and uses that have gained currency in the past 50 years. For instance, the entry for irony, which is No. 8 on the all-time list, placed the little-used adjective form—“made or consisting of iron”—ahead of the familiar noun. That’s been changed. Jazz, on the other hand, hasn’t been updated yet. The first noun sense in the 1961 definition isn’t the style of music, it’s “vulgar: COPULATION.” That’s because the Third listed senses in what’s known as historical order, or the order in which they first appear in print, from oldest to newest. Readers, however, typically want to see the most common meaning of a word first, and that’s how senses are listed now. Merriam hopes to redefine the Top 1,000 lookups by the end of 2015.

Brewster spent a year overhauling a dozen of the top words, including pragmatic and disposition plus affect and effect, didactic, ubiquitous, conundrum, holistic, insidious, integrity, hypocrite, and love. Then she moved on to the New Words. In the months since my visit, her work has become more structured, with weekly deadlines for words starting with specific letters: H the week ending Nov. 14; then M, N, and O; then I, P, and Q; then R, S, and T; then J and K; then U through Z plus A; then B and L; then D through G; then—wrapping up her work on Release 4 last week—C, including clickbait and candy corn.

She has produced anywhere from five to 25 definitions a week. Twerking is ready, she tells me in December. Nutjob (which dates to 1959) and minorly are good to go. Jeggings is, too. The new sense of trolling looks promising, she says, “but first I have to finish hot mess.” Brewster is very excited about hot mess. Thanks to Google Books, she found it in a machinists union trade journal from 1899: “If the newspaper says the sky is painted with green chalk that is what goes. Verily, I say unto you, the public is a hot mess.”

Upcycle is also done. Brewster eschews “upward trend in business activity” for “a cycle or part of a cycle marked by growth, increase, or improvement.” Alas, my “financial instruments” wording doesn’t make it. Too narrow. Instead, Brewster offers three broader examples: “a period of economic growth,” “a period during which something (such as a rate, price, or stock value) increases,” and “a period of increased or increasing success, popularity, or availability.” The last one includes a quotation about an up cycle for Russian figure skating.

The updating of Webster’s Third is a big deal because Webster’s Third was a big deal. The Third—which is actually the eighth Merriam unabridged volume in a straight line back to the American Dictionary; it’s the third one containing the word “new”—followed the company tradition of revising its biggest book once a generation. But it represented a massive departure in substance and style from Webster’s Second, which was published in 1934. That upheaval provoked a controversy unlike any before in lexicography.

The Second was what one Merriam editor calls the Internet of its time: 3,350 pages long, with more than 600,000 main entries, including proper nouns, and hundreds of pages of biographical, geographical, and literary appendices and other encyclopedic matter. The Second was designed to be a single-source reference for the educated classes and an aspirational text for the masses. It contained long lists of popes and dukes, and hundreds of illustrative quotations from the Bible, Shakespeare, and Dickens. But there was no mention of Mae West, Eugene O’Neill, or Babe Ruth. Popular culture, a term dating to the 19th century, was considered too unrefined for such a serious work. The Second was also priggishly didactic, prescribing what its ivory tower editors and consultants considered “proper” language, and brusquely dismissing usage that was, as its labels declared, incorrect, improper, or illiterate.

The editor of the Third, Philip B. Gove, imposed what he saw as logistically, culturally, and lexicographically necessary changes. The Second couldn’t get any bigger, so Gove eliminated almost all of the “nonlexical” encyclopedic matter, plus tens of thousands of main entries, to make space for new terms—words that originated from world war and the Cold War, technology, science, sports, politics, and, yes, pop culture.

Schooled in the modern field of structural linguistics, Gove believed that speech should guide usage, that rules obscured the reality of how language is used, and that dictionaries should describe rather than prescribe usage. He replaced the Second’s subjective usage labels with standard and nonstandard classifications and cut back on the application of the label slang. The idea was to let notes and quotations help to illustrate a word’s usage. Gove didn’t want to depend on the writing and speech of dead white males alone to do that. He quoted Mickey Spillane and Ethel Merman, the latter to illustrate (quite nicely) one sense of the transitive verb form of drain: “Matinee days are tough; two shows a day drain a girl.”

The Third triggered a full-on culture war, one that began with a poorly written Merriam press release touting the appearance of ain’t in the dictionary. Ain’t actually was in the Second, but Gove, and in some instances the Third itself, did a lousy job of explaining that the inclusion of a word—that is, an acknowledgment of its existence—did not amount to an endorsement of how it was used in speech or writing. The floodgates opened. Critics liberal and conservative alike attacked the Third as an assault on proper English, an air-raid siren of social and linguistic decay.

In a January 1962 Atlantic essay titled “Sabotage in Springfield,” Wilson Follett called the Third a “shock,” “a scandal and a disaster,” and “in many crucial particulars a very great calamity.” (Follett was appalled that hepcat was added while the names of the apostles were deleted.) The New York Times fulminated a dozen or so times against the book’s “permissiveness” and “informality”; in one editorial, it mockingly used unlabeled slang from the Third to urge the “passel of double-domes at the G. & C. Merriam Company joint” to stop the presses and start over. Writing in The New Yorker in March 1962, Dwight Macdonald compared the Third to the end of the world, and not entirely figuratively.

“Philip Gove called the English language ‘an instrument of the people.’ He said Webster’s Third should have ‘no traffic’ with artificial distinctions of correctness in the language,” David Skinner, the author of The Story of Ain’t: America, Its Language, and the Most Controversial Dictionary Ever Published, said on Slate’s language podcast Lexicon Valley in 2012. “These are fighting words. It’s like he was taking an ax to Webster’s Second. It’s like he was taking an ax to a stack of classroom textbooks that since the 18th century had been upholding the rules on ‘shall’ and ‘will’ and ‘imply’ and ‘infer’ and hundreds of other subtle linguistic niceties.”

Maybe the uproar reflected post-Sputnik insecurity about America’s place in the world, or worries that racial tensions and longhaired beatniks (a new word, defined fantastically, in the Third) would topple the old order. Or maybe it reflected concern that established institutions—Merriam-Webster among them—could no longer be trusted. Whatever its origins, the furor was mostly misguided. Critics frequently decried words and usages from the Third that were also in the Second. The debate continued in popular media for years, even into the new century. (In a 2001 cri de coeur in Harper’s against what he viewed as permissive usage, David Foster Wallace attacked the “notoriously liberal” Third—and repeated mistakes made by Gove’s critics. The Third, Foster Wallace wrote, “included such terms as heighth and irregardless without any monitory labels on them.” The former was labeled chiefly dialectal and the latter nonstandard.)

According to Herbert C. Morton’s The Story of Webster’s Third: Philip Gove’s Controversial Dictionary and Its Critics, Time Inc. became interested in buying Merriam and republishing the Third, marking with an asterisk words it considered objectionable. The book and magazine publisher American Heritage mounted a hostile takeover attempt, its president telling the New York Herald-Tribune that the Third was an “affront” to “sound scholarly principles.” That failed; in 1964, Merriam was acquired in a friendly deal by its current owner, Encyclopaedia Britannica. So American Heritage created a competing big book, 1969’s American Heritage Dictionary of the English Language, complete with a usage panel of more than 100 editors, journalists, and professors, including Wilson Follett, Dwight Macdonald, and other fervent critics of the Third. Random House also published an unabridged dictionary, in 1966.

But for all the outrage and umbrage, the Third didn’t cause Merriam to topple from its perch atop American lexicography. The book sold well, and lexicographers and linguists mostly praised it. Gove, however, was “deeply troubled and hurt,” Morton writes, by the vicious attacks, and he spent the next decade at Merriam “doggedly defending his principles and explaining his work.” But he also kept doing nuts-and-bolts lexicography, writing definitions of scatological and racially insensitive words (over Gove’s objections, Merriam’s president had removed fuck from the galleys of the Third) and helping with early planning for the inevitable Fourth. Gove died in 1972 at age 70.

Every editorial decision Gove made was dictated by space: the need to create as much of it as possible so he could cram new words into the finite boundaries of the printed book. Space, or the limited amount of it, is why dictionaries employ all sorts of symbols and abbreviations. Gove claimed he saved 80 pages in the Third by using fewer commas.

He also decreed that individual senses of words wouldn’t get separate lines. Neither would quotations; Ethel Merman’s description of her work was tacked on to the end of sense 2b(3) of drain. Most dramatically, Gove redefined Merriam’s defining style, banning the use of complete sentences. As lexicographer Ben Zimmer puts it, Gove-style definitions start “with a general category (or genus) followed by various distinguishing features (or differentiae),” in which commas are permitted only to separate items in a series. The resulting single statement must be replaceable—that is, it can be plugged into a sentence in place of the word it’s defining. An airplane, for instance, was defined by the Third as “a fixed-wing aircraft heavier than air” (genus) “that is driven by a screw propeller or by a high-velocity jet and supported by the dynamic reaction of the air against its wings” (differentiae).

Gove wanted to save space, paradoxically, so his dictionary could be as expansive as possible. Sometimes, the results were comical. Two of the Third’s most frequently mocked definitions are door, at 72 words, and hotel, which weighs in at 91:

a building of many rooms chiefly for overnight accommodation of transients and several floors served by elevators, usually with a large open street-level lobby containing easy chairs, with a variety of compartments for eating, drinking, dancing, exhibitions, and group meetings (as of salesmen or convention attendants), with shops having both inside and street-side entrances and offering for sale items (as clothes, gifts, candy, theater tickets, travel tickets) of particular interest to a traveler, or providing personal services (as hairdressing, shoe shining), and with telephone booths, writing tables and washrooms freely available

Salesmen! Travel tickets! Shoe shining! The entry for oxygen is an even more absurd 192 words long: “a nonmetallic chiefly bivalent element that is normally a colorless odorless tasteless nonflammable diatomic gas slightly soluble in water, that is the most abundant of the elements on earth occurring uncombined in air to the extent …” Take a breath and chuckle. But Merriam’s Stamper says that oxygen helped her get inside the minds of her defining predecessors.

“Before I started working on the Unabridged, I was very quick to pooh-pooh that [defining style] as, like, ‘This person is so full of themselves. They think they know everything about oxygen and they’re going to say everything they know about oxygen,’ ” Stamper tells me. “So I read something like oxygen or hotel and they’re still pretty risible, they’re pretty laughable. … But I understand why the definer got to a point in looking at all of the evidence for oxygen, that they felt like, ‘If this is unabridged, then, goddamn it, it’s going to be unabridged.’ ”

Online, there’s no imperative to abridge. Editors aren’t counting characters to make a definition fit on a printed page, and they’re not sending words to the lexicographic guillotine to make room for a new crop of entries. Book production constraints forcibly narrow the focus of definitions and, Merriam president John Morse says, “deliberately leave some aspect of a word’s character unacknowledged.” Online, for the first time in the history of lexicography, that’s not necessary. “You really try to say, ‘Here is the space and interest to tell this story in the fullest possible way,’ ” Morse says. “From a definer’s point of view, this is the most rewarding kind of defining you can do. You’re not self-censoring with every word you write down.”

At the same time, definers like Stamper and Brewster must find the balance between telling the fullest story and deciding what’s useful to or necessary for the average reader. Are 192-word definitions OK? How many quotations should illustrate a meaning? How long should etymological and usage notes run? Do words need to percolate for years—at Merriam, the waiting period was often a decade or longer, as evidence slowly accumulated in the citation files—before they are granted admission?

“Obviously, it’s very liberating,” says Perrault, the director of defining. “We can add lots of good stuff now and not worry about it. But we do have to remind ourselves that there’s still value in conciseness and that maybe people looking up a word aren’t really interested in seeing eight quotations for that word.”

Consider god. In the Third, god was a bit of a hot mess. There were separate entries for god and God, and they were long, cluttered, missing key nonreligious meanings, and written with a Christian bias. Stamper sifted through 20,000 citations, plus countless articles, books, and reference works, and whittled god into a compact yet intellectually thorough five senses and 15 subsenses that include quotations from the Bible, Ulysses, Sermons of a Buddhist Abbot, and Publishers Weekly, and this example sentence: “I wish to God you’d shut up.” It took her four months. “I’m really proud of what I did, but I’m also terrified of the entry,” she says. God tends to provoke strong reactions. (In December, Stamper talked to Slate’s David Plotz about what it’s like to work as a lexicographer.)

The Third’s rigid rules and resulting quirks emerged from Gove’s style manual, which is the stuff of lexicographic legend: more than 600 single-spaced pages organized in black, loose-leaf binders known, imposingly, as the Black Books. Merriam didn’t create a contemporary version of the Black Books for the Unabridged update. But Perrault and director of editorial operations Madeline Novak—he was hired in 1979, she was hired in 1980, they were married in 1984—did implement a bunch of changes. Some of those address Govian tics that have annoyed lexicographers for 50 years, like the Third’s refusal to capitalize any headword (except, against Gove’s wishes, God) and the limited use of descriptive labels like slang.

“People wanted those labels because they wanted the dictionary to tell them that this is not a regular word in English,” Perrault says. “And I think that’s an appropriate expectation.”

That may sound like Merriam is moving closer in philosophy to the critics who assailed Gove’s willingness to let trends in language dictate the usage recommendations, or lack thereof, made by the Third. But it’s really just taking a more common-sense approach to what a general dictionary like the Collegiate or Unabridged is supposed to do: report how a word is used. Explaining in full how a word’s usage came to pass and offering an opinion about that usage is a task for a usage dictionary. “One strives for complete objectivity; the other is predicated on subjectivity,” Stamper says, “whether that subjectivity is informed by the author’s personal peeves or a collected body of evidence showing the ‘best practices’ of English.”

Other changes in the Unabridged reflect the evolution of English. As language and speech have grown more informal since 1961, so has lexicography. Parodic-sounding run-on definitions like hotel and oxygen have given way to simpler prose. (Generally, that is. Door, hotel, and oxygen await revision.) In content and tone, the Third was less stiff and self-important than the Second, but it was still stiff and self-important. The capital-D dictionary was a serious work for serious people who displayed it prominently in their living rooms. Now there’s a dictionary on your phone.

“I always felt it would be good to make the language of the dictionary a little more normal, make it accessible,” Perrault says as he, Novak, and I sit around one of those old wooden desks on Merriam’s first floor. “That was not the focus for W3.”

“It was written to sound like an unabridged dictionary,” Novak says.

“Which in a way is appropriate,” Perrault says. “I feel like there is something you could call ‘the voice of the dictionary.’ When you write a definition, you can’t let yourself go—you’re expressing yourself as ‘the dictionary.’ There’s a certain formality to it. It’s a little Jeevesian, you might say. ‘What does this mean?’ ‘It means this, sir.’ ”

With this new revision, Perrault says, Merriam’s goal is to maintain the voice of the dictionary while changing its tone slightly, making definitions more straightforward. Other lexicographers, though, think this more laid-back version of traditional defining is still too stodgy. “Even though space on the Web is infinite, we’re still focused on the literary form that is the dictionary definition,” says Erin McKean, founder of the online dictionary Wordnik. “You can write villanelles with more ease than you can write Merriam definitions.”

Even if the definitions do feel like “definitions,” it’s clear that Merriam’s revised entries represent a new storytelling form. Accompanied by historical notes, the meatier definitions feel less like brief explanations than discrete biographies. There used to be a couple of lines of explanatory matter—an etymology, the occasional note on usage. Now that material is written, and in complete sentences. These notes can be blindingly scholarly, filled with odd symbols and diacritical marks, like the entry for the combining form blephar-:

borrowed from Greek, from blépharon “eyelid,” probably going back to a derivative from the base of blépein “to see”

Eric Hamp (in Glotta, vol. 72 [1994], p. 15) suggests *gʷlep-H-ro- from the base *gʷlep- (whence blépein). The variants in initial gl- found in Doric—glépharon for blépharon—are explained by Hamp as outcomes of word-initial *gʷl- with syllabification of the -l-, yielding *gul-, reduced by analogy to *gl- (see his earlier article “Notes on Early Greek Phonology,” Glotta, vol. 38 [1960], p. 202). The aspirate in blépharon, according to Hamp, would be parallel to kephalḗ “head” from *kep-h₂-l-. Alternatively, Robert Beekes sees the g-/b- alternation as a sign of pre-Greek substratum, citing Edzard Furnée, Die wichtigsten konsonantischen Erscheinungen des Vorgriechischen (Mouton, 1972), p. 389—though both Beekes and Furnée observe that the evidence for this particular alternation is exiguous.

They can also be playful, like the entry for bippy:

probably originally a nonsense word, used in the phrase “You bet your sweet bippy!”, denoting an unspecified body part

The line “You bet your (sweet) bippy!” was popularized in the American television show Rowan & Martin’s Laugh-In, which ran from January 1968 to March 1973. George Schlatter, the executive producer of the show, said the following about the word: “Our shows are gone through quite thoroughly for taste. What upsets most of the critics are the jokes they don’t understand, and that’s more of an educational problem than a taste problem. We say things like ‘You bet your bippy!’ or ‘You bet your nurdle!’ I’m sure some people attach a dirty connotation to those words. We don’t even know what they mean; they’re just funny” (quoted in Joan Barthel, “Hilarious, Brash, Flat, Peppery, Repetitious, Topical and in Borderline Taste,” New York Times Magazine, 6 Oct. 1968). The hypothesis that the word was borrowed from Yiddish pipik/pupik “navel” has not been confirmed.

Or the one for asshat:

The seemingly nonsensical linking of ass and hat has a curious earlier history as a sort of cultural meme. Examples of the linkage can be found in dialogue lines from recent films: “Anyone found bipedal in five wears his ass for a hat!” (addressed to the employees of a bank as the robbers leave, Raising Arizona, 1987, script by Ethan and Joel Coen); “I like your ass. Can I wear it as a hat?” (a character’s parody of a flirtatious advance, City Slickers, 1991, script by Lowell Ganz and Babaloo Mandel). Of more immediate etymological relevance may be this dialogue sequence from the television series That ’70s Show: “RED: Eric, if you don’t want to wear your ass for a hat, you’ll get up here, pronto! DONNA: You better go. You know how that ass-hat screws up your hair” (“Red Fired Up,” Episode 24 of Season 2, script by Dave Schiff, first aired May 8, 2000). The current meaning of asshat may be a reanalysis, perhaps in part based on the expression “have one’s head up one’s ass” (meaning “to be obtuse, be insufficiently conscious of one’s surroundings”), perhaps in part due to simple phonetic similarity to asshole. A more precise history will depend on the location of further attestations.

“I make it sound very scholarly,” says Merriam etymologist Jim Rader, a onetime Slavic linguistics graduate student who writes the historical notes. “I want to be very scholarly about a very ridiculous word.” He says that adding detailed dives into etymologically interesting words wasn’t part of the Unabridged master plan, but the luxury of space has made it possible. “I just figured I’m writing this stuff down anyway, so why not put it in the dictionary?”

Webster’s Fourth New International Dictionary, Unabridged was supposed to be on bookshelves two decades ago. In February 1988, Merriam president and publisher William Llewellyn wrote a 70-page memo—labeled C-O-N-F-I-D-E-N-T-I-A-L—detailing plans for the new edition. The Fourth would not radically depart from the Third the way the Third had departed from the Second, he said. But there would be tweaks and pruning, especially of “poorly attested entries” such as impuberty and impulsor. The Fourth would add 50,000 new terms, from bodice ripper to minivan to sabermetrics, bringing the total to half a million and growing the book by 300 pages.

The revision, Llewellyn predicted, would take eight years and cost nearly $7 million, twice as much as it cost to publish the Third. In its first two years, the Fourth would generate an annual profit of $2.8 million on sales of 40,000 units, and about $900,000 a year after that. It wasn’t “an investment to warm the heart of a Harvard M.B.A. perhaps,” Llewellyn wrote. But it was “an opportunity and a responsibility” essential to Merriam’s history and mission. The unabridged dictionary “is what makes Merriam Merriam and is what fosters the sales success of our other books. Without it we will quickly be just another dictionary publisher and before long, not even the biggest one.”

The action, however, was in the lucrative market for smaller, cheaper college dictionaries. At the time Llewellyn wrote his memo, Merriam was moving more than a million copies a year of the ninth edition of the Collegiate, which was published in 1983. By the fall of 1988, it had been on the New York Times best-seller list for 155 weeks and counting. But competition was mounting. Simon & Schuster, Houghton Mifflin, and Random House were preparing new college dictionaries of their own.

Random House added the word “Webster’s” (which has been a generic synonym for “dictionary” for more than a century) to the title of its college dictionary and mimicked Merriam’s distinctive red jacket design; Merriam sued and won, but a federal appeals court overturned the jury verdict and $4 million award. After Merriam published the 10th edition of the Collegiate in 1993, rivals sniped at the exclusion of words like cyberpunk and mountain bike. Merriam countered that it alone offered radwaste and ranch dressing. But the “word wars” were eating at the company’s market share and its cachet. When, as Morse recalls, a book buyer for Sam’s Club told a Merriam sales rep that “any old red Webster’s will do,” it was a sign of the times, and it hurt.

In this aggressively competitive marketplace, Merriam doubled down on front-list titles that would sell. Updating the Collegiate required returning to the citation files and re-examining and often revising every element of all 150,000-plus entries, a project that engaged the entire staff and necessitated about 1 million editorial decisions. The company produced bilingual dictionaries. It entered the burgeoning English language learner’s market, which had been dominated by the British dictionary-makers Oxford, Collins, and Cambridge. Merriam’s Advanced Learner’s English Dictionary, published in 2008, involved a decade of labor.

Plans for the Fourth were tabled again and again. Still, the big book, or at least the idea of the big book—the single, grand, definitive authority on American English, the dictionary that distinguished Merriam from its rivals—remained ingrained in the company’s DNA. The project became Merriam’s white whale or, better yet, its Elvis, rumored to be alive, occasionally even spotted inside the building. After Llewellyn’s memo, the staff spent a year “subject coding” the Third, examining every sense of every entry and applying, where appropriate, a subject—law, medicine, architecture, music, etc. Work began on a style manual, with rules governing parts of speech, order of entry, variant spellings, run-ons, boldface colons, cross references, inflected forms, functional labels, pronunciation, etymology, dating, usage labels, usage notes, illustrative quotations, and more. But other projects intervened and the manual was never finished.

During my visit to Springfield, I tell Morse that Gove must have assumed the Third would be updated in 30 or 40 years, if not sooner. “I’m quite sure that he thought that,” Morse says. “And I think that probably a number of editors were thinking exactly that. And I suspect that our slowness in getting to this has probably been a source of disappointment and frustration: ‘Why couldn’t you have done more and done it earlier?’ ” Morse says the answer is that lexicography is “the art of the possible.” So many words, so little time.

Like many Merriam employees, the 63-year-old Morse is a company lifer. He was hired in 1980 after getting a master’s in English language and literature from the University of Chicago. His academic work had nothing to do with lexicography. (Few Merriam staffers arrive having dreamed of a career defining words. The place just fits bookish people with degrees in literature or linguistics who don’t want to be academics.) Morse wrote definitions for the 1983 Collegiate, directed publication of a geographical dictionary, and helped convert the Third to a digital format. When he was named president and publisher in 1997, he became only the second lexicographer since Noah Webster to hold the top business job.

The assumption inside Merriam at the time was that the company would publish another fat unabridged dictionary, maybe with a CD-ROM stuck in a flap on the inside of the front cover. That’s what drove the perceived need for a Govian style manual, and also ratcheted up staffers’ worries. If sales of Merriam’s most popular title, the Collegiate, were on the decline, how could the company justify investing the time and money required to create a Fourth?

Morse had already decided that the company couldn’t—but also that a physical book no longer had to be the Holy Grail. In 2002, Merriam published one more addendum to the Third with entries—propeller head, tree hugger, golden handcuffs—already prepared for the new Collegiate edition the following year. But Morse decided that that would be it for printing-press revisions to the Third. The future of the Unabridged would be digital. As the decade rolled on, big projects like the learner’s dictionary were completed. The online Unabridged—which from its debut has cost $29.95 a year—began attracting subscribers. Morse scrapped the next all-consuming Collegiate update. Web advertising, subscriptions, and the backlist were paying the bills.

In early 2009, Morse told the editorial staff that the company was embarking on a full update of the Third. It wouldn’t be a Fourth. It would happen on the website only. It would be called the Unabridged. There was disbelief, relief, thrill. Finally. “This was the thing we had been talking about my entire career at this company,” says Daniel Brandon, a physical science editor. Chris Connor, a life science editor, says that, “for a number of us, W4 was just this rumor. It was a signpost on the horizon.”

After some surprise that the staff’s efforts would not generate an enormous bound volume, the overwhelming scope of the project quickly set in. “Everyone was like, ‘Oh, this is gonna be a lot of work,’ ” Stamper says.

In 2010, a group of scholars at Harvard University and MIT performed a quantitative analysis of Google’s then-new database of 5.2 million digitized books, which included 361 billion words in English. They estimated that, as of 2000, the English lexicon included more than 1 million unique words, and compared that corpus with the words in the Unabridged and the American Heritage Dictionary. In a paper published in Science, the researchers concluded that “52 percent of the English lexicon—the majority of the words used in English books—consists of lexical ‘dark matter’ undocumented in standard references.”

One obvious takeaway is that the breadth of the English language is greater than any dictionary. Another is that data can help lexicographers do their jobs better. The Google Books Ngram Viewer—which resulted from the ongoing “Culturomics” project at Harvard and MIT—and other digital corpuses can detect “low-frequency words” lexicographers might be missing. Such resources can also produce a more accurate assessment of current usage, “to reduce the lag between changes in the lexicon and changes in the dictionary,” the researchers wrote.

These findings raise some existential questions for dictionary-makers. In the Internet age, what’s the point of selective lexicography? Do we really need an Unabridged or OED to tell us whether a word is a word? Or is our new ability to type a string of letters into a search engine and instantly see how often and in what context it’s been used an adequate substitute for a dictionary? Absent the old space constraints, why should Merriam or anyone else get to pin a ribbon on a word and welcome it to the club?

A traditional lexicographer doesn’t catalog every word known to humankind. The Unabridged is in fact a very much abridged compilation of the English language—it’s just not quite as abridged as other dictionaries. What a lexicographer does, then, is decide whether a word has become established in the language, determine how its use has evolved, and explain that to readers. “Anybody on the Internet can write a definition of anything and put it up there,” Perrault says. “But I think most people want to see what ‘the dictionary’ says. That still exists, and it’s good for us because it means people appreciate what we do, that there are people who have expertise.”

Merriam’s expertise has for generations been grounded in its voluminous citation files, and in the ability of its trained editors to interpret the information contained therein. Merriam is still old school about collecting “cits.” Editors spend an hour or so a day “reading and marking”: reading books, magazines, newspapers, trade journals, websites, catalogs, cereal boxes—almost any published matter—and marking noteworthy usage. The printed material is stacked on shelves in a back corner of the second floor, and typists enter the cits into the electronic files. (The paper cit files, which date to the late 1800s, are a lexical treasure trove: yellowed newspaper clippings, advertisements and cartoons, handwritten comments about definitions and proposed changes, each slip stamped in purple ink with an editor’s name and the date.)

Gove and his staff didn’t have access to databases that could support or refute what a handful of citations told them. Given the hit-or-miss nature of reading and marking, their judgments were remarkably good but far more speculative than ones being made today. “The smaller the sample size, the more credence you give to everything you see,” says Katherine Connor Martin, the head of U.S. dictionaries for Oxford University Press in New York.

The slow accumulation of cits created a culture of deliberateness. It took blog six years to travel from coinage in 1999 to admission into Merriam’s Collegiate. That was considered fast. Now, as the Culturomics researchers posited, the proliferation of electronic databases and antsy online audiences are encouraging lexicographers to move faster still. “We’re probably at the level that if some prominent new word comes into being and goes viral, we don’t want to have to wait two or three years to put it in the dictionary,” Perrault says.

Perrault estimates that about 90 percent of the words in the New Words spreadsheet will eventually make the Unabridged. Scanning the list I notice upfake, a basketball term for a player feigning a shooting motion. It doesn’t register on the Ngram Viewer. Googling yields a few hits for instructional drills and videos. ESPN columnist Bill Simmons used it in 2006. (“Finally, with the clock winding down, he puts a quick move on Kaman, upfakes him, and drains a 16-footer to win the game.”) The New York Times hyphenated it in a live blog in 2010. But that’s about everything.

Since the New Words list is just a place for editors to track the progress of a word’s use, there’s no guarantee upfake will ever be entered in a Merriam dictionary. (It doesn’t make Release 4.) I ask Perrault and Novak why, in the bottomless well of the online dictionary, it shouldn’t.

“If upfake were used enough, it would have been in,” Novak says.

“Our only concern always is to accurately record the language,” Perrault says. “That’s still the case even if it happens more quickly than it used to. We’re still not going to stick things in there just because it’s this week’s word. … The standard doesn’t change.”

If this sounds like a contradictory message—that Merriam wants to be nimbler about acknowledging and admitting words (less than two or three years) but also wants to defend the standard that has defined and elevated the company for two centuries (upfake needs to wait)—that’s because it is. The challenge for Merriam is finding the sweet spot between Noah Webster and the Internet, between a strict single standard and a lexicographic free-for-all.

That latter category might include Urban Dictionary and Wiktionary, which are examples of crowdsourced lexicography, or ordinary people defining words themselves. Traditional dictionary-makers have jumped into this sandbox because the public expects it and because it can be lexicographically fruitful. The British publisher Collins is entering reader-suggested words into its online dictionaries. Merriam isn’t going that far, but it is posting reader submissions on its Open Dictionary page—planking, kidult, conflict mineral—and adding some of them to the New Words queue.

(Soliciting help from readers isn’t a new phenomenon in dictionary-making. In 1879, the first editor of the OED, James Murray, appealed to “the English-speaking and English-reading public to read books and make extracts for” his planned dictionary. The “madman” in Simon Winchester’s The Professor and the Madman, about the making of the OED, was one such contributor.)

Nevertheless, in an age when traditional lexicography might feel like a dying art, democratization is still provoking some anxiety. “Is this kind of crowdsourcing a worthwhile endeavor for dictionary-makers, beyond providing valuable publicity for publishers facing a tough consumer market?” Ben Zimmer asks in an article about the future of online lexicography in the December 2014 issue of Dictionaries: Journal of the Dictionary Society of North America. “Or could the reliance on the wisdom of the crowds end up diluting the authority that the leading print dictionaries have traditionally held?”

Probably both. The vibrancy of online wordsmithing suggests that the concept of the dictionary is changing, as is our sense of who should decide what’s in it. That’s fine with lexicographers like McKean of Wordnik, which searches corpuses containing billions of words, including traditional and user-generated dictionaries. McKean comes from mainstream lexicography; she was Oxford’s editor-in-chief of U.S. dictionaries. But she now argues that conventional dictionaries are good mostly for telling readers what words are in conventional dictionaries—that is, for validating the way people tend to think about the acceptability of words—and that they don’t do enough to reveal the munificent glories of an ever-changing language.

“More people are coming around to the idea that the dictionary is the convenience store of words,” she says. “And even the Unabridged is the convenience store of dictionaries.”

Just the day before, McKean tells me, she spotted a word in the New York Times that she had never seen before: tsukuroi, defined by reporter Alice Rawsthorn as “the art of repair.” Wordnik’s search engine didn’t pick it up. So McKean posted a comment and a link, noting its existence in print in English. When I told Brewster at Merriam about the exchange, she added tsukuroi to the New Words spreadsheet. Time will tell whether, after mo