2016-12-14

When I started researching my family tree more than thirty years ago, I purchased a paper reprint of a genealogy book first published in 1920: The Harmon Genealogy, comprising all branches in New England, written by Artemas C. Harmon. The book mentions my great-grandmother, Lucy Harmon, and documents her Harmon ancestry back to 1667. It is a wonderful resource, and I have referred to it often over the years.



I paid more than $100 for this reprinted book many years ago. Today I found the same book online. The cost is ZERO. I can download the entire book to my hard drive or to a jump drive or save it to an online storage service. I can print one page, multiple pages, or even the entire book. Even better, I can electronically search the entire book within seconds for any word or phrase. Not only can I search for names, but I can also search for towns, dates, occupations, or any other words of interest. Try doing that with a printed book!

The Internet Archive, home of “The Wayback Machine,” is a 501(c)(3) non-profit that was founded to build an Internet library. Its purposes include offering permanent access for researchers, historians, scholars, people with disabilities, and the general public to historical collections that exist in digital format.

The Internet Archive is well known for storing terabytes of old web pages. However, the organization has also expanded its role to digitize and store all sorts of public domain material, including old books, movies, audio recordings, radio shows, and more. I have also found a few modern books on The Internet Archive that were legally contributed by the copyright holders themselves.

The site’s Text Archive contains a wide range of fiction, popular books, children’s books, historical texts and academic books. The list includes genealogy books as well. The Internet Archive is working with several sponsoring libraries to digitize the contents of their holdings. In addition, private individuals are invited to scan the public domain books in their personal libraries and upload them as well. (See http://www.archive.org/about/faqs.php#195 for information about contributing your books.)

The result is a huge resource of books in TXT, PDF, and other formats: books that you can download to your computer, save, and then search for any word. The same books are also visible to Google and other search engines, so every word in them can be searched online as well.



Above: “A collection of upwards of thirty thousand names of German, Swiss, Dutch, French and other immigrants in Pennsylvania from 1727 to 1776 : with a statement of the names of ships, whence they sailed, and the date of their arrival at Philadelphia, chronologically arranged, together with the necessary historical and other notes, also, an appendix containing lists of more than one thousand German and French names in New York prior to 1712”

The PDF versions contain images of every page in the original books. That makes them easy to read. I prefer to look at PDF versions of a book whenever possible. However, searching the PDF versions electronically does not work very well.



You should be aware of a couple of shortcomings of books converted to plain text, however. First, the TXT files have lost the formatting of the original books; there is no bold, italics, or underlining since such formatting is not supported by the TXT format. In addition, paragraph indentations and other “spacing” are often lost.

Second, many of the books available were converted to TXT format by optical character recognition (OCR) software. OCR never converts all words perfectly, so you can expect to find numerous OCR errors in these documents. For instance, “The Harmon Genealogy, comprising all branches in New England” has some words mis-scanned, and many dates have errors in them. The most common error is the letter “I” substituted for the number one, such as “I920” instead of “1920.” This will cause difficulty if you are electronically searching for specific words or numbers.
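If you have downloaded a TXT version and are comfortable with a little scripting, one way to work around this kind of OCR error is to search for look-alike characters at the same time. The following is only a rough sketch in Python, not anything provided by The Internet Archive. The function names and the list of look-alike characters are my own assumptions for illustration, and real OCR errors are more varied than this.

```python
import re

# Assumed groups of characters that OCR software often confuses with
# each other. This short list is illustrative, not exhaustive.
OCR_CONFUSIONS = ["1Il", "0O"]

def ocr_tolerant_pattern(term):
    """Build a regex that matches `term` even when look-alike
    characters (such as "I" for "1") were swapped by OCR."""
    parts = []
    for ch in term:
        for group in OCR_CONFUSIONS:
            if ch in group:
                # Accept any character from the same look-alike group.
                parts.append("[" + group + "]")
                break
        else:
            # No known confusion: match the character literally.
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def find_ocr_variants(term, text):
    """Return every spelling of `term` found in `text`, in order."""
    return [m.group(0) for m in ocr_tolerant_pattern(term).finditer(text)]

sample = "Published I920; reprinted 1920. Born l9O2."
print(find_ocr_variants("1920", sample))  # ['I920', '1920']
```

The search is case-sensitive and only handles one-for-one character swaps, so it will not catch words that the OCR software split, merged, or dropped entirely.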

The Internet Archive presently digitizes more than 1,000 books a day and has more than 11 million “texts” (books and other printed material) available online. There is also a collection of 300,000 modern eBooks that may be borrowed or downloaded by the print-disabled at OpenLibrary.org. If you do not find what you want today, come back in a few months and try again. It may have been added by then.

Of course, the Internet Archive is not the only source of digitized books. Google Books is another well-known source. Operated by a well-funded commercial company, Google Books gets most of the publicity. However, with commercial ownership come proprietary business methods. Google Books has almost stopped adding new books to its collection; new additions have slowed to a trickle. All books previously digitized remain available online at http://books.google.com.

The Internet Archive also provides most books in HTML, EPUB, Kindle, DAISY, and DjVu formats in addition to TXT and PDF. As a result, the books and other documents can be read on almost any ebook reader as well as on computers, iPads, and most cell phones that have web browsers.

The Internet Archive does not yet have all the genealogy books ever published. In fact, nobody seems to know how many genealogy books are available this way. Even the folks at The Internet Archive don’t know. They simply scan everything they can find and don’t worry much about classifying the topics. However, it is known that the Archive’s ever-expanding collection of genealogy resources includes items from the Allen County Public Library Genealogy Center in Fort Wayne, Indiana; Robarts Library at the University of Toronto; the University of Illinois Urbana-Champaign Library; Brigham Young University in Provo, Utah; the National Library of Scotland; the Indianapolis City Library’s Indianapolis City Directory and Yearbooks Collection; the Leo Baeck Institute Archives of German-speaking Jewry; and the Boston Public Library.

Resources include, among many other things, books on surname origins, vital statistics, parish records, census records, passenger lists of vessels, and other historical and biographical documents, as well as individual volumes contributed by thousands of users from around the world. Most of the genealogy books are published in English, but there are numerous exceptions.

I searched the “Texts” section of The Internet Archive for the word “genealogy” and found 128,198 results. By searching in “Texts,” I was able to ignore the “hits” found on the Internet and in other sources. That’s not a definitive count, as the word “genealogy” obviously will appear more than once in many books. However, it does provide a rough idea of the popularity of the word in The Internet Archive’s books, magazines, and other texts. Whatever the true number, there must be thousands of genealogy books available today on The Internet Archive, and the number is growing rapidly.

The Internet Archive also has scanned and digitized the U.S. Census records from 1790 through 1930. Unlike the commercial providers of census data, the versions provided by The Internet Archive have not been indexed. They are useful only if you already know where to look for your ancestors. Small towns can easily be searched one page at a time while cities probably are best searched if you already know the enumeration districts involved.

Also unlike the commercial providers of census data, the census information on The Internet Archive is available free of charge to everyone.

In fact, everything on The Internet Archive is free; there is never a charge for anything. As a non-profit, however, the organization does accept donations, which are tax-deductible for Americans.

In a casual search, I found all sorts of material of interest to genealogists on The Internet Archive, including these:

Compiled service records of soldiers who served in the American Army during the Revolutionary war

Polk Lafayette, Indiana, city directory (Volume yr. 1891)

Preakness and the Preakness Reformed church, Passaic County, New Jersey: a history, 1695-1902, with genealogical notes, the records of the church and tombstone inscriptions

The history of ancient Wethersfield, Connecticut: comprising the present towns of Wethersfield, Rocky Hill, and Newington, and of Glastonbury prior to its incorporation in 1693: from date of earliest settlement until the present time (Volume 1, pt. 2)

Ziegler Genealogy by John A. M. Ziegler

Genealogy of the Beaudry Family of Northern Ontario and Relatives

Morse genealogy by Morse & Leavett

Genealogy of the Spotswood family in Scotland and Virginia

The Lenher family: a genealogy by Sarah Marion Lenher

The above is only a tiny fraction of the many books available free of charge on The Internet Archive.

The Internet Archive isn’t perfect, but it does provide a great resource for genealogists, historians, and others. If you are looking for information about your family tree, I’d suggest that you check out The Internet Archive at http://www.archive.org. You can read about the Internet Archive’s genealogy collection at https://archive.org/details/genealogy.

If you are interested in The Harmon Genealogy, comprising all branches in New England, go to https://archive.org/details/harmongenealogyc00harm.

Caution: This book is great but, like most genealogy books, it does contain a few errors. Author Artemas C. Harmon did a very good job of research, but his work was not perfect.

