When the Library Is Online: The Future of Knowing

Will libraries and bookstores survive the Internet age? Richard Fernandez examines what the efforts by Google and Microsoft to digitize the world's collection of books mean for readers and whether it is something to be celebrated or feared.

November 27, 2007 - by Richard Fernandez

Anthony Grafton, discussing “Future Reading” in the New Yorker, describes efforts by Google and Microsoft to digitize the world’s collection of books with a mixture of approval and horror. Grafton loves knowledge, and his approval comes from the realization that the Internet can reach an audience even greater than that of his beloved New York Public Library, which “admitted everyone … not only presentable young scholars … but also many wild figures who haunted the reading rooms.” But he regards with horror the idea that digitized books can ever replace “real” ones. For quality research, Grafton believes, one still has to return to books and paper. Online research might work for amateurs, but serious scholars and professionals need books.

Sit in your local coffee shop, and your laptop can tell you a lot. If you want deeper, more local knowledge, you will have to take the narrower path that leads between the lions and up the stairs. [a reference to the lion statues at the New York Public Library] … Duguid describes watching a fellow-historian systematically sniff two-hundred-and-fifty-year-old letters in an archive. By detecting the smell of vinegar – which had been sprinkled, in the eighteenth century, on letters from towns struck by cholera, in the hope of disinfecting them – he could trace the history of disease outbreaks. Historians of the book – a new and growing tribe – read books as scouts read trails.

Grafton’s devotion to books, and to the continuation of the “old and reassuring story: bookish boy or girl enters the cool, dark library and discovers loneliness and freedom,” obscures the basic fact that today and for all time to come, online information will dominate books. The numbers are unequivocal. The Library of Congress, which is larger than the New York Public Library, contains about 11 terabytes of information. That’s a huge amount. Yet it is dwarfed by the amount of information already accessible online through search engines, about 167 terabytes. This is about fifteen times as much as the Library of Congress holds, a figure which even Grafton admits is impressive. But the information available through search engines like Google in turn shrinks to a dot compared to the material for which no ready directory exists: the so-called Deep Web. The Deep Web is that part of the Internet for which there is no street map. The University of California, Berkeley estimates the Deep Web to be 91,000 terabytes in size: 545 times larger than all the material indexed by search engines and roughly 8,270 times larger than the holdings of the Library of Congress. The difference between paper and online holdings is the difference between a small chicken and a fully grown Tyrannosaurus rex. And if Google, Microsoft and others ever finish their plan to migrate books online, it will simply mean that the T-Rex has eaten the chicken. Grafton describes the digital migration efforts that are already underway.

Google and Microsoft are flanked by other big efforts. Some are largely philanthropic, like the old standby Project Gutenberg, which provides hand-keyboarded texts of English and American classics, and the distinctive Million Book Project, founded by Raj Reddy, at Carnegie Mellon University. Reddy works with partners around the world to provide, among other things, online texts in many languages for which character-recognition software is not yet available. There are hundreds of smaller efforts in specialized fields – like Perseus, a site, based at Tufts, specializing in Greek and Latin – and new commercial enterprises like Alexander Street Press, which offers libraries beautifully produced collections of everything from Harper’s Weekly to the letters and diaries of American immigrants. It has become impossible for ordinary scholars to keep abreast of what’s available in this age of electronic abundance – though D-Lib Magazine, an online publication, helps by highlighting new digital sources and collections, rather as material libraries used to advertise their acquisition of a writer’s papers or a collection of books with fine bindings.
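
The size comparison made earlier is simple division, and it can be checked in a few lines. The sketch below is only an illustration in Python; it uses the three terabyte figures quoted in this article, and the constant and function names are made up for the example. By these figures the Deep Web comes out at roughly 8,270 times the size of the Library of Congress.

```python
# Sizes in terabytes, as quoted in the article (not independently verified).
LIBRARY_OF_CONGRESS_TB = 11
SEARCHABLE_WEB_TB = 167
DEEP_WEB_TB = 91_000


def times_larger(bigger_tb, smaller_tb):
    """Return how many times larger the first size is than the second."""
    return bigger_tb / smaller_tb


# Hypothetical variable names, chosen for this illustration only.
web_vs_library = times_larger(SEARCHABLE_WEB_TB, LIBRARY_OF_CONGRESS_TB)
deep_vs_web = times_larger(DEEP_WEB_TB, SEARCHABLE_WEB_TB)
deep_vs_library = times_larger(DEEP_WEB_TB, LIBRARY_OF_CONGRESS_TB)

print(f"Searchable web vs. Library of Congress: {web_vs_library:.1f}x")
print(f"Deep Web vs. searchable web: {deep_vs_web:.1f}x")
print(f"Deep Web vs. Library of Congress: {deep_vs_library:.1f}x")
```

The ratios are only as good as the estimates behind them, but the orders of magnitude are what matter to the argument.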

Books are great, but digital storage is the wave of the future. Yet we cannot see the wave in its entirety. We don’t know where most of that avalanche of knowledge is or how to find it easily. Most information on the Web is locked up in databases and cannot be “spidered,” a term used to describe the software indexing of Internet material. For example, web pages generated from databases only “exist” when a query is run, like online telephone directories which do not have a separate page for every person in the directory and only create a page in response to a request. Database-generated pages have a transient existence and cannot easily be indexed. Password-protected websites, like locked apartments or private telephone numbers, defy our attempts to see within them. Much information lives on the Deep Web. It is there, but we cannot see it without taking special steps.
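
The telephone-directory example can be made concrete with a toy model. The sketch below, in Python, is only an illustration: the site map, the directory entries, and the function names are all hypothetical, and a real crawler is far more involved. It shows a link-following spider reaching every static page while never seeing the pages that exist only in response to a query.

```python
from collections import deque

# Hypothetical toy site: each static page and the links it contains.
STATIC_PAGES = {
    "/": ["/about", "/directory"],
    "/about": ["/"],
    "/directory": [],  # a search form only; no links to individual entries
}

# Hypothetical database behind the directory's search form.
DIRECTORY_DB = {"alice": "555-0100", "bob": "555-0101"}


def directory_page(name):
    """Build a result page on demand; it exists only while a query runs."""
    key = name.lower()
    number = DIRECTORY_DB.get(key)
    if number is None:
        return None
    return f"/directory?name={key} -> {number}"


def spider(start):
    """Breadth-first crawl that follows links and returns every URL reached."""
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for link in STATIC_PAGES.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


indexed = spider("/")
print(sorted(indexed))          # only the three static pages
print(directory_page("Alice"))  # reachable only by asking the right question
```

The spider finds the directory's front page but none of its entries, because no link leads to them. That is the sense in which such material is present yet unmapped.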

The immense size of the unindexed Internet has motivated consultants and online resources to offer help in finding information in the Deep Web, much as traditional librarians guided scholars through the stacks in days gone by.

Modern librarians can spend as much time helping readers with online searches as they do finding books or paper documents. But despite their best efforts, researchers can never be certain whether something they are looking for has been missed. There is no single map that shows where everything is. The explosive growth of online information may in fact outstrip our ability to catalogue it. The cost of indexing means the picture of what we know will always be out of date or incomplete. Often it will be both. Like some vast terra incognita, the undiscovered country of human knowledge expands constantly, defying even attempts to survey it.

Anthony Grafton described the amazement which author Alfred Kazin felt on entering the New York Public Library. It was a 1938 Aladdin’s Cave which contained “anything I had heard of and wanted to see.” Its books and publications let Kazin ramble through “lonely small towns, prairie villages, isolated colleges, dusty law offices, national magazines, and provincial ‘academies’.” But like any ramble it would be a hit-or-miss affair, the things he encountered owing something to chance. Despite today’s technical advances, 21st-century researchers are in principle no further ahead than Kazin. The lack of a roadmap means we may often miss what we are looking for, and just as frequently find something even better by accident. Knowledge has always defied efforts at easy unification. And if Google’s digitizing efforts don’t produce what Grafton hoped would be a “universal library” and instead become “a patchwork of interfaces and databases,” there is no help for it. We too will have to ramble. But the sheer growth of online information means that the future of reading will be reading online.

Through the unknown, unremembered gate
When the last of earth left to discover
Is that which was the beginning;
At the source of the longest river
The voice of the hidden waterfall
And the children in the apple-tree
Not known, because not looked for

- “Little Gidding,” T. S. Eliot

Richard Fernandez is PJM Sydney editor; he also writes at the Belmont Club.

5 Comments

Windingstad:

Yeah, online libraries are definitely the future, although I love venturing through the halls of the Oslo library as well.

But it’s just very convenient to look up whatever I need for my next paper on questia.com. Tons of academic literature right there at my disposal with just a click.

Nov 27, 2007 - 1:46 am

buddy larsen:

The TS Eliot verse suddenly lighting up an essay on digital storage–what a delight–and what great writing! By Eliot and Fernandez both.

Nov 27, 2007 - 9:16 am

Ted:

Isn’t the probative status of images of legal documents an issue? Hack and Photoshop the deed and I own Rockefeller Center!

What of textual accuracy–Stalin and Big Brother sanitized history. Are not the several digital repositories one must postulate, and their backups, susceptible to “hacking”? Wikipedia springs to mind.

I am told there exists a document in Moscow which states clearly that the United States of America only leased Alaska. Which is the forgery? Their copy or our copy?

Your software / repository is hack-proof? AHAHAHAHAHAHA!

Regards,
Ted

Nov 27, 2007 - 9:49 am

mike:

“Digital migration” has been underway for a long time…copyright remains a significant issue in that process.

But don’t fall into the trap of ‘technological determinism’ here…just because something can be put online does not mean it should be.

After all, 80% + of the internet is porn.

Nov 27, 2007 - 11:46 am

victor:

An excellent article that accurately summarizes certain developments that are occurring with online libraries. I should know because I am the founder of Bookyards ( http://www.bookyards.com ), a library that has been online for the past eight years and is one of the first to explore the commercial possibilities for libraries on the internet.

The only things I can add to the article are a historical perspective and some trends that your article does not explain.

When we first started there were only a handful of libraries, with a limited readership. Today, there are over 800 legal libraries (the list is at http://www.bookyards.com/categories.html?type=links&category_id=1522 ), and approximately 150 online “pirate” libraries (online libraries that are providing books without respecting copyright law). Today’s overall readership for online books runs in the millions.

The content that was available 8 years ago was approximately 30,000 online books. Today, I would estimate that there are over a million English books online.

But the problems with today’s online libraries are the following:
(1) The content that is being accumulated, while overwhelming, is primarily material published before 1927 and is out of date. It is not only out of date but, and I am being polite, quite useless and not worth anyone’s time to look at.
(2) The formats that are being used are not user friendly. PDF is preferred, but other formats such as DjVu have now been given priority.
(3) For most of these sites (Google and Universal Library) you cannot download the books.
(4) Copyright issues and legal problems have not been resolved, and it will be years before they are. This limits everything.

If this were the whole story, bookstores and libraries would be safe. But the threat to libraries and bookstores will come not from the projects outlined in your article but from offshore online libraries that do not respect copyright laws, have modern books and content (some of them have present-day bestsellers), are user friendly (they use PDF), and allow downloads (through RapidShare and other file-transfer services). Gigapedia ( http://gigapedia.org/ ) is a perfect example of this trend: a Russian online library with thousands of English books that are copyrighted in the West but are available for free on its site.

As Napster was for music, and as YouTube and p2p networks were for videos and movies, it is these “pirate” websites that will have a greater impact on books, libraries, and the publishers who produce them than online libraries such as Google, Universal Library, or Project Gutenberg.

Nov 29, 2007 - 10:06 pm
