19 posts categorized "Web/Tech"

December 26, 2011

Announcing TL-2 Online

The Smithsonian Libraries is pleased to announce that Taxonomic Literature II, or TL-2, is now available online on the Libraries' website. We are calling this TL-2 Online.

What is TL-2?

TL-2 is an essential tool for botany research that includes botanists and their publications from 1753 to the present. Comprising fifteen volumes, seven original and eight supplemental, TL-2 is organized alphabetically by author and includes some biographical information about each author. The main content of each author entry is the list of publications that he or she has written. TL-2 was constructed such that each author is assigned a unique abbreviation and each publication a unique number. There are nearly 10,000 authors and over 37,000 publications in TL-2, and the entire set of data is cross-referenced in the two indexes in each of the fifteen volumes.
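To make the "unique abbreviation, unique number, cross-referenced" idea concrete, here is a minimal Python sketch of how such a structure could be modeled. It is an illustration only, not the data model used by TL-2 Online: the class names, field names, and the sample author and publications below are all invented placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Publication:
    number: int         # unique publication number (placeholder values below)
    title: str
    author_abbrev: str  # abbreviation of the author who wrote it


@dataclass
class Author:
    abbrev: str         # unique author abbreviation (placeholder values below)
    name: str
    publication_numbers: list = field(default_factory=list)


class Catalog:
    """Two lookup tables that cross-reference each other, like the two indexes."""

    def __init__(self):
        self.authors = {}       # abbreviation -> Author
        self.publications = {}  # number -> Publication

    def add_author(self, abbrev, name):
        if abbrev in self.authors:
            raise ValueError(f"duplicate author abbreviation: {abbrev}")
        self.authors[abbrev] = Author(abbrev, name)

    def add_publication(self, number, title, author_abbrev):
        if number in self.publications:
            raise ValueError(f"duplicate publication number: {number}")
        author = self.authors[author_abbrev]  # KeyError if the author is unknown
        self.publications[number] = Publication(number, title, author_abbrev)
        author.publication_numbers.append(number)

    def publications_by(self, abbrev):
        """Author index: abbreviation -> that author's publications."""
        return [self.publications[n] for n in self.authors[abbrev].publication_numbers]

    def author_of(self, number):
        """Publication index: number -> the author who wrote it."""
        return self.authors[self.publications[number].author_abbrev]


# Invented sample data, not taken from TL-2.
catalog = Catalog()
catalog.add_author("Exa.", "Example Author")
catalog.add_publication(101, "Flora of Somewhere", "Exa.")
catalog.add_publication(102, "Mosses Illustrated", "Exa.")
```

Because both identifiers are unique, either one can serve as a lookup key: `catalog.publications_by("Exa.")` walks from an author to the publications, and `catalog.author_of(102)` walks back the other way.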

To put it simply, TL-2 is a database published in the form of a book. Now that the Libraries, with generous permission from the publisher, has digitized the content and placed it online, the door has been opened to using the data from TL-2 in new ways, some of which we haven't even imagined yet.

What can I do with TL-2?

Currently the website allows you to search TL-2 Online either via a simple keyword search or via a more advanced search across several fields, which supports the logical operators AND and OR. Additionally, all volumes of TL-2 may be read online using a simple page-turning application. Finally, in addition to the scan of the page, all pages that contain searchable data are presented with the corrected OCR text that was created during the digitization process.
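For readers curious about what a fielded search with AND and OR means in practice, the following is a small Python sketch. It does not reflect how TL-2 Online is implemented; the function names, field names, and sample records are assumptions made up for the example.

```python
def matches(record, field_name, term):
    """Case-insensitive substring match of term against one field of a record."""
    return term.lower() in record.get(field_name, "").lower()


def advanced_search(records, criteria, operator="AND"):
    """Return the records that satisfy the (field, term) criteria.

    With "AND" every criterion must match; with "OR" at least one must match.
    """
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    combine = all if operator == "AND" else any
    return [
        record
        for record in records
        if combine(matches(record, f, t) for f, t in criteria)
    ]


# Invented sample records, not taken from TL-2.
records = [
    {"author": "Example Author", "title": "Flora of Somewhere"},
    {"author": "Another Author", "title": "Flora of Elsewhere"},
    {"author": "Example Author", "title": "Mosses Illustrated"},
]

criteria = [("author", "example"), ("title", "flora")]
both = advanced_search(records, criteria, "AND")    # author AND title must match
either = advanced_search(records, criteria, "OR")   # author OR title may match
```

With these sample records, the AND search narrows the results to the single record matching both fields, while the OR search returns every record matching at least one of them.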

Our goal was to construct TL-2 Online using modern web development techniques to minimize page refreshes in order to offer a better experience for readers. The result is that viewing search results and reading the volumes online is very, very fast. 

The data used to create TL-2 Online is also available for download and use by other people and organizations. The download file contains the full corrected text as well as the XML version of the parsed data. Because we continue to work on the data and plan to do additional parsing, this data is subject to change; a version number and last-modified date are provided for reference.

What else do you have planned?

We're very glad you asked that! As part of the Libraries' website redesign, TL-2 Online will be one of the first components of the new Digital Library to be presented entirely as Linked Open Data (LOD). Overall, LOD will be integral to the entire Digital Library, but TL-2 will be the first data set that we make available in that manner. The Smithsonian Libraries aims to become the permanent home for TL-2 and the authority for TL-2 Linked Open Data identifiers. Although LOD is not directly visible to visitors to our site, making it available allows other computers and software to more easily reuse and query the data without extensive programming.
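As a rough illustration of why Linked Open Data is easy for software to query, the Python sketch below stores a few statements as subject-predicate-object triples and looks them up by pattern. Everything in it is an assumption for the sake of the example: the `http://example.org/tl2/` identifiers are placeholders rather than real TL-2 identifiers, the choice of the `foaf:name`, `dcterms:title`, and `dcterms:creator` properties is ours, and the data is invented.

```python
# Placeholder namespace; real TL-2 identifiers may look entirely different.
BASE = "http://example.org/tl2/"

# Each statement is a (subject, predicate, object) triple. Invented data.
triples = {
    (BASE + "author/Exa", "foaf:name", "Example Author"),
    (BASE + "publication/101", "dcterms:title", "Flora of Somewhere"),
    (BASE + "publication/101", "dcterms:creator", BASE + "author/Exa"),
    (BASE + "publication/102", "dcterms:title", "Mosses Illustrated"),
    (BASE + "publication/102", "dcterms:creator", BASE + "author/Exa"),
}


def query(store, subject=None, predicate=None, obj=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# "Which publications were created by this author?"
by_author = query(triples, predicate="dcterms:creator", obj=BASE + "author/Exa")
publication_ids = [subject for subject, _, _ in by_author]
```

Because every author and publication has its own identifier, a question such as "which publications did this author create?" becomes a simple pattern match, and other data sets can refer to the same identifiers without any knowledge of how the site itself is built.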

We also plan to continue parsing the data inside TL-2 in order to provide new avenues for using and analyzing the data. Expect the TL-2 Online website to expand to include new downloadable data and new features when the time comes. For example, many botanists contributed specimens to herbaria (libraries of plant specimens) around the world. We would like to present that data in a searchable fashion on the site when the data is ready.

Lastly, a note: Since TL-2 Online was digitized from a printed work, there are bound to be errors in the OCR and places where the parsing was not quite accurate. Although we have eliminated many of these, some may remain. Please be patient and feel free to contact us if you'd like to bring anything to our attention.

We hope that botanists around the world continue to use TL-2 and that they find our new online offering even easier to use than the printed work. 

December 14, 2011

Free eBooks for Your New eReader

This is the second post in our new series, Library Hacks, where we take a look at cool and interesting online resources from the Smithsonian Libraries and the cyberworld at large.

Are you giving or getting an eReader this holiday season? Maybe you are one of the millions already using smartphones or tablets to access just about everything online. In my humble librarian opinion, one of the greatest uses for such devices is free downloadable books! Of course, you can and should check your local public library to find ebooks to borrow, but there are lots of websites offering access to ebooks, too. However, not all such sites give free access! Many, like Amazon.com, offer ebooks for sale only. So I thought I'd highlight some of the biggest and best sites for finding free ebooks -- which won't put an extra squeeze on your holiday budget.

Project Gutenberg

Project Gutenberg was the first provider of free full-text ebooks. Its founder Michael S. Hart, who passed away earlier this year, invented ebooks in 1971, so this is really the granddaddy of free downloadable book sites. It currently offers access to over 36,000 titles, but that number increases to over 100,000 ebooks when you include Project Gutenberg’s partners around the world. These books were all previously released by established publishers, which means you won’t just find a bunch of fan fiction self-published by some guy obsessed with Batman. Also, all of the ebooks uploaded by Project Gutenberg have been diligently proofread by volunteers to limit typos/errors.

Project Gutenberg offers a simple book search feature to search by title, author or subject. You can also browse the bookshelves if you’re not sure what you’re looking for. It’s fun to scan all the topics covered -- everything from children’s picture books (many with full-color illustrations) to cookery (lots of recipes!) to German language books (das ist gut!). Keep in mind -- these ebooks are available for free because their U.S. copyright has expired. But this means you won’t be able to access the current New York Times bestsellers here.

No fee or registration is required, and these ebooks can be downloaded to your PC, eReader, tablet, most smartphones, and even some MP3 players and gaming systems. Easy-to-follow instructions are available to help you figure it all out.

 

Open Library

The goal of Open Library, an initiative of the Internet Archive, is stated simply: One web page for every book ever published. But this definitely is not a simple task! So far the site has over 20 million edition records, and new records are constantly being added. This is truly an open project, with information being contributed by a wide variety of libraries. Individual people also are encouraged to participate by adding and/or fixing book records, writing book descriptions, adding book cover images, or editing nearly any page on the site.

Open Library offers direct access to over 1 million free ebooks in a variety of formats (PDF, plain text, ePub, DjVu, MOBI, DAISY, and "Send to Kindle"). And it's easy to use -- a simple search box is offered at the top of each page on the Open Library site. Right below that, you will find a small check box to limit your search to only ebooks. You also can browse on the Accessible Books page to see what is available for free. Open Library even has its own Lending Library with over 10,000 ebook titles available to borrow, one copy at a time for two weeks. These include mainly 20th century works which might be hard to find elsewhere online for free. For example, I found The Razor's Edge by W. Somerset Maugham, a book I've been wanting to read, which I was able to borrow, even though it's not offered for free download here or on other sites.

Another great service from Open Library is the availability of books in Digital Accessible Information SYstem (DAISY) format. DAISY presents written material in an audible format for people with print disabilities such as blindness, impaired vision, and dyslexia. Details on accessing DAISY books are provided in Open Library's FAQ section, and a list of all the devices that can read DAISY files is available at daisy.org.

 

HathiTrust Digital Library

This is one of the lesser-known ebook sources, but it's particularly valuable for research. The mission of the HathiTrust is "to contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge." It is a partnership of over 60 major research institutions and libraries worldwide, and the HathiTrust Digital Library brings together their collections to be preserved in digital form for posterity. In fact, the name "Hathi" comes from the Hindi word for elephant, an animal known for its long memory.

It is important to realize that currently the main focus for this organization is preservation, not necessarily free public access. So while the HathiTrust has digitized nearly 10 million volumes (including both books and journals), only about a quarter of them are available free online -- a total of about 2.5 million volumes, mostly ones in the public domain. Also, these items are offered only in PDF format, which is a less eReader-friendly format than some of those available at the other ebook sites mentioned in this post.

The search options allow you to do a catalog search by title, author, subject, etc., and you can check a box to limit it to full view only (meaning complete books you can read online or download in PDF). A handy feature for doing research is the full-text search option, which allows you to look for terms within the full-text of all 10 million+ volumes that the HathiTrust has digitized. While you can't access the full-text of them all, you can determine if your search term shows up only once or multiple times in the volume, which can help you decide if it might be a resource worth tracking down for your research.

 

Google Books

Did you know that digitizing books was part of the driving force behind the creation of Google? Back in 1996, Google co-founders Sergey Brin and Larry Page were computer science graduate students working on a project about digital libraries and the use of a “web crawler” to search through the contents of electronic books. Google certainly grew way beyond this idea, but it wasn’t forgotten -- it eventually became Google Books. The ultimate goal for Google Books is to scan all the books in the world, allowing people to easily search for and find the books they need. While this goal is still far off, Google Books reports that it has already scanned over 15 million books in over 400 languages.

Now a caveat -- most of these books are not available in full-text for free. Where possible, Google Books does provide free access, mainly for books that are in the public domain because the copyright has expired, or those where the copyright holder has given permission for free access. Most of these scanned books give access to only part of the text, along with links to find libraries that hold physical copies of the book or sources that sell copies. Keep in mind that, unlike the other ebook providers included in this post which are nonprofits, Google is a profit-making venture. And there has been some debate about whether Google Books should be allowed to provide even limited access to books that are still protected by copyright.

That said, Google Books is still a good resource for finding books you are interested in. It lets you browse by broad subject areas, or you can use a simple search box to search for specific words. Like the HathiTrust, this site also offers the capability of looking for terms within the full text of all its scanned books, even if the entire text is not available for free download. Your search will take you to where your term appears within the book, providing access to a limited section surrounding that term (the amount of surrounding material you can see will vary, depending upon the copyright holder's agreement with Google). This can help you determine if the book seems to be relevant to your subject and may be worth trying to find in a library or for sale if it's not available to download.

Google Books also offers both iPhone/iPad and Android apps that sync automatically with your own account on the site, as well as different formats for use with eReaders, making it even easier to take ebooks with you.

 

Have you used any of these sites to download books? If so, what did you find there? Where else have you gone to get your ebook fix? Feel free to share your experiences in the comments section below. And if you're giving an eReader as a gift, be sure to let the recipient know about these free ebook sites to get the most out of their new gadget!

Happy Holidays from the Library Hacks!

November 29, 2011

The Future of Information Alliance

The Smithsonian, along with nine other organizations, is a founding partner of the Future of Information Alliance. FIA, hosted by the University of Maryland, is described by co-director Ira Chinoy as a "thinktank without walls", interested in fostering interdisciplinary discussions of the role of information in our lives.

 

FIA stickers from Launch Week.

 

From November 14th-18th, the University of Maryland held a weeklong launch for the FIA, with five brainstorming discussion sessions. I was happy to be able to attend two of these sessions, "Visiting Future-ists" and "Creativity and Culture". Both sessions featured "future-ists" Dan Russell, director of user happiness for Google; Mary Czerwinski, from Microsoft's VIBE group; and Abdur Chowdhury, former chief scientist at Twitter. In the first session, the future-ists described their own work and the opportunities and challenges they saw in information. In the second session, the future-ists were joined by several University of Maryland faculty members to discuss the role that creativity could play in innovation and information. Other sessions over the week were "Transparency and Boundaries" and "Science in Our Lives".

 

Dan Russell, Mary Czerwinski, Abdur Chowdhury. Photo by Evan Golub.

Courtesy of FIAumd on Flickr.

 

Both of the sessions that I attended included lively discussion. In the "Visiting Future-ists" meeting, Dan Russell noted that only 10% of English-speaking web users were aware of the Ctrl+F feature used to search a document or web page. This worried many people in the audience. When someone asked the panel how to bridge a tech divide like this, Mary Czerwinski posited that the problem isn't teaching people; the problem is that Ctrl+F is a poor user interface and isn't intuitive. Another interesting quote came from Abdur Chowdhury, when an audience member asked what one should do if he or she realizes the academic institution wasn't a good fit for him or her. Chowdhury responded, "It's called a 'library'".

Besides creating interesting discussion on the future of information, FIA hopes to announce a seed grant program in the next few months. Winning projects will be characterized by an interdisciplinary approach to solving real information problems. I look forward to seeing what these innovative projects may be!

October 20, 2011

Notes from the LITA National Forum: Linked Open Data

LITA Forum Image

On September 30, two of the Smithsonian Institution Libraries' staff attended the American Library Association's LITA (Library and Information Technology Association) National Forum. The three-day conference was titled "Rivers of Data, Currents of Change". Although it was not explicitly defined as a theme, there was a common thread of conversation surrounding Linked Open Data throughout the conference.

For this reason, the presentation given by the Smithsonian Libraries' digital projects librarian Keri Thompson and lead developer Joel Richard, along with Trish Rose-Sandler of the Missouri Botanical Garden, was well-received. Titled "Building the New Open Linked Library: Theory and Practice," the talk gave a high-level overview of the redesign of the Libraries' website, a brief summary of Linked Data, how the Libraries' website redesign centers around the concept of Linked Open Data, and some of the unique things that happen when open data is made available on the web, specifically with the Biodiversity Heritage Library (BHL).

Keri Thompson gave a summary of where our website is today and a very concise overview of the types of content we have. She also gave a brief introduction to Linked Open Data to help get the audience up to speed, since only about half were familiar with the concept.

Joel Richard talked about implementing Linked Data in Drupal 7 and gave one in-depth example of the data we are planning to put online. The data set, Taxonomic Literature 2 (or TL-2), is a database of botanists, their publications, and detailed information about their contributions to botany, and is being digitized by the Libraries. He also discussed how we are mapping the TL-2 data to the Linked Open Data model, and challenges that we foresee in this development.

Trish Rose-Sandler discussed and presented examples of the new and unique types of things that people can do when we make our data available on the web. These included repurposing BHL content as well as new visualizations of data that were impossible before BHL. 

Overall, the presentation was well-attended, with between 50 and 60 people in the audience and a number of good questions posed after the presentation and in subsequent days at the conference. Fortunately, the Libraries' talk occurred early in the conference, and it was good to see that Drupal and Linked Data were mentioned in a number of talks throughout the conference, including the closing keynote presentation by Barbara McGlamery, a taxonomist at Martha Stewart Living Omnimedia. The Libraries' staff plan to attend the 2012 LITA National Forum in Columbus, OH to follow up on our experiences as we build the Digital Library in the upcoming months.

View the PowerPoint slides of the presentation from the ALA Connect Site.

 
