the lowest common denominator". It can be read, written, copied and printed by any simple text editor or word processor on every computer in the world. It is the only format compatible with 99% of hardware and software. It can be used as it is or to create versions in many other formats. It will still be in use when other formats are obsolete (or are already obsolete, like the formats of a few short-lived reading devices launched between 1999 and 2003). It is the assurance that the collections will never become obsolete, and will survive future technological changes. The goal is to preserve the texts not only over decades but over centuries. No other standard is as widely used as ASCII right now, not even Unicode, a "universal" encoding system created in 1991.

Project Gutenberg also publishes eBooks in well-known formats like HTML, XML or RTF. There are Unicode files too. Any other format provided by volunteers (PDF, LIT, TeX and many others) is usually accepted, as long as the volunteers also supply an ASCII version where possible. Large-scale conversion into other formats, however, is handed over to other organizations. Examples include Blackmask Online, which uses Project Gutenberg's collections to offer thousands of free eBooks in eight different formats based on the Open eBook (OeB) format; Manybooks.net, which converts Project Gutenberg's eBooks into formats readable on PDAs; and Bookshare.org, the main digital library for the visually impaired community in the US, which converts books from Project Gutenberg into Braille format and DAISY (Digital Accessible Information System) format.

What exactly is entailed once copyright clearance is received? Digitization is done by scanning the book page after page to get "image" files. Then volunteers run OCR (Optical Character Recognition) software to convert the "image" files into text files. Then each text file is proofread (i.e. re-read and corrected) by comparing it to the "image" file or the original page of the print version.
A good OCR package leaves an average of 10 mistakes per page, and there are many more if the quality of the scanner or the OCR package is poor. The book is proofread twice on the computer screen by two different people, who make any necessary corrections. When the original is in poor condition, as with very old books, it is keyed in manually, word by word. Some volunteers prefer to type short texts, or works they particularly like. But most books are sc