the lowest common denominator". It can be read, written, copied and printed by any simple text editor or word processor on every computer in the world. It is the only format compatible with 99% of hardware and software. It can be used as it is or to create versions in many other formats. It will still be in use when other formats have become obsolete (or are already obsolete, like the formats of a few short-lived reading devices launched between 1999 and 2003). It is the assurance that collections will never become obsolete and will survive future technological changes. The goal is to preserve the texts not only over decades but over centuries. No other standard is as widely used as ASCII right now, not even Unicode, a "universal" encoding system created in 1991.

Project Gutenberg also publishes eBooks in well-known formats like HTML, XML or RTF. There are Unicode files too. Any other format provided by volunteers (PDF, LIT, TeX and many others) is usually accepted, as long as they also supply an ASCII version where possible. But large-scale conversion into other formats is left to other organizations. Examples include Blackmask Online, which uses Project Gutenberg's collections to offer thousands of free eBooks in eight different formats based on the Open eBook (OeB) format; Manybooks.net, which converts Project Gutenberg's eBooks into formats readable on PDAs; and Bookshare.org, the main digital library for the visually impaired community in the US, which converts books from Project Gutenberg into Braille format and DAISY (Digital Accessible Information System) format.

What exactly is entailed once copyright clearance is received? Digitization is done by scanning the book page by page to get "image" files. Volunteers then run OCR (Optical Character Recognition) software to convert the "image" files into text files. Each text file is then proofread (i.e. re-read and corrected) by comparing it to the "image" file or to the original page of the print version.
There is an average of 10 mistakes per page with a good OCR package, and many more if the quality of the scanner or the OCR package is poor. The book is proofread twice on the computer screen by two different people, who make any necessary corrections. When the original is in poor condition, as with very old books, it is keyed in manually, word by word. Some volunteers prefer to type short texts themselves, or works they particularly like. But most books are sc