anned, "OCRized" and proofread. Digitization in "text format" means a book can be copied, indexed, searched, analyzed and compared with other books. The content of a book can be searched with the "Find" function available in any browser and any software, without a specific search engine.
Project Gutenberg provides a "Nearly Full Text" search (on the first 100 K of each file) using Google, with a database updated approximately monthly. It also provides a search of book metadata (author, title, brief description, keywords) as a participant in Yahoo!'s Content Acquisition Program, with a database updated weekly. (Please see the bottom of the Online Book Catalog.) In the Advanced Search, several fields can be filled in: author, title, subject, language, category (any, audio book, music, pictures), LoCC (Library of Congress Catalog classification), filetype (text, PDF, HTML, XML, JPEG, etc.), and eText/eBook No. A "Full Text" field was recently added as an experimental feature.
The advantages of digitization in "text format" are numerous. It produces a smaller, more easily sent computer file, unlike digitization in "image format", which produces a bulky "photo" file. Unlike many other formats, text files are accessible for low-bandwidth use. They can be copied as often as needed to produce new digital or print versions for free. Typos pointed out after a text is released can be fixed at any time. Readers can change the font and size of characters, the margins or the number of lines per page. Visually impaired readers can increase the letter size. Blind readers can use speech synthesis (text-to-speech) software. All this is very difficult, if not impossible, with many other formats.
While the eBooks released are 99.9% accurate in the eyes of the general reader, the goal is not to create authoritative editions, or to argue with a picky reader over whether a certain sentence should have a colon instead of a semicolon between its clauses.
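To give a rough sense of what accuracy percentages of this kind mean in practice, here is a minimal arithmetic sketch. It rests on two assumptions that are not in the text: that accuracy is counted per character, and that a book holds about 500,000 characters. Both the function name and the book length are hypothetical and chosen only for illustration.

```python
def expected_errors(total_characters: int, accuracy_percent: float) -> int:
    """Estimate how many characters are wrong at a given accuracy level.

    Assumes accuracy is measured per character (an assumption made for
    this sketch, not something the text specifies).
    """
    error_fraction = (100 - accuracy_percent) / 100
    return round(total_characters * error_fraction)


# Hypothetical book length, for illustration only.
BOOK_CHARACTERS = 500_000

# Accuracy levels quoted in the text: raw OCR at best (99%), the
# general-reader target (99.9%) and proofread text (99.95%).
for accuracy in (99.0, 99.9, 99.95):
    errors = expected_errors(BOOK_CHARACTERS, accuracy)
    print(f"{accuracy}% accurate: about {errors} wrong characters")
```

Under these assumptions, the gap between 99% and 99.95% is the difference between roughly 5,000 and roughly 250 wrong characters in a single book.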
Project Gutenberg is convinced that proofreading by human beings is a very important step, and that this step makes all the difference. Using scanned books as is (converted to text format by OCR software with no proofreading) gives a much lower-quality result. After running OCR software, the text is 99% reliable in the best of cases. After proofreading, the text becomes 99.95% reliable (a high percentage which is also the standard at the Library of Congress). For this reason, Project Guten