read on a cell phone. Or Wattpad, a free service for reading and sharing stories on a mobile phone. Once downloaded to your phone, the service gives instant access to works from Project Gutenberg.

For a volunteer, the wisest choice is a book published before 1923. Copyright clearance must also be confirmed before work begins on any book, by sending a copy of the title page and verso page (even if the latter is blank) to Michael Hart. The pages should be sent as scans, to be uploaded on the website. People who cannot create scans can send photocopies by postal mail. The pages are then filed, either on paper or electronically, so that proof remains available in the future to demonstrate, if necessary, that the book is in the public domain under US law. Project Gutenberg doesn't release any book until its copyright status has been confirmed.

What exactly is entailed once copyright clearance is received? Digitization starts with scanning the book page by page to produce "image" files. Volunteers then run OCR (Optical Character Recognition) software to convert the "image" files into text files. Each text file is then proofread (i.e. re-read and corrected) by comparing it to the "image" file or to the original page of the print version. A good OCR package leaves an average of 10 mistakes per page, and many more if the quality of the scanner or of the OCR package is poor. The book is proofread twice on the computer screen by two different people, who make any necessary corrections.

When the original is in poor condition, as with very old books, it is keyed in manually, word by word. Some volunteers also prefer to type short texts, or works they particularly like. But most books are scanned, "OCRized" and proofread.
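
The proofreading step described above, in which OCR output is compared with the page and corrected, can be illustrated with a small sketch. It assumes that the raw OCR text and the proofread text of one page are available as plain strings. The function name `count_word_corrections` and the sample sentences are hypothetical, and Python's standard `difflib` module is used only for illustration; nothing in the text says that Project Gutenberg volunteers work this way.

```python
import difflib


def count_word_corrections(ocr_text: str, proofread_text: str) -> int:
    """Count word-level differences between raw OCR text and its proofread version."""
    ocr_words = ocr_text.split()
    fixed_words = proofread_text.split()
    matcher = difflib.SequenceMatcher(a=ocr_words, b=fixed_words, autojunk=False)
    corrections = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # A replaced, inserted or deleted run counts once per affected word.
            corrections += max(i2 - i1, j2 - j1)
    return corrections


# Hypothetical one-line "page": two words were misread by the OCR software.
ocr_page = "Tbe quick brown fox jumps ovcr the lazy dog"
proofread_page = "The quick brown fox jumps over the lazy dog"

print(count_word_corrections(ocr_page, proofread_page))  # prints 2
```

Counting whole words rather than characters is a design choice made here for simplicity: a word with several misread letters counts as a single mistake, which may or may not match how the "10 mistakes per page" average was measured.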
Unlike digitization in "image format", which consists only of scanning the pages, digitization in "text format" adds the OCR step. The assets of digitization in "text format" are numerous: a) the book can be copied, indexed, searched, analyzed and compared with other books; b) the content of the book can be searched with the "Find" button available in any browser and any software, without a specific search engine; c) it makes a smaller and more easily sent computer file, whereas digitization in "image format" produces a bulky "photo" file. Contrary to other formats, the files are accessible for low-bandw