read on a cell phone. Or Wattpad, a free service for reading and sharing stories on a mobile phone. Once it is downloaded to your phone, the service gives instant access to works from Project Gutenberg.
For a volunteer, the wisest choice is a book published before 1923. Copyright clearance must also be confirmed before work on any book begins, by sending a copy of the title page and verso page (even if the latter is blank) to Michael Hart. The pages should preferably be sent as scans, to be uploaded to the website; volunteers who cannot create scans may send photocopies by postal mail. The pages are then filed, on paper or electronically, so that the proof will be available in the future to demonstrate, if necessary, that the book is in the public domain under US law. Project Gutenberg does not release any book until its copyright status has been confirmed.
What exactly happens once copyright clearance is received? Digitization begins with scanning the book page by page to produce "image" files. Volunteers then run OCR (optical character recognition) software to convert the "image" files into text files. Each text file is then proofread (i.e. re-read and corrected) by comparing it with the "image" file or with the original printed page. A good OCR package makes an average of 10 mistakes per page, and many more if the scanner or the OCR package is of poor quality. The book is proofread twice on the computer screen, by two different people, who make any necessary corrections. When the original is in poor condition, as with very old books, it is keyed in manually, word by word. Some volunteers also prefer to type short texts, or works they particularly like. But most books are scanned, "OCRized" and proofread.
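Proofreading is, at heart, a comparison between two versions of the same page. The sketch below assumes that the raw OCR output and a proofread version of a page are both available as plain text, and uses Python's standard `difflib` module to list the word-level corrections between them. `difflib` is a stand-in chosen for illustration, not a tool the Project Gutenberg workflow is stated to use; the function name `ocr_corrections` and the sample sentence are hypothetical.

```python
import difflib


def ocr_corrections(ocr_text: str, proofread_text: str) -> list[tuple[str, str]]:
    """Return (ocr_fragment, corrected_fragment) pairs where the two versions differ.

    Hypothetical helper for illustration. Both texts are split on whitespace,
    so differences in spacing alone are ignored.
    """
    ocr_words = ocr_text.split()
    fixed_words = proofread_text.split()
    # autojunk=False keeps frequent words (e.g. "the") from being treated as junk.
    matcher = difflib.SequenceMatcher(a=ocr_words, b=fixed_words, autojunk=False)
    corrections = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete' or 'insert'
            corrections.append((" ".join(ocr_words[i1:i2]), " ".join(fixed_words[j1:j2])))
    return corrections


# Hypothetical sample: an OCR line with two misread words, and its proofread version.
ocr_page = "Tbe quick brown fox jumps ovcr the lazy dog."
proofread_page = "The quick brown fox jumps over the lazy dog."

for wrong, right in ocr_corrections(ocr_page, proofread_page):
    print(f"{wrong!r} -> {right!r}")
# prints:
# 'Tbe' -> 'The'
# 'ovcr' -> 'over'
```

Counting the returned pairs for each page gives a rough error tally, comparable in spirit to the average of 10 mistakes per page mentioned above. It is only approximate: a run of several adjacent misread words is reported as a single pair.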
Unlike digitization in "image format", which consists only of scanning the pages, digitization in "text format" adds the OCR step, and the resulting text offers two advantages: a) the book can be copied, indexed, searched, analyzed and compared with other books; b) its content can be searched with the "Find" function available in any browser or software, without a dedicated search engine. The assets of digitization in "text format" are numerous. It produces a smaller computer file that is easier to send, whereas digitization in "image format" produces a bulky "photo" file. Unlike other formats, these files are accessible for low-bandw