search engines (AltaVista, Yahoo!, etc.) are built upon IR technology. Similarly, though the technology is much newer, it is likely that many people will soon be using automated summarizers to condense (or at least to extract the major contents of) a single long document or many documents of any length together. [...] In this context, multilingualism on the Web is another complicating factor. People will write in their own language for several reasons (convenience, secrecy, and local applicability), but that does not mean that other people are not interested in reading what they have to say! This is especially true for companies involved in technology watch (say, a computer company that wants to know, daily, about all the Japanese newspaper and other articles that pertain to what it makes) and for some government intelligence agencies (the people who provide the most up-to-date information for use by your government officials in making policy, etc.). One of the main problems these people face is the flood of information, so they tend to hire 'weak' bilinguals who can rapidly scan incoming text and throw out what is not relevant, passing the relevant material to professional translators.
Obviously, a combination of SUM (summarization) and MT (machine translation) will help here: since MT is slow, it helps if you can do SUM in the foreign language and then just do a quick and dirty MT on the result, allowing either a human or an automated IR-based text classifier to decide whether to keep or reject the article.
For these kinds of reasons, the US Government has over the past five years been funding research in MT, SUM, and IR, and is interested in starting a new program of research in Multilingual IR. This way you will one day be able to open Netscape or Explorer or the like, type in your query in (say) English, and have the engine return texts in *all* the languages of the world.
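To make the triage idea above concrete (summarize in the foreign language first, run a quick and dirty MT on the summary only, then let a classifier keep or reject the article), here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of any real system: the function names, the frequency-based sentence scoring, the word-for-word glossary lookup standing in for MT, and the watch-term matching standing in for an IR-based classifier are all hypothetical toy stand-ins.

```python
import re
from collections import Counter


def summarize(text, max_sentences=1):
    """Toy extractive SUM: keep the sentences whose words are most frequent in the text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence):
        words = re.findall(r"\w+", sentence.lower())
        return sum(freq[w] for w in words) / max(len(words), 1)

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    # Emit the selected sentences in their original order.
    return " ".join(s for s in sentences if s in top)


def rough_translate(text, glossary):
    """Toy 'quick and dirty' MT: word-for-word lookup; unknown words pass through."""
    return " ".join(glossary.get(w.lower(), w) for w in re.findall(r"\w+", text))


def is_relevant(text, watch_terms):
    """Toy classifier: keep the article if any watch term appears in the text."""
    return bool(set(re.findall(r"\w+", text.lower())) & watch_terms)


def triage(article, glossary, watch_terms):
    """SUM in the source language, then MT on the summary only, then classify."""
    summary = summarize(article)
    gloss = rough_translate(summary, glossary)
    return is_relevant(gloss, watch_terms), gloss


# Hypothetical data for a computer company doing technology watch.
GLOSSARY = {"la": "the", "empresa": "company", "vende": "sells",
            "un": "a", "ordenador": "computer"}
WATCH_TERMS = {"computer", "chip"}

keep, gloss = triage(
    "La empresa vende un ordenador. El ordenador usa un chip nuevo. Hoy llueve mucho.",
    GLOSSARY,
    WATCH_TERMS,
)
print(keep, "|", gloss)
```

The point of this ordering is that the expensive step (translation) only ever sees the short summary, never the full article, which is why it helps to do SUM in the foreign language first.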
You will have the retrieved texts clustered by subarea, summarized by cluster, and the foreign-language summaries translated: all the kinds of things that you would like to have. You can see a demo of our version of this capability, using English as the user language and a collection of approximately 5,000 texts in English, Japanese, Arabic, Spanish, and Indonesian, by visiting the MuST Multilingual Information Retrieval, Summarization, and Translation System. Type in your query word (say, 'baby', or whatever you wish) and press 'Enter/Return'. In the middle window you will see the headlines (or just keyword