search engines (AltaVista, Yahoo!, etc.) are built upon IR technology. Similarly, though the technology is much newer, it is likely that many people will soon be using automated summarizers to condense (or at least to extract the major contents of) single long documents, or many documents of any length together. [...]

In this context, multilingualism on the Web is another complicating factor. People will write in their own language for several reasons (convenience, secrecy, and local applicability), but that does not mean that other people are not interested in reading what they have to say! This is especially true for companies involved in technology watch (say, a computer company that wants to know, daily, about all the Japanese newspaper and other articles that pertain to what it makes) and for some government intelligence agencies (the people who provide the most up-to-date information for use by your government officials in making policy, etc.). One of the main problems these people face is the flood of information, so they tend to hire 'weak' bilinguals who can rapidly scan incoming text and throw out what is not relevant, passing the relevant material to professional translators.

Obviously, a combination of SUM (summarization) and MT (machine translation) will help here: since MT is slow, it helps if you can do SUM in the foreign language and then just do a quick and dirty MT on the result, allowing either a human or an automated IR-based text classifier to decide whether to keep or reject the article. For these kinds of reasons, the US Government has over the past five years been funding research in MT, SUM, and IR, and is interested in starting a new program of research in Multilingual IR. This way you will one day be able to open Netscape or Explorer or the like, type in your query in (say) English, and have the engine return texts in *all* the languages of the world.
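The triage workflow described above (summarize in the foreign language first, run a quick and dirty translation on the short summary only, then let a classifier keep or reject the article) can be sketched roughly as follows. This is only an illustration under loud assumptions: the function names, the word-frequency summarizer, the glossary lookup standing in for MT, and the keyword match standing in for an IR-based classifier are all hypothetical toys, not the components of any real system mentioned in the text.

```python
import re
from collections import Counter


def summarize(text, max_sentences=1):
    """Toy extractive SUM: keep the sentences whose words are most frequent.

    Assumes a language with whitespace-separated words and ./!/? sentence
    ends, so it would not work as-is for Japanese, for example.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence):
        words = re.findall(r"\w+", sentence.lower())
        return sum(freq[w] for w in words) / max(len(words), 1)

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    # Emit the chosen sentences in their original order.
    return " ".join(s for s in sentences if s in top)


def rough_translate(text, glossary):
    """Stand-in for quick and dirty MT: word-by-word glossary lookup.

    Unknown words pass through untranslated and word order is not fixed.
    """
    return " ".join(glossary.get(w, w) for w in re.findall(r"\w+", text.lower()))


def is_relevant(english_text, watch_terms):
    """Stand-in for an IR-based classifier: keep if any watch term occurs."""
    return bool(set(re.findall(r"\w+", english_text.lower())) & watch_terms)


def triage(article, glossary, watch_terms):
    """SUM in the source language, then rough MT on the summary, then filter."""
    summary = summarize(article)               # cheap step on the full text
    gloss = rough_translate(summary, glossary)  # slow step on the short summary only
    return is_relevant(gloss, watch_terms), gloss


if __name__ == "__main__":
    # Hypothetical Spanish-to-English glossary and watch list.
    glossary = {"el": "the", "chip": "chip", "nuevo": "new",
                "es": "is", "un": "a", "rapido": "fast"}
    article = "El chip nuevo es un chip rapido. Hoy llueve."
    print(triage(article, glossary, {"chip"}))
```

The point of the ordering is that the expensive translation step only ever sees the summary, never the full article; articles that pass the filter would then go on to a professional translator or a full MT system.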
You will have them clustered by subarea, summarized by cluster, and the foreign summaries translated: all the kinds of things that you would like to have. You can see a demo of our version of this capability, using English as the user language and a collection of approximately 5,000 texts in English, Japanese, Arabic, Spanish, and Indonesian, by visiting the MuST Multilingual Information Retrieval, Summarization, and Translation System. Type in your query word (say, 'baby', or whatever you wish) and press 'Enter/Return'. In the middle window you will see the headlines (or just keyword