Machine Translation of a Very Large User Support Database
We demonstrate an example-based machine translation system deployed to translate English-language articles in the Microsoft Product Support Services Knowledge Base. The MSR-MT system is architected to permit simultaneous development among multiple languages, and can be rapidly trained to a client's domain using existing translated documents.
English Writing Wizard
We demonstrate English Writing Wizard, which assists users in writing English. Our approach to polishing English sentences is to dynamically recommend appropriate sentences through NLP-based information retrieval while users are writing. It accepts queries in both English and Chinese. There are three key technologies: mining collocations from an unannotated corpus, mining synonymous expressions from a corpus, and translating Chinese collocations into English collocations.
IBM TAKMI: Text Analysis and Knowledge Mining for Business Intelligence and Life Sciences
We will demonstrate two text mining systems, IBM TAKMI for business intelligence (TAKMI, for short) and IBM TAKMI for life sciences (MedTAKMI, for short). By applying shallow parsing, synonym/semantic dictionary lookup, and information extraction, our systems can analyze millions of documents and extract useful information for specific domains. TAKMI has been applied to the CRM (Customer Relationship Management) domain for the analysis of customer contact records, and MedTAKMI has been applied to a medical domain for the analysis of MEDLINE documents.
Nippon Telegraph and Telephone Corp.
NTT Communication Science Laboratories provides a large lexical database (Nihongo-no Goi-Tokusei; Lexical properties of Japanese) containing over 80,000 Japanese words and characters, and a comprehensive thesaurus (Nihongo Goi-taikei; A Japanese Lexicon) including 300,000 Japanese words classified into 3,000 semantic categories. Goi-taikei provides a base ontology for SAIQA, NTT's open-domain question answering system. SAIQA's Named Entity Recognizer employs Support Vector Machines (SVMs), a high-performance machine-learning method. Since off-the-shelf SVM classifiers were too inefficient, we developed a new algorithm that is orders of magnitude faster.
We demonstrate a web-based machine translation environment, 'Yakushite.Net', whose accuracy and scope can be improved through online collaboration by users. The environment leverages the cooperative efforts of online users to create highly accurate dictionaries, enabling people with deep knowledge of a particular subject to collaborate in enhancing specialized dictionaries for online machine translation. We also present future plans for the environment.
Cliche : Integrating MT and TM
Seiji Okura, Tatsuo Yamashita, Masaru Fuji, Akira Ushioda
We demonstrate a machine-aided translation system, Cliche, in which machine translation (MT) and translation memory (TM) technologies are integrated. Cliche's TM module handles translation examples that have been analyzed by the MT module, so that advanced search is possible. The system enables high-speed, high-quality online translation, as well as collaborative translation by a group of translators at remote locations connected by a network.
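To make the translation memory idea concrete, here is a minimal sketch of a TM lookup, assuming plain character-level fuzzy matching with Python's standard `difflib`. It is not Cliche's search, which operates on examples analyzed by the MT module; the function name `tm_lookup`, the threshold, and the sample sentence pairs are all hypothetical.

```python
from difflib import SequenceMatcher

def tm_lookup(query, memory, threshold=0.7):
    """Return (source, target, score) for the stored example most similar
    to `query`, or None if no example reaches `threshold`.

    `memory` is a list of (source_sentence, target_sentence) pairs.
    Similarity is difflib's character-level ratio, an assumption made
    for illustration only.
    """
    best_pair = None
    best_score = 0.0
    for source, target in memory:
        score = SequenceMatcher(None, query.lower(), source.lower()).ratio()
        if score > best_score:
            best_pair, best_score = (source, target), score
    if best_pair is None or best_score < threshold:
        return None
    return best_pair[0], best_pair[1], best_score

# Hypothetical translation memory with two English-Japanese examples.
memory = [
    ("Please restart the computer.", "コンピューターを再起動してください。"),
    ("Save the file before closing.", "閉じる前にファイルを保存してください。"),
]

exact_hit = tm_lookup("Please restart the computer.", memory)
fuzzy_hit = tm_lookup("Please restart your computer.", memory)
no_hit = tm_lookup("zzzz", memory)
```

An exact match scores 1.0, a near match returns the closest stored pair for the translator to post-edit, and a query with nothing similar returns `None` so that the sentence can be passed to MT instead.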
NEC's Natural Language Processing Technologies
NEC's continuing research and development of natural language processing technologies has created many versatile products. In this exhibition, we will demonstrate TABITSU, TopicScope, and Reputation Search.
GroupScribe: A Rule-based Group Communication Management System
Sougo Tsuboi and Hideo Umeki
Communication and documentation can be complementary processes in various group-work situations. We have developed a group communication support system, GroupScribe, that provides a mechanism for extracting relevant information from e-mail messages posted to each community and reorganizing it into auto-updating documents. Each document created in GroupScribe has a consolidation rule, which specifies the range, the type, and the layout of the extracts. These documents can also be edited interactively and shared as appropriate with other people. GroupScribe can therefore help users create documents based on the contents of their communication and manage both communication and documentation efficiently.
Recent Results of NLP Research at Hitachi's Central Research Laboratory
LanA Consulting is a software company specializing in IT applications including multilingual natural language processing.
NTCIR: Large-scale test collections for IR, QA and Summarization
The goal of the NTCIR Project is to provide the infrastructure for large-scale evaluation of information access technologies, which improve access to information in huge document collections using language analysis and information retrieval. The targets have been cross-language information retrieval of Chinese, Korean, Japanese and English, term recognition, text summarization, question answering, web retrieval, patent retrieval, and so on.
The Computational Linguistics Group of the Communications Research Laboratory (CRL) conducts research on natural language processing (NLP). Our work ranges from basic research on language to applications using NLP technologies. In this exhibition, we will demonstrate 1) Japanese-English Bilingual Corpora and their applications, 2) Japanese Learners' Corpus of English and 3) Medical Speech Translator.
Innovation of Natural Language Processing at the University of Tokyo
Our group is concerned with a framework for representing, embedding and retrieving intelligent knowledge in texts to integrate text processing and knowledge processing, including:
LIL is a research group of the Information Technology Center. Tightly connected to the university library resources, we conduct the following projects to support the actual text processing activities performed by university students and faculty members:
Our goal is to develop intelligent media technology that facilitates human-to-human communication and to explore seeds for social contribution including risk communication, using:
A Statistical-Information-Based Selector of the Best among Multiple MT Outputs
Yasuhiro Akiba, Eiichiro Sumita, Hiromi Nakaiwa, and Seiichi Yamamoto
The authors demonstrate a system for automatically selecting the best among outputs from three machine translation (MT) systems: D-cube, HPAT, and SAT, which are, respectively, an example-based MT, a pattern-based MT, and an SMT. The selection system assigns scores to each MT output by using statistical models of the target language and the translation, compares their scores statistically by using a non-parametric multiple comparison test, and selects the best. The selection system and MT systems are subsystems of ATR's speech-to-speech translation system and are automatically constructed using a bilingual corpus, ATR's BTEC (Basic Travel Expression Corpus).
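The scoring-and-selection step can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes each MT output already carries a language-model log-probability and a translation-model log-probability, combines them by a weighted sum, and picks the highest score. The non-parametric multiple comparison test described above is omitted, and the `Candidate` class, the weights, and all score values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    system: str        # name of the MT system that produced the output
    text: str          # the translation itself
    lm_logprob: float  # hypothetical target-language model score
    tm_logprob: float  # hypothetical translation model score

def combined_score(c, lm_weight=1.0, tm_weight=1.0):
    """Weighted sum of the two log-probabilities (an assumed combination)."""
    return lm_weight * c.lm_logprob + tm_weight * c.tm_logprob

def select_best(candidates):
    """Return the candidate with the highest combined score.

    A real selector would first test whether the score differences are
    statistically significant; this sketch simply takes the maximum.
    """
    if not candidates:
        raise ValueError("no candidates to select from")
    return max(candidates, key=combined_score)

# Placeholder outputs with made-up scores for the three systems.
candidates = [
    Candidate("D-cube", "output of the example-based system", -12.0, -8.0),
    Candidate("HPAT", "output of the pattern-based system", -10.5, -7.5),
    Candidate("SAT", "output of the statistical system", -11.0, -9.0),
]
best = select_best(candidates)
```

With these made-up scores the HPAT output has the highest combined score and is selected.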
In an attempt to build a firm basis for the processing technology of spontaneous speech, we have been compiling a large annotated corpus of spontaneous Japanese since 1999, with the final public release planned for the spring of 2004. The Corpus of Spontaneous Japanese, or CSJ, contains more than 650 hours of spontaneous Standard Japanese produced by more than 1400 speakers. Speech signal, two-way transcription, and two-way POS annotation are provided for the whole corpus. In addition, phonetic labels (both segment and intonation), clause-boundary labels, dependency-structure labels, and discourse-structure labels are to be provided for a subset of the CSJ covering about 500,000 words, or 44 hours. Details of the corpus will be shown using real examples and the results of preliminary linguistic analyses.
Zero Detector as a Japanese Language Teaching Aid
Zero Detector (ZD) is a linguistic analysis tool for Japanese language teachers. This program was developed to promote effective instruction of zero pronouns (zeros) by making invisible zeros visible. ZD takes Japanese narrative texts as input and performs morphological and syntactic analyses and zero detecting/inserting processes. It then provides various information on the input clause, including the places of zeros and the valency pattern of the predicate, depending on teachers' needs. ZD helps teachers: (1) predict the difficulties with zeros that students might encounter, and (2) provide students with systematic instruction of zeros, both in their interpretation and production of discourses containing zeros.
Integrating multiple databases of research papers with the data on the WWW
We have developed a system that makes it possible to retrieve papers from multiple databases at once. Our system can visually show the citation relationships between papers together with the reasons for each citation. Using our system, researchers can grasp the outline of a domain at a glance.
Applications of Natural Language Processing Method Using Inductive Learning - Araki Laboratory, Hokkaido University -
University, Sapporo, JAPAN.
Our demonstrations are as
Research Activities in Japan Advanced Institute of Science and Technology
Akira Shimazu, Satoshi Tojo, Kiyoaki Shirai and Kentaro Torisawa
In this exhibition, we will show the research activities being conducted by the following four faculty members, who are studying computational linguistics and related research fields.
Prof. Akira Shimazu: Our main research topics are, first, to model information, linguistic structure and the relation between them from the viewpoint of communication, and second, to develop a computational model for natural dialogues as seen in daily life. We will introduce some examples from our research.
Prof. Satoshi Tojo: Music scores are a form of language, in that they are grammatical sequences of symbols in terms of both rhythmic structure and cadence. According to music theory, salient notes dominate other notes and important chords dominate other chords, and thus the notion of 'head' plays a major role. To support this notion, we will present the Generative Theory of Tonal Music, and show our approach based on Head-driven Phrase Structure Grammar.
Assoc. Prof. Kiyoaki Shirai: Our main research topic is corpus-based natural language processing so as to achieve word sense disambiguation, statistical parsing, etc. We will introduce a word sense disambiguation, or WSD, system using two heterogeneous language resources as a supporting technology for a document reading assistant system.
Assoc. Prof. Kentaro Torisawa: Our research interests include automatic knowledge acquisition from large-scale corpora, high-level grammar formalisms for natural languages, and parsing algorithms. In the exhibition, we will show word clusters induced by EM-based clustering algorithms and related research.
We will present a large-scale associative concept dictionary and its implementation as a brain memory model on a pulsed neural network architecture. As an application system using the model, we will demonstrate a computational system for metaphor understanding. The system obtains the meaning of a metaphorical expression, "A is B," much as human intuitive understanding does. The dictionary was built by using large-scale associative data obtained from human association experiments. Distances between stimulus words and their associated words are calculated with a linear programming method using response parameters from the association experiments.
Our exhibition introduces recent research activities at the Language Media Laboratory of Kyoto University, focusing on knowledge acquisition from the Web. The Web can be viewed as the largest corpus: NLP technologies enable automatic or semi-automatic acquisition of various kinds of knowledge from it. We will demonstrate two systems. The first system automatically collects related terms from seed words, and can be used as a tool for compiling a glossary of a certain domain. The second system acquires a bilingual lexicon from comparable corpora, which helps in compiling a Japanese-English dictionary.
The Computational Linguistics Laboratory of Nara Institute of Science and Technology will demonstrate basic natural language tools, and integrated tools and an environment for paraphrasing.
Our natural language tools include part-of-speech taggers (ChaSen and MeCab) for Japanese, Chinese and English, phrase and NE chunkers, and a syntactic dependency analyzer (CaboCha) for Japanese, all based on machine learning techniques such as HMM and SVM.
The integrated environment for paraphrasing is named KURA, and we will demonstrate its facilities and a question answering system, KURA-QA.
Our exhibition introduces the Tohoku University 21st Century Center of Excellence (COE) program in humanities entitled "A Strategic Research and Education Center for an Integrated Approach to Language and Cognition". The principal disciplines involved in this project include Linguistics, Brain-Functional Studies, Cognitive Psychology, and Robotics/Artificial Intelligence. The goals of this project are: (i) Creation of a new field of "Integrated linguistics Science", which sheds light on studies of language learning, language acquisition, language disorder, age-related language loss, and robot language; (ii) Mutual interaction between theoretical and experimental studies: theoretical linguistic studies can receive feedback from experimental sciences, e.g., brain-functional studies and cognitive psychology, and vice versa.
The explosive growth of the Internet and the increased availability of electronic media in many languages have promoted the development of multilingual systems capable of running in several monolingual modes. The global implications of recent changes in society and scientific communities necessitate multi-national collaboration and thus a shift of emphasis toward the development of multilingual information retrieval systems for crossing language boundaries.
Most classification research concerns term weighting based on the vector model, and some methods have been proposed for estimating term relevance. Several approaches may be used to address the particular problems of CLIR, including dictionary-based, corpus-based and machine-translation-based approaches.
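As a concrete instance of term weighting in the vector model, the sketch below computes TF-IDF weights and cosine similarity using only the Python standard library. TF-IDF is one common weighting scheme chosen here for illustration; the abstract does not say which scheme the exhibitors use. The function names and the toy documents are assumptions.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Map each tokenized document to a {term: weight} vector.

    Weight = raw term frequency * log(N / document frequency),
    a plain TF-IDF variant assumed for illustration.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        vectors.append(
            {t: term_freq[t] * math.log(n_docs / doc_freq[t]) for t in term_freq}
        )
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy tokenized documents (hypothetical).
docs = [
    ["machine", "translation", "system"],
    ["translation", "memory", "system"],
    ["speech", "corpus"],
]
vecs = tf_idf_vectors(docs)
sim_related = cosine(vecs[0], vecs[1])    # share "translation" and "system"
sim_unrelated = cosine(vecs[0], vecs[2])  # share no terms
```

Terms that occur in fewer documents receive larger weights, so the first two documents, which share terms, have a positive similarity, while documents with no terms in common have similarity zero.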
The exhibition shows a prototype system, K2, in which a user interacts with animated agents in a virtual world. Through speech input, the user can command the agents to manipulate objects. The agents' behavior and the subsequent changes in the virtual world are presented to the user as a three-dimensional animation. Through the prototype system, the project aims to explore natural language understanding situated in a real/virtual world and the relation between language understanding and action.
We demonstrate a Web-based system, Asunaro, which helps users read sentences in Japanese. Asunaro provides users with information such as meanings of words, structures of sentences and explanations of syntax and idiomatic expressions in English, Thai, Indonesian, Malay and Chinese. Asunaro achieves this by using morphological and syntactic analyses.
We are developing a new machine translation method called "Analogical Mapping Method for MT based on Semantic Typology". Our new method is built on two theories. One is the Semantic Typology Theory, which suggests that human understanding of the world is accompanied by an epistemological framework under the influence of one's mother tongue. The other is the Analogical Mapping Theory advocated by Kikuya Ichikawa. To test the applicability of these theories, we are compiling a sentence pattern database and a semantic pattern database.
For the demonstration, we will show the sentence pattern database we have compiled and our method for matching Japanese sentences against it.
Recent Research at Mori Laboratory, Yokohama National University
We will introduce our recent research at the Mori Laboratory, Yokohama National University, Japan. It includes question-answering systems, navigation systems for IR results, summarizers for multiple documents, and so on.
We will demonstrate the following experimental systems: