Please use this identifier to cite or link to this item: http://hdl.handle.net/10119/7832

Title: Automatic Extraction of the Fine Category of Person Named Entities from Text Corpora
Authors: NGUYEN, Tri-Thanh
SHIMAZU, Akira
Keywords: fine person categories extraction
named entities
pattern extraction
algorithm
Issue Date: 2007-10-01
Publisher: The Institute of Electronics, Information and Communication Engineers (IEICE)
Journal name: IEICE TRANSACTIONS on Information and Systems
Volume: E90-D
Number: 10
Start page: 1542
End page: 1549
DOI: 10.1093/ietisy/e90-d.10.1542
Abstract: Named entities play an important role in many Natural Language Processing applications. Currently, most named entity recognition systems rely on a small set of general named entity (NE) types. Although some efforts have been made to expand the hierarchy of NE types, the number of NE types remains fixed. In real applications, such as question answering or semantic search systems, users may be interested in more diverse and specific NE types. This paper proposes a method to extract categories of person named entities from text documents. Based on the Dual Iterative Pattern Relation Extraction method, we develop a model better suited to our problem and explore the generation of different pattern types. A method for validating extracted categories is proposed to improve performance, and experiments on the Wall Street Journal corpus give promising results.
Rights: Copyright (C)2007 IEICE. Tri-Thanh Nguyen, Akira Shimazu, IEICE TRANSACTIONS on Information and Systems, E90-D(10), 2007, 1542-1549. http://www.ieice.org/jpn/trans_online/
URI: http://hdl.handle.net/10119/7832
Material Type: publisher
Appears in Collections: b10-1. Journal Articles

Files in This Item:

File: A11970.pdf
Size: 430 KB
Format: Adobe PDF
