Please use this identifier to cite or link to this item: http://hdl.handle.net/10119/3923

Title: A High Precision Algorithm for Automatic Extraction of High-frequency Words Based on Statistics
Authors: XUAN, Zhaoguo
DANG, Yanzhong
JIANG, Shaohua
ZHAO, Mingwei
Keywords: Chinese word segmentation
statistics algorithm
high-frequency words
Chinese information processing
Issue Date: Nov-2005
Publisher: JAIST Press
Abstract: Automatic Chinese word segmentation is a basic research problem underlying text categorization, automatic summarization, information retrieval, and other Chinese information processing tasks. In this paper we propose a high-precision algorithm for extracting high-frequency words without a thesaurus. The algorithm first counts the frequencies of co-occurrence patterns of Chinese characters in the documents, then eliminates the "bridge-connection" frequencies to obtain the support frequency of each pattern. Words are then identified and extracted according to these support frequencies rather than the raw occurrence frequencies. The algorithm was tested on extracting words from several sets of scientific document abstracts, and the results show that it improves both the precision and the recall of the extracted lexicon to some extent. The algorithm can also be applied to text categorization and automatic summarization.
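The abstract outlines three steps: count character co-occurrence patterns, remove "bridge-connection" frequencies to get support frequencies, and select words by support frequency instead of raw frequency. The abstract does not define how bridge-connection frequencies are computed, so the Python sketch below substitutes a common substring-reduction heuristic as an assumption. Patterns are processed from longest to shortest, and an occurrence of a shorter pattern does not count toward its support if it lies inside an occurrence of an already-accepted longer pattern. This is not the authors' implementation. The names `extract_words`, `precision_recall`, `max_len`, and `min_support`, the restriction to patterns of 2 to `max_len` characters, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def extract_words(docs, max_len=4, min_support=2):
    """Hypothetical sketch: extract words of length 2..max_len by support frequency.

    ASSUMPTION: 'bridge-connection' elimination is approximated by substring
    reduction. An occurrence of a pattern is not counted toward its support
    if it lies inside an occurrence of a longer, already-accepted pattern.
    """
    # Step 1: raw counts of every contiguous character n-gram (n = 2..max_len).
    raw = Counter()
    for doc in docs:
        for n in range(2, max_len + 1):
            for i in range(len(doc) - n + 1):
                raw[doc[i:i + n]] += 1

    accepted_spans = defaultdict(list)  # doc index -> [(start, end), ...]
    support = {}
    words = set()

    # Step 2: longest patterns first, so accepted long words claim their occurrences.
    for n in range(max_len, 1, -1):
        level = Counter()
        for d, doc in enumerate(docs):
            for i in range(len(doc) - n + 1):
                p = doc[i:i + n]
                if raw[p] < min_support:
                    continue  # support can never exceed the raw count
                covered = any(s <= i and i + n <= e for s, e in accepted_spans[d])
                if not covered:
                    level[p] += 1
        # Step 3: accept patterns whose support (not raw) frequency is high enough.
        new_words = {p for p, c in level.items() if c >= min_support}
        support.update(level)
        words |= new_words
        # Record every occurrence of the newly accepted words as a covering span.
        for d, doc in enumerate(docs):
            for i in range(len(doc) - n + 1):
                if doc[i:i + n] in new_words:
                    accepted_spans[d].append((i, i + n))
    return words, support


def precision_recall(extracted, gold):
    """Set-based precision and recall of an extracted lexicon against a reference lexicon."""
    hits = len(extracted & gold)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall
```

On the toy input `["数据挖掘方法", "数据挖掘技术", "数据分析", "数据处理"]`, "数据" has a raw count of 4 but a support frequency of only 2, because two of its occurrences sit inside the accepted word "数据挖掘". The bridge-like pattern "据挖" occurs twice but receives no support at all. Processing longest-first is the design choice that lets long words claim their occurrences before their substrings are scored.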
Description: The original publication is available at JAIST Press http://www.jaist.ac.jp/library/jaist-press/index.html
IFSR 2005 : Proceedings of the First World Congress of the International Federation for Systems Research : The New Roles of Systems Sciences For a Knowledge-based Society : Nov. 14-17, 2005, Kobe, Japan
Symposium 6, Session 4 : Vision of Knowledge Civilization Future Technology
Language: ENG
URI: http://hdl.handle.net/10119/3923
ISBN: 4-903092-02-X
Appears in Collections:IFSR 2005

Files in This Item:

File         Description   Size    Format
20033.pdf                  71 KB   Adobe PDF
