Please use this identifier to cite or link to this item:
http://hdl.handle.net/10119/3923
Title: | A High Precision Algorithm for Automatic Extraction of High-frequency Words Based on Statistics |
Authors: | XUAN, Zhaoguo; DANG, Yanzhong; JIANG, Shaohua; ZHAO, Mingwei |
Keywords: | Chinese-word segmentation; statistics algorithm; high-frequency words; Chinese information processing |
Issue Date: | Nov-2005 |
Publisher: | JAIST Press |
Abstract: | Automatic Chinese word segmentation is one of the basic research issues in text categorization, automatic summarization, information retrieval, and other Chinese information processing tasks. In this paper we put forward a high-precision algorithm for extracting high-frequency words without a thesaurus. It first counts the frequencies of co-occurrence patterns of Chinese characters in documents, then eliminates the “bridge-connection” frequencies to obtain the support frequencies of the patterns. Words are then identified and acquired according to their support frequencies instead of their raw occurrence frequencies. The proposed algorithm is tested on the task of extracting words from several sets of scientific document abstracts, and the results show that it can improve both the precision and the recall of the extracted lexical set to some extent. The algorithm can also be applied to text categorization and automatic summarization. |
Description: | The original publication is available from JAIST Press: http://www.jaist.ac.jp/library/jaist-press/index.html IFSR 2005 : Proceedings of the First World Congress of the International Federation for Systems Research : The New Roles of Systems Sciences For a Knowledge-based Society : Nov. 14-17, 2005, Kobe, Japan. Symposium 6, Session 4 : Vision of Knowledge Civilization Future Technology |
Language: | ENG |
URI: | http://hdl.handle.net/10119/3923 |
ISBN: | 4-903092-02-X |
Appears in Collections: | IFSR 2005 |
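The steps named in the abstract (count pattern frequencies, discount "bridge-connection" occurrences, rank by support frequency) can be illustrated with a minimal sketch. The abstract does not define how bridge-connection frequencies are computed, so the discounting rule here is an assumption and not the authors' method: an occurrence of a pattern is ignored when it lies inside a one-character-longer pattern that is itself frequent. The function names `pattern_counts` and `support_counts` and the parameters `max_len` and `min_freq` are hypothetical.

```python
from collections import Counter


def pattern_counts(texts, max_len=4):
    """Raw frequency of every character pattern of length 2..max_len."""
    counts = Counter()
    for text in texts:
        for n in range(2, max_len + 1):
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    return counts


def support_counts(texts, max_len=4, min_freq=2):
    """Return (raw, support) counters.

    ASSUMED rule, not taken from the paper: an occurrence of a pattern
    does not count toward its support if the window extended by one
    character to the left or right is a pattern with raw frequency
    >= min_freq. Patterns of length max_len are never discounted.
    """
    raw = pattern_counts(texts, max_len)
    support = Counter()
    for text in texts:
        for n in range(2, max_len + 1):
            for i in range(len(text) - n + 1):
                covered = False
                if n < max_len:
                    # Extension by one character to the left.
                    if i > 0 and raw[text[i - 1:i + n]] >= min_freq:
                        covered = True
                    # Extension by one character to the right.
                    if i + n < len(text) and raw[text[i:i + n + 1]] >= min_freq:
                        covered = True
                if not covered:
                    support[text[i:i + n]] += 1
    return raw, support


if __name__ == "__main__":
    # Toy corpus, invented for illustration only.
    texts = ["信息处理", "信息处理", "信息检索"]
    raw, support = support_counts(texts)
    for pattern in sorted(raw, key=raw.get, reverse=True):
        print(pattern, raw[pattern], support[pattern])
```

In this toy corpus the pattern 息处 straddles the boundary between 信息 and 处理: its raw frequency is as high as that of real words, but under the assumed rule every one of its occurrences is covered by a longer frequent pattern, so ranking by support instead of raw frequency drops it.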
Files in This Item:
File | Description | Size | Format |
20033.pdf | | 71Kb | Adobe PDF |
All items in DSpace are protected by copyright, with all rights reserved.