Please use this identifier to cite or link to this item: http://hdl.handle.net/10119/10665

Title: A Hidden Topic-Based Framework toward Building Applications with Short Web Documents
Authors: Phan, Xuan-Hieu
Nguyen, Cam-Tu
Le, Dieu-Thu
Nguyen, Le-Minh
Horiguchi, Susumu
Ha, Quang-Thuy
Keywords: Topic Modeling
Web mining
hidden topic analysis
text classification
ranking
Issue Date: 2010-02-18
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Journal name: IEEE Transactions on Knowledge and Data Engineering
Volume: 23
Number: 7
Start page: 961
End page: 976
DOI: 10.1109/TKDE.2010.27
Abstract: This paper introduces a hidden topic-based framework for processing short and sparse documents (e.g., search result snippets, product descriptions, book/movie summaries, and advertising messages) on the Web. The framework focuses on solving two main challenges posed by these kinds of documents: 1) data sparseness and 2) synonyms/homonyms. The former leads to the lack of shared words and contexts among documents while the latter are big linguistic obstacles in natural language processing (NLP) and information retrieval (IR). The underlying idea of the framework is that common hidden topics discovered from large external data sets (universal data sets), when included, can make short documents less sparse and more topic-oriented. Furthermore, hidden topics from universal data sets help handle unseen data better. The proposed framework can also be applied to different natural languages and data domains. We carefully evaluated the framework by carrying out two experiments for two important online applications (Web search result classification and matching/ranking for contextual advertising) with large-scale universal data sets and we achieved significant results.
Rights: Copyright (C) 2010 IEEE. IEEE Transactions on Knowledge and Data Engineering, 23(7), 2010, 961-976. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
URI: http://hdl.handle.net/10119/10665
Material Type: publisher
Appears in Collections: b10-1. 雑誌掲載論文 (Journal Articles)

Files in This Item:

File: 14506.pdf
Size: 644Kb
Format: Adobe PDF
