Please use the following identifier to cite this item: http://hdl.handle.net/10119/14737

Title: Factors affecting sentiment prediction of Malay news headlines using machine learning approaches
Authors: Alfred, Rayner
Wong, Wei Yee
Lim, Yuto
Obit, Joe Henry
Keywords: Topic Selection
Feature Extraction
Classification
Clustering
Issue date: 2016-09-18
Publisher: Springer
Journal title: Communications in Computer and Information Science
Volume: 652
Start page: 289
End page: 299
DOI: 10.1007/978-981-10-2777-2_26
Abstract: Malay is a major language used by citizens of Malaysia, Indonesia, Singapore and Brunei. Because the language is widely used, an abundance of text articles written in Malay is available on the internet, and the number of Malay articles published online has increased greatly over the years. Automatically labeling Malay text articles is crucial in managing these articles. Because resources and tools for automatic topic selection of Malay text articles are lacking, this paper studies the factors that influence the performance of algorithms that can be applied to select topics automatically for Malay articles. This is done by comparing the contents of the articles with the corresponding topics; each Malay article is then assigned to the appropriate topic depending on the results of the classification process. In this paper, all Malay articles are classified using the k-Nearest Neighbors (k-NN) and Naïve Bayes classifiers. Both classifiers are used to classify and assign a topic to these Malay articles according to a predefined set of topics. The effectiveness of classifying these Malay articles using the k-NN classifier is highly dependent on the distance method used and the number of nearest neighbors, k. Thus, this paper also assesses the effects of using different distance methods (e.g., Cosine Similarity and the Euclidean Distance) and of varying the number of nearest neighbors, k. In addition, the effects of applying a stemming process on the performance of the classifiers are studied. The results obtained show that the k-NN classifier performs better than the Naïve Bayes classifier in classifying the Malay articles into their respective topics. The stemming process also improves the overall performance of both classifiers. Another finding is that using Cosine Similarity as the distance measure improves the performance of the k-NN classifier.
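
As an illustration of the two factors the abstract names for k-NN (the distance method and the number of nearest neighbors, k), the following is a minimal sketch and not the authors' implementation. It assumes whitespace tokenization, raw term counts as features, and a handful of invented toy headlines and topic labels; the function names (vectorize, cosine_distance, euclidean_distance, knn_predict) are hypothetical. The record does not state the features, preprocessing, or data used in the paper.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts; whitespace tokenization is an assumption."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    """1 - cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Euclidean distance over the union of terms (missing terms count as 0)."""
    terms = set(a) | set(b)
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in terms))

def knn_predict(train, text, k, distance):
    """Majority vote among the k training documents closest to `text`."""
    query = vectorize(text)
    ranked = sorted(train, key=lambda item: distance(vectorize(item[0]), query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Invented toy headlines with hypothetical topic labels (not from the paper).
TRAIN = [
    ("harga minyak naik lagi", "ekonomi"),
    ("bank pusat kekal kadar faedah", "ekonomi"),
    ("pasukan bola sepak menang perlawanan", "sukan"),
    ("pemain badminton menang kejohanan", "sukan"),
]

if __name__ == "__main__":
    # Compare both distance methods for several values of k.
    for name, dist in (("cosine", cosine_distance), ("euclidean", euclidean_distance)):
        for k in (1, 3):
            print(name, k, knn_predict(TRAIN, "harga minyak turun", k, dist))
```

Swapping the `distance` argument and changing `k` are the two knobs the abstract says were varied. A stemming step, if used, would be applied inside `vectorize` before counting terms.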
Rights: This is the author-created version of Springer, Rayner Alfred, Wong Wei Yee, Yuto Lim, Joe Henry Obit, Communications in Computer and Information Science, 652, 2016, 289-299. The original publication is available at www.springerlink.com, http://dx.doi.org/10.1007/978-981-10-2777-2_26
URI: http://hdl.handle.net/10119/14737
Resource type: author
Appears in collections: b10-1. Journal Articles

Files in this item:

File        Description  Size   Format
22897.pdf                306Kb  Adobe PDF

All items stored in this system are protected by copyright.

 

