Please use this identifier to cite or link to this item: http://hdl.handle.net/10119/14737

Title: Factors affecting sentiment prediction of Malay news headlines using machine learning approaches
Authors: Alfred, Rayner
Wong, Wei Yee
Lim, Yuto
Obit, Joe Henry
Keywords: Topic Selection
Feature Extraction
Issue Date: 2016-09-18
Publisher: Springer
Journal name: Communications in Computer and Information Science
Volume: 652
Start page: 289
End page: 299
DOI: 10.1007/978-981-10-2777-2_26
Abstract: Malay is a major language used by citizens of Malaysia, Indonesia, Singapore and Brunei. Because the language is so widely used, an abundance of Malay text articles is available on the internet, and the number of Malay articles published online has increased greatly over the years. Automatically labeling Malay text articles is crucial for managing them. Because few resources and tools exist for performing topic selection automatically on Malay text articles, this paper studies the factors that influence the performance of algorithms that can be applied to this task. Topic selection is done by comparing the contents of the articles with the corresponding topics, and each Malay article is assigned to the appropriate topic depending on the results of the classification process. In this paper, all Malay articles are classified using the k-Nearest Neighbors (k-NN) and Naïve Bayes classifiers. Both classifiers are used to assign a topic to each article according to a predefined set of topics. The effectiveness of the k-NN classifier is highly dependent on the distance method used and the number of nearest neighbors, k. Thus, this paper also assesses the effects of using different distance methods (e.g., Cosine Similarity and Euclidean Distance) and of varying the number of neighbors, k. In addition, the effect of the stemming process on the performance of the classifiers is studied. The results show that the k-NN classifier performs better than the Naïve Bayes classifier in classifying the Malay articles into their respective topics. The stemming process also improves the overall performance of both classifiers. Another finding is that using Cosine Similarity as the distance measure improves the performance of the k-NN classifier.
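The abstract's comparison of distance methods for k-NN can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the raw term-count representation, the toy Malay headlines and the topic labels are all assumptions made for illustration, and no stemming is applied.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Lower-case, whitespace-split term counts (assumed representation, no stemming)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (1.0 = same direction)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between two term-count vectors (0.0 = identical)."""
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in a.keys() | b.keys()))

def knn_predict(query, training, k=3, metric="cosine"):
    """Assign the majority topic among the k training articles closest to the query."""
    q = bag_of_words(query)
    scored = []
    for text, topic in training:
        vec = bag_of_words(text)
        if metric == "cosine":
            # Negate similarity so that a smaller score always means "closer".
            score = -cosine_similarity(q, vec)
        else:
            score = euclidean_distance(q, vec)
        scored.append((score, topic))
    scored.sort(key=lambda pair: pair[0])
    votes = Counter(topic for _, topic in scored[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data, invented for this sketch (not taken from the paper).
TRAINING = [
    ("pasukan bola sepak menang perlawanan akhir", "sukan"),
    ("pemain badminton negara menang kejohanan", "sukan"),
    ("harga saham naik di bursa", "ekonomi"),
    ("bank pusat kekalkan kadar faedah", "ekonomi"),
]

if __name__ == "__main__":
    headline = "pasukan badminton menang perlawanan"
    for metric in ("cosine", "euclidean"):
        print(metric, knn_predict(headline, TRAINING, k=3, metric=metric))
```

On this toy data both metrics assign the headline to "sukan". Cosine similarity ignores vector length and Euclidean distance does not, so the two can rank neighbors differently when articles vary in length, which is one plausible reason the choice of distance method matters for text.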
Rights: This is the author-created version of Springer, Rayner Alfred, Wong Wei Yee, Yuto Lim, Joe Henry Obit, Communications in Computer and Information Science, 652, 2016, 289-299. The original publication is available at www.springerlink.com, http://dx.doi.org/10.1007/978-981-10-2777-2_26
URI: http://hdl.handle.net/10119/14737
Material Type: author
Appears in Collections: b10-1. 雑誌掲載論文 (Journal Articles)

Files in This Item:

File: 22897.pdf
Size: 306Kb
Format: Adobe PDF

All items in DSpace are protected by copyright, with all rights reserved.

