Please use this identifier to cite or link to this item:
http://hdl.handle.net/10119/14737
|
Title: | Factors affecting sentiment prediction of Malay news headlines using machine learning approaches |
Authors: | Alfred, Rayner; Wong, Wei Yee; Lim, Yuto; Obit, Joe Henry |
Keywords: | Topic Selection; Feature Extraction; Classification; Clustering |
Issue Date: | 2016-09-18 |
Publisher: | Springer |
Journal Title: | Communications in Computer and Information Science |
Volume: | 652 |
Start Page: | 289 |
End Page: | 299 |
DOI: | 10.1007/978-981-10-2777-2_26 |
Abstract: | Malay is a major language used by citizens of Malaysia, Indonesia, Singapore and Brunei. Because the language is widely used, an abundance of text articles written in Malay is available on the internet, and the number of Malay articles published online has increased greatly over the years. Automatically labeling Malay text articles is crucial in managing these articles. Due to the lack of resources and tools for performing topic selection automatically for Malay text articles, this paper studies the factors that influence the performance of algorithms that can be applied to perform topic selection automatically for Malay articles. This is done by comparing the contents of the articles with the corresponding topics; each Malay article is then assigned to the appropriate topic depending on the results of the classification process. In this paper, all Malay articles are classified using the k-Nearest Neighbors (k-NN) and Naïve Bayes classifiers. Both classifiers are used to classify and assign a topic to these Malay articles according to a predefined set of topics. The effectiveness of classifying these Malay articles using the k-NN classifier is highly dependent on the distance method used and the number of nearest neighbors, k. Thus, this paper also assesses the effects of using different distance methods (e.g., Cosine Similarity and Euclidean Distance) and of varying the number of nearest neighbors, k. In addition, the effects of applying stemming on the performance of the classifiers are studied. The results show that the k-NN classifier performs better than the Naïve Bayes classifier in classifying the Malay articles into their respective topics. The stemming process also improves the overall performance of both classifiers. Other findings include that using Cosine Similarity as the distance measure improves the performance of the k-NN classifier. |
Rights: | This is the author-created version of Springer, Rayner Alfred, Wong Wei Yee, Yuto Lim, Joe Henry Obit, Communications in Computer and Information Science, 652, 2016, 289-299. The original publication is available at www.springerlink.com, http://dx.doi.org/10.1007/978-981-10-2777-2_26 |
URI: | http://hdl.handle.net/10119/14737 |
Type: | author |
Appears in Collections: | b10-1. Journal Articles
|
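The abstract compares k-NN classification under two distance measures, Cosine Similarity and Euclidean Distance, for different values of k. The following is only a minimal sketch of that comparison, not the authors' implementation: the function names, the plain whitespace tokenisation (no stemming), and the four toy headlines with their topic labels are all hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter

# Hypothetical toy training set (headline, topic); not data from the paper.
TRAIN = [
    ("harga minyak naik", "ekonomi"),
    ("bank pusat naik kadar faedah", "ekonomi"),
    ("pasukan bola sepak menang", "sukan"),
    ("pemain bola cedera", "sukan"),
]

def bag_of_words(text):
    """Term counts from lowercase, whitespace-split tokens (no stemming)."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    """1 - cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Euclidean distance over the union of terms in both vectors."""
    terms = set(a) | set(b)
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in terms))

def knn_predict(train, query, k, distance):
    """Majority topic among the k training documents closest to the query."""
    q = bag_of_words(query)
    ranked = sorted(train, key=lambda item: distance(bag_of_words(item[0]), q))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    for k in (1, 3):
        for name, dist in (("cosine", cosine_distance),
                           ("euclidean", euclidean_distance)):
            print(k, name, knn_predict(TRAIN, "harga minyak turun", k, dist))
```

On this toy data the two measures agree for k = 1 but disagree for k = 3, because Euclidean distance on raw counts is sensitive to document length while cosine distance is not. That is a property of the sketch and its invented data, not a reproduction of the paper's results.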
Files in This Item:
File | Description | Size | Format |
22897.pdf | | 306Kb | Adobe PDF |
|
All items stored in this system are protected by copyright.
|