Series: Research Report - School of Knowledge Science (ISSN 1347-1570), KS-RR-2009

Please use this identifier to cite or link to this item: http://hdl.handle.net/10119/8449

Title: Generalized kernel canonical correlation analysis : criteria and low rank kernel learning
Authors: Nguyen, Canh Hao
Ho, Tu Bao
Renders, Jean-Michel
Cancedda, Nicola
Issue date: 2009-02-20
Publisher: School of Knowledge Science, Japan Advanced Institute of Science and Technology
Journal: Research report (School of Knowledge Science, Japan Advanced Institute of Science and Technology)
Volume: KS-RR-2009-002
Start page: 1
End page: 21
Abstract: Canonical Correlation Analysis (CCA) is a classical data analysis technique for computing common correlated subspaces for two datasets. Recent advances in machine learning enable the technique to operate solely on kernel matrices, making it a kernel method with the advantages of modularity, efficiency and nonlinearity. Its performance is further improved by appropriate regularization and low-rank approximation methods, making it suitable for many practical applications. However, the classical technique can only find correlations between two datasets, whereas in practice we often wish to consider the correlation of more than two datasets at the same time. Such problems occur in many domains, such as multilingual text processing, where we wish to find a common representation of parallel document corpora in more than two languages at once (we call this situation multiple view, or multiview for short). Generalizing CCA to more than two views raises two problems: finding criteria for multiview CCA, and finding computational solutions for these criteria. In this report, we analyze the criteria that have been proposed as objective functions for multiview CCA. We find that only some of them are suitable for our purpose, and among these only one, namely MAXVAR, has an efficient solution. We describe our algorithm for this criterion and conduct experiments on a multilingual corpus. Experimental results show that multiview CCA has an advantage over two-view CCA when little training data is available. We then observe that low-rank approximation of the kernels is performed independently for each view. This can be a disadvantage, as different views may be projected onto subspaces that do not preserve their correlation. We therefore propose a new incomplete Cholesky decomposition (ICD) procedure that decomposes all views simultaneously.
Experimental results show that the new ICD, by ensuring that the subspaces from different views are aligned, yields higher performance for multiview CCA when there are many views and few dimensions for approximation.
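The abstract relies on low-rank approximation of each kernel matrix by incomplete Cholesky decomposition. As a minimal sketch, assuming NumPy and a symmetric positive semidefinite kernel matrix `K`, the standard greedy pivoted ICD below builds a factor `G` with `K ≈ G G^T`, picking at each step the example with the largest residual diagonal. The function name `incomplete_cholesky` and the stopping tolerance `tol` are hypothetical choices for illustration, not taken from the report.

```python
import numpy as np


def incomplete_cholesky(K, max_rank, tol=1e-6):
    """Greedy pivoted incomplete Cholesky (illustrative sketch).

    K: symmetric positive semidefinite (n, n) kernel matrix.
    Returns (G, pivots) with G of shape (n, r), r <= max_rank, and K ~= G @ G.T.
    Stops early when the largest residual diagonal entry is <= tol.
    """
    n = K.shape[0]
    G = np.zeros((n, max_rank))
    d = np.diag(K).astype(float)  # residual diagonal of K - G G^T (a fresh copy)
    pivots = []
    for j in range(max_rank):
        i = int(np.argmax(d))  # example with the largest residual variance
        if d[i] <= tol:
            break  # remaining approximation error is negligible
        pivots.append(i)
        # New column: residual kernel column at pivot i, scaled by 1/sqrt(d[i])
        G[:, j] = (K[:, i] - G[:, :j] @ G[i, :j]) / np.sqrt(d[i])
        d -= G[:, j] ** 2  # update the residual diagonal
    return G[:, : len(pivots)], pivots
```

The sum of the final residual diagonal equals the trace of `K - G G^T`, which is the usual error measure for choosing the rank. Run separately on each view, this is the independent per-view approximation that the abstract identifies as a possible drawback.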
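A common equivalent formulation of the MAXVAR criterion seeks shared variates `z` that are best explained by every view at once; with ridge regularization the solution is given by the top eigenvectors of the sum of per-view projection matrices `P_j = X_j (X_j^T X_j + κI)^{-1} X_j^T`. The sketch below assumes NumPy, views whose rows are aligned (for kernel CCA these would be low-rank factors `G_j` with `K_j ≈ G_j G_j^T`), and a hypothetical function name `maxvar_gcca` with regularizer `reg`. It illustrates this textbook formulation and is not claimed to be the report's algorithm; it also forms the n × n matrix explicitly for clarity, which is only practical for small n.

```python
import numpy as np


def maxvar_gcca(views, k, reg=1e-3):
    """MAXVAR-style generalized CCA via the sum of regularized projections (sketch).

    views: list of (n, d_j) arrays whose rows are aligned across views.
    Returns (Z, evals): Z is (n, k) with orthonormal columns (the shared variates),
    evals are the k largest eigenvalues of sum_j P_j in descending order.
    """
    n = views[0].shape[0]
    M = np.zeros((n, n))
    for X in views:
        Xc = X - X.mean(axis=0)  # center each view so the variates are zero-mean
        d = Xc.shape[1]
        # Regularized projection P_j = Xc (Xc^T Xc + reg I)^{-1} Xc^T
        M += Xc @ np.linalg.solve(Xc.T @ Xc + reg * np.eye(d), Xc.T)
    M = (M + M.T) / 2  # remove tiny numerical asymmetry before eigh
    evals, evecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:k]
    return evecs[:, top], evals[top]
```

Each eigenvalue lies between 0 and the number of views, and a value close to the number of views signals a direction shared by all of them. With exactly two views the eigenvalues of `P_1 + P_2` are `1 ± ρ` for the canonical correlations `ρ`, so the sketch reduces to ordinary regularized CCA; per-view directions can then be recovered, for example, by regressing `Z` on each centered view.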
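The abstract proposes an ICD that decomposes all views simultaneously so that their subspaces stay aligned, but the record does not give the procedure. Purely as a hypothetical illustration of the idea, and not the report's method, the sketch below runs pivoted ICD on all views in lockstep with one shared pivot sequence, choosing each pivot by the sum of trace-normalized residual diagonals. Because rows are paired examples (e.g. translations of the same document), every view's factor is then built from the kernel columns of the same training examples. The function name `joint_incomplete_cholesky`, the scoring rule and the early-stop rule are all assumptions.

```python
import numpy as np


def joint_incomplete_cholesky(kernels, max_rank, tol=1e-6):
    """HYPOTHETICAL shared-pivot ICD over several row-aligned kernel matrices.

    Illustrative only, not the report's procedure. All views use the same pivot
    sequence, so every factor G_v is built from the same training examples.
    Returns ([G_1, ..., G_V], pivots) with K_v ~= G_v @ G_v.T.
    """
    n = kernels[0].shape[0]
    Gs = [np.zeros((n, max_rank)) for _ in kernels]
    ds = [np.diag(K).astype(float) for K in kernels]  # per-view residual diagonals
    scales = [d.sum() for d in ds]  # initial traces, so no view dominates the score
    pivots = []
    for j in range(max_rank):
        # Assumed scoring rule: sum of trace-normalized residual diagonals
        score = sum(d / s for d, s in zip(ds, scales))
        i = int(np.argmax(score))
        if min(d[i] for d in ds) <= tol:
            break  # assumed stop rule: pivot i is exhausted in at least one view
        pivots.append(i)
        for K, G, d in zip(kernels, Gs, ds):
            G[:, j] = (K[:, i] - G[:, :j] @ G[i, :j]) / np.sqrt(d[i])
            d -= G[:, j] ** 2  # in-place update of this view's residual diagonal
    r = len(pivots)
    return [G[:, :r] for G in Gs], pivots
```

The design choice being illustrated is only the shared pivot set; how the report actually selects pivots across views may differ.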
URI: http://hdl.handle.net/10119/8449
Type: publisher


Files in this item:
KS-RR-2009-002.pdf (Adobe PDF, 30205 KB)


