Research Report - School of Knowledge Science: ISSN 1347-1570

Please use this identifier to cite or link to this item: http://hdl.handle.net/10119/8449

Title: Generalized kernel canonical correlation analysis : criteria and low rank kernel learning
Authors: Nguyen, Canh Hao
Ho, Tu Bao
Renders, Jean-Michel
Cancedda, Nicola
Issue Date: 2009-02-20
Publisher: School of Knowledge Science, Japan Advanced Institute of Science and Technology
Magazine name: Research report (School of Knowledge Science, Japan Advanced Institute of Science and Technology)
Volume: KS-RR-2009-002
Start page: 1
End page: 21
Abstract: Canonical Correlation Analysis (CCA) is a classical data analysis technique for computing common correlated subspaces for two datasets. Recent advances in machine learning enable the technique to operate solely on kernel matrices, making it a kernel method with the advantages of modularity, efficiency and nonlinearity. Its performance is further improved by appropriate regularization and low-rank approximation methods, making it applicable to many practical problems. However, the classical technique can find correlations between only two datasets. In practice, we often wish to consider correlations among more than two datasets at the same time. Such problems occur in many domains, such as multilingual text processing, where we wish to find a common representation of parallel document corpora from more than two languages together (we call this situation multiple view, or multiview for short). Generalizing CCA to more than two views faces two problems: finding criteria for multiview CCA, and finding computational solutions for these criteria. In this report, we analyze the criteria that have been proposed as objective functions for multiview CCA. We find that only some of them are suitable for our purpose. Among these criteria, only one, namely MAXVAR, has an efficient solution. We describe our algorithm for this criterion. We conduct experiments on multilingual corpora. Experimental results show that multiview CCA has an advantage over two-view CCA when not much training data is available. We then show that the low-rank approximation of kernels is done independently for each view. This could be a disadvantage, as different views may be projected onto subspaces that do not result in correlation. We then propose a new incomplete Cholesky decomposition (ICD) procedure that simultaneously decomposes all views. Experimental results show that the new ICD, by ensuring the alignment of subspaces from different views, gives higher performance for multiview CCA when there are many views and only a few dimensions for approximation.
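For readers unfamiliar with the low-rank kernel approximation mentioned in the abstract, the following is a minimal sketch of the standard greedy pivoted incomplete Cholesky decomposition applied to a single kernel matrix. It is not the report's simultaneous multiview procedure. The function names (rbf_kernel, incomplete_cholesky, max_abs_error), the Gaussian kernel, and the toy data are assumptions made only for illustration.

```python
import math


def rbf_kernel(xs, gamma=1.0):
    """Gram matrix of a Gaussian (RBF) kernel on a list of scalar points (toy data)."""
    return [[math.exp(-gamma * (a - b) ** 2) for b in xs] for a in xs]


def incomplete_cholesky(K, max_rank, tol=1e-8):
    """Greedy pivoted incomplete Cholesky of a PSD matrix K.

    Returns (G, pivots) where G is an n x r list of rows with K ~= G G^T
    and pivots lists the indices chosen at each step.
    """
    n = len(K)
    diag = [K[i][i] for i in range(n)]  # diagonal of the residual K - G G^T
    G = [[] for _ in range(n)]          # rows of G, grown one column per step
    pivots = []
    for _ in range(min(max_rank, n)):
        # Pick the point with the largest remaining residual.
        p = max(range(n), key=lambda i: diag[i])
        if diag[p] <= tol:
            break  # the residual is already negligible
        root = math.sqrt(diag[p])
        r = len(pivots)
        # New column: residual of column p of K, scaled by the pivot root.
        col = [
            (K[i][p] - sum(G[i][j] * G[p][j] for j in range(r))) / root
            for i in range(n)
        ]
        for i in range(n):
            G[i].append(col[i])
            diag[i] -= col[i] ** 2
        pivots.append(p)
    return G, pivots


def max_abs_error(K, G):
    """Largest absolute entry of K - G G^T."""
    n = len(K)
    return max(
        abs(K[i][j] - sum(a * b for a, b in zip(G[i], G[j])))
        for i in range(n)
        for j in range(n)
    )


xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
K = rbf_kernel(xs)
G2, pivots2 = incomplete_cholesky(K, max_rank=2)
G_full, pivots_full = incomplete_cholesky(K, max_rank=len(xs))
```

In this sketch each view would be decomposed separately, so the pivots chosen for one view carry no information about the others. That independence is the limitation the abstract says the proposed simultaneous decomposition addresses.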
URI: http://hdl.handle.net/10119/8449
Material Type: publisher
Appears in Collections: KS-RR-2009

Files in This Item:

File    Description    Size    Format
KS-RR-2009-002.pdf    30205Kb    Adobe PDF

All items in DSpace are protected by copyright, with all rights reserved.

