Sufficient Component Analysis (SCA)


The purpose of sufficient dimension reduction (SDR) is to find the low-dimensional subspace of input features that is sufficient for predicting output values. In this software, we provide a novel distribution-free SDR method called sufficient component analysis (SCA), which is computationally more efficient than existing methods. In SCA, a solution is computed by iteratively performing dependence estimation and maximization: dependence estimation is carried out analytically by the recently proposed least-squares mutual information (LSMI) estimator, and dependence maximization is also carried out analytically by utilizing the Epanechnikov kernel.
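The alternating estimate-then-maximize procedure can be illustrated with a minimal NumPy sketch. This is a hedged illustration, not the SCA implementation: the `dependence` function below is a simple squared-correlation surrogate rather than the LSMI estimator, and the maximization step is a naive greedy random search rather than the analytic Epanechnikov-kernel update; all names and values are my own.

```python
import numpy as np

rng = np.random.default_rng(0)

def orthonormal_rows(A):
    """Project a candidate matrix onto the constraint set W W^T = I_m via QR."""
    Q, _ = np.linalg.qr(A.T)  # reduced QR: columns of Q are orthonormal
    return Q.T

def dependence(z, y):
    """Squared-correlation surrogate for the dependence of z on y.
    (SCA estimates SMI analytically with LSMI; this is only a stand-in.)"""
    return float(np.corrcoef(z, y)[0, 1] ** 2)

# Toy data: the output depends only on the first input coordinate.
n, d, m = 500, 5, 1
X = rng.standard_normal((n, d))
y = X[:, 0] + 0.1 * rng.standard_normal(n)

# Alternate between (1) estimating the dependence of z = W x on y and
# (2) updating W to increase it, keeping W on the orthonormal constraint set.
W = orthonormal_rows(rng.standard_normal((m, d)))
score0 = dependence((X @ W.T).ravel(), y)
score = score0
for _ in range(300):
    cand = orthonormal_rows(W + 0.3 * rng.standard_normal((m, d)))
    s = dependence((X @ cand.T).ravel(), y)
    if s > score:          # greedy maximization step
        W, score = cand, s

print(round(score, 3))     # dependence never decreases over iterations
```

The greedy acceptance rule guarantees the dependence estimate is non-decreasing, which mirrors the monotone improvement of the alternating scheme; the analytic updates in SCA make each of these two steps closed-form instead of search-based.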

Main Idea

The goal of SDR is to find a low-dimensional representation $\mathbf{z} = \mathbf{W}\mathbf{x}$ of the input $\mathbf{x}$ that is sufficient to describe the output $\mathbf{y}$. More precisely, we find $\mathbf{z}$ such that

$$\mathbf{y} \perp\!\!\!\perp \mathbf{x} \mid \mathbf{z},$$

meaning that, given the projected feature $\mathbf{z}$, the input $\mathbf{x}$ is conditionally independent of the output $\mathbf{y}$.

In SCA, the optimal transformation matrix that leads to $\mathbf{y} \perp\!\!\!\perp \mathbf{x} \mid \mathbf{z}$ is characterized as

$$\mathbf{W}^\ast = \mathop{\mathrm{argmax}}_{\mathbf{W}}\ \mathrm{SMI}(Z, Y) \quad \text{s.t.} \quad \mathbf{W}\mathbf{W}^\top = \mathbf{I}_m,$$

where $\mathrm{SMI}(Z,Y)$ is the squared-loss mutual information:

$$\mathrm{SMI}(Z,Y) := \frac{1}{2} \iint \left( \frac{p_{zy}(\mathbf{z},\mathbf{y})}{p_y(\mathbf{y})\,p_z(\mathbf{z})} - 1 \right)^2 \mathrm{d}\mathbf{z}\,\mathrm{d}\mathbf{y}.$$
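To make the definition concrete, here is a small plug-in estimate of SMI for two discrete variables, where the double integral reduces to a sum over the empirical joint and marginal probabilities. This is only an illustration (`smi_discrete` is a name I introduce here): SCA itself handles continuous data and estimates SMI analytically with LSMI.

```python
import numpy as np

def smi_discrete(z, y):
    """Plug-in estimate of squared-loss mutual information for two discrete
    variables: SMI = (1/2) * sum_{z,y} p(z)p(y) * (p(z,y)/(p(z)p(y)) - 1)^2."""
    _, zi = np.unique(np.asarray(z), return_inverse=True)
    _, yi = np.unique(np.asarray(y), return_inverse=True)
    joint = np.zeros((zi.max() + 1, yi.max() + 1))
    np.add.at(joint, (zi, yi), 1.0)        # empirical joint counts
    joint /= joint.sum()                    # empirical joint probabilities
    pz = joint.sum(axis=1, keepdims=True)   # marginal p(z), column vector
    py = joint.sum(axis=0, keepdims=True)   # marginal p(y), row vector
    ratio = joint / (pz * py)               # density ratio p(z,y)/(p(z)p(y))
    return float(0.5 * np.sum(pz * py * (ratio - 1.0) ** 2))

rng = np.random.default_rng(0)
a = rng.integers(0, 2, 10000)
b = rng.integers(0, 2, 10000)
print(smi_discrete(a, b))  # close to 0: independent variables
print(smi_discrete(a, a))  # exactly 1/2 for an identical binary pair
```

SMI is zero if and only if $Z$ and $Y$ are independent (the density ratio is then identically one), which is why maximizing it over $\mathbf{W}$ pushes $\mathbf{z}$ toward capturing all of the dependence on $\mathbf{y}$.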


  • All the model parameters are automatically tuned by cross-validation.

  • It scales to large datasets.



I am grateful to Prof. Masashi Sugiyama for his support in developing this software.


I am happy to receive any feedback. E-mail: yamada AT sg DOT cs DOT titech DOT ac DOT jp


Yamada, M.*, Niu, G., Takagi, J. & Sugiyama, M.
Computationally efficient sufficient dimension reduction via squared-loss mutual information.
In C.-N. Hsu and W. S. Lee (Eds.), Proceedings of the Third Asian Conference on Machine Learning (ACML2011), JMLR Workshop and Conference Proceedings, vol.20, pp.247-262, Taoyuan, Taiwan, Nov. 13-15, 2011.