
This paper studies the problem of tensor principal component analysis (PCA). Tensor PCA is usually viewed as a low-rank matrix completion problem via matrix factorization techniques, with the nuclear norm used as a convex approximation of the rank operator under mild conditions. However, most nuclear norm minimization approaches rely on SVD operations. Given an *m* × *n* matrix, the time complexity of an SVD operation is *O*(*mn*^{2}), which brings prohibitive computational cost in large-scale problems. In this paper, we propose an efficient and scalable algorithm for tensor PCA called the Linearized Alternating Direction Method with Vectorized technique for Tensor Principal Component Analysis (LADMVTPCA). Different from traditional matrix factorization methods, LADMVTPCA uses a vectorized technique to formulate the tensor as an outer product of vectors, which greatly improves computational efficiency compared with matrix factorization methods. In the experiments, synthetic tensor data of different orders are used to empirically evaluate LADMVTPCA. Results show that LADMVTPCA outperforms the matrix factorization based method.

A tensor is a multidimensional array: a first-order tensor is a vector, a second-order tensor is a matrix, and tensors of order three or higher are called higher-order tensors. Principal component analysis (PCA) finds a few linear combinations of the original variables, and it plays an important role in dimension reduction and related data analysis research areas [

Tensor PCA is of great importance in practice and has many applications, such as computer vision [

which is equivalent to

where

And

The above solution is called the leading PC. Once the leading PC is found, the other PCs can be computed sequentially via the so-called deflation technique. For example, the second PC can be obtained as follows: 1) compute the leading PC of the tensor; 2) subtract the corresponding rank-one term from the original tensor; 3) compute the leading PC of the residual tensor, which is taken as the second PC of the original tensor. The remaining PCs can be obtained in a similar way [
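The deflation steps above can be sketched on a second-order tensor (a symmetric matrix) with plain NumPy; `leading_pc` here is an illustrative power-iteration helper, not the paper's algorithm:

```python
import numpy as np

def leading_pc(A, iters=200):
    """Leading principal component of a symmetric matrix via power iteration."""
    x = np.random.default_rng(0).standard_normal(A.shape[0])
    for _ in range(iters):
        x = A @ x
        x /= np.linalg.norm(x)
    lam = x @ A @ x  # Rayleigh quotient = leading eigenvalue
    return lam, x

# Deflation: subtract the leading rank-one term, then repeat on the residual.
A = np.diag([5.0, 3.0, 1.0])
lam1, x1 = leading_pc(A)
lam2, x2 = leading_pc(A - lam1 * np.outer(x1, x1))
print(f"{lam1:.3f} {lam2:.3f}")  # 5.000 3.000 — the two largest eigenvalues
```

The same idea carries over to higher-order tensors by replacing the matrix-vector product with the corresponding tensor contraction.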

| Scale | 100 × 100 | 1000 × 1000 | 1000 × 5000 | 5000 × 5000 | 5000 × 10,000 | 10,000 × 10,000 |
|---|---|---|---|---|---|---|
| Full SVD | 1.70e−3 | 0.13 | 1.26 | 22.38 | 55.64 | 263.03 |
| Leading PC | 4.13e−4 | 0.01 | 0.09 | 0.52 | 1.13 | 2.27 |

Although more iterations are needed for greedy atom decomposition methods to reach convergence, their total computational cost is much lower than that of SVD based matrix completion methods. Thus, in the rest of this paper, we focus on finding the leading PC of a tensor.
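The gap shown in the timing table can be reproduced in spirit with SciPy (an implementation assumption of this sketch; the paper does not specify one): `scipy.sparse.linalg.svds` computes only the leading singular triplet iteratively, while `np.linalg.svd` computes the full decomposition.

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(1)
A = rng.standard_normal((500, 500))

# Full SVD: O(mn^2), computes all 500 singular values.
s_full = np.linalg.svd(A, compute_uv=False)

# Leading triplet only: iterative, touches A only through mat-vec products.
u, s, vt = svds(A, k=1)

# Both agree on the largest singular value.
print(np.isclose(s[0], s_full[0]))
```

Timing the two calls on growing matrices reproduces the trend in the table above.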

If

In fact, the algorithm for supersymmetric tensor PCA problem can be extended to tensors that are not super-symmetric [

Another line of research, including CANDECOMP (canonical decomposition) and PARAFAC (parallel factors), imposes a rank-one constraint on the tensor to realize the tensor decomposition:

where

The equality constraint is due to the fact that

The difficulty of problem (6) lies in handling the rank constraint

To avoid the matrix SVD operation, we reformulate problem (5) with a vectorized technique and consider the following optimization problem:
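The displayed problem did not survive extraction; a standard vectorized rank-one formulation consistent with the surrounding text (for a fourth-order tensor $\mathcal{F}$, with scale $\lambda$ and unit vector $x$) would read:

```latex
\min_{\lambda,\,x}\ \frac{1}{2}\,\bigl\| \mathcal{F} - \lambda\, x \otimes x \otimes x \otimes x \bigr\|_F^2
\quad \text{s.t.} \quad \|x\|_2 = 1,
```

where $\otimes$ denotes the vector outer product, so the tensor variable is replaced by a single vector and no matrix SVD is required.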

where

The rest of this paper is organized as follows. In Section 2, we first give a brief review of the LADM algorithm and then present a detailed description of using LADM with the vectorized technique to solve the tensor principal component problem. Section 3 presents the experiments, in which synthetic tensor data of different orders are used to empirically evaluate the proposed LADMVTPCA. The last section gives concluding remarks.

In this section, we first review the Linearized Alternating Direction Method of Multipliers (LADM) [

Consider the convex optimization problem

where

where the augmented Lagrangian function

The penalty parameter

Then the LADM algorithm solves problem (8) by generating a sequence
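As a generic reminder of the scheme (a textbook sketch, not necessarily the paper's exact update), LADM for $\min_{x,z} f(x) + g(z)$ s.t. $Ax + Bz = c$ linearizes the quadratic penalty in the $x$-subproblem at the current iterate:

```latex
\begin{aligned}
x^{k+1} &= \arg\min_{x}\; f(x) + \frac{\beta}{2\tau}
  \left\| x - \Bigl( x^{k} - \tau A^{\top}\bigl( A x^{k} + B z^{k} - c + \lambda^{k}/\beta \bigr) \Bigr) \right\|_2^2 , \\
z^{k+1} &= \arg\min_{z}\; g(z) + \frac{\beta}{2}
  \left\| A x^{k+1} + B z - c + \lambda^{k}/\beta \right\|_2^2 , \\
\lambda^{k+1} &= \lambda^{k} + \beta \bigl( A x^{k+1} + B z^{k+1} - c \bigr),
\end{aligned}
```

where $\beta > 0$ is the penalty parameter and $\tau > 0$ the linearization step size; the linearization turns the $x$-subproblem into a proximal step that avoids inverting $A^{\top}A$.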

The framework of linearized ADMM is given in Algorithm 2.

In the remainder of this section, we present the linearized ADM method for the leading principal component problem. Without loss of generality, we consider a fourth-order tensor. For the leading principal component problem with the rank-one constraint, we formulate the original problem as problem (17).

where

where

where

In the following, we show how to solve these subproblems in a linearized way with the vectorized technique. After all subproblems are solved, we summarize the full procedure for solving (18) in Algorithm 3. To reach the saddle point quickly and improve the quality of the solution, we adjust the parameter

In a traditional way,

We slightly modify the above LADM algorithm by imposing a proximal term

The minimum is attained when the derivative of

where

In this subsection, we report the numerical experiments and results on Algorithm 3 to solve the tensor leading PC problem (18). As the ADMPCA [

We apply our approach to synthetic datasets. The data are generated with uniformly distributed eigenvectors
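The generation procedure can be sketched as follows (a minimal NumPy illustration; the exact distribution parameters used in the paper may differ). A unit vector drawn uniformly gives a fourth-order rank-one tensor whose leading PC is the vector itself:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
v = rng.uniform(-1, 1, n)   # uniformly distributed eigenvector
v /= np.linalg.norm(v)      # normalize to the unit sphere

# Fourth-order rank-one tensor F = v ∘ v ∘ v ∘ v via einsum.
F = np.einsum('i,j,k,l->ijkl', v, v, v, v)

# Sanity check: contracting F with v along three modes returns v,
# i.e. F(v, v, v, ·) = (||v||^2)^3 v = v, so v is the leading PC.
recovered = np.einsum('ijkl,i,j,k->l', F, v, v, v)
print(np.allclose(recovered, v))  # True
```

Noise or additional rank-one terms can then be added on top of `F` to build harder test instances.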

We compare LADMVTPCA with ADMPCA for solving problem (18). In

| Inst.# | objDiff. | objVal (ADMPCA) | Time (s, ADMPCA) | objVal (LADMVTPCA) | Time (s, LADMVTPCA) |
|---|---|---|---|---|---|
| Dimension n = 4 | | | | | |
| 1 | 2.92e−06 | 1.07e+02 | 6.62e−01 | 1.07e+02 | 2.00e−03 |
| 2 | 7.29e−04 | 1.00e+02 | 6.57e−01 | 1.00e+02 | 3.01e−03 |
| 3 | 3.73e−04 | 1.00e+02 | 7.02e−01 | 1.00e+02 | 1.01e−03 |
| 4 | 4.57e−05 | 1.00e+02 | 6.85e−01 | 1.00e+02 | 1.92e−03 |
| Dimension n = 8 | | | | | |
| 1 | 2.64e−06 | 1.00e+02 | 4.55e+00 | 1.00e+02 | 3.01e−03 |
| 2 | 1.66e−08 | 1.00e+02 | 4.48e+00 | 1.00e+02 | 3.00e−03 |
| 3 | 1.58e−05 | 1.00e+02 | 4.16e+00 | 1.00e+02 | 3.00e−03 |
| 4 | 1.32e−07 | 1.00e+02 | 4.42e+00 | 1.00e+02 | 3.00e−03 |
| Dimension n = 16 | | | | | |
| 1 | 2.89e−04 | 1.04e+02 | 4.45e+01 | 1.04e+02 | 1.10e−02 |
| 2 | 2.13e−07 | 1.00e+02 | 4.65e+01 | 1.00e+02 | 7.00e−03 |
| 3 | 1.29e−05 | 1.00e+02 | 4.61e+01 | 1.00e+02 | 1.30e−02 |
| 4 | 2.63e−07 | 1.00e+02 | 4.64e+01 | 1.00e+02 | 1.18e−02 |
| Dimension n = 32 | | | | | |
| 1 | 3.40e−09 | 1.00e+02 | 2.48e+03 | 1.00e+02 | 6.77e−01 |
| 2 | 8.57e−09 | 1.00e+02 | 2.40e+03 | 1.00e+02 | 7.19e−01 |
| 3 | 4.69e−06 | 1.00e+02 | 2.38e+03 | 1.00e+02 | 6.26e−01 |
| 4 | 2.85e−08 | 1.00e+02 | 2.43e+03 | 1.00e+02 | 6.80e−01 |

denote the CPU times (in seconds) of ADMPCA and LADMVTPCA, respectively. From

Tensor PCA is an emerging area of research with many important applications in image processing, data analysis, statistical learning, and bioinformatics. In this paper, we propose a new efficient and scalable algorithm for tensor principal component analysis called LADMVTPCA. A vectorized technique is introduced into the processing procedure, and the linearized alternating direction method is used to solve the optimization problem. LADMVTPCA provides an efficient way to compute the leading PC. We empirically evaluate the proposed algorithm on synthetic tensor data of different orders. Results show that LADMVTPCA has a much lower computational cost than the matrix factorization based method; especially for large-scale problems, the matrix factorization based method is far more time-consuming than ours.

Fan, H.Y., Kuang, G.Y. and Qiao, L.B. (2017) Fast Tensor Principal Component Analysis via Proximal Alternating Direction Method with Vectorized Technique. Applied Mathematics, 8, 77-86. http://dx.doi.org/10.4236/am.2017.81007