Positive-definiteness and sparsity are the two most important properties of high-dimensional precision matrices. To achieve both, this paper estimates high-dimensional precision matrices using a lasso-penalized D-trace loss under a positive-definiteness constraint. We derive an efficient accelerated gradient method to solve this challenging optimization problem and establish its convergence rate as O(1/k²). Numerical simulations illustrate that our method has competitive advantages over other methods.
Over the past twenty years, high-dimensional data has been one of the most active directions in statistics. It has a wide range of applications in functional magnetic resonance imaging (fMRI), bioinformatics, Web mining, climate research, risk management, and social science, and it is a main direction of current scientific research. Both in theory and in practice, high-dimensional precision matrix estimation plays a very important role in many fields.
Thus, estimating a high-dimensional precision matrix is becoming an increasingly crucial question. This estimation faces two difficulties: (i) sparsity of the estimator; (ii) the positive-definiteness constraint. Huang et al. [
To overcome difficulty (ii), one possible approach uses the eigen-decomposition of Θ̂ and modifies Θ̂ to satisfy the condition {Θ ≥ 0}. Assume that Θ̂ has the eigen-decomposition Θ̂ = ∑_{i=1}^p λ_i v_i v_iᵀ; a positive semi-definite estimator is then obtained by setting Θ̂ = ∑_{i=1}^p max(λ_i, 0) v_i v_iᵀ. However, this strategy destroys the sparsity pattern of Θ̂ for sparse precision matrix estimation. Yuan et al. [
Recently, Zhang et al. [
Θ̂_+ = arg min_{Θ ≥ εI} (1/2)⟨Θ², Σ̂_n⟩ − tr(Θ) + λ‖Θ‖_{1,off}   (1)
It is important to note that ε is not a tuning parameter like λ; it is included simply to ensure that the smallest eigenvalue of the estimator is at least ε. They developed an efficient alternating direction method of multipliers (ADMM) to solve the challenging optimization problem (1) and established its convergence properties.
To obtain a better estimator of the high-dimensional precision matrix and achieve a faster convergence rate, this paper proposes an effective algorithm, an accelerated gradient method ( [ ] ), and establishes its convergence rate as O(1/k²).
The paper is organized as follows. Section 2 introduces our methodology: model establishment in Section 2.1, step size estimation in Section 2.2, the accelerated gradient method algorithm in Section 2.3, and the convergence analysis of this algorithm in Section 2.4. Section 3 presents numerical results for our method in comparison with other methods. A discussion is given in Section 4. All proofs are given in the Appendix.
According to the introduction, our optimization problem with the D-trace loss function is as follows:
min_{Θ ≥ εI} F(Θ) := (1/2)⟨Θ², Σ̂_n⟩ − tr(Θ) + λ‖Θ‖_{1,off}   (2)
where λ is a nonnegative penalization parameter and Σ̂_n = (1/n)∑_{i=1}^n X_i X_iᵀ is the sample covariance matrix; |Θ|_1 = ‖Θ‖_{1,off} = ∑_{i≠j}|Θ_{ij}| denotes the l_1 off-diagonal penalty. Define f(Θ) = (1/2)⟨Θ², Σ̂_n⟩ − tr(Θ); f(Θ) is a continuously differentiable function. Consider the gradient step
Θ k = Θ k − 1 − 1 t k ∇ f ( Θ k − 1 ) (3)
where t_k > 0 is a step size and ∇f(Θ) = (1/2)(ΘΣ̂_n + Σ̂_nΘ) − I. The gradient step (3) can be equivalently reformulated as a proximal regularization of the linearized function f(Θ) at Θ_{k−1}:
Θ k = arg min Θ ≥ ε I Φ t k ( Θ , Θ k − 1 ) (4)
where
Φ t k ( Θ , Θ k − 1 ) = f ( Θ k − 1 ) + 〈 Θ − Θ k − 1 , ∇ f ( Θ k − 1 ) 〉 + t k 2 ‖ Θ − Θ k − 1 ‖ F 2 (5)
Based on this equivalence relationship, the optimization problem (2) is solved by the following iterative step:
Θ k = arg min Θ ≥ ε I Ψ t k ( Θ , Θ k − 1 ) ≜ arg min Θ ≥ ε I Φ t k ( Θ , Θ k − 1 ) + λ | Θ | 1 = arg min Θ ≥ ε I 1 2 〈 Θ k − 1 2 , Σ ^ n 〉 − tr ( Θ k − 1 ) + 〈 Θ − Θ k − 1 , 1 2 ( Θ k − 1 Σ ^ n + Σ ^ n Θ k − 1 ) − I 〉 + t k 2 ‖ Θ − Θ k − 1 ‖ F 2 + λ | Θ | 1 = arg min Θ ≥ ε I t k 2 ‖ Θ − [ Θ k − 1 − 1 t k ( 1 2 ( Θ k − 1 Σ ^ n + Σ ^ n Θ k − 1 ) − I ) ] ‖ F 2 + λ | Θ | 1 (6)
where the last equality follows by ignoring the terms that do not depend on Θ.
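To make the update concrete, here is a minimal NumPy sketch of the gradient ∇f(Θ) = (1/2)(ΘΣ̂_n + Σ̂_nΘ) − I and the plain gradient step (3); the function names are illustrative, not from the paper.

```python
import numpy as np

def dtrace_grad(Theta, Sigma_n):
    """Gradient of f(Theta) = 0.5*<Theta^2, Sigma_n> - tr(Theta),
    i.e. 0.5*(Theta @ Sigma_n + Sigma_n @ Theta) - I."""
    p = Theta.shape[0]
    return 0.5 * (Theta @ Sigma_n + Sigma_n @ Theta) - np.eye(p)

def gradient_step(Theta_prev, Sigma_n, t_k):
    """Plain gradient step (3): Theta_k = Theta_{k-1} - (1/t_k) * grad f(Theta_{k-1})."""
    return Theta_prev - dtrace_grad(Theta_prev, Sigma_n) / t_k
```

For instance, with Σ̂_n = I the gradient vanishes exactly at Θ = I, since (1/2)(I + I) − I = 0, which is the unpenalized minimizer in that toy case.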
Define (C)_+ as the projection of a matrix C onto the convex cone {C ≥ εI}. Assume that C has the eigen-decomposition ∑_{j=1}^p λ_j v_j v_jᵀ; then (C)_+ is obtained as ∑_{j=1}^p max(λ_j, ε) v_j v_jᵀ. Define an entry-wise soft-thresholding rule for all the off-diagonal elements of a matrix Z:
S(Z, τ) = {s(z_jl, τ)}_{1≤j,l≤p} with
s(z_jl, τ) = sign(z_jl) max(|z_jl| − τ, 0) I{j≠l} + z_jl I{j=l}. Thus, the above problem can be summarized in the following theorem:
Theorem 1: Let B ∈ ℝ^{p×p} and let Θ be symmetric. Then the solution of
S ( B ) = arg min Θ ≥ ε I { 1 2 ‖ Θ − B ‖ F 2 + λ | Θ | 1 } (7)
is given by S ( B ) = ( S ( B , λ ) ) + , where S ( B , λ ) = { s ( B j l , λ ) } 1 ≤ j , l ≤ p with s ( B j l , λ ) = sign ( B j l ) max ( | B j l | − λ ,0 ) I { j ≠ l } + B j l I { j = l } .
The proof of this theorem follows directly from the soft-thresholding method.
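Theorem 1 gives the iterate in closed form: soft-threshold the off-diagonal entries, then project onto {C ≥ εI}. A NumPy sketch (our own function names):

```python
import numpy as np

def soft_threshold_offdiag(B, lam):
    """Entry-wise soft-thresholding S(B, lam), applied only off the diagonal."""
    S = np.sign(B) * np.maximum(np.abs(B) - lam, 0.0)
    np.fill_diagonal(S, np.diag(B))  # diagonal entries are left untouched
    return S

def project_psd(C, eps):
    """Projection (C)_+ onto {C >= eps*I}: clip eigenvalues at eps."""
    w, V = np.linalg.eigh((C + C.T) / 2)  # symmetrize for numerical safety
    return (V * np.maximum(w, eps)) @ V.T

def prox_update(B, lam, eps):
    """Closed-form minimizer of (7): soft-threshold, then project, per Theorem 1."""
    return project_psd(soft_threshold_offdiag(B, lam), eps)
```

For example, thresholding [[1, 0.5], [0.5, 1]] at λ = 0.6 zeroes the off-diagonals, and the subsequent projection leaves the identity unchanged since its eigenvalues already exceed ε.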
To guarantee the convergence rate of the resulting iterative sequence, we first give the relationship between our proximal function Ψ_{t_k} and the objective function F at a given point.
Lemma 1: Let
T μ ( Θ ˜ ) = arg min Θ ≥ ε I Ψ μ ( Θ , Θ ˜ ) (8)
where Ψ is defined in Equation (6). Assume that the following inequality holds:
F(T_μ(Θ̃)) ≤ Ψ_μ(T_μ(Θ̃), Θ̃)   (9)
Then for any Θ ∈ ℝ^{p×p}:
F ( Θ ) − F ( T μ ( Θ ˜ ) ) ≥ μ 2 ‖ T μ ( Θ ˜ ) − Θ ˜ ‖ F 2 + μ 〈 Θ ˜ − Θ , T μ ( Θ ˜ ) − Θ ˜ 〉 (10)
This lemma is proved in the Appendix.
At each iterative step of the algorithm, an appropriate step size μ is needed, satisfying Θ_k = T_μ(Θ_{k−1}) and
F ( Θ k ) ≤ Ψ μ ( Θ k , Θ k − 1 ) (11)
Since the gradient of f(·) is Lipschitz continuous, according to Nesterov et al. [
Lemma 2: Suppose that f(X) is a convex function whose gradient ∇f(X) is Lipschitz continuous with constant L. Then:
f(X) ≤ f(Y) + ⟨X − Y, ∇f(Y)⟩ + (L/2)‖X − Y‖²_F, ∀ X, Y ∈ ℝ^{p×p}   (12)
Applying this inequality with X = T_L(Θ̃) and Y = Θ̃, we have
f ( T L ( Θ ˜ ) ) ≤ f ( Θ ˜ ) + 〈 T L ( Θ ˜ ) − Θ ˜ , ∇ f ( Θ ˜ ) 〉 + L 2 ‖ T L ( Θ ˜ ) − Θ ˜ ‖ F 2 (13)
Hence, when μ ≥ L:
F ( T μ ( Θ ˜ ) ) ≤ Φ μ ( T μ ( Θ ˜ ) , Θ ˜ ) + λ | T μ ( Θ ˜ ) | 1 = Ψ μ ( T μ ( Θ ˜ ) , Θ ˜ ) (14)
The above results show that the condition in Equation (11) is always satisfied under the update rule
Θ k = T L ( Θ k − 1 ) (15)
In practice, L may be unknown or expensive to compute. The following step size estimation method is therefore used: give an initial estimate L_0 of L and increase this estimate by a multiplicative factor γ > 1 repeatedly until the condition in Equation (11) is satisfied. It is well known ( [ ] ) that such a gradient method can achieve the optimal convergence rate of O(1/k²). Recently, other similar methods have been applied to problems consisting of a smooth part and a non-smooth part ( [
Algorithm 1: An accelerated gradient method for high-dimensional precision matrix estimation
1) Initialize: L_0, γ, Θ̃_1 = Θ_0 ∈ ℝ^{p×p}, α_1 = 1
2) Iterate for k = 1, 2, ⋯:
3) Set L̄ = L_{k−1}
4) While F(T_{L̄}(Θ̃_k)) > Ψ_{L̄}(T_{L̄}(Θ̃_k), Θ̃_k), set L̄ := γL̄
5) Set L_k = L̄ and update
Θ_k = T_{L_k}(Θ̃_k)
α_{k+1} = (1 + √(1 + 4α_k²)) / 2
Θ̃_{k+1} = Θ_k + ((α_k − 1) / α_{k+1}) (Θ_k − Θ_{k−1})
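A NumPy sketch of Algorithm 1, combining the backtracking line search for L_k with the closed-form proximal update from Theorem 1. The function name, the defaults L_0 = 1, γ = 2, and the stopping tolerance are our own illustrative choices; note that the backtracking test F ≤ Ψ in Equation (11) reduces to a test on the smooth part f alone, since λ|Θ_k|_1 appears on both sides.

```python
import numpy as np

def sparse_dtrace_apg(Sigma_n, lam, eps=1e-4, L0=1.0, gamma=2.0,
                      max_iter=500, tol=1e-6):
    """Accelerated gradient sketch for problem (2), following Algorithm 1."""
    p = Sigma_n.shape[0]
    I = np.eye(p)

    def grad(T):  # gradient of the smooth part f
        return 0.5 * (T @ Sigma_n + Sigma_n @ T) - I

    def f(T):     # smooth part: 0.5*<T^2, Sigma_n> - tr(T)
        return 0.5 * np.trace(T @ T @ Sigma_n) - np.trace(T)

    def prox(B, tau):  # Theorem 1: soft-threshold off-diagonal, then project
        S = np.sign(B) * np.maximum(np.abs(B) - tau, 0.0)
        np.fill_diagonal(S, np.diag(B))
        w, V = np.linalg.eigh((S + S.T) / 2)
        return (V * np.maximum(w, eps)) @ V.T

    Theta = Theta_tilde = I.copy()
    alpha, L = 1.0, L0
    for _ in range(max_iter):
        g = grad(Theta_tilde)
        while True:  # backtracking: increase L until condition (11) holds
            T = prox(Theta_tilde - g / L, lam / L)
            D = T - Theta_tilde
            if f(T) <= f(Theta_tilde) + np.sum(D * g) + 0.5 * L * np.sum(D * D):
                break
            L *= gamma
        Theta_new = T
        alpha_new = (1 + np.sqrt(1 + 4 * alpha ** 2)) / 2
        Theta_tilde = Theta_new + ((alpha - 1) / alpha_new) * (Theta_new - Theta)
        if np.linalg.norm(Theta_new - Theta) <= tol * max(np.linalg.norm(Theta), 1):
            Theta = Theta_new
            break
        Theta, alpha = Theta_new, alpha_new
    return Theta
```

As a sanity check, with Σ̂_n = 2I and λ = 0 the constrained minimizer is Σ̂_n⁻¹ = 0.5I, which the iteration reaches in a few steps.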
In our method, two sequences Θ_k and Θ̃_k are updated recursively. In particular, Θ_k is the approximate solution at the k-th step and Θ̃_k is called the search point ( [ ] ). The convergence rate of the method can be shown to be O(1/k²); this result is summarized in the following theorem.
Theorem 2: Let {Θ_k} and {Θ̃_k} be the matrix sequences generated by our algorithm. Then for any k ≥ 1, we have
F ( Θ k ) − F ( Θ * ) ≤ 2 γ L ‖ Θ 0 − Θ * ‖ F 2 ( k + 1 ) 2 (16)
where Θ * = arg min Θ ≥ ε I F ( Θ ) .
In this section, we provide numerical results for our algorithm, illustrating its advantages on three models. In the simulation study, data were generated from N(0, Σ⁰), where Θ⁰ = (Σ⁰)⁻¹. The sample size was n = 400 in all models, with p = 500 in Models 1 and 2 and p = 484 in Model 3, similar to Zhang et al. [
Model 1: Θ i , i 0 = 1 , Θ i , j 0 = 0.2 for 1 ≤ | i − j | ≤ 2 and Θ i , j 0 = 0 otherwise.
Model 2: Θ i , i 0 = 1 , Θ i , j 0 = 0.2 for 1 ≤ | i − j | ≤ 4 and Θ i , j 0 = 0 otherwise.
Model 3: Θ i , i 0 = 1 , Θ i , i + 1 0 = 0.2 for mod ( i , p 1 / 2 ) ≠ 0 , Θ i , i + p 1 / 2 0 = 0.2 and Θ i , j 0 = 0 otherwise; this is the grid model in Ravikumar et al. [
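The three designs can be generated as follows; this is a sketch with our own function names (`make_model`, `sample_cov`), using 0-based indexing so that the right-edge condition mod(i, p^{1/2}) ≠ 0 of Model 3 becomes (i + 1) % s != 0.

```python
import numpy as np

def make_model(model, p):
    """Precision matrix Theta^0 for the three simulation models."""
    Theta = np.eye(p)
    if model == 1:      # banded: 0.2 on the first two off-diagonals
        for k in (1, 2):
            Theta += 0.2 * (np.eye(p, k=k) + np.eye(p, k=-k))
    elif model == 2:    # banded: 0.2 on the first four off-diagonals
        for k in (1, 2, 3, 4):
            Theta += 0.2 * (np.eye(p, k=k) + np.eye(p, k=-k))
    else:               # Model 3: grid graph on a sqrt(p) x sqrt(p) lattice
        s = int(round(np.sqrt(p)))
        for i in range(p - 1):
            if (i + 1) % s != 0:            # horizontal neighbour
                Theta[i, i + 1] = Theta[i + 1, i] = 0.2
        for i in range(p - s):              # vertical neighbour
            Theta[i, i + s] = Theta[i + s, i] = 0.2
    return Theta

def sample_cov(Theta0, n, seed=0):
    """Draw X_1, ..., X_n ~ N(0, (Theta0)^{-1}) and return (1/n) * sum X_i X_i^T."""
    rng = np.random.default_rng(seed)
    p = Theta0.shape[0]
    X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(Theta0), size=n)
    return X.T @ X / n
```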
Simulation results based on 100 independent replications are shown in the table below. We report the operator risk E(‖Θ̂ − Θ⁰‖₂), the matrix l_{1,∞} risk E(‖Θ̂ − Θ⁰‖_{1,∞}), and the percentages of correctly estimated nonzeros and zeros (TP and TN), where the l_{1,∞} norm max_i(∑_j |X_{i,j}|) is written as ‖X‖_{1,∞}. In the first two columns smaller numbers are better; in the last two columns larger numbers are better.
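These four criteria can be computed as follows (a sketch; the function name and the zero-detection tolerance are our own choices):

```python
import numpy as np

def risks(Theta_hat, Theta0, tol=1e-8):
    """Operator-norm loss, matrix l_{1,inf} loss, and TP/TN percentages."""
    op = np.linalg.norm(Theta_hat - Theta0, 2)           # largest singular value
    l1inf = np.max(np.sum(np.abs(Theta_hat - Theta0), axis=1))  # max row sum
    nz, z = Theta0 != 0, Theta0 == 0
    tp = 100.0 * np.mean(np.abs(Theta_hat[nz]) > tol)    # nonzeros recovered
    tn = 100.0 * np.mean(np.abs(Theta_hat[z]) <= tol)    # zeros recovered
    return op, l1inf, tp, tn
```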
This paper mainly studies positive-definite sparse precision matrix estimation
| | Operator | l_{1,∞} | TP | TN |
|---|---|---|---|---|
| **Model 1** | | | | |
| our method | 0.77 | 0.80 | 94.91 | 99.20 |
| Zhang et al.'s method | 0.77 | 1.06 | 88.80 | 98.77 |
| Graphical lasso | 0.78 | 1.26 | 88.12 | 97.65 |
| **Model 2** | | | | |
| our method | 1.59 | 1.60 | 70.02 | 98.40 |
| Zhang et al.'s method | 1.59 | 1.92 | 63.47 | 98.66 |
| Graphical lasso | 1.61 | 2.11 | 64.88 | 97.40 |
| **Model 3** | | | | |
| our method | 0.56 | 0.81 | 1 | 99.21 |
| Zhang et al.'s method | 0.56 | 0.91 | 99.41 | 98.57 |
| Graphical lasso | 0.58 | 1.06 | 99.76 | 97.48 |
via a lasso-penalized D-trace loss solved by an efficient accelerated gradient method. Positive-definiteness and sparsity are the most important properties of large precision matrices; our method not only efficiently achieves both properties but also attains a better convergence rate. Numerical results show that our estimator also performs better in comparison with Zhang et al.'s method and the graphical lasso method.
This project was supported by the National Natural Science Foundation of China (71601003) and the National Statistical Scientific Research Projects (2015LZ54).
Xia, L., Huang, X.D., Wang, G.P. and Wu, T. (2017) Positive-Definite Sparse Precision Matrix Estimation. Advances in Pure Mathematics, 7, 21-30. http://dx.doi.org/10.4236/apm.2017.71002
Since both the trace function and the l_1 norm are convex functions, we have
1 2 〈 Θ 2 , Σ ^ n 〉 − tr ( Θ ) = 1 2 tr ( Θ 2 Σ ^ n ) − tr ( Θ ) ≥ 1 2 tr ( Θ ˜ 2 Σ ^ n ) − tr ( Θ ˜ ) + 〈 Θ − Θ ˜ , 1 2 ( Θ ˜ Σ ^ n + Σ ^ n Θ ˜ ) − I 〉 (17)
λ | Θ | 1 ≥ λ | T L ( Θ ˜ ) | 1 + λ 〈 Θ − T L ( Θ ˜ ) , g ( T L ( Θ ˜ ) ) 〉 (18)
where g(T_L(Θ̃)) ∈ ∂‖T_L(Θ̃)‖_1 is a sub-gradient of the l_1 norm at the point T_L(Θ̃).
Since F(T_μ(Θ̃)) ≤ Ψ_μ(T_μ(Θ̃), Θ̃), combining Equations (17) and (18) gives
F(Θ) − F(T_L(Θ̃)) ≥ F(Θ) − Ψ_L(T_L(Θ̃), Θ̃) ≥ ⟨Θ − Θ̃, (1/2)(Θ̃Σ̂_n + Σ̂_nΘ̃) − I⟩ + λ⟨Θ − T_L(Θ̃), g(T_L(Θ̃))⟩ − ⟨T_L(Θ̃) − Θ̃, (1/2)(Θ̃Σ̂_n + Σ̂_nΘ̃) − I⟩ − (L/2)‖T_L(Θ̃) − Θ̃‖²_F = ⟨Θ − T_L(Θ̃), (1/2)(Θ̃Σ̂_n + Σ̂_nΘ̃) − I + λg(T_L(Θ̃))⟩ − (L/2)‖T_L(Θ̃) − Θ̃‖²_F   (19)
Since T_L(Θ̃) is a minimizer of Ψ_L(Θ, Θ̃), we have
(1/2)(Θ̃Σ̂_n + Σ̂_nΘ̃) − I + L(T_L(Θ̃) − Θ̃) + λg(T_L(Θ̃)) = 0   (20)
so Equation (19) can be simplified to:
F(Θ) − F(T_L(Θ̃)) ≥ ⟨Θ − T_L(Θ̃), (1/2)(Θ̃Σ̂_n + Σ̂_nΘ̃) − I + λg(T_L(Θ̃))⟩ − (L/2)‖T_L(Θ̃) − Θ̃‖²_F = ⟨Θ − T_L(Θ̃), −L(T_L(Θ̃) − Θ̃)⟩ − (L/2)‖T_L(Θ̃) − Θ̃‖²_F = L⟨Θ̃ − Θ, T_L(Θ̃) − Θ̃⟩ + (L/2)‖T_L(Θ̃) − Θ̃‖²_F   (21)
Proof of Theorem 2: Defining U_k = α_kΘ_k − (α_k − 1)Θ_{k−1} − Θ* and V_k = F(Θ_k) − F(Θ*), we easily obtain
(2/L_{k+1})(α_k²V_k − α_{k+1}²V_{k+1}) ≥ ‖U_{k+1}‖²_F − ‖U_k‖²_F   (22)
Since L_{k+1} ≥ L_k, it follows that
2 L k α k 2 V k − 2 L k + 1 α k + 1 2 V k + 1 ≥ ‖ U k + 1 ‖ F 2 − ‖ U k ‖ F 2 (23)
By applying Lemma 1, we easily obtain:
F ( Θ * ) − F ( Θ 1 ) = F ( Θ * ) − F ( T L 1 ( Θ ˜ 1 ) ) ≥ L 1 2 ‖ Θ 1 − Θ * ‖ F 2 − L 1 2 ‖ Θ ˜ 1 − Θ * ‖ F 2 (24)
and hence:
2 V 1 L 1 ≤ ‖ Θ ˜ 1 − Θ * ‖ F 2 − ‖ Θ 1 − Θ * ‖ F 2 (25)
Combining (23) and (25), we obtain:
V k + 1 ≤ L k + 1 2 α k + 1 2 ‖ Θ ˜ 1 − Θ * ‖ F 2 (26)
Combining Equation (26) with the relation α_k² ≥ (k + 1)²/4, we easily obtain:
F ( Θ k ) − F ( Θ * ) ≤ 2 L k ‖ Θ 0 − Θ * ‖ F 2 ( k + 1 ) 2 ≤ 2 γ L ‖ Θ 0 − Θ * ‖ F 2 ( k + 1 ) 2 (27)