For the
case where the multivariate normal population does not have null correlations,
we give the exact expression of the distribution of the sample matrix of
correlations *R*, with the sample
variances acting as parameters. Also, the distribution of its determinant is
established in terms of Meijer G-functions in the null-correlation case.
Several numerical examples are given, and applications to the concept of system
dependence in Reliability Theory are presented.

The correlation matrix plays an important role in multivariate analysis since, by itself, it captures the pairwise degrees of relationship between the components of a random vector. Its presence is very visible in Principal Component Analysis and Factor Analysis, where, in general, it gives results different from those obtained with the covariance matrix. Also, as a test criterion, it is used to test the independence of variables, or subsets of variables ([

In a normal distribution context, when the population correlation matrix

As explained in [

Let us consider a random vector X with mean

of pairwise covariances between components in the matrix.

We obtain the population correlation matrix
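As a small numerical illustration of this construction (the covariance values below are hypothetical), the population correlation matrix is obtained by rescaling the covariance matrix by the inverse square roots of the variances:

```python
import numpy as np

# Illustrative (hypothetical) positive definite covariance matrix Sigma.
Sigma = np.array([[4.0, 2.0, 1.0],
                  [2.0, 9.0, 3.0],
                  [1.0, 3.0, 2.0]])

# Population correlation matrix: rho = D^{-1/2} Sigma D^{-1/2},
# where D is the diagonal matrix of variances.
d = np.sqrt(np.diag(Sigma))
rho = Sigma / np.outer(d, d)
print(rho)
```

The resulting matrix has unit diagonal and off-diagonal entries equal to the pairwise correlations, e.g. rho_{12} = 2/(2·3) = 1/3 here.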

For a sample of size n of observations from

(or matrix of sums of squares and products,

It is noted that by considering the usual sample covariance matrix
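As a quick sanity check (sample sizes here are hypothetical), the sample correlation matrix obtained by rescaling the usual sample covariance matrix agrees with the one computed directly, since the divisor in the covariance estimator cancels in the rescaling:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((20, 3))  # hypothetical sample: n = 20, p = 3

# Usual (unbiased, divisor n - 1) sample covariance matrix.
S = np.cov(X, rowvar=False)

# Sample correlation matrix R = D^{-1/2} S D^{-1/2}; the divisor cancels,
# so R agrees with the direct computation via np.corrcoef.
d = np.sqrt(np.diag(S))
R = S / np.outer(d, d)
print(np.allclose(R, np.corrcoef(X, rowvar=False)))
```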

The

In the bivariate case, Hotelling’s expression [

using the modified estimator

Several efforts have been carried out in the past to obtain the exact form of the density of R for the general case, where

the function

expressed as an infinite series, has a much more complicated expression. Schott ([

We begin with the case of

By showing that the joint density of the diagonal elements

We can also show that (2) is a density, i.e. it integrates to 1 within its definition domain, by using the approach given in Mathai and Haubold ([

In what follows, we adopt Kshirsagar's approach [

Let

where

PROOF: Let us consider the “adjusted sample covariance matrix” S, given by (1). We know that

with C being an appropriate constant.

Our objective is to find the joint density of R with a set of variables

Let

diagonal

where

We set

with

where

The joint density of

R and

To integrate out the vector

(This integral is denoted

Hence, if

with

Consider the quadratic form:

and

Changing to

Since each integral is a gamma density in

QED.

Alternatively, using the corresponding correlation coefficients, we have:

1) For

Hence,

as in (2) since we now have

Here, only the value of p needs to be known and this explains why expression (2) depends only on n and p. Also, as pointed out by Muirhead ([

2) Expression (3) can be interpreted as the density of R when

a)

b) (R;

The distribution of R is then

However, a closed form for this mixture is often difficult to obtain.

Alternatively,

and off-diagonal sample correlations,

3) We have, using (3), and the relation

Expression (8) gives the positive numerical value of

1) The distribution of the sample coefficient of correlation in the bivariate normal case can be determined fairly directly when integrating out

Using geometric arguments, with

2) Using Equation (3) on the sample correlation matrix

Together with p = 2, we can also arrive at one of these forms (see also Section 4.3, where the determinant of R provides a more direct approach).

3) Although Fisher [

interesting results on

example, in the case all_{ij} is found to be a function of p, having a maximum at p = 6. In the case all

For the case n = 2, and p = 3,

V = generalized volume defined by
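Returning to remark 1, the null-case (ρ = 0) bivariate distribution of the sample correlation r can be checked by simulation (a Monte Carlo sketch; the sample size and replicate count below are arbitrary). Under ρ = 0, r is symmetric about zero and r² follows a Beta(1/2, (n−2)/2) distribution, so E[r²] = 1/(n−1) exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 20000  # hypothetical sample size and replicate count

# Draw independent (rho = 0) bivariate normal samples and compute r.
x = rng.standard_normal((reps, n))
y = rng.standard_normal((reps, n))
xc = x - x.mean(axis=1, keepdims=True)
yc = y - y.mean(axis=1, keepdims=True)
r = (xc * yc).sum(axis=1) / np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))

# Under rho = 0: E[r] = 0 by symmetry, and E[r^2] = 1/(n - 1) exactly,
# since r^2 ~ Beta(1/2, (n - 2)/2).
print(r.mean(), (r**2).mean())
```

The simulated mean of r is near 0 and the simulated mean of r² is near 1/9 for n = 10, consistent with the null-case density.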

A matrix equation such as (3) can be difficult to visualize numerically, especially when the dimensions are high, i.e.

taken from our analysis of Fisher's iris data [, with x_{1} = sepal length, x_{2} = sepal width, x_{3} = petal length and x_{4} = petal width. It gives the population correlation matrix

where all

Note: A special approach to graphing distributions of covariance matrices, using the principle of decomposing a matrix into scale parameters and correlations, is presented in: Tokuda, T., Goodrich, B., Van Mechelen, I., Gelman, A. and Tuerlinckx, F., Visualizing Distributions of Covariance Matrices (Document on the Internet).

It is also mentioned there that for the Inverted Wishart case, with

where

Similarly, the same application above gives the approximate simulated distribution of

Using expression (7), which exhibits r_{ij} explicitly, and replacing r_{ij} by the corrected value

the unbiased estimator of

In the proof of Theorem 1, we have established that

Simulated density of

Simulated densities of

Using the above matrix

we compute directly the left side by numerical integration, and the right side by using the algebraic expression. The results are extremely close to each other, with both around the numerical value 1.238523012 × 10^{5}.

First, let det(R) be denoted by

Theoretically, we can obtain the density of

When considering Meijer G-functions and their extensions, Fox’s H-functions [

the Meijer function

the integral along the complex contour L of a rational expression of Gamma functions

It is a special case, when

Under some fairly general conditions on the poles of the gamma functions in the numerator, the above integrals exist.
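Two textbook reductions of the Meijer G-function can be checked numerically (a sketch using the `mpmath` library; the evaluation point and parameters are arbitrary): G^{1,0}_{0,1}(z | −; 0) = e^{−z}, and G^{1,0}_{1,1}(z | a; b) = z^{b}(1−z)^{a−b−1}/Γ(a−b) for 0 < z < 1, the latter being, up to the beta normalizing constant, the kernel of a beta density of the kind that appears below for det(R):

```python
from mpmath import mp, meijerg, mpf, exp, gamma

mp.dps = 25
z = mpf("0.7")

# Reduction 1: G^{1,0}_{0,1}(z | -; 0) = exp(-z).
g1 = meijerg([[], []], [[0], []], z)

# Reduction 2: G^{1,0}_{1,1}(z | a; b) = z^b (1 - z)^(a - b - 1) / Gamma(a - b),
# valid for 0 < z < 1 -- the kernel of a beta density.
a, b = mpf(3), mpf(1)
g2 = meijerg([[], [a]], [[b], []], z)

print(g1, exp(-z))
print(g2, z**b * (1 - z)**(a - b - 1) / gamma(a - b))
```

In `mpmath`, the first argument list groups the upper parameters (a_1..a_n; a_{n+1}..a_p) and the second the lower parameters (b_1..b_m; b_{m+1}..b_q) of G^{m,n}_{p,q}.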

THEOREM 2: When

From (2) the moments of order t of

these products, we can see that

the product of k independent betas,

Hence, we have here, the density of

where

And, hence, we obtain:

QED.
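The product-of-betas representation can be verified by Monte Carlo (a sketch; the sizes below are hypothetical, and the beta parameterization follows the classical decomposition, see e.g. Muirhead): under an identity population correlation matrix, det(R) is distributed as a product of independent Beta((n−i)/2, (i−1)/2) variables, i = 2, …, p, so E[det(R)] = ∏_{i=2}^{p} (n−i)/(n−1):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, reps = 10, 3, 20000  # hypothetical sample size, dimension, replicates

# Simulate det(R) under an identity population correlation matrix.
dets = np.empty(reps)
for i in range(reps):
    X = rng.standard_normal((n, p))
    dets[i] = np.linalg.det(np.corrcoef(X, rowvar=False))

# Classical representation: det(R) =_d prod_{i=2}^{p} B_i with
# B_i ~ Beta((n - i)/2, (i - 1)/2) independent, hence
# E[det(R)] = prod_{i=2}^{p} (n - i)/(n - 1)  ( = (8/9)(7/9) here ).
expected = np.prod([(n - i) / (n - 1) for i in range(2, p + 1)])
print(dets.mean(), expected)
```

For n = 10 and p = 3 the exact mean is 56/81 ≈ 0.6914, and the simulated mean falls very close to it.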

The density of the determinant can then be evaluated; its 2.5^{th} and 97.5^{th} percentiles are found to be 0.04697 and 0.7719, respectively.

Let

THEOREM 3: Let

where

PROOF: Immediate by using multiplication of G-densities presented in [

QED.

Using again results presented in [

1) Bivariate normal case: a) for the bivariate case we have

which is the G-function form of the beta

Density of product of independent correlation determinants

for

Pitman [

2) When

where

3) Mixture of Normal Distributions: With X coming now from the mixture:

G-functions, but for the case

For the bivariate case, Nagar and Castaneda [

Correlation is useful in multiple regression analysis, where it is strongly related to collinearity. As an example of how individual correlation coefficients are used in regression, the variance inflation factor (VIF), now widely implemented in statistical software, measures how much the variance of a coefficient is increased by collinearity, or in other words, how much of the variation in one independent variable is explained by the others. For the j-th variable,
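The VIF can be computed directly from the correlation matrix of the predictors: VIF_j = 1/(1 − R_j²), where R_j² is the coefficient of determination from regressing x_j on the remaining predictors, is exactly the j-th diagonal entry of the inverse correlation matrix. A minimal sketch (the matrix values below are hypothetical):

```python
import numpy as np

# Hypothetical sample correlation matrix R for three predictors.
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])

# VIF_j = 1 / (1 - R_j^2) = j-th diagonal entry of R^{-1}.
vif = np.diag(np.linalg.inv(R))
print(vif)
```

A VIF of 1 indicates no collinearity (orthogonal predictors); values well above 1 flag predictors largely explained by the others.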

When all correlation measures are considered together, measuring intercorrelation by a single number has been approached in different ways by various authors. Either the value of

Although the notion of independence between different components of a system is of widespread use in the study of system structure, reliability and performance, its complement, the notion of dependence, has been a difficult one to deal with. There are several dependence concepts, as explained by Jogdev [

When considering only two variables, several measures of dependence have also been suggested in the literature (Lancaster [

matrix associated with a sample of n observations of the p-component system. In the general case this estimation question is still unresolved, except for the binormal case,

In the language of Reliability Theory, a p-component normal system is fully statistically independent when the

THEOREM 4: 1) Let the fully statistically independent system

a) Then the distribution of the sample coefficient of inner dependence

b) For the two-component case (p = 2), we have:

2) For a non-fully independent two-component binormal system

where

PROOF: a) For

Expression (18) is obtained from (15) by the change of variable

Numerical computations give

In this article we have established an original expression for the density of the correlation matrix, with the sample variances as parameters, in the case of the multivariate normal population with non-identity population correlation matrix. We have, furthermore, established the expression of the distribution of the determinant of that random matrix in the case of identity population correlation matrix, and computed its value. Applications are made to the dependence among p components of a system. Also, expressions for the densities of a sample measure of a system inner dependence are established.

[Figure: Density (17) for the sample coefficient of inner dependence]

[Figure: Density (19) of the sample measure of a binormal system dependence]