Open Journal of Statistics
Vol.04 No.05(2014), Article ID:48773,13 pages
10.4236/ojs.2014.45037
Bayesian Analysis of Simple Random Densities
Paulo C. Marques F., Carlos A. de B. Pereira
Departamento de Estatística, Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, Brasil
Email: pmarques@ime.usp.br, cpereira@ime.usp.br
Copyright © 2014 by authors and Scientific Research Publishing Inc.
This work is licensed under the Creative Commons Attribution International License (CC BY).
http://creativecommons.org/licenses/by/4.0/



Received 13 June 2014; revised 15 July 2014; accepted 21 July 2014
ABSTRACT
A tractable nonparametric prior over densities is introduced that is closed under sampling and exhibits proper posterior asymptotics.
Keywords:
Bayesian Nonparametrics, Bayesian Density Estimation, Random Densities, Random Partitions, Stochastic Simulations, Smoothing

1. Introduction
The early 1970s witnessed Bayesian inference going nonparametric with the introduction of statistical models with infinite dimensional parameter spaces. The most conspicuous of these models is the Dirichlet process [1], a prior on the class of all probability measures over a given sample space that trades great analytical tractability for a reduced support: as shown by Blackwell [2], its realizations are, almost surely, discrete probability measures. The posterior expectation of a Dirichlet process is a probability measure that gives positive mass to each observed value in the sample, making the plain Dirichlet process unsuitable for inferential problems such as density estimation. Many extensions and alternatives to the Dirichlet process have been proposed [3].
In this paper we construct a prior distribution over the class of densities with respect to Lebesgue measure. Given a partition in subintervals of a bounded interval of the real line, we define a random density whose realizations have a constant value on each subinterval of the partition. The distribution of the values of the random density on each subinterval is specified by transforming and conditioning a multivariate normal distribution.
Our simple random density is the finite dimensional analogue of the stochastic processes introduced by Thorburn [4] and Lenk [5] . Computations with these stochastic processes involve an intractable normalization constant, and are restricted to values of the random density on a finite number of arbitrarily chosen domain points, demanding some kind of interpolation of the results. The finite dimensionality of our random density makes our computations more direct and transparent and gives us simpler statements and proofs.
An outline of the paper is as follows. In Section 2, we give the formal definition of a simple random density. In Section 3, we prove that the distribution of a simple random density is closed under sampling. The results of the simulations in Section 4 depict the asymptotic behavior of the posterior distribution. We extend the model hierarchically in Section 5 to deal with random partitions. Although the usual Bayes estimate of a simple random density is a discontinuous density, in Section 6 we compute smooth estimates by solving a decision problem in which the states of nature are realizations of the simple random density and the actions are smooth densities of a suitable class. Additional propositions and proofs of all the results in the paper are given in Section 7.
2. Simple Random Densities
Let
be the probability space from which we induce the distributions of all random objects considered in the paper. For some integer
, let
be the subset of vectors of
with positive components. Write
for the Borel sigma-field of
. Let
denote Lebesgue measure over
. We omit the indexes when
. The components of a vector
are written as
.
Suppose that we have been given an interval
, and a set of real numbers
, such that
, inducing a partition of
into the
subintervals
The class of simple densities with respect to this partition consists of the nonnegative simple functions which have a constant value on each subinterval and integrate to one. Let
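As a concrete illustration (not taken from the paper), a simple density with respect to a partition is determined by its heights on the subintervals. The following Python sketch, with hypothetical names, normalizes a vector of positive heights so that the resulting piecewise-constant function is nonnegative and integrates to one:

```python
import numpy as np

def make_simple_density(t, u):
    """Normalize positive heights u over the partition with knots t,
    returning a piecewise-constant density f that integrates to one."""
    t = np.asarray(t, dtype=float)
    u = np.asarray(u, dtype=float)
    widths = np.diff(t)                      # subinterval lengths
    heights = u / np.sum(u * widths)         # enforce integral = 1
    def f(x):
        x = np.asarray(x, dtype=float)
        idx = np.clip(np.searchsorted(t, x, side="right") - 1,
                      0, len(heights) - 1)
        inside = (x >= t[0]) & (x <= t[-1])
        return np.where(inside, heights[idx], 0.0)
    return f

# Example: a partition of [0, 1] into four subintervals
f = make_simple_density([0.0, 0.25, 0.5, 0.75, 1.0], [1.0, 2.0, 3.0, 2.0])
```

Evaluating `f` outside the interval returns zero, and the normalization guarantees the total mass is one regardless of the raw heights supplied.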




in which 



From now on, let














in which 



We define a random density whose realizations are simple densities with respect to the partition induced by 




A suitable family of measures that dominate the conditional distribution of U given

Lemma 2.1. Let 




The proof of Lemma 2.1 is given in Section 7. Figure 1 gives a simple geometric interpretation of the measures 
The following result is the basis for the formal definition of the random density.
Figure 1. Geometrical interpretation of the measures 





Theorem 2.2. Let




is a regular version of the conditional distribution of 

Moreover, 

The necessary lemmata and the proof of Theorem 2.2 are given in Section 7. The following definition of the random density uses the specific version of the conditional distribution constructed in Theorem 2.2.
Definition 2.3. Let


is a simple random density, in which 






in which 


3. Conditional Model
Now, we model a set of absolutely continuous observables conditionally, given the value of a simple random density
Lemma 3.1. Consider 
and let the random variables 

in which we have defined


Then, 

in which

The factorization criterion implies that 


Using the notation of Lemma 3.1, and defining

Theorem 3.2. If


This result, proved in Section 7, makes the simulations of the prior and posterior distributions essentially the same, the only difference being the computation of
4. Stochastic Simulations
We summarize the distribution of a simple random density


for




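Pointwise summaries of this kind (an expectation curve flanked by a credible band) can be computed directly from simulated values of the density heights. A minimal sketch, with synthetic draws standing in for the actual chain output:

```python
import numpy as np

def pointwise_summaries(draws, level=0.95):
    """Given an (n_draws, n_subintervals) array of simulated density
    heights, return the pointwise mean and equal-tailed credible band."""
    alpha = (1.0 - level) / 2.0
    mean = draws.mean(axis=0)
    lower = np.quantile(draws, alpha, axis=0)
    upper = np.quantile(draws, 1.0 - alpha, axis=0)
    return mean, lower, upper

# Illustration with synthetic draws (gamma noise, 8 subintervals)
rng = np.random.default_rng(1)
draws = rng.gamma(2.0, 1.0, size=(5000, 8))
mean, lo, hi = pointwise_summaries(draws)
```

Plotting `mean` as a step function with the band `[lo, hi]` shaded reproduces the kind of display shown in Figure 2 and Figure 3.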
The Random Walk Metropolis algorithm [6] is used to draw dependent realizations of the steps of 
values of a Markov chain
example, the credible set is determined with the help of the almost sure convergence of
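The Random Walk Metropolis step itself is generic. The hedged sketch below samples from an arbitrary unnormalized log-density; a standard normal target stands in for the paper's actual posterior, and the proposal scale is an illustrative choice:

```python
import numpy as np

def random_walk_metropolis(log_target, x0, n_steps, scale=0.1, rng=None):
    """Random Walk Metropolis: propose Gaussian steps and accept with
    probability min(1, pi(x') / pi(x)), recording every state."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    logp = log_target(x)
    chain = np.empty((n_steps, x.size))
    for i in range(n_steps):
        proposal = x + scale * rng.standard_normal(x.size)
        logp_prop = log_target(proposal)
        if np.log(rng.uniform()) < logp_prop - logp:   # accept/reject
            x, logp = proposal, logp_prop
        chain[i] = x                                   # keep current state
    return chain

# Illustration only: standard normal target, seeded for reproducibility
chain = random_walk_metropolis(lambda x: -0.5 * np.sum(x**2),
                               np.zeros(1), 20000, scale=1.0,
                               rng=np.random.default_rng(0))
```

Discarding an initial burn-in segment and averaging the remaining states gives the Monte Carlo estimates used throughout Section 4.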
As for the parameters appearing in Definition 2.3, we take in our experiments all the



for



Example 4.1. Let 










we have in Figure 3 the posterior summaries for different sample sizes. Note the concentration of the posterior as we increase the size of the samples.
Figure 2. Effect of the value of ρ0 on the concentration of the prior. The curves in black are prior expectations and the gray regions are credible sets with credibility level of 95%.
Figure 3. Posterior summaries for Example 4.1. On each graph, the black simple density is the estimate
We observe the same asymptotic behavior of the posterior distribution with data coming from a triangular distribution and a mixture of normals (with appropriate truncation of the sample space).
5. Random Partitions
Inferentially, we have a richer construction when the definition of the simple random density involves a random partition. Informally, we want a model for the random density in which the underlying partition adapts itself according to the information contained in the data.
We consider a family of uniform partitions of a given interval


Explicitly, we are considering the following hierarchical model: K and R are independent. Given that 


induce 

In the following example, instead of specifying priors for K and R, we define the likelihood of K and R by

Example 5.1. With a sample of size 2000 generated from a 




6. Smooth Estimates
It is possible to go beyond the discontinuous densities obtained as estimates in the last two sections and get smooth estimates of a simple random density 

In view of Theorem 3.2, it is enough to consider the problem without data. As before, the sample space is the interval

denote its 

Proposition 6.1. For






Figure 4. Posterior summaries for Example 5.1. The black simple density is the estimate
Figure 5. Example 5.1. On the left graph, the black curve is the estimated distribution function 







Then, the Bayes decision is

subject to the constraints


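Although the exact matrices of Proposition 6.1 are not reproduced here, the resulting problem — minimizing a positive definite quadratic form under linear constraints — can be sketched via its KKT linear system. The sketch below handles equality constraints only, and all names are illustrative:

```python
import numpy as np

def solve_equality_qp(M, b, A, d):
    """Minimize 0.5 * c^T M c - b^T c subject to A c = d by solving the
    KKT system [[M, A^T], [A, 0]] [c; lam] = [b; d] (M positive definite)."""
    n, m = M.shape[0], A.shape[0]
    K = np.block([[M, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([b, d])
    sol = np.linalg.solve(K, rhs)
    return sol[:n]                     # discard the Lagrange multipliers

# Toy instance: minimize ||c||^2 subject to sum(c) = 1
c = solve_equality_qp(2.0 * np.eye(3), np.zeros(3),
                      np.ones((1, 3)), np.array([1.0]))
```

Since the quadratic form is convex, the KKT solution is the unique global minimizer, matching the uniqueness argument invoked in the proof of Proposition 6.1.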
We use the result of Proposition 6.1, proved in Section 7, choosing the
For the next example, suppose that the support of the densities is the interval
approximates uniformly any continuous function f defined on


we can rewrite the approximating polynomial as




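The uniform approximation of continuous functions by polynomials used above is the classical Bernstein construction; a minimal sketch, assuming the interval in question is [0, 1]:

```python
import numpy as np
from math import comb

def bernstein_approx(f, m):
    """Return the degree-m Bernstein polynomial of f on [0, 1]:
    B_m(f)(x) = sum_k f(k/m) * C(m, k) * x^k * (1 - x)^(m - k)."""
    coeffs = [f(k / m) for k in range(m + 1)]
    def B(x):
        x = np.asarray(x, dtype=float)
        return sum(c * comb(m, k) * x**k * (1 - x)**(m - k)
                   for k, c in enumerate(coeffs))
    return B

# Degree-50 approximation of a continuous function on [0, 1]
g = bernstein_approx(lambda x: np.sin(np.pi * x), 50)
```

By the Weierstrass approximation theorem, increasing the degree drives the uniform error to zero for any continuous target, which is the property exploited when rewriting the approximating polynomial.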
Example 6.2. Suppose that we have a sample of 5000 data points simulated from a truncated exponential distribution, whose density is
Repeating the analysis made in Example 5.1, we find the maximum of the likelihood of K and R at 
7. Additional Results and Proofs
In this section we present some subsidiary propositions and give proofs to all the results stated in the paper.
Proposition 7.1. Let 




Proof. Define the set
by definition of


mensional hyperplane defined by the set A. Since
Proof of Lemma 2.1. When




Define 






Figure 6. Example 6.2. On the right graph, the black simple density is the estimate
Since



the definition of the inverse image of 






implying that









Lemma 7.2. Let

be a measure over





in which 

Proof. Define the function 






commutes, since


in which the fifth equality is obtained transforming by T, 


Lemma 7.3. Let 


in which 

Proof. Define 










that, by definition, 






if any of the

and again it happens that



in which 

Lemma 7.4. Let





Proof. Let



in which the penultimate equality follows from Lemma 7.2, and the last equality follows from Lemma 7.3. Hence, 
Proof of Theorem 2.2. Let 






in which we have used the Leibniz rule for the Radon-Nikodym derivatives. On the other hand, by Lemmas 7.2 and 7.3, we have that
with 


for almost every r




as desired. The fact that 
Proof of Lemma 3.1. Let 


each



Hence, 






Proof of Theorem 3.2. By Bayes' Theorem, for each
in which we have used the expression of the likelihood obtained in Lemma 3.1, the Leibniz rule for the Radon-Nikodym derivatives, the expression of 


by definition, 





Defining
with



Using this result in the expression of 

in which 




Proposition 7.5. Suppose that the random variables 






1)

2)

Proof. By Definition 2.3, we have
in which 


For item 1), note that
in which the fourth equality follows from Tonelli’s Theorem. For item 2), for each
On the other hand, we have
in which the third equality follows from the hypothesis of conditional independence and Theorem B.61 of [9], the fourth equality is a consequence of Theorem 2.6.4 of [8], and the sixth equality is due to Tonelli’s Theorem. Comparing both expressions for
Proposition 7.6. Let 








for the 
Proof. Let 

On the other hand, by arguments similar to those used in the proof of Proposition 7.5, we have
Comparing both expressions for
almost surely
Proof of Proposition 6.1. By Tonelli’s Theorem, the expected loss is
in which we have defined the positive constant


in which we have used the linearity of the integral. Therefore, minimizing the expected loss is the same as solving the problem of constrained minimization of the quadratic form Q. For the matrix

in which we have used the linearity of the integral. Therefore, the matrix M is positive definite, yielding (see [10] ) that the quadratic form Q is convex and the problem of constrained minimization of Q has a single global solution
8. Conclusion
The random density considered in the paper can be extended to multivariate problems by introducing analogous partitions of d-dimensional Euclidean space. Also, as an alternative to the empirical approach used in Section 5, we can specify full priors for the hyperparameters. Although more computationally challenging, this choice defines a more flexible model with random dimension for which the density estimates are no longer simple densities. More general random partitions can also be considered.
References
- Ferguson, T. (1973) A Bayesian Analysis of Some Nonparametric Problems. The Annals of Statistics, 1, 209-230. http://dx.doi.org/10.1214/aos/1176342360
- Blackwell, D. (1973) Discreteness of Ferguson Selections. The Annals of Statistics, 1, 356-358. http://dx.doi.org/10.1214/aos/1176342373
- Ghosh, J.K. and Ramamoorthi, R.V. (2002) Bayesian Nonparametrics. Springer, New York.
- Thorburn, D. (1986) A Bayesian Approach to Density Estimation. Biometrika, 73, 65-75. http://dx.doi.org/10.2307/2336272
- Lenk, P.J. (1988) The Logistic Normal Distribution for Bayesian, Nonparametric, Predictive Densities. Journal of the American Statistical Association, 83, 509-516. http://dx.doi.org/10.1080/01621459.1988.10478625
- Robert, C.P. and Casella, G. (2004) Monte Carlo Statistical Methods. 2nd Edition, Springer, New York. http://dx.doi.org/10.1007/978-1-4757-4145-2
- Billingsley, P. (1995) Probability and Measure. 3rd Edition, Wiley-Interscience, New Jersey.
- Ash, R.B. (2000) Probability and Measure Theory. 3rd Edition, Harcourt/Academic Press, Massachusetts.
- Schervish, M.J. (1995) Theory of Statistics. Springer, New York. http://dx.doi.org/10.1007/978-1-4612-4250-5
- Bazaraa, M.S. and Shetty, C.M. (2006) Nonlinear Programming: Theory and Algorithms. 3rd Edition, Wiley-Interscience, New Jersey.