Mixture Models for Estimating the Number of Drug Users in Thailand 2005-2007

doi:10.4236/am.2013.49168


Applied Mathematics, 2013, 4, 1242-1250

http://dx.doi.org/10.4236/am.2013.49168 Published Online September 2013 (http://www.scirp.org/journal/am)


Chukiat Viwatwongkasem1*, Pratana Satitvipawee1, Suthi Jareinpituk2, Pichitpong Soontornpipit1

1Department of Biostatistics, Faculty of Public Health, Mahidol University, Bangkok, Thailand

2Department of Epidemiology, Faculty of Public Health, Mahidol University, Bangkok, Thailand

*Corresponding author. Email: chukiat.viw@mahidol.ac.th

Received June 19, 2013; revised July 19, 2013; accepted July 26, 2013

[...] License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

It is difficult to measure the size of illegal drug user populations directly with surveys because of the many “hidden drug addicts” and the difficulty of obtaining truthful responses. In this study, systematic and routine information on the treatment episodes of drug users is used to estimate the population size. Mixture models of zero-truncated Poisson distributions, fitted by nonparametric maximum likelihood estimation (NPMLE) to capture-recapture repeated count data, were used to project the number of drug users. The method was applied to surveillance data on drug users identified by treatment episodes in more than 1140 health treatment centers in Thailand, provided by the Bureau of Health Service System Development, Ministry of Public Health. We show how this mixture model can be used to reconstruct the unobserved frequency of drug users with no treatment episode and thereby to estimate the total population size of drug users in the country from 2005 to 2007. A simulation study confirmed that the mixture model is suitable when the population is large. The mixture models fitted the data with excellent goodness of fit, and the estimates were compared with the conventional Chao estimates. The NPMLEs of the total number of drug users in Thailand in 2005, 2006, and 2007 were 184,045 (95% CI: 181,297 - 186,793), 230,665 (95% CI: 226,611 - 234,719), and 299,670 (95% CI: 294,217 - 305,123), respectively; the corresponding estimates were 125,117 (95% CI: 123,092 - 127,142), 166,287 (95% CI: 163,222 - 169,352), and 228,898 (95% CI: 224,766 - 233,030) for methamphetamine (Yaba) users, and 11,559 (95% CI: 10,234 - 12,884), 11,333 (95% CI: 9276 - 13,390), and 8953 (95% CI: 7878 - 10,028) for heroin users. The numbers of marijuana, kratom-plant, opium, and inhalant users were under-estimated because their symptoms were mild and rarely severe enough to require treatment in health treatment centers, which led to a smaller total number of drug users. The estimated sizes of heroin and methamphetamine addicts are highly reliable because they are based on clearly evident counts of addicts with severe addiction problems attending health treatment centers. Estimation by means of mixture models can be recommended for routinely monitoring drug demand trends and drug health services; the estimates are easy to calculate with the program MIXTP, available on request.

Keywords: Capture-Recapture Count Data; Drug Use in Thailand; Mixture Models of Zero-Truncated Poisson Distributions; Population Size Estimation; Unobserved Zero Count Data

1. Introduction

Drug abuse in Thailand has remained a serious health problem; the epidemic is still severe and widespread. Information on the number of illegal drug users is valuable for narcotics-control policy and planning, for implementing reduction strategies, and for allocating resources to health services. Nevertheless, it is difficult to measure the sizes of drug user populations directly because of the many “hidden drug addicts”. Surveys, especially on a large national scale, are unlikely to be the most efficient method because of their high cost and manpower requirements, the difficulty of obtaining truthful responses, the problems of dealing with a hidden population, and ethical issues.

Capture-recapture methods are a classical and useful tool for dealing with hidden populations and estimating a total population size, because they can estimate and adjust for the extent of incomplete ascertainment using information from overlapping lists of cases from two or more distinct sources [1]. Moreover, in addition to the conventional multiple-source methods, there are approaches based on a single source with repeated counts for each individual. In this study, a single source is considered: a surveillance system that counts the number of times a drug user attended a treatment institution.

Few studies in Thailand have used the capture-recapture method to estimate the number of drug users. Mastro et al. [2] estimated the number of HIV-infected injection drug users in Bangkok from two sample sources: 18 methadone treatment centers and 72 police stations performing urine testing. Suppawattanabodee [3] used two sources, health treatment records and police arrest records, to estimate the number of drug users in Bangkok in 2001. For a single source with repeated count data, however, there have been few relevant studies in Thailand. Böhning et al. [4] estimated the number of drug users in Bangkok in 2001 by means of zero-truncated count mixture distributions. Viwatwongkasem, Kuhnert, and Satitvipawee [5] projected the number of heroin users in Bangkok in 2002 using a mixture of zero-truncated Poisson models. Note that zero-truncated Poisson mixture distributions differ from mixtures of zero-truncated Poisson models, at least in the mixing distributions of the two estimations.

A mixture model is a flexible approach for coping with long-tailed, skewed, and/or contaminated count distributions in a natural way. The mixing idea corresponds to the presence of sub-populations within an overall population. Formally, a mixture model covers not only two or more component distributions (heterogeneity) but also the case of a single distribution (homogeneous population) [6-8]. Böhning and Schön [9] proposed nonparametric maximum likelihood estimators (NPMLE) of population size based on the counting distribution. Böhning and Kuhnert [10] showed the equivalence of zero-truncated count mixture distributions and mixtures of zero-truncated count distributions: for any mixing distribution of the truncated mixture, a (usually different) mixing distribution of the mixture of truncated counts can be found such that the likelihood surfaces of both models agree; consequently, the population size estimators associated with the two models have equal values. Punyacharoensin and Viwatwongkasem [11] estimated HIV incidence in Thailand by backcalculation based on a mixture of past AIDS incidence and AIDS incubation period distributions. Viwatwongkasem, Kuhnert, and Satitvipawee [5] compared the performance of population size estimators under the truncated count model, with and without allowance for contamination, among McKendrick's, Mantel-Haenszel's, Zelterman's, Chao's, and the maximum likelihood estimators and their proposed mixture of zero-truncated count models. The proposed estimator was the best choice, with the smallest bias and smallest mean square error when the population size was sufficiently large, and it also performed well in a homogeneous situation.
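The equivalence noted by Böhning and Kuhnert [10] can be checked numerically. The sketch below is our own illustration with arbitrary parameter values: a zero-truncated two-component Poisson mixture with weights p_j is compared with a mixture of zero-truncated Poissons whose weights are rescaled to q_j ∝ p_j(1 − e^{−λ_j}). This rescaling is a short algebraic derivation assumed here for illustration, not a formula quoted from [10]; with it, the two probability functions coincide for every i ≥ 1.

```python
import math

def pois(i, lam):
    """Untruncated Poisson probability, evaluated on the log scale."""
    return math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))

def truncated_mixture(i, lambdas, p):
    """Zero-truncated Poisson mixture: sum_j p_j Po(i; lam_j) / (1 - sum_j p_j e^{-lam_j})."""
    p0 = sum(pj * math.exp(-lam) for lam, pj in zip(lambdas, p))
    return sum(pj * pois(i, lam) for lam, pj in zip(lambdas, p)) / (1 - p0)

def mixture_of_truncated(i, lambdas, q):
    """Mixture of zero-truncated Poissons: sum_j q_j Po(i; lam_j) / (1 - e^{-lam_j})."""
    return sum(qj * pois(i, lam) / -math.expm1(-lam) for lam, qj in zip(lambdas, q))

lambdas, p = [0.4, 2.8], [0.9, 0.1]   # arbitrary illustrative values
raw = [pj * -math.expm1(-lam) for lam, pj in zip(lambdas, p)]
q = [r / sum(raw) for r in raw]        # q_j proportional to p_j (1 - e^{-lam_j})
gap = max(abs(truncated_mixture(i, lambdas, p) - mixture_of_truncated(i, lambdas, q))
          for i in range(1, 30))
print(q, gap)  # gap is at floating-point rounding level
```

The two parameterizations describe the same observed-count distribution but attach different weights to the same component means, which is why the mixing distributions of the two approaches differ.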

Although the mixture model has been used in many fields of application, it is still not very common; only a few relevant studies were found in Thailand, and the numerical computation of mixture model estimates is not directly provided in existing standard statistical packages. Motivated by the scarcity of relevant studies and of statistical packages focused on estimating the size of a hidden population, we address this gap by adopting nonparametric maximum likelihood estimators (NPMLE) of the mixture parameters of zero-truncated Poisson distributions, which lead to the population size estimate of interest.

2. Methods

2.1. The Horvitz-Thompson Approach

Suppose that a registration system identifies n observed cases, but not all cases, of a population of size N, and that the system identifies a case with probability $1 - p_0$, where $p_0$ is the probability that a case remains unidentified. This leads to the expected decomposition of the population size,

$$ N = N(1 - p_0) + N p_0, $$

where $N(1 - p_0)$ is the expected number of cases identified by the system, which can simply be estimated by n, the number of identified (observed) cases. This leads to the estimating equation

$$ N = N p_0 + n, \tag{1} $$

which states that the population size N is the sum of the unobserved cases ($N p_0$) and the observed cases (n). Equation (1) can easily be solved for N to give the Horvitz-Thompson estimator

$$ \hat N = \frac{n}{1 - p_0} \tag{2} $$

and $\hat n_0 = \hat N - n$. The Horvitz-Thompson approach seems easy, but the unknown probability $p_0$ of unobserved cases must be estimated, and the various methods of estimation accomplish this quite differently.
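As a quick numerical illustration of Equation (2), the sketch below computes $\hat N$ and $\hat n_0$ for a given $p_0$. The function name `horvitz_thompson` is ours, and the $p_0$ in the usage example (about 0.6135) is back-derived from the heroin 2005 results reported in Section 3 (n = 4467, $\hat N$ = 11,559), not taken from the authors' program.

```python
def horvitz_thompson(n: int, p0: float) -> tuple[float, float]:
    """Return (N_hat, n0_hat) from Equation (2): N_hat = n / (1 - p0), n0_hat = N_hat - n."""
    if not 0.0 <= p0 < 1.0:
        raise ValueError("p0 must lie in [0, 1)")
    n_hat = n / (1.0 - p0)
    return n_hat, n_hat - n

# p0 implied by the heroin 2005 estimates: 1 - 4467/11559, roughly 0.6135
N_hat, n0_hat = horvitz_thompson(4467, 1 - 4467 / 11559)
print(round(N_hat), round(n0_hat))  # -> 11559 7092
```

All of the work therefore lies in estimating $p_0$; the mixture model below is one way of doing so.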

2.2. Data Sources

The surveillance data on drug addicts undergoing treatment and rehabilitation in more than 1140 health treatment centers across the country (1144 centers in 2005 and 2006, and 1258 centers in 2007), collected by the Bureau of Health Service System Development (BHSSD), Ministry of Public Health, were used for 2005 to 2007. Anonymous treatment-episode records in the database were linked to the same patient by matching key fields such as age, gender, date of birth, district and city of birth, present address, hospital number and name, and date of receiving treatment. This study was approved by the Ethics Committee on Human Rights of the Faculty of Public Health, Mahidol University (approval number 105/2011).
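The paper does not state the exact matching rule, so the sketch below simply assumes deterministic linkage on an exact composite key; the field names (`gender`, `birth_date`, `birth_district`, `hospital_number`) and the toy records are hypothetical. Each distinct key is treated as one person, and the number of records per key gives that person's count Y of treatment episodes.

```python
from collections import Counter

# Hypothetical key fields; the actual BHSSD linkage fields and rules may differ.
KEY_FIELDS = ("gender", "birth_date", "birth_district", "hospital_number")

def episodes_per_person(records):
    """Group episode records by an exact composite key and count episodes per key."""
    return Counter(tuple(rec[f] for f in KEY_FIELDS) for rec in records)

records = [
    {"gender": "M", "birth_date": "1980-01-02", "birth_district": "A",
     "hospital_number": "H1", "visit": "2005-03-01"},
    {"gender": "M", "birth_date": "1980-01-02", "birth_district": "A",
     "hospital_number": "H1", "visit": "2005-07-15"},
    {"gender": "F", "birth_date": "1975-05-09", "birth_district": "B",
     "hospital_number": "H2", "visit": "2005-04-20"},
]
counts = episodes_per_person(records)
print(sorted(counts.values()))  # -> [1, 2]
```

Exact matching is brittle when fields contain typos, so real linkage pipelines often add fuzzy or probabilistic matching; that refinement is outside this sketch.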

2.3. Statistical Methods

Suppose that Y is the number of treatment episodes of a case; Y takes values from 1 to m (a zero value cannot be observed), where m is the largest number of treatment episodes of any case. The data Y are tallied into a frequency table such as Table 1. Let i be the number of treatment episodes of a case and $n_i$ the number (frequency) of cases identified with i episodes, where $i = 1, 2, \ldots, m$, so that the sample size $n = n_1 + n_2 + \cdots + n_m$ is the total number of observed cases. In Table 1, the observed frequencies of treatment episodes for heroin users in Thailand 2005 are n1 = 3057, n2 = 791, n3 = 351, n4 = 107, n5 = 80, n6 = 59, n7+ = 22.
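A minimal sketch of the tally step, assuming per-person episode counts are already available from record linkage. Pooling counts of 7 or more into one cell mirrors the "7+" column of Table 1; storing that cell under the key 7 (and later treating it as exactly 7 episodes) is our simplifying assumption, and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter

def frequency_table(episode_counts, top=7):
    """Tally per-person episode counts Y >= 1 into n_i, pooling all i >= top into one cell."""
    freq = Counter(min(y, top) for y in episode_counts if y >= 1)
    return dict(sorted(freq.items()))

# Rebuild per-person counts from Table 1 (7+ stored as 7) and re-tally them
table1 = {1: 3057, 2: 791, 3: 351, 4: 107, 5: 80, 6: 59, 7: 22}
ys = [i for i, n_i in table1.items() for _ in range(n_i)]
freq = frequency_table(ys)
print(freq, sum(freq.values()))  # sample size n = 4467
```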

To estimate the population size N and the number $n_0$ of cases with zero treatment episodes, let $p_1, \ldots, p_m$ be the probabilities of a case being identified $1, \ldots, m$ times. Under homogeneity, $p_i$ is assumed to follow a zero-truncated Poisson distribution, since zero identifications do not occur in the sample; that is,

$$ p_i = f(i, \lambda) = \frac{\lambda^i \exp(-\lambda)}{i!\,[1 - \exp(-\lambda)]}, \quad i = 1, 2, \ldots, $$

where $\lambda > 0$. However, the homogeneous model is frequently inadequate in real situations. Mixture models allowing for heterogeneity are more flexible, and we consider a discrete mixture of truncated Poisson densities of the form

$$ p_i = f(i, Q) = \sum_{j=1}^{k} q_j f(i, \lambda_j), \quad i = 1, 2, \ldots, \tag{3} $$

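where the mixing distribution

$$ Q = \begin{pmatrix} \lambda_1 & \lambda_2 & \cdots & \lambda_k \\ q_1 & q_2 & \cdots & q_k \end{pmatrix} $$

gives weights $q_j$ to the parameters $\lambda_j$ for $j = 1, 2, \ldots, k$, with $q_j \ge 0$ and $\sum_{j=1}^{k} q_j = 1$, and k is the number of components in the mixture. Then, the log-likelihood for the mixture of zero-truncated count densities is

$$ \log L(Q) = \sum_{i=1}^{m} n_i \log f(i, Q) = \sum_{i=1}^{m} n_i \log \sum_{j=1}^{k} q_j f(i, \lambda_j). \tag{4} $$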
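Equations (3) and (4) translate directly into code. The sketch below evaluates the zero-truncated Poisson density on the log scale (to avoid overflow in $\lambda^i / i!$) and the mixture log-likelihood of a frequency table; the function names are ours, and treating the pooled 7+ cell as i = 7 is an assumption. With the single-component estimate from Table 2 ($\hat\lambda$ = 0.9869) it gives about −4893.8, close to the reported log-likelihood of −4893.

```python
import math

def ztp_pmf(i: int, lam: float) -> float:
    """Zero-truncated Poisson f(i, lam) for i >= 1, computed on the log scale."""
    log_p = i * math.log(lam) - lam - math.lgamma(i + 1) - math.log(-math.expm1(-lam))
    return math.exp(log_p)

def mixture_pmf(i, lambdas, weights):
    """Equation (3): f(i, Q) = sum_j q_j f(i, lambda_j)."""
    return sum(q * ztp_pmf(i, lam) for lam, q in zip(lambdas, weights))

def log_likelihood(freq, lambdas, weights):
    """Equation (4): sum_i n_i log f(i, Q) over the observed frequency table {i: n_i}."""
    return sum(n_i * math.log(mixture_pmf(i, lambdas, weights)) for i, n_i in freq.items())

heroin_2005 = {1: 3057, 2: 791, 3: 351, 4: 107, 5: 80, 6: 59, 7: 22}  # 7+ treated as 7
ll = log_likelihood(heroin_2005, [0.9869], [1.0])
print(round(ll, 1))
```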

In this situation, with the help of gradient functions and consideration of the boundaries of the parameter space, the log-likelihood is concave on the space of all discrete probability distributions Q, over which it can be maximized, leading to the nonparametric maximum likelihood estimator (NPMLE) of Q. To proceed in the EM context, we need the complete-data log-likelihood, which in this case is

$$ \log L_{CD}(Q) = \sum_{i=1}^{m}\sum_{j=1}^{k} n_i z_{ij} \log f(i, \lambda_j) + \sum_{i=1}^{m}\sum_{j=1}^{k} n_i z_{ij} \log q_j, \tag{5} $$

where the unobserved indicator $z_{ij}$ is 1 if count i belongs to component j and 0 otherwise. In the E-step, the unobserved indicators $z_{ij}$ are replaced by their expected posterior probabilities $e_{ij}$, leading to

$$ e_{ij} = E(z_{ij} \mid n_i, q, \lambda) = P(z_{ij} = 1 \mid n_i, q, \lambda) = \frac{q_j f(i, \lambda_j)}{\sum_{l=1}^{k} q_l f(i, \lambda_l)}. \tag{6} $$

In the M-step, new values $\hat\lambda_1, \ldots, \hat\lambda_k$ and $\hat q_1, \ldots, \hat q_k$ are found that maximize the expected version of the complete-data log-likelihood (5). The weight estimates are

$$ \hat q_j = \frac{1}{n} \sum_{i=1}^{m} n_i e_{ij}, \quad j = 1, \ldots, k. \tag{7} $$

Similarly, setting the derivatives with respect to $\lambda_j$ to zero gives

$$ \frac{\hat\lambda_j}{1 - \exp(-\hat\lambda_j)} = \frac{\sum_{i} i\, n_i e_{ij}}{\sum_{i} n_i e_{ij}}, \quad j = 1, \ldots, k. \tag{8} $$

Note that (8) does not provide a closed-form solution, so an iterative procedure is needed until the desired accuracy is achieved. Having identified the model and the associated parameter estimates, we can estimate the probability of zero treatment episodes $p_0$ as

$$ \hat p_0 = \frac{\sum_{j} \hat q_j \exp(-\hat\lambda_j) / [1 - \exp(-\hat\lambda_j)]}{\sum_{j} \hat q_j / [1 - \exp(-\hat\lambda_j)]}, \tag{9} $$

so that the Horvitz-Thompson approach (2) leads to the population size estimate

$$ \hat N = \frac{n}{1 - \hat p_0} = n \sum_{j=1}^{k} \frac{\hat q_j}{1 - \exp(-\hat\lambda_j)}. \tag{10} $$

2.4. Model Evaluation

It is crucial to select an appropriate model among the candidate models, which differ in the number of components k. The model with the smallest value of the Bayesian Information Criterion (BIC) is chosen as the best model: the smaller the BIC value, the better the model.

$$ \mathrm{BIC} = -2\log L(\hat Q) + (2k - 1)\log n \tag{11} $$

The BIC penalizes the log-likelihood by the number of parameters, 2k − 1 (k component means and k − 1 free weights), multiplied by the logarithm of the sample size. BIC works well as a model selection criterion for mixture models because it does not suffer from the likelihood irregularities that are typical of mixture models [8,12].

3. An Application

For the surveillance data of heroin users in 2005 (Table 1), the observed frequencies were n1 = 3057, n2 = 791, n3 = 351, n4 = 107, n5 = 80, n6 = 59, n7+ = 22. Table 2 shows that the two-component mixture of zero-truncated Poisson models fitted best, with the smallest BIC value. It gave $\hat n_0 = 7092$ for the unobserved number of heroin users without any treatment episode and $\hat N = 11{,}559$ for the total number of heroin users, whereas the well-established alternative estimator of Chao [13],

$$ \hat N_{\mathrm{Chao}} = n + \frac{n_1^2}{2 n_2}, $$

yielded $\hat N_{\mathrm{Chao}} = 10{,}374$, which was close to the estimate of the appropriate NPMLE model with k = 2 components. Figure 1 compares the frequency distributions of treatment episodes for the observed frequencies, the single zero-truncated Poisson model, and the two-component mixture of zero-truncated Poisson models. The Poisson mixture provided an excellent fit to the observed frequencies, whereas the single Poisson was not adequate, as is also evident from the BIC values (9111 versus 9796).

Confidence Intervals

A bootstrap resampling technique was applied to compute the variance of the mixture of truncated count estimates, since direct computation via the information matrix is usually difficult. For each nonparametric bootstrap sample, the frequencies $(n_1, n_2, \ldots, n_m)$ were sampled from a multinomial distribution with size parameter n and categorical probability parameters $n_i/n$, and $\hat N$ was then constructed using the BIC-selected mixture model. With B bootstrap samples of size n each, population size estimates $\hat N_1, \ldots, \hat N_B$ were available. These resampled estimates were used to compute variances and confidence intervals as asymptotic normal intervals.

The appropriate NPMLE with k = 2 mixture components yielded $\hat N = 11{,}559$, and the associated 95% confidence interval for heroin users in Thailand in 2005 was (10,234, 12,884). The estimated size was thus about 2.6 times the observed count n = 4467. It is also usually important to provide an estimate of the completeness of the surveillance stream, given as $(n/\hat N) \times 100\%$, which for heroin users was 38.6% (95% CI: 34.7% - 43.6%).

Figure 1. Frequency distribution of treatment episodes among the observed counts, the single truncated Poisson, and the mixture of two truncated Poisson distributions.

Table 1. Observed frequencies of treatment episodes of heroin users in Thailand 2005.

Number of treatment episodes in a case (i) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7+ (= m) | Total
Number of cases (frequency n_i) | - | 3057 | 791 | 351 | 107 | 80 | 59 | 22 | n = 4467

Table 2. The k-component mixtures of zero-truncated Poisson models for estimating the number of heroin users in Thailand 2005.

k | λ̂_j | q̂_j | Log-likelihood | BIC | n̂_0 | N̂
1 | 0.9869 | 1.000 | −4893 | 9796 | 2654 | 7121
2 | 0.4094, 2.7772 | 0.7965, 0.2035 | −4543 | 9111 | 7092 | 11,559
3 | 0.0419, 0.8546, 3.2260 | 0.3346, 0.5361, 0.1293 | −4538 | 9119 | 36,757 | 41,224

4. Results

The Thailand Narcotics Annual Report 2006 of the Office of the Narcotics Control Board [14] showed that the number of drug addicts undergoing treatment and rehabilitation increased from 41,564 patients in 2005 to 43,156 patients in 2006. Most of them (about 80%) were undergoing treatment for the first time. Adolescents aged 15 to 24 years were the biggest group (49%); among them, 83% were new patients and 17% were relapsing patients.

Of all drug patients, 33% were unemployed, 19% were laborers, and 13% were students. Drug epidemics were mostly found in Bangkok (50%), and 30% were located in the central region of Thailand, while the rest were in the North, the South, and the Northeast, respectively. Methamphetamine (Yaba) addicts remained the biggest group of drug patients in all treatment centers (79%) because the ingredients were readily available and the price was not too high relative to purity and severity. The second biggest group was cannabis (marijuana) addicts (11%); marijuana was widespread in many urban and rural areas, and its price was still lower than that of other illicit drugs. The heroin epidemic remained important: although heroin is expensive, the harm it caused was quite severe. Kratom plants were abused in many areas of the countryside; farmers and peasants used kratom to work longer in the rice fields. The use of club drugs such as ecstasy, ketamine, cocaine, and crystallized methamphetamine showed an increasing trend in big cities and among wealthy people.

The MIXTP program, developed by the authors in FORTRAN POWERSTATION, computes population size estimates under the mixture of truncated count models and is available on request.
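MIXTP itself is not reproduced here. As a lightweight cross-check of the heroin example in Section 3, the sketch below recomputes Equation (11) from the rounded log-likelihoods printed in Table 2 and Chao's estimator $n + n_1^2/(2n_2)$ from the Table 1 frequencies. The function names are ours; the small differences from the printed BIC values (for example 9794.4 versus 9796 for k = 1) come from the rounding of the log-likelihoods.

```python
import math

def bic(log_lik: float, k: int, n: int) -> float:
    """Equation (11): BIC = -2 log L + (2k - 1) log n for a k-component mixture."""
    return -2.0 * log_lik + (2 * k - 1) * math.log(n)

def chao(n: int, n1: int, n2: int) -> float:
    """Chao's estimator N = n + n1^2 / (2 n2)."""
    return n + n1 ** 2 / (2 * n2)

n = 4467
table2_loglik = {1: -4893.0, 2: -4543.0, 3: -4538.0}  # rounded values from Table 2
scores = {k: bic(ll, k, n) for k, ll in table2_loglik.items()}
best_k = min(scores, key=scores.get)
print({k: round(v, 1) for k, v in scores.items()}, best_k)  # best_k -> 2
print(round(chao(n, 3057, 791)))  # -> 10374 (heroin 2005)
```

The three-component model has the higher log-likelihood but a larger BIC, so the penalty term is what selects k = 2.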

Table 3 shows the sizes of drug user populations estimated by the mixture of truncated Poisson models, classified by type of drug, in Thailand 2005-2007. Methamphetamine users were the biggest group and tended to increase from 2005 to 2007, while the number of heroin users tended to decrease slightly because of its high price. The numbers of marijuana, kratom-plant, and inhalant users were under-estimated because of their mild severity. The estimated number of marijuana users increased from 2005 to 2007, while the trends for kratom-plant, inhalant, and other users were difficult to predict.
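The 95% limits in Table 3 agree with asymptotic normal intervals $\hat N \pm 1.96\,\mathrm{SD}(\hat N)$; for example, 299,670 ± 1.96 × 2782.3 gives 294,217 to 305,123. The sketch below pairs that interval with the multinomial bootstrap described in the Confidence Intervals subsection. To keep it short and fast, the estimator plugged into the bootstrap is Chao's estimator rather than the BIC-selected mixture refit used by the authors; that substitution, the function names, B = 200, and the seed are our assumptions.

```python
import math
import random
from collections import Counter

def normal_ci(n_hat, sd, z=1.96):
    """Asymptotic normal interval N_hat +/- z * SD(N_hat)."""
    return n_hat - z * sd, n_hat + z * sd

def chao(freq):
    """Stand-in estimator for this sketch: Chao's n + n1^2 / (2 n2)."""
    return sum(freq.values()) + freq[1] ** 2 / (2 * freq[2])

def bootstrap_sd(freq, estimator, B=200, seed=2013):
    """Draw B multinomial(n, n_i / n) resamples of the frequency table; return SD of estimates."""
    rng = random.Random(seed)
    values, counts = list(freq), list(freq.values())
    n = sum(counts)
    estimates = []
    for _ in range(B):
        resample = Counter(rng.choices(values, weights=counts, k=n))
        estimates.append(estimator(resample))
    mean = sum(estimates) / B
    return math.sqrt(sum((e - mean) ** 2 for e in estimates) / (B - 1))

# Check against Table 3 (all drug types, 2007): N_hat = 299,670, SD = 2782.3
lo, hi = normal_ci(299_670, 2782.3)
print(round(lo), round(hi))  # -> 294217 305123

heroin_2005 = {1: 3057, 2: 791, 3: 351, 4: 107, 5: 80, 6: 59, 7: 22}
point = chao(heroin_2005)
sd = bootstrap_sd(heroin_2005, chao)
print(round(point), [round(x) for x in normal_ci(point, sd)])

# Completeness n / N_hat x 100% at the heroin 2005 point estimate and CI limits
completeness = [round(100 * 4467 / x, 1) for x in (11_559, 12_884, 10_234)]
print(completeness)  # -> [38.6, 34.7, 43.6]
```

Swapping in a mixture fit for `chao` only changes the `estimator` argument; the resampling scheme stays the same.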

5. A Simulation Study

Although the mixture of truncated count models fitted the examples well, we wished to confirm its performance more generally by a simulation experiment. Let the count variables $Y_i$, $i = 1, 2, \ldots, N$, be generated from a two-component mixture of Poisson distributions with equal weights attached to the component means $\lambda_1 = 1$ and $\lambda_2$, where $\lambda_2 = 1, 2, \ldots, 5$; that is,

$$ Y_i \sim 0.5\,\mathrm{Po}(1) + 0.5\,\mathrm{Po}(\lambda_2). $$

Population sizes were 200 (small), 1000 (medium), 5000 (large), and 10,000 (very large). The simulated values $Y_i$ were tallied to obtain the frequencies $n_0, n_1, \ldots, n_m$ of the counts $0, 1, \ldots, m$, where $n_0 + n_1 + \cdots + n_m = N$. Then $n_0$ was dropped, and the zero-truncated frequencies $n_1, \ldots, n_m$ were used to compute the population size estimators of the mixture model and of Chao. This was repeated for 5000 replications, and the mean, standard deviation (SD), and root mean square error (RMSE) of all estimates were computed from these replications.

The results are given in Table 4, and we can conclude the following:

• Under homogeneity ($\lambda_2 = \lambda_1 = 1$), the mixture model estimator (with one component in this case) performs well, with smaller RMSE, regardless of population size; Chao's estimator is worse, with larger RMSE, under homogeneity.
• Under heterogeneity ($\lambda_2 > \lambda_1$), Chao's estimator performs better when the population size is small to moderate (N = 200, 1000), with the exception of N = 1000 and $\lambda_2 = 5$; the mixture model estimator is better when the population is large to very large (N ≥ 5000) and the degree of heterogeneity is strong ($\lambda_2 = 4, 5$, or at least $\lambda_2 \ge 3$).
• Furthermore, when weak heterogeneity ($\lambda_2 = 2, 3$) is combined with a small to moderate population size (N = 200, 1000), the mixture model estimator suffers from excessively large standard deviations.
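A scaled-down version of this experiment can be sketched as follows. For brevity, and unlike the study, it compares Chao's estimator with the single-component (homogeneous) zero-truncated Poisson estimator rather than the BIC-selected mixture, uses 100 instead of 5000 replications and only N = 1000, and draws Poisson variates with Knuth's multiplication method; the seed and function names are our own. Under homogeneity ($\lambda_2 = 1$) both estimators should be close to N, while under strong heterogeneity ($\lambda_2 = 5$) Chao's estimator should show the downward bias seen in Table 4.

```python
import math
import random
from collections import Counter

def poisson(lam, rng):
    """Knuth's multiplication method for a Poisson(lam) variate (fine for small lam)."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def truncated_sample(N, lam1, lam2, rng):
    """Draw Y_i ~ 0.5 Po(lam1) + 0.5 Po(lam2), i = 1..N, then drop the zeros (n_0)."""
    ys = (poisson(lam1 if rng.random() < 0.5 else lam2, rng) for _ in range(N))
    return Counter(y for y in ys if y > 0)

def chao(freq):
    """Chao's estimator n + n1^2 / (2 n2)."""
    return sum(freq.values()) + freq[1] ** 2 / (2 * freq[2])

def ztp_homogeneous(freq):
    """k = 1 estimator: solve lam / (1 - e^{-lam}) = mean count, then N = n / (1 - e^{-lam})."""
    n = sum(freq.values())
    mean = sum(i * c for i, c in freq.items()) / n
    lo, hi = 1e-10, mean
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < mean:
            lo = mid
        else:
            hi = mid
    return n / -math.expm1(-0.5 * (lo + hi))

def summarize(estimates, N):
    """Mean, SD and RMSE over replications, as reported in Table 4."""
    m = sum(estimates) / len(estimates)
    sd = math.sqrt(sum((e - m) ** 2 for e in estimates) / (len(estimates) - 1))
    rmse = math.sqrt(sum((e - N) ** 2 for e in estimates) / len(estimates))
    return m, sd, rmse

rng = random.Random(2005)
results = {}
for lam2 in (1.0, 5.0):
    samples = [truncated_sample(1000, 1.0, lam2, rng) for _ in range(100)]
    results[lam2] = (summarize([chao(s) for s in samples], 1000),
                     summarize([ztp_homogeneous(s) for s in samples], 1000))
    print(lam2, results[lam2])
```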

6. Discussion

The NPMLE method provides well-estimated sizes of various target populations of drug users from the surveillance data on drug addicts, particularly methamphetamine (228,898 cases in 2007) and heroin (8953 cases in 2007) users. These surveillance data can be expected to be highly reliable because they are based on clearly evident contact counts of drug addicts with a severe addiction problem at health treatment centers. By comparison, a national household survey by the ONCB [15] yielded under-estimated sizes of 66,320 methamphetamine users and 3907 heroin users per year.

In contrast, the NPMLE estimates in this study for users of kratom plant (less than 18,720 cases in 2007), marijuana (27,323 cases in 2007), and inhalants (13,362 cases in 2007) are under-estimated because their symptoms are rarely severe enough to require treatment, which leads to a smaller total number of drug users (299,670 cases in 2007). This is confirmed by a national household survey of the ONCB [15], which reported estimates of 378,214 kratom-plant users, 57,527 marijuana users, and 48,849 inhalant users in 2007, and 575,312 drug users in total. The large differences between the two methods stem mainly from the severity of symptoms of drug use. From this point of view, the estimated sizes of methamphetamine and heroin users from the surveillance data in this NPMLE study seem more useful than those from the national survey, particularly for allocating health service resources, monitoring drug epidemics, and planning narcotics control policy. In general, the estimated sizes in this study are about three times the observed counts; hence, the completeness of identification is about 30% - 40%.

Table 3. Estimated number of drug users classified by type of drug in Thailand 2005-2007 (rows 1, 2, ... give the observed frequencies n_i of cases with i treatment episodes).

2007
i | All types | Methamphetamine | Heroin | Marijuana | Opium | Inhalants | Others
1 | 66,991 | 50,485 | 2299 | 6054 | 2200 | 3027 | 2926
2 | 10,931 | 7925 | 477 | 991 | 703 | 477 | 358
3 | 1716 | 1042 | 219 | 76 | 197 | 78 | 104
4 | 439 | 154 | 121 | 31 | 76 | 25 | 32
5 | 174 | 54 | 34 | 17 | 50 | - | 19
6 | 86 | 6 | 47 | - | 33 | - | -
7 | 22 | 1 | 10 | - | 3 | 8 | -
8 | 28 | 10 | 18 | - | - | - | -
9 | 23 | - | 20 | - | - | - | 3
10 | 23 | 11 | 9 | 1 | - | - | 2
14 | 14 | 14 | - | - | - | - | -
n | 80,447 | 59,702 | 3254 | 7170 | 3262 | 3615 | 3444
k | 3 | 3 | 2 | 2 | 2 | 2 | 2
N̂ | 299,670 | 228,898 | 8953 | 27,323 | 7193 | 13,362 | 18,720
N̂_Chao | 285,725 | 220,505 | 8794 | 25,662 | 6704 | 13,220 | 15,401
SD(N̂) | 2782.3 | 2108.0 | 548.6 | 731.4 | 264.9 | 1597.2 | 1859.8
95% lower | 294,217 | 224,766 | 7878 | 25,889 | 6674 | 10,231 | 15,075
95% upper | 305,123 | 233,030 | 10,028 | 28,757 | 7712 | 16,493 | 22,365

2006
i | All types | Methamphetamine | Heroin | Marijuana | Opium | Inhalants | Others
1 | 58,578 | 42,372 | 2690 | 5445 | 2418 | 2924 | 2729
2 | 11,239 | 7897 | 775 | 1025 | 559 | 578 | 405
3 | 2018 | 1240 | 320 | 158 | 98 | 101 | 101
4 | 439 | 182 | 157 | 21 | 35 | 28 | 16
5 | 138 | 61 | 27 | 1 | 31 | 5 | 13
6 | 132 | 29 | 49 | - | 33 | 4 | 17
7 | 14 | 14 | - | - | - | - | -
8 | - | - | - | - | - | - | -
9 | 9 | 9 | - | - | - | - | -
13 | 14 | 14 | - | - | - | - | -
14 | 29 | 29 | - | - | - | - | -
n | 72,610 | 51,847 | 4018 | 6650 | 3174 | 3640 | 3281
k | 3 | 3 | 2 | 1 | 2 | 2 | 2
N̂ | 230,665 | 166,287 | 11,333 | 20,283 | 8973 | 11,773 | 13,186
N̂_Chao | 225,265 | 165,522 | 8686 | 21,112 | 8404 | 11,036 | 12,475
SD(N̂) | 2068.2 | 1563.7 | 1049.5 | 421.8 | 306.9 | 768.8 | 1184.4
95% lower | 226,611 | 163,222 | 9276 | 19,456 | 8371 | 10,266 | 10,865
95% upper | 234,719 | 169,352 | 13,390 | 21,110 | 9575 | 13,280 | 15,507

2005
i | All types | Methamphetamine | Heroin | Marijuana | Opium | Inhalants | Others
1 | 50,370 | 35,422 | 3057 | 4073 | 2478 | 2446 | 2894
2 | 10,760 | 7785 | 791 | 786 | 520 | 393 | 485
3 | 1959 | 1262 | 351 | 97 | 97 | 53 | 99
4 | 299 | 111 | 107 | 6 | 21 | 28 | 26
5 | 202 | 68 | 80 | 16 | 7 | 10 | 21
6 | 135 | 10 | 59 | 3 | 55 | - | 8
7 | 27 | 13 | - | 1 | - | 13 | -
8 | 8 | 8 | - | - | - | - | -
9 | 16 | - | 15 | - | - | - | 1
12 | 19 | - | 7 | - | - | - | 12
n | 63,795 | 44,679 | 4467 | 4982 | 3178 | 2943 | 3546
k | 4 | 2 | 2 | 2 | 2 | 2 | 3
N̂ | 184,045 | 125,117 | 11,559 | 16,041 | 9297 | 10,989 | 13,201
N̂_Chao | 181,692 | 125,265 | 10,374 | 15,535 | 9082 | 10,555 | 12,180
SD(N̂) | 1401.8 | 1033.0 | 676.0 | 418.9 | 311.2 | 562.1 | 571.6
95% lower | 181,297 | 123,092 | 10,234 | 15,220 | 8687 | 9887 | 12,081
95% upper | 186,793 | 127,142 | 12,884 | 16,862 | 9907 | 12,091 | 14,321

Table 4. Mean, SD, and RMSE of the population size estimators of Chao and the mixture model (λ1 = 1; 5000 replications).

N | λ2 | Chao Mean | Chao SD | Chao RMSE | Mixture Mean | Mixture SD | Mixture RMSE
10,000 | 1 | 10000.4 | 159.3 | 159.3 | 9999.4 | 117.7 | 117.7
10,000 | 2 | 9727.2 | 105.6 | 292.6 | 9983.4 | 387.4 | 387.8
10,000 | 3 | 9551.4 | 86.9 | 456.9 | 9997.6 | 185.8 | 185.9
10,000 | 4 | 9541.1 | 84.3 | 466.6 | 10002.4 | 143.4 | 143.5
10,000 | 5 | 9630.2 | 90.3 | 380.7 | 10001.5 | 122.1 | 122.1
5000 | 1 | 5000.8 | 113.6 | 113.6 | 4999.5 | 83.9 | 83.9
5000 | 2 | 4865.5 | 74.9 | 153.9 | 5082.4 | 678.5 | 683.5
5000 | 3 | 4776.8 | 60.9 | 231.4 | 5006.6 | 133.4 | 133.6
5000 | 4 | 4771.7 | 59.6 | 235.4 | 5005.0 | 103.1 | 103.3
5000 | 5 | 4816.8 | 63.4 | 193.8 | 5003.6 | 87.1 | 87.2
1000 | 1 | 1002.6 | 50.7 | 50.8 | 1001.4 | 37.4 | 37.4
1000 | 2 | 974.0 | 33.3 | 42.3 | 1221.8 | 743.1 | 775.5
1000 | 3 | 956.1 | 27.6 | 51.8 | 1051.2 | 293.4 | 297.8
1000 | 4 | 955.0 | 27.2 | 52.6 | 1008.8 | 54.8 | 55.5
1000 | 5 | 964.3 | 28.9 | 45.9 | 1004.7 | 41.5 | 41.8
200 | 1 | 203.0 | 23.9 | 24.1 | 201.3 | 17.0 | 17.1
200 | 2 | 195.7 | 15.5 | 16.1 | 348.3 | 436.1 | 460.6
200 | 3 | 191.9 | 12.8 | 15.2 | 351.3 | 448.3 | 473.2
200 | 4 | 191.8 | 12.5 | 14.9 | 258.4 | 284.7 | 290.6
200 | 5 | 194.0 | 13.7 | 15.0 | 213.1 | 134.2 | 134.8

Given the simulation result that the mixture model estimator behaves well when the population size is large, there is no reason to reject the use of the mixture model to estimate the hidden population size and the total population size for each type of drug, since the observed numbers of drug users are large enough. However, there is a so-called boundary problem: extremely large estimates occur in some samples. This could explain the overestimation seen in the simulation for N = 200. Kuhnert et al. [16] used the median of a series of population size estimates in their simulation to avoid highly influential size estimates. Basically, the mixture model is a flexible approach that copes with both homogeneity and heterogeneity, including long-tailed, skewed, and/or contaminated distributions, in a natural way.

Recently, there has been increased interest in zero-truncated count models. These models can be applied in many areas, such as illegal immigrants, illegal gun owners, the HIV epidemic, scrapie in sheep, or criminals. This article has shown how mixture models allowing for heterogeneity can be applied to estimate the unobserved number of drug users with zero treatment episodes and then the total population size of illegal drug addicts. Besides estimators based on mixture models, the Mantel-Haenszel [17], Zelterman [18], Chao [13], and maximum likelihood methods are also available for estimating population sizes. Viwatwongkasem, Kuhnert, and Satitvipawee [5] found that the mixture of zero-truncated count models and Chao's model were the best choices among these estimators, with the smallest bias and smallest mean square error, especially for sufficiently large population sizes; furthermore, the mixture itself also performed well in a homogeneous situation. Although the mixture model provides a good estimate, its variance is usually difficult to obtain, so a bootstrap resampling technique was applied to compute the variance of the mixture of truncated count estimates instead of direct computation via the information matrix. Further study should focus on variance estimation for mixture models, but this is a challenging task, as stated by Chao [19], Cormack [20], and Böhning and Schön [9]. Other parametric models, such as the binomial model, the hypergeometric model, and the inverse-sampling negative binomial model, should be considered in future research.

Fortunately, the appropriate NPMLE models for these surveillance data of drug users in 2005-2007 do not produce spurious overestimates. In a few cases, however, the NPMLE of the mixture may overestimate the population size. Such overestimates are due to the boundary problem, in which an estimate lies on the boundary of the parameter space. Ways of reducing this overestimation bias should be investigated in further studies.

7. Acknowledgements

We are grateful to Kanya Boonthongtho, our M.Sc. (Biostatistics) student, and to the Bureau of Health Service System Development (BHSSD), Ministry of Public Health, for providing the surveillance dataset. We would like to thank the referees and the editors for comments that greatly improved this paper. Publication of this study was partially supported by the China Medical Board (CMB), Faculty of Public Health, Mahidol University, Bangkok, Thailand.

REFERENCES

[1] E. B. Hook and R. R. Regal, “Capture-Recapture Methods in Epidemiology: Methods and Limitations,” Epidemiologic Reviews, Vol. 17, No. 2, 1995, pp. 243-264.

[2] T. D. Mastro, D. Kitayaporn, B. G. Weniger, et al., “Estimating the Number of HIV-Infected Injection Drug Users in Bangkok: A Capture-Recapture Method,” American Journal of Public Health, Vol. 84, No. 7, 1994, pp. 1094-1099. doi:10.2105/AJPH.84.7.1094

[3] B. Suppawattanabodee, “Estimating the Number of Drug Users in Bangkok: A Capture-Recapture Method,” Master of Sciences Thesis, Mahidol University, Bangkok, 2003.

[4] D. Böhning, B. Suppawattanabodee, W. Kusolvisitkul, and C. Viwatwongkasem, “Estimating the Number of Drug Users in Bangkok 2001: A Capture-Recapture Approach Using Repeated Entries in One List,” European Journal of Epidemiology, Vol. 19, No. 12, 2004, pp. 1075-1083. doi:10.1007/s10654-004-3006-8

[5] C. Viwatwongkasem, R. Kuhnert and P. Satitvipawee, “A Comparison of Population Size Estimators under the Truncated Count Model with and without Allowance for Contaminations,” Biometrical Journal, Vol. 50, No. 6, 2008, pp. 1006-1021. doi:10.1002/bimj.200810484

[6] D. Böhning, “Computer-Assisted Analysis of Mixtures and Applications. Meta-Analysis, Disease Mapping and Others,” Chapman & Hall/CRC, Boca Raton, 2000.

[7] B. G. Lindsay, “The Geometry of Mixture Likelihoods Part I: A General Theory,” Annals of Statistics, Vol. 11, No. 3, 1983, pp. 783-792. doi:10.1214/aos/1176346245

[8] G. McLachlan and D. Peel, “Finite Mixture Models,” Wiley, New York, 2000. doi:10.1002/0471721182

[9] D. Böhning and D. Schön, “Nonparametric Maximum Likelihood Estimation of Population Size Based on the Counting Distribution,” Journal of the Royal Statistical Society: Series C (Applied Statistics), Vol. 54, No. 4, 2005, pp. 721-737. doi:10.1111/j.1467-9876.2005.05324.x

[10] D. Böhning and R. Kuhnert, “Equivalence of Truncated Count Mixture Distributions and Mixture of Truncated Count Distributions,” Biometrics, Vol. 62, No. 4, 2006, pp. 1207-1215. doi:10.1111/j.1541-0420.2006.00565.x

[11] N. Punyacharoensin and C. Viwatwongkasem, “Trends in Three Decades of HIV/AIDS Epidemic in Thailand by Nonparametric Backcalculation Method,” AIDS, Vol. 23, No. 9, 2009, pp. 1143-1152. doi:10.1097/QAD.0b013e32832baa1c

[12] N. M. Laird, “Nonparametric Maximum Likelihood Estimation of a Mixing Distribution,” Journal of the American Statistical Association, Vol. 73, No. 364, 1978, pp. 805-811. doi:10.1080/01621459.1978.10480103

[13] A. Chao, “Estimating the Population Size for Capture-Recapture Data with Unequal Catchability,” Biometrics, Vol. 43, No. 4, 1987, pp. 783-791. doi:10.2307/2531532

[14] Office of the Narcotics Control Board (ONCB), “Thailand Narcotics Annual Report,” Aroon Printing Co., Ltd., Bangkok, 2006.

[15] Office of the Narcotics Control Board (ONCB), Academic Network Organization Board on Substance Abuse, “Estimation of the Number of Drug Addicts in Thailand 2007,” Union Ultra Violet Co., Ltd., Bangkok, 2007.

[16] R. Kuhnert, V. J. Del Rio Vilas, J. Gallagher and D. Böhning, “A Bagging-Based Correction for the Mixture Model Estimator of Population Size,” Biometrical Journal, Vol. 50, No. 6, 2008, pp. 993-1005. doi:10.1002/bimj.200810485

[17] N. Wannasirikul, “A Comparison of Truncated Poisson Estimators of Population Size under Model Contaminations,” Master of Sciences Thesis, Mahidol University, Bangkok, 2005.

[18] D. Zelterman, “Robust Estimation in Truncated Discrete Distributions with Application to Capture-Recapture Experiments,” Journal of Statistical Planning and Inference, Vol. 18, No. 2, 1988, pp. 225-237. doi:10.1016/0378-3758(88)90007-9

[19] A. Chao, “Estimating Population Size for Sparse Data in Capture-Recapture Experiments,” Biometrics, Vol. 45, No. 2, 1989, pp. 427-438. doi:10.2307/2531487

[20] R. M. Cormack, “Interval Estimation for Mark-Recapture Studies of Closed Populations,” Biometrics, Vol. 48, No. 2, 1992, pp. 567-576. doi:10.2307/2532310