J. Biomedical Science and Engineering, 2008, 1 85-90
Published Online August 2008 in SciRes. http://www.srpublishing.org/journal/jbise JBiSE
Retrospective analysis of chronic hepatitis C in un-
treated patients with nonlinear mixed effects mod-
el
Jian Huang1, Kathleen O’Sullivan1, John Levis2, Elizabeth Kenny-Walsh3, Orla Crosbie3 & Liam Jo-
seph Fanning2
1 Statistical Consultancy Unit, University College Cork, Ireland. 2 Molecular Virology, Department of Medicine, Hospital Cork University, Ireland. 3 Depart-
ment of Gastroenterology and Hepatology, University Hospital Cork,Ireland. Correspondence should be addressed to Jian Huang (j.huang@ucc.ie).
ABSTRACT
It is well known that viral load of the hepatitis C
virus (HCV) is related to the efficacy of interferon
therapy. The complex biological parameters that
impact on viral load are essentially unknown.
The current knowledge of the hepatitis C virus
does not provide a mathematical model for viral
load dynamics w ithin untreated patients. We car-
ried out an empirical modelling to investigate
whether different fluctuation patterns exist and
how these patterns (if exist) are related to host-
specific factors. Data was prospectively col-
lected from 147 untreated patients chronically
infected with hepatitis C, each contributing be-
tween 2 to 10 years of measurements. We pro-
pose to use a three parameter logistic model to
describe the overall pattern of viral load fluctua-
tion based on an exploratory analysis of the data.
To incorporate the correlation feature of longitu-
dinal data and patient to patient variation, we
introduced random effects components into the
model. On the basis of this nonlinear mixed ef-
fects modelling, we investigated effects of host-
specific factors on viral load fluctuation by in-
corporating covariates into the model. The pro-
posed model provided a good fit for describing
fluctuations of viral load measured with varying
frequency over different time intervals. The aver-
age viral load growth time was significantly dif-
ferent between infection sources. There was a
large patient to patient variation in viral load as-
ymptote.
Keywords: Logistic model, Viral load, Viral
genotype, Mixed effects modelling
1. INTRODUCTION
Approximately 3% of the world population is infected by
the hepatitis C virus (HCV). This virus is a single
stranded positive sense RNA virus and does not exist as a
single clonotype. It is found as a complex mixture of
similar but non-identical isolates, hence, quasispecies.
There are seven different genotypes of HCV each with a
unique population of subtypes. The nomenclature used to
describe these genotypes is numerical, while subtypes are
described alphabetically. The amount of virus present in
serum at any one time is referred to as the viral load,
which can be measured in serum by RT-PCR (a method
based on amplification of genomic RNA [1]). A wide
range of viral load fluctuation over time was observed
within some untreated patients [2]. Treatment efficacy is
reduced when viraemia is greater than 5.7-6.0 log10
IU/mL [3]. Knowledge of viral load fluctuations could
lead to a more optimised treatment initiation time point.
To date several studies have attempted to elucidate viral
load fluctuation within untreated patients. Halfon et al. [2]
showed that viral load fluctuation within untreated pa-
tients was significant. Arase et al. [4] illustrated the ratio
of the maximum viral load to the minimum viral load was
related to acute exacerbation. Pontisso et al. [5] demon-
strated that the mean difference between the maximum
viral load and minimum viral load was significantly dif-
ferent between normal transaminases and fluctuating
transaminases. Our previous studies [6, 7] have showed
that viral load does change over time in some patients and
exhibits periods of apparent stability in others.
The complex biological parameters that impact on
HCV viral load are essentially unknown. However, what
is known is that the magnitude of the viral load at any one
time represents the output of the equilibrium between
viral production and host mediated viral clearance. The
phenomenon of replicative homeostasis may explain viral
load fluctuation over time within an untreated patient [8].
Replicative homeostasis consists of a series of autoregula-
tory feedback epicycles that link RNA polymerase func-
tion, RNA replication and viral production and presents a
model which may rationalize why viraemia modulates
over time. Replicative homeostasis results dynamic equi-
librium controlled by the specificity of the interactions
between mutant or wild type envelope proteins and the
replicas, the RNA dependant RNA polymerase (RDRP).
In other words, a highly progressive RDRP exhibits a low
SciRes Copyright © 2008
86 J. Huang et al. / J. Biomedical Science and Engineering 1 (2008) 85-90
SciRes Copyright © 2008 JBiSE
fidelity of replication yielding a high intracellular concen-
tration of mutant type envelope proteins. This mutant
population competes with wild type forms, resulting in a
progressive increase in fidelity and a shift in the dynamic
equilibrium which results in the generation of a dominant
viral quasispecies and which may be reflected in a change
in the absolute magnitude of the viral load. The rate of
oscillation between high and low fidelity is likely to be
influenced, in a unique way within each viral-host pairing
by the visibility of the antigenic quasispecies to immune
enhanced viral clearance. Low quasispecies complex is
associated with increased antigen concentration, if the
threshold of immune activation is passed, immune po-
tency is increased and active clearance of particular viral
isolates can take place. The removal of the dominant qua-
sispecies may be followed by a parallel temporally mis-
matched decrease in viral load.
Based on interaction of HIV with cells of the immune
system, various mathematical models have been devel-
oped to fit HIV viral load data [9, 10]. These models were
developed to describe viral dynamics during antiviral
treatment. Thus, they cannot be used for long term inves-
tigations of viral load progression in untreated patients.
The current knowledge of HCV does not provide a
mathematical model for describing long term HCV dy-
namics within an untreated patient. We carried out an
empirical modelling analysis to investigate whether dif-
ferent fluctuation patterns exist and how these patterns (if
exist) are related to host-specific factors. The analysis
comprised a two-step approach: first, a statistical model
was developed for describing overall viral load fluctua-
tion as a function of time, taking individualization into
account. Then, we examined effects of host-specific fac-
tors on viral load fluctuation by incorporating covariates
into the model.
A mixed effects logistic model for viral load fluctua-
tion is developed in Section 2. By sequential modelling,
the effects of host-specific factors (covariates) on viral
load fluctuation are investigated in Section 3. The results
are discussed in Section 4.
2. MODELLING VIRAL LOAD FLUCTUA-
TION
2.1. Data
The data used consisted of 147 untreated patients chroni-
cally infected with hepatitis C, each contributing between
2 to 10 years of measurements. The study population was
divided according to likely source of infection. Group A
consisted of 85 individuals whose sole risk factor for ac-
quisition of hepatitis C was iatrogenic infection
throughreceipt of HCV genotype 1 subtype b (HCV 1)
contaminated Anti-D immunoglobulin [11]. Group B
consisted of 62 individuals whose risk factors for acquisi-
tion of hepatitis C infection likely to be one or more of
the followings: intravenous drug use, receipt of
Time (year)
Viral load (log10IU/mL)
3
4
5
6
7
02468100246810024681002468100246810
3
4
5
6
7
3
4
5
6
73
4
5
6
7
3
4
5
6
73
4
5
6
7
3
4
5
6
73
4
5
6
7
3
4
5
6
73
4
5
6
7
3
4
5
6
73
4
5
6
7
3
4
5
6
73
4
5
6
7
3
4
5
6
7
024681002468100246810
Figure 1. For each patient viral load is plotted against time from
the first viral load sampling.
contaminated blood, or blood products other than Anti-D
immunoglobulin or infection was of undefined aetiology
(sexual). The genotype composition of Group B is 1 and
3.
2.2. Models for viral load fluctuation
Plotting viral load against time post infection is a natural
way for visualising viral load fluctuation. However, due
to the asymptomatic nature of HCV, the exact time of
when infection occurred is not available for patients in
group B. Figure 1 shows the profiles of the viral load of
the 147 available patients against time from the first viral
load sampling. The study duration time ranged from 24 to
120 months (25% percentile, median, 75% percentile: 60,
84, 108 months). The mean intervals of blood sampling
from each patient varied from 3 to 23 months (25% per-
centile, median, 75% percentile: 9, 10, 13 months). Figure
1 shows a wide range of viral load fluctuation in
J. Huang et al. / J. Biomedical Science and Engineering 1 (2008) 85-90 87
SciRes Copyright © 2008 JBiSE
0246810
34567
Tim e
log10.IU.per.mL
Lowess
logistic
quadratic
Figure 2. Scatter plot of viral load overlaid with the LOWESS,
logistic and quadratic fitted curves.
some untreated patients. In addition, the viral load pro-
files vary between patients. To identify an appropriate
model to describe the overall fluctuation of viral load as a
function of time, the locally-weighted polynomial regres-
sion smoothing (the R function LOWESS) was performed
on the data (Figure 2). The pattern obtained from the
LOWESS shows that viral load increase over some time
period, followed by a period of the more stabilized viral
load. Such patterns can be described by logistic models
[12]
,))/)81ln()(exp(1/( ijijij ty
ε
γ
β
α
+
−+= (1)
where ij
y is the logarithm (base 10) of the measurement
of viral load for the ith patient at the jth measurement time
(from the first viral load sampling) ij
t and the model er-
ror ij
ε
is assumed to be i.i.d. N (0,2
σ
). The parameter
α
is the asymptote (the limit of viral load growth),
β
is
the midpoint, the time when the most rapid viral load
growth occurs. The scale parameter
γ
is the growth time,
time interval during which growth progresses from 10%
to 90% of the asymptote [12]. )81ln( is introduced into
model (1) to facilitate interpretation of
γ
.
As a comparison we fed the quadratic polynomial
model to data as well. The fitted curves corresponding to
the quadratic polynomial model (AIC=2896.37) and lo-
gistic model (AIC=2896.01) are also shown in Figure 2.
As can be seen from Figure 2 both models captured the
basic structure of the LOWESS smooth with the logistic
one performing better at the tail. Having a smaller AIC
value the logistic model was selected to describe the
overall pattern of viral load fluctuation.
It is also evident from Figure 1 that there exists a large
patient to patient variation in viral load fluctuation. To
account for this patient to patient variation random com-
ponents were introduced into model (1) yielding the fol-
lowing mixed effects model
,)))/(ln(81)
))(((/(1)(
iji
iijiij r
btexpay
εγ
βα
++×
+−−++= (2)
where ),,( iii rba are assumed to be the i.i.d random vec-
tor, independent of the model errors and follow the nor-
mal distribution N(0,Σ).
α
is replaced by i
a
+
α
to ac-
count for patient to patient variation in the viral load as-
ymptote.
α
is called the fixed effect and represents the
mean level of the viral load asymptote for the population.
i
a is called the random effect and represents the individ-
ual patient departure from the mean level. Similarly, the
fixed effects
β
and
γ
represent the mean levels of the
midpoint and growth time for the population, respectively.
The random effects i
band i
rare the individual patient
departures from the mean levels of the midpoint and
growth time, respectively.
The R package nlme [13] was used to fit nonlinear
mixed effects models. In model (2), all three parameters
consisted of a fixed effect term and a random effect term.
Such models might be over parameterized. In these cases,
the variance-covariance matrix of random effects become
seriously ill-conditioned, making convergence difficult or
impossible. We adopted model building strategies sug-
gested in [13] to determine an adequate but parsimonious
model.
We began with fitting model (2) to data. Convergence
was achieved. However, convergence was sensitive to
changes in the initial values and the algorithm failed to
converge using values slightly different from the fitted
values. Checking the fitted model we found that the esti-
mated covariance matrix
Σ
has off diagonal terms of
zero. Hence, we refitted model (2) with a diagonal co-
variance matrix
Σ
. Then the possibility of eliminating one
or more random effects from the model was investigated.
First, we compared models generated by eliminating one
of the three random effects from model (2) and found that
the model with random components in
α
and
β
had the
largest likelihood value, termed as model A. Then we
considered models generated by removing two of the
three random effects from model (2) and found that the
model with a random effect in
α
had the largest likeli-
hood value, termed as model B. To establish the signifi-
cance of random effects, we followed the procedure de-
scribed by Verbeke and Molenberghs [14], who provide
an outline for testing the need for random effects by com-
paring the log-likelihood between the nested models with
and without random effects. The asymptotic null distribu-
tion of the test statistics is a mixture of Chi-squares. The
results are summarised in Table 1. As can be seen from
the table there are significant random effects in
α
(P<0.0001) and
β
(P<0.0001). Model B was preferred.
Hence we re-parameterized model (2) as follows
ij
iijiij bty
εγ
βαα
+−−++=
))/)81ln(
))((exp(1/()( (3)
When fitting model (3) to data, we estimate the stan-
dard deviations of the random effects as 0.63 (log10IU/mL)
and 1.32 (year) for the asymptote and midpoint of growth
time, respectively.
The time variable used in the model is the time from
the first viral load sampling. It may not be related to the
time since onset of infection. A patient can enter the study
at any time since onset of infection. For example, some
88 J. Huang et al. / J. Biomedical Science and Engineering 1 (2008) 85-90
SciRes Copyright © 2008 JBiSE
Table 1. Comparisons of nested models to establish significance
of random effects.
Random effects LR test P-Value
a Model (1) vs Model B <0.001
b Model A vs Model A <0.001
c Model A vs Model (2) Nonsignificant
may present two months after the time of infection; others
may present one year later. Such time differences can be
accounted for in the model by random effects in
β
. Ran-
dom effects in
β
allow the logistic growth curve to fit
viral load profiles measured at different time spans from
onset of infection. Significant patient to patient variation
in
β
(P<0.001) may reflect that patients may have en-
tered the study at different time from the times of infec-
tion. The significant random effects in
α
(P<0.001) indi-
cates that there exist large patient to patient variation in
the asymptote. However, from a clinical perspective our
current understanding of hepatitis C disease is such that
this variation is difficult to interpret with certainty.
3. ANALYSIS OF COVARIATES
In this section, we examined the effects of covariates on
the viral load fluctuation by sequentially incorporating
them into model (3). A similar approach has been used in
[15]. The covariates considered are viral genotype, gender,
infection source (Group A and Group B) and age (< 45
years verses 45 years). In model (3) there are three
fixed effects terms,
α
β
and .
γ
Any given covariate may
have a significant effect on at least one of the terms. For
example, gender had significant a effect on
γ
(P<0.05)
but not on
α
and .
β
When a covariate was incorporated
into the model only were significant terms retained. Inter-
actions were considered only if a fixed term was signifi-
cantly affected by at least two covariates. Initially, one
covariate was incorporated into model (3). Incorporating
age into model (3) failed to produce any significant term.
Hence, three one-covariate models were produced. The
one-covariate model with the maximum likelihood was
selected as the optimum one-covariate model. Subse-
quently, one of the remaining covariates was incorporated
into this model and yielded no significant term. The selec-
tion process was stopped. The process is summarized in
Table 2. The final model selected to fit the viral load data
is model (4) with
γ
influenced by infection source and
can be expressed as follows
,)))/()81ln(
))((exp(1/()(
ijr
iijiij btay
εγγ
βα
=+×
+−−++= (4)
where r
γ
represents infection source effect on the growth
time
γ
. The significance of this term was assessed using
the F-tests (P<0.0001).
As an alternative approach we incorporated covariates
into model (3) using the backward elimination approach.
The selection process began with incorporating all the
Table 2. Sequentially incorporating covariates into model (3) to
determine important covariates. Significant term is indicated in the
bracket.
Models P-value Log-
likelihood
Time (model (3)) -946.40
Time + gender (
γ
) <0.001 -931.34
Time + infection source (
γ
) <0.001 -925.63
Time + genotype (
γ
) <0.001 -946.01
covariates into model (3). Then non significant terms
were removed from the model recursively. Using a
significance level of 0.05, this approach also selected
model (4) as the final model.
A likelihood ratio test was used to test the difference be-
tween the fixed effects represented by model (3) and(4).
The result favors model (4) with P<0.0001.
Plotting the standardized residuals against the fitted
values of model (4) and the Normal plot of the residuals
(not shown) did not illustrate any concerns about the
model assumptions. The individual fitted curves (Figure
3) indicate good fits to the data, keeping in mind that the
data were irregularly and sparsely sampled.
The fitting of model (4) to the data is summarised in
Table 3. The results show that the mean viral load as-
ymptote for the population was 6.44 (log10IU/mL). The
most rapid viral load growth occurred, on average, 3.29
years before patients entered the study. The mean growth
time for the population was 11.04 years. The patients in
Group A experienced, on average, significantly longer
growth time (delta = 3.80 year) compared to those in
Group B.
Since all patients in Group A are female and have vi-
ral genotype 1, to separate the gender and genotype ef-
fects from the infection source effect, the previous ap-
proach to evaluating the effects of covariates by sequen-
tially incorporating them into model (3) was applied to
Group B. No significant one-covariate model was pro-
duced. This finding is very different from the results de-
scribed earlier. Using the full data we found that all co-
variates except age had a significant effect on viral load
growth pattern (see Table 2). One possible explanation is
that the gender and genotype significance found in the
previous analysis may be due to the significance of infec-
tion source. Another possibility is that number of patient
in Group B was small and there may have not been suffi-
cient power to identify gender and genotype significance.
4. CONCLUSIONS AND DISCUSSIONS
We have developed a mixed effects logistic model for
describing the viral load fluctuation within untreated pa-
tients with chronically infected HCV. The model’s diag-
nostic analysis indicated that the proposed model pro-
vided a good fit to the data. Due to the asymptomatic na-
ture of HCV, it is impossible to determine the exact time
when a patient was infected. Hence the time variable used
in the model is the time since an individual patient pre-
J. Huang et al. / J. Biomedical Science and Engineering 1 (2008) 85-90 89
SciRes Copyright © 2008 JBiSE
sented.
Table 3. The results of fitting model (4) to the data.
Terms Value Std.errors p-Value
α
: Grand mean 6.44 0.068 <0.0001
β
: Grand mean -3.290.223 <0.0001
γ
: Grand mean 11.040.750 <0.0001
γ
:Group B- Group A -3.800.444 <0.0001
Time (year)
Viral load (log10IU/mL)
3
4
5
6
7
02468100246810 024681002468100246810
3
4
5
6
7
3
4
5
6
73
4
5
6
7
3
4
5
6
73
4
5
6
7
3
4
5
6
73
4
5
6
7
3
4
5
6
73
4
5
6
7
3
4
5
6
73
4
5
6
7
3
4
5
6
73
4
5
6
7
3
4
5
6
7
02468100246810 0246810
Figure 3. Viral load is ploted against time from the first viral load
sampling, overplayed with the individual fitted curve (solid line),
for each of 147 patients.
Random effects in
β
allow the model to fit viral load
profiles beginning at different time from the time of in-
fection. As shown in Figure 3, partially measured viral
load profile can be fitted by the corresponding section of
the logistic growth curve.
Recently, Gray et al. [15] conducted an empirical mod-
elling of HIV-RNA viral load in vertically infected chil-
dren. They chose a linear model to represent the overall
pattern of HIV-RNA viral load among conventional poly-
nomials, change point models, and fractional models.
They compared various algorithms to fit the linear model.
Based on exploratory analysis of data we chose a nonlin-
ear model to describe the overall pattern of HCV viral
load within untreated patients. We investigated effects of
host-specific factors based on nonlinear mixed effects
models.
The quantification of viraemia is a snap shot in time of
the amount of virus present. The diurnal variation in viral
load is unknown; however, the amount of virus present at
any one time is the outcome of the equilibrium between
viral production (replication of genomic RNA and pro-
duction of infectious virions) and destruction of infected
hepatocytes. Although, there is considerable patient to
patient variation in viral load asymptote, from a clinical
perspective our current understanding of hepatitis C dis-
ease is such that these variations are difficult to interpret
with certainty. What is of interest is the difference in viral
load growth time. The growth time determines the
“speed” at which the asymptote is reached. A clinically
important parameter is the pre-treatment viral load. Viral
load has been established by numerous studies as an in-
dependent variable which determines outcome while on
anti-viral treatment. Patients with a viral load value below
6 log10 IU/mL are known to respond with higher efficacy
to therapy [3,16]. The difference in the growth time be-
tween groups A and B would indicate that more time
could be afforded to the pre-treatment assessment period
for group A while viral load remained within the range of
optimum efficacy.
The distinguishing feature of data presented here is the
longitudinal nature of the viral load measurement. Group
A represents a globally unique homogeneous cohort with
respect to the investigation of the natural history of hepa-
titis C infection. The empirical model proposed here is the
first attempt to describe long term viral load fluctuation in
untreated patients.
REFERENCES
[1] Fanning at el. (1999) Viral load and clinic pathological features of
chronic hepatitis C (1b) in a homogeneous patient population.
Hepatology, 29, 904-7.
[2] Halfon et al. (1998) Assessment of spontaneous fluctuations of
viral load in untreated patients with chronic hepatitis C by two
standardized quantitation methods: branched DNA and amplicor
monitor. J Clin Microbiol, 36(7), 2073–2075.
[3] Shiffman et al. (2007) Peginterferon alfa-2a and ribavirin for 16 or
24 weeks in HCV genotype 2 or 3. N Engl J Med,357(2), 124-34.
[4] Arase, Y., Ikeda, K., and Chayama, K. (2000) Fluctuation patterns
of HCV-RNA serum level in patients with chronic hepatitis C. J
Gastroenterol, 35, 221-225.
[5] Pontisso, P., Bellati, G., and Brunetto, M. (1999) Hepatitis C virus
RNA profiles in chronically infected individuals: Do they relate to
disease activity? Hepatology, 29, 585–589.
[6] Fanning et al. (2000) Natural fluctuations of hepatitis C viral load
in a homogeneous patient population: a prospective study.
Hepatology, 31, 225-9.
[7] Fanning et al. (2001) LA class II genes determine the natural
variance of hepatitis C viral load. Hepatology, 33, 224-30.
[8] R. Sallie. (2005) Replicative homeostasis II: influence of
polymerase fidelity on RNA virus quasispecies biology:
implications for immune recognition, viral autoimmunity and other
"virus receptor", diseases. Virol J, 22, 2-70,.
90 J. Huang et al. / J. Biomedical Science and Engineering 1 (2008) 85-90
SciRes Copyright © 2008 JBiSE
[9] Wu, H. (2005) Statistical methods for HIV dynamic studies in
AIDS clinical trials. Statistical Methods in Medical Research, 14,
171-192.
[10] Donnelly, C.A. and Cox, D.R. (2001) Mathematical biology and
medical statistics: contributions to the understanding of AIDS
epidemiology. Stat Methods Mes Res, 10(2), 141-54.
[11] Kenny-Walsh E. (1999) Clinical outcomes after hepatitis C
infection from contaminated anti-D immune globulin. Irish
Hepatology Research Group. N Engl J Med, 16, 1228-33.
[12] Meyer, P.S., Yung, J.W., and Ausubel, J.H. (1999) A Primer on
Logistic Growth and Substitution: The Mathematics of the Loglet
Lab Software. Technological Forecasting and Social Change, 61,
247-271.
[13] Pinheiro, J.C., and Bates, D.M. (2000) Mixed-Effects Models in S
and S-PLUS, Springer.
[14] Verbeke, G., Molenberghs, G. (2000) Linear Mixed Models for
Longitudinal Data. Springer, New York.
[15] L. Gray, M. Cortina-Borja and ML Newell. (2004) Modelling
HIV-RNA viral load in vertically infected children. Statist. Med,
23, 769-781.
[16] Zeuzem et al. (2006) Efficacy of 24 weeks treatment with
peginterferon alfa-2b plus ribavirin in patients with chronic
hepatitis C infected with genotype 1 and low pretreatment viremia.
J Hepatol, 44, 97-103.