Retrospective analysis of chronic hepatitis C in untreated patients with nonlinear mixed effects model

doi:10.4236/jbise.2008.12014

Paper Menu >>

Journal Menu >>

J. Biomedical Science and Engineering, 2008, 1 85-90

Published Online August 2008 in SciRes. http://www.srpublishing.org/journal/jbise JBiSE

Retrospective analysis of chronic hepatitis C in un-

treated patients with nonlinear mixed effects mod-

Jian Huang1, Kathleen O’Sullivan1, John Levis2, Elizabeth Kenny-Walsh3, Orla Crosbie3 & Liam Jo-

seph Fanning2

1 Statistical Consultancy Unit, University College Cork, Ireland. 2 Molecular Virology, Department of Medicine, Hospital Cork University, Ireland. 3 Depart-

ment of Gastroenterology and Hepatology, University Hospital Cork,Ireland. Correspondence should be addressed to Jian Huang (j.huang@ucc.ie).

ABSTRACT

It is well known that viral load of the hepatitis C

virus (HCV) is related to the efficacy of interferon

therapy. The complex biological parameters that

impact on viral load are essentially unknown.

The current knowledge of the hepatitis C virus

does not provide a mathematical model for viral

load dynamics w ithin untreated patients. We car-

ried out an empirical modelling to investigate

whether different fluctuation patterns exist and

how these patterns (if exist) are related to host-

specific factors. Data was prospectively col-

lected from 147 untreated patients chronically

infected with hepatitis C, each contributing be-

tween 2 to 10 years of measurements. We pro-

pose to use a three parameter logistic model to

describe the overall pattern of viral load fluctua-

tion based on an exploratory analysis of the data.

To incorporate the correlation feature of longitu-

dinal data and patient to patient variation, we

introduced random effects components into the

model. On the basis of this nonlinear mixed ef-

fects modelling, we investigated effects of host-

specific factors on viral load fluctuation by in-

corporating covariates into the model. The pro-

posed model provided a good fit for describing

fluctuations of viral load measured with varying

frequency over different time intervals. The aver-

age viral load growth time was significantly dif-

ferent between infection sources. There was a

large patient to patient variation in viral load as-

ymptote.

Keywords: Logistic model, Viral load, Viral

genotype, Mixed effects modelling

1. INTRODUCTION

Approximately 3% of the world population is infected by

the hepatitis C virus (HCV). This virus is a single

stranded positive sense RNA virus and does not exist as a

single clonotype. It is found as a complex mixture of

similar but non-identical isolates, hence, quasispecies.

There are seven different genotypes of HCV each with a

unique population of subtypes. The nomenclature used to

describe these genotypes is numerical, while subtypes are

described alphabetically. The amount of virus present in

serum at any one time is referred to as the viral load,

which can be measured in serum by RT-PCR (a method

based on amplification of genomic RNA [1]). A wide

range of viral load fluctuation over time was observed

within some untreated patients [2]. Treatment efficacy is

reduced when viraemia is greater than 5.7-6.0 log10

IU/mL [3]. Knowledge of viral load fluctuations could

lead to a more optimised treatment initiation time point.

To date several studies have attempted to elucidate viral

load fluctuation within untreated patients. Halfon et al. [2]

showed that viral load fluctuation within untreated pa-

tients was significant. Arase et al. [4] illustrated the ratio

of the maximum viral load to the minimum viral load was

related to acute exacerbation. Pontisso et al. [5] demon-

strated that the mean difference between the maximum

viral load and minimum viral load was significantly dif-

ferent between normal transaminases and fluctuating

transaminases. Our previous studies [6, 7] have showed

that viral load does change over time in some patients and

exhibits periods of apparent stability in others.

The complex biological parameters that impact on

HCV viral load are essentially unknown. However, what

is known is that the magnitude of the viral load at any one

time represents the output of the equilibrium between

viral production and host mediated viral clearance. The

phenomenon of replicative homeostasis may explain viral

load fluctuation over time within an untreated patient [8].

Replicative homeostasis consists of a series of autoregula-

tory feedback epicycles that link RNA polymerase func-

tion, RNA replication and viral production and presents a

model which may rationalize why viraemia modulates

over time. Replicative homeostasis results dynamic equi-

librium controlled by the specificity of the interactions

between mutant or wild type envelope proteins and the

replicas, the RNA dependant RNA polymerase (RDRP).

In other words, a highly progressive RDRP exhibits a low

86 J. Huang et al. / J. Biomedical Science and Engineering 1 (2008) 85-90

fidelity of replication yielding a high intracellular concen-

tration of mutant type envelope proteins. This mutant

population competes with wild type forms, resulting in a

progressive increase in fidelity and a shift in the dynamic

equilibrium which results in the generation of a dominant

viral quasispecies and which may be reflected in a change

in the absolute magnitude of the viral load. The rate of

oscillation between high and low fidelity is likely to be

influenced, in a unique way within each viral-host pairing

by the visibility of the antigenic quasispecies to immune

enhanced viral clearance. Low quasispecies complex is

associated with increased antigen concentration, if the

threshold of immune activation is passed, immune po-

tency is increased and active clearance of particular viral

isolates can take place. The removal of the dominant qua-

sispecies may be followed by a parallel temporally mis-

matched decrease in viral load.

Based on interaction of HIV with cells of the immune

system, various mathematical models have been devel-

oped to fit HIV viral load data [9, 10]. These models were

developed to describe viral dynamics during antiviral

treatment. Thus, they cannot be used for long term inves-

tigations of viral load progression in untreated patients.

The current knowledge of HCV does not provide a

mathematical model for describing long term HCV dy-

namics within an untreated patient. We carried out an

empirical modelling analysis to investigate whether dif-

ferent fluctuation patterns exist and how these patterns (if

exist) are related to host-specific factors. The analysis

comprised a two-step approach: first, a statistical model

was developed for describing overall viral load fluctua-

tion as a function of time, taking individualization into

account. Then, we examined effects of host-specific fac-

tors on viral load fluctuation by incorporating covariates

into the model.

A mixed effects logistic model for viral load fluctua-

tion is developed in Section 2. By sequential modelling,

the effects of host-specific factors (covariates) on viral

load fluctuation are investigated in Section 3. The results

are discussed in Section 4.

2. MODELLING VIRAL LOAD FLUCTUA-

TION

2.1. Data

The data used consisted of 147 untreated patients chroni-

cally infected with hepatitis C, each contributing between

2 to 10 years of measurements. The study population was

divided according to likely source of infection. Group A

consisted of 85 individuals whose sole risk factor for ac-

quisition of hepatitis C was iatrogenic infection

throughreceipt of HCV genotype 1 subtype b (HCV 1)

contaminated Anti-D immunoglobulin [11]. Group B

consisted of 62 individuals whose risk factors for acquisi-

tion of hepatitis C infection likely to be one or more of

the followings: intravenous drug use, receipt of

Time (year)

Viral load (log10IU/mL)

02468100246810024681002468100246810

024681002468100246810

Figure 1. For each patient viral load is plotted against time from

the first viral load sampling.

contaminated blood, or blood products other than Anti-D

immunoglobulin or infection was of undefined aetiology

(sexual). The genotype composition of Group B is 1 and

2.2. Models for viral load fluctuation

Plotting viral load against time post infection is a natural

way for visualising viral load fluctuation. However, due

to the asymptomatic nature of HCV, the exact time of

when infection occurred is not available for patients in

group B. Figure 1 shows the profiles of the viral load of

the 147 available patients against time from the first viral

load sampling. The study duration time ranged from 24 to

120 months (25% percentile, median, 75% percentile: 60,

84, 108 months). The mean intervals of blood sampling

from each patient varied from 3 to 23 months (25% per-

centile, median, 75% percentile: 9, 10, 13 months). Figure

1 shows a wide range of viral load fluctuation in

J. Huang et al. / J. Biomedical Science and Engineering 1 (2008) 85-90 87

0246810

34567

Tim e

log10.IU.per.mL

Lowess

logistic

quadratic

Figure 2. Scatter plot of viral load overlaid with the LOWESS,

logistic and quadratic fitted curves.

some untreated patients. In addition, the viral load pro-

files vary between patients. To identify an appropriate

model to describe the overall fluctuation of viral load as a

function of time, the locally-weighted polynomial regres-

sion smoothing (the R function LOWESS) was performed

on the data (Figure 2). The pattern obtained from the

LOWESS shows that viral load increase over some time

period, followed by a period of the more stabilized viral

load. Such patterns can be described by logistic models

[12]

,))/)81ln()(exp(1/( ijijij ty

−

−+= (1)

where ij

y is the logarithm (base 10) of the measurement

of viral load for the ith patient at the jth measurement time

(from the first viral load sampling) ij

t and the model er-

ror ij

is assumed to be i.i.d. N (0,2

). The parameter

is the asymptote (the limit of viral load growth),

the midpoint, the time when the most rapid viral load

growth occurs. The scale parameter

is the growth time,

time interval during which growth progresses from 10%

to 90% of the asymptote [12]. )81ln( is introduced into

model (1) to facilitate interpretation of

As a comparison we fed the quadratic polynomial

model to data as well. The fitted curves corresponding to

the quadratic polynomial model (AIC=2896.37) and lo-

gistic model (AIC=2896.01) are also shown in Figure 2.

As can be seen from Figure 2 both models captured the

basic structure of the LOWESS smooth with the logistic

one performing better at the tail. Having a smaller AIC

value the logistic model was selected to describe the

overall pattern of viral load fluctuation.

It is also evident from Figure 1 that there exists a large

patient to patient variation in viral load fluctuation. To

account for this patient to patient variation random com-

ponents were introduced into model (1) yielding the fol-

lowing mixed effects model

,)))/(ln(81)

))(((/(1)(

iji

iijiij r

btexpay

εγ

βα

++×

+−−++= (2)

where ),,( iii rba are assumed to be the i.i.d random vec-

tor, independent of the model errors and follow the nor-

mal distribution N(0,Σ).

is replaced by i

to ac-

count for patient to patient variation in the viral load as-

ymptote.

is called the fixed effect and represents the

mean level of the viral load asymptote for the population.

a is called the random effect and represents the individ-

ual patient departure from the mean level. Similarly, the

fixed effects

and

represent the mean levels of the

midpoint and growth time for the population, respectively.

The random effects i

band i

rare the individual patient

departures from the mean levels of the midpoint and

growth time, respectively.

The R package nlme [13] was used to fit nonlinear

mixed effects models. In model (2), all three parameters

consisted of a fixed effect term and a random effect term.

Such models might be over parameterized. In these cases,

the variance-covariance matrix of random effects become

seriously ill-conditioned, making convergence difficult or

impossible. We adopted model building strategies sug-

gested in [13] to determine an adequate but parsimonious

model.

We began with fitting model (2) to data. Convergence

was achieved. However, convergence was sensitive to

changes in the initial values and the algorithm failed to

converge using values slightly different from the fitted

values. Checking the fitted model we found that the esti-

mated covariance matrix

)

has off diagonal terms of

zero. Hence, we refitted model (2) with a diagonal co-

variance matrix

. Then the possibility of eliminating one

or more random effects from the model was investigated.

First, we compared models generated by eliminating one

of the three random effects from model (2) and found that

the model with random components in

and

had the

largest likelihood value, termed as model A. Then we

considered models generated by removing two of the

three random effects from model (2) and found that the

model with a random effect in

had the largest likeli-

hood value, termed as model B. To establish the signifi-

cance of random effects, we followed the procedure de-

scribed by Verbeke and Molenberghs [14], who provide

an outline for testing the need for random effects by com-

paring the log-likelihood between the nested models with

and without random effects. The asymptotic null distribu-

tion of the test statistics is a mixture of Chi-squares. The

results are summarised in Table 1. As can be seen from

the table there are significant random effects in

(P<0.0001) and

(P<0.0001). Model B was preferred.

Hence we re-parameterized model (2) as follows

iijiij bty

εγ

βαα

+×

+−−++=

))/)81ln(

))((exp(1/()( (3)

When fitting model (3) to data, we estimate the stan-

dard deviations of the random effects as 0.63 (log10IU/mL)

and 1.32 (year) for the asymptote and midpoint of growth

time, respectively.

The time variable used in the model is the time from

the first viral load sampling. It may not be related to the

time since onset of infection. A patient can enter the study

at any time since onset of infection. For example, some

88 J. Huang et al. / J. Biomedical Science and Engineering 1 (2008) 85-90

Table 1. Comparisons of nested models to establish significance

of random effects.

Random effects LR test P-Value

a Model (1) vs Model B <0.001

b Model A vs Model A <0.001

c Model A vs Model (2) Nonsignificant

may present two months after the time of infection; others

may present one year later. Such time differences can be

accounted for in the model by random effects in

. Ran-

dom effects in

allow the logistic growth curve to fit

viral load profiles measured at different time spans from

onset of infection. Significant patient to patient variation

(P<0.001) may reflect that patients may have en-

tered the study at different time from the times of infec-

tion. The significant random effects in

(P<0.001) indi-

cates that there exist large patient to patient variation in

the asymptote. However, from a clinical perspective our

current understanding of hepatitis C disease is such that

this variation is difficult to interpret with certainty.

3. ANALYSIS OF COVARIATES

In this section, we examined the effects of covariates on

the viral load fluctuation by sequentially incorporating

them into model (3). A similar approach has been used in

[15]. The covariates considered are viral genotype, gender,

infection source (Group A and Group B) and age (< 45

years verses ≥ 45 years). In model (3) there are three

fixed effects terms,

and .

Any given covariate may

have a significant effect on at least one of the terms. For

example, gender had significant a effect on

(P<0.05)

but not on

and .

When a covariate was incorporated

into the model only were significant terms retained. Inter-

actions were considered only if a fixed term was signifi-

cantly affected by at least two covariates. Initially, one

covariate was incorporated into model (3). Incorporating

age into model (3) failed to produce any significant term.

Hence, three one-covariate models were produced. The

one-covariate model with the maximum likelihood was

selected as the optimum one-covariate model. Subse-

quently, one of the remaining covariates was incorporated

into this model and yielded no significant term. The selec-

tion process was stopped. The process is summarized in

Table 2. The final model selected to fit the viral load data

is model (4) with

influenced by infection source and

can be expressed as follows

,)))/()81ln(

))((exp(1/()(

ijr

iijiij btay

εγγ

βα

=+×

+−−++= (4)

where r

represents infection source effect on the growth

time

. The significance of this term was assessed using

the F-tests (P<0.0001).

As an alternative approach we incorporated covariates

into model (3) using the backward elimination approach.

The selection process began with incorporating all the

Table 2. Sequentially incorporating covariates into model (3) to

determine important covariates. Significant term is indicated in the

bracket.

Models P-value Log-

likelihood

Time (model (3)) -946.40

Time + gender (

) <0.001 -931.34

Time + infection source (

) <0.001 -925.63

Time + genotype (

) <0.001 -946.01

covariates into model (3). Then non significant terms

were removed from the model recursively. Using a

significance level of 0.05, this approach also selected

model (4) as the final model.

A likelihood ratio test was used to test the difference be-

tween the fixed effects represented by model (3) and(4).

The result favors model (4) with P<0.0001.

Plotting the standardized residuals against the fitted

values of model (4) and the Normal plot of the residuals

(not shown) did not illustrate any concerns about the

model assumptions. The individual fitted curves (Figure

3) indicate good fits to the data, keeping in mind that the

data were irregularly and sparsely sampled.

The fitting of model (4) to the data is summarised in

Table 3. The results show that the mean viral load as-

ymptote for the population was 6.44 (log10IU/mL). The

most rapid viral load growth occurred, on average, 3.29

years before patients entered the study. The mean growth

time for the population was 11.04 years. The patients in

Group A experienced, on average, significantly longer

growth time (delta = 3.80 year) compared to those in

Group B.

Since all patients in Group A are female and have vi-

ral genotype 1, to separate the gender and genotype ef-

fects from the infection source effect, the previous ap-

proach to evaluating the effects of covariates by sequen-

tially incorporating them into model (3) was applied to

Group B. No significant one-covariate model was pro-

duced. This finding is very different from the results de-

scribed earlier. Using the full data we found that all co-

variates except age had a significant effect on viral load

growth pattern (see Table 2). One possible explanation is

that the gender and genotype significance found in the

previous analysis may be due to the significance of infec-

tion source. Another possibility is that number of patient

in Group B was small and there may have not been suffi-

cient power to identify gender and genotype significance.

4. CONCLUSIONS AND DISCUSSIONS

We have developed a mixed effects logistic model for

describing the viral load fluctuation within untreated pa-

tients with chronically infected HCV. The model’s diag-

nostic analysis indicated that the proposed model pro-

vided a good fit to the data. Due to the asymptomatic na-

ture of HCV, it is impossible to determine the exact time

when a patient was infected. Hence the time variable used

in the model is the time since an individual patient pre-

J. Huang et al. / J. Biomedical Science and Engineering 1 (2008) 85-90 89

sented.

Table 3. The results of fitting model (4) to the data.

Terms Value Std.errors p-Value

: Grand mean 6.44 0.068 <0.0001

: Grand mean -3.290.223 <0.0001

: Grand mean 11.040.750 <0.0001

:Group B- Group A -3.800.444 <0.0001

Time (year)

Viral load (log10IU/mL)

02468100246810 024681002468100246810

02468100246810 0246810

Figure 3. Viral load is ploted against time from the first viral load

sampling, overplayed with the individual fitted curve (solid line),

for each of 147 patients.

Random effects in

allow the model to fit viral load

profiles beginning at different time from the time of in-

fection. As shown in Figure 3, partially measured viral

load profile can be fitted by the corresponding section of

the logistic growth curve.

Recently, Gray et al. [15] conducted an empirical mod-

elling of HIV-RNA viral load in vertically infected chil-

dren. They chose a linear model to represent the overall

pattern of HIV-RNA viral load among conventional poly-

nomials, change point models, and fractional models.

They compared various algorithms to fit the linear model.

Based on exploratory analysis of data we chose a nonlin-

ear model to describe the overall pattern of HCV viral

load within untreated patients. We investigated effects of

host-specific factors based on nonlinear mixed effects

models.

The quantification of viraemia is a snap shot in time of

the amount of virus present. The diurnal variation in viral

load is unknown; however, the amount of virus present at

any one time is the outcome of the equilibrium between

viral production (replication of genomic RNA and pro-

duction of infectious virions) and destruction of infected

hepatocytes. Although, there is considerable patient to

patient variation in viral load asymptote, from a clinical

perspective our current understanding of hepatitis C dis-

ease is such that these variations are difficult to interpret

with certainty. What is of interest is the difference in viral

load growth time. The growth time determines the

“speed” at which the asymptote is reached. A clinically

important parameter is the pre-treatment viral load. Viral

load has been established by numerous studies as an in-

dependent variable which determines outcome while on

anti-viral treatment. Patients with a viral load value below

6 log10 IU/mL are known to respond with higher efficacy

to therapy [3,16]. The difference in the growth time be-

tween groups A and B would indicate that more time

could be afforded to the pre-treatment assessment period

for group A while viral load remained within the range of

optimum efficacy.

The distinguishing feature of data presented here is the

longitudinal nature of the viral load measurement. Group

A represents a globally unique homogeneous cohort with

respect to the investigation of the natural history of hepa-

titis C infection. The empirical model proposed here is the

first attempt to describe long term viral load fluctuation in

untreated patients.

REFERENCES

[1] Fanning at el. (1999) Viral load and clinic pathological features of

chronic hepatitis C (1b) in a homogeneous patient population.

Hepatology, 29, 904-7.

[2] Halfon et al. (1998) Assessment of spontaneous fluctuations of

viral load in untreated patients with chronic hepatitis C by two

standardized quantitation methods: branched DNA and amplicor

monitor. J Clin Microbiol, 36(7), 2073–2075.

[3] Shiffman et al. (2007) Peginterferon alfa-2a and ribavirin for 16 or

24 weeks in HCV genotype 2 or 3. N Engl J Med,357(2), 124-34.

[4] Arase, Y., Ikeda, K., and Chayama, K. (2000) Fluctuation patterns

of HCV-RNA serum level in patients with chronic hepatitis C. J

Gastroenterol, 35, 221-225.

[5] Pontisso, P., Bellati, G., and Brunetto, M. (1999) Hepatitis C virus

RNA profiles in chronically infected individuals: Do they relate to

disease activity? Hepatology, 29, 585–589.

[6] Fanning et al. (2000) Natural fluctuations of hepatitis C viral load

in a homogeneous patient population: a prospective study.

Hepatology, 31, 225-9.

[7] Fanning et al. (2001) LA class II genes determine the natural

variance of hepatitis C viral load. Hepatology, 33, 224-30.

[8] R. Sallie. (2005) Replicative homeostasis II: influence of

polymerase fidelity on RNA virus quasispecies biology:

implications for immune recognition, viral autoimmunity and other

"virus receptor", diseases. Virol J, 22, 2-70,.

90 J. Huang et al. / J. Biomedical Science and Engineering 1 (2008) 85-90

[9] Wu, H. (2005) Statistical methods for HIV dynamic studies in

AIDS clinical trials. Statistical Methods in Medical Research, 14,

171-192.

[10] Donnelly, C.A. and Cox, D.R. (2001) Mathematical biology and

medical statistics: contributions to the understanding of AIDS

epidemiology. Stat Methods Mes Res, 10(2), 141-54.

[11] Kenny-Walsh E. (1999) Clinical outcomes after hepatitis C

infection from contaminated anti-D immune globulin. Irish

Hepatology Research Group. N Engl J Med, 16, 1228-33.

[12] Meyer, P.S., Yung, J.W., and Ausubel, J.H. (1999) A Primer on

Logistic Growth and Substitution: The Mathematics of the Loglet

Lab Software. Technological Forecasting and Social Change, 61,

247-271.

[13] Pinheiro, J.C., and Bates, D.M. (2000) Mixed-Effects Models in S

and S-PLUS, Springer.

[14] Verbeke, G., Molenberghs, G. (2000) Linear Mixed Models for

Longitudinal Data. Springer, New York.

[15] L. Gray, M. Cortina-Borja and ML Newell. (2004) Modelling

HIV-RNA viral load in vertically infected children. Statist. Med,

23, 769-781.

[16] Zeuzem et al. (2006) Efficacy of 24 weeks treatment with

peginterferon alfa-2b plus ribavirin in patients with chronic

hepatitis C infected with genotype 1 and low pretreatment viremia.

J Hepatol, 44, 97-103.