Credit Scoring with Ego-Network Data

Journal of Mathematical Finance
Vol.09 No.03(2019), Article ID:94539,13 pages
10.4236/jmf.2019.93027

Stanley Sewe¹, Philip Ngare², Patrick Weke²

¹Department of Mathematics, Pan African University Institute of Basic Sciences, Technology and Innovation, Juja, Kenya

²School of Mathematics, University of Nairobi, Nairobi, Kenya

Received: April 10, 2019; Accepted: August 19, 2019; Published: August 22, 2019

ABSTRACT

This article investigates a stochastic filtering problem whereby the borrower’s hidden credit quality is estimated using ego-network signals. The hidden credit quality process is modeled as a mean-reverting Ornstein-Uhlenbeck process. The lender observes the borrower’s behavior, modeled as a continuous time diffusion process whose drift is driven by the hidden credit quality. At discrete fixed times, the lender gets ego-network signals from the borrower and the borrower’s direct friends. The observation filtration thus contains continuous time borrower data augmented with discrete time ego-network signals. Combining the continuous time observation data and ego-network information, we derive filter equations for the hidden process and the properties of the conditional variance. Further, we study the asymptotic properties of the conditional variance when the frequency of arrival of ego-network signals is increased.

Keywords:

Stochastic Filtering, Bayesian Updating, Credit Scoring, Filtration, Ego-Network

1. Introduction

In this article we propose a filtering technique that uses ego-network signals to estimate a hidden process. Consider a financial market with a single lender and borrowers, who are represented by nodes in a dynamic social network. For a particular borrower, let the process ${X}_{t}$, modeled as a mean-reverting Ornstein-Uhlenbeck process, capture her true credit quality. On account of the information asymmetry between the borrower and the lender, the lender is unable to directly observe ${X}_{t}$. However, through interactions with the borrower, the lender gets to observe a continuous time process ${Y}_{t}$, modeled as a linear diffusion process whose drift is driven by ${X}_{t}$. This is a continuous time linear state space model with ${X}_{t}$ being the state process and ${Y}_{t}$ the observation process. Kalman-Bucy filtering can thus be used to obtain the “optimal” estimate of ${X}_{t}$ in the mean square sense.

We assume that network ties are based on homophily. Homophily is the idea that individuals with similar characteristics are more likely to be friends than individuals with different characteristics. Thus social network ties are based on closeness in credit type: the probability that two individuals will create/maintain a network tie between them decreases with the distance between their credit types. The probability of a network tie formation/termination is conditional on the parties meeting. The meeting process is modeled as a random event whose probability is a deterministic function of the population size of the network, with larger populations leading to sparser networks. Individuals know their credit type and can also observe the credit type of their direct friends (alters) in the network. The social network is thus modeled as a dynamic latent space network. The lender’s view of the network is restricted to ego-network signals of borrowers at fixed discrete times. At times $0={t}_{0}<{t}_{1}<\cdots <{t}_{N-1}<T$, the lender observes the particular borrower’s ego network and receives unbiased signals related to her credit quality and the credit quality of her alters. Thus at the information times ${t}_{k},\ k=0,1,\cdots ,N-1$, the lender gets to observe the vector ${Z}_{k}$ constituting the unbiased signals of the credit quality of the borrower and her alters. The dimension of the vector is a function of the actual degree (number of alters) of the borrower at the time ${t}_{k}$.

Our model proposes the inclusion of the ego-network signals ${Z}_{k}$ into the filtering of the process ${X}_{t}$. The lender’s observation filtration is augmented by the filtration generated by ${Z}_{k}$ at discrete time points. In the proposed model, Bayesian updating at times ${t}_{k}$ is used to incorporate the information from the ego-network signals into the estimation of ${X}_{t}$. We note that by the Gaussian nature of the processes ${X}_{t},{Y}_{t}$ and ${Z}_{k}$ and the formulation of the ego-network likelihood, the updated estimate of ${X}_{t}$ at times ${t}_{k}$ remains Gaussian and we derive explicit results for its mean and variance. We also derive results showing that the inclusion of the signals ${Z}_{k}$ leads to lower conditional variance for the filtered process. By introducing the meeting probability in network tie formation, our model is an extension of  , where the conditional expected number of friends was treated as a constant. Further, we study the asymptotic behavior of the conditional variance as the frequency of network information arrivals $N\to \infty$. Increasing the frequency of network information arrival times leads to clearer signals, and in the limit as $N\to \infty$, we get to the full information scenario.

There exist several studies on the statistical modeling of social networks. Some of the models proposed in these studies include the coevolution model of  whereby the authors proposed a continuous time network model. The nodal attributes, modeled as Markov chains, influence the formation of network ties, which in turn influence the transition probabilities of the nodal attributes. In  , the authors proposed a static latent space network model, where the nodal attributes are in a low dimensional Euclidean space, and these attributes influence the formation of network ties. The static model has been extended several times to the time varying case by, among others,  who proposed a directed dynamic latent space model. For a review of the recent studies on latent space network models, see e.g.  .

Existing studies on the mathematical modeling of consumer credit risk include  where the authors proposed a continuous time model of a borrower’s credit type. By modeling the credit type as a jump diffusion process and applying option pricing theory, the authors were able to derive an explicit formulation of the borrower’s default probability. Consumer credit risk modeling is mainly focused on credit scoring, the use of statistical models to aid in credit granting decisions. Common techniques used for credit scoring include linear discriminant analysis, logistic regression, Bayesian classifiers, random forests and finite Markov chains, see e.g.  for a review. In recent times, hidden Markov models (HMM) have been applied for credit scoring, e.g.  who compared the performance of HMM and logistic regression in the classification of customers and evaluation of the probability of default.  modeled the consumer’s credit rating as a discrete time Markov chain process upon incorporating a latent variable which captures the prevailing economic conditions. In  , the author proposed a credit scoring model whereby the borrower’s hidden credit type, modeled as a discrete time Markov chain, is learned through observing network related variables including reputation, trust and distrust. Proposing a static credit scoring model,  used ego-network signals to update the lender’s belief of the borrower’s unobserved credit type, modeled as a Gaussian random variable. For a review of the application of social network data to consumer credit risk modeling, see  .

In  , the authors augmented the observation filtration with discrete time expert opinions to estimate the hidden Gaussian process driving the drift of the stock price. The model is an extension of the Black-Litterman model of  to the continuous time case. The authors in  estimated the unobserved drift parameter on an observation filtration initially enlarged with some anticipative information perturbed by independent noise. Textbook treatments of stochastic filtering include  ,  and  .

The article is organized as follows. In Section 2, the credit risk and dynamic network models are presented. Results on stochastic filtering are presented in Section 3. In Section 4, the properties of the conditional variances are derived in detail under the various information setups. Brief numerical results are presented in Section 5, whilst Section 6 concludes.

2. The Model Setup

Consider a filtered probability space $\left(\Omega ,\mathcal{A},\mathbb{A},ℙ\right)$ with $\mathbb{A}={\left({\mathcal{A}}_{t}\right)}_{t\ge 0}$ satisfying the usual conditions of right continuity and completeness. All processes are assumed to be $\mathbb{A}$-adapted.

Borrower’s Behavioral Dynamics

The borrower’s hidden credit quality process ${X}_{t}$ is modeled as a mean-reverting Ornstein-Uhlenbeck process defined as

$\text{d}{X}_{t}=\mu \left(\delta -{X}_{t}\right)\text{d}t+\gamma \text{d}{B}_{t}$ (1)

${X}_{0}=x$

where $\delta \in ℝ,\mu >0,\gamma >0$ are constants and ${B}_{t}$ is a Brownian motion. The initial value $x\sim \mathcal{N}\left({m}_{0},{v}_{0}\right)$ is ${\mathcal{A}}_{0}$-measurable and independent of $B$. Thus ${X}_{t}$ is a Gaussian process with mean and variance given by

$\delta +{\text{e}}^{-\mu t}\left({m}_{0}-\delta \right)$ (2)

$\frac{{\gamma }^{2}}{2\mu }+{\text{e}}^{-2\mu t}\left({v}_{0}-\frac{{\gamma }^{2}}{2\mu }\right)$ (3)

respectively. The hidden credit quality ${X}_{t}$ drives the drift of the borrower’s observed behavioral dynamics which is modeled as a diffusion process defined as

$\text{d}{Y}_{t}=\alpha {X}_{t}\text{d}t+\sigma \text{d}{W}_{t}$ (4)

The parameters $\alpha ,\sigma >0$ are assumed to be constants and ${W}_{t}$ is an $\mathbb{A}$-adapted one-dimensional Brownian motion. ${W}_{t}$ and ${X}_{t}$ are assumed to be independent.
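As an illustration of Equations (1) and (4), the following minimal sketch simulates one path of the pair $\left({X}_{t},{Y}_{t}\right)$ with the Euler-Maruyama scheme. The function name `simulate_paths`, the starting value ${Y}_{0}=0$ and the step count are assumptions made here for illustration; they are not specified in the model above.

```python
import math
import random


def simulate_paths(mu, delta, gamma, alpha, sigma, x0, horizon, n_steps, seed=0):
    """Euler-Maruyama paths of the hidden quality X (Eq. 1) and the score Y (Eq. 4).

    Assumption for illustration: Y starts at 0 and x0 is a given realisation
    of the N(m0, v0) initial value.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    sqrt_dt = math.sqrt(dt)
    x, y = x0, 0.0
    xs, ys = [x], [y]
    for _ in range(n_steps):
        d_b = rng.gauss(0.0, sqrt_dt)  # Brownian increment driving X
        d_w = rng.gauss(0.0, sqrt_dt)  # independent increment driving Y
        # Y is advanced first so that its drift uses the pre-step value of X.
        y += alpha * x * dt + sigma * d_w
        x += mu * (delta - x) * dt + gamma * d_b
        xs.append(x)
        ys.append(y)
    return xs, ys
```

Setting `gamma=0.0` switches off the noise in ${X}_{t}$, in which case the simulated path should track the mean in Equation (2) up to the discretisation error.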

Network Dynamics

Let $\mathcal{Z}$ be the population size of a society, such that individuals are represented as nodes in a dynamic network. Each individual in the population is assumed to have an independent and time varying credit quality ${X}_{it}$ modeled as a Gaussian process. When a pair of individuals i and j get the opportunity to meet, they may decide to create, terminate or continue a network tie by mutual consent. Thus network tie formation and termination are conditioned on the meeting probability $\upsilon =\frac{1}{\mathcal{Z}\left(\mathcal{Z}-1\right)}$. Modeling the meeting probability as a function of population size captures network sparseness, which is a property observed in real life social networks. Thus the meeting probability reduces with increased number of individuals in the population. Assuming an undirected network, i.e. ${\mathcal{Y}}_{ij}\left(t\right)={\mathcal{Y}}_{ji}\left(t\right)$, we let

${\mathcal{Y}}_{ij}\left(t\right)|{\pi }_{ij}\left(t\right)\sim \text{Bern}\left({\pi }_{ij}\left(t\right)\right)$ (5)

for every $i\ne j$ and $t\ge 0$ with

${\pi }_{ij}\left(t\right)={\text{e}}^{-\frac{1}{2}{\left({X}_{it}-{X}_{jt}\right)}^{2}}$ (6)

Hence the network ties ${\mathcal{Y}}_{ij}\left(t\right)\in \left\{0,1\right\}$ are independent Bernoulli random variables conditioned on the nodal attributes ${X}_{t}$. The network tie formation probability ${\pi }_{ij}\left(t\right)\in \left(0,1\right)$ is modeled as a probit link function. Conditional on the meeting process, existence of a network tie between individual i and j at time t is a function of the Euclidean distance between their respective credit types. The network model captures homophily, since shorter distance between credit types leads to higher probability of network tie formation. The model assumes zero cost incurred on network tie formation or termination.
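The tie model in Equations (5) and (6) can be sketched in a few lines. The function names below are hypothetical and chosen only for this illustration; the draw in `sample_tie` is conditional on the two individuals having met.

```python
import math
import random


def tie_probability(x_i, x_j):
    """Eq. (6): tie probability, given a meeting, from the two credit types."""
    return math.exp(-0.5 * (x_i - x_j) ** 2)


def meeting_probability(population_size):
    """Meeting probability 1 / (Z (Z - 1)) for a population of size Z >= 2."""
    return 1.0 / (population_size * (population_size - 1))


def sample_tie(x_i, x_j, rng):
    """Eq. (5): Bernoulli tie indicator, conditional on a meeting."""
    return 1 if rng.random() < tie_probability(x_i, x_j) else 0
```

Homophily shows up directly: `tie_probability` equals one for identical credit types and decays as the credit types move apart.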

Define ${G}_{t}$ as the graph of friendship ties in the society at time t. The set of borrower i’s direct friends (alters) at time t, known as her ego-network, is defined as ${g}_{t}^{1}=\left\{ij|ij\in {G}_{t}\right\}$. For a particular borrower i, we consider her hidden credit quality process ${X}_{t}$ and observed behavioral score ${Y}_{t}$.

Lender’s information

The lender observes in continuous time the process ${Y}_{t}$ denoting the borrower’s behavior. Further, at discrete fixed times $0={t}_{0}<{t}_{1}<\cdots <{t}_{N-1}<T$, the lender observes the borrower’s ego network and receives signals from her and her alters. Let the vector ${Z}_{k}=\left\{{Z}_{ik},{Z}_{jk}|ij\in {g}_{{t}_{k}}^{1}\right\}$ denote the ego-network signals received by the lender at times ${t}_{k},\ k=0,1,\cdots ,N-1$, comprising the borrower’s own signal ${Z}_{ik}={X}_{i{t}_{k}}+\sqrt{{\Lambda }_{k}}{\epsilon }_{ik}$ and the signals from her alters ${Z}_{jk}={X}_{j{t}_{k}}+\sqrt{{\Lambda }_{k}}{\epsilon }_{jk}$, where ${\Lambda }_{k}>0$ is the variance of the signals at time ${t}_{k}$. The variables ${\epsilon }_{jk}\sim \mathcal{N}\left(0,1\right)$ are i.i.d. across individuals with $\mathbb{E}\left({\epsilon }_{js}{\epsilon }_{lu}\right)=0$ for $j\ne l$, $s\ne u$. Thus the lender receives noisy but unbiased signals upon observing the borrower’s ego-network at time ${t}_{k}$.

The information available to the lender can thus be represented by the following filtrations

$\begin{array}{l}{\mathbb{F}}^{Y}={\left({\mathcal{F}}_{t}^{Y}\right)}_{t\in \left[0,T\right]}\text{ with }{\mathcal{F}}_{t}^{Y}\text{ generated by }\left\{{Y}_{s},s\le t\right\}\\ {\mathbb{F}}^{Z}={\left({\mathcal{F}}_{t}^{Z}\right)}_{t\in \left[0,T\right]}\text{ with }{\mathcal{F}}_{t}^{Z}\text{ generated by }\left\{{Y}_{s},s\le t,{Z}_{k},{t}_{k}\le t\right\}\\ {\mathbb{F}}^{O}={\left({\mathcal{F}}_{t}^{O}\right)}_{t\in \left[0,T\right]}\text{ with }{\mathcal{F}}_{t}^{O}\text{ generated by }\left\{{Z}_{k},{t}_{k}\le t\right\}\end{array}$

${\mathbb{F}}^{Y}$ corresponds to the continuous time behavioral information only, ${\mathbb{F}}^{O}$ consists of the ego-network signals received at discrete times, whilst ${\mathbb{F}}^{Z}$ is the combination of behavioral information and the ego-network signals. We assume that the σ-algebras ${\mathcal{F}}_{t}^{Y}$ and ${\mathcal{F}}_{t}^{Z}$ are augmented with the $ℙ$-null sets. Note that for each $t\ge 0$, ${\mathcal{F}}_{t}^{Z}={\mathcal{F}}_{t}^{Y}\vee {\mathcal{F}}_{t}^{O}$.

3. Stochastic Filtering

The focus of stochastic filtering is to estimate the hidden stochastic process ${X}_{t}$ based on observations up to time t. Let ${\stackrel{^}{X}}_{t}^{H}$ be the projection of the process ${X}_{t}$ onto the observed filtration ${\mathbb{F}}^{H},H\in \left\{O,Y,Z\right\}$ i.e. ${\stackrel{^}{X}}_{t}^{H}=\mathbb{E}\left({X}_{t}|{\mathcal{F}}_{t}^{H}\right)$. In this section, we derive explicit results for the filtering equations and the conditional variance of the hidden process ${X}_{t}$.

Behavioral Observations

When the lender’s observation σ-algebra is ${\mathcal{F}}_{t}^{Y}$, i.e. when the lender does not receive any ego-network signals, we are in the realm of the classical Kalman-Bucy filter, see e.g.  and  . This is because the state and observation equations constitute a linear Gaussian state space model. Let ${\stackrel{^}{X}}_{t}^{Y}=\mathbb{E}\left({X}_{t}|{\mathcal{F}}_{t}^{Y}\right)$ and ${\lambda }_{t}^{Y}=\mathbb{E}\left[{\left({X}_{t}-{\stackrel{^}{X}}_{t}^{Y}\right)}^{2}|{\mathcal{F}}_{t}^{Y}\right]$ be the conditional mean and variance, respectively, given the σ-algebra ${\mathcal{F}}_{t}^{Y}$.

The dynamics of ${\stackrel{^}{X}}_{t}^{Y}$ is given by the following SDE

$\text{d}{\stackrel{^}{X}}_{t}^{Y}=\left(\mu \left(\delta -{\stackrel{^}{X}}_{t}^{Y}\right)-{\alpha }^{2}{\sigma }^{-2}{\lambda }_{t}^{Y}{\stackrel{^}{X}}_{t}^{Y}\right)\text{d}t+\alpha {\sigma }^{-2}{\lambda }_{t}^{Y}\text{d}{Y}_{t},{\stackrel{^}{X}}_{0}^{Y}={m}_{0}$ (7)

whilst the dynamics of ${\lambda }_{t}^{Y}$ is given by the deterministic ODE

$\frac{\text{d}{\lambda }_{t}^{Y}}{\text{d}t}=-2\mu {\lambda }_{t}^{Y}+{\gamma }^{2}-{\alpha }^{2}{\sigma }^{-2}{\left({\lambda }_{t}^{Y}\right)}^{2},{\lambda }_{0}^{Y}={v}_{0}$ (8)

Equation (8) is the well known Riccati equation, a deterministic equation whose unique solution is given as

${\lambda }_{t}^{Y}=\frac{-\mu {\sigma }^{2}}{{\alpha }^{2}}+{C}_{0}\frac{{C}_{1}+{C}_{2}{\text{e}}^{-2\frac{{\alpha }^{2}}{{\sigma }^{2}}{C}_{0}t}}{{C}_{1}-{C}_{2}{\text{e}}^{-2\frac{{\alpha }^{2}}{{\sigma }^{2}}{C}_{0}t}}$ (9)

given that the initial value is ${\lambda }_{0}^{Y}={v}_{0}$. In Equation (9), ${C}_{0}=\frac{\sigma }{\alpha }\sqrt{\frac{{\mu }^{2}{\sigma }^{2}}{{\alpha }^{2}}+{\gamma }^{2}}$,${C}_{1}={v}_{0}+{C}_{0}+\frac{\mu {\sigma }^{2}}{{\alpha }^{2}}$ and ${C}_{2}={v}_{0}-{C}_{0}+\frac{\mu {\sigma }^{2}}{{\alpha }^{2}}$ (see e.g.  ).
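The closed form in Equation (9) can be checked numerically against a direct integration of the Riccati equation (8). The sketch below uses a classical Runge-Kutta scheme; the function names and the step count are assumptions made for this illustration.

```python
import math


def riccati_closed_form(t, v0, mu, gamma, alpha, sigma):
    """Eq. (9): conditional variance under behavioral observations only."""
    a = alpha ** 2 / sigma ** 2
    shift = mu / a  # equals mu * sigma^2 / alpha^2
    c0 = (sigma / alpha) * math.sqrt(mu ** 2 * sigma ** 2 / alpha ** 2 + gamma ** 2)
    c1 = v0 + c0 + shift
    c2 = v0 - c0 + shift
    decay = math.exp(-2.0 * a * c0 * t)
    return -shift + c0 * (c1 + c2 * decay) / (c1 - c2 * decay)


def riccati_rk4(t_end, v0, mu, gamma, alpha, sigma, n_steps=2000):
    """Integrate Eq. (8) with the classical fourth-order Runge-Kutta scheme."""
    a = alpha ** 2 / sigma ** 2

    def rhs(lam):
        return -2.0 * mu * lam + gamma ** 2 - a * lam ** 2

    h = t_end / n_steps
    lam = v0
    for _ in range(n_steps):
        k1 = rhs(lam)
        k2 = rhs(lam + 0.5 * h * k1)
        k3 = rhs(lam + 0.5 * h * k2)
        k4 = rhs(lam + h * k3)
        lam += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return lam
```

For large t the exponential terms vanish and the closed form settles at ${C}_{0}-\frac{\mu {\sigma }^{2}}{{\alpha }^{2}}$, the positive root of the right-hand side of Equation (8).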

Behavioral Observations and Network Information

This is the case of most interest in the study. The lender’s observation σ-algebra is ${\mathcal{F}}_{t}^{Z}={\mathcal{F}}_{t}^{Y}\vee {\mathcal{F}}_{t}^{O}$, being the augmentation of ${\mathcal{F}}_{t}^{Y}$ with discrete time ego-network signals. Since the lender’s observation of the network is restricted to borrower i’s ego network, at each time $t\ge 0$, with no other additional borrower information, an individual’s credit quality is assumed to have the distribution ${X}_{jt}\sim \mathcal{N}\left(0,{q}^{-1}\right)$. The lender uses this assumed density for every individual j who is an alter of borrower i. The following lemma gives the expected degree (number of direct friends) conditional on the borrower’s true credit type.

Lemma 1

At each time $t\ge 0$, conditional on the meeting process and the borrower’s credit quality ${X}_{t}$, the expected number of friends $\mathbb{E}\left({\eta }_{t}|{X}_{t}\right)$ is given as

$\frac{1}{\mathcal{Z}-1}\sqrt{\frac{q}{q+1}}\text{ }{\text{e}}^{-\frac{q}{2\left(q+1\right)}{X}_{t}^{2}}$

Proof.

Conditioned on the borrower’s true credit type ${X}_{t}$ and the meeting process, the probability of having a network tie with any other individual is

$\underset{-\infty }{\overset{\infty }{\int }}\text{ }{\text{e}}^{-\frac{{\left({X}_{t}-s\right)}^{2}}{2}}\sqrt{\frac{q}{2\text{π}}}\text{ }{\text{e}}^{-q\frac{{s}^{2}}{2}}\text{d}s=\sqrt{\frac{q}{q+1}}\text{ }{\text{e}}^{-\frac{q}{2\left(q+1\right)}{X}_{t}^{2}}$ (10)

Thus the conditional expected number of friends $\mathbb{E}\left({\eta }_{t}|{X}_{t}\right)$ is given by

$\mathbb{E}\left({\eta }_{t}|{X}_{t}\right)=\mathcal{Z}\upsilon \sqrt{\frac{q}{q+1}}\text{ }{\text{e}}^{-\frac{q}{2\left(q+1\right)}{X}_{t}^{2}}=\frac{1}{\mathcal{Z}-1}\sqrt{\frac{q}{q+1}}\text{ }{\text{e}}^{-\frac{q}{2\left(q+1\right)}{X}_{t}^{2}}$ (11)
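The Gaussian integral in Equation (10) is easy to verify numerically. The following sketch compares a trapezoidal approximation of the left-hand side with the closed form; the function names, the integration window and the grid size are assumptions chosen for illustration.

```python
import math


def tie_probability_closed_form(x, q):
    """Right-hand side of Eq. (10)."""
    return math.sqrt(q / (q + 1.0)) * math.exp(-q * x ** 2 / (2.0 * (q + 1.0)))


def tie_probability_quadrature(x, q, half_width=12.0, n=20000):
    """Left-hand side of Eq. (10) by the trapezoidal rule on [-half_width, half_width]."""

    def integrand(s):
        prior = math.sqrt(q / (2.0 * math.pi)) * math.exp(-0.5 * q * s ** 2)
        return math.exp(-0.5 * (x - s) ** 2) * prior

    h = 2.0 * half_width / n
    total = 0.5 * (integrand(-half_width) + integrand(half_width))
    for i in range(1, n):
        total += integrand(-half_width + i * h)
    return total * h


def expected_degree(x, q, population_size):
    """Eq. (11): Z * upsilon * tie probability, with upsilon = 1 / (Z (Z - 1))."""
    return tie_probability_closed_form(x, q) / (population_size - 1)
```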

Proposition 1.

For any $k=0,1,\cdots ,N-1$ and $t>0$, let ${p}_{k}={\Lambda }_{k}^{-1}$ denote the precision of the ego-network signals at time ${t}_{k}$, let ${\eta }_{k}$ denote the number of alters at time ${t}_{k}$ and let ${\lambda }_{{t}_{k}-}^{Z}$ denote the conditional variance immediately before the signals arrive. Further define the variable

${\theta }_{k}^{Z}=\frac{{p}_{k}+q+1}{{\lambda }_{{t}_{k}-}^{Z}\left({p}_{k}\left({p}_{k}+q+1\right)+{\eta }_{k}\left({p}_{k}+q\right)\right)+\left({p}_{k}+q+1\right)}$ (12)

Then it holds that:

1) For any $t\in \left[{t}_{k},{t}_{k+1}\right)$, the filtered estimate ${\stackrel{^}{X}}_{t}^{Z}$ is Gaussian with the dynamics

$\text{d}{\stackrel{^}{X}}_{t}^{Z}=\left(\mu \left(\delta -{\stackrel{^}{X}}_{t}^{Z}\right)-{\alpha }^{2}{\sigma }^{-2}{\lambda }_{t}^{Z}{\stackrel{^}{X}}_{t}^{Z}\right)\text{d}t+\alpha {\sigma }^{-2}{\lambda }_{t}^{Z}\text{d}{Y}_{t}$ (13)

whilst the equation of the conditional variance is given as

${\lambda }_{t}^{Z}=\frac{-\mu {\sigma }^{2}}{{\alpha }^{2}}+{C}_{0}\frac{{C}_{1k}+{C}_{2k}{\text{e}}^{-2\frac{{\alpha }^{2}}{{\sigma }^{2}}{C}_{0}\left(t-{t}_{k}\right)}}{{C}_{1k}-{C}_{2k}{\text{e}}^{-2\frac{{\alpha }^{2}}{{\sigma }^{2}}{C}_{0}\left(t-{t}_{k}\right)}}$ (14)

with initial values ${\stackrel{^}{X}}_{{t}_{k}}^{Z}$ and ${\lambda }_{{t}_{k}}^{Z}$ at time ${t}_{k}$. ${C}_{0}$ is the same as in Equation (9), whilst ${C}_{1k}={\lambda }_{{t}_{k}}^{Z}+{C}_{0}+\frac{\mu {\sigma }^{2}}{{\alpha }^{2}}$ and ${C}_{2k}={\lambda }_{{t}_{k}}^{Z}-{C}_{0}+\frac{\mu {\sigma }^{2}}{{\alpha }^{2}}.$

2) At information date ${t}_{k}$, the conditional distribution of ${X}_{{t}_{k}}$ is Gaussian. The mean ${\stackrel{^}{X}}_{{t}_{k}}^{Z}$ and variance ${\lambda }_{{t}_{k}}^{Z}$ are updated from their respective values at time ${t}_{k}-$, immediately before the arrival of the ego-network signals, to

${\stackrel{^}{X}}_{{t}_{k}}^{Z}={\theta }_{k}^{Z}{\lambda }_{{t}_{k}-}^{Z}\left(\frac{{\stackrel{^}{X}}_{{t}_{k}-}^{Z}}{{\lambda }_{{t}_{k}-}^{Z}}+{p}_{k}{Z}_{ik}+\frac{{p}_{k}}{{p}_{k}+q+1}\underset{j\in {Z}_{k}}{\sum }{Z}_{jk}\right)$ (15)

and

${\lambda }_{{t}_{k}}^{Z}={\theta }_{k}^{Z}{\lambda }_{{t}_{k}-}^{Z}=\frac{{\lambda }_{{t}_{k}-}^{Z}\left({p}_{k}+q+1\right)}{{\lambda }_{{t}_{k}-}^{Z}\left({p}_{k}\left({p}_{k}+q+1\right)+{\eta }_{k}\left({p}_{k}+q\right)\right)+\left({p}_{k}+q+1\right)}$ (16)
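The update in part 2) can be written as a small function. This is a minimal sketch under the notation of Equations (12), (15) and (16); the function name and argument names are hypothetical, and the degree ${\eta }_{k}$ is taken to be the number of alter signals supplied.

```python
def ego_network_update(prior_mean, prior_var, own_signal, alter_signals, p, q):
    """Posterior mean and variance of Eqs. (15)-(16) at an information date.

    prior_mean, prior_var: conditional mean and variance just before the signals
    own_signal: the borrower's signal Z_ik
    alter_signals: list of alter signals Z_jk (its length is the degree eta_k)
    p: signal precision 1 / Lambda_k;  q: precision of the assumed alter prior
    """
    eta = len(alter_signals)
    denom = prior_var * (p * (p + q + 1.0) + eta * (p + q)) + (p + q + 1.0)
    theta = (p + q + 1.0) / denom          # Eq. (12)
    post_var = theta * prior_var           # Eq. (16)
    post_mean = post_var * (
        prior_mean / prior_var
        + p * own_signal
        + p / (p + q + 1.0) * sum(alter_signals)
    )                                      # Eq. (15)
    return post_mean, post_var
```

With an empty list of alter signals the function reduces to the familiar scalar Gaussian update with posterior precision `1 / prior_var + p`; each additional alter adds `(p + q) / (p + q + 1)` to the precision.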

Proof.

1) Between two information dates, $t\in \left[{t}_{k},{t}_{k+1}\right)$, there is no new arrival of ego-network signals. The lender’s σ-algebra is ${\mathcal{F}}_{t}^{Z}={\mathcal{F}}_{{t}_{k}}^{Z}\vee \sigma \left\{{Y}_{s},{t}_{k}<s\le t\right\}$. Thus we revert to the classical Kalman-Bucy filtering situation with the respective initial values for the conditional mean and variance given as ${\stackrel{^}{X}}_{{t}_{k}}^{Z}$ and ${\lambda }_{{t}_{k}}^{Z}$. For this case, the expressions for the conditional mean and variance follow from Equations (7) and (8).

2) On the information arrival date ${t}_{k}$, the lender receives ego-network signals ${Z}_{k}$ and gets to update the conditional mean and variance of the filtered estimate. To incorporate the ego-network signals into the estimate, Bayesian updating is carried out, since there is no time evolution from ${t}_{k}-$ to ${t}_{k}$. At time ${t}_{k}-$, the conditional prior distribution of ${X}_{{t}_{k}}$ is Gaussian and the signals received are also Gaussian. The posterior density of the borrower’s credit type $ℙ\left({X}_{{t}_{k}}|{Z}_{k}\right)$ is obtained by

$\begin{array}{l}ℙ\left({X}_{{t}_{k}}|{Z}_{k}\right)\propto ℙ\left({X}_{{t}_{k}},{Z}_{k}\right)\\ =\underset{-\infty }{\overset{\infty }{\int }}\text{ }ℙ\left({Z}_{k}|{X}_{{t}_{k}},{X}_{j,{t}_{k}}\right)ℙ\left({X}_{{t}_{k}}\right)ℙ\left({X}_{j,{t}_{k}}|{X}_{{t}_{k}}\right)\text{d}{X}_{j,{t}_{k}}\\ =\underset{-\infty }{\overset{\infty }{\int }}\text{ }ℙ\left({Z}_{k}|{X}_{{t}_{k}},{X}_{j,{t}_{k}}\right)ℙ\left({X}_{{t}_{k}}\right)ℙ\left({X}_{j,{t}_{k}}\right)\text{d}{X}_{j,{t}_{k}}\end{array}$

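The last equality follows from the assumed independence of the ${X}_{j,{t}_{k}}$. We have $ℙ\left({X}_{j,{t}_{k}}\right)\propto \underset{ij\in {g}_{{t}_{k}}^{1}}{\prod }{\text{e}}^{-\frac{q}{2}{X}_{j,{t}_{k}}^{2}}$, being the assumed density of any individual ${X}_{j}$ for $j\ne i$. Thus, up to multiplicative constants that do not depend on ${X}_{{t}_{k}}$, the integrand is given by

${\text{e}}^{-\frac{{\left({X}_{{t}_{k}}-{\stackrel{^}{X}}_{{t}_{k}-}^{Z}\right)}^{2}}{2{\lambda }_{{t}_{k}-}^{Z}}}{\text{e}}^{-\frac{{p}_{k}}{2}{\left({Z}_{ik}-{X}_{{t}_{k}}\right)}^{2}}\times \underbrace{\underset{ij\in {g}_{{t}_{k}}^{1}}{\prod }{\text{e}}^{-\frac{q}{2}{X}_{j,{t}_{k}}^{2}}{\text{e}}^{-\frac{{p}_{k}}{2}{\left({Z}_{jk}-{X}_{j,{t}_{k}}\right)}^{2}}{\text{e}}^{-\frac{1}{2}{\left({X}_{{t}_{k}}-{X}_{j,{t}_{k}}\right)}^{2}}}_{\left(a\right)}\times \underbrace{\underset{ij\notin {g}_{{t}_{k}}^{1}}{\prod }\left(1-\frac{\mathbb{E}\left({\eta }_{{t}_{k}}|{X}_{{t}_{k}}\right)}{\mathcal{Z}}\right)}_{\left(b\right)}$ (17)

where

The first term denotes the product of the conditional prior density (before the arrival of network information) and the likelihood function for the observation ${Z}_{ik}$.

(a) denotes the assumed prior density for ${X}_{j,{t}_{k}}$, $ij\in {g}_{{t}_{k}}^{1}$, times the probability that at time ${t}_{k}$ borrower i is friends with the individuals $j,\ ij\in {g}_{{t}_{k}}^{1}$, within her ego-network whose signals are in ${Z}_{k}$, and that these friends have the signals as collected in ${Z}_{k}$.

(b) denotes the probability that at time ${t}_{k}$ borrower i is not friends with anyone outside ${g}_{{t}_{k}}^{1}$.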

As $\mathcal{Z}\to \infty$, $\upsilon \to 0$; then by the monotone convergence theorem and Lemma 1 we have

$\underset{ij\notin {g}_{{t}_{k}}^{1}}{\prod }\left(1-\frac{\mathbb{E}\left({\eta }_{{t}_{k}}|{X}_{{t}_{k}}\right)}{\mathcal{Z}}\right)\to 1$ (18)

Hence,

$\begin{array}{l}ℙ\left({X}_{{t}_{k}}|{Z}_{k}\right)\propto \underset{-\infty }{\overset{\infty }{\int }}{\text{e}}^{-\frac{{\left({X}_{{t}_{k}}-{\stackrel{^}{X}}_{{t}_{k}-}^{Z}\right)}^{2}}{2{\lambda }_{{t}_{k}-}^{Z}}}\times {\text{e}}^{-\frac{{p}_{k}}{2}{\left({Z}_{ik}-{X}_{{t}_{k}}\right)}^{2}}\times \underset{ij\in {g}_{{t}_{k}}^{1}}{\prod }{\text{e}}^{-\frac{q}{2}{X}_{j,{t}_{k}}^{2}}\\ \qquad \times \underset{j\in {Z}_{k}}{\prod }{\text{e}}^{-\frac{{p}_{k}}{2}{\left({Z}_{jk}-{X}_{j,{t}_{k}}\right)}^{2}}\times \underset{ij\in {g}_{{t}_{k}}^{1}}{\prod }{\text{e}}^{-\frac{1}{2}{\left({X}_{{t}_{k}}-{X}_{j,{t}_{k}}\right)}^{2}}\text{d}{X}_{j,{t}_{k}}\end{array}$ (19)

whereby the integrand is a product of Gaussian densities. Upon integrating out ${X}_{j,{t}_{k}}$ and matching the terms in ${X}_{{t}_{k}}$ and ${X}_{{t}_{k}}^{2}$, we obtain the posterior distribution as Gaussian with the stated mean and variance. The modeling of the network tie probability ${\pi }_{ij}\left(t\right)$ as a probit link function enables the elegant formulation of the posterior probability as a Gaussian.
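The matching step can be made explicit with a short worked computation. It is a sketch in the notation of Equation (19), in which all factors that do not depend on ${X}_{{t}_{k}}$ are dropped. For a single alter j, completing the square in ${X}_{j,{t}_{k}}$ gives

$\underset{-\infty }{\overset{\infty }{\int }}{\text{e}}^{-\frac{q}{2}{X}_{j,{t}_{k}}^{2}}{\text{e}}^{-\frac{{p}_{k}}{2}{\left({Z}_{jk}-{X}_{j,{t}_{k}}\right)}^{2}}{\text{e}}^{-\frac{1}{2}{\left({X}_{{t}_{k}}-{X}_{j,{t}_{k}}\right)}^{2}}\text{d}{X}_{j,{t}_{k}}\propto {\text{e}}^{-\frac{1}{2}\frac{{p}_{k}+q}{{p}_{k}+q+1}{X}_{{t}_{k}}^{2}+\frac{{p}_{k}}{{p}_{k}+q+1}{Z}_{jk}{X}_{{t}_{k}}}$

Multiplying over the ${\eta }_{k}$ alters and by the first two factors of Equation (19), the exponent of the posterior density is the quadratic

$-\frac{1}{2}{X}_{{t}_{k}}^{2}\left(\frac{1}{{\lambda }_{{t}_{k}-}^{Z}}+{p}_{k}+{\eta }_{k}\frac{{p}_{k}+q}{{p}_{k}+q+1}\right)+{X}_{{t}_{k}}\left(\frac{{\stackrel{^}{X}}_{{t}_{k}-}^{Z}}{{\lambda }_{{t}_{k}-}^{Z}}+{p}_{k}{Z}_{ik}+\frac{{p}_{k}}{{p}_{k}+q+1}\underset{j\in {Z}_{k}}{\sum }{Z}_{jk}\right)$

The coefficient of $-\frac{1}{2}{X}_{{t}_{k}}^{2}$ is the posterior precision, whose reciprocal is the variance in Equation (16). Dividing the coefficient of ${X}_{{t}_{k}}$ by this precision gives the mean in Equation (15).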

Lastly, we consider a unique case, whereby the lender does not observe the continuous time information but only receives the discrete time ego-network signals i.e. when $H=O$. Thus between network information times $t\in \left[{t}_{k},{t}_{k+1}\right)$, the lender receives no information. We thus have the following corollary.

Corollary 1.

When the lender’s information set is ${\mathcal{F}}_{t}^{O}$ we have

1) For $t\in \left[{t}_{k},{t}_{k+1}\right)$, between information arrival times, the conditional mean and variance are given, respectively, by

${\stackrel{^}{X}}_{t}^{O}=\delta +{\text{e}}^{-\mu \left(t-{t}_{k}\right)}\left({\stackrel{^}{X}}_{{t}_{k}}^{O}-\delta \right)$ (20)

${\lambda }_{t}^{O}={\text{e}}^{-2\mu \left(t-{t}_{k}\right)}{\lambda }_{{t}_{k}}^{O}+\frac{{\gamma }^{2}}{2\mu }\left(1-{\text{e}}^{-2\mu \left(t-{t}_{k}\right)}\right)$ (21)

2) At information date ${t}_{k}$, the conditional distribution of ${X}_{{t}_{k}}$ is Gaussian with mean and variance

${\stackrel{^}{X}}_{{t}_{k}}^{O}={\theta }_{k}^{O}{\lambda }_{{t}_{k}-}^{O}\left(\frac{{\stackrel{^}{X}}_{{t}_{k}-}^{O}}{{\lambda }_{{t}_{k}-}^{O}}+{p}_{k}{Z}_{ik}+\frac{{p}_{k}}{{p}_{k}+q+1}\underset{j\in {Z}_{k}}{\sum }{Z}_{jk}\right)$ (22)

${\lambda }_{{t}_{k}}^{O}={\theta }_{k}^{O}{\lambda }_{{t}_{k}-}^{O}=\frac{{\lambda }_{{t}_{k}-}^{O}\left({p}_{k}+q+1\right)}{{\lambda }_{{t}_{k}-}^{O}\left({p}_{k}\left({p}_{k}+q+1\right)+{\eta }_{k}\left({p}_{k}+q\right)\right)+\left({p}_{k}+q+1\right)}$ (23)

respectively.

Proof.

For $t\in \left[{t}_{k},{t}_{k+1}\right)$, the proof can be found in Corollary 4 of  . At information date ${t}_{k}$, the conditional prior distribution at time ${t}_{k}-$ is updated using the ego-network signals in the vector ${Z}_{k}$. As in part 2) of Proposition 1, the conditional prior density is updated to a Gaussian posterior density with the given mean and variance.

4. Properties of the Conditional Variance

We study the properties of the conditional variance under the various information settings discussed in Section 3. We show that inclusion of ego-network signals leads to better estimates of the hidden process ${X}_{t}$. A key result within this section is Proposition 2, where we show that increasing the frequency of network information arrival times leads to the full information case in the limit as $N\to \infty$. The following lemma shows that the ego-network signals improve the lender’s estimate of the credit quality.

Lemma 2.

For $H\in \left\{Y,O\right\}$,$t\in \left[0,T\right]$ and $k=0,1,\cdots ,N-1$

${\lambda }_{t}^{Z}\le {\lambda }_{t}^{H}$ (24)

Proof.

The proof is similar to that of Proposition 6 of  , where a detailed proof is available.
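Lemma 2 can also be illustrated numerically. The sketch below computes the three conditional variances at the horizon T for equidistant information dates, restarting the Riccati flow of Equation (14) after every update (16). It assumes a constant signal precision and a constant degree at every information date; all function names and parameter values are hypothetical.

```python
import math


def propagate_with_behavior(lam, dt, mu, gamma, alpha, sigma):
    """Closed-form Riccati flow (Eq. 14) over a time span dt, starting from lam."""
    a = alpha ** 2 / sigma ** 2
    shift = mu / a
    c0 = (sigma / alpha) * math.sqrt(mu ** 2 * sigma ** 2 / alpha ** 2 + gamma ** 2)
    c1, c2 = lam + c0 + shift, lam - c0 + shift
    decay = math.exp(-2.0 * a * c0 * dt)
    return -shift + c0 * (c1 + c2 * decay) / (c1 - c2 * decay)


def propagate_without_behavior(lam, dt, mu, gamma):
    """Eq. (21): variance flow when no continuous-time data is observed."""
    decay = math.exp(-2.0 * mu * dt)
    return decay * lam + gamma ** 2 / (2.0 * mu) * (1.0 - decay)


def network_update(lam, p, q, eta):
    """Eq. (16): variance after an ego-network signal with eta alters."""
    return lam * (p + q + 1.0) / (lam * (p * (p + q + 1.0) + eta * (p + q)) + (p + q + 1.0))


def terminal_variances(v0, horizon, n_dates, mu, gamma, alpha, sigma, p, q, eta):
    """Return (lambda^Y_T, lambda^O_T, lambda^Z_T) for equidistant information dates."""
    dt = horizon / n_dates
    lam_y = propagate_with_behavior(v0, horizon, mu, gamma, alpha, sigma)
    lam_o = lam_z = v0
    for _ in range(n_dates):
        # Update at the information date t_k, then flow until t_{k+1}.
        lam_o = propagate_without_behavior(network_update(lam_o, p, q, eta), dt, mu, gamma)
        lam_z = propagate_with_behavior(
            network_update(lam_z, p, q, eta), dt, mu, gamma, alpha, sigma
        )
    return lam_y, lam_o, lam_z
```

For any admissible parameter choice the third returned value should not exceed the other two, in line with Equation (24).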

The following proposition shows that as we increase the frequency of arrivals of ego-network information, i.e. as $N\to \infty$, the variances ${\lambda }_{t}^{Z}$ and ${\lambda }_{t}^{O}$ tend to zero. It is an adaptation of the asymptotic result of  .

Proposition 2.

Let ${\left\{{\pi }^{N}={\left({t}_{k}^{N}\right)}_{k=0}^{N-1}\right\}}_{N\ge 1}$ be a refining sequence of partitions of the interval $\left[0,T\right]$ such that information dates are retained, i.e. for $N\le {N}^{\prime }$ we have $\left({\pi }^{N}\right)\subset \left({\pi }^{{N}^{\prime }}\right)$. Let ${\Delta }_{N}=\underset{k=1,\cdots ,N}{\mathrm{max}}\left\{{t}_{k}^{N}-{t}_{k-1}^{N}\right\}$ be the mesh size. Further, let ${\left({\Lambda }_{k}^{N}\right)}_{k=0,\cdots ,N-1}$ be a sequence of corresponding variances at information times ${t}_{k}^{N}$. Assume that there exists a constant $\stackrel{¯}{\Lambda }>0$ such that ${\Lambda }_{k}^{N}\le \stackrel{¯}{\Lambda }$ for all $k=0,1,\cdots ,N-1$ and all $N\in ℕ$. Then it holds that for all $t\in \left(0,T\right]$, the conditional variances ${\lambda }_{t}^{O,N}$ and ${\lambda }_{t}^{Z,N}$ tend to 0 as $N\to \infty$ and ${\Delta }_{N}\to 0$.

Proof.

From Lemma 2, since $0\le {\lambda }_{t}^{Z,N}\le {\lambda }_{t}^{O,N}$, we need only prove the assertion for ${\lambda }_{t}^{O,N}$. Further, we can assume ego-network information with constant variances, i.e. ${\Lambda }_{k}=\stackrel{¯}{\Lambda }$ for all $k=0,1,\cdots ,N-1$. This entails no loss of generality, since more precise signals with ${\Lambda }_{k}<\stackrel{¯}{\Lambda }$ can only lower the conditional variance. For ease of notation we write ${t}_{k}$ instead of ${t}_{k}^{N}$, keeping in mind the dependence on N. For any $k=0,1,\cdots ,N-1$ and any $t\in \left[{t}_{k},{t}_{k+1}\right)$ we know that ${\lambda }_{t}^{O,N}$ is given by

${\lambda }_{t}^{O,N}={\text{e}}^{-2\mu \left(t-{t}_{k}\right)}{\lambda }_{{t}_{k}}^{O,N}+\frac{{\gamma }^{2}}{2\mu }\left(1-{\text{e}}^{-2\mu \left(t-{t}_{k}\right)}\right)$ (25)

where ${\lambda }_{{t}_{k}}^{O,N}={\theta }_{k}^{O}{\lambda }_{{t}_{k}-}^{O,N}$ with

${\theta }_{k}^{O}=\frac{{p}_{k}+q+1}{{\lambda }_{{t}_{k}-}^{O,N}\left({p}_{k}\left({p}_{k}+q+1\right)+{\eta }_{k}\left({p}_{k}+q\right)\right)+\left({p}_{k}+q+1\right)}$ (26)

Since $\left(1-{\text{e}}^{-2\mu \left(t-{t}_{k}\right)}\right)\le 2\mu \left(t-{t}_{k}\right)\le 2\mu {\Delta }_{N}$ it follows that

${\lambda }_{t}^{O,N}\le {\theta }_{k}^{O}{\lambda }_{{t}_{k}-}^{O,N}+{\gamma }^{2}{\Delta }_{N}$ (27)

We iterate this inequality for all $l\le k$ and denote ${\stackrel{¯}{\theta }}_{k}^{O}=\underset{l\in \left\{0,1,\cdots ,k\right\}}{\mathrm{max}}{\theta }_{l}^{O}$. This yields for ${\lambda }_{{t}_{0}-}^{O,N}={v}_{0}$ and $t\in \left[{t}_{k},{t}_{k+1}\right)$

${\lambda }_{t}^{O,N}\le {v}_{0}{\left({\stackrel{¯}{\theta }}_{k}^{O}\right)}^{k+1}+{\gamma }^{2}{\Delta }_{N}\underset{l=0}{\overset{k}{\sum }}{\left({\stackrel{¯}{\theta }}_{k}^{O}\right)}^{l}\le {v}_{0}{\left({\stackrel{¯}{\theta }}_{k}^{O}\right)}^{k+1}+\frac{{\gamma }^{2}{\Delta }_{N}}{1-{\stackrel{¯}{\theta }}_{k}^{O}}$ (28)

Let $u\in \left(0,T\right]$, $\epsilon >0$ and $\stackrel{¯}{p}={\stackrel{¯}{\Lambda }}^{-1}$. We want to show that ${\lambda }_{u}^{O,N}<\epsilon$ for all sufficiently large N. Define ${k}_{N}$ to be the index for which $u\in \left[{t}_{{k}_{N}},{t}_{{k}_{N}+1}\right)$. Suppose that for all ${N}_{0}$ there exists an $N\ge {N}_{0}$ such that

$\mathrm{min}\left\{{\lambda }_{{t}_{0}-}^{O,N},{\lambda }_{{t}_{1}-}^{O,N},\cdots ,{\lambda }_{{t}_{{k}_{N}}-}^{O,N}\right\}\ge \epsilon /2$ (29)

Then we have that

${\stackrel{¯}{\theta }}_{k}^{O,N}\le \frac{1}{\mathrm{min}\left\{{\lambda }_{{t}_{k}-}^{O,N}\right\}\stackrel{¯}{p}+1}\le \frac{2}{\stackrel{¯}{p}\epsilon +2}$

where the minimum is over all ${\lambda }_{{t}_{k}-}^{O,N}$ in Equation (29), and the first inequality follows from Equation (26) because ${\eta }_{k}\left({p}_{k}+q\right)\ge 0$ and ${p}_{k}=\stackrel{¯}{p}$. This yields (with one iteration less)

${\lambda }_{u}^{0,N}\le {v}_{0}{\left(\frac{2}{\epsilon \stackrel{¯}{p}+2}\right)}^{{k}_{N}}+{\gamma }^{2}{\Delta }_{N}\left(\frac{\epsilon \stackrel{¯}{p}+2}{\epsilon \stackrel{¯}{p}}\right)$ (30)

Note that in this case ${\stackrel{¯}{\theta }}_{k}^{0,N}<1$ and ${k}_{N}\to \infty$ as $N\to \infty$. Thus for all $N\ge {N}_{0}$,${\lambda }_{u}^{0,N}\to 0$. Thus we can choose ${N}_{0}$ such that ${\lambda }_{{t}_{k}N-}^{0,N}<\epsilon /2$ which is a contradiction of the assumption in Equation (29).

Thus there exists an ${N}_{0}$ such that for all $N\ge {N}_{0}$ there exists an index ${j}_{N}\le {k}_{N}$ with ${\lambda }_{{t}_{{j}_{N}}-}^{0,N}<\epsilon /2$. For each such N we choose ${t}_{{j}_{N}}\le {t}_{{k}_{N}}$ as the last information arrival time up to ${t}_{{k}_{N}}$ such that ${\lambda }_{{t}_{{j}_{N}}-}^{0,N}<\epsilon /2$. In the case that ${j}_{N}={k}_{N}$, Equation (27) implies that ${\lambda }_{u}^{0}\le \epsilon$ for suitably large N.

For the case when ${j}_{N}<{k}_{N}$, the choice of ${j}_{N}$ implies that for $k={j}_{N}+1,\cdots ,{k}_{N}$ we have ${\lambda }_{{t}_{k}-}^{0,N}\ge \epsilon /2$ and hence ${\stackrel{¯}{\theta }}_{k}^{0,N}\le \frac{2}{\stackrel{¯}{p}\epsilon +2}$. We can choose a suitable ${N}_{1}\ge {N}_{0}$ such that ${\gamma }^{2}{\Delta }_{N}\left(\frac{\epsilon \stackrel{¯}{p}+2}{\epsilon \stackrel{¯}{p}}\right)<\epsilon /2$. An iteration similar to Equation (28) starting from ${j}_{N}$ with initial value ${\lambda }_{{t}_{{j}_{N}}-}^{0,N}$ yields that

${\lambda }_{u}^{0,N}<\epsilon$

for all $N\ge {N}_{1}$ as desired.
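
The limiting behaviour established above can be illustrated numerically. The following Python sketch is not the authors' code. It assumes that at each arrival time the variance is shrunk by a factor $1/(1+p\lambda )$ for a constant signal precision $p$, and that between arrival times it relaxes like the variance of an Ornstein-Uhlenbeck process with mean reversion $\mu$ and volatility $\gamma$. The function name, the constant precision and all parameter values are hypothetical choices made only for illustration.

```python
import math


def variance_before_T(n_signals, T=1.0, mu=0.5, gamma=0.3, v0=1.0, precision=2.0):
    """Variance just before time T when n_signals equidistant signals arrive on [0, T).

    All default parameter values are illustrative assumptions, not taken from Table 1.
    """
    delta = T / n_signals                           # mesh size Delta_N
    decay = math.exp(-2.0 * mu * delta)             # e^{-2 mu Delta_N}
    noise = gamma ** 2 / (2.0 * mu) * (1.0 - decay)  # bounded above by gamma^2 * Delta_N
    lam = v0                                        # variance just before the first signal
    for _ in range(n_signals):
        theta = 1.0 / (1.0 + precision * lam)       # assumed Bayesian shrink factor
        lam = theta * lam                           # update at the arrival time
        lam = decay * lam + noise                   # evolution up to the next arrival time
    return lam


# The variance at T shrinks as the number of signals grows.
for n in (10, 100, 1000):
    print(n, variance_before_T(n))
```

Under these assumptions each step satisfies the one-step bound in Equation (27), since `decay <= 1` and `noise <= gamma**2 * delta`, so the printed values decrease as the mesh is refined.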

5. Numerical Results

In this section we provide a brief illustration of our findings on the properties of the conditional variance. We assume that the ego-network signals ${Z}_{k}$ arrive at equidistant time points ${t}_{k},\ k=0,1,\cdots ,N-1$. We simulate the processes ${X}_{t},{Y}_{t}$ and ${Z}_{k}$ using the parameter values in Table 1 with $q=0$.

To illustrate the impact of the number of alters on the borrower’s conditional variances ${\lambda }_{t}^{O}$ and ${\lambda }_{t}^{Z}$, Figure 1 compares these variances with and without friends. The left panel compares the variances when the number of friends is held constant at zero and at five, respectively. In the right panel, the number of friends is modeled as a Poisson random variable with parameter $\rho =5$. In both panels, the conditional variance is lower when ${Z}_{k}$ contains friends’ data than when the borrower has zero friends. In addition, the right panel of Figure 1 shows the perturbations in the conditional variance caused by the randomly varying number of friends.

Figure 1. Left: plot of variances ${\lambda }_{t}^{H}$ when the number of friends is ${\eta }_{t}=0$ and ${\eta }_{t}=5$. Right: plot of variances ${\lambda }_{t}^{H}$ when the number of friends is ${\eta }_{t}=0$ and ${\eta }_{t}=\rho$.
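
A comparison of the kind shown in Figure 1 can be sketched in Python as follows. This is not the simulation used for the figure. It assumes that the borrower and each friend contribute an independent signal of equal precision at every arrival time, and that between arrival times the variance follows a Kalman-Bucy type Riccati equation, integrated here with a simple Euler scheme. The names `variance_path`, `obs_gain`, `obs_vol` and `precision`, as well as all parameter values, are hypothetical.

```python
import math
import random


def poisson(rate, rng):
    """Draw a Poisson variate with Knuth's method (adequate for small rates)."""
    limit, count, prod = math.exp(-rate), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def variance_path(friends, T=1.0, n_signals=20, substeps=50, mu=0.5, gamma=0.3,
                  obs_gain=1.0, obs_vol=1.0, v0=1.0, precision=2.0):
    """Conditional variance on a time grid; friends(k) is the friend count at signal k.

    All default parameter values are illustrative assumptions, not taken from Table 1.
    """
    dt = T / (n_signals * substeps)
    lam, path = v0, []
    for k in range(n_signals):
        # Assumed update: borrower plus each friend adds one signal of equal precision.
        total_precision = precision * (1 + friends(k))
        lam = lam / (1.0 + total_precision * lam)
        for _ in range(substeps):
            # Euler step of the assumed Riccati equation between arrival times.
            lam += dt * (-2.0 * mu * lam + gamma ** 2 - (obs_gain / obs_vol) ** 2 * lam ** 2)
            path.append(lam)
    return path


rng = random.Random(0)
zero = variance_path(lambda k: 0)                 # no friends
five = variance_path(lambda k: 5)                 # constant five friends
rand = variance_path(lambda k: poisson(5.0, rng))  # Poisson(5) friends per signal
print(zero[-1], five[-1], rand[-1])
```

Under these assumptions the path with five friends lies below the path with no friends at every grid point, and the Poisson path never exceeds the no-friends path, which matches the qualitative pattern described for both panels.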

Table 1. Model parameter values.

6. Conclusions

In this article, we have presented stochastic filtering results in which the hidden credit quality process, modeled as an Ornstein-Uhlenbeck process, drives the drift of the borrower’s observed behavior score. We have formulated a latent space network model such that the ego-network signals received at discrete fixed times are incorporated into the credit quality filtering by way of Bayesian updating. Modeling the network tie probability with the probit link function enabled a tractable formulation of the conditional posterior density of the hidden process. We have presented explicit results for the conditional mean and conditional variance under the various information setups. Further, we have presented asymptotic properties of the conditional variance when the frequency of the network information arrival times is increased.

The results in this article thus present a theoretical justification of including ego-network data in credit scoring, for a network model based on credit type homophily. Future studies may consider network models whereby the network ties capture the strength or frequency of interaction between the nodes, instead of binary network ties.

Acknowledgements

The authors wish to thank the African Union and the Pan African University Institute of Basic Sciences, Technology and Innovation, Kenya, for their financial support of this research.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

Cite this paper

Sewe, S., Ngare, P. and Weke, P. (2019) Credit Scoring with Ego-Network Data. Journal of Mathematical Finance, 9, 522-534. https://doi.org/10.4236/jmf.2019.93027

NOTES

1. In this article we refer to the borrower as female and the lender as male.