The definition, classification and assessment of personality disorders (PDs) have attracted considerable debate for nearly 50 years. This paper attempts a comprehensive review of the instruments to assess all, or specific, individual disorders as described in DSM-5, including structured interviews and inventories. The review should be helpful for clinicians, researchers and industrial and organizational psychologists who need to screen and assess the personality pathology spectrum, from subclinical manifestations to full-blown personality pathology. A decision tree to help choose among the different measures is also provided.


Personality Disorders, Personality Pathology, Measures, Questionnaires, Structured Interviews

1. Introduction

There has been much debate about personality disorders (PDs) over the years, particularly their definition, conceptualization, occurrence and assessment. Perhaps the greatest “shake up” in the way PDs are discussed occurred in the move from DSM-IV (American Psychiatric Association, 2000) to DSM-5, when dimensional alternatives to the DSM-IV categorical diagnoses were proposed (Widiger, Livesley, & Clark, 2009). Notwithstanding this lively debate, DSM-5 preserved in its Section 2 the categorical PDs as distinguished in DSM-IV, whilst an alternative trait system was placed in Section 3 for further evaluation and research (American Psychiatric Association, 2013). Although there is a great deal of activity in developing and validating new instruments, such as the Personality Inventory for DSM-5 (PID-5; Krueger, Derringer, Markon, Watson, & Skodol, 2012), to assess and evaluate (Bagby, 2013) this new trait model, the diagnosis and assessment of categorical PDs remains the approach primarily advocated in the official nomenclature of the American Psychiatric Association.

At the same time, attention to PDs, from both an academic and a societal perspective, has expanded dramatically due to the impairing character of the diagnoses and the increasingly high financial costs involved in the treatment of patients with personality pathology (Gustavsson et al., 2012). Apart from the attention of clinical psychologists to full-blown personality pathology, selection and human resources psychologists have become interested in subclinical manifestations of aberrant personality and their impact on individuals’ workplace functioning. This is because a substantial proportion of the general population and workforce either has personality problems or has to deal with (subclinically) disordered persons as colleagues or supervisors (Wille, De Fruyt, & De Clercq, 2013; De Fruyt, Wille, & Furnham, 2013b). Clinical psychologists treat patients with one or more PDs and co-occurring pathology, whereas industrial and organisational psychologists run career development programs to coach people on how to deal with the dark sides of their personality. A common need for all these professional groups is well-designed and psychometrically sound assessment instruments. In addition, they need criteria to choose among the different instruments currently available.

The present paper provides a broad review of current PD measures together with a decision tree to choose among them. Length constraints meant we could not consider proposed personality disorders like Depressive Personality Disorder. The aim is to be comprehensive and descriptive rather than (psychometrically or conceptually) critical, which would require a different paper. We have attempted to catalogue all measures, which has not been done before. The measures are in no way psychometrically equivalent, though each paper has been peer reviewed.

Over the years a large number of measures have been devised for research and practice. The aim of this review is to alert psychologists and researchers to the range of instruments available to assess the categorically conceived PDs listed in DSM-5 and provide a set of criteria by which professionals may choose one over another. In the introduction of this paper we refer to DSM-5 PDs, though it should be clear that almost all measures were developed before the release of DSM-5, so we refer to these previous DSM-editions when describing these measures. The available PD measures differ on at least four major characteristics.

First, some instruments attempt to be comprehensive and measure all of the PDs currently (or previously) thought to exist, because the nature and number of PDs have shifted across the different DSM editions. Some “admit” disorders that others discount, but the usual number is around 10 - 15 disorders. On the other hand, some instruments set out simply to measure one very specific disorder. Second, there seem to be four common methods to assess the PDs: structured diagnostic interviews, rating instruments for clinicians, self-report questionnaires and other-report questionnaires (Friedman, Oltmanns, & Turkheimer, 2007). Thus, two approaches use observer data (clinician, family) and two use self-report data. By far the most common, however, are questionnaires and structured interviews. Third, some measures are about subtypes of the PD in the sense that they are multidimensional measures that yield scores on different but related facets of the disorder. For example, some measures and theorists may distinguish between grandiose and vulnerable, or communal and agentic, Narcissistic PD (NPD; Gebauer, Sedikides, Verplanken, & Maio, 2012). Most measures, however, mimic DSM-5 categorical criteria and are not about the distinction among subtypes of a specific PD.

Fourth and finally, PD measures have been developed for essentially five target groups. The first group of users are clinicians attempting a reliable and valid diagnosis of a PD. The second is a related group, namely academic researchers who may be testing theories of the aetiology of a PD or of its prognosis, possibly after treatment. Industrial and organisational psychologists form a third professional group, interested in evaluating aberrant personality and subclinical forms of personality pathology in the context of personnel selection or career coaching and development. Finally, there are two other groups, namely “lay people” who may be interested in self-diagnosis, and relatives of those with a specific PD who require information about personality disorder symptoms and their prognosis.

There are, inevitably, a number of instruments on the web with unknown psychometric properties as well as various “popular books” that attempt to explain and describe the PDs for the lay public. The present review, however, primarily attempts a comprehensive overview for the first three groups interested in the professional assessment of personality pathology.

Before listing and discussing the different measures, we provide an overview of the DSM-5 PDs with their clinical labels and a short description in Table 1. This table further shows the labels and descriptions of PDs as they are used in a popular measure frequently used in occupational and career coaching and development settings (Hogan & Hogan, 1997; Furnham, Trickey, & Hyde, 2012). The remaining columns illustrate the labels used in books written by psychiatrists (Oldham & Morris, 1991), clinical psychologists (Miller, 2008) and I/O psychologists (Dotlich & Cairo, 2003) to explain the PDs to lay people.

Table 1. Different labels for traits associated with similar disorders.

2. Available Measures

This paper covers the measures available, including those assessing all PDs, as well as each PD in turn. We also acknowledge the fact that there are instruments intended to measure the prevalence of specific symptoms of PDs, yet have excluded these from our analysis due to space constraints. Likewise, we have also excluded alternative dimensional conceptualisations of PDs and personality pathology (Clark, 2007), except when these methods are specifically targeted to assess the categorical DSM-5 PDs. We hence do not explicitly discuss and reiterate the discussion on alternative dimensional models of PDs (Widiger & Clark, 2000; Widiger & Costa, 2013), except when these provide direct assessments of the categorical PDs. To our knowledge a review such as this has not been done before, though there are review papers that have reviewed some instruments at the same time (Clark & Harrison, 2001; McDermut & Zimmerman, 2008; Segal & Coolidge, 2007; Widiger & Boyd, 2009; Zimmerman, 2003).

Apart from usual bibliometric investigations, we emailed over 50 experts (mainly those on the editorial board of specialist PD journals) in the area showing them our list and asking if they knew of any measures that we were not aware of. This did yield half a dozen extra, and we are reasonably satisfied that we have been able to locate most important measures.

3. Measures of all the Personality Disorders (See Table 2)

3.1. Structured Interviews

The Structured Interview for DSM-III Personality Disorders (SIDP; Pfohl, Stangl, & Zimmerman, 1983) has largely fallen out of favour because of its focus on DSM-III PDs. Its test-retest reliabilities have also been shown to be highly variable, ranging from .24 for obsessive-compulsive PD to .74 for histrionic PD, with an average level of .54 (First et al., 1995). Pfohl, Blum and Zimmerman (1997) adapted the SIDP at the advent of the DSM-IV, releasing the Structured Interview for DSM-IV Personality Disorders (SIDP-IV), a fairly brief interview (lasting roughly 60 minutes) that features both a patient and an informant. This is beneficial as it helps gain a different perspective on the patient in question. There are two versions of the SIDP-IV: a diagnostic version and a “topical” version, though the only difference is the order of the questions. The benefit of the topical version is that its questions follow a natural order designed to make interviewing defensive patients easier. Much like the International Personality Disorder Examination (IPDE; Loranger, 1999; see below), the SIDP-IV can also assess for Personality Disorder Not Otherwise Specified (PDNOS); however, the SIDP-IV will diagnose a PDNOS only when two or more disorders are one criterion short of the diagnostic threshold. Jane, Pagan, Turkheimer, Fiedler and Oltmanns (2006) found inter-rater reliability for each PD to be greater than .70, a finding also supported by Damen, De Jong and Van Der Kroft (2004).

The Diagnostic Interview for DSM-IV Personality Disorders (DIPD-IV; Zanarini, Frankenburg, Sickel, & Yong, 1996). This semi-structured clinical interview assesses all DSM-IV PDs and, like most clinical interviews, requires specialised training before it can be administered. The interview has 108 items, with each disorder rated on a scale of 0 (disorder is absent) to 2 (disorder is present). If the totalled scores exceed a threshold, the clinician can diagnose a disorder. The original paper cites internal consistency levels ranging from .64 to .93, with six of the disorders having levels greater than .70; acceptable levels of test-retest reliability, with kappas from .58 to 1.0, are reported over a 6-month period. Such test-retest estimates are also called dependability coefficients (Chmielewski & Watson, 2009). The DIPD-IV was used in the Collaborative Longitudinal Personality Disorders Study (CLPS).

The Structured Clinical Interview for DSM-IV Personality Disorders (SCID-II; First, Gibbons, Spitzer, Williams, & Benjamin, 1997) is widely used and researched, unlike the DIPD-IV. The respondent typically first completes a questionnaire and interviewers then follow up on the responses. It is also the shortest interview (140 items), lasting at least 30 minutes (the DIPD-IV lasts around 90 minutes). The SCID-II measures all DSM-IV PDs and the associated symptoms in the order they are presented in the DSM-IV. Some have criticised its brevity (Rogers, 2003). Investigations into the instrument’s reliability and validity have shown considerable support. Lobbestael, Leurgans and Arntz (2011) found mean kappa scores of .84. Moran et al. (2003) provided further support with mean kappa scores of .71, but others have reported lower kappas (Hyler, Skodol et al., 1990, 1992). Skodol et al. (1991) investigated the convergent validity of the SCID-II by comparing it to diagnoses made by the International Personality Disorder Examination (IPDE; Loranger, 1999). The authors found that the two instruments’ diagnoses for each PD correlated from .58 to .87, suggesting that both instruments measure the same PDs to a “reasonable” extent.

The Personality Disorder Interview (PDI-IV; Widiger, Mangine, Corbitt, Ellis, & Thomas, 1995) is another semi-structured interview that assesses each of the 94 personality disorder criteria displayed in the DSM-IV, making it a lengthy interview lasting around 90 - 120 minutes. Rogers (2001) supports the instrument’s extensive criteria, but criticises its sometimes sophisticated and complex language. This is a particularly valid concern when using the instrument with adolescents and cognitively impaired patients. Rogers (2001) also notes that, despite high levels of reliability, its limited adoption within clinical environments has proven to be an obstacle when evaluating its validity. Widiger, Costa and Samuel (2006) argue that the PDI-IV’s strength lies in its manual, which they compared with the manuals of other semi-structured interviews. Most of those lack normative data, statistical evidence for reliability and validity, and practical guidance, all of which are covered in the PDI-IV’s manual.

Table 2. A review of measures that attempt to measure all the disorders.

The International Personality Disorder Examination (IPDE; Loranger, 1999) is a structured interview that is able to assess PDs across both the DSM-IV and ICD-10. The IPDE scores individuals dimensionally (“negative”, “probable” and “definite”) (Rogers, 2001). It demonstrates excellent inter-rater reliabilities (.81 to .92; Lenzenweger, 1999). The IPDE’s strengths are that it can assess PDNOS, provided that the individual has scored on at least 10 different PD criteria, and that it has international validity, as it was developed alongside the WHO. It is used in the Longitudinal Study of Personality Disorders (LSPD).

The International Personality Disorder Examination Questionnaire (IPDEQ; Loranger, 1999) is a screening tool to be used alongside the IPDE. Consisting of 99 items, it assesses PDs across six scales that represent everyday functioning (work, self, interpersonal relationships, affect, reality testing and impulse control). Loranger suggests that if an individual scores highly on at least three of the scales, then the IPDE should subsequently be used. Slade, Peters, Schneiden and Andrews (2006) found the IPDEQ to accurately predict antisocial PD, and Lewin, Slade, Andrews, Carr and Hornabrook (2005) found in an epidemiological study that the IPDEQ’s scores were not only psychometrically sound, but also provided a good index of the likelihood of developing a PD, which supports its use outside of clinical environments. Fountoulakis et al. (2002) compared IPDEQ scores with the PD diagnoses, and found that the onset of specific PDs had strong phi coefficients > .91, suggesting good reliability.

The Iowa Personality Disorder Screen (IPDS; Langbehn et al., 1999) is an 11 item semi-structured interview which is essentially a screening instrument measuring DSM-III PDs. The interview only lasts around five minutes. The original authors found sensitivity as high as 92% and specificity as high as 72%, a finding further supported by Trull and Amdur (2001) in a non-clinical population. Olssøn, Sørebø, and Dahl (2011) also found that within psychiatric outpatients, the 11 items held an average internal consistency of .70, with positive predictive power averaging .66 and, on average, 64% of PDs correctly classified in comparison to the SCID-II. Similarly to the Personality Beliefs Questionnaire (PBQ-SF; Beck & Beck, 1991; see further in this paper), the IPDS stands up well against other instruments due to its extreme brevity and good statistical properties.

The Shedler-Westen Assessment Procedure 200 (SWAP-200). There are various versions of this measure, including the SWAP-II and the SWAP-II-A (Westen & Shedler, 2007), an adolescent version. Clinically experienced interviewers are required to sort the 200 personality descriptive items into 8 categories from most descriptive to not descriptive or irrelevant. A computer program then reports DSM-IV PD diagnoses, personality diagnoses for alternative, empirically derived personality syndromes, and dimensional trait scores. Westen and Shedler (2007) provide reliability and validity evidence for both versions of the test.

3.2. Questionnaires (See Table 3)

The Personality Diagnostic Questionnaire-4 (PDQ-4; Hyler, 1994) consists of 99 items that measure all 10 of the DSM-IV PDs. Okada and Oltmanns (2009) found acceptable test-retest reliabilities over three different time periods, with an average of .67. Abdin et al. (2011) investigated the PDQ-4’s efficacy as a screening instrument for mentally ill inmates by comparing its diagnoses with those of the SCID-II. Generally there was moderate agreement between the two instruments, with kappa levels no lower than .50. They did find that the PDQ-4 held high sensitivity (the proportion of individuals with a SCID-II PD diagnosis whom the PDQ-4 also identified) and low specificity (the proportion of individuals without a SCID-II PD diagnosis whom the PDQ-4 also classified as free of PD). When looking at specific PDs, sensitivities ranged from poor (dependent PD; .30) to good (antisocial PD; .78). Abdin et al. (2011), who used a translation in Singapore, concluded that the PDQ-4 is statistically robust enough to be used as a screening instrument.
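The agreement indices that recur throughout this review (sensitivity, specificity, positive predictive power and kappa) can all be derived from a single 2 × 2 table that crosses a screener's classification with a reference interview. The following is a minimal sketch of those calculations, not a reproduction of any cited study's analysis; the class name `TwoByTwo` and the cell counts are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Screener classification crossed with a reference interview (hypothetical helper)."""
    tp: int  # screener positive, interview positive
    fp: int  # screener positive, interview negative
    fn: int  # screener negative, interview positive
    tn: int  # screener negative, interview negative

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def sensitivity(self) -> float:
        # Share of interview-diagnosed cases that the screener also flags
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of interview-negative cases that the screener also clears
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Positive predictive power: share of screener positives confirmed by interview
        return self.tp / (self.tp + self.fp)

    def kappa(self) -> float:
        # Cohen's kappa: observed agreement corrected for chance agreement
        observed = (self.tp + self.tn) / self.n
        p_yes = ((self.tp + self.fp) / self.n) * ((self.tp + self.fn) / self.n)
        p_no = ((self.fn + self.tn) / self.n) * ((self.fp + self.tn) / self.n)
        expected = p_yes + p_no
        return (observed - expected) / (1 - expected)


# Invented counts, for illustration only (not data from any cited study)
table = TwoByTwo(tp=40, fp=30, fn=10, tn=120)
print(f"sensitivity={table.sensitivity():.2f}, specificity={table.specificity():.2f}")
print(f"PPV={table.ppv():.2f}, kappa={table.kappa():.2f}")
```

With these invented counts, sensitivity and specificity are both .80, yet positive predictive power is only about .57 and kappa about .53. This illustrates why a screener can look adequate on sensitivity alone and still generate many false positives when the disorder is relatively uncommon in the sample.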

There are earlier versions of the PDQ-4: the PDQ (Hyler, Rieder, Williams, Spitzer, Hendler, & Lyons, 1988) and the PDQ-R (Hyler & Rieder, 1987). The PDQ consists of 163 items, but it is in accordance with the DSM-III. Participants respond to items with true/false answers. The PDQ has been found to have poor levels of internal consistency (.43 to .70; Hyler & Lyons, 1988) and test-retest reliabilities larger than .56 (Hurt, Hyler, Frances, Clarkin, & Brent, 1984). The PDQ-R was created as a response to the changes found within the DSM-III-R. The PDQ-R consists of 152 items that are also answered by a forced choice true/false paradigm. Uehara, Sakado and Sato (1997) found test-retest reliabilities ranging from .76 to 1.0. Hyler, Skodol, Oldham, Kellman and Doidge (1992) found satisfactory levels of sensitivity and moderate levels of specificity for most of the PDs. They did note however that the PDQ-R did yield many false-positives, concluding that this cannot be used diagnostically as a replacement of a structured interview.

Table 3. A review of measures that attempt to measure specific disorders.

The Coolidge Axis-II Inventory (CATI; Coolidge, 1984) is a 225 item self-report scale, measuring DSM-III-R PDs (13 PDs) alongside three Axis I disorders (anxiety, depression and brain dysfunction). Coolidge and Merwin (1992) found an average test-retest reliability of .90 and a Cronbach alpha of .71, suggesting that the instrument measures PDs reliably. Watson and Sinha (1996) did find some gender bias regarding the antisocial PD scale and age biases for younger respondents (17 - 24 years old) in comparison to older respondents (25 - 57 years old). Ramanaiah and Sharpe (1998) demonstrated through varimax rotation that each of the CATI’s scales could be mapped onto the five factor model of personality, providing support for its dimensional nature. Silberman, Roth, Segal and Burns (1997) examined the convergent validity between the CATI and the Millon Clinical Multiaxial Inventory II (MCMI-II; Millon, 1987) in a sample of elderly inpatients. Generally they found that the validity coefficients between the two ranged widely; however, 7 out of the 13 PDs had coefficients higher than .54.

The Short Coolidge Axis-II Inventory (SCATI; Coolidge, 2001) is a 70 item abbreviation of the CATI. Watson and Sinha (2007) found internal reliabilities to range from .46 to .72 (averaging .61), which is respectable for so short a form when set against the CATI’s (.70 to .86, averaging .78; Watson and Sinha, 1996). Much like the CATI, evidence for gender biases in the SCATI was also found, with Cohen effect sizes ranging from .25 (schizoid PD) to 1.13 (sadistic). The authors were also able to map the SCATI’s factors onto Eysenck and Eysenck’s (1975) PEN model of personality. Lastly, the authors used confirmatory factor analysis (CFA) to demonstrate that the SCATI and the CATI are convergent.

The Millon Clinical Multiaxial Inventory-III (MCMI-III; Millon, Millon, Davis & Grossman, 2009) is a 175 item questionnaire that consists of 14 PD scales (11 moderate PD scales and 3 severe personality pathology scales), 10 clinical syndrome scales, five correction scales (designed to identify random response styles and modifying indices) and 42 Grossman personality facet scales. Participants respond via true/false answers. There are two earlier versions of the MCMI-III, the MCMI (Millon, 1983) and the MCMI-II (Millon, 1987) that hold acceptable psychometric properties, but for the purpose of this review the MCMI-III will be of primary focus due to its accordance to the DSM-IV. One standout feature of MCMI-III is its theoretical anchoring.

The MCMI-III is built upon four domains of evolutionary theory: existence, adaptation, reproduction and abstraction. Another differentiating factor of the MCMI-III is that diagnoses are made based on respondents scoring higher than the base rate score of 84. Retzlaff (1996) investigated the MCMI-III’s diagnostic validity and found that the instrument’s positive predictive power varied widely. For example, validity coefficients ranged from .00 (sadistic personality) to .27 and .32 (narcissistic and histrionic respectively). This suggested that the instrument’s predictive validity could be improved. Millon, Davis and Millon (1997), however, report very different positive predictive validities, ranging from .33 (delusional) to .93 (drug abuse), with an average of .64 across all PD and Axis I disorders. Millon et al. (1997) also report kappa levels ranging from .23 (anxiety) to .84 (paranoid), suggesting highly discrepant levels of test-retest reliability.

The Personality Beliefs Questionnaire (PBQ; Beck & Beck, 1991) is a self-report questionnaire that measures beliefs associated with 9 DSM-III-R PDs across 126 items. Like the MCMI-III, the PBQ is theoretically anchored, in this case in cognitive theory, whereby disorders are maintained by maladaptive thinking styles. Trull et al. (1993) found that the PBQ had alphas ranging from .77 to .93, with test-retest reliabilities from .63 (passive aggressive) to .82 (paranoid), a finding further supported by Connan et al. (2009). The PBQ also has good discriminant validity, with psychiatric patients scoring significantly higher on the scale associated with their mental illness than on any other scale of the questionnaire (Beck et al., 2001). This finding was also replicated by Jones, Burrell-Hodgson and Tate (2007), who compared the PBQ to MCMI-III diagnoses in 164 psychiatric patients. In addition to the PBQ, Butler, Beck and Cohen (2007) created a short form of the PBQ (PBQ-SF) that features only 65 items and measures all Axis II PDs except borderline PD. Similarly to the PBQ, the PBQ-SF has been found to hold desirable psychometric properties: Cronbach alphas ranged from .81 to .92, and test-retest correlations ranged from .57 to .82. Both the PBQ and the PBQ-SF seem to have the most impressive levels of reliability and validity in comparison to the other self-report instruments reported in this paper, with the added bonus of having a detailed theoretical grounding. The PBQ-SF is of use to both clinicians (when it is not feasible to administer the PBQ) and academics (the PBQ-SF can be easily inserted into a battery of self-report instruments without taking too much time or space). A recent review of the scale was very positive (Bhar, Beck, & Butler, 2011).
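Because internal consistency (Cronbach's alpha) is reported for nearly every questionnaire in this section, a brief worked sketch of the coefficient may help readers compare the figures. It uses the standard formula, alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the toy response matrices are assumptions made for illustration and are not data from any instrument reviewed here.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix (hypothetical helper).

    Each inner sequence holds one respondent's item scores for a single scale.
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of the respondents' total scale scores
    total_variance = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Toy data: three items that rank four respondents identically
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
# Toy data: two items that are unrelated across four respondents
unrelated = [[0, 0], [1, 1], [0, 1], [1, 0]]

print(round(cronbach_alpha(consistent), 2))
print(round(cronbach_alpha(unrelated), 2))
```

The first toy scale yields an alpha of 1 and the second an alpha of 0, the two ends of the range within which the values reported in this review fall. Alpha also tends to rise with the number of items, which is worth bearing in mind when comparing short forms with their parent instruments.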

Minnesota Multiphasic Personality Inventory for the DSM-III (MMPI; Morey, Waugh, & Blashfield, 1985). The MMPI features a scale for each of the DSM-III PDs measured via 556 items. The original authors found good levels of internal consistency ranging from .67 (compulsive) to .85 (avoidant). When Schuler, Snibbe and Buckwalter (1994) investigated the MMPI’s concurrent validity by correlating its scales with diagnoses made via the MCMI, only five out of the eleven MMPI scales were positively correlated: schizoid, avoidant, dependent, histrionic and narcissistic. Although this suggests some evidence of concurrent validity, it is limited at best. Much like the CATI, clinicians and academics wanting to use this instrument should be aware that it is constructed around the DSM-III.

The Omnibus Personality Inventory (OMNI; Loranger, 1994, 2002) is a 375 item self-report instrument that assesses all ten DSM-IV PDs, as well as the traditional five-factor model and 25 normal personality traits (e.g. warmth, trustfulness, and modesty). A practical strength of the OMNI is that it is used in conjunction with computer software that can automatically generate a comprehensive evaluation report. It can therefore be deployed easily and quickly in a variety of clinical, occupational and academic settings. Despite this, the instrument is not widely used. It has been shown to have acceptable levels of internal consistency and reliability (Lenzenweger, Loranger, Korfine, & Neff, 1997).

The Schedule for Nonadaptive and Adaptive Personality (SNAP; Clark, 1993) is a 375-item questionnaire that consists of 15 scales, 12 of which are focused on maladaptive traits, while the remaining scales assess negative and positive temperament and disinhibition. The original SNAP assessed the DSM-III-R PDs, and the SNAP-2 assesses both the 10 primary PDs in DSM-IV and the PDs in the appendix. The SNAP is designed to measure dimensional correlates of DSM-IV PDs, and also has 11 diagnostic scales for the DSM-IV-TR PDs. Clark (1993) found that the SNAP scores correlated on average .54 with DSM-IV diagnoses, a coefficient surprisingly high for a self-report instrument. Melley, Oltmanns and Turkheimer (2002) investigated the test-retest reliability and predictive validity of the SNAP; there were satisfactory levels of temporal stability (.58 to .81), but the authors found mixed support for its predictive validity. The SNAP scores did modestly predict cluster A and C PD onset. The SNAP-2’s predictive validity has been shown to be higher than that of both the NEO PI-R and the DSM-IV diagnoses themselves (Morey, Hopwood et al., 2007, 2012).

The Wisconsin Personality Disorder Inventory-IV (WISPI; Klein et al., 1993; Klein & Benjamin, 1996). The WISPI-IV is an updated version of the WISPI-III and WISPI-III-R (Klein et al., 1993) self-report inventories using 204 items to assess DSM-IV criteria for PDs relying on an analysis of DSM items according to Benjamin’s Structural Analysis of Social Behavior model (SASB, Benjamin, 1996). Its validity with the SCID-II has been examined in adult psychiatric inpatients, showing poor convergence at the level of categorical diagnoses, but better convergent and discriminant validity for 5 out of 11 WISPI-IV dimensional scales (Smith et al., 2011).

The Personality Assessment Inventory (PAI; Morey, 1991) is a 344 item questionnaire presented with a four point Likert scale, used in both clinical and community settings. It measures 22 non-overlapping scales that are arranged into four clusters: clinical (this comprises DSM-IV PDs alongside addictive disorders), interpersonal (this seeks to measure interpersonal strategies), treatment (this cluster provides insight into the various efficacies of certain clinical treatments in relation to the individual’s personality, or other risk factors not held in the domain of clinical disorders) and validity (this attempts to identify “faking good”, defensive or exaggerated responses to the questionnaire). Of the PDs, it essentially assesses Borderline and Antisocial Personality Disorder.

Boone (1998) found acceptable levels of internal consistency for the clusters (averaging .82) and the subscales (averaging .66). Some research has questioned its validity. Slavin-Mulford et al. (2012) researched the inventory’s convergent and discriminant validity by correlating PAI scales with the prevalence of life-events in psychiatric patients. The majority of the scales did hold meaningful correlations with at least one life-event, except for the mania and anxiety scales. The aforementioned research suggests that despite the self-report methodology, the PAI has some utility in a variety of environments. It is important to note that this instrument is not designed to be used as a tool for diagnosing disorders like the DIPD-IV; instead it provides an insight into the individual’s personality and temperament in a variety of contexts.

The Standardised Assessment of Personality: Abbreviated Scale (SAPAS; Moran et al., 2003) is an eight item screening interview that aims to provide a dimensional score indicating whether the individual has a PD in general, rather than screening for specific disorders. Scores can range from 0 to 8; any scores higher than 3 indicate a high possibility of a PD. The benefit of such a short interview is that it can be used in clinical environments when time is limited. Hesse and Moran (2010) compared SAPAS scores with a variety of comprehensive personality inventories and found that SAPAS scores did regress on most DSM-IV PDs when controlling for demographic variables, suggesting convergent validity. However, SAPAS scores were less likely to be associated with cluster B disorders (Antisocial, Histrionic and Borderline).

The Hogan Development Survey (HDS; Hogan & Hogan, 1997) is a self-report scale that renames the DSM-IV PDs into lay terms and is also contextualised for the work environment. Just like the Hogan Personality Inventory (HPI; Hogan & Hogan, 1992), the HDS is not a clinical instrument; instead it is mainly used for coaching, leadership development, and personnel selection. Furnham, Trickey and Hyde (2012) found various facets of the HDS to predict work success. Furnham et al. also found that the 11 scales can be clustered into three formations that are similar to clusters A, B, and C suggested by the DSM-IV. Over a dozen published studies have attested to its reliability, validity and dimensional structure (De Fruyt, Wille, & Furnham, 2013b).

The Tendances Dysfonctionelles-12 (TD-12; Rolland & Pichot, 2007) inventory examines the DSM-IV PDs, supplemented with the passive-aggressive and the depressive PDs. The TD-12 was developed for the assessment of personality tendencies that may potentially harm and affect personal, social and professional functioning and is, like the HDS, mostly used in personnel selection and professional development contexts. The significance of these personality tendencies for understanding behavior and performance at work is further described by Furnham and Taylor (2004) and Miller (2008).

3.3. Multidimensional Measures Targeting Categorical PDs

A series of dimensional approaches towards the description of personality pathology have been suggested as alternatives to the categorical DSM-IV PD diagnoses, including general dimensional models such as the FFM and more specific personality pathology dimensional representations like the SNAP, the DAPP-BQ or the DSM-5 trait model described in DSM-5 Section 3. Although these approaches advocate a dimensional instead of a categorical description of personality (pathology), in-between proposals to bridge categorical and dimensional diagnostics were proposed for most of these models. How such dimensional models translate into categorical PDs is described here.

FFM-based measures of Axis-II PDs. Widiger and Costa (2013) recently updated and summarized the available evidence for using a general trait model like the FFM for the description of PDs, relying on the assumption that the distinction between general traits and personality pathology reflects a quantitative rather than a qualitative difference (Simms & Clark, 2006). Samuel and Widiger (2008), for example, meta-analyzed the associations between FFM facets and DSM-IV PDs, demonstrating that most PDs could be described in terms of a particular set of FFM facets. Miller and colleagues (2005) corroborated such findings and proposed an easy-to-use system to describe DSM-IV PDs in terms of aggregates of a specific set of FFM facets per PD. Scoring in a more extreme range on such an FFM PD count (for example, 1.5 SD beyond the mean) is considered indicative of a specific PD, requiring further attention. Bastiaansen, Rossi and De Fruyt (2012) examined the concurrent validity of different FFM PD counts in an attempt to optimize this proposed scoring system. The utility of these FFM PD counts has since been further supported for both clinical and professional developmental diagnostic purposes. Miller and colleagues (2010) demonstrated the utility of this system for clinical decision making, whereas Wille et al. (2013) and De Fruyt et al. (2009, 2013b) investigated the applicability of the counts to screen for aberrant traits observable in the working population, in order to identify dark side personality tendencies that may hinder performance or functioning at work.
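
The logic of such a count can be sketched in a few lines of Python. This is a minimal illustration only: the facet selection, the reverse-scoring of facets expected to be low, the 0 to 32 facet range, and the normative mean and SD are all assumptions made for the example and are not taken from Miller and colleagues (2005). The sketch also reads "beyond the mean" as "above the mean".

```python
from typing import Dict, List

# Hypothetical facet ceiling: assumes each facet score ranges from 0 to 32.
FACET_MAX = 32.0


def ffm_pd_count(facets: Dict[str, float], high: List[str], low: List[str]) -> float:
    """Sum facets expected to be high plus reverse-scored facets expected to be low."""
    count = sum(facets[name] for name in high)
    count += sum(FACET_MAX - facets[name] for name in low)
    return count


def is_elevated(count: float, norm_mean: float, norm_sd: float,
                cutoff_sd: float = 1.5) -> bool:
    """Flag a count lying at least cutoff_sd standard deviations above the norm mean."""
    return (count - norm_mean) / norm_sd >= cutoff_sd


# Invented respondent and invented norms, for illustration only.
scores = {"N1": 25.0, "E1": 10.0, "A1": 8.0}
count = ffm_pd_count(scores, high=["N1"], low=["E1", "A1"])
print(count, is_elevated(count, norm_mean=50.0, norm_sd=12.0))
```

In practice the facet sets and norms would come from the published scoring system and an appropriate reference sample; an elevated count is a prompt for further assessment, not a diagnosis.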

DSM-5 trait model. A trait-set for describing personality pathology structured under the five broad dimensions of Negative Affectivity, Detachment, Disinhibition, Antagonism and Psychoticism is described in section 3 of DSM-5 (APA, 2013) for further review and evaluation. Although the labels for this five-factor structure are different from the defining dimensions of the FFM, there is strong support that at least four dimensions are conceptually and empirically related to the FFM dimensions, with some disagreement on the association between the FFM Openness to experience and the DSM-5 Psychoticism dimension (De Fruyt et al., 2013a). Krueger et al. (2012) developed the Personality Inventory for DSM-5, assessing 25 traits that can be combined to assess either six categorical PDs (borderline, avoidant, schizotypal, antisocial, obsessive-compulsive and narcissistic) or lead to a diagnosis of personality disorder trait-specified, when patients demonstrate elevated trait levels.

4. Assessments of Single Personality Disorders

The review of instruments assessing single PDs follows the order in which PDs are listed in DSM-5.

4.1. Paranoid

There is much debate surrounding the paranoid personality disorder due to its shared symptomology with disorders such as schizotypal PD, narcissistic PD, and schizophrenia. The paranoid PD is marked by an entrenched mistrust towards others. Only one instrument was found that specifically diagnoses the paranoid PD.

The Paranoid Personality Disorder Features Questionnaire (Useda, 2002) is a 23-item inventory that measures six scales: mistrust, antagonism, introversion, hypersensitivity, hypervigilance and rigidity. The author intends the six dimensions to closely map the current literature and DSM-IV criteria. There is a shortage of papers showing the efficacy of the instrument. However, the original author did find that the instrument showed satisfactory test-retest reliabilities after a six-week period.

4.2. Schizoid

There appears to be only one measure of the schizoid PD. This may again be due to the controversy over distinguishing between schizoid and schizotypal PDs and schizophrenia. Nevertheless, the DSM-5 defines schizoid PD as characterised by a lack of interest in social relationships and a stunted range of emotions. This contrasts with schizotypal PD, which is characterised by unusual thinking styles and paranoia.

The Interpersonal Measure of Schizoid Personality Disorder (IM-SZ; Kosson, Blackburn, Byrnes, Park, Logan, & Donnelly, 2008) consists of 12 items that measure various aspects of interpersonal interaction (e.g. rapport, absence of spontaneity in speech, poor interpersonal hygiene, and lack of verbal responsiveness). In two cross-validation studies (total N = 731), acceptable internal reliability was achieved (.88). The inventory was completed after a semi-structured interview focused on the quality of the individual’s interpersonal relationships, and inter-rater agreement yielded a kappa of .69. The authors also found the measure to hold good construct validity, but call for further validation.

4.3. Schizotypal

As already mentioned, there is a difference between schizoid and schizotypal PD (STPD), with the latter characterised by abnormal thought patterns, paranoia, and referential thinking. Our review identified seven specific measures of STPD, each covering a different number of dimensions or focusing on specific ones. This variety of STPD instruments is a strength, as it allows the PD to be investigated from different approaches and perspectives.

The Referential Thinking Scale (REF; Lenzenweger, Bennett, & Lilienfeld, 1997) is a unidimensional 34-item questionnaire that measures simple and guilty ideas of reference. A referential idea is a thought that is perceived to originate from within the individual. A simple referential idea is when the individual believes that other people are observing something about themselves that they would rather remain private. A guilty referential idea is when the individual feels that they are being accused of some wrongdoing. The authors also note that referential thinking is not exclusive to schizotypal PD; the instrument is therefore designed to measure referential thought in the clinical domain and to be wholly independent of normative referential thought processes such as self-monitoring and self-consciousness. The original paper cites a Cronbach’s alpha of .83 and a test-retest reliability of .86. Furthermore, the original authors found convergent validity for the instrument by demonstrating that high REF scores were associated with increased levels of schizophrenia-related psychological deviance, magical ideation and perceptual aberration. The measure also held weak correlations with unrelated measures of normative self-awareness, suggesting that the REF is measuring a psychologically independent aspect of referential thought processes.
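
Because Cronbach’s alpha is the internal consistency statistic quoted for most instruments in this review, a minimal sketch of how it is computed may be helpful. The function below implements the standard formula, alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores); the response matrices are invented toy data and do not come from any of the cited studies.

```python
from statistics import variance
from typing import List


def cronbach_alpha(responses: List[List[float]]) -> float:
    """Cronbach's alpha for a matrix where responses[r][i] is respondent r's score on item i."""
    k = len(responses[0])                      # number of items
    item_columns = list(zip(*responses))       # one tuple of scores per item
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Toy data: two items that rank respondents identically (alpha of 1) ...
consistent = [[0, 0], [1, 1], [2, 2]]
# ... and two items that are unrelated to each other (alpha of 0).
unrelated = [[0, 0], [0, 1], [1, 0], [1, 1]]
print(cronbach_alpha(consistent), cronbach_alpha(unrelated))
```

Values such as the .83 reported for the REF therefore indicate that most of the variance in total scores is shared across items rather than item-specific.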

The Schizotypal Personality Questionnaire (SPQ; Raine, 1991) is a 74-item self-report questionnaire that measures all nine schizotypal traits laid out in the DSM-III-R: ideas of reference, excessive social anxiety, odd beliefs, unusual perceptual experiences, odd behaviour, no close friends, odd speech, constricted affect and suspiciousness. Some have questioned the heterogeneous structure of schizotypal PD (Chmielewski & Watson, 2008). The measure is designed to act as a screening tool for the diagnosis of the disorder, and to further research in the field by gathering data on the individual subscales of the PD. The measure produces an overall SPQ score as well as a score for each subscale. Participants respond via a forced-choice “yes/no” paradigm. Cronbach’s alpha for the total measure was .91, with the subscales varying from .63 (constricted affect) to .81 (odd beliefs). Raine also found a test-retest correlation of .82. The instrument was validated by comparing its scores with the SCID-II omnibus measure of PD. Raine found that of the top 10% of scorers on the SPQ, 55% were also diagnosed with schizotypal PD by the SCID-II (suggesting high criterion validity, as the bottom 10% of SPQ scorers received no SCID-II diagnosis). Convergent validity was also supported, as each of the nine subscales (as well as the total score) held significant positive correlations with SCID-II scores (total SPQ score r = .68, p < .001).
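
Scoring a forced-choice “yes/no” questionnaire into subscale and total scores can be sketched as follows. The sketch assumes that each “yes” contributes one point; the item-to-subscale key shown is a hypothetical three-item toy key and is not the published SPQ scoring key.

```python
from typing import Dict, List


def score_yes_no(answers: Dict[int, bool], key: Dict[str, List[int]]) -> Dict[str, int]:
    """Count 'yes' answers per subscale and add a total across subscales."""
    scores = {
        subscale: sum(1 for item in items if answers.get(item, False))
        for subscale, items in key.items()
    }
    scores["total"] = sum(scores.values())
    return scores


# Hypothetical key and answers, for illustration only.
toy_key = {"odd_beliefs": [1, 2], "suspiciousness": [3]}
toy_answers = {1: True, 2: False, 3: True}
result = score_yes_no(toy_answers, toy_key)
print(result)
```

Unanswered items are treated as “no” here, which is a simplification; a real scoring routine would need an explicit rule for missing data.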

Raine and Benishay (1995) developed the Schizotypal Personality Questionnaire-Brief (SPQ-B). It was designed to be a shorter version of the SPQ so that it could be used as a screening measure. Featuring only 22 items, it measures three dimensions: interpersonal deficits, cognitive-perceptual deficits and disorganisation (which combine to produce a total measure). The SPQ-B takes only two minutes to complete, making it useful to academics and clinicians prior to a confirmatory interview. The original paper cites a Cronbach’s alpha of .76, with Compton, Chien and Bollini (2007) finding a level of .86 within a psychiatric sample. Compton and colleagues also found the SPQ-B to be positively correlated with the SCID-II STPD subscale (r = .49, p < .001). Much like the SPQ, while it is a very robust measure psychometrically and across different samples (Axelrod, Grilo, Sanislow, & McGlashan, 2001), it remains based on the DSM-III-R, although the schizotypal PD criteria remained largely unchanged in DSM-IV.

The Structured Interview for Schizotypy (SIS; Kendler, Lieberman, & Walsh, 1989) is an interview-based research instrument for assessing symptoms of schizotypal PD. The interview differed from other interviews at the time of its development in that it includes contextualised assessments of the pathological nature of specific symptoms (e.g. referential ideas) and symptom probes that aim to make faking good appear non-deviant. Based on the DSM-III-R, it features five kinds of items: closed-option items (i.e. Likert scales), field-coded questions (i.e. an open-ended question is asked and the interviewer codes the answer with a number that lies on a continuum), global symptom scores (measured via a one-to-seven Likert scale), specific symptom scores (the interviewer rates the severity of the respondent’s behaviours) and global scores on broad categories of behaviours. There are 19 sections to the SIS, measured across 16 dimensions. The interview takes around one hour to complete, and interviewers must be specially trained. The authors found inter-rater reliability to be high for the subscales (.87 ± .12) except magical thinking (.79 to .67), as assessed across two clinical samples (total N = 58). Discriminant validity was demonstrated: across three pilot studies, SIS scores discriminated between participants who had schizophrenic relatives and those who did not.

Wisconsin Schizotypy Scales (Winterstein et al., 2011). The magical ideation, perceptual aberration, revised social anhedonia and physical anhedonia scales each contain 15 items and measure a single dimension. They are designed solely for academic use. The authors found each scale to have a high Cronbach’s alpha (.84 to .88). The scales were correlated with the HEXACO-60 (Ashton & Lee, 2009) and with measures of curiosity (Kashdan et al., 2009), sensation seeking (Hoyle, Stephenson, Palmgreen, Lorch, & Donohew, 2002) and hypomania (Eckblad & Chapman, 1986). These measures were thought to represent the positive and negative dimensions of schizotypy. In agreement with previous studies (Kwapil, Barrantes-Vidal, & Silvia, 2008), modest correlations were found between the four scales and the aforementioned measures.

The Oxford-Liverpool Inventory of Feelings and Experiences (O-LIFE; Mason, Claridge, & Jackson, 1995). The O-LIFE views schizotypy as lying on a continuum between normality and abnormality. It was designed to be used primarily within a normal population. It comprises four dimensions: unusual experiences, cognitive disorganisation, introvertive anhedonia and impulsive nonconformity, each consisting of 24 to 30 items. Cronbach’s alpha for the scales ranged between .77 and .89. The O-LIFE’s validity has been supported in a variety of laboratory studies covering neurological functioning, performance on reasoning tasks, face processing and childhood abuse (Avons, Nunn, Chan, & Armstrong, 2003; Sellen, Oaksford, & Gray, 2005; Mason & Claridge, 1999; Startup, 1999).

The Five-Factor Measure of Schizotypal Personality Traits (FFM STPT; Edmundson, Lynam, Miller, Gore, & Widiger, 2011) groups nine scales, constructed as maladaptive variants of FFM general traits, including Social Anxiousness (the STPT variant of FFM N1: Anxiety), Social Discomfort (the STPT variant of FFM N4: Self-consciousness), Social Anhedonia (low E1: Warmth), Social Isolation and Withdrawal (low E2: Gregariousness), Physical Anhedonia (low FFM E6: Positive emotions), Aberrant Perceptions (O1: Fantasy), Odd and Eccentric (O4: Actions), Aberrant Ideas (O5: Ideas), and Interpersonal Suspiciousness (low FFM A1: Trust). Together, these scales are called the Five-Factor Schizotypal Inventory (FFSI). The FFSI scales showed good psychometric properties, including support for their convergent, divergent and incremental validity beyond already existing measures.

4.4. Antisocial

Antisocial PD (APD) continues to be of considerable interest to researchers and clinicians in various fields. Eight measures of APD are detailed below, although there is some inconsistency in terminology within the domain. For instance, psychopathy can be referred to as a specific subtype of APD or as a synonym (Pickersgill, 2010).

The Antisocial Personality Questionnaire (APQ; Blackburn & Fawcett, 1999) is a 125-item self-report inventory that is designed to measure APD holistically in criminal offender populations. The measure features eight scales: self-control, self-esteem, avoidance, paranoid suspicion, resentment, aggression, deviance and extraversion, with Cronbach’s alpha levels ranging from .77 to .87. Validated in both a clinical and a normal population, all scales were found to hold concurrent validity with the MCMI, and to predict the age of an inmate’s first criminal offence and the length of detention.

The Psychopathy Checklist (PCL; Hare, 1980) has become the standard measure of psychopathy in clinical environments since it was created. When categorised as a subtype of APD, Hare defines psychopathy as characterised by callous and unemotional traits. The PCL assesses two factors: the first focusing on the individual’s grandiose, callous and manipulative personality, and the second focusing on the individual’s deviant and impulsive life history. It has 22 items (each of which represents a facet of either factor) that form the basis of a semi-structured interview. To administer this checklist, interviewers must first take a short course to become accredited. Each item is scored on a range from 0 to 2, where higher scores represent greater severity. With total scores ranging from 0 to 44, scores above 30 indicate a diagnosis of clinical psychopathy. Hare found that the PCL held a Cronbach’s alpha of .90 and an interrater reliability of .89, demonstrating good psychometric properties for a clinical interview. Hart and Hare (1989) demonstrated discriminant validity as the PCL scores were related to substance abuse.
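
The arithmetic behind the total score and cut-off can be sketched as follows. The sketch assumes a per-item ceiling of 2, which is what a total range of 0 to 44 over 22 items implies, and treats "above 30" as a strict inequality; the item ratings are invented.

```python
from typing import List

N_ITEMS = 22   # number of checklist items, as described in the text
ITEM_MAX = 2   # assumption: 44 (maximum total) / 22 (items)
CUTOFF = 30    # totals above this value are treated as indicative of psychopathy


def pcl_total(ratings: List[int]) -> int:
    """Validate the item ratings and return their sum."""
    if len(ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} ratings, got {len(ratings)}")
    if any(r < 0 or r > ITEM_MAX for r in ratings):
        raise ValueError(f"each rating must lie between 0 and {ITEM_MAX}")
    return sum(ratings)


def exceeds_cutoff(total: int) -> bool:
    """Return True when the total lies strictly above the cut-off."""
    return total > CUTOFF


# Invented ratings: 16 items rated 2 and 6 items rated 0.
example_total = pcl_total([2] * 16 + [0] * 6)
print(example_total, exceeds_cutoff(example_total))
```

The sketch only covers the summation step; the ratings themselves depend on a trained interviewer’s judgement and cannot be automated in this way.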

The Psychopathy Checklist-Revised (PCL-R; Hare et al., 1990) is based on the PCL but features only 20 items. It remains a clinical interview that requires specific training to be administered, with scores over 30 indicating the presence of psychopathy, and it still assesses the same two factors. The PCL-R removed two items that were found to hold low correlations with the total PCL score, and slightly modified the scoring criteria. The PCL-R has been well validated and has superseded its precursor in popularity. In a validation across five clinical samples (N = 925), the average inter-rater reliability and Cronbach’s alpha were .86 and .88 respectively. Salekin, Rogers, and Sewell (2006) found that PCL and PCL-R scores were significant predictors of future violence and aggression. Vitale, Smith, Brinkley and Newman (2002) found convergent validity with Eysenck’s Personality Questionnaire, in particular the psychoticism scale (Eysenck & Eysenck, 1975). The PCL-R is often favoured over the PCL due to its psychometric refinements. Furthermore, it is worth highlighting that the PCL and PCL-R measure a slightly different construct from the APQ: the APQ measures APD as defined by the DSM-IV, whereas the PCL-R measures psychopathy as a subtype of APD.

Levenson’s Self-Report Psychopathy Scale (SRPS; Levenson, Kiehl, & Fitzpatrick, 1995) is a 26-item self-report questionnaire that measures primary psychopathy (a synonym for factor 1 of psychopathy as measured by the PCL-R) and secondary psychopathy (a synonym for factor 2). Unlike the PCL-R, the SRPS was designed to be used in a nonclinical population. Brinkley, Schmitt, Smith and Newman (2000) found the measure to have Cronbach’s alphas of .83 for primary psychopathy, .69 for secondary psychopathy, and .85 for the total measure. The authors also found the SRPS to correlate positively with the PCL-R and to show similar correlations with measures of substance abuse and criminal versatility, which may be evidence of poor discriminant validity. Akhtar, Ahmetoglu, and Chamorro-Premuzic (2013) demonstrated its predictive validity by using the measure to predict entrepreneurial outcomes, thereby providing support for the SRPS’ utility in nonclinical samples.

The Psychopathic Personality Inventory Revised (PPI-R; Lilienfeld & Widows, 2005) is another self-report measure of personality traits associated with psychopathy. Featuring 157 items, the PPI-R contrasts with both the PCL-R and SRPS as it measures three dimensions of psychopathy: fearless dominance (traits that reflect social potency, shallow affect and stress immunity), impulsive antisociality (traits that reflect impulsive nonconformity, externalisation of blame and Machiavellian behaviours), and cold heartedness (lack of empathy). The original authors found Cronbach’s alpha levels to range from .79 to .92. Uzieblo, Verschuere, Van den Bussche, and Crombez (2010) further found that the measure held convergent and discriminant validity with the SRPS: all three factors of the PPI-R were positively correlated with primary psychopathy, whereas fearless dominance and impulsive antisociality correlated with secondary psychopathy. The authors created the PPI-R to challenge the monopoly the PCL-R had within the field and to address a variety of theoretical challenges. The measure was designed to be used within both nonclinical and clinical samples.

The Elemental Psychopathy Assessment (EPA; Lynam et al., 2011) is a 178-item self-report measure that features 18 psychopathy subscales. The subscales are grouped into four unidimensional factors that are based on the FFM: antagonism, conscientiousness, extraversion and neuroticism. The authors chose this approach as they argued that psychopathy could be understood if it is related to the basic units of personality. For the subscales, Cronbach’s alpha levels ranged between .63 and .88. In a clinical sample, the original authors showed that the EPA correlated with three measures of psychopathy, including the PPI-R and SRPS (mean r = .81). EPA scores were also correlated with externalised behaviours such as alcohol abuse and antisocial behaviour in a prison sample. The benefit of using the EPA over other measures of psychopathy and APD is that its grounding in the FFM makes it directly comparable with other FFM-based assessments.

The Inventory of Callous-Unemotional Traits (ICU; Frick, 2004). The ICU differentiates itself from psychopathy measures, as it focuses on an individual’s empathy and caring behaviours, particularly in adolescents. The measure therefore tries to capture behaviour across 24 items to help identify those most at risk. There are four scales: careless, callous, unemotional and uncaring. Participants respond via a 4-point Likert scale. Kimonis et al. (2008a, 2008b) found the total 24 items to hold a Cronbach’s alpha of .81, with the subscales ranging between .53 and .81. They also found the measure to hold concurrent validity, as it correlated positively with proactive and reactive aggression, delinquency and sexual offences, and negatively with empathy and positive affect.

The Business-Scan 360 (B-Scan 360; Babiak & Hare, 2012) has been developed to investigate psychopathic features in business settings. This 360-degree tool is designed for managers, subordinates, and peers to assess corporate psychopathy in others. Initially, the B-Scan consisted of 113 items, to be rated on a 5-point Likert scale, ranging from 1 (Strongly Disagree) to 5 (Strongly Agree). Mahieu and colleagues (2012) collected data in two large independent samples of business personnel who rated their supervisors on the original B-Scan items and on several external variables. They identified a preliminary 20-item B-Scan scale that is consistent with the four PCL-based factors of psychopathy, being Interpersonal (superficial, grandiose, deceitful), Affective (lacks remorse, lacks empathy, doesn’t accept responsibility for actions), Lifestyle (impulsive, lacks realistic goals, irresponsible), and Antisocial (poor behavior controls, adolescent antisocial behavior, adult antisocial behavior). Because the B-Scan factors are meant to have utility in an organizational environment, these four factors were relabelled as Manipulative/Unethical, Callous/Insensitive, Unreliable/Unfocused, and Intimidating/Aggressive. In both samples, internal consistencies of the scales and total score were acceptable to excellent, ranging from .70 (Intimidating/Aggressive) to .99 (Callous/Insensitive). Initial data on external validity seem promising, as the psychopathic features measured with the B-Scan appear to be related to deviant behaviors at the workplace, including organizational retaliatory behavior, bullying, and interpersonal deviance (Mahieu et al., 2012).

4.5. Borderline

To our knowledge there are seven specific measures of borderline PD (BPD), nearly all of which adopt a multidimensional approach to measuring BPD. However, there is some variation in the number and labels of the dimensions measured.

The Zanarini Rating Scale for Borderline Personality Disorder (ZAN-BPD; Zanarini et al., 2003) features nine items to which respondents answer via a 5-point Likert scale. It was found to have reasonably good levels of internal consistency and convergent validity with other BPD measures. The ZAN-BPD defines BPD as consisting of four dimensions: affective, cognitive, impulsive and interpersonal. The scale is designed to be used within a clinical setting to quickly measure changes in symptomatology over time.

The Diagnostic Interview for Borderlines (DIB-R; Zanarini, Gunderson, Frankenburg, & Chauncey, 1989) is a semi-structured interview that lasts between 50 and 90 minutes. Using 132 items, the DIB-R measures the same four dimensions as the ZAN-BPD but in considerably more depth. The DIB-R is one of the most widely used instruments for BPD diagnoses because it is in the public domain and has established psychometric qualities, such as good interrater reliabilities (kappa levels range between .57 and .73; Zanarini, Frankenburg, & Vujanovic, 2002) and longitudinal predictive validity of remission rates (Zanarini, Frankenburg, Hennen, & Silk, 2003).

The McLean Screening Instrument for Borderline Personality Disorder (MSI-BPD; Zanarini et al., 2003) is a 10-item self-report scale that is designed to complement clinical interview instruments by identifying the potential for a diagnosis. Its brevity, test-retest reliability (.72), good sensitivity (correctly identified 81% of diagnoses) and specificity (correctly identified 85% of non-diagnoses) make it a useful screener.
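
For readers less familiar with screening statistics, sensitivity and specificity can be computed from a simple two-by-two classification table, as sketched below. The counts are invented round numbers chosen only to reproduce the percentages quoted above; they are not the sample sizes of the original study.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Proportion of diagnosed cases that the screener correctly flags."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Proportion of non-diagnosed cases that the screener correctly clears."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical sample: 100 diagnosed and 100 non-diagnosed respondents.
sens = sensitivity(true_positives=81, false_negatives=19)
spec = specificity(true_negatives=85, false_positives=15)
print(sens, spec)
```

A screener with these properties would still miss roughly one in five true cases, which is why it is positioned as a complement to, rather than a replacement for, a clinical interview.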

The Borderline Personality Disorder Beliefs Scale (BPDBS; Butler, Brown, Beck, & Grisham, 2002) differs from the aforementioned BPD instruments in that it is influenced by cognitive behavioural therapy (Beck & Freeman, 1990) and is constructed out of 14 items found on the Personality Belief Scale (PBS; Beck & Beck, 1991). Butler and colleagues (2002) identified that these 14 items could discriminate between BPD patients and patients with other PDs. Although not widely used, the BPDBS’s partnership with cognitive therapy offers a different approach for clinicians and researchers alike.

The Borderline Personality Questionnaire (BPQ; Poreh et al., 2006) is an 80-item self-report scale. Unlike the aforementioned instruments that focus on four dimensions, the BPQ has nine. Although there is some similarity (affective instability, impulsivity and relationships), it introduces dimensions such as intense anger, suicide/self-mutilation and quasi-psychotic states. When compared with the MMPI and the SPQ, it showed significant coefficients of .48 and .45 respectively, suggesting acceptable discriminant validity. Similarly, convergent validity with the MMPI yielded a coefficient of .85.

The Borderline Evaluation of Severity Over Time (BEST; Pfohl, Blum, St. John, McCormick, Allen, & Black, 2009) is a 15-item self-report instrument that differs from other measures of BPD as it focuses on how symptomatology changes over time, rather than on diagnosing the disorder. The BEST features three subscales, focusing respectively on problematic thoughts and emotions that are characteristic of BPD (e.g. suicidal flirtations), problematic behaviours, and the use of positive behaviours. Each item is rated on a 5-point Likert scale. The tool measures respondents’ behaviour over the previous week, which explains its utility for therapists and clinicians. The original paper cites test-retest reliabilities (r = .61, p < .001), alongside face validity (due to its focus on the thoughts and feelings of the individual), and concurrent validity (it was shown to correlate significantly with the ZAN-BPD).

Five Factor Borderline Inventory (FFBI; Mullins-Sweatt, Edmundson, Sauer-Zavala, Lynam, Miller, & Widiger, 2012). This measure is unique in that it builds its structure around the five-factor model of personality, promoting a new multidimensional approach to BPD. The authors argue that by viewing BPD as comprising a constellation of maladaptive traits rather than a homogeneous category, the full range of the disorder can be assessed. It assesses eight scales, each with 10 items. Items vary from five factor-based traits such as rashness, distrustfulness, and manipulativeness, to more traditional behavioural and affective dysregulation. The original paper cites internal consistency levels no lower than .77 and good convergent validities with the PDQ, PAI and SNAP. It was also shown to hold incremental validity in predicting BPD over the NEO PI-R and the BPD scale of the PAI.

4.6. Histrionic

The Five-Factor Measure of Histrionic Traits (FFM-HIS; Tomiatti, Gore, Lynam, Miller, & Widiger, 2012). Parallel to maladaptive variants of FFM schizotypal and borderline traits, 13 Five Factor Histrionic trait scales were constructed as maladaptive variants of FFM general traits: Neediness for Attention, Rapidly Shifting Emotions, Intimacy Seeking, Attention Seeking, Social Butterfly, Flirtatiousness, Melodramatic Emotionality, Romantic Fantasies, Touchy Feely, Suggestibility, Vanity, Disorderliness, and Impressionistic Thinking. Tomiatti and colleagues (2012) reported good psychometric qualities, adequate convergent validity for 11 out of the 13 FFHI scales, and incremental validity over their respective NEO-PI-R facet scales for 12 out of 13 scales in accounting for PDQ-4 histrionic variance.

4.7. Narcissistic

Narcissistic PD (NPD) has generated a lot of interest in the past 40 years, with various instruments dedicated to identifying and diagnosing this PD. There are many different approaches to NPD, with debate about whether it is multidimensional and about its place within the clinical sphere.

Murray’s Narcissism Scale (1938) is a 20-item self-report scale that measures narcissism in terms of overt, grandiose behaviour and covert feelings of insecurity. Despite the age of the instrument, its theoretical underpinnings are convergent with recent literature. Hendin and Cheek (1997) found the measure to have a Cronbach’s alpha of .76 and found 10 of the items from Murray’s scale to significantly correlate with MMPI’s narcissism scale, demonstrating concurrent validity. They constructed a Hypersensitive Narcissism Scale (HSNS) that measures the covert aspect of narcissism. This was shown to have a Cronbach’s alpha of .82 and correlated with the five-factor model of personality, MMPI and the exploitativeness/entitlement dimension of the NPI-40. Arble (2008) further found this measure to hold negative correlations with self-esteem and positive correlations with self-reported measures of shame, masochism, social inhibition, social incompetence and egocentricity.

The Narcissistic Personality Inventory (NPI; Raskin & Hall, 1979) has received a considerable amount of attention since its conception. Based on the DSM-III definition, it features 54 items that measure a general construct of trait narcissism (that is, narcissism within nonclinical populations). Raskin and Hall (1981) reported that over an eight-week period the instrument’s test-retest reliability was .72. In contrast to Raskin and Hall’s original conception of a single construct, a factor analysis revealed four salient dimensions: leadership/authority, superiority/arrogance, self-absorption/self-admiration and exploitativeness/entitlement (Emmons, 1984). These four factors accounted for 72% of all variance, with Cronbach’s alphas of .86, .74, .79 and .69 for the respective factors, and .69 for the total scale. Emmons (1984) also found the NPI to correlate with normal personality dimensions and peer ratings of narcissism, providing support for the validity of the construct. Lastly, Prifitera and Ryan (1984) found the NPI to be strongly correlated with the MCMI narcissism scale within a clinical sample.

Raskin and Terry (1988) further developed the NPI, producing the NPI-40, which has become the most popular measure to assess NPD. The NPI-40 features just 40 items, with a Guttman alpha internal consistency statistic of .83. Of particular interest in this version of the NPI is that seven dimensions were identified: authority, exhibitionism, superiority, vanity, exploitativeness, entitlement, and self-sufficiency, all of which were found to have internal consistency levels no lower than .50. The total scale and its dimensions were found to correlate with various trait rankings on self-confidence, physical attractiveness, pleasure seeking and assertiveness as measured by other instruments included in the Institute of Personality Assessment and Research (IPAR) battery.

Ames, Rose and Anderson (2006) reduced the NPI-40 to a 16-item, unidimensional measure. The NPI-16 incorporates the seven factors found within the NPI-40, but produces a single score that simply represents how narcissistic the individual is. The NPI-16 was found to have a Cronbach’s alpha of .72, which is satisfactory given that the authors found the NPI-40 to have a level of .84. Both the NPI-40 and NPI-16 correlated with self-ratings of attractiveness, competence, and Big Five measures, suggesting evidence of predictive validity.

Margolis-Thomas Measure of Narcissism (M-T; Margolis & Thomas, 1980). The M-T, which originates from an unpublished master’s thesis, differs from the NPI as it is able to make clinical diagnoses. The scale features 60 items, and measures the six dimensions of NPD defined by the DSM-III. Each item is a pair of statements, one narcissistic and the other not, and respondents choose the statement they believe is most true of themselves. The original authors cite an internal consistency coefficient of .84, with the measure successfully differentiating between adolescents with and without NPD, suggesting concurrent validity. Mullins and Kopelman (1988) created a short version that consisted of only 24 paired statements, as they wanted a smaller, more efficient battery. This scale had an internal consistency level of .69. This short version was inversely related to life, self-, family and job satisfaction, demonstrating evidence of predictive validity.

The Pathological Narcissism Inventory (PNI; Pincus, Ansell, Pimentel, Cain, Wright, & Levy, 2009). The PNI differs from various other narcissism measures in that it aims to distinguish between pathological/clinical and normal narcissism. Comprising 52 items, it assesses two overarching dimensions: narcissistic grandiosity and narcissistic vulnerability. Narcissistic grandiosity features four pathological facets that are not dissimilar to those found in the NPI-40: entitlement rage, exploitativeness, grandiose fantasy, and self-sacrificing self-enhancement. Narcissistic vulnerability comprises three facets: contingent self-esteem, hiding the self, and devaluing. The PNI is therefore similar to Murray’s original conception of narcissism, which consists of overt and covert behaviours. The PNI was found to have a Cronbach’s alpha of .95, with all the facets scoring between .75 and .93. Concurrent validity was supported, as PNI scores correlated significantly with NPI and HSNS scores (this is true for both total and facet scores). PNI scores were also found to predict empathy, aggression and low moral values, as well as psychiatric variables such as number of suicide attempts, no-shows to therapy sessions, and whether psychiatric medication is being taken.

Five Factor Narcissism Inventory (FFNI; Glover et al., 2012). Like its sister inventories, the FFNI groups 15 narcissism trait scales that are constructed as maladaptive and narcissistic variants of FFM facets, including Reactive Anger, Shame, Indifference, Need for Admiration, Exhibitionism, Thrill-Seeking (also represented in the EPA; Lynam et al., 2011), Authoritativeness, Grandiose Fantasies, Cynicism/Distrust (also represented in the EPA; Lynam et al., 2011), Manipulativeness, Exploitativeness, Entitlement, Arrogance, Lack of Empathy, and Acclaim-Seeking. Good to excellent internal consistencies were reported, including adequate convergent validity for 14 out of 15 FFNI scales. It was concluded that the 15 different scales provided a comprehensive and multifaceted description of narcissistic pathology.

There are also other, lesser-known measures of NPD (Ashby, Lee, & Duke, 1979; Richman & Flaherty, 1990).

4.8. Avoidant

No specific measure of avoidant personality could be found, except for the Five-Factor Avoidant Assessment (FFAvA; Lynam, Loehr, Miller, & Widiger, 2012), which proposes 10 maladaptive avoidant variants of FFM traits, including scales assessing Evaluation Apprehension, Despair, Mortified, Overcome, Social Dread, Shrinking, Risk Averse, Joyless, Rigidity, and Timorous. Initial validation results are promising, showing homogeneous and reliable scales that are strongly related to their respective NEO-PI-R facets, with all FFAvA scales except Timorous showing the expected relationships with AVD measures included in comprehensive PD inventories.

4.9. Dependent

The DSM-5 characterises DPD as displaying maladaptive clinging behaviour towards others for care, advice and support. Our review identified four instruments that specifically measured DPD.

The Minnesota Multiphasic Personality Inventory-2 Social Introversion Subscales (MMPI-2 Si1, 2, 3; Ben-Porath, Hostetler, Butcher, & Graham, 1989). The instrument consists of 38 items and is based on the subscales of the MMPI-2. Using item-level factor analysis, Ben-Porath and colleagues produced three exclusive subscales that are designed to collectively measure DPD, in particular the avoidance of being alone. The three scales, Shyness/Self Consciousness, Social Avoidance, and Self/Other Alienation, were found to hold acceptable internal consistency coefficients (ranging between .75 and .82). The authors found that two subscales predicted 80% of the variance in a study on social introversion. Further ad hoc support for the role of shyness comes from Lorant, Henderson, and Zimbardo (2000), who found that in a clinical sample (N = 107), 60 participants who were found to be shy also had a personality disorder, with DPD being the most common. Unfortunately, there have not been many studies to further validate the three subscales. The scales could even be criticised for not directly measuring DPD, but rather a facet of it, or another comorbid construct altogether.

The Dependent Personality Questionnaire (DPQ; Tyrer, Morgan, & Cicchetti, 2004) is an eight-item self-report questionnaire. The instrument is intended to be used as a screening tool to identify patients who potentially have DPD. Participants rate themselves on a 4-point Likert scale that ranges from 0 to 3. Although the original authors do not cite any statistics on the instrument’s reliability, they demonstrate that the DPQ holds good diagnostic validity. The DPQ held an overall diagnostic accuracy of 87.5% in psychiatric patients diagnosed with DPD according to the ICD-10 version of the Personality Assessment Schedule (Tyrer, 2000). The DPQ’s diagnostic sensitivity, specificity, and positive and negative predictive accuracies were all 87.5%. Patients diagnosed with the disorder had a mean score of 13, whereas matched controls had a mean score of 7.
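To make the screening statistics above concrete, the following minimal sketch shows how sensitivity, specificity, positive and negative predictive accuracy, and overall accuracy are derived from a 2 × 2 classification table. The function name `diagnostic_metrics` and the cell counts are assumptions chosen only because they reproduce a value of 87.5% on every index; they are not the counts reported by Tyrer et al. (2004).

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Screening statistics (as proportions) from a 2 x 2 classification table.

    tp: cases correctly flagged        fn: cases missed by the screen
    fp: non-cases wrongly flagged      tn: non-cases correctly cleared
    """
    return {
        "sensitivity": tp / (tp + fn),   # proportion of true cases detected
        "specificity": tn / (tn + fp),   # proportion of non-cases cleared
        "positive_predictive": tp / (tp + fp),  # flagged people who are cases
        "negative_predictive": tn / (tn + fn),  # cleared people who are non-cases
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Illustrative counts only (NOT the study's data): 8 cases and 8 controls,
# with one misclassification in each group.
metrics = diagnostic_metrics(tp=7, fn=1, fp=1, tn=7)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

All five indices coincide only when the table is symmetric, as in this illustration; with unequal group sizes or error rates the predictive accuracies in particular would diverge from sensitivity and specificity.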

The Dependent Personality Inventory (DPI; Huber, 2005). The DPI is a 55-item questionnaire that measures seven independent factors representing various symptoms of DPD as defined by the DSM-IV, including: difficulty making decisions, assuming responsibility, difficulty expressing disagreement, difficulty initiating projects, seeking support from others and feeling helpless and alone. The original paper found the DPI to have a high internal consistency with a Cronbach’s alpha of .90.

The Five-Factor Measure of Dependent Traits (FFM DPT; Gore et al., 2012). Gore and colleagues created 12 scales to assess FFM dependent traits, including Separation Insecurity, Pessimism, Shamefulness, Helplessness, Intimacy Needs, Unassertiveness, Gullibility, Selflessness, Subservience, Self-effacing, Ineptitude, and Negligence. Internal consistencies of the scales were good, and the scales correlated with their NEO-PI-R equivalents. They also demonstrated discriminant validity with respect to other NEO traits, and incremental validity beyond their corresponding NEO-PI-R facets in explaining variance in the SNAP DPD scale.

There are also some other measures of DPD (Hirschfeld, Klerman, Gough et al., 1977; Bornstein, Geiselman, Eisenhart, & Languirand, 2002; Bornstein, Languirand, Geiselman et al., 2003).

4.10. Obsessive-Compulsive

Obsessive-compulsive PD (OCPD) is rather similar to the Axis I obsessive-compulsive disorder (OCD), and the two are therefore often confused. Indeed, it has been suggested that the evidence for the construct validity of these two scales was mixed (Phillips et al., 2010). However, whereas OCD can be defined as a mental disorder characterized by ego-dystonic, intrusive and time-consuming obsessions and compulsions, OCPD reflects an ego-syntonic personality style that includes stable traits such as perfectionism and rigidity. Assessing these disorders is difficult because of their heterogeneity and their comorbidity with each other and with other disorders.

The Five Factor Obsessive-Compulsive Inventory (FFOCI; Samuel, Riddell, Lynam, Miller, & Widiger, 2012) is a self-report measure that maps OCPD onto the FFM. The 12 dimensions of the instrument each represent a maladaptive, lower-order version of a Big Five trait. The scales’ Cronbach’s alpha levels ranged between .77 and .87. The scales were shown to hold convergent validity with the MCMI-III (r = .58, p < .01), PDQ-4 (r = .50, p < .01) and the SNAP-2 (r = .66, p < .01). The instrument was also found to hold incremental validity over the MCMI-III.

5. Discussion

This paper has been a PD assessment “housekeeping and auditing” exercise, targeting comprehensiveness rather than reflecting usage frequency or the publication track-record of instruments. Such criteria probably reflect whether a measure has been included in one of the four major longitudinal studies on PDs, resulting in an increased number of publications, more than they reflect the quality of the measure. We made the choice to go back as far as DSM-III, covering old and new measures. We also did not apply a “quality control” in terms of journal ratings, as these vary over time and have been criticised.

This review has made a number of things clear: (a) there are multiple options to assess PDs, both comprehensively and specifically, (b) the objectives of users may vary widely, from general screening to differentiated (clinical) diagnostics, (c) methods vary in terms of depth of assessment and time required, with structured interviews generally preferred over self-report inventories for clinical assessment, (d) there exist a number of “fore wash” assessment methods that may be followed by more in-depth assessment when appropriate, (e) the contexts of the assessment may also vary, ranging from clinical to more personal developmental and occupation-related questions, and finally (f) methods may differ in terms of whether they assess primarily pathological trait variance or also tap into general traits.

To facilitate choosing among measures, a decision tree was developed distinguishing three major entries, i.e. research, clinical assessment and assessment with a developmental purpose (Figure 1). Researchers can choose between categorically-based and more dimensionally-based PD measures, which usually rely on individuals’ self-reports. A similar dichotomy is available for clinical diagnostics and decision making, with the same dimensional measures available as for research purposes. Categorically-based clinical assessment may first involve a pre-screening followed by more specific PD assessment focussing on single PDs. Our review has made clear that there are various options for the majority of the PDs. A broader clinical assessment procedure may also include a comprehensive PD assessment, usually done via structured interviews, assessing both the nature and severity of personality pathology in terms of DSM-5 categorical constructs. Finally, an increasing number of psychologists are interested in using personality pathology assessment instruments and methods to identify an individual’s personal and professional needs, in order to help develop and mould the sharp sides of their personality. These are often related to subclinical forms of personality pathology, also called aberrant personality tendencies (Wille et al., 2013; De Fruyt et al., 2013ab). Given the idea of a spectrum or continuum between dimensions of general and maladaptive personality traits (De Bolle et al., 2012), and given that US legislation (Americans with Disabilities Act, ADA, 1990) prohibits the use of clinical measures in personnel selection and development assessments, FFM general trait based measures of personality dysfunction seem to be most useful. In addition, a number of DSM-IV based personality measures have been developed to understand personality functioning at work, contextualizing the item content and/or instructions with a work-frame. For these purposes too, different options are available and increasingly used.

Figure 1. Personality pathology assessment decision tree.

The purpose of the present review was not only to list the available PD measures, but also to illustrate the increased attention to and use of such assessment devices for a broader range of purposes, including clinical and non-clinical assessment and research. This expanded scope is a direct consequence of advances in the conceptualization of PDs from distinct categorical entities to the consideration of personality pathology dimensions that are more quantitatively than qualitatively different from normal trait variation. Despite the many suggestions to replace the categorical PDs by a dimensional system including more specific traits, DSM-5 continues the categorical conceptualisation of PDs, making our review of assessment methods still timely. We hope that professionals and researchers will find our review helpful in choosing an appropriate assessment method for their purposes, but will at the same time also consider the trait based system of personality pathology described in DSM-5 Section 3.


