Estimating One’s Own and Other’s Psychological Test Scores

doi:10.4236/psych.2018.98127

Psychology
Vol. 9, No. 8 (2018), Article ID: 86965, 19 pages

Adrian Furnham


BI: Norwegian Business School, Oslo, Norway

This work is licensed under the Creative Commons Attribution International License (CC BY 4.0).

http://creativecommons.org/licenses/by/4.0/

Received: July 5, 2018; Accepted: August 27, 2018; Published: August 30, 2018

ABSTRACT

This paper examines how accurately people estimate their own psychometric test results, which assess personality, intelligence, approach to learning and other factors. Seven groups of students completed a battery of power tests (general intelligence, fluid intelligence, creativity and general knowledge) and preference tests (approaches to learning, emotional intelligence, Big Five personality). Two months later, before receiving feedback on their psychometric scores, they estimated their own scores on these variables, and those of a class acquaintance whom they claimed to know well. Results from the different samples were reasonably consistent. They showed that participants could significantly predict their own Neuroticism, Extraversion and Conscientiousness scores, as well as their General, Fluid and Crystallised intelligence, Approaches to Learning, Creativity and Happiness. Correlations between estimated and test-derived scores for an acquaintance were around half those for self-estimates, and were better for personality than for ability. Participants’ self- and “other”-estimates were nearly all significantly positively correlated. The discussion considers when, if ever, self-estimated scores can be used as a proxy for test scores, and what self-estimated scores indicate. Limitations are considered.

Keywords:

Personality, Intelligence, Creativity, Learning Style, Self-Estimation, Self-Assessment

1. Introduction

Psychological studies on the validity of self-estimates go back over 100 years (Shen, 1915). Most of these studies have looked at self-estimates of ability/intelligence (Ackerman & Wolman, 2007; Chan & Martinussen, 2015; Holling & Preckel, 2005; Gold & Kuhn, 2017; Kang & Furnham, 2016; Tierney & Herman, 1973; Zell & Krizan, 2014), but others have looked at self-estimates of personality (Furnham, 1997, 2001; Ziegler, Danay, Scholmerich, & Buhner, 2010) as well as concepts like emotional intelligence (Siegling, Sfier, & Smyth, 2014). There have also been various reviews (Freund & Kasten, 2012; Furnham, 2016). Some studies have looked at people’s ability to estimate others’ scores rather than their own (Cogan, Conklin, & Hollingworth, 1915), while others have examined the stability of judgments over time (Jonsson & Allwood, 2003).

This research is concerned with which psychometric test scores people estimate more or less accurately, and why. This matters if, under particular circumstances, self-estimates are used as a proxy for actual test scores.

This paper reports seven similar studies, run on different cohorts, that examine the correlations between self- and other-estimated scores and test performance on various preference and ability tests. The methodology allows three issues to be addressed: self- versus other-estimates; self-estimates versus psychometric scores; and other-estimates versus psychometric scores.

Self-estimated and psychometrically measured IQ

A limited number of studies have investigated this issue using a fairly diverse range of measures, yet the results have been fairly consistent. De Nisi and Shaw (1977) tested 114 students on 10 different ability tests and also obtained self-ratings. All the correlations between the two scores were positive and significant, with five exceeding r = .30. They concluded that self-reports of ability cannot substitute for validated measures (i.e., IQ tests). Mabe and West (1982) later found a correlation of r = .29 between self-estimated and objective abilities. Borkenau and Liebler (1993) showed that when strangers rated the intelligence of people they saw briefly on video, the correlation between other-estimate and psychometric score was r = .43.

Several studies conducted in Great Britain revealed modest test-estimate correlations (Furnham & Rawles, 1999; Reilly & Mulhern, 1995). Furnham and Chamorro-Premuzic (2004) found a correlation of r = .30 (n = 184) between self-estimates and scores on the Wonderlic Personnel Test. In a cross-cultural study comparing 172 British and Singaporean students, Furnham and Fong (2000) found the correlation between estimated and measured IQ was r = .19 overall (British r = .14; Singaporeans r = .26). The highest correlation was for Singaporean females (r = .51) and the lowest for British females (r = .08). In a meta-review, Zell and Krizan (2014) found an overall r = .29 between self-evaluations and performance measures.

Paulus, Lysy and Yik (1998) reviewed the relevant literature and found that correlations between single-item self-reports of intelligence and IQ scores rarely exceeded r = .30 in college students. The authors concluded: “as a whole, our verdict is pessimistic about the utility of self-report as proxy measures of IQ in college samples” (p. 551). Recent studies have found similar correlations between self-estimated and psychometrically measured intelligence, supporting Paulus et al.’s (1998) conclusion (Chamorro-Premuzic et al., 2004).

Ackerman and Wolman (2007) thoroughly tested 142 mature American students on a large number of ability tests, including verbal, spatial and mathematical tests. Self-estimates were obtained both before and after the actual testing. All correlations were positive, though they varied widely (.27 to .54). Higher correlations were found when both variables were aggregated to make them more reliable: r = .33 for spatial ability and r = .44 for mathematical ability. Interestingly, participants gave lower estimates for verbal than for mathematical or spatial ability because they had better knowledge of them.
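Why aggregation can raise test-estimate correlations follows from classical test theory: unreliability in either measure attenuates the observed correlation by the square root of the product of the two reliabilities, and averaging several parallel estimates raises reliability according to the Spearman-Brown formula. The sketch below illustrates this under loudly stated assumptions: the true-score correlation (.60), the reliability of a single self-estimate (.50), the test reliability (.80) and the choice of three aggregated estimates are all hypothetical values, not figures from Ackerman and Wolman (2007) or any other study cited here.

```python
import math


def spearman_brown(single_reliability: float, k: int) -> float:
    """Reliability of the average of k parallel measures (Spearman-Brown prophecy formula)."""
    r = single_reliability
    return k * r / (1 + (k - 1) * r)


def expected_observed_r(true_r: float, rel_x: float, rel_y: float) -> float:
    """Observed correlation implied by a true-score correlation under classical test theory:
    unreliability in either measure attenuates r by sqrt(rel_x * rel_y)."""
    return true_r * math.sqrt(rel_x * rel_y)


# Hypothetical values for illustration only (not taken from any cited study).
TRUE_R = 0.60            # assumed true-score correlation between estimate and ability
SINGLE_ESTIMATE = 0.50   # assumed reliability of one self-estimate
TEST_RELIABILITY = 0.80  # assumed reliability of the ability test

# One self-estimate versus the mean of three parallel self-estimates.
single = expected_observed_r(TRUE_R, SINGLE_ESTIMATE, TEST_RELIABILITY)
aggregate = expected_observed_r(TRUE_R, spearman_brown(SINGLE_ESTIMATE, 3), TEST_RELIABILITY)
print(round(single, 3), round(aggregate, 3))  # 0.379 0.465
```

With these assumed inputs, the same underlying relationship yields a noticeably higher observed correlation once three estimates are averaged, which is the pattern aggregation is expected to produce.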

Two factors affect these correlations: whether estimates are made before or after taking the test, and which tests are taken. Correlations tend to be more modest (and often more accurate) when self-estimates are made after tests. They also tend to be more modest for crystallised than for fluid intelligence tests (Furnham, Chamorro-Premuzic, & Moutafi, 2005). In this study we examine participants’ ability to predict their own scores on a variety of IQ tests, to establish the extent to which these correlations vary. It was predicted that both self- and other-estimated IQ scores would be significantly positively correlated with actual IQ scores.

Various studies have also looked at emotional intelligence (Petrides & Furnham, 2000; Petrides, Furnham, & Martin, 2004; Siegling et al., 2014). All found significant positive correlations between self-estimates and actual scores.

Self-estimated and psychometrically assessed personality

There is a small but consistent literature on the relationship between estimates of, and scores on, psychometrically validated personality tests. Various studies have looked at participants’ ability to predict their own Extraversion and Neuroticism scores (Vingoe, 1966; Harrison & McLaughlin, 1969; Gray, 1972; Semin, Rosch, & Chassein, 1981; Blaz, 1983). Studies in this area have used a large number of personality measures, including the Fundamental Interpersonal Relations Orientation-Behaviour (FIRO-B), the Myers-Briggs Type Indicator (MBTI) (Furnham, 1990), and locus of control measures (Furnham & Henderson, 1983). Furnham (1997) used the NEO-FFI (Costa & McCrae, 1988) to measure the Big Five personality traits, and found participants were best at predicting Conscientiousness (r = .57), followed by Extraversion (r = .52) and Neuroticism (r = .51). They were least accurate at predicting their Openness-to-Experience (r = .33) and Agreeableness (r = .39) scores. Furnham and Chamorro-Premuzic (2004) looked at self-estimated and actual test-derived scores on all 30 facets of the NEO-PI-R. The most consistent correlations were for the six Conscientiousness facets (range r = .18 to r = .54; mean r = .41). Overall, the correlations for six facets (N1, N2, N3, E3, E4, C5) exceeded r = .50, while four (N5, O3, A2, A6) were non-significant. They also showed that less Agreeable and more Neurotic participants gave lower estimates of their overall intelligence.

Approach to learning

The literature on approaches to learning antedates the research on learning styles. Whereas the “style” literature is about how different people choose to process material, the “approaches” literature is much more concerned with motivation and assessment. The issue is how people approach their learning task. Murray-Harvey (1994) notes that researchers of both styles and approaches are concerned with the learning strategies students use, which are considered important attributes that students bring to any learning situation.

Researchers observed that when students were given a text to read on which they knew they would be examined, some tried to understand, contextualise and comprehend the “big picture”, while others focused on remembering what they thought were the “facts” they would be examined on. These two very different approaches have been called the deep and the surface approach. Adopting the deep approach means achieving a critical understanding and retention of concepts that are integrated into a knowledge schema and used for problem solving. The surface approach is based on pragmatic, short-term memorisation of salient facts for examination or repetition.

This study used the Biggs (1987) measure, which assesses the surface, deep and achieving approaches to learning. Because this study tested students, it was predicted that the correlations of both self- and other-estimates with actual scores would all be positive and significant.

Creativity

A few studies have investigated the relationship between self-estimated and objectively measured creativity, using different measures of creativity (Kaufman, 2006; Karwowski, 2011). In one study the correlation between self-estimated creativity and a test score was r = .27 (N = 64) (Furnham, Zhang, & Chamorro-Premuzic, 2006). Because most people believe they are creative, and because the concept is so loosely defined, we predicted a low but positive and significant correlation between self-estimates and test scores.

This study

A central question is which psychometric test scores people are able to predict with any degree of accuracy. It could be assumed that people are able to predict scores for dimensions that they understand or where they have some frame or schema of reference. If, for instance, a person is required to estimate his or her Extraversion or Conscientiousness score accurately, he or she would have to be familiar with the psychological concept, be clear about the situations or phenomena to which it applied and be aware of how he or she compared with population norms for Conscientiousness and Extraversion. Thus, to do this task well, a participant needs to access and use a cognitive category or framework concerning personality traits.

This study moves the literature forward in three ways. First, while it replicates earlier studies on measures of intelligence and personality, it uses new measures including emotional intelligence, creativity, happiness and approaches to learning. Second, it reports seven cohorts of students to examine the replicability of the results. Rather than combining the samples (tested in different years) on the measures they shared, we treated them as seven replication studies. Third, we used five different measures of intelligence in all, to see whether the correlations differed significantly as a function of the test.

2. Method

2.1. Participants

Participants were undergraduate students in London. Study 1: N = 72, 55 females, median age 19 yrs; Study 2: N = 95, 71 females, median age 20 yrs; Study 3: N = 91, 74 females, median age 19 yrs; Study 4: N = 118, 90 females, median age 20 yrs; Study 5: N = 106, 85 females, median age 19 yrs; Study 6: N = 102, 71 females, median age 19 yrs; Study 7: N = 96, 62 females, median age 20 yrs. All the participants were fluent English speakers and took part in this study as part of their coursework.

2.2. Measures

Personality. The NEO Five-Factor Inventory (NEO-FFI; Costa & McCrae, 1992), the short form of the NEO Personality Inventory-Revised. This 60-item, untimed questionnaire measures the “Big Five” personality factors. The manual shows impressive indices of reliability and validity. Test-retest reliabilities range from r = .71 for Agreeableness to r = .80 for Neuroticism.

Approaches to Learning. Study Process Questionnaire (Biggs, 1987). This 42-item questionnaire yields six scores: three approaches, each with two components. The first component is learning motive (why students learn); the second is learning strategy (how students learn). The three approaches are surface (reproducing what is taught to meet the minimum requirement), deep (a real understanding of what is learned), and achieving (designed specifically to maximise grades). The questionnaire has repeatedly been shown to have satisfactory internal reliability, test-retest reliability (r = .82), and content, construct and predictive validity.

Emotional Intelligence (EQ). Trait Emotional Intelligence Questionnaire (TEIQue; Petrides & Furnham, 2003). Trait EI “refers to a constellation of behavioural dispositions and self-perceptions concerning one’s ability to recognize, process and utilize emotion-laden information. It encompasses various dispositions from the personality domain, such as empathy, impulsivity and assertiveness as well as elements of social intelligence and personality intelligence, the latter two in the form of self-perceived abilities”. Studies report test-retest reliabilities between r = .74 and r = .84.

Verbal Reasoning. The Baddeley Reasoning Test (Baddeley, 1968). This 64-item test can be administered in 3 minutes and measures fluid intelligence (Gf) through logical reasoning. Scores can range from 0 to 64. Each item is presented as a grammatical transformation that has to be answered “true” or “false”, e.g. “A precedes B: AB” (true), “A does not follow B: BA” (false). The test has been used in several previous studies (e.g. Furnham & McClelland, 2010) to obtain a quick and reliable indicator of people’s intellectual ability. It has a test-retest reliability of r = .80.

General Knowledge. General Knowledge Test (Von Stumm, 2009). This 72-item questionnaire measures knowledge in six areas: literature, general science, medicine, games, fashion and finance. Each area is measured by 10 items, and each correct response is awarded 1 point (in a few cases there are two correct responses rather than one). The internal reliability of the test for the present sample was α = .78.
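The internal reliability reported here is Cronbach’s alpha, α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The minimal sketch below computes it from a respondents-by-items score matrix; for items scored 0/1 (correct/incorrect), as on this test, it reduces to KR-20. The function name and the tiny example matrices are hypothetical illustrations, not the scoring procedure or data of the present study.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Population variances are used for both items and totals; because the same
    divisor appears in numerator and denominator, sample variances give the same alpha.
    """
    k = len(item_scores[0])
    if k < 2 or any(len(row) != k for row in item_scores):
        raise ValueError("every respondent needs the same number (>= 2) of item scores")
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 0/1 responses: two perfectly consistent items, then two unrelated items.
print(cronbach_alpha([[1, 1], [0, 0], [1, 1], [0, 0]]))  # 1.0
print(cronbach_alpha([[1, 1], [0, 1], [1, 0], [0, 0]]))  # 0.0
```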

Creativity. The Barron-Welsh Art Scale (Barron & Welsh, 1952). This scale consists of 86 black-and-white pictures, numbered and arranged eight to a page. Participants are instructed to make quick, instinctive, dichotomous judgements about whether they like or dislike each picture. The test requires no language skills, can be used with children and adults, is simple and does not require extensive concentration. The test-retest reliability is r = .81.

Happiness. Oxford Happiness Questionnaire (Hills & Argyle, 2002). It measures trait happiness. This 29-item scale was devised as the “opposite” of the Beck Depression Inventory. It was one of the first measures used in the Positive Psychology movement, and a short version exists. Its psychometric properties are good, though there is some question about its dimensional structure.

Fluid intelligence. Advanced Progressive Matrices Set II (Raven, 1938). This 36-item test is possibly the best known in psychology. Participants are shown a diagram of nine complex shapes with one missing, and have to choose, from eight options, the figure that logically fits the missing space. The test has been extensively validated against other measures of fluid and crystallised intelligence.

General Intelligence. The Wonderlic Personnel Test (Wonderlic, 1990) . This 50-item test can be administered in 12 minutes and measures general intelligence. Scores can range from 0 to 50. Items include word and number comparisons, disarranged sentences, serial analysis of geometric figures and story problems that require mathematical and logical solutions. The test has impressive norms and correlates very highly (r = .92) with the WAIS-R total IQ score.

Arithmetic. Mental Arithmetic (Lock, 2008). This 30-item test requires a person to make 10 arithmetic calculations (multiplication, division, addition, subtraction) per item. It is meant to be a mental test, though some people do attempt written calculations. Ten minutes were allowed for administration.

2.3. Procedure

Participants in each study were tested simultaneously in a large lecture theatre in the presence of five examiners, who ensured the tests were appropriately completed. They completed the tests in two sessions, each lasting around 40 minutes. Two months later, in a lab setting, the tests were explained to them: what each factor measured (i.e. the full definition based on the manuals). They were also shown population norms and means, as well as the means for their group. They were asked to estimate their own (and their friend’s) score on the same scale used to report the results for each test. For example, for the Wonderlic they were shown the normal distribution of scores from over 100,000 test-takers, showing the range (i.e. 0 to 50), the mean score and one standard deviation above and below the mean. They were also given reminders of what the tests looked like to refresh their memory. They were asked to nominate the person in the class whom they knew best (i.e. their “friend”) and to make estimates for that person too. They also indicated on a 5-point scale, from “not much” to “extremely well”, how well they knew this person. The task thus involved around 30 estimates. Immediately after completing the exercise they received their test scores, which were explained in detail. Two weeks after making their estimates, they also saw the correlational results reported in this study.
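The design produces, for each scale, four correlations that the Results report: self-estimate with own test score, other-estimate with the friend’s test score, own score with the friend’s score, and self-estimate with other-estimate. A minimal sketch of that bookkeeping is given below, assuming a hypothetical data layout in which position i in every list refers to participant i and the friend they nominated; the function names and the five-pair example data are invented for illustration. The t statistic (df = n − 2) is the standard test of whether a Pearson r differs from zero; the paper does not state which significance test the authors used.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length lists with at least 3 scores")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a score list with zero variance has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic (df = n - 2) for testing whether a correlation differs from zero."""
    if abs(r) >= 1:
        raise ValueError("t is undefined for |r| = 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def four_correlations(self_est, other_est, own_score, friend_score):
    """Per-scale correlations; index i is participant i and the friend they rated (hypothetical layout)."""
    return {
        "self-estimate vs own score": pearson_r(self_est, own_score),
        "other-estimate vs friend's score": pearson_r(other_est, friend_score),
        "own score vs friend's score": pearson_r(own_score, friend_score),
        "self-estimate vs other-estimate": pearson_r(self_est, other_est),
    }


# Invented Wonderlic-style scores for five participant/friend pairs (illustration only).
results = four_correlations(
    self_est=[25, 30, 20, 35, 28],
    other_est=[27, 22, 30, 26, 33],
    own_score=[24, 31, 22, 33, 26],
    friend_score=[28, 20, 29, 24, 35],
)
for label, r in results.items():
    print(f"{label}: r = {r:.2f}")
```

For a sample the size of Study 1 (N = 72), for instance, `t_for_r(0.30, 72)` gives a t of about 2.63 on 70 degrees of freedom.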

3. Results

Study 1: Table 1 presents the descriptive statistics and correlations. Twelve of the 14 self-estimate/actual score correlations were significant, but only 6 of the other-estimate/actual score correlations. The highest correlations were for Extraversion and the lowest for Emotional Intelligence. Correlations between participants’ and their friends’ actual test scores indicated that the pairs were significantly alike only in their Emotional Intelligence, Extraversion and Neuroticism scores. By contrast, the self-other estimate correlations indicated that participants believed they were alike on 10 scales, particularly General Knowledge and Openness.

Study 2: Table 2 shows the results of the correlational analysis. Of the 14 self-estimate/actual score correlations, 12 were significant, but two (Openness and Agreeableness) were negative. The highest was for Verbal Reasoning, followed by the deep approach to learning and then Extraversion. Six of the 14 other-estimate/actual score correlations were significant, but only two of the self/other actual score correlations. Ten self/other estimate correlations were significant, and all were positive.

Study 3: Table 3 shows that all but one of the self-estimate/actual score correlations were significantly positive, with all the intelligence test correlations exceeding r = .50. Six of the 17 other-estimate/actual score correlations were significant. In this study seven of the self/other actual test score correlations were significantly positive, indicating a similarity in personality, ability and approach to learning between the participants. As in the other studies, self- and other-estimates were nearly always significantly and positively related.

Study 4: Table 4 shows that, with one exception (Openness), all the self-estimate/actual score correlations were significant, with five exceeding r = .50 (Neuroticism, Extraversion, Conscientiousness, Verbal Reasoning and General Knowledge). Nine of the 13 other-estimate/actual score correlations were significant, with Extraversion showing the highest correlation. Seven of the self/other actual score correlations were significant, particularly General Knowledge. All but one of the self/other estimate correlations were significant, all exceeding r = .30.

Study 5: In all, 11 of the 13 self-estimate/actual score correlations were significant, the highest being for Happiness, Extraversion and Conscientiousness, with Agreeableness showing a negative relationship (see Table 5). Seven other estimate/actual correlations were significant and two of them were significantly