by a literal banana
I have been trying to understand the “lexical hypothesis” of personality, and its modern descendant, the Five Factor Model of personality, for several months. In that time, I have said some provocative things about the Big Five, and even some unkind things that I admit were unbecoming to a banana. Here, I wish to situate the Five Factor Model in the context of its historical development and modern use, and to demonstrate to the reader the surprising accomplishment that it represents for the field of psychology.
In personality research, the “lexical hypothesis” refers to a hypothesis attributed to Francis Galton (1884). Galton supposed that each human language would reflect important realities of human character within that language and culture. In particular, he noted that the words used to evaluate character and personality are very numerous (he estimated over a thousand, using a thesaurus), and often overlap in meaning.
But Galton immediately left his thesaurus behind, readily admitting the impossibility of defining any aspect of character. Rather, he turned to experimental means of testing character in various ways, and insisted that no particular map or model of personality is needed as a starting point.
Nowhere in his essay does Galton propose surveys as a means for studying character. He would probably regard such methods as unscientific, as indicated in his final paragraph:
[C]haracter ought to be measured by carefully recorded acts, representative of the usual conduct. An ordinary generalisation is nothing more than a muddle of vague memories of inexact observations. It is an easy vice to generalise. We want lists of facts, every one of which may be separately verified, valued and revalued, and the whole accurately summed. It is the statistics of each man’s conduct in small every-day affairs, that will probably be found to give the simplest and most precise measure of his character.
The methods that Galton proposed are exclusively non-linguistic. For instance, he commented that observing children involved in play quickly gives one an idea of each child’s emotional expression. Galton’s proposed methods prefigure both hidden camera prank shows and Garfinkel’s “breaching experiments:”
I will not attempt to describe particular games of children or of others, nor to suggest experiments, more or less comic, that might be secretly made to elicit the manifestations we seek, as many such will occur to ingenious persons. They exist in abundance, and I feel sure that if two or three experimenters were to act zealously and judiciously together as secret accomplices, they would soon collect abundant statistics of conduct. They would gradually simplify their test conditions and extend their scope, learning to probe character more quickly and from more of its sides.
Other methods Galton expressed enthusiasm for include heart rate measurement (he wore a home-brew heart-rate-measuring apparatus while he delivered the lecture that makes up the text) and methods discoverable from personal context (giving an example from Benjamin Franklin, of a man with one attractive and one deformed leg, who kept track of which leg his interlocutors paid attention to, as a gauge of their optimism or pessimism). Galton would be surprised, I think, to find that the most promising and scientific theory of personality in the twenty-first century is premised entirely on survey responses as its “facts.”
Early in the study of personality, there was a major shift of meaning in the lexical hypothesis. At first, the thesaurus and the word list were its tools of study (e.g., Allport & Odbert, 1936); the idea was to find common factors of meaning in the words themselves. Of course, there is no particularly scientific way to decide how much the word “annoying” is the same as “obnoxious,” or how much either is the same as “low-status.” The major shift was to begin to measure the correlations of an entirely different construct: the correlations of the words when used to describe a particular person. That is, rather than trying to measure the underlying meaning of words, researchers began to measure the degree to which different words were applied to the same person. “Sameness” and “correlation” were no longer distinguishable concepts for the methods.
Initially, lists of adjectives, and eventually, short survey questions, were administered to subjects, who described either a person they knew or themselves. When the responses were subjected to factor analysis—a mathematical analysis to reveal the structure of correlations between responses—a varying number of factors emerged, depending on the methods and the researchers and the questions and the subjects, and these factors were given varying names. Since the early 1990s, the Five Factor Model has been dominant, although the names of the factors vary somewhat even today. The acronym OCEAN is used for the traits: Openness to experience (sometimes called “intellect” or “imagination” or “open-mindedness”), Conscientiousness, Extraversion (sometimes called “surgency”), Agreeableness, and Neuroticism (sometimes called “negative emotionality” or “emotional stability” reversed).
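Factor analysis itself is not mysterious. A toy sketch below (simulated data, hypothetical trait names, and a plain eigendecomposition of the correlation matrix standing in for the more elaborate extraction and rotation methods actually used) shows the basic mechanism by which factors “emerge” from correlated survey responses:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Two hypothetical latent traits (say, "sociability" and "curiosity").
factors = rng.normal(size=(n, 2))

# Six survey items: the first three load on trait 1, the last three on trait 2.
loadings = np.array([
    [0.8, 0.0],
    [0.7, 0.0],
    [0.6, 0.0],
    [0.0, 0.8],
    [0.0, 0.7],
    [0.0, 0.6],
])
items = factors @ loadings.T + rng.normal(scale=0.5, size=(n, 6))

# Factor extraction in miniature: eigendecompose the item correlation matrix.
corr = np.corrcoef(items, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]

# Kaiser criterion: retain factors with eigenvalue > 1.
n_factors = int((eigenvalues > 1).sum())
print(n_factors)  # the two-factor structure we built in falls back out
```

Note that the analysis can only recover structure that is present in the correlations of the responses; it is silent on what the responses themselves measure.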
Today, the five traits are measured with various survey instruments, with five questions on the shortest version (one for each aspect) and sixty questions on a common long-form version (that used by Soto, 2019). Survey instruments are validated in a number of ways: how much their responses correlate between testings (test-retest reliability, with astrological sign as the gold standard), how much different raters agree using the criteria (inter-rater reliability), and a nebulous concept of construct validity, which sometimes includes scientific gestures designed to ensure that the instrument measures what it purports to measure. Many papers present elaborate numerical artifacts of validation, and I have found that some characterize the validity of their instruments as “good” without providing an indication of what would be “not good enough.” From a brief review of dozens of validated instruments in social psychology, it seems to me that it is relatively easy to “validate” meaningless instruments. As long as the mathematical bona fides are present, the construct need not be meaningful in other ways. (The reader who is rightly suspicious of my broad and unsourced claims may wish to search Google Scholar with variations of “scale,” “inventory,” and “survey instrument,” and examine the results critically. The naming of factors is often a particularly interesting step.)
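To see how easy the mathematical bona fides are to produce, here is a minimal internal-consistency check, Cronbach’s alpha, computed on simulated data where four items are near-rephrasings of one another (the item texts and scale are hypothetical; the point is that near-duplicate items guarantee a high reliability coefficient):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-questions matrix of responses."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(1)
n = 500
trait = rng.normal(size=n)

# Four items that are near-rephrasings of one another ("I feel inspired",
# "I experience inspiration", ...): the same signal plus a little noise.
rephrasings = trait[:, None] + rng.normal(scale=0.4, size=(n, 4))

alpha = cronbach_alpha(rephrasings)
print(round(alpha, 2))  # comfortably above the conventional .7 cutoff
```

A high alpha shows that the items covary; it does not show that the construct they jointly measure is meaningful.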
The strong claim made by advocates of the Five Factor Model is that any set of questions describing a human being, administered to subjects and the responses subjected to factor analysis, will reveal the same five factors (paraphrasing Jordan Peterson in this video, around 11:10-16:40). This strong claim, though dubious in a number of respects, is a major part of the basis for the scientific legitimacy of the Big Five. It is interesting to see which aspects of the strong claim are admitted to be false by advocates of the Big Five, and how much is excused on the grounds that at least it’s something. The Five Factor Model is not perfect, advocates grant, but it is better than nothing. It is not clear how they measure “better than nothing;” this is a potentially interesting hypothesis in need of precisification, perhaps.
The Big Five exist as a special, scientifically validated property of language and survey methods, and that is one basis for their legitimacy. The other basis for the legitimacy of the Five Factor Model is its replicable correlation with consequential life outcomes. We know that the Big Five are not merely phantoms that fall out of a certain analysis of a certain use of language in WEIRD college students, because these traits are reliably correlated with things we care about.
One of the most interesting features of the Big Five is the nature of its scientific evidence. Observe what is held out as a “replication” of the theory, and you will discover the theory’s true nature. The most impressive aspect of the ongoing accomplishment of the Five Factor Model is the degree to which it deflects curiosity about its underlying meaning with rituals of scientific validation, regardless of the rituals’ appropriateness in context. Since “replication” is the scientific ritual most recently shown to detect poor science in psychology, being shown to reliably “replicate” is a huge boost to the credibility of a theory.
The interesting thing about the Five Factor Model is what it gets away with, in terms of being considered a theory, even though it is not causal, and makes no predictions. What counts as a “replication” of the Five Factor Model, as in Soto (2019), is the following: a correlation is found between one or more factors of the Five Factor Model and some other construct, and that correlation is found again in another sample, regardless of the size of the correlation. In almost all cases, and in 100% of Soto (2019)’s measures, the construct compared to a Big Five factor is derived from an online survey instrument.
What counts as a “consequential life outcome” is also fascinating. In most cases, the life outcome constructs are vague abstractions measured with survey instruments, much like the Big Five themselves. For instance, the life outcome “Inspiration” is measured with the Inspiration Scale, which asks the subject in four ways how often and how deeply inspired they are. Amazingly, this scale correlates a little bit with Extraversion and with Open-mindedness. Do these personality traits “predict” the life outcome of inspiration? Is “Inspiration” as instrumentalized here meaningfully different from the Big Five constructs, such that this correlation is meaningful?
Compare the items for the construct “Inspiration” with the items for Extraversion and Open-mindedness used in Soto (2019):
Inspiration Scale Items
- I experience inspiration.
- Something I encounter or experience inspires me.
- I am inspired to do something.
- I feel inspired.
Extraversion Items
- 1. Is outgoing, sociable.
- 6. Has an assertive personality.
- 21. Is dominant, acts as a leader.
- 41. Is full of energy.
- 46. Is talkative.
- 56. Shows a lot of enthusiasm.
- 11. Rarely feels excited or eager.
- 16. Tends to be quiet.
- 26. Is less active than other people.
- 31. Is sometimes shy, introverted.
- 36. Finds it hard to influence people.
- 51. Prefers to have others take charge.
Open-mindedness Items
- 10. Is curious about many different things.
- 15. Is inventive, finds clever ways to do things.
- 20. Is fascinated by art, music, or literature.
- 35. Values art and beauty.
- 40. Is complex, a deep thinker.
- 60. Is original, comes up with new ideas.
- 5. Has few artistic interests.
- 25. Avoids intellectual, philosophical discussions.
- 30. Has little creativity.
- 45. Has difficulty imagining things.
- 50. Thinks poetry and plays are boring.
- 55. Has little interest in abstract ideas.
How surprised should we be that items like “I am inspired to do something” correlate with items like “Is full of energy,” or that “I experience inspiration” correlates negatively with “Has few artistic interests”? “Replication” seems like a dignified term for asking the same questions in different ways.
One of the more surprising correlations replicated by Soto (2019) is between “Agreeableness” (Big Five) and “Heart Disease” (measured by a questionnaire about chest pain). The correlation is only .04, compared to the original .15. Even so, since the most important questions on the 1977 questionnaire ask the subject to agree with various statements about chest pain, rather than phrasing them in the negative, a tiny correlation with agreeableness would not be surprising. It is not clear what scientific hypothesis—particularly a causal hypothesis—might be advanced here. But it counts as a replication.
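For scale, a back-of-envelope sketch (using the normal approximation to the usual two-tailed test at α = .05, so the exact cutoffs are approximate) shows how large a sample is needed before a correlation this small clears the conventional significance threshold at all:

```python
import math

def min_n_for_significance(r: float, z: float = 1.96) -> int:
    """Smallest N at which |r| clears the two-tailed .05 threshold,
    using the approximation r_crit = z / sqrt(N - 2)."""
    return math.ceil((z / r) ** 2) + 2

print(min_n_for_significance(0.15))  # original effect (.15): N of about 170
print(min_n_for_significance(0.04))  # replication effect (.04): N in the thousands
```

With samples that large, almost any nonzero correlation between two survey instruments will eventually be declared “significant,” which is part of why significance alone says little about importance.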
Soto (2019) reports that Negative Emotionality (sometimes called Neuroticism) correlates strongly with DSM-III Depression and Anxiety. It is hardly surprising that a subject answering questions about negative emotions would answer questions related to depression and anxiety similarly, since many of them are functionally the same questions. It is also interesting that a measure of supposedly stable personality traits correlates with a measure of supposedly pathological disease symptoms. Nonetheless, the correlation is treated as an important replication of the predictive power of the Big Five.
Many of the “consequential life outcomes” appear to be rephrasings, or aspects, of the personality factors they correlate with: “Existential or phenomenological concerns” correlates with Open-mindedness, “Existential well-being” and “Subjective well-being” correlate (negatively) with Negative emotionality, and “Dating variety” correlates with Extraversion. Interestingly, “Occupational performance” correlates only .03 with Conscientiousness (Soto 2019), even though the questions on the two surveys seem to overlap more than that. Survey statements like “I do a thorough job” or “I tend to be lazy” seem like they should predict job performance, and this prediction is at the core of the claims of proponents of the Big Five. But despite the inanity of the purported correlation, it is not in fact always easy to detect. Salgado’s (2002) meta-analyses, for instance, found that Conscientiousness modestly correlates with “deviant behavior” (such as theft), but does not correlate with absenteeism or accidents.
On the other hand, Big Five Conscientiousness was not found to correlate with mask wearing in a sample of thousands in Spain during the coronavirus epidemic (Barceló & Sheen, 2020). This was not treated by the authors as any kind of falsification of the Big Five, or even evidence against it. The abstract noun “conscientiousness” has a rich meaning, only part of which is captured by the Big Five, and only a tinier part of which is captured by the two-question methodology used here (“does a thorough job” and “tends to be lazy”). But Conscientiousness is often correlated with health behaviors, and is often said to predict them with various strengths, even though the questions in the survey focus on job performance and tidiness. For instance, Tiainen et al. (2013) found that Conscientiousness correlated with higher fruit intake, at least in women, in a food questionnaire study. (Extraversion was associated with higher meat intake in women, and each personality trait was found to have some association with some food type in some gender.) Conscientiousness has been linked to the intention to use sunscreen by beachgoers in the UK, although not among beachgoers in New Zealand (Kouzes et al., 2017). Given the loose standards at play, it wouldn’t have been too surprising to discover a relationship between Conscientiousness and mask wearing, or Conscientiousness and any other allegedly health-related behavior.
The methodology of the Spain mask study was poor (asking about mask use without asking about likelihood of leaving the house at all, for example), but the quality of almost all studies in social science is poor, and it is not worse than most studies I have encountered in support of the Big Five. From a naive perspective, it is surprising that poor-quality studies with no practical power to falsify anything are still performed. But this is exactly the genius of the thing. What is interesting is the degree to which the Five Factor Model is insulated from falsification. It makes no causal predictions; others may make predictions about correlations, and these may replicate, or not. The only content of the Five Factor Model, in the sense of making claims, seems to be that the five factors are important somehow, and that their importance is specially validated by mathematical methods using survey instrument responses. That they correlate with responses on surveys ostensibly measuring other things is taken as evidence for their importance, but rarely is failure to find such evidence counted against their validity. There is an old saying that a stopped clock is right twice a day; it is not clear how often the Big Five “clock” is “right.”
As a novice to the study of the Big Five, I found that I had many misconceptions. For example, I thought that the Big Five were considered to be universal, in some sense, and not just descriptive of WEIRD college students. But in fact, it is not simply that the Big Five factors fail to fall out of the analysis of large surveys in other languages and cultures. They don’t even fall out of Big Five-specific questions administered in non-WEIRD populations (Gurven et al., 2013). Many of the concepts mentioned in the Big Five survey instruments do not even exist in non-WEIRD languages as such, particularly abstract nouns (Gurven et al., 2013). Of course a concept that does not exist across cultures could not form the basis for a universally important personality trait, as measured by language.
Second, I imagined that the Big Five were based on large and extremely inclusive sets of survey items. I admit that I hadn’t really thought about the abstract noun “personality.” What counts as personality, and what doesn’t? Apparently, the exact nature of the questions in the survey seems to matter a great deal. Saucier & Srivastava (2015) say that the appearance of the Big Five factors “is clearly contingent on one’s variable-selection procedure,” noting that multiple particularly broad question databases have failed to produce the Big Five as the top five traits. The Big Five are not a property that emerges from all sets of questions describing humans, as the strong claim would have it, but rather from a subset of questions whose nature and selection procedure is not always clear. The consequences of this are unknown.
Third, I’d thought that the Big Five were relatively stable across the life course; however, longitudinal studies of several cohorts of adults, born between 1914 and 1960, revealed that most traits changed over the life course in distinct patterns (Roberts et al., 2006), with social dominance (an aspect of extraversion), agreeableness, conscientiousness, and emotional stability increasing in adulthood and/or later life. One interpretation is that the positive traits identified by the Big Five are traits of adulthood, or reflections of having control over one’s environment, or something like that.
Finally, I’d thought that the traits themselves were orthogonal, in that they were genuinely separate traits that didn’t covary much. I’d thought that this was a major focus of the factoring process, and an aspect of the traits’ specialness and validity. However, the traits covary a great deal. Lukaszewski et al. (2017) found positive mean inter-factor correlations for every country in a large international sample, and an especially high value for Tanzania, the outlier apparently driving their result (that complex societies decrease trait covariance through specialization). Nor is it unusual to find big correlations between the factors; in a large sample of twins, Shane et al. (2010) found correlations of .39 between Extraversion and Openness to Experience, and .30 between Emotional Stability and Agreeableness. Chang et al. (2012) address the problem of the non-orthogonality of the Big Five, and report that even with a methodological correction to eliminate “methods bias,” they could not eliminate correlations between many factors, suggesting some may be “redundant.”
If the Big Five are not universal, stable, or orthogonal, what good are they? They have a perfectly clear use. They replicate: the answers to many other survey instruments can be found to correlate with the Big Five survey responses, in multiple samples of survey-takers. To complain that the Big Five are meaningless is somewhat unscientific. They have a very specific meaning within the language game they belong to, and they are popular and memetically successful tools within that sphere.
The Big Five are, in a sense, protected from falsification. They make no predictions; there is no underlying causal model. As I understand it, no study could be devised to prove that the Big Five aren’t real, because they make no formal pretense to reality. They are innocent mathematical constructs that fall out of particular survey instruments administered to particular populations.
Critics of the Big Five are almost always proponents of personality factor models with other numbers attached. Few criticize the connection between the survey instruments and the underlying reality. Consider each survey question. What does it ask the respondent for, other than “a muddle of vague memories of inexact observations,” as Galton put it? Note how every single question may be responded to with “compared to what?” Context, and not just what we might narrowly think of as personality, is relevant in 100% of the questions. Certainly the questions measure something; it is not at all clear what that something is, and the nature of that something is rarely investigated.
Heene (2013) puts the matter rather strongly, in regard to psychological methods that allow the researcher to slip out of the reach of falsification:
[T]he tools of mainstream psychology such as [Structural Equation Modeling] and [Item Response Theory] make exactly these strong assumptions about the quantitative structure of psychological attributes. But avoiding any tests of quantitative measurement but applying methods making the assumption of quantity appears to be nothing more than a self-delusion that one bears something valuable instead of being in fact empty-handed. This all too strong tendency to avoid falsification is probably deeply rooted in the scientifically unhealthy political/economical aspiration of psychology which keeps the machine for paper-producing and grant-funding well-oiled but also leading to a severe publication bias….[T]he possibly best evidence of my claims comes from a logical argument: has anyone ever seen articles using SEM, IRT, or Rasch models in which the author admitted the falsification of his/her hypotheses? On the contrary, it appears that stringent model tests are mostly carefully avoided in favor of insensitive “goodness-of-fit indices.” (Citations omitted.)
Holtz & Monnerjahn (2017), reviewing psychology textbooks, conclude that Karl Popper’s ideas of falsification have had “little to no traceable influence on the epistemology and practice of social psychology.” Given social psychology’s newfound conversion to the religion of replication, a theory devoid of causal claims, and therefore predictions, is the perfect theory: it can never be falsified.
But the true power of the Five Factor Model is not just its power to replicate, but its power to bless the measurement of any correlation (or lack thereof) between itself and any construct as valid science. For example, Asselmann & Specht (2020) measured the Big Five traits of subjects multiple times in a longitudinal study, some of whom had children during the study period, and some of whom remained childless. They predicted, based on previous studies, that three traits would change with parenthood, or predict parenthood: conscientiousness, agreeableness, and emotional stability. In fact, they found that exactly the other two traits (extraversion and openness) correlated with parenthood: more extraverted, less open people selected into parenting by a tiny amount, and people became less open and less extraverted after becoming parents. But the authors conclude their abstract triumphantly:
Taken together, our findings suggest that the Big Five personality traits differ before and across the transition to parenthood and that these differences especially apply to openness and extraversion.
The Big Five survey instrument doesn’t seem to measure the same changes from study to study, but this is taken as support for the results of the latest study. The underlying construct is not questioned. Simply measure the Big Five and report how they correlate with literally anything else, or even with themselves at different points in time, and you’ve performed socially valid science, regardless of your hypothesis and your results. Such is the power of the Big Five.
Is this “better than nothing”? For now, the “better than nothing” of the Big Five is its ongoing use in publishing psychology papers. It is certainly better than nothing for psychology researchers who want to publish research using cheap methods that no one will question (except rude bananas on the internet). But from an outside perspective, the perspective of one who might hope to get knowledge about the world from science, an unfalsifiable theory using methods that evade falsification becoming the dominant paradigm is hardly “better than nothing.”
Those who are skeptical of the Enneagram are usually Type 6, and those who are skeptical of astrology are usually Tauruses. Similarly, those who criticize the Big Five are typically low on extroversion, high on conscientiousness, low on agreeableness, and high on neuroticism (openness to experience can go either way, I suppose). From the perspective of the social psychology of personality, this essay is a glowing partial replication of the Big Five!
I predict that the Five Factor Model of personality has many years of life in it yet. Perhaps it will endure for decades, producing replicable findings of dubious significance, and meanwhile crowding out creative and meaningful research into how people work. Perhaps the most important and surprising accomplishment of the Five Factor Model is hiding the fact that such research is not taking place within the field of social psychology.
Allport, G. W., & Odbert, H. S. (1936). Trait-names: A psycho-lexical study. Psychological Monographs, 47(1), i.
Asselmann, E., & Specht, J. (2020). Testing the social investment principle around childbirth: Little evidence for personality maturation before and after becoming a parent. European Journal of Personality.
Barceló, J., & Sheen, G. (2020). Voluntary adoption of social welfare-enhancing behavior: Mask-wearing in Spain during the COVID-19 outbreak. Preprint at OSF: https://osf.io/preprints/socarxiv/6m85q/.
Chang, L., Connelly, B. S., & Geeza, A. A. (2012). Separating method factors and higher order traits of the Big Five: A meta-analytic multitrait–multimethod approach. Journal of Personality and Social Psychology, 102(2), 408.
Galton, F. (1884). Measurement of character. Fortnightly Review, 36(212), 179-185.
Gurven, M., Von Rueden, C., Massenkoff, M., Kaplan, H., & Lero Vie, M. (2013). How universal is the Big Five? Testing the five-factor model of personality variation among forager-farmers in the Bolivian Amazon. Journal of Personality and Social Psychology, 104(2), 354.
Heene, M. (2013). Additive conjoint measurement and the resistance toward falsifiability in psychology. Frontiers in Psychology, 4, 246.
Holtz, P., & Monnerjahn, P. (2017). Falsificationism is not just ‘potential’ falsifiability, but requires ‘actual’ falsification: Social psychology, critical rationalism, and progress in science. Journal for the Theory of Social Behaviour, 47(3), 348-362.
Kouzes, E., Thompson, C., Herington, C., & Helzer, L. (2017). Sun Smart Schools Nevada: Increasing knowledge among school children about ultraviolet radiation. Preventing Chronic Disease, 14.
Lukaszewski, A. W., Gurven, M., von Rueden, C. R., & Schmitt, D. P. (2017). What explains personality covariation? A test of the socioecological complexity hypothesis. Social Psychological and Personality Science, 8(8), 943-952.
Roberts, B. W., Walton, K. E., & Viechtbauer, W. (2006). Patterns of mean-level change in personality traits across the life course: A meta-analysis of longitudinal studies. Psychological Bulletin, 132(1), 1.
Salgado, J. F. (2002). The Big Five personality dimensions and counterproductive behaviors. International Journal of Selection and Assessment, 10(1-2), 117-125.
Saucier, G., & Srivastava, S. (2015). What makes a good structural model of personality? Evaluating the Big Five and alternatives. In APA handbook of personality and social psychology, Vol. 4: Personality processes and individual differences (pp. 283-305). American Psychological Association.
Shane, S., Nicolaou, N., Cherkas, L., & Spector, T. D. (2010). Genetics, the Big Five, and the tendency to be self-employed. Journal of Applied Psychology, 95(6), 1154.
Soto, C. J. (2019). How replicable are links between personality traits and consequential life outcomes? The Life Outcomes of Personality Replication Project. Psychological Science, 30(5), 711-727.
Tiainen, A. M. K., Männistö, S., Lahti, M., Blomstedt, P. A., Lahti, J., Perälä, M. M., … & Eriksson, J. G. (2013). Personality and dietary intake: Findings in the Helsinki Birth Cohort Study. PLoS ONE, 8(7), e68284.
18 thoughts on “The Ongoing Accomplishment of the Big Five”
I love this essay. I am a longtime skeptic of personality testing instruments. They are sort of germane to my work insofar as I do a great deal of executive coaching. And, through no fault of mine or others in my professional world, our work seems to be viewed as adjacent to clinical psychology. It is not, and I am not even remotely in the psychology profession.
Anyway, with respect to personality testing, and its ubiquitous use in organizational design, HR, training and development, etc., I have been arguing for years that they may as well just use astrological charts for all of their accuracy or utility.
So, imagine my delight at this sentence: “Those who are skeptical of the Enneagram are usually Type 6, and those who are skeptical of astrology are usually Tauruses.”
A terminological note: You seem to treat personality psychology as a subset of social psychology, but while these two disciplines are sometimes organizationally intertwined, their approaches to understanding human behavior are quite distinct. In their purest forms, of course, social psychology is all about situations, with no intrapersonal stability, while personality psychology is all about traits that are stable regardless of situation. Methodologically, the two are quite different as well: social psychology is generally experimental, while personality psychology is correlational. The results of personality psychology are also much more replicable than those of social psychology (despite your cherry-picked examples). Personality psychology is concerned with measurement, while social psychology pays little attention to it.
You also claim that personality psychology is crowding out “creative and meaningful research into how people work”, but what exactly would that other research be? Hopefully not something like conjoint measurement theory–given that conjoint measurement has contributed absolutely nothing to understanding human behavior in the 50+ years of its existence, someone like Heene is in no position to criticize mainstream psychologists for getting paid for doing frivolous things!
“The results of personality psychology are also much more replicable than those of social psychology (despite your cherry-picked examples). ”
The writer specifically addressed this “replicability.” Did you read the piece or just skim it and get shitter shattered?
I enjoyed this piece! One thing that will further accelerate the use of BFAS is the adoption of IBM Watson’s text analysis API, which predicts big five from text. The accuracy and validity of this technique remain questionable, but I found researchers tend to ignore this and trust IBM’s brand.
Some specific responses to the section on Soto’s study:
“What counts as a “replication” of the Five Factor Model, as in Soto (2019), is the following: a correlation is found between one or more factors of the Five Factor Model and some other construct, and that correlation is found again in another sample, regardless of the size of the correlation”
This is incorrect. Replication was defined by finding a significant correlation, not just “a correlation”. And a key focus of the paper was to understand the relationship between the size of the original correlation and the replication, hence one of the key measures was the ratio of effect size of the original study with the replication to allow a more nuanced non-binary examination of replication (i.e. magnitude was not irrelevant).
“One of the more surprising correlations replicated by Soto (2019) is between “Agreeableness” (Big Five) and “Heart Disease” (measured by a questionnaire about chest pain). The correlation is only .04, compared to the original .15. …But it counts as a replication.”
This is incorrect. This was counted as a failure to replicate as the correlation of .04 was not significant.
“In almost all cases, and in 100% of Soto (2019)’s measures, the construct compared to a Big Five factor is derived from an online survey instrument….In most cases, the life outcome constructs are vague abstractions measured with survey instruments”.
“In almost all cases” is a qualitative judgement that isn’t substantiated with evidence. As noted in the discussion section of Soto, there are other ways to measure life outcomes. While self-report measures are indeed popular, many studies use these alternative measures to show correlations of the Big Five with outcomes that might better fit the “life outcome” characterisation. A quick Google Scholar search should show that “almost all” is not a reasonable characterisation of the literature (for example, longitudinal studies showing Big Five correlations with life outcomes such as health, relationship status, educational attainment, employment status, income, etc.). “In most cases” is also a qualitative judgement. While harder to simply dismiss, it is not very meaningful without a more precise scope for “cases”, and the argument would lose its force if the “most” were more like 60% than “almost all”.
As an addict of surveys – I’ve loved filling out forms since I was a child! – I’ve always considered these kinds of tests as similar to reading the horoscope. As an Aquarius I always get a very positive send-off every time! Although once I took a personality test from an herbal tea company and found a great combination for when I’m feeling queasy. (Full Disclosure: I bought the herbs from another company.) “Compared to what?” is the best way to look at surveys and, in my opinion, to everything Social Psychology can dish up. I’ve always felt sorry for the college students that need the money (or credits) and have to participate in those “experiments,” some of which seem pretty cruel. I’ve always wanted to run a study on “What are the lasting effects on undergraduates who had to participate in nasty social psychology experiments?”
I no longer remember how I stumbled on your blog (maybe Vox?) but I’m enjoying the ride!
“If the Big Five are not universal, stable, or orthogonal, what good are they? They have a perfectly clear use. They replicate: the answers to many other survey instruments can be found to correlate with the Big Five survey responses, in multiple samples of survey-takers”
The wording of “not” here is an absolute statement imposing a dualistic standard on a probabilistic world. However, the reason the Big Five have become so popular in personality psychology is that they are, to a degree, universal, stable, orthogonal, and replicable, though it seems not to the extent the author appreciated before writing this blog post. Quantifying exactly how universal, stable, orthogonal, and replicable they are is a topic of ongoing research and much debate, as the research picture is complex and messy, which is often the case with science. That people (inside or outside of the field) might oversell the success of the Big Five and present simplistic, overconfident, or outdated summaries is not an inherent problem for the Big Five.
“The Five Factor Model is not perfect, advocates grant, but it is better than nothing… It is not clear how they measure “better than nothing”.
This “better than nothing” claim doesn’t seem like it would come out of the mouth of a psychologist. Typically, scientific consensus stems from inference to the best explanation. It is not a case of better than “nothing”, but better than other competing theories that seek to explain the data (which would rarely be “no theory”). So, for example, due to the extent of the evidence against orthogonality, alternatives to the Big Five have emerged, such as higher-order models that cluster the Big Five into super-ordinate clusters (e.g. Digman, 1997), culminating in proponents of a single-factor model (e.g. Davies et al., 2015). This is all situated within research that involves close examination of the survey (and its methodological limitations) and the underlying reality (though you suggest that “few” ever do this), to help best explain the data.
As noted above, focusing your argument on just the correlations with other survey instruments ignores all the other research not involving surveys, such as genetic studies, correlations with life outcomes (as described above), behavioural studies, neuroscience, and so on. However, if you want to make the argument that there is too much reliance on “cheap” self-report measures in personality psychology (along with low-powered studies), and that this is a problem the field needs to address, then I think you will find widespread agreement from those inside the field.
And a final post from me. While there is much I disagree with in this essay, I have tried to highlight some key critical points which I think are valid.
“From a brief review of dozens of validated instruments in social psychology, it seems to me that it is relatively easy to “validate” meaningless instruments. ”
This may have some truth to it (though “meaningless” is too strong a word) – see this article on the so called “validation crisis”: https://replicationindex.com/2019/02/16/the-validation-crisis-in-psychology
“The Big Five are, in a sense, protected from falsification. They make no predictions; there is no underlying causal model. As I understand it, no study could be devised to prove that the Big Five aren’t real, because they make no formal pretense to reality. They are innocent mathematical constructs that fall out of particular survey instruments administered to particular populations…
Critics of the Big Five are almost always proponents of personality factor models with other numbers attached. Few criticize the connection between the survey instruments and the underlying reality…
Certainly the questions measure something; it is not at all clear what that something is, and the nature of that something is rarely investigated…”
The “Big Five” don’t make a pretence to reality, because that is done by the minds of human researchers. The Big Five, as a description of factor loadings in an Exploratory Factor Analysis, do not provide a causal model. However, researchers can and do posit causal models that underlie the “mathematical constructs”; for an illustrative example, differences in dopamine systems in the brain (as a hidden causal variable) might help explain variance associated with the openness and extraversion factor loadings. The most typical approach is that personality traits, as “psychological constructs”, cause the correlations in behaviour that are identified by the mathematical model. These constructs are “latent variables” which do the important explanatory work. And the factor approach, in principle, can be falsified – or at least replaced with better theories and models.
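To make the “factor loadings” step concrete, here is a minimal sketch of how factors fall out of item correlations. Everything in it is assumed for illustration: the survey data are random, and unrotated principal components of the item correlation matrix stand in for a full exploratory factor analysis (which adds rotation and models unique variances).

```python
import numpy as np

# A minimal sketch of how "factors" fall out of survey correlations.
# Synthetic data; principal components of the item correlation matrix
# stand in here for a full exploratory factor analysis.
rng = np.random.default_rng(0)

# Simulated survey: 500 respondents, 10 Likert-type items scored 1-5.
responses = rng.integers(1, 6, size=(500, 10)).astype(float)

# Correlations between items are the raw material of the factor model.
corr = np.corrcoef(responses, rowvar=False)

# Eigendecomposition: eigenvectors are the (unrotated) loadings,
# eigenvalues show how much standardized item variance each one captures.
eigenvalues, loadings = np.linalg.eigh(corr)
order = np.argsort(eigenvalues)[::-1]              # largest factor first
eigenvalues, loadings = eigenvalues[order], loadings[:, order]

print(corr.shape)         # (10, 10)
print(round(eigenvalues.sum(), 6))  # total variance equals the 10 items
```

The point of the sketch is the one made above: the mathematics only redescribes the correlation matrix; any causal story about what produces those correlations has to be supplied by the researcher.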
Now, for those adopting the latent variable approach, it is probably fair to say it isn’t always clear what these variables are and exactly how they work, or even if they exist at all. Figuring all this out is an ongoing enterprise, and psychology is still a relatively young science, so there is much to understand. But I don’t know what these investigations that are only “rarely” done would look like. Do you have any examples of these rare cases? Of course researchers in these fields want to understand and investigate what is being measured, and to come up with good causal theories in particular (because these are high status in science). But psychology is hard, because unobservable latent constructs are, well, unobservable.
The general point of criticising the field for lack of acknowledgement of the link between survey instruments and underlying reality seems valid. As argued by Borsboom (2006; “The attack of the psychometricians”), psychologists are quite prone to operationalism which corresponds with the dominance of classical test theory.
Now, in response to your claim of “almost always”, there are alternative models which aren’t based on the standard reflective latent-variable approach at all, such as network approaches (see for example, https://www.researchgate.net/profile/Rene_Mttus/publication/312341252_Why_do_traits_come_together_The_underlying_trait_and_network_approaches/links/5afd5537a6fdcc3a5a2887e6/Why-do-traits-come-together-The-underlying-trait-and-network-approaches.pdf or https://www.researchgate.net/publication/221705059_Deconstructing_the_construct_A_network_perspective_on_psychological_phenomena).
To be fair, factor-based latent variable models (among which the Big Five reigns supreme) are the most common standard in personality psychology, but viable alternatives do exist which don’t just posit another factor-analytic solution, and I believe they are growing rapidly in popularity.
For a final meta-point, the critique here incorporates scientific evaluative principles, such as strength of evidence, theoretical power, and falsifiability, but at times it adopts a dismissive tone which appears anti-science, such as what strikes me as a condescending attitude towards replication as part of scientific progress.
This appears to me a confused stance, and so I am sceptical of the concluding claim that “creative and meaningful research into how people work…is not taking place within the field of social psychology.” No clues are given as to what this research might look like or who is carrying it out (I am guessing you might be one of these researchers?). If this research is outside of personality psychology, I would assume it isn’t well characterised as scientific. If so, by all means apply your own personal evaluative norms to determine that this mysterious research is more “creative” and “meaningful” for you than your understanding of the scientific research you target in your essay. But it seems then that criticising the Big Five as a scientific theory for not being scientific enough functions more as a rhetorical strategy in this essay, and therefore isn’t to be taken too seriously.
The author may be aware of this, but for readers interested in further critiques of the Big Five, Wikipedia has a nice summary section of criticisms https://en.wikipedia.org/wiki/Big_Five_personality_traits which points to further literature. Some of the points made, like the idea that the Big Five might crowd out other personality research, have been around a long time, as far back as 25 years ago (e.g. Block, 1995, referenced in the Wikipedia article).
So what do you think of the Enneagram? I would love to see an article like this on the Enneagram.
Curious where this specific reference came from: “Survey instruments are validated in a number of ways: how much their responses correlate between testings (test-retest reliability, with astrological sign as the gold standard)”. Are they using astrological sign to validate a person’s consistency, or how a person identifies with an astrological sign?