Reliability and Validity of the PI Behavioral Assessment

AJ Cheponis
Feb 1, 2022

With personality assessments becoming commonplace, I'm frequently asked how the tools we use differ from others. For starters, we use tools that are designed specifically for the workplace, and these tools have extensive scientific validation behind them.

99% of the time, we use the tools from The Predictive Index. "PI" has an extensive scientific background going back nearly 70 years. They have conducted over 400 client validation studies that support the PI Behavioral Assessment's ability to predict job performance across jobs, industries, and countries. They adhere to the standards set forth in The Standards for Educational and Psychological Testing. In addition, the PI Behavioral Assessment achieved certification with the European Federation of Psychologists' Associations (EFPA) in 2018, which involved the analysis and presentation of data regarding the assessment's reliability, validity, and fairness.

What is reliability? What reliability evidence does PI have for their Behavioral Assessment?

Reliability refers to the precision of scores and their consistency over time. It can be established in several ways, including test-retest reliability and internal consistency. Strong test-retest reliability indicates that participants' scores remain relatively stable over time. Internal consistency indicates that the questions on an assessment consistently capture the same construct, or "hang together." The Predictive Index Science team has conducted numerous test-retest reliability studies on the PI Behavioral Assessment, and these show that its results are stable enough over time to support the assessment's use cases. In 2017 the team conducted a comprehensive reliability study with samples retested up to eight years later. That study found that, at a 4-year interval (the median job tenure in the United States), the test-retest stability of the PI Behavioral Assessment generally exceeds that of Big Five personality assessments.
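To make the two kinds of reliability concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not PI's method: the scores are invented, `pearson_r` (a hypothetical helper) estimates test-retest reliability as the correlation between two administrations to the same people, and `cronbach_alpha` (also hypothetical) computes Cronbach's alpha, one common index of internal consistency. The article does not say which statistics PI actually reports.

```python
from statistics import mean, variance


def pearson_r(x: list[float], y: list[float]) -> float:
    """Test-retest estimate: correlation of the same people's scores at time 1 and time 2."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cronbach_alpha(items: list[list[float]]) -> float:
    """Internal consistency: one list of scores per question, respondents in the same order."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]  # each respondent's total score
    item_var_sum = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))


# Hypothetical data: five people assessed twice, and three items answered by five people.
time1 = [10, 12, 15, 18, 20]
time2 = [11, 12, 14, 19, 21]
items = [[2, 3, 4, 4, 5], [2, 3, 3, 4, 5], [1, 3, 4, 4, 5]]

print(f"test-retest r = {pearson_r(time1, time2):.2f}")
print(f"Cronbach's alpha = {cronbach_alpha(items):.2f}")
```

Both values approach 1 when scores are consistent, across time for the first and across questions for the second; real reliability studies use far larger samples than this toy data.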

Does personality change over time?

Research shows that people's personalities can change, very slowly, as they grow older, which makes sense: who we are at 18 is not exactly who we are at 40. This general trend would affect any personality assessment. Even so, results from the PI Behavioral Assessment are expected to be stable enough to support decisions that span multiple years. Specifically, PI's test-retest analyses show that results remain reasonably stable for up to 6-8 years, so they can reasonably inform decisions or inferences covering that period (the median job tenure in the U.S. is about 4 years).

Generally speaking, we advise clients to stick with the Self results of the first assessment administration unless there are extenuating circumstances that affected the first result (e.g., the participant did not take the assessment in their preferred language). Self-Concept can be administered more frequently. Further, we recommend re-administering the Behavioral Assessment after 6-8 years if a high-stakes decision needs to be made about an individual.

Can times of stress affect Behavioral results?

Distractions and extreme extenuating circumstances can affect someone's Behavioral results, as they could with any personality or behavioral assessment. That said, the Behavioral Assessment has been extensively validated on samples that took it in potentially distracting situations.

As for how anxiety specifically might affect scores, there is very little scientific literature to draw on. It is reasonable to assume that anxiety can affect results, but it is unclear how. We hypothesize that Self-Concept scores may change more rapidly because they reflect more recent circumstances.

How are errors minimized in PI Assessments?

PI minimizes error, as measured by the Standard Error of Measurement (SEM), by ensuring that its assessments perform reliably, that confusing, irrelevant, or biased items are removed, and that the assessments are accessible and easy to complete. In practice, this means removing potential sources of error in the assessment and its administration that might affect a participant's score: using simple assessment formats, carefully field-testing instruments before they are deployed, vigilantly monitoring for statistical bias, translating assessment content accurately, and frequently reviewing the statistics behind the assessment. By taking these precautions, PI creates assessments that can be trusted to report scores that are as accurate as possible for your workplace applications.
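The article does not say how PI computes SEM. As a hedged illustration, the classical test theory formula shows why higher reliability means smaller measurement error: SEM equals the score standard deviation multiplied by the square root of one minus the reliability coefficient. The sketch below assumes invented values (a score SD of 10 and a reliability of 0.84) and an approximately normal error distribution; the function names are hypothetical.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed: float, sem: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% band around an observed score, assuming normally distributed error."""
    return observed - z * sem, observed + z * sem


# Hypothetical numbers: score SD of 10 and reliability of 0.84.
sem = standard_error_of_measurement(sd=10.0, reliability=0.84)
low, high = score_band(observed=50.0, sem=sem)
print(f"SEM = {sem:.2f}; 95% band for a score of 50: {low:.2f} to {high:.2f}")
```

With these assumed numbers the SEM is 4 points, so an observed score of 50 carries a band of roughly 42 to 58. Higher reliability shrinks that band, which is the practical payoff of reducing sources of error.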

Validity

Validity refers to the evidence supporting an assessment's interpretations and use cases. The PI Science Team treats validation as an ongoing process. Across their body of validity research, they have found that the PI Behavioral Assessment is linked to multiple work outcomes, which helps establish criterion-related validity. They have also researched how the Behavioral Assessment relates to theoretically related (and unrelated) dimensions, allowing them to establish convergent (and discriminant) validity.

Performance

Since 1992, PI has conducted approximately 400 validity studies. In 94% of them, scores on the Behavioral Assessment were significantly associated with various measures of job performance. Among the significant results, the average criterion validity coefficient was r = 0.30. These studies covered 111 unique job roles in 11 industries and used assessment scores and performance data from more than 25,000 working adults, demonstrating the instrument's flexibility across a variety of roles.

Tenure

Although tenure can be difficult to study (sample sizes are often insufficient, and tenure is a broad construct), PI has conducted approximately 194 validity studies that found significant relationships between PI Behavioral Assessment Factor scores and tenure. Generally speaking, longer tenure is associated with lower Extraversion (B), higher Patience (C), and higher Formality (D), although these patterns will differ by company and job role.

The “Dark Side” of Personality

Several commercially available assessments consider “the dark side” of personality (i.e., risk factors and derailers). It is important to keep in mind that many of these risk factors are simply overused strengths: they are not necessarily “negative” in and of themselves, but they can pose a problem when they appear very consistently, very strongly, or both. Although the Behavioral Assessment does not directly measure negative aspects of personality, it can be interpreted much like a “dark side” assessment from the overused-strengths perspective. For example, consider an individual who is high in Dominance and works in a role that calls for low Dominance. Depending on their ability to “stretch,” this individual might have difficulty adjusting to the demands of the role. In addition, during times of stress or unrest, individuals may rely more heavily on their behavioral strengths, leaning into them to the point that they become problematic. So while we don’t measure the dark side directly, it is straightforward to examine where an individual’s Behavioral Assessment factor pattern might point to overused strengths.

Talent Development

The PI Behavioral Assessment’s relevance to talent development is built into its design: it measures four work-related behavioral factors that support inferences about a person’s drives and motivations. The assessment reports link these factors to specific workplace behaviors, such as communication style, risk tolerance, connecting with others, and more. To verify that these tools deliver the intended benefits for clients, PI tracks a variety of feedback measures from both users and assessment takers. For example, in a 2017 study of clients using the PI Behavioral Assessment for employee development, 559 clients (81% of those responding) agreed or strongly agreed that PI’s tools have helped them develop better employees. PI also tracks how well the reports resonate with assessment takers and users; 2018 data show that 87% of people agree or strongly agree with the interpretive text in their reports.

The Behavioral Assessment is related to many other criteria that are not detailed here. For further detail, feel free to reach out to us.

The “PI” Behavioral Assessment and Other Assessments

Some assessment providers claim to measure more factors than the PI Behavioral Assessment does. The PI Behavioral Assessment likely doesn’t measure those additional factors because it was designed to measure factors representative of behavior one can observe on the job.

For example, emotional stability (the inverse of what is sometimes called neuroticism) is an increasingly controversial trait to measure for workplace purposes. The PI Behavioral Assessment factors have been established as comprehensive, job-relevant, and correlated with job performance. It might seem that assessments measuring more factors or traits are more comprehensive, more valid, or more predictive than our tools, but measuring more factors does not necessarily make for a better assessment. The additional factors are often redundant with other factors, not work- or job-related, less predictive, or superfluous to the assessment overall. Furthermore, our streamlined approach to measuring workplace behavior gives candidates a fast, easy testing experience with predictive power similar to that of longer assessments that measure more factors.