Finding your dream job, researching company information, and finally sending off the application – all of this often happens via smartphone these days. No wonder mobile-optimized selection procedures are becoming increasingly important in personnel selection (Nikou & Economides, 2018). Game-based assessments in particular are gaining popularity.
Game-based assessments are psychological test procedures presented in a playful format. The idea behind them is that results in psychometrically developed mini-games allow conclusions to be drawn about applicants' cognitive, social, and personality characteristics. Scientific studies suggest that this is indeed possible (e.g., Brown et al., 2014). Accordingly, more and more large companies, such as LinkedIn, Tesla, McKinsey, and Deloitte, are relying on this new method of personnel selection.
This sounds great so far, but are these procedures really suitable for distinguishing between suitable and unsuitable applicants? This question is currently on the minds of many HR managers, who often remain rather skeptical. Justified skepticism, or a missed trend? Time to take a closer look at these methods against the central scientific test quality criteria. Only if these criteria are fulfilled can we speak of a scientifically sound test procedure (Kubinger, 2019).
We will take a closer look at the three main quality criteria of classical test theory (objectivity, reliability, validity) and two important secondary quality criteria (fairness, economy).
A test procedure is objective when different recruiters come to the same assessment of an applicant. Objectivity is at risk especially in the classic job interview: the judgments of HR managers can be influenced by a wide variety of factors, such as perceived likability of the applicant. The result is a distorted picture of the applicant's abilities and personality – the procedure is not objective. This is where one of the central advantages of game-based assessments comes into play: technology-based administration and automated scoring, which draw on recent findings in machine learning, can reduce numerous sources of error and thus increase the objectivity of the procedure. A disadvantage of game-based assessments, on the other hand, is that there is no certainty that the person playing the game is actually the applicant rather than someone else (known as "impersonation"). However, this problem exists in all online selection processes and is one of the reasons why game-based assessments have so far been used primarily for applicant pre-selection.
Another important quality criterion is the reliability of a test procedure. This describes the extent to which a test measures a characteristic (e.g., a cognitive ability or a personality trait) accurately, i.e., without measurement error. Measurement usually becomes more accurate the more data is available about a person. A simple example illustrates this: imagine taking a concentration test after a long, stressful day at work, while your neighbors' loud music disturbs you. The test result will probably not reflect your actual, stable ability to concentrate (the "trait") but rather a situational, temporary snapshot of it (the "state"; cf. Fleeson, 2001). You will almost certainly perform worse than on a day when you are well rested and undisturbed. However, if your concentration ability is recorded on different days, such random measurement errors increasingly cancel each other out – in statistical terms, the standard error of the averaged score shrinks as the number of measurements grows. This is exactly where game-based assessments come in: instead of inferring the applicant's concentration ability from a single test result, the results of different game runs are stored and averaged, yielding a more accurate estimate of the applicant's actual concentration ability (trait).
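The error-cancelling effect of repeated measurement can be illustrated with a small simulation. The numbers below (a "true" score of 100, day-to-day noise of 15 points, 50 game runs) are purely hypothetical and chosen for illustration:

```python
import random
import statistics

random.seed(42)  # fixed seed so the simulation is reproducible

TRUE_ABILITY = 100   # hypothetical stable concentration score (the "trait")
NOISE_SD = 15        # hypothetical day-to-day situational noise (the "state")

def one_test_run():
    """Simulate a single test result: true ability plus random state noise."""
    return random.gauss(TRUE_ABILITY, NOISE_SD)

# A single run can be far off the true score...
single = one_test_run()

# ...but averaging many runs cancels random error:
# the standard error of the mean is NOISE_SD / sqrt(n).
runs = [one_test_run() for _ in range(50)]
averaged = statistics.mean(runs)

print(f"single-run error:   {abs(single - TRUE_ABILITY):.1f}")
print(f"averaged-run error: {abs(averaged - TRUE_ABILITY):.1f}")
```

With 50 runs, the expected error of the average is roughly 15/√50 ≈ 2 points, compared to 15 points for a single run.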
However, a method should of course not only measure accurately, but also measure the right thing (validity). In personnel selection, one is primarily interested in predictive validity – the test should predict a certain outcome variable as accurately as possible, most commonly the applicant's future job performance. As with reliability and objectivity, several sources of error can impair predictive validity. One is applicants' response tendencies. Social desirability is a particular problem in classical test procedures: the tendency of applicants to consciously select answers that present them in a positive light. This matters especially when the goal of the test is easy to see through – which is often the case in personality diagnostics. As an example, consider an item based on the Big Five, one of the best-known models in personality diagnostics (Asendorpf & Neyer, 2012). Applicants are asked to express their agreement with the following statement:
I see myself as someone who is reliable and conscientious.
Obviously, few people would deny such a statement when applying for their dream job. It is therefore questionable whether the item can distinguish between conscientious and non-conscientious applicants.
But applicants do not always actively try to manipulate the results. Often, they simply have difficulty assessing their own personality traits, strengths, and weaknesses. After all, what does being conscientious or extroverted even mean? And how conscientious or extroverted am I, really? This is often not easy to answer – in psychology, this is referred to as a lack of introspective ability. To simplify the question, applicants often compare themselves to the people around them, which primarily answers a different question: how conscientious or extroverted am I compared to those people? A number of scientific studies show that this shift in the question often leads to bias (e.g., Schwarz, 1999).
Game-based assessments avoid exactly this problem. Instead of relying only on the applicant's self-disclosure, the mini-games additionally capture behavioral nuances – for example, each game records the applicant's preference for speed versus accuracy. These behavioral data are then used to supplement the applicant's error-prone self-report with objective observations, with self-learning algorithms weighting the two data sources. Capturing actual behavior instead of relying only on the applicant's statements – sounds logical, right? Scientific research also shows that for many characteristics this approach leads to more valid results than pure self-report (e.g., Baumeister et al., 2007).
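The idea of supplementing self-report with behavioral data can be sketched as a simple weighted combination. This is only an illustration under assumed values: the function name, the 0–1 score scale, and the fixed weight of 0.6 are all hypothetical; in practice the weights would be learned from data (e.g., by regressing against later job performance):

```python
# Hypothetical sketch: combining a self-report score with a behavioral
# indicator, both assumed to be standardized to a 0-1 scale.
# The weight 0.6 is an illustrative constant, not a learned parameter.

def combined_score(self_report, behavioral, w_behavioral=0.6):
    """Weighted average of a self-report and a behavioral indicator."""
    assert 0.0 <= w_behavioral <= 1.0
    return w_behavioral * behavioral + (1 - w_behavioral) * self_report

# An applicant rates their conscientiousness very highly (social
# desirability?), but their observed speed/accuracy trade-off in the
# games tells a more modest story.
score = combined_score(self_report=0.9, behavioral=0.5)
print(round(score, 2))  # 0.66
```

The behavioral signal pulls the inflated self-report toward a more realistic estimate; shifting `w_behavioral` controls how much trust is placed in observed behavior versus self-disclosure.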
Another important quality criterion is test fairness: no group should be systematically disadvantaged by a testing procedure (e.g., based on gender or ethnic background). Many test procedures violate this, for instance because their questions are geared toward Western cultural groups (Camilli, 2006). In the largely language-free mini-games, by contrast, gender, ethnic background, and skin color play no role in the test results. There is also no scientific evidence to date for the occasionally voiced concern that people with gaming experience might have an advantage. Ideally, aptitude-diagnostic procedures assess only the factors relevant for job success and leave out irrelevant characteristics such as gender or social origin. This creates more fairness and equal opportunity.
The last criterion considered, test economy, was already briefly addressed at the beginning. Selection procedures should be short, inexpensive, and low-effort for the applicant. Game-based assessments follow the idea of a "zero-footprint" measurement – the burden on the applicant disappears, and instead the games are often genuinely fun. Since intelligent algorithms can tailor the individual tasks to each applicant (known as adaptive testing), the games never get boring. Not only applicants benefit, but also companies: the cost and time savings compared to other procedures (e.g., assessment centers) are enormous, especially for very heterogeneous and international target groups.
This overview shows that the modern procedures can definitely compete with established psychometric procedures (cf. table). Fun, measurement accuracy, validity, and fairness do not have to be mutually exclusive. In combination, they contribute to personnel selection that is based on scientific standards and meets the changing needs of a new target group.
However, implementing all test quality criteria in practice is not always easy. Many psychometric games on the market do not meet the quality criteria described above (König et al., 2010). Consequently, such procedures should be critically examined before being used in professional personnel selection. DIN 33430 on occupational aptitude diagnostics provides an important frame of reference here.
Become an HR hero ("HeRo") and use Aivy to find out how well applicants fit your company – even before the first interview.