1. Introduction
This Monthly Report presents the method developed by the CIES Football Observatory to determine the technical profile of footballers, classify them into groups and establish hierarchies within each group. The method relies on data on the technical gestures performed by players (shots, crosses, interceptions, etc.), produced by our partners at InStat.
The analysis was carried out on a sample of 7,215 footballers having played at least 750 domestic league minutes during the 2021 or 2021/22 seasons (up until the end of January 2022) within 36 top or second division leagues from UEFA member associations. Due to the specificity of their position, goalkeepers have not been included in the study.
Figure 1: leagues taken into account and number of players analysed
2. Variables and classification axes
Among the many indicators collected by InStat, eleven were selected to determine players’ technical profile. They refer to both defensive and offensive actions. The selection was carried out so as to limit redundancies and to eliminate variables that are concentrated on too few individuals because of the low number of actions. For example, shots were preferred to goals, the former being closely correlated with the latter while being more evenly spread among players.
So as to determine technical profiles independently of the level and style of play of employer clubs, the values attributed to players for each of the eleven variables selected were defined by referring to the average value of the other team members, i.e. as a ratio between the player’s value and that of teammates (excluding goalkeepers).
Figure 2: game indicators selected for profiling
For example, a value of two for shots indicates that the player has shot twice as often as his teammates. In this manner, a footballer playing in a weaker team does not see his values structurally diminished in comparison to players of more competitive teams. We can thus analyse a player’s game profile rather than a performance that is strongly linked to the team’s overall strength.
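The relativisation described above can be sketched in a few lines. This is only an illustration under our own assumptions: the function name relativise, the squad and its shot figures are hypothetical, and the teammates' average is taken as a simple unweighted mean, which the method described here does not specify.

```python
from statistics import mean

def relativise(team_values):
    """Divide each outfield player's value by the mean value of his teammates.

    team_values maps a player name to his value for one indicator
    (goalkeepers are assumed to have been removed beforehand).
    """
    ratios = {}
    for player, value in team_values.items():
        others = [v for p, v in team_values.items() if p != player]
        baseline = mean(others)
        # Guard against a squad in which no teammate recorded the action.
        ratios[player] = value / baseline if baseline > 0 else float("nan")
    return ratios

# Hypothetical shot counts for a four-player squad (assumption, not real data).
squad = {"A": 4.0, "B": 2.0, "C": 2.0, "D": 2.0}
ratios = relativise(squad)
```

With these made-up figures, player A obtains a value of two, meaning that he shot twice as often as the average of his teammates.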
From these relativised values, we have performed a Principal Component Analysis (PCA). The results are expressed visually in Figure 3 in the form of a factorial display in which the eleven variables selected are represented by arrows. The longer the arrow and the closer it is to an axis, the more important the variable in question is in the definition of that axis.
Figure 3: the factorial design for player profiling
The variable "interception" is strongly involved in the formation of the horizontal axis, as are shots at the opposing goal. This axis therefore defines the defensive or offensive tendencies of players. The two most telling variables for the vertical axis are crosses, which are especially the province of wingers, and aerial duels, which are principally the work of centre forwards and centre backs. This axis thus tends to reflect the different positioning of players within the same area of play (defence, midfield or attack).
The two principal axes explain 70% of the total variance, the defensive-offensive axis alone explaining almost half of it. This signifies that the eleven variables selected, together with their relativisation with regard to teammates, allow us to account to a large extent for the differences in the technical profiles of footballers.
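To make the notion of explained variance concrete, the sketch below computes the share of total variance carried by the first two principal components of a small data set. It is a minimal illustration under stated assumptions: it works on the covariance matrix without standardising the variables (the preprocessing actually used is not detailed here), it relies on power iteration with deflation rather than a statistical library, and the function names and the four data rows are hypothetical.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length observations."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_component(cov, iters=200):
    """Power iteration: dominant eigenvalue and eigenvector of a symmetric matrix."""
    d = len(cov)
    vec = [float(i + 1) for i in range(d)]  # arbitrary non-zero start vector
    for _ in range(iters):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            return 0.0, vec
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the (unit-length) vector gives the eigenvalue.
    value = sum(vec[i] * cov[i][j] * vec[j] for i in range(d) for j in range(d))
    return value, vec

def explained_variance_ratios(rows, n_components=2):
    """Share of total variance explained by each of the first components."""
    cov = covariance_matrix(rows)
    d = len(cov)
    total = sum(cov[i][i] for i in range(d))
    shares = []
    for _ in range(n_components):
        value, vec = top_component(cov)
        shares.append(value / total)
        # Deflation: subtract the component just found before the next pass.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    return shares

# Hypothetical relativised values for four players on three variables.
rows = [(2.0, 0.0, 1.0), (-2.0, 0.0, 1.0), (0.0, 1.0, 1.0), (0.0, -1.0, 1.0)]
ratios = explained_variance_ratios(rows)
```

Summing the first two shares gives the figure comparable to the 70% reported above for the eleven real variables.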
3. Player proximity
This profiling method permits the calculation of distances between players. Using a reference footballer, it is possible to determine the players who are statistically the closest to him. For example, among big-5 league footballers, the closest footballer to Kylian Mbappé from the perspective of the technical actions performed is James Maddison (Leicester City). If we restrict the analysis to French Ligue 1 players, Stephy Mavididi (Montpellier HSC) is closest to the world champion.
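The search for the closest players can be sketched as a nearest-neighbour lookup. The example assumes that each player is described by a vector of coordinates and that proximity is measured with the Euclidean distance; the distance actually used is not specified here, and the function name closest_players, the player labels and their coordinates are hypothetical.

```python
import math

def closest_players(profiles, reference, k=5):
    """Return the k players whose profile vectors lie nearest to the reference."""
    ref_vec = profiles[reference]
    distances = [
        (math.dist(ref_vec, vec), name)  # Euclidean distance (assumption)
        for name, vec in profiles.items()
        if name != reference
    ]
    return [name for _, name in sorted(distances)[:k]]

# Hypothetical two-dimensional profiles, e.g. scores on the two principal axes.
profiles = {
    "Reference": (1.0, 1.0),
    "Player A": (1.1, 0.9),
    "Player B": (3.0, 3.0),
    "Player C": (0.0, 1.0),
}
nearest = closest_players(profiles, "Reference", k=2)
```

Restricting the search to a given league amounts to filtering the profiles dictionary before calling the function.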
Figure 4: five big-5 and Ligue 1 players with a technical profile most similar to Kylian Mbappé
This exercise can be carried out using any player as a reference. For example, still at big-5 league level, Dušan Vlahović is the player closest to Erling Haaland, Paulo Dybala to Lionel Messi, Romain Faivre to Neymar Júnior, Dominik Szoboszlai to Kevin De Bruyne, Sadio Mané to Raheem Sterling, Remo Freuler to Jorginho Frello and Jonathan Tah to Virgil van Dijk.
4. Player classification
Aside from the calculation of statistical distances between players, the k-medoids algorithm allows us to classify them into groups. This method, derived from k-means, is based on the choice of reference players who serve as archetypes for the construction of groups, to which all individuals are then attributed according to their statistical proximity. Six reference players with different positions and profiles were selected for this report.
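The attribution of players to archetypes can be illustrated by a single assignment step with fixed reference players. This is a simplified sketch, not the full procedure: a standard k-medoids run may also update the medoids iteratively, the Euclidean distance is our assumption, and the function name assign_to_archetypes, the player labels and the coordinates are hypothetical.

```python
import math

def assign_to_archetypes(profiles, archetypes):
    """Map each player to the archetype (medoid) whose profile is nearest."""
    classes = {name: [] for name in archetypes}
    for player, vec in profiles.items():
        # Each archetype is at distance zero from itself, so it joins its own class.
        nearest = min(archetypes, key=lambda a: math.dist(vec, profiles[a]))
        classes[nearest].append(player)
    return classes

# Hypothetical profiles on two axes; the first two entries act as archetypes.
profiles = {
    "Centre back": (-2.0, 0.0),
    "Striker": (2.0, 0.0),
    "P1": (-1.5, 0.2),
    "P2": (1.0, -0.5),
    "P3": (-0.2, 0.1),
}
classes = assign_to_archetypes(profiles, ["Centre back", "Striker"])
```

A player such as P3, who lies almost midway between the two archetypes, is still attributed to one of them, which is the type of borderline case that motivates the intermediary classes discussed in section 5.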
Figure 5: player archetypes used in the elaboration of groups
Figure 6 illustrates the dominant technical gestures for each of the classes constructed from the six reference players selected. For example, players from the class based on Virgil van Dijk win, on average, 2.62 times as many defensive aerial duels as their teammates, those of the class based on Marcos Alonso cross 2.62 times as often, and so on.
Figure 6: relative average frequency of technical gestures by class (with respect to teammates)
Virgil van Dijk Class
The Van Dijk class groups together players whose values for all defensive variables are above those of their teammates. They are principally centre backs who set themselves apart by their strong presence in duels, both on the ground and in the air. The class accounts for 21.7% of the players in our sample.
Marcos Alonso Class
The Marcos Alonso class also identifies players with a defensive-oriented vocation. However, they are also very active offensively with regard to crosses. They are mainly full backs, or wing backs within a "3-5-2" tactical formation. This class groups together 15.7% of the footballers taken into account.
Jorginho Frello Class
The Jorginho Frello class also brings together players with a defensive vocation, but who are also relatively active offensively. They are generally central midfielders, both defensive and box-to-box. This class is not only the largest, with 26.5% of the total sample analysed, but also the most heterogeneous.
Bruno Fernandes Class
The Bruno Fernandes class groups together players who are more active in attack than in defence. Their speciality resides in the ability to create opportunities for teammates, as well as in their importance in animating the attack generally (dribbles, shots, crosses, passes, etc.). This class is the smallest: it accounts for only 8.6% of players.
Raheem Sterling Class
The Raheem Sterling class identifies players with a similar profile to that of Bruno Fernandes, but differentiating themselves by a greater propensity for shots and dribbles, and a lower propensity for key passes and crosses. This class is quite numerous as it accounts for 18.0% of the players of our sample.
Romelu Lukaku Class
Finally, the Romelu Lukaku class picks out, above all, footballers playing as centre forwards. Players in this category are rarely in the thick of the action and concentrate their efforts on two speciality areas: finishing and offensive aerial duels. This class makes up 9.8% of footballers included in the analysis.
5. A tiering of players according to the profiles defined
Any tiering of players comes up against two problems: the very often underestimated impact on individual performance of the difference in collective strength between opposing teams, and the difficulty of establishing groups of footballers whose styles of play are sufficiently close for a comparison to make sense.
The relativisation of performance indicators in comparison to teammates and the creation of player archetypes based on reference footballers are both means of limiting these problems without, however, resolving them completely. Indeed, any class, however homogeneous it may be, always contains margins where players with an atypical or inter-class profile are situated.
One solution is to increase the number of classes relative to the initial scheme. This can be done by using the silhouette value, a statistical measure of how well each player is represented by the class to which he is assigned. Players who fit their class well remain in it, while new classes can be created from players with an inter-class profile.
For example, while a player such as Thomas Partey from Arsenal is well defined as a member of the Jorginho class, Nemanja Matić is situated in an intermediary position between this class and that of Virgil van Dijk. We can thus establish a new class bringing together all the individuals statistically closer to Matić than to Jorginho or van Dijk. In the end, two intermediary classes (Matić and Trippier) were added to the six initial ones.
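The silhouette value of a player compares his average distance to the members of his own class (a) with his average distance to the members of the nearest other class (b), and is defined as (b - a) / max(a, b). The sketch below applies this standard definition to hypothetical data; the Euclidean distance, the function name silhouette, the class labels and the coordinates are our assumptions.

```python
import math
from statistics import mean

def silhouette(player, classes, profiles):
    """Silhouette value of one player for a given class assignment.

    classes maps a class name to the list of its members.
    """
    own = next(c for c, members in classes.items() if player in members)
    vec = profiles[player]
    same = [p for p in classes[own] if p != player]
    if not same:
        return 0.0  # usual convention for a class with a single member
    # a: mean distance to the other members of the player's own class.
    a = mean(math.dist(vec, profiles[p]) for p in same)
    # b: smallest mean distance to the members of any other class.
    b = min(
        mean(math.dist(vec, profiles[p]) for p in members)
        for c, members in classes.items()
        if c != own and members
    )
    largest = max(a, b)
    return (b - a) / largest if largest > 0 else 0.0

# Hypothetical profiles: "Mid" sits between the two classes.
profiles = {
    "X1": (0.0, 0.0), "X2": (1.0, 0.0), "Mid": (3.0, 0.0),
    "Y1": (6.0, 0.0), "Y2": (7.0, 0.0),
}
classes = {"X": ["X1", "X2", "Mid"], "Y": ["Y1", "Y2"]}
s_core = silhouette("X1", classes, profiles)
s_mid = silhouette("Mid", classes, profiles)
```

A value close to one indicates a player well represented by his class, while a value close to zero flags an inter-class profile of the kind that can seed a new class.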
For each of the eight classes, we focused on the three variables on which its players stand out most from their teammates in comparison with all classes taken together (see Figure 7), and established hierarchies based on these three variables. For example, players from the Van Dijk class were ranked by taking into account the differences in comparison to teammates for the following three variables: aerial defensive duels, ground defensive duels and interceptions.
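A ranking within a class can be sketched as follows. How the three variables are combined into a single hierarchy is not specified here, so the unweighted mean of the three teammate-relative values used below is purely an assumption, as are the function name rank_class, the variable keys, the player labels and the figures.

```python
from statistics import mean

def rank_class(members, ratios, key_variables):
    """Order class members by the mean of their ratios on the key variables."""
    def score(player):
        # Unweighted mean of the teammate-relative values (assumption).
        return mean(ratios[player][v] for v in key_variables)
    return sorted(members, key=score, reverse=True)

# Hypothetical teammate-relative values for three centre backs.
ratios = {
    "CB1": {"aerial_def_duels": 2.8, "ground_def_duels": 1.9, "interceptions": 2.1},
    "CB2": {"aerial_def_duels": 2.0, "ground_def_duels": 2.0, "interceptions": 2.0},
    "CB3": {"aerial_def_duels": 3.0, "ground_def_duels": 2.5, "interceptions": 2.6},
}
key_variables = ["aerial_def_duels", "ground_def_duels", "interceptions"]
ranking = rank_class(["CB1", "CB2", "CB3"], ratios, key_variables)
```

Restricting the members passed to the function to a given set of leagues yields a ranking among players of comparable championships.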
Figure 7: the eight classes used for ranking
Figure 8 presents the top 10 footballers from all leagues surveyed for each of the eight classes. It is also possible to take into consideration the league level at which the footballers play, so as to compare only players in championships of relatively similar strength, the big-5 for example.
Figure 8: most productive players with respect to teammates
6. Conclusion
The game indicators collected by InStat are a treasure trove from which many innovative research projects with very practical applications can be developed. This report was conceived with a view to linking science and industry, an approach that we value dearly and that constitutes the raison d'être of the CIES Football Observatory research group.
The player profiling and classification method detailed in this study is useful not only from a descriptive perspective, to understand the different roles played by footballers within a team, but also from the point of view of scouting. The calculation of statistical distances between players is indeed particularly helpful when targeting potential recruits to replace departing players.
The choice of profiling and classifying players not based on raw statistics, but in comparison to teammates, is also particularly fruitful when it comes to talent spotting. Indeed, it mitigates the recurring problem of the impact of the collective force of a team on individual performances. This approach notably allows us to identify players who do not stand out in absolute terms, but whose productivity is well above that of their teammates.
This report constitutes just another step in the direction of taking full advantage of the possibilities available in terms of research and development through the analysis of technical game data such as those carefully gathered by InStat. We look forward to pursuing this field more fully and to making available new breakthroughs for all those passionate about the game.