Analysing La Liga’s Goalkeepers With AI-Captured Shot-Stopper Identity

By: Paul Power | April 2, 2019

It is no secret that Thibaut Courtois’ form has been questioned publicly by once and current Real Madrid manager Zinedine Zidane. Even before that, it was questioned in the media. Zidane favoured Keylor Navas, having stayed loyal to him throughout the 2017/18 season despite mounting criticism of his performances.

In fact, Zidane recalled Navas in his first game back in charge against Celta Vigo. He did so because he’s the man in charge and his opinion is ultimately the one that matters. But how can we accurately measure the impact this change could have had on Real Madrid’s season to date? To do this, we need to be able to accurately simulate how one keeper would manage facing another keeper’s shots – or better yet, how each keeper would perform against every shot in a uniform sample.

We’ll get back to the Real Madrid debate in a moment. First, let’s consider the larger problem of properly evaluating keeper performance.

The Problem with Evaluating Goalkeepers

Goalkeepers are assessed using metrics such as clean sheets, goals conceded and save percentage. More recently “expected” metrics such as expected saves (xS) have been introduced to compare performance to the league average. However, goalkeepers could have completely different types of saves to make depending on the defensive style of their team and the opponents they face.
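
To make that limitation concrete, here is a minimal sketch with invented numbers. The function names, the two keepers and the per-shot save probabilities are all hypothetical; the probabilities simply stand in for what a league-average xS model might output.

```python
def save_percentage(outcomes):
    """Share of shots on target that were saved (1 = saved, 0 = goal)."""
    return sum(outcomes) / len(outcomes)


def saves_above_expected(outcomes, save_probs):
    """Actual saves minus the saves a league-average keeper is expected to make."""
    return sum(outcomes) - sum(save_probs)


# Hypothetical data: both keepers save 3 of 4 shots, but keeper B faces harder shots.
keeper_a = {"outcomes": [1, 1, 1, 0], "save_probs": [0.9, 0.9, 0.9, 0.8]}
keeper_b = {"outcomes": [1, 1, 1, 0], "save_probs": [0.5, 0.6, 0.4, 0.3]}

for name, k in (("A", keeper_a), ("B", keeper_b)):
    pct = save_percentage(k["outcomes"])
    above = saves_above_expected(k["outcomes"], k["save_probs"])
    print(f"Keeper {name}: save % = {pct:.2f}, saves above expected = {above:+.2f}")
```

Both invented keepers have a 75% save rate, yet B finishes about 1.2 saves above expectation while A finishes 0.5 below. Even so, the xS comparison is against a league-average keeper on each keeper's own shots; it does not say how A would have fared against B's shots.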

Rather than using metrics which may not capture all the different situations and contexts, why can’t we go beyond metrics and simply simulate each goalkeeper for every shot, then compare who would concede the fewest goals?

Here, we outline how we do this and how our novel method can be used by coaches and recruitment departments to better analyse keepers and understand their strengths and weaknesses. If you’re interested in finding out more about the method, follow this link to Trading Places – Simulating Goalkeeper Performance using Spatial & Body Pose Data.

Capturing Goalkeeper Unique Identity

To capture the effect a goalkeeper has on a striker’s decision, you would need very fine-grained features such as the goalkeeper’s body position. For example, how are they positioning their arms? Are they flat-footed? This data is hard to come by on a large scale. Another way to do this would be to take hundreds of thousands of shots for every keeper and learn a personal model for each one, but in reality a goalkeeper faces on average five shots on target a game, so this option is out. Can we capture this in other ways?

We solve this problem by creating a set of features that measure the strengths and weaknesses of keepers when facing shots in different situations. For example, are they stronger to their left- or right-hand side? How do they cope in 1-v-1 situations? Are they easily beaten by shots at their feet? Once we generate these features, we combine them into what we call a Player Embedding, which creates a unique data-driven goalkeeper identity. When we plot these embeddings, we can see that we are now able to separate goalkeepers into different groups, meaning we are able to pick up on the unique nuances of each one (see the published paper linked above for full details on how this is calculated).
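
As a rough illustration of the idea, the sketch below builds a hand-made identity vector from a keeper's record in a few situations. This is not the published method (the paper describes how the real embedding is calculated); the situation names, the `keeper_identity` helper and the shot data are all assumptions made for the example.

```python
from collections import defaultdict


def keeper_identity(shots, situations):
    """Return one number per situation: average saves above expected in that situation."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for shot in shots:
        totals[shot["situation"]] += shot["saved"] - shot["xs"]
        counts[shot["situation"]] += 1
    # Situations the keeper has not faced default to 0.0 (treated as league average).
    return [totals[s] / counts[s] if counts[s] else 0.0 for s in situations]


SITUATIONS = ["left", "right", "one_v_one", "at_feet"]  # hypothetical categories

# Invented shots: "saved" is 1 or 0, "xs" is a league-average save probability.
shots_faced = [
    {"situation": "left", "saved": 1, "xs": 0.6},
    {"situation": "left", "saved": 1, "xs": 0.8},
    {"situation": "right", "saved": 0, "xs": 0.7},
    {"situation": "one_v_one", "saved": 1, "xs": 0.5},
]

identity = keeper_identity(shots_faced, SITUATIONS)
print(dict(zip(SITUATIONS, identity)))
```

Each entry is positive where the keeper beats expectation and negative where they fall short, so two keepers with the same overall save rate can still end up with clearly different vectors.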

If we combine these embeddings with a normal xS model, we significantly improve our accuracy in predicting whether a goalkeeper will save a shot on target. Crucially, this allows us to move into the area of personalised player prediction. As a result, if we want to know how Goalkeeper A would cope facing Goalkeeper B’s shots, or in our case how Navas would cope with Courtois’ shots, we can now simply take each player’s unique identity and swap them to simulate what we expect to happen.
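
The swap can be sketched with a toy model. Everything here is an assumption for illustration: a logistic save model with a shared baseline plus a keeper-specific adjustment, invented feature names and weights, and two made-up keepers. The real model and its inputs are described in the paper.

```python
import math


def save_probability(shot, identity, base_weights):
    """Toy personalised model: shared xS logit plus a keeper-specific adjustment."""
    base = sum(base_weights[f] * shot[f] for f in base_weights)
    adjustment = sum(identity.get(f, 0.0) * shot.get(f, 0.0) for f in identity)
    return 1.0 / (1.0 + math.exp(-(base + adjustment)))


def expected_goals_conceded(shots, identity, base_weights):
    """Sum of (1 - save probability) over a set of shots on target."""
    return sum(1.0 - save_probability(s, identity, base_weights) for s in shots)


# Hypothetical binary shot features and weights; "bias" is always 1.
BASE_WEIGHTS = {"bias": 1.0, "central": 0.5, "one_v_one": -1.5}

# Hypothetical identities: positive values mean stronger than average in that situation.
keeper_a = {"central": 0.8, "one_v_one": -0.2}
keeper_b = {"central": -0.3, "one_v_one": 0.4}

# Invented shots that keeper B faced.
shots_faced_by_b = [
    {"bias": 1, "central": 1, "one_v_one": 0},
    {"bias": 1, "central": 1, "one_v_one": 0},
    {"bias": 1, "central": 0, "one_v_one": 1},
]

b_on_own_shots = expected_goals_conceded(shots_faced_by_b, keeper_b, BASE_WEIGHTS)
a_on_b_shots = expected_goals_conceded(shots_faced_by_b, keeper_a, BASE_WEIGHTS)
print(f"B on own shots: {b_on_own_shots:.2f} expected goals conceded")
print(f"A swapped in:   {a_on_b_shots:.2f} expected goals conceded")
```

The shots stay fixed and only the identity changes, so any difference between the two totals is attributable to the keeper in this toy setup rather than to the shots faced.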

Real Have 99 Problems, But Navas Ain’t One

Going back to our Real Madrid example above, we’re now able to accurately simulate how many goals Navas would have prevented when facing Courtois’ shots and vice versa. Using our Simulated Save Chart, if we run the model over 99 shots both keepers have faced this season, we see that Navas would have been expected to prevent an additional four goals compared to Courtois. By visualising where each goalkeeper is stronger or weaker, we can see that Navas (red) is significantly better at saving shots toward the centre of the goal while Courtois (green) is only significantly better at saving shots near his bottom left.

This is critical information for the coach to be able to make an informed judgement when selecting which player to start. It’s also valuable in other areas such as coaching and the high-stakes area of recruitment.

Shopping in La Liga – Goals Added Rankings

La Liga has played host to some of the best modern-day goalkeepers such as Iker Casillas and Jan Oblak, and has exported a large number of top-class goalkeepers such as David De Gea and Pepe Reina. As a result, La Liga is a prime league for goalkeeper recruitment. However, not everyone can afford to go out and buy a well-known top-class goalkeeper. We can use our model to easily assess any goalkeeper in our database and find hidden gems.

The beauty of the model is that we’re able to simulate every goalkeeper against every shot taken in any game for which we have data. As a result, we can now provide accurate rankings of goalkeepers based on how many additional goals a player either prevents or allows per 90 minutes compared to the average goalkeeper. Our goals added model generates some interesting insights with broad applications.
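
One plausible way to turn simulated results into such a ranking is sketched below. The formula, the `goals_added_per_90` name, the keepers and the figures are all assumptions for illustration, not the published calculation.

```python
def goals_added_per_90(simulated_conceded, sample_minutes):
    """Goals prevented (+) or allowed (-) relative to the average keeper, per 90 minutes."""
    average = sum(simulated_conceded.values()) / len(simulated_conceded)
    return {
        keeper: (average - conceded) * 90.0 / sample_minutes
        for keeper, conceded in simulated_conceded.items()
    }


# Hypothetical simulated goals conceded by three keepers on one shared shot sample
# drawn from 900 minutes of play (all numbers invented).
simulated = {"Keeper A": 10.0, "Keeper B": 12.0, "Keeper C": 14.0}
added = goals_added_per_90(simulated, sample_minutes=900)

ranking = sorted(added.items(), key=lambda item: item[1], reverse=True)
for keeper, value in ranking:
    print(f"{keeper}: {value:+.2f} goals added per 90")
```

Because every keeper is simulated on the same shots, the ranking is not distorted by one keeper having faced easier or harder chances than another.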

As you would expect, names such as Jan Oblak and Marc-Andre Ter Stegen are in the top 10. But the player at the top of the rankings – Roberto Santamaria – doesn’t quite fit that level of name recognition. Santamaria plays for bottom club SD Huesca and has been somewhat of a journeyman, playing for five clubs in the past five seasons. He got his chance when Aleksandar Jovanovic – the league’s lowest rated keeper – was dropped. Santamaria became Huesca’s No. 1 on 23 December. Before that, Huesca averaged 0.5 points per match and conceded 32 goals in 16 La Liga matches. With him in goal since, they have averaged 1.08 points per match and conceded 18 goals in 13 matches. Considering he’s playing for a relegation-threatened team, it’s a promising track record. And when simulated against the same shot sample as the rest of La Liga’s keepers, he’s the division’s best.
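
For readers who want the Huesca figures on a common footing, this short calculation converts the quoted numbers into approximate points totals and goals conceded per match. Only the inputs come from the article; the `summarise` helper is a name made up for the example.

```python
def summarise(matches, points_per_match, conceded):
    """Return (approximate total points, goals conceded per match) for a run of games."""
    return round(points_per_match * matches), conceded / matches


# Figures quoted in the article for Huesca before and since the change in goal.
before = summarise(matches=16, points_per_match=0.5, conceded=32)
since = summarise(matches=13, points_per_match=1.08, conceded=18)

for label, (points, conceded_rate) in (("Before", before), ("Since", since)):
    print(f"{label}: ~{points} points, {conceded_rate:.2f} conceded per match")
```

That works out to roughly 8 points and 2.00 goals conceded per match before the change, against roughly 14 points and 1.38 conceded per match since. These raw records also reflect the team in front of the keeper, which is why the simulated comparison on a shared shot sample carries more weight.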

STATS AI Goalkeeper Index is a new model that allows us to accurately simulate and assess a goalkeeper’s strengths and weaknesses, and to directly compare a target signing against others. In addition, it is possible to simulate how a new goalkeeper would increase or decrease the number of goals conceded compared to existing players in the squad.

And if Real need any further confirmation that bringing Zidane back was the proper decision, it seems his assessment of Courtois was apt.