Three Reasons Why Chelsea Won the 16/17 EPL Season


In the 82nd minute of their 36th game of the EPL season, Michy Batshuayi’s goal secured Chelsea’s fifth English Premier League title. Even though Chelsea won the league with relative ease, it wasn’t all roses from the start. After the first six games, Chelsea had tallied only 10 points. Things came to a head when Chelsea were easily beaten 3-0 by a red-hot Arsenal in late September. After that loss, Antonio Conte switched from a back 4 to the back 3 that had served him well at Juventus, and results improved immediately: Chelsea reeled off 13 straight wins, putting them firmly on course to win the league.

Using the new tools STATS has developed with machine learning, we give three reasons why Chelsea won the league.

Reason No. 1: Chelsea were incredibly effective in converting chances

Although Chelsea scored the most goals this season, they ranked only fifth in the league in chances created (see Figure 1). To estimate the number of chances created, we use the expected goals (xG) measure, which estimates the likelihood that the average league player would score from a given situation (e.g., ball position and game context; see [1] for more details).


Figure 1: Plot showing how many goals each team could have expected to score given the situation (Chelsea rank 5th with approximately 60 goals expected).
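To make the xG idea concrete: a team’s expected goals for a match or a season are usually the sum of the per-shot scoring probabilities. The Python sketch below assumes each shot already carries an xG value from some model; the Shot record, the helper names and the shot values are hypothetical, and the STATS model itself is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    # Hypothetical record: probability that an average league player scores
    # from this situation (the per-shot xG), and whether it was actually a goal.
    xg: float
    goal: bool


def team_xg(shots: list[Shot]) -> float:
    """Expected goals for a set of shots: the sum of per-shot probabilities."""
    return sum(s.xg for s in shots)


def actual_goals(shots: list[Shot]) -> int:
    """Goals actually scored from the same shots."""
    return sum(1 for s in shots if s.goal)


# Made-up shots from a single match, for illustration only.
match = [Shot(0.76, True), Shot(0.05, False), Shot(0.12, True), Shot(0.30, False)]
print(round(team_xg(match), 2), actual_goals(match))  # 1.23 2
```

Summing the same per-shot values over a whole season gives totals of the kind plotted in Figure 1.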

What really highlights their attacking effectiveness in this year’s EPL, however, is their xG plus-minus (xGpm) of +22.4: Chelsea scored about 22 more goals this season than expected[1]. To put this in context, when we compare Chelsea with the other teams this year, we see that they converted their chances in a much more clinical fashion (see Figure 2). Tottenham are the next-closest team with +15.4 (although with two games remaining Spurs were only +7.6, so the 13 goals they scored in their last two games inflated this figure somewhat), followed by Liverpool (+4.8), Bournemouth (+4.6) and Burnley (+3.1). Southampton, on the other hand, were quite the opposite, missing more than 16 goals that the average team would have converted.


Figure 2: Ranking the teams on goals minus expected goals in the 16-17 EPL. Chelsea have +22.4, seven more than Spurs.

From a historical perspective, across the last six seasons for which we have calculated xGpm, this Chelsea team ranks first, just ahead of Liverpool (+22.0) and Manchester City (+19.1) from the prolific 13-14 season (see Table 1).

Table 1: Ranked list of the most effective offensive teams across the last 6 seasons.

Rank Season Team xGpm
1 16-17 Chelsea +22.4
2 13-14 Liverpool +22.0
3 13-14 Manchester City +19.1
4 16-17 Tottenham +15.4
5 13-14 Arsenal +13.4

Needless to say, scoring 22 more goals than expected goes a long way towards securing a title. However, as we will see in the next section, their defense played a massive role as well.

Reason No. 2: Defensively, Chelsea did not give up many chances

As in the previous section, we can use the expected goals measure to analyze how effective a team’s defense is. Although Chelsea ranked fifth in creating chances, they ranked first defensively, giving up the fewest chances in the league (see Figure 3).


Figure 3: Expected goals against, which estimates how many goals a team should have conceded based on the game situation. Chelsea gave up the fewest chances.

In terms of goals actually conceded, however, Tottenham were clearly superior (26 vs 33). The expected save (xS) measure, which estimates the likelihood that a shot will be saved based on the player’s position and shot location, helps explain why: Hugo Lloris saved more than 10 goals that the “average league goalkeeper” would not have. Chelsea’s goalkeeping this season, on the other hand, was -2, about two goals worse than an average keeper. Figure 4 shows how the goalkeepers fared based on saves minus expected saves.


Figure 4: Comparing goalkeeping performance this year based on Saves vs Expected Saves

Reason No. 3: Chelsea went to a back 3 to provide more defensive stability

In the previous two sections, we showed quantitatively how Chelsea fared both offensively and defensively in terms of goal-scoring chances. But as noted earlier, after six games and a poor run of form, Antonio Conte switched from a back 4 to a back 3, a move that has been hailed as a key decision in turning things around. In this section, we show how the change in formation altered their style of play.

For this analysis, we compared Chelsea’s performances in the first six games (up to and including the Arsenal vs Chelsea match on Sept. 24) with their performances afterwards. A summary of some key performance metrics is shown in Table 2. Although Chelsea averaged more shots with a back 4 (16.8 vs 14.1 per game), they averaged more goals with a back 3 (2.2 vs 1.7). Defensively, they conceded roughly the same number of shots (8.5 vs 8.6 per game), but with a back 3 they conceded far fewer goals per match (0.7 vs 1.5). With a back 3 they also gave up around 4 percentage points of possession per game (53.3% vs 57.3%), which indicates a change in playing style.

Table 2: Comparing offensive and defensive metrics when Chelsea played with a back 4 and a back 3.

Measure              Offensive           Defensive
                     Back 4   Back 3     Back 4   Back 3
Shots per game       16.8     14.1       8.5      8.6
Goals per game       1.7      2.2        1.5      0.7
xG per game          1.6      1.5        0.9      0.7
Possession per game  57.3%    53.3%      42.7%    46.7%
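One quick way to read Table 2 is to subtract xG from goals per game for each setup, which separates finishing from the quality of the chances. The sketch below uses the rounded per-game values from the table, so the results are approximate; it works on raw goals and does not apply the own-goal adjustment described in footnote [1]. The dictionary layout and function name are illustrative.

```python
# Per-game averages copied from Table 2 (already rounded to one decimal).
table2 = {
    "Back 4": {"goals_for": 1.7, "xg_for": 1.6, "goals_against": 1.5, "xg_against": 0.9},
    "Back 3": {"goals_for": 2.2, "xg_for": 1.5, "goals_against": 0.7, "xg_against": 0.7},
}


def per_game_plus_minus(row: dict[str, float]) -> tuple[float, float]:
    """Return (attacking, defensive) goals minus xG per game.

    A positive attacking value means more goals were scored than expected;
    a positive defensive value means more goals were conceded than expected.
    """
    attack = round(row["goals_for"] - row["xg_for"], 1)
    defence = round(row["goals_against"] - row["xg_against"], 1)
    return attack, defence


for setup, row in table2.items():
    print(setup, per_game_plus_minus(row))
# Back 4 (0.1, 0.6)
# Back 3 (0.7, 0.0)
```

On these rounded figures, the back 3 coincided both with sharper finishing (about +0.7 goals per game over xG, against +0.1) and with conceding roughly what the opposition’s chances were worth, rather than about 0.6 goals per game more.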

Using a new metric developed at STATS, we can break all continuous-play possession into a series of “style” states, automatically assigning each portion of a game to one of these distinct game phases. The style names are largely self-descriptive: direct-play, counter-attack, maintenance, build-up, sustained-threat, fast-tempo, crossing and high-press (for more details, see [2]).

In Figure 5, we compare Chelsea’s playing style with a back 3 and with a back 4. When Chelsea played with a back 3, they used a lot more direct play, and their use of maintenance, build-up and sustained threat decreased. With a back 3, they also used fewer crosses. In terms of goal-scoring efficiency, this makes sense, as it has been shown previously that direct play is the most effective way of scoring [3].


Figure 5: Chart comparing Chelsea’s playing style with a back 3 (blue) and with a back 4.
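Figure 5 compares how much of Chelsea’s possession fell into each style. Assuming each possession has already been split into segments labelled with one of the eight styles listed above (the labelling model itself is described in [2] and is not reproduced here), the shares could be computed from segment durations as in the sketch below. The (style, seconds) input format, the function name and the segments are all hypothetical.

```python
from collections import defaultdict

STYLES = [
    "direct-play", "counter-attack", "maintenance", "build-up",
    "sustained-threat", "fast-tempo", "crossing", "high-press",
]


def style_shares(segments: list[tuple[str, float]]) -> dict[str, float]:
    """Percentage of labelled possession time spent in each style.

    segments: (style, duration_in_seconds) pairs, a hypothetical format.
    Styles that never occur get a share of 0.
    """
    totals: dict[str, float] = defaultdict(float)
    for style, seconds in segments:
        if style not in STYLES:
            raise ValueError(f"unknown style: {style}")
        totals[style] += seconds
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {style: 0.0 for style in STYLES}
    return {style: 100 * totals[style] / grand_total for style in STYLES}


# Invented segments from one match.
segments = [("build-up", 30.0), ("direct-play", 10.0), ("build-up", 20.0), ("crossing", 40.0)]
shares = style_shares(segments)
print(shares["build-up"], shares["direct-play"], shares["high-press"])  # 50.0 10.0 0.0
```

Running the same calculation separately over the back 4 and back 3 games would give two profiles that can be compared side by side.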

In Figure 6, we show Chelsea’s defensive playing style (i.e., how opposition teams tend to attack when they have possession of the ball). Interestingly, the opposite occurs here: opponents used less direct play and more maintenance and crossing against Chelsea. A back 3 is thought to give a team more “defensive stability,” and here the switch coincides with Chelsea conceding fewer good chances.


Figure 6: The defensive playing style of Chelsea (i.e., when opposition teams have possession of the ball against Chelsea).


Using new analysis tools developed at STATS, we have been able to objectively measure Chelsea’s title run using expected goals, expected saves and playing-style measures.

[1] In our analysis, we attribute own goals to luck, so in determining the “expected goals plus-minus” (xGpm) we exclude own goals from the goals values (i.e., xGpm = (Goals - Own goals) - xG).

Sports Data: Why It Can Fall Flat and Why It Doesn’t Have To


Sport has never been more competitive. Today, every athlete, coach and team is tapping into data analysis to gain the slightest winning edge over their rivals. There is a huge appetite for this, but is the analysis as effective as it could be?

The context here is that the volume of data available to teams is expanding exponentially. Take football as an example: as the amount of data increases, it becomes harder to distil these millions of data points into something that can be quickly absorbed, tailored and shared to enhance teams’ performances and win more games. As a result, many teams receive only flat statistical reporting, devoid of tactical context.

This ‘flat’ statistical reporting and data is limiting for two reasons:

Firstly, data analysts are under increasing time pressure to produce new tactics and strategies. With only two or three days before the next game, implementing different playing styles in a team can be challenging. This means that analysts, more often than not, dive deeper into flat reporting and video footage.

Where’s the time for analysis? Likewise, sports scientists feel they have to spend much of their time buried in raw data, which leaves less time for analysis and guidance.

Secondly, there is no guidance for those attempting to interpret the figures. Put simply, the tools that provide the data outputs don’t provide interactive analysis that enables analysts, coaches and managers to better understand the opposition’s playing style and the impact of their own team’s style. Such analysis would enable better decision-making around team selection, tactics and training regimes.

However, data can be segmented by tactical situation to show how a team’s style will affect the opposition’s physical requirements. This can be done using a method called principal component analysis.

In football, principal component analysis takes eight on-pitch incidents, such as a forward dribble, and reduces them to ‘playing styles’ for each team. These insights then allow analysts to classify teams into specific playing styles, for example the high press, counter attack or sustained threat. Narrowing the analysis down to playing styles saves time and provides contextualised, actionable data. Simplifying data and adding context can help data scientists and coaches understand how an opposition’s playing styles will affect their own players’ requirements. This then allows sport scientists, coaches and players to create evidence-based tactical training sessions and capture the tactical workload for each position.
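A minimal sketch of the dimensionality-reduction step itself, using NumPy’s SVD. The eight feature columns (for example, per-possession counts of event types such as forward dribbles) and the random data are hypothetical, and the later step that turns components into named playing styles is not shown; this is a generic PCA, not the STATS pipeline.

```python
import numpy as np


def pca(features: np.ndarray, n_components: int):
    """Project the rows of `features` onto their top principal components.

    features: (n_samples, n_features) array, e.g. one row per possession and
    one column per event type (a hypothetical layout).
    Returns (scores, components, explained_variance_ratio).
    """
    # Centre each column so the components describe variation around the mean.
    centred = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    components = vt[:n_components]
    scores = centred @ components.T
    variance = singular_values ** 2
    explained = variance[:n_components] / variance.sum()
    return scores, components, explained


rng = np.random.default_rng(0)
# 200 hypothetical possessions x 8 hypothetical event-count features.
X = rng.poisson(lam=3.0, size=(200, 8)).astype(float)
scores, components, explained = pca(X, n_components=2)
print(scores.shape, components.shape)  # (200, 2) (2, 8)
print(explained)
```

Each possession is then described by two numbers instead of eight, which is what makes grouping possessions into a handful of styles tractable.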

Does the fact that Manchester United or Chelsea have taken the highest number of shots this season really give an opposing coach valuable insight for developing a strategy to stop that team shooting? The answer is no. Instead, insights in today’s competitive climate must take all factors into consideration to provide a clear and accurate picture of what happened and, most importantly, why it happened.

On average, during a game, a footballer spends 97% of their time without the ball. That is why it is not enough to know what is happening on the ball; it is critical to know what is happening all over the pitch. Achieving the winning edge can only be done when the numbers provide context. With context in place and algorithms that address data points on multiple levels across an entire game, scientists, coaches and players will have a more accurate predictor of what it takes to win and perform at the highest level.

AI and the Growing Use of Technology in Sport


The use of data in professional sport has grown significantly in recent years. In football, the volume of data available to teams is expanding exponentially. It is increasingly difficult to capture all of the relevant data points available and distill the complex information contained in those millions of data points per game into a series of simple representations. These representations then have to be quickly absorbed, tailored and shared to enhance teams’ performances and win more games.

The promise that big data delivers big insights has never been more important, or more difficult to fulfil. Sports analysts rely on data to do their jobs, collating and clipping information from training sessions and competitive matches, which helps them manage squads, deal with injuries and support coaches in making the decisions that matter.

Sports data intelligence is an exploding industry due to the sheer value behind the data. Player monitoring and tracking technologies have been in place for some time, but without timely and relevant context the numbers simply do not add up. Finding the context behind each situation is crucial for data analysts to get immediate answers they can rely upon to make informed decisions.

The power of machine learning and artificial intelligence is being harnessed by blue-chip businesses looking to amplify human potential. The same is happening in professional sport. Behind the scenes, new technologies are maximising the value of data, which is so important in a complex football match where thousands of events per game translate into millions of data points. Machine learning is helping players and their sports science teams come up with objective measures and spot scenarios that are invisible to the human eye.

Artificial intelligence can simulate so many events that it allows a data scientist to turn the results into insights and make recommendations about what is likely to happen on the pitch. This equips coaches to make informed decisions on individual players and is vital in preparing for a game. The added insight can influence which players are selected in team sports and is especially helpful with a tight turnaround between games, as is the case in football. Beyond informing individual training schedules, it can help determine tactics based on the opposition’s playing style.

When imagining a sports scientist or analyst, you often envisage a room full of screens and hours spent analyzing footage (akin to Billy Beane’s assistant in Moneyball). Machine learning is streamlining this time-intensive role to enable faster and better decision-making. Using actionable insight, as opposed to a series of numbers with no relevant context, gives those in this role quicker, deeper analysis and lets them add even greater value to the coaching staff.

The technology is only getting better. Artificial intelligence is quickly being woven into our everyday lives through digital assistants on our phones and in our homes, and it’s here to stay. By applying technology to sport, where marginal gains are vital, we’re seeing effective analysis allowing football clubs and players to get ahead.

Sport creates some of the greatest human achievements and holds an inspiring emotional connection for fans around the world. It’s fascinating to see the relationship between artificial intelligence and team performance: even in celebrating the peak of human and team performance and emotion, it is machines that provide the data from which insights are drawn. As machine learning algorithms evolve and become more sophisticated, they hold great potential to unlock performance on the sports field.

The Five Most Tweeted Stats in American Sports


When it comes to storytelling, there are few areas with more depth and richness than sport. Perhaps this is how sports statistics arose in the first place – fans have always wanted more ways to memorialize the experience of the game. Well before technology began to redefine sports, fans were even willing to collect the stats for themselves. Just talk to any old-timer at a baseball game, detailing each moment with his most trusted tools: a pencil and scorecard.

Today, not every fan may want to manually score the game, but there’s no doubt they want to fully experience it by interacting with fellow fans as something special is happening on the field. This is an excellent opportunity for brands of all types to connect with their customers’ deepest passions and in-the-moment state of mind. STATS conducted research into the sports moments that generate the most mentions on Twitter to give you a sense of the most engaging statistics in sports.

  5. The Grand Slam (2,229,043 mentions)

Perhaps the most exciting play in baseball, a grand slam changes the game entirely and creates buzz in the stadium and on social networks alike. For fans, a grand slam represents a batter picking up his team and likely securing a win. Since 2000, teams that hit a grand slam have an .871 winning percentage. So far in 2017, teams are 25-7 (.781) when hitting a grand slam. The batter gets the glory for the slam, but the bases can’t be loaded unless his teammates have done their jobs first.

  4. The Triple Double (2,371,567 mentions)

Thanks to Russell Westbrook’s NBA-record 42 triple-doubles this season (he joined Oscar Robertson as the only players in league history to average a triple-double), the term has risen in popularity on Twitter in 2017. For fans, the triple-double represents a superhero level of performance. In games in which Westbrook recorded a triple-double, the Thunder went 33-9 (.786). They were 14-26 (.350) in all other games.

  3. The Three Pointer (2,630,985 mentions)

A buzzer-beating three-point attempt might be one of the most exciting moments in sports. This stat has grown in popularity with the success of the Golden State Warriors’ Splash Brothers, Stephen Curry and Klay Thompson. Each of the last five seasons has set a new record for most three-pointers made league-wide. Curry had the most by a player in each of the last three seasons: 286 in 2014-15, 402 in 2015-16 and 324 in 2016-17. These are also the top three single-season totals of all time.

  2. The Home Run (6,004,212 mentions)

Not quite as powerful as a grand slam, but much more common, the home run is still the gold standard for sports fans when it comes to talking about success. The popularity of the long ball should not come as a big surprise, as home run records are amongst the most prestigious in sports. It doesn’t hurt that many of baseball’s rising stars, like Aaron Judge, Freddie Freeman, Mike Trout, Joey Gallo and Bryce Harper, are atop the home run leaderboards.

  1. The Touchdown (6,064,479 mentions)

Nearly every NFL or college football fan has imagined this moment, and perhaps even practiced a touchdown dance in the mirror. The touchdown has become the most engaging moment for American sports fans. Nothing gets fans excited on Twitter like a touchdown in the fourth quarter. Arizona Cardinals running back David Johnson created the biggest buzz last year in the end zone with seven – four on the ground and three through the air.
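The winning percentages quoted in the grand slam and triple-double entries are wins divided by games played, written to three decimals without the leading zero. A small helper (the function name is illustrative, and ties are ignored as a simplifying assumption) reproduces those figures:

```python
def winning_percentage(wins: int, losses: int) -> str:
    """Format a win-loss record in box-score style, e.g. '.781'.

    Ties are ignored here, a simplifying assumption.
    """
    games = wins + losses
    if games == 0:
        raise ValueError("no games played")
    pct = f"{wins / games:.3f}"
    # Drop the leading zero ("0.781" -> ".781"); a perfect record stays "1.000".
    return pct[1:] if pct.startswith("0") else pct


print(winning_percentage(25, 7))   # .781  (2017 teams hitting a grand slam)
print(winning_percentage(33, 9))   # .786  (Thunder with a Westbrook triple-double)
print(winning_percentage(14, 26))  # .350  (Thunder in all other games)
```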

“Conte vs Mourinho”: Comparing the Chelsea Playing Styles of Champions


Much has been made of the impact of Chelsea’s changed playing style since the arrival of Antonio Conte. Since most of the squad that won the title in 2014-15 is still part of the current 2016-17 squad, this leads us to the question: what is the difference between the two? Specifically, how do the two teams’ styles differ?

With the new tools STATS has developed using machine learning, we can see where the teams are similar and where they are different. In this article, we run through a playing style “checklist,” which gives us a better understanding of how they achieved success in their respective seasons. Overall, it will give us a sense of whether there is a distinct change in playing style under Conte, whether the styles are the same, or whether the truth lies somewhere in between.

Comparison No. 1: Creating and Executing Chances

The first thing to compare is how many chances each team created, and how effective they were in executing those chances (see Table 1 for a summary). In the 14-15 season, Chelsea created around 15.1 shots per game, slightly above this year’s 14.6, though not significantly so.

What is staggering, though, is how effective they have been in converting chances this year. The expected goals (xG) measure estimates the likelihood that the average league player will score a goal based on the situation (e.g., ball position and game context; see [1] for more details). This year, across 35 games, Chelsea have scored 75 goals against an expected goal value of 54. Since two of those goals were own goals, their xG plus-minus (xGpm) is +19, meaning that Chelsea have scored 19 more goals this season than expected[1]. This compares with the 14-15 season, when Chelsea scored 73 goals (one of them also an own goal) but were expected to score 63.5, for an xGpm of +8.5.

Table 1: Comparing the chance creating and execution between the 14-15 and 16-17 Chelsea squads.

Season  Games  Shots (per game)  Goals (per game)  Own goals  Avg chance  xG (per game)  xGpm
14-15   38     573 (15.1)        73 (1.9)          1          11.1%       63.5 (1.7)     8.5
16-17   35     510 (14.6)        75 (2.1)          2          10.5%       54.0 (1.5)     19.0
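The xGpm values in Table 1 follow directly from the definition in footnote [1]: own goals are removed from the goal tally before expected goals are subtracted. A minimal sketch of that arithmetic (the function name is illustrative):

```python
def xg_plus_minus(goals: int, own_goals: int, xg: float) -> float:
    """Expected-goals plus-minus as defined in footnote [1].

    Own goals are treated as luck, so they are removed before
    subtracting expected goals: xGpm = (goals - own_goals) - xG.
    """
    return round((goals - own_goals) - xg, 1)


# Offensive figures from Table 1.
print(xg_plus_minus(73, 1, 63.5))  # 8.5   (2014-15)
print(xg_plus_minus(75, 2, 54.0))  # 19.0  (2016-17)
```

Applied to goals conceded and expected goals against, the same function gives the defensive values in Table 3: xg_plus_minus(32, 1, 35.9) returns -4.9 and xg_plus_minus(29, 1, 27.0) returns 1.0.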

To explain why Chelsea are more effective this season than the 14-15 version, let’s look at the individual contributions of the top scorers in each season (see Table 2). In the 14-15 season, Diego Costa was the leading goal scorer, netting 20 goals from an expected value of 14.8, meaning he was 5.2 goals better than the average striker would have been that year. The next-leading scorer was Eden Hazard, who scored 14 goals with a plus-minus of +1.2.

Fast forward to this season, where Costa has again been the leading scorer with 20 goals so far. However, his plus-minus has only been +2.4. Hazard has also been a leading light, scoring 15 with a plus-minus of +4.5. Probably the key difference between this year and other years has been the contribution of the other players: not the number of goals they have scored, but how efficient they have been. Pedro and Willian have been excellent and quite clinical in front of goal, with plus-minuses of +2.6 and +4.6, respectively. Even Marcos Alonso has been effective from his customary left-wing-back position, chiming in with six goals and a plus-minus of +2.5.

A common perception of Jose Mourinho’s teams has been their reliance on individual brilliance rather than offensive team play. Table 2 suggests that Conte has been able to get more out of other key offensive players, not just Costa and Hazard. This raises the question: do Chelsea create chances differently from Mourinho’s 14-15 team, or are they just better at finishing them?

Table 2: Comparing the goal-scorers of the 14-15 and 16-17 seasons.

2014-2015                         2016-2017
Player     Goals  xG    G-xG      Player     Goals  xG    G-xG
COSTA      20     14.8  5.2       COSTA      20     17.6  2.4
HAZARD     14     12.8  1.2       HAZARD     15     10.5  4.5
REMY       7      3.7   3.3       PEDRO      8      5.4   2.6
OSCAR      6      5.5   0.5       WILLIAN    7      2.4   4.6
TERRY      5      4.1   0.9       ALONSO     6      3.5   2.5
IVANOVIC   4      4.1   -0.1      CAHILL     6      2.8   3.2
DROGBA     4      2.4   1.6       FABREGAS   4      2.4   1.6
IVANOVIC   4      4.1   -0.1      MOSES      3      2.9   0.1

Comparison No. 2: Is there any difference in how they created chances?

Even though the 14-15 and 16-17 teams created approximately the same number of chances per match, there could be a difference in how they created them. Recently, we created a dictionary of scoring methods, listed here: Build-Up/Normal, Counter-Attack, Direct-Play, Corner Kick, Free-Kick, From-Free-Kick, Cross, From Cross, Throw-In and Penalties. In Figure 1, we show how many chances were created for each type of shot.


Figure 1: The creation of chances between the 14-15 and 16-17 seasons.

In Figure 1, we can see a number of things:

  • In the 14-15 season, 50.1% of the chances created were in the build-up style, compared with 37.7% this year.
  • This season, Chelsea are creating more chances from direct play, free-kicks and crosses.
  • There is no difference in the chances created from counter-attacks, corner kicks and penalties.

Given that Chelsea are +19 in expected goals plus-minus, has this changed where the goals are coming from? In Figure 2, we compare how they are scoring. The key points are that in the 16-17 season Chelsea are getting fewer goals from build-up and corner kicks, and more from counter-attacks and direct play.


Figure 2: Where are the goals coming from? Comparison of goals between the 14-15 and 16-17 squads.

Comparison No. 3: Conceding Chances

Now that we have compared the offensive performance of the 14-15 and 16-17 squads, we can do the same on the defensive side. Table 3 gives a summary of both teams. The first thing to notice is that the current squad gives up far fewer shots per game (8.6 vs 11.2). Looking at the expected goals plus-minus, the 14-15 Chelsea were -4.9, meaning that they should have conceded nearly five more goals than they did (see the next comparison, on goalkeeping, for one reason why). This season, their plus-minus is +1.0, meaning that they have conceded one more goal than we would expect.

Table 3: Comparing how opposition teams created and executed chances between the 14-15 and 16-17 squads.

Season  Games  Shots (per game)  Goals (per game)  Own goals  Avg xG  xG    xGpm
14-15   38     424 (11.2)        32 (0.8)          1          8.5%    35.9  -4.9
16-17   35     301 (8.6)         29 (0.8)          1          9.1%    27.0  1.0

Goalkeeping offers a strong clue to the defensive difference between the two seasons. The expected save (xS) value estimates the likelihood that a goalkeeper should have saved a shot based on the game situation (e.g., player position, ball position and game phase). Table 4 shows that Thibaut Courtois was more effective in the 14-15 season, with an expected-save plus-minus of +4.3 compared with -0.5 this year. In other words, he saved about four more goals than the league-average keeper would have in 14-15, whereas this year he is 0.5 goals below the league average.

Table 4: Comparing goalkeeping performance between the 14-15 and 16-17 squads.

Season  Games  Saves (per game)  xSaves  Saves-xSaves
14-15   38     91 (2.4)          86.7    4.3
16-17   35     72 (2.0)          72.5    -0.5
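The Saves-xSaves column compares actual saves with the sum of per-shot save probabilities. The sketch below shows that aggregation under stated assumptions: the ShotOnTarget record, the function name and the four shots are hypothetical, and the xS model that would produce each save probability is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class ShotOnTarget:
    # Hypothetical per-shot record: the model's probability that an average
    # league goalkeeper saves this shot, and whether it was actually saved.
    save_prob: float
    saved: bool


def saves_plus_minus(shots: list[ShotOnTarget]) -> float:
    """Actual saves minus expected saves (xS) over a set of shots on target."""
    saves = sum(1 for s in shots if s.saved)
    expected = sum(s.save_prob for s in shots)
    return round(saves - expected, 1)


# Invented shots: three saved, one conceded.
shots = [ShotOnTarget(0.9, True), ShotOnTarget(0.4, True),
         ShotOnTarget(0.5, True), ShotOnTarget(0.8, False)]
print(saves_plus_minus(shots))  # 0.4
```

With the season totals in Table 4, the same subtraction gives 91 - 86.7 = 4.3 for 2014-15 and 72 - 72.5 = -0.5 for 2016-17.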

Comparison No. 4: Playing Styles

Using a new metric that we have developed at STATS, we can break all continuous-play possession into a series of “style” states, automatically assigning each portion of a game to one of these distinct game phases. The style names are largely self-descriptive: direct-play, counter-attack, maintenance, build-up, sustained-threat, fast-tempo, crossing and high-press (for more details, see [2]).

In Figure 3, we compare Chelsea’s playing styles. Generally, the two teams have a similar possession percentage and a similar amount of direct play. But there is a substantial increase in counter-attacking relative to the league average, from 14.2% to 24.5% above it. To put this increase in context, Leicester last season were +26.2% in counter-attack, only slightly above Chelsea this year (and Leicester were considered very counter-attack-heavy). Interestingly, Leicester paired that with above-average direct play, whereas Chelsea are well below average in this respect.

In the possession-based styles (i.e., maintenance, build-up and sustained threat), although they have a similar amount of possession, they seem to be less offensive during long possessions than they were in 14-15. Among the other style categories, there was a big increase in fast-tempo play this season, meaning more possessions in which they circulated the ball quickly in the opposition’s half. In crossing, they were on par with the league average this season (+0.6%), whereas in 14-15 they were well below it (-24.1%). They also pressed high less than they did in 14-15.


Figure 3: Chart comparing Chelsea’s playing style in 14-15 vs 16-17.
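The style figures above are quoted relative to the league average. Assuming “relative” means the percentage difference between a team’s share of possession in a style and the league-average share (the source does not spell out the definition), the calculation would look like the sketch below; the function name and the example shares are invented.

```python
def relative_to_league(team_share: float, league_share: float) -> float:
    """Percentage difference of a team's style share from the league average.

    Both arguments are fractions of possession (e.g. 0.125 for 12.5%).
    Assumption: this matches how the relative style figures are defined.
    """
    if league_share <= 0:
        raise ValueError("league share must be positive")
    return round(100 * (team_share - league_share) / league_share, 1)


# Invented shares: 12.5% of possession in a style when the league average is 10%.
print(relative_to_league(0.125, 0.10))  # 25.0
# A team using a style less than average gets a negative value.
print(relative_to_league(0.076, 0.10))  # -24.0
```

Expressing each style relative to the league average, rather than as a raw share, makes teams with very different amounts of possession easier to compare.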

Summary: Chelsea Composite Squad (Formation 3-4-3)

Based on our analysis and advanced metrics, we have put together a composite starting 11. As the 3-4-3 this year has been more effective defensively, we have used this formation.

Goalkeeper: Thibaut Courtois (14-15)

Back 3: César Azpilicueta, David Luiz and Gary Cahill (c) (all from the 16-17 squad)

Left Wing-Back: Marcos Alonso (16-17), for his goal-scoring feats as well as his defensive exploits in this 16-17 title-winning squad.

Right Wing-Back: Pedro (16-17), obviously playing out of position. Victor Moses would feel hard done by, but some performances from the 14-15 squad were impossible to leave out.

Holding Midfielders: N’Golo Kante (16-17) and Nemanja Matic (14-15). This year’s PFA Players’ Player of the Year picks himself, but Matic’s imperious form in 14-15 was monumental in helping the Blues win the title.

Forward Three: Cesc Fabregas (14-15), Diego Costa (14-15) and Eden Hazard (16-17). Fabregas was immense in the 14-15 season (and has played some very important cameos this season) and deserves a spot in the starting 11, although he would be playing out of position. Diego Costa has been a colossus in both seasons, scoring 20 goals in each; we chose the 14-15 version as he was more efficient relative to his xG (+5.2 vs +2.4). Similarly, both versions of Eden Hazard would be the first name on the team-sheet, but again, due to his xG efficiency this season (+4.5 vs +1.2), the 16-17 version gets the starting berth.

Manager: Antonio Conte. The 14-15 and 16-17 squads were similar in many of their attributes, but in terms of sheer efficiency in both the offensive and defensive departments, Conte gets the nod for revamping the squad and changing formation, which ultimately altered the fortunes of this Chelsea squad.

[1] In our analysis, we attribute own goals to luck, so in determining the “expected goals plus-minus” (xGpm) we exclude own goals from the goals values (i.e., xGpm = (Goals - Own goals) - xG).