After Sheffield United’s unbelievable campaign saw them finish ninth, and with many pundits keen to point out that even Villa (17th) and relegated Bournemouth (18th) scored more goals than United over the season, I decided to dive into the numbers for promoted sides throughout Premier League history to see if there was any secret to survival. My main aim was to work in Tableau to create a range of insightful visualisations, such as scatter graphs, which I had not created before.
I have included the relevant sides going back as far as the 1996/97 season. The 1995/96 season was the first Premier League season in the current 20-team, 38-game format, so 1996/97 was the first season in that format to feature three newly promoted sides from the Championship, which has remained the case ever since.
Certain factors affect the data and should be considered if we were to dive deeper. I have chosen not to do this, as I am instead focusing on improving my skills in Tableau. For example, goals scored and conceded totals can be massively affected by a single result in a season, and newly promoted sides can find themselves on the end of a drubbing once or twice a season. By excluding some of the more extreme results, we might unlock new answers on how beneficial a good attack or defence across the majority of games is to survival.
We could also try to glean more specific information, for example, whether a certain style of play is most beneficial to survival. This could be done using advanced statistics from a source such as FBref. Unfortunately, it doesn’t include advanced stats for all the seasons that I was looking at. Football has also changed a lot over that time, so we are unlikely to find any serious correlations there. Guardiola and many journalists have suggested that English football has changed since his arrival, with even the likes of Rochdale in League One trying to implement a more progressive passing game, rather than the direct football we are used to seeing outside of the Premier League. It seems that those lower in the Premier League are also trying to play ‘good’ football, rather than outmuscling sides from set pieces and long balls and trying to sit out a 1-0 win at home to ensure survival.
Ultimately, there are many ways to win a football game, and there is unlikely to be one factor that would ensure any newly promoted side's survival.
Surprisingly, to me at least, the pie chart here shows that more sides promoted from the Championship survive their first Premier League season than are relegated. If we look at the number of times newly promoted sides have finished in each position, we see that the three most common final positions are 20th, 19th and 18th, in that order (i.e. the relegation places). We also see that Sheffield United’s campaign makes them the fourth best newly promoted side ever, based on final league standing in their first season. Ipswich’s 2000/01 season was the best ever by a newly promoted side. Their points haul, which saw them finish fifth, would also have seen them finish fifth this season, level on points with Champions League-placed Manchester United and Chelsea. Unsurprisingly, the majority of newly promoted sides finish in the bottom half. With the Premier League’s money injection over recent years, it is unlikely that we will get a better fairytale story than Leicester’s 2016 League win. That was unimaginable. One can only dream of a team copying Nottingham Forest’s feat in 1976-78, when Clough’s side won promotion to the top division and went on to win it in their first season back.
When we break down the points outcome into a box plot for both the survival and relegated groups, we see that only two teams (96/97 Sunderland and 97/98 Bolton) have ever reached the infamous ‘40-point mark’ and still not survived, whereas plenty of newly promoted teams have managed to scrape survival with fewer points than this, including Aston Villa this season (partly thanks to the goal-line technology issue against Sheffield United). The lower quartile for the survival group sits at 39 points, while the relegated group’s maximum is 40 points, which shows (obviously) that the survival group consistently reaches a greater points total.
Of course, all of this is intuitive, and although the graphs may look nice, they don’t give us any information that anyone who watches football wouldn’t already know. Therefore, I created a range of scatter graphs to try to see if it was a good attacking output, or competent defending that ensured survival. Again, it would have been useful to go into further depth here if we were looking to do more than further upskill in Tableau.
Going forwards, we are going to continue to use the colour scheme of green for those teams who survived and red for those who were relegated. It is a common trend within the football analytics community on social media at the minute to use team badges to mark each point. However, I don’t think this is necessary here. The tooltip gives the team name along with greater detail, and team badges would be confusing for those teams who have been promoted from the Championship to the Premier League several times over the years.
Firstly, by comparing league position to win percentage, we see a strong correlation, as expected. Tableau’s trend line feature gives us an R-squared value, which in this case equals 0.79. At a glance, it is surprising to see nine teams surviving despite having win percentages equal to or worse than some of the relegated group, but the tooltip’s additional detail reveals that these teams are in fact those in the lower quartile of the survival group’s points totals.
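For anyone wanting to sanity-check an R-squared value outside Tableau, here is a minimal sketch of how it could be computed for a straight-line fit. It assumes the trend line is an ordinary least-squares line, and the numbers in `win_pct` and `position` are made up purely for illustration; they are not the data behind my charts.

```python
def linear_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys):
    """Share of the variance in ys explained by the fitted straight line."""
    slope, intercept = linear_fit(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical values for illustration only (NOT the real dataset).
win_pct = [10.5, 18.4, 23.7, 26.3, 31.6, 39.5]
position = [20, 19, 17, 15, 12, 8]

print(round(r_squared(win_pct, position), 2))
```

A value close to 1 means the points sit tightly around the line, and a value close to 0 means the line explains very little.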
As you would predict, comparing league position to points shows an even stronger correlation (R-squared = 0.83). It is so important for sides battling at the bottom of the table to pick up points wherever possible. They cannot rely on just winning home games, and will traditionally be set up to see out a draw against stronger opponents and in away games.
Finally, we can give more context to the 40-point mark by looking at how many wins it requires. An R-squared value of 0.93 shows just how important it is to pick up a certain number of wins for survival. No team has been relegated with 11 wins, and no team with 12 wins has finished on fewer than 40 points. It would therefore be sensible for a club aiming for survival to target not only 40 points, but also 12 wins to ensure that points target is met.
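The arithmetic behind a wins target is simple enough to sketch. The snippet below assumes the standard three points for a win and one for a draw; the function names are my own and purely illustrative.

```python
POINTS_PER_WIN = 3
POINTS_PER_DRAW = 1


def points(wins, draws):
    """Total league points from a number of wins and draws."""
    return wins * POINTS_PER_WIN + draws * POINTS_PER_DRAW


def draws_needed(target_points, wins):
    """Draws required on top of the given wins to reach the target."""
    shortfall = target_points - wins * POINTS_PER_WIN
    return max(0, shortfall // POINTS_PER_DRAW)


print(points(12, 0))         # 36 points from 12 wins alone
print(draws_needed(40, 12))  # 4 draws to get from 36 to 40
print(draws_needed(40, 11))  # 7 draws if the side only wins 11
```

So a side with 12 wins needs only four draws from its other 26 games to reach 40 points, whereas dropping to 11 wins pushes that up to seven.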
Now we can begin to look at how the attacking and defensive outputs influence survival. Firstly, we look at points and goal difference, both divided by games played to give per-game statistics. The goal difference per game may look rather poor, even for Ipswich Town, who lead the way on +0.40. However, to put this into perspective, Arsenal’s eighth-placed finish this season came with a goal difference per game of +0.21, and although they did have defensive woes, Aubameyang chased Vardy all the way for the Golden Boot, showing they still posed a high-quality attacking threat. Ninth-placed Sheffield United had a goal difference of 0, and all teams below them were in negative numbers, so those towards the top right of the graph had relatively good goal differences per game across the season.
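As a quick check on the per-game figures, the conversion is just a division by the 38 games in a season. This sketch assumes Arsenal’s season goal difference was +8 (56 scored, 48 conceded), which is consistent with the +0.21 quoted above; `per_game` is an illustrative helper name.

```python
GAMES_PER_SEASON = 38


def per_game(season_total, games=GAMES_PER_SEASON):
    """Convert a season total (points, goals, goal difference) to a per-game rate."""
    return season_total / games


# Assumption: Arsenal finished 2019/20 with a goal difference of +8.
print(round(per_game(8), 2))  # 0.21
# Sheffield United's goal difference of 0 stays at 0 per game.
print(per_game(0))            # 0.0
```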
The vertical and horizontal lines show the average value for each axis and are used in several of the following scatter graphs. This begins to become more useful when there isn’t a strong correlation for more advanced statistics, as we begin to get a better understanding of how a team might play and if certain factors of their games are in fact strengths or weaknesses.
This is shown in the next visualisation, where we can see which teams have the best attacking and defensive outputs, based on goals scored and conceded. There is a large amount of variance, and so by using the averages, we can conclude who has a ‘good’ or ‘bad’ attack and defence, relative to the group.
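The quadrant idea can be sketched in a few lines. The teams and goal totals below are hypothetical placeholders rather than real seasons, and `classify` is just an illustrative name: each side is labelled by comparing its goals scored and conceded against the group averages, which is roughly what the average lines on the scatter graph let us do by eye.

```python
from statistics import mean


def classify(teams):
    """Label each team's attack and defence relative to the group averages.

    teams maps a team name to a (goals_scored, goals_conceded) pair.
    """
    avg_scored = mean(scored for scored, _ in teams.values())
    avg_conceded = mean(conceded for _, conceded in teams.values())
    labels = {}
    for name, (scored, conceded) in teams.items():
        attack = "good attack" if scored > avg_scored else "bad attack"
        # Conceding fewer than average counts as a good defence.
        defence = "good defence" if conceded < avg_conceded else "bad defence"
        labels[name] = (attack, defence)
    return labels


# Hypothetical season totals for illustration only.
sample = {
    "Team A": (55, 40),
    "Team B": (30, 45),
    "Team C": (48, 70),
    "Team D": (31, 69),
}

for team, label in classify(sample).items():
    print(team, label)
```

Because the labels are relative to the group, a ‘good’ attack here only means better than the average newly promoted side, not good by Premier League standards.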
Finally, we can assess whether a good attack or a good defence has the stronger correlation with survival. This is done by plotting goals scored and goals conceded per game against points per game. Firstly, looking at goals scored against points, the key difference between the surviving and relegated groups is how efficient they were. Even when they scored the same number of goals, those who survived managed more points per goal: they mostly sit above the trend line, whereas the relegated group mostly sit below it. In many cases the survival group is effectively outperforming the points total that their goals scored would predict. This would suggest that they conceded fewer goals than the stronger attacking sides who ended up being relegated.
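To make ‘above or below the trend line’ concrete, here is a minimal sketch that fits a least-squares line and reports which side of it each point falls on. The four data points are invented for illustration, and treating Tableau’s linear trend line as an ordinary least-squares fit is an assumption on my part.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def side_of_trend(xs, ys):
    """For each point, say whether it lies above or below the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return [
        "above" if y > slope * x + intercept else "below"
        for x, y in zip(xs, ys)
    ]


# Invented values: x could be goals scored per game, y points per game.
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]
print(side_of_trend(xs, ys))  # ['below', 'above', 'below', 'above']
```

A team ‘above’ the line collected more points than its goals scored alone would predict, which is the sense in which the survivors were more efficient.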
When looking at goals conceded compared to points, there isn’t as clear a division between the two groups, with both showing a decent number of data points above and below the trend line. However, the two groups are separated much more distinctly by the number of goals conceded than they were by goals scored, with the survival group dominating the left half of the graph and the relegated group occupying the right half, meaning the relegated sides conceded more goals.
Both goals scored and goals conceded have a moderate correlation to points, with R-squared values of 0.51 and 0.53 respectively. This is to be expected, as there are so many other factors that influence a football game. Scoring lots of goals is irrelevant if the defence is shaky, and vice versa. Goals at both ends of the field will influence the final result.
Conclusion
To conclude, I have met my target of producing various types of graphs in Tableau and ensuring that a high professional standard is maintained throughout. It would definitely be useful to explore more advanced statistics to produce greater insights here, as many of these visualisations show what we would expect.
I have shown that the 40-point target should also be broken down into aiming for 12 wins in a season to ensure survival. It is unsurprising that if a team is good enough to win 12, they would be able to collect at least the four further points needed through draws to ensure safety. As those teams who survived often outperformed the trend line for goals scored to points, it shows the importance of game management and holding onto a result once going a goal up. This makes it unsurprising that Sheffield United managed to have such a great campaign despite scoring a limited number of goals, even compared to those at the bottom of the table. However, I was surprised that there wasn’t more of a correlation between goals conceded and points. I thought that good defensive outputs might be shown to be more important to survival in this graph. Again, I might question whether heavy defeats make a difference to this, and so by filtering out the big losses we may find more meaningful results.