Premier League result analysis using pandas

15-06-2023
source code

Introduction

In this article I will show how you can analyse a team's results in a given Premier League season using pandas. As a Manchester United fan I have picked Manchester United, surprise! The season we will look at is 2018/2019, simply because its data is available and not for any other reason.

Background

I have always been fascinated by the statistics and analysis shown during football matches and in match analysis programmes, so I decided to try analysing some football data using pandas. The dataset is available here

Preparing dataset

Before I can work with the dataset I need to prepare it by removing the columns that I don't need for my analysis. This is good practice, as it lets you focus on the data that matters for your analysis. Please refer to this article for more explanation of the operations used to trim down the dataset.
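The snippets in this article assume the season's CSV has already been loaded into a DataFrame called df. As a minimal sketch of that step, the example below reads a tiny in-memory CSV with pd.read_csv. The rows, team names and the toyDf name are invented for illustration; only the column names match the ones used later in this article.

```python
import io
import pandas as pd

# Stand-in for the downloaded CSV: invented rows, real column names
csvText = """Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR
01/01/2000,Team A,Team B,2,1,H
02/01/2000,Team B,Team C,0,0,D
"""

# With the real file you would pass its path instead of the StringIO object
toyDf = pd.read_csv(io.StringIO(csvText))

print(toyDf.shape)          # (rows, columns)
print(list(toyDf.columns))  # column names as read from the header line
```

The goal columns are parsed as integers automatically, which is what the sums and means later on rely on.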

import pandas as pd

# Explore data (df is assumed to hold the season's CSV, e.g. loaded with pd.read_csv)
df.shape
df.columns
df.describe()

When running these commands I noticed that there are lots of columns to do with predictions and odds, so I opted to remove them because I will not use them. I have also filtered the data down to include only Manchester United games. homeFixtures holds home games and awayFixtures holds away games.

# Remove all betting and prediction columns (everything from column 23 onwards)
noPredictionColumns = df.drop(columns=df.columns[23:])
unitedFixtures = noPredictionColumns[(noPredictionColumns.HomeTeam == "Man United") | (noPredictionColumns.AwayTeam == "Man United")]
homeFixtures = unitedFixtures[unitedFixtures.HomeTeam == "Man United"]
awayFixtures = unitedFixtures[unitedFixtures.AwayTeam == "Man United"]
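To make the filtering step easier to follow, here is a minimal sketch on a made-up four-row frame. The opponents and scores are invented for illustration and are not rows from the dataset; the toy* names are hypothetical.

```python
import pandas as pd

# Toy fixtures, invented for illustration only
toyFixtures = pd.DataFrame({
    "HomeTeam": ["Man United", "Team A", "Team B", "Team C"],
    "AwayTeam": ["Team A", "Man United", "Team C", "Team B"],
    "FTHG": [2, 1, 0, 3],
    "FTAG": [1, 1, 2, 0],
})

# One boolean mask per side of the fixture
isHome = toyFixtures["HomeTeam"] == "Man United"
isAway = toyFixtures["AwayTeam"] == "Man United"

toyUnited = toyFixtures[isHome | isAway]  # every game involving the team
toyHome = toyFixtures[isHome]             # home games only
toyAway = toyFixtures[isAway]             # away games only

print(len(toyUnited), len(toyHome), len(toyAway))  # 2 1 1
```

Note that `|` combines the two masks element by element, which is why each comparison needs its own parentheses when written inline.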

Dataset analysis

Now that I have the data I want, I can start extracting statistics.

Total goals scored

The following gives me the total goals scored:

homeGoals = sum(homeFixtures['FTHG'])
awayGoals = sum(awayFixtures['FTAG'])
print("Total goals: ", homeGoals + awayGoals)

Total wins/draws/losses

Here I get homeResults and awayResults, then add the wins, draws and losses from both to get the season total for each.

homeResults = homeFixtures['FTR'].value_counts().rename({"H": "Win", "D": "Draw", "A": "Lose"})
awayResults = awayFixtures['FTR'].value_counts().rename({"H": "Lose", "D": "Draw", "A": "Win"})
totalWins = homeResults['Win'] + awayResults['Win']
totalDraws = homeResults['Draw'] + awayResults['Draw']
totalLose = homeResults['Lose'] + awayResults['Lose']

resultDf = pd.DataFrame([totalWins, totalDraws, totalLose], index=["win", "draw", "lose"], columns=['total'])
resultDf

I can also plot this data frame as a pie chart:

resultDf.plot(kind="pie", subplots=True)
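One caveat on the counting step above: value_counts() only returns labels that actually occur, so a lookup such as homeResults['Lose'] would raise a KeyError for a team that never lost at home. The sketch below guards against that with reindex and fill_value=0. The result codes are invented for illustration and the toyHomeFTR name is hypothetical.

```python
import pandas as pd

# Invented full-time results: no away win ("A") occurs in this sample
toyHomeFTR = pd.Series(["H", "H", "D", "H"])

labels = {"H": "Win", "D": "Draw", "A": "Lose"}
counts = (
    toyHomeFTR.value_counts()
    .rename(labels)
    .reindex(["Win", "Draw", "Lose"], fill_value=0)  # missing outcomes become 0
)

print(counts["Lose"])  # 0 instead of a KeyError
```

Reindexing also fixes the order of the outcomes, which keeps tables and charts consistent between teams.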

Goals against

# Total goals against
goalsAgainst = sum(homeFixtures['FTAG']) + sum(awayFixtures['FTHG'])
print("Total goals against: ", goalsAgainst)
# Average goals against per game
averageGoalsAgainst = (homeFixtures['FTAG'].mean(axis=0)+awayFixtures['FTHG'].mean(axis=0))/2
print("Average goals against per game: ", averageGoalsAgainst)

This will output the following:

Total goals against:  54
Average goals against per game:  1.42105263157894737
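Averaging the home mean and the away mean gives the per-game figure here because a team plays the same number of home and away games in a league season. With unequal counts the two numbers diverge, as this small sketch with invented goal counts shows (the toy* names are hypothetical).

```python
import pandas as pd

# Invented goals conceded, with unequal numbers of home and away games
toyHomeConceded = pd.Series([0, 1, 2])  # 3 home games
toyAwayConceded = pd.Series([3])        # 1 away game

# Mean of the two means: weights each venue equally
meanOfMeans = (toyHomeConceded.mean() + toyAwayConceded.mean()) / 2

# True per-game average: total goals divided by total games
totalGames = len(toyHomeConceded) + len(toyAwayConceded)
perGame = (toyHomeConceded.sum() + toyAwayConceded.sum()) / totalGames

print(meanOfMeans)  # 2.0
print(perGame)      # 1.5
```

Dividing total goals by total games is the safer form if the same code is reused on a partial season.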

Goals for

# Total goals for
goalsScored = sum(homeFixtures['FTHG']) + sum(awayFixtures['FTAG'])
print("Total goals scored: ", goalsScored)
# Average goals scored per game
averageGoalsScored = (homeFixtures['FTHG'].mean(axis=0)+awayFixtures['FTAG'].mean(axis=0))/2
print("Average goals scored per Home game: ", homeFixtures['FTHG'].mean(axis=0))
print("Average goals scored per Away game: ", awayFixtures['FTAG'].mean(axis=0))
print("Average goals scored per game: ", averageGoalsScored)

This will output the following:

Total goals scored:  65
Average goals scored per Home game:  1.736842105263158
Average goals scored per Away game:  1.6842105263157894
Average goals scored per game:  1.7105263157894737

Most goals scored in a game

# Most goals scored in a game
mostGoalsHomeIndex = homeFixtures['FTHG'].idxmax()
mostGoalsAwayIndex = awayFixtures['FTAG'].idxmax()
print(f"Most goals scored in an away fixture was {awayFixtures.loc[mostGoalsAwayIndex]['FTAG']} against {awayFixtures.loc[mostGoalsAwayIndex]['HomeTeam']} on {awayFixtures.loc[mostGoalsAwayIndex]['Date']}")
print(f"Most goals scored in a home fixture was {homeFixtures.loc[mostGoalsHomeIndex]['FTHG']} against {homeFixtures.loc[mostGoalsHomeIndex]['AwayTeam']} on {homeFixtures.loc[mostGoalsHomeIndex]['Date']}")

This will output the following:

Most goals scored in an away fixture was 5 against Cardiff on 22/12/2018
Most goals scored in a home fixture was 4 against Fulham on 08/12/2018
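Keep in mind that idxmax() returns only the first row holding the maximum, so if two fixtures share the top score, only the earlier one is reported. A minimal sketch of how to list every tied fixture, using invented opponents and scores (toyHome is a hypothetical name):

```python
import pandas as pd

# Invented home fixtures where two games share the highest score
toyHome = pd.DataFrame({
    "AwayTeam": ["Team A", "Team B", "Team C"],
    "FTHG": [4, 1, 4],
})

# idxmax picks the first row with the maximum value
firstOnly = toyHome.loc[toyHome["FTHG"].idxmax(), "AwayTeam"]

# Comparing against max() keeps every row that ties for the top score
allTop = toyHome[toyHome["FTHG"] == toyHome["FTHG"].max()]

print(firstOnly)                    # Team A
print(allTop["AwayTeam"].tolist())  # ['Team A', 'Team C']
```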

Summary

In this post I went through the Premier League dataset for the 2018/2019 season and extracted some statistics for one team, namely Manchester United.