
Overview

The English Premier League (EPL) is one of the world's most-watched soccer leagues, with an estimated audience of 12 million people per game. Given the substantial financial stakes, all major EPL teams are interested in analytics and AI. Machine learning and artificial intelligence (AI) have become extremely popular in sports analytics. The sports entertainment sector and its stakeholders make extensive use of sophisticated algorithms to improve earnings and to reduce the business risk of selecting, or betting on, the wrong players.




Regression is one of the foundational techniques in Machine Learning. As one of the most well-understood algorithms, linear regression plays a vital role in solving real-life problems. In this project, we use Linear Regression to predict the scores of EPL soccer players. With the business context covered, let's get into the project's technical details.

This project is part of the Linear Regression Beginner Project Series, and it consists of discussing and implementing the fundamentals of Linear Regression in Python on the EPL Soccer Player Dataset.

Approach

  • Install Packages
  • Import Libraries
  • Exploratory Data Analysis
  • Correlation
  • Relationship between Cost and Score
  • Train - Test Split
  • Linear Regression
  • Model Summary
  • Prediction on Test Data
  • Diagnostics and Remedies

Important Libraries

  • pandas: pandas is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool built on top of the Python programming language. Refer to documentation for more information.

  • NumPy: The fundamental package for scientific computing with Python. Fast and versatile, the NumPy vectorization, indexing, and broadcasting concepts are the de-facto standards of array computing today. NumPy offers comprehensive mathematical functions, random number generators, linear algebra routines, Fourier transforms, and more. Refer to documentation for more information. pandas and NumPy are together used for most of the data analysis and manipulation in Python.

  • Matplotlib: Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Refer to documentation for more information.

  • seaborn: Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. Refer to documentation for more information.

  • scikit-learn: Simple and efficient tools for predictive data analysis accessible to everybody and reusable in various contexts. It is built on NumPy, SciPy, and matplotlib to support machine learning in Python. Refer to documentation for more information.

  • statsmodels: statsmodels is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests and statistical data exploration. Refer to documentation for more information.

  • SciPy: SciPy provides algorithms for optimization, integration, interpolation, eigenvalue problems, algebraic equations, differential equations, statistics, and many other classes of problems. Refer to documentation for more information.

Sample Data

| PlayerName | Club | DistanceCovered(InKms) | Goals | MinutestoGoalRatio | ShotsPerGame | AgentCharges | BMI | Cost | PreviousClubCost | Height | Weight | Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Braund, Mr. Owen Harris | MUN | 3.96 | 7.5 | 37.5 | 12.3 | 60 | 20.56 | 109.1 | 63.32 | 195.9 | 78.9 | 19.75 |
| Allen, Mr. William Henry | MUN | 4.41 | 8.3 | 38.2 | 12.7 | 68 | 20.67 | 102.8 | 58.55 | 189.7 | 74.4 | 21.3 |
| Moran, Mr. James | MUN | 4.14 | 5 | 36.4 | 11.6 | 21 | 21.86 | 104.6 | 55.36 | 177.8 | 69.1 | 19.88 |
| McCarthy, Mr. Timothy J | MUN | 4.11 | 5.3 | 37.3 | 12.6 | 69 | 21.88 | 126.4 | 57.18 | 185 | 74.9 | 23.66 |
| Palsson, Master. Gosta Leonard | MUN | 4.45 | 6.8 | 41.5 | 14 | 29 | 18.96 | 80.3 | 53.2 | 184.6 | 64.6 | 17.64 |
| Saundercock, Mr. William Henry | MUN | 4.1 | 4.4 | 37.4 | 12.5 | 42 | 21.04 | 75.2 | 53.77 | 174 | 63.7 | 15.58 |
| Andersson, Mr. Anders Johan | MUN | 4.31 | 5.3 | 39.6 | 12.8 | 73 | 21.69 | 87.2 | 60.17 | 186.2 | 75.2 | 19.99 |
| Rice, Master. Eugene | MUN | 4.42 | 5.7 | 39.9 | 13.2 | 44 | 20.62 | 97.9 | 48.33 | 173.8 | 62.3 | 22.43 |
| Williams, Mr. Charles Eugene | MUN | 4.3 | 8.9 | 41.1 | 13.5 | 41 | 22.64 | 75.1 | 54.57 | 171.4 | 66.5 | 17.95 |
| Fynney, Mr. Joseph J | MUN | 4.51 | 4.4 | 41.6 | 12.7 | 44 | 19.44 | 65.1 | 53.42 | 179.9 | 62.9 | 15.07 |
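As a rough sketch of the exploratory step, the sample rows above can be loaded into a pandas DataFrame and each numeric column correlated with Score. The parsing approach here is an assumption made for this illustration (the real dataset would normally be read from the downloaded file with `pd.read_csv`): because player names contain spaces and commas, each line is split from the right into the 12 trailing fields.

```python
import pandas as pd

COLUMNS = [
    "PlayerName", "Club", "DistanceCovered(InKms)", "Goals",
    "MinutestoGoalRatio", "ShotsPerGame", "AgentCharges", "BMI",
    "Cost", "PreviousClubCost", "Height", "Weight", "Score",
]

# The ten sample rows from this README, whitespace-separated.
SAMPLE_ROWS = """\
Braund, Mr. Owen Harris MUN 3.96 7.5 37.5 12.3 60 20.56 109.1 63.32 195.9 78.9 19.75
Allen, Mr. William Henry MUN 4.41 8.3 38.2 12.7 68 20.67 102.8 58.55 189.7 74.4 21.3
Moran, Mr. James MUN 4.14 5 36.4 11.6 21 21.86 104.6 55.36 177.8 69.1 19.88
McCarthy, Mr. Timothy J MUN 4.11 5.3 37.3 12.6 69 21.88 126.4 57.18 185 74.9 23.66
Palsson, Master. Gosta Leonard MUN 4.45 6.8 41.5 14 29 18.96 80.3 53.2 184.6 64.6 17.64
Saundercock, Mr. William Henry MUN 4.1 4.4 37.4 12.5 42 21.04 75.2 53.77 174 63.7 15.58
Andersson, Mr. Anders Johan MUN 4.31 5.3 39.6 12.8 73 21.69 87.2 60.17 186.2 75.2 19.99
Rice, Master. Eugene MUN 4.42 5.7 39.9 13.2 44 20.62 97.9 48.33 173.8 62.3 22.43
Williams, Mr. Charles Eugene MUN 4.3 8.9 41.1 13.5 41 22.64 75.1 54.57 171.4 66.5 17.95
Fynney, Mr. Joseph J MUN 4.51 4.4 41.6 12.7 44 19.44 65.1 53.42 179.9 62.9 15.07
"""

# Split from the right: the last 12 fields are Club plus 11 numeric columns,
# and whatever remains on the left is the player name.
records = [line.rsplit(maxsplit=12) for line in SAMPLE_ROWS.strip().splitlines()]
df = pd.DataFrame(records, columns=COLUMNS)

# Convert the 11 numeric columns from strings to numbers.
numeric_cols = COLUMNS[2:]
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric)

# Basic exploration: summary statistics and correlation of each column with Score.
print(df.describe())
corr_with_score = df[numeric_cols].corr()["Score"].sort_values(ascending=False)
print(corr_with_score)
```

Ten rows are far too few to draw conclusions from, so the printed correlations only show the mechanics of the step.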

Jupyter Notebook

  • Use this link to view the Jupyter notebook of this project.

Downloads

  • Use this link to download the data.
  • Use this link to download the notebook.

Say hi to collaborate!

Hello there! The fun for me in collaboration is that working with other people just makes us smarter; that's proven. If you would like to collaborate, drop a mail at mr@hemath.com.