Overview
The English Premier League (EPL) is one of the world's most-watched soccer leagues, with an estimated audience of 12 million people per game. Given the substantial financial stakes, all major EPL teams are interested in analytics and AI. Machine learning and artificial intelligence (AI) have become extremely popular in sports analytics: the sports entertainment sector and its stakeholders use sophisticated algorithms extensively to improve earnings and reduce the business risk of selecting, or betting on, the wrong players.
Regression is one of the foundational techniques in machine learning. As one of the most well-understood algorithms, linear regression plays a vital role in solving real-life problems. In this project, we use linear regression to predict the scores of EPL soccer players. With the business implications clear, let's get into the project's technical details.
This project is part of the Linear Regression Beginner Project Series. It discusses and implements the fundamentals of linear regression in Python on the EPL Soccer Player Dataset.
Approach
- Install Packages
- Import Libraries
- Exploratory Data Analysis
- Correlation
- Relationship between Cost and Score
- Train-Test Split
- Linear Regression
- Model Summary
- Prediction on Test Data
- Diagnostics and Remedies
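The modelling steps in the list above (train-test split, linear regression, prediction on test data, and a first look at residuals) can be sketched roughly as follows. This is a minimal illustration, not the project's notebook: the data is synthetic, and the slope, intercept, noise level, sample size, and split ratio are all assumptions chosen for the example. It uses scikit-learn's `train_test_split` and `LinearRegression`, and prints the fitted coefficients and test R² in place of a full model summary.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the dataset (assumption): Score depends linearly on Cost.
rng = np.random.default_rng(0)
cost = rng.uniform(50, 150, size=200)
score = 5.0 + 0.15 * cost + rng.normal(0, 1.5, size=200)

# scikit-learn expects a 2-D feature matrix, so reshape the single feature.
X = cost.reshape(-1, 1)
y = score

# Train-test split: hold out 25% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Fit ordinary least squares on the training rows only.
model = LinearRegression().fit(X_train, y_train)

# Predict on the held-out rows and compute residuals for diagnostics.
y_pred = model.predict(X_test)
residuals = y_test - y_pred

print(f"slope: {model.coef_[0]:.3f}, intercept: {model.intercept_:.3f}")
print(f"test R^2: {r2_score(y_test, y_pred):.3f}")
print(f"mean residual: {residuals.mean():.3f}")
```

On real data, plotting `residuals` against `y_pred` is a common first diagnostic: a visible pattern suggests the linear form is missing something.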
Important Libraries
- pandas: pandas is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool built on top of the Python programming language. Refer to documentation for more information.
- NumPy: The fundamental package for scientific computing with Python. Fast and versatile, the NumPy vectorization, indexing, and broadcasting concepts are the de-facto standards of array computing today. NumPy offers comprehensive mathematical functions, random number generators, linear algebra routines, Fourier transforms, and more. Refer to documentation for more information. pandas and NumPy are together used for most of the data analysis and manipulation in Python.
- Matplotlib: Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Refer to documentation for more information.
- seaborn: Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. Refer to documentation for more information.
- scikit-learn: Simple and efficient tools for predictive data analysis accessible to everybody and reusable in various contexts. It is built on NumPy, SciPy, and matplotlib to support machine learning in Python. Refer to documentation for more information.
- statsmodels: statsmodels is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests and statistical data exploration. Refer to documentation for more information.
- SciPy: SciPy provides algorithms for optimization, integration, interpolation, eigenvalue problems, algebraic equations, differential equations, statistics, and many other classes of problems. Refer to documentation for more information.
Sample Data
PlayerName | Club | DistanceCovered(InKms) | Goals | MinutestoGoalRatio | ShotsPerGame | AgentCharges | BMI | Cost | PreviousClubCost | Height | Weight | Score |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Braund, Mr. Owen Harris | MUN | 3.96 | 7.5 | 37.5 | 12.3 | 60 | 20.56 | 109.1 | 63.32 | 195.9 | 78.9 | 19.75 |
Allen, Mr. William Henry | MUN | 4.41 | 8.3 | 38.2 | 12.7 | 68 | 20.67 | 102.8 | 58.55 | 189.7 | 74.4 | 21.3 |
Moran, Mr. James | MUN | 4.14 | 5 | 36.4 | 11.6 | 21 | 21.86 | 104.6 | 55.36 | 177.8 | 69.1 | 19.88 |
McCarthy, Mr. Timothy J | MUN | 4.11 | 5.3 | 37.3 | 12.6 | 69 | 21.88 | 126.4 | 57.18 | 185 | 74.9 | 23.66 |
Palsson, Master. Gosta Leonard | MUN | 4.45 | 6.8 | 41.5 | 14 | 29 | 18.96 | 80.3 | 53.2 | 184.6 | 64.6 | 17.64 |
Saundercock, Mr. William Henry | MUN | 4.1 | 4.4 | 37.4 | 12.5 | 42 | 21.04 | 75.2 | 53.77 | 174 | 63.7 | 15.58 |
Andersson, Mr. Anders Johan | MUN | 4.31 | 5.3 | 39.6 | 12.8 | 73 | 21.69 | 87.2 | 60.17 | 186.2 | 75.2 | 19.99 |
Rice, Master. Eugene | MUN | 4.42 | 5.7 | 39.9 | 13.2 | 44 | 20.62 | 97.9 | 48.33 | 173.8 | 62.3 | 22.43 |
Williams, Mr. Charles Eugene | MUN | 4.3 | 8.9 | 41.1 | 13.5 | 41 | 22.64 | 75.1 | 54.57 | 171.4 | 66.5 | 17.95 |
Fynney, Mr. Joseph J | MUN | 4.51 | 4.4 | 41.6 | 12.7 | 44 | 19.44 | 65.1 | 53.42 | 179.9 | 62.9 | 15.07 |
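As a quick illustration of the correlation step, the `Cost` and `Score` columns from the ten sample rows above can be loaded into a pandas DataFrame and compared. This is only a sketch: the variable name `sample` is hypothetical, and ten rows are far too few to draw conclusions about the full dataset.

```python
import pandas as pd

# Cost and Score values copied from the ten sample rows shown above.
sample = pd.DataFrame(
    {
        "Cost": [109.1, 102.8, 104.6, 126.4, 80.3, 75.2, 87.2, 97.9, 75.1, 65.1],
        "Score": [19.75, 21.3, 19.88, 23.66, 17.64, 15.58, 19.99, 22.43, 17.95, 15.07],
    }
)

# Basic summary statistics for both columns.
print(sample.describe())

# Pearson correlation between Cost and Score (the pandas default method).
corr = sample["Cost"].corr(sample["Score"])
print(f"Pearson correlation between Cost and Score: {corr:.2f}")
```

On these rows the correlation is strongly positive, which is consistent with the project's choice to study the relationship between Cost and Score.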
Jupyter Notebook
- Use this link to view the Jupyter notebook of this project.
Related Blog
- Check out this blog.
Downloads
Say hi to collaborate!
Hello there! For me, the fun in collaboration is that working with other people makes us smarter; that's proven. Having said that, drop a mail at mr@hemath.com.