Indian Premier League (IPL) Data Analysis from 2008-2020

In this tutorial, we will explore some interesting insights into Indian Premier League (IPL) matches played from 2008 to 2020. If you are not familiar with the IPL, it is a franchise-style T20 cricket league that is played every year.

I expect this topic to span multiple articles covering various aspects of data analysis. I hope you have as much fun reading the article as I did writing it.

By the end of the tutorial, we will have covered concepts such as:

Data Procurement
Data sanitization
Data analysis
Data visualization/insights

Please refer to some of the tutorials as part of this series:

Data Procurement

You can find many datasets, including this one, on Kaggle, an online platform for data scientists and machine learning enthusiasts. The first step is to download the dataset of IPL matches so that we can explore it for interesting insights.

The next step is to import all the packages necessary for data procurement, analysis and visualization. We will use matplotlib and seaborn for data visualization.
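The import cell might look like the sketch below. It assumes pandas is the dataframe library (the text names only matplotlib and seaborn), and the aliases and the figure size are conventional choices, not something the article specifies.

```python
# Data handling (assumed: pandas provides the dataframe used in this tutorial)
import pandas as pd

# Visualization libraries named in the article
import matplotlib.pyplot as plt
import seaborn as sns

# Optional, illustrative default so that plots are large enough to read
plt.rcParams["figure.figsize"] = (10, 6)
```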

Let us import the data into a dataframe. The dataset captures every match starting from the inaugural match in 2008, which comes to around 816 matches in this dataset.
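A minimal sketch of the loading step follows. The helper name load_matches, the column names and the sample rows are assumptions made for illustration; in practice you would pass the path of the CSV file downloaded from Kaggle instead of the in-memory sample.

```python
import io

import pandas as pd


def load_matches(source):
    """Read a matches CSV (path or file-like object) into a DataFrame.

    Assumes the file has a 'date' column, which is parsed as a datetime.
    """
    return pd.read_csv(source, parse_dates=["date"])


# Tiny in-memory stand-in for the downloaded file.
# These rows are made up for illustration and are not taken from the dataset.
sample_csv = io.StringIO(
    "id,date,venue,team1,team2,winner\n"
    "1,2008-04-18,Venue A,Team X,Team Y,Team X\n"
    "2,2008-04-19,Venue B,Team Z,Team X,Team Z\n"
)

matches = load_matches(sample_csv)
print(matches.shape)   # (rows, columns)
print(matches.head())  # first few matches
```

On the real file, matches.shape is a quick way to confirm the match count mentioned above.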

Data sanitization

Data sanitization is most likely your first step in any data analysis project. Most sanitization tasks involve:

Converting unstructured data into a structured dataset
Removing duplicate entries
Adding a proper index and column names
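The tasks listed above can be sketched with pandas as follows. The rows and column names here are invented for illustration; the real dataset already ships with column names, so only some of these steps may apply to it.

```python
import pandas as pd

# Raw rows without column names, containing one exact duplicate (made-up data)
raw = pd.DataFrame(
    [
        [1, "Team X", "Team Y"],
        [2, "Team Z", "Team X"],
        [2, "Team Z", "Team X"],
    ]
)

clean = raw.copy()
clean.columns = ["id", "team1", "team2"]  # add proper column names
clean = clean.drop_duplicates()           # remove duplicate entries
clean = clean.set_index("id")             # use the match id as the index

print(clean)
```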

For instance, in the current dataset, notice the below team names:
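One way to list the distinct team names is sketched here, assuming the teams are stored in columns named team1 and team2 (an assumption about the dataset layout). The helper name unique_team_names is hypothetical, and the pairings in the sample frame are illustrative only.

```python
import pandas as pd


def unique_team_names(matches):
    """Return the sorted distinct team names found in team1 and team2."""
    teams = pd.concat([matches["team1"], matches["team2"]])
    return sorted(teams.dropna().unique())


# Illustrative frame; the names come from the article, the pairings do not
sample = pd.DataFrame(
    {
        "team1": ["Delhi Daredevils", "Rising Pune Supergiant"],
        "team2": ["Rising Pune Supergiants", "Delhi Capitals"],
    }
)

names = unique_team_names(sample)
print(names)
```

Sorting the names places near-identical spellings next to each other, which makes variants easier to spot by eye.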

Notice that some teams appear under more than one name. There are variants for three teams:

Rising Pune Supergiants
Delhi Capitals
Deccan Chargers

Team name in dataset	Normalized name
Rising Pune Supergiants	Rising Pune Supergiants
Rising Pune Supergiant	Rising Pune Supergiants
Pune Warriors	Rising Pune Supergiants
Delhi Capitals	Delhi Capitals
Delhi Daredevils	Delhi Capitals
Deccan Chargers	Deccan Chargers
Sunrisers Hyderabad	Deccan Chargers
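The mapping in the table above can be applied with a dictionary and pandas' replace. This is a sketch, not necessarily the notebook's code: the names TEAM_NAME_MAP, TEAM_COLUMNS and normalize_team_names are hypothetical, and the list of columns holding team names is an assumption about the dataset.

```python
import pandas as pd

# Variant name -> normalized name, taken from the table above
TEAM_NAME_MAP = {
    "Rising Pune Supergiant": "Rising Pune Supergiants",
    "Pune Warriors": "Rising Pune Supergiants",
    "Delhi Daredevils": "Delhi Capitals",
    "Sunrisers Hyderabad": "Deccan Chargers",
}

# Columns assumed to contain team names
TEAM_COLUMNS = ["team1", "team2", "toss_winner", "winner"]


def normalize_team_names(matches, columns=TEAM_COLUMNS):
    """Return a copy of matches with variant team names replaced."""
    out = matches.copy()
    for col in columns:
        if col in out.columns:  # skip columns the frame does not have
            out[col] = out[col].replace(TEAM_NAME_MAP)
    return out


# Illustrative rows only
sample = pd.DataFrame(
    {
        "team1": ["Delhi Daredevils", "Pune Warriors"],
        "team2": ["Sunrisers Hyderabad", "Rising Pune Supergiant"],
        "winner": ["Delhi Daredevils", "Rising Pune Supergiant"],
    }
)

normalized = normalize_team_names(sample)
print(normalized)
```

Returning a copy leaves the original frame untouched, so the raw and normalized names can still be compared afterwards.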

As an exercise, try the following:

Sanitize the venue names to make sure there are no variants referring to the same venue
Sanitize the player names to make sure there are no variants referring to the same player

I hope you find this article helpful. The project is available on GitHub at https://github.com/kirancshet/IPL_Data_Analysis and includes the Jupyter notebook covering the entire code.
