In this tutorial, we will explore some interesting insights from Indian Premier League (IPL) matches played from 2008 to 2020. If you are not familiar with the IPL, it is a franchise-style T20 cricket league played every year.
I expect this topic to span multiple articles covering various aspects of data analysis. I hope you have as much fun reading the article as I did writing it.
By the end of the tutorial, we will have covered concepts such as:
- Data Procurement
- Data sanitization
- Data analysis
- Data visualization/insights
The other tutorials in this series are:
- Part 2: Which IPL team played the maximum number of matches
- Part 3: Which venue(stadium) hosted the maximum number of matches
- Part 4: Importance of the toss on the outcome of the match
Data Procurement
You can find many datasets, including this one, on Kaggle, an online platform for data scientists and machine learning enthusiasts. The first step is to download the dataset of IPL matches so that we can explore it for interesting insights.
The next step is to import all the packages necessary for data procurement, analysis and visualization. We will use matplotlib and seaborn for data visualization.
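The imports could look like the sketch below. It assumes pandas is the dataframe library (the article only names matplotlib and seaborn) and uses the conventional aliases, which are a habit rather than a requirement.

```python
# pandas is assumed here for the dataframe work; the aliases are conventions.
import pandas as pd               # data loading and analysis
import matplotlib.pyplot as plt   # base plotting library
import seaborn as sns             # statistical plots built on matplotlib
```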
Let us import the data into a dataframe. The dataset captures all the matches starting from the inaugural match in 2008, which as per this dataset comes to around 816 matches.
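A minimal sketch of the loading step follows. The file name `IPL Matches 2008-2020.csv` and the helper `load_matches` are assumptions made for illustration; use whatever name your Kaggle download has.

```python
from pathlib import Path

import pandas as pd

# Assumed file name; adjust it to wherever you saved the Kaggle download.
MATCHES_CSV = Path("IPL Matches 2008-2020.csv")


def load_matches(csv_path):
    """Read the match-level CSV into a dataframe, one row per match."""
    return pd.read_csv(csv_path)


# Only read the file if it is actually present in the working directory.
if MATCHES_CSV.exists():
    matches = load_matches(MATCHES_CSV)
    print(matches.shape)   # (rows, columns); the row count is the number of matches
    print(matches.head())  # first few matches, to eyeball the columns
```

Checking `matches.shape` right after loading is a quick way to confirm that the number of rows agrees with the match count you expect.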
Data sanitization
Data sanitization is most likely your first step in any data analysis project. Most sanitization tasks involve:
- Converting unstructured data into a structured dataset
- Removing duplicate entries
- Adding a proper index and columns
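Two of these tasks can be sketched with pandas as shown below. The helper `basic_cleanup`, the `id` column name and the sample rows are all assumptions for illustration and are not taken from the dataset.

```python
import pandas as pd


def basic_cleanup(matches, id_column="id"):
    """Drop exact duplicate rows and index the dataframe by the match id."""
    cleaned = matches.drop_duplicates()
    # "id" is an assumed column name; skip indexing if it is not present.
    if id_column in cleaned.columns:
        cleaned = cleaned.set_index(id_column)
    return cleaned


# Illustrative rows only: the second match is entered twice.
sample = pd.DataFrame({
    "id": [1, 2, 2],
    "team1": ["Delhi Capitals", "Deccan Chargers", "Deccan Chargers"],
    "team2": ["Deccan Chargers", "Delhi Capitals", "Delhi Capitals"],
})
print(basic_cleanup(sample))
```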
For instance, look at the distinct team names that appear in the current dataset.
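One way to list the distinct names is sketched here. It assumes the team names live in columns called `team1` and `team2` (check these against your copy of the dataset), and the sample rows are illustrative stand-ins for the real dataframe.

```python
import pandas as pd


def unique_team_names(matches, columns=("team1", "team2")):
    """Return the sorted distinct team names found in the given columns."""
    names = pd.concat([matches[col] for col in columns]).dropna().unique()
    return sorted(names)


# Illustrative rows only; in practice pass the full matches dataframe.
sample = pd.DataFrame({
    "team1": ["Delhi Daredevils", "Rising Pune Supergiant", "Delhi Capitals"],
    "team2": ["Deccan Chargers", "Rising Pune Supergiants", "Sunrisers Hyderabad"],
})
for name in unique_team_names(sample):
    print(name)
```

Sorting the names puts near-identical spellings next to each other, which makes variants easier to spot.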
Notice that some names are duplicates of each other. There are two or three variants for each of the following teams:
- Rising Pune Supergiants
- Delhi Capitals
- Deccan Chargers
Team name in dataset | Name used for analysis |
---|---|
Rising Pune Supergiants | Rising Pune Supergiants |
Rising Pune Supergiant | Rising Pune Supergiants |
Pune Warriors | Rising Pune Supergiants |
Delhi Capitals | Delhi Capitals |
Delhi Daredevils | Delhi Capitals |
Deccan Chargers | Deccan Chargers |
Sunrisers Hyderabad | Deccan Chargers |
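The mapping in the table above can be applied with a dictionary and `Series.replace`, as in the sketch below. The column names (`team1`, `team2`, `toss_winner`, `winner`) and the helper `normalize_team_names` are assumptions for illustration, and the sample rows are not taken from the dataset.

```python
import pandas as pd

# Variant name -> name used for analysis, copied from the table above.
TEAM_NAME_MAP = {
    "Rising Pune Supergiant": "Rising Pune Supergiants",
    "Pune Warriors": "Rising Pune Supergiants",
    "Delhi Daredevils": "Delhi Capitals",
    "Sunrisers Hyderabad": "Deccan Chargers",
}

# Assumed names of the columns that hold team names.
TEAM_COLUMNS = ["team1", "team2", "toss_winner", "winner"]


def normalize_team_names(matches, columns=TEAM_COLUMNS):
    """Return a copy of the dataframe with variant team names replaced."""
    cleaned = matches.copy()
    for col in columns:
        if col in cleaned.columns:  # skip columns this copy does not have
            cleaned[col] = cleaned[col].replace(TEAM_NAME_MAP)
    return cleaned


# Illustrative rows only.
sample = pd.DataFrame({
    "team1": ["Delhi Daredevils", "Pune Warriors"],
    "team2": ["Sunrisers Hyderabad", "Rising Pune Supergiant"],
    "winner": ["Delhi Daredevils", "Rising Pune Supergiant"],
})
print(normalize_team_names(sample))
```

Working on a copy leaves the raw dataframe untouched, so the original names remain available if you later decide to map the teams differently.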
As an exercise, try the following:
- Sanitize the venue names to make sure there are no variants referring to the same venue
- Sanitize the player names to make sure there are no variants referring to the same player
I hope you find this article helpful. The Jupyter notebook covering the entire code is available in the GitHub project: https://github.com/kirancshet/IPL_Data_Analysis