January 28, 2025

In this tutorial, we will explore some interesting insights into Indian Premier League (IPL) matches played from 2008 to 2020. If you are not familiar with the IPL, it is a franchise-style T20 cricket league that is played every year.

I expect this topic to span multiple articles covering various aspects of data analysis. I hope you have as much fun reading the article as I did writing it.

By the end of the tutorial, we will understand concepts such as:

  • Data Procurement
  • Data sanitization
  • Data analysis
  • Data visualization/insights

Please refer to some of the tutorials as part of this series:

Data Procurement

You can find many datasets, including this one, on Kaggle, an online platform for data scientists and machine learning enthusiasts. The first step is to download the dataset of IPL matches so that we can explore it for interesting insights.

The next step is to import all the packages necessary for data procurement, analysis and visualization. We will use matplotlib and seaborn for data visualization.
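A minimal sketch of the imports, assuming pandas is the library used for the dataframe work (the text names only matplotlib and seaborn). The aliases `pd`, `plt` and `sns` are common conventions, not requirements.

```python
# pandas is an assumption here (inferred from the later use of a dataframe);
# matplotlib and seaborn are the visualization libraries named in the text.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```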

Let us import the data into a dataframe. The dataset captures every match starting from the inaugural match in 2008, which as per this dataset comes to around 816 matches.
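A minimal sketch of the loading step, assuming the Kaggle download is a CSV file. The helper `load_matches` and the file name `matches.csv` are hypothetical; substitute the path of the file you actually downloaded.

```python
import pandas as pd


def load_matches(csv_source):
    """Read the match-level CSV into a dataframe.

    csv_source may be a file path or any file-like object.
    """
    return pd.read_csv(csv_source)


# Hypothetical usage; "matches.csv" is an assumed file name.
# matches = load_matches("matches.csv")
# print(matches.shape)  # (number of rows, number of columns)
# matches.head()        # peek at the first few matches
```

The number of rows reported by `shape` is the match count, so it is a quick way to check the figure quoted above against your copy of the dataset.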

Data sanitization

Data sanitization is most likely the first step in any data analysis project. Most sanitization tasks involve:

  • Converting unstructured data into a structured dataset
  • Removing duplicate entries
  • Adding a proper index and column names
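The last two tasks can be sketched with pandas as follows. The toy data and the column names `match_id` and `winner` are made up for illustration and are not taken from the IPL dataset.

```python
import pandas as pd

# Toy data (invented for illustration) with one exact duplicate row.
raw = pd.DataFrame({
    "match_id": [1, 2, 2, 3],
    "winner": ["Team A", "Team B", "Team B", "Team A"],
})

clean = (
    raw.drop_duplicates()                           # remove duplicate entries
       .set_index("match_id")                       # use the match id as the index
       .rename(columns={"winner": "winning_team"})  # give the column a descriptive name
)

print(clean)
```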

For instance, look at the team names in the current dataset:
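One way to list the distinct team names is sketched below. The helper `unique_team_names` is hypothetical, and the column names `team1` and `team2` are assumptions about the dataset; adjust them to match your file. The toy dataframe only demonstrates the call.

```python
import pandas as pd


def unique_team_names(matches, team_columns=("team1", "team2")):
    """Return the sorted unique team names found across the given columns."""
    names = pd.concat([matches[col] for col in team_columns])
    return sorted(names.dropna().unique())


# Toy data for illustration only; the column names are assumed.
toy = pd.DataFrame({
    "team1": ["Delhi Daredevils", "Delhi Capitals"],
    "team2": ["Deccan Chargers", "Delhi Daredevils"],
})
print(unique_team_names(toy))
# ['Deccan Chargers', 'Delhi Capitals', 'Delhi Daredevils']
```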

Notice that some teams appear under more than one name. The following three teams have variants:

  • Rising Pune Supergiants
  • Delhi Capitals
  • Deccan Chargers

Team Name               | Change To
Rising Pune Supergiants | Rising Pune Supergiants
Rising Pune Supergiant  | Rising Pune Supergiants
Pune Warriors           | Rising Pune Supergiants
Delhi Capitals          | Delhi Capitals
Delhi Daredevils        | Delhi Capitals
Deccan Chargers         | Deccan Chargers
Sunrisers Hyderabad     | Deccan Chargers
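
The mapping in the table above can be applied with a dictionary and pandas `replace`. This is a minimal sketch: the helper `standardize_team_names` is hypothetical, and the column names in the toy data (`team1`, `winner`) are assumptions about the dataset.

```python
import pandas as pd

# Mapping taken from the table above: variant -> standardized name.
# Names that already match their standardized form need no entry.
TEAM_NAME_MAP = {
    "Rising Pune Supergiant": "Rising Pune Supergiants",
    "Pune Warriors": "Rising Pune Supergiants",
    "Delhi Daredevils": "Delhi Capitals",
    "Sunrisers Hyderabad": "Deccan Chargers",
}


def standardize_team_names(matches, team_columns):
    """Return a copy with variant names replaced in the given columns."""
    cleaned = matches.copy()
    for col in team_columns:
        # replace() swaps exact matches and leaves other values untouched
        cleaned[col] = cleaned[col].replace(TEAM_NAME_MAP)
    return cleaned


# Toy data for illustration only; the column names are assumed.
toy = pd.DataFrame({
    "team1": ["Delhi Daredevils", "Pune Warriors"],
    "winner": ["Sunrisers Hyderabad", "Mumbai Indians"],
})
print(standardize_team_names(toy, ["team1", "winner"]))
```

Every column that holds a team name needs the same treatment, otherwise a team could still appear under two names when columns are compared.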

As an exercise, try the following:

  • Sanitize the venue names to make sure there are no variants referring to the same venue
  • Sanitize the player names to make sure there are no variants referring to the same player

I hope you find this article helpful. The project on GitHub (https://github.com/kirancshet/IPL_Data_Analysis) includes the Jupyter notebook covering the entire code.
