How to populate batting position in Cricsheet data?

This article walks through the logic for computing each player’s batting position from Cricsheet ball-by-ball data.

Step 1: Let’s take a look at what the data looks like:

source: Cricsheet
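
Only a handful of columns matter for this task, and a quick way to inspect them is sketched below. The rows are hand-made for illustration (they are not real Cricsheet records), and `preview` is just an illustrative name; the column names are the ones the code later in this article reads.

```python
import pandas as pd

# Hand-made rows for illustration only; not taken from a real match file.
preview = pd.DataFrame({
    "innings": [1, 1, 1],
    "ball": [0.1, 0.2, 0.3],
    "striker": ["Player A", "Player B", "Player B"],
    "non_striker": ["Player B", "Player A", "Player A"],
})

# The four columns the batting-position logic relies on.
print(preview[["innings", "ball", "striker", "non_striker"]].head())
```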

Now, our job is to write the logic that populates the batting position.

Step 2: Logical flow with an example

Assume the following scenario, showing the striker, non_striker and outcome of each ball.

We can clearly see that Rohit & Rahul are the openers, so Rohit is 1 and Rahul is 2 in the batting order, which is plain and simple. Let us store these values in a new column.

Now comes the tricky part: there’s a run-out, so for the next ball Rohit will be on strike and Virat, the new batter, will be at the non-striker’s end.

So we need to keep track of both the striker and the non_striker while calculating positions. Let’s add Virat to the list.

The next one should be simple: Rohit is bowled, so Hardik will be on strike. Let’s add him to our lineup, and finally we have the whole batting lineup:
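
The walkthrough above can be sketched in a few lines of plain Python before touching the real data. The three deliveries below are a hand-written reconstruction of the scenario described (they are not taken from Cricsheet), and the names `deliveries` and `lineup` are illustrative.

```python
# Hand-written deliveries mirroring the example: (striker, non_striker).
deliveries = [
    ("Rohit", "Rahul"),   # openers; Rahul is then run out
    ("Rohit", "Virat"),   # Virat comes in at the non-striker's end; Rohit is then bowled
    ("Hardik", "Virat"),  # Hardik comes in on strike
]

lineup = {}  # player -> batting position

for striker, non_striker in deliveries:
    # The striker is checked first, then the non-striker, so on the
    # first ball the striker becomes 1 and the non-striker becomes 2.
    for player in (striker, non_striker):
        if player not in lineup:
            lineup[player] = len(lineup) + 1

print(lineup)
# {'Rohit': 1, 'Rahul': 2, 'Virat': 3, 'Hardik': 4}
```

Checking both ends on every ball is what lets Virat pick up position 3 even though he has not faced a delivery yet.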

Step 3: Code for the above logic

Now comes the more fun and tricky part: converting our logic to Python code, and for real data too. Let’s import pandas and read the dataset:

import pandas as pd

df = pd.read_csv('ipl_csv2/335982.csv')

If you want to try writing the logic on your own, I’d highly recommend pausing the article here and coming back once you have your own version.

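Here we’ll have to sort the data to make sure it’s in proper order. We also reset the index, because the loop below looks rows up by their index label, so the labels need to follow the sorted order.

# sort the data and renumber the rows 0..n-1 in sorted order
df = df.sort_values(['innings', 'ball'], ascending=True).reset_index(drop=True)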

Now, create a new column for batting_pos and initialize a few variables:

# create a col for batting pos
df['batting_pos'] = None

# initialize a dict to store batting pos for both innings
batsman_list_i1 = {}
batsman_list_i2 = {}

# next batting position to hand out in each innings, starting at 1
count_i1 = 1
count_i2 = 1

Loop through every row and populate the batting position as per the logic we’ve discussed. Note that the two innings have to be treated separately.

for i in range(len(df)):

    if df.at[i, 'innings'] == 1:

        striker = df.at[i, 'striker']
        non_striker = df.at[i, 'non_striker']

        # first time we see this striker: give them the next position
        if striker not in batsman_list_i1:
            batsman_list_i1[striker] = count_i1
            count_i1 += 1

        # write with .at so the value lands in df itself, not in a temporary copy
        df.at[i, 'batting_pos'] = batsman_list_i1[striker]

        # add non striker to the list
        if non_striker not in batsman_list_i1:
            batsman_list_i1[non_striker] = count_i1
            count_i1 += 1

    elif df.at[i, 'innings'] == 2:

        striker = df.at[i, 'striker']
        non_striker = df.at[i, 'non_striker']

        # first time we see this striker: give them the next position
        if striker not in batsman_list_i2:
            batsman_list_i2[striker] = count_i2
            count_i2 += 1

        # write with .at so the value lands in df itself, not in a temporary copy
        df.at[i, 'batting_pos'] = batsman_list_i2[striker]

        # add non striker to the list
        if non_striker not in batsman_list_i2:
            batsman_list_i2[non_striker] = count_i2
            count_i2 += 1
        

Let’s see the final outcome

You can also print the entire batting order of innings 1:
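
One way to do that is sketched here with a small stand-in dictionary, so the snippet runs on its own. The player names are placeholders; in the article’s code you would use the `batsman_list_i1` dictionary built by the loop instead.

```python
# Hypothetical stand-in for the batsman_list_i1 dict built by the loop.
batsman_list_i1 = {"Player A": 1, "Player B": 2, "Player C": 3}

# Sort by position so the order is explicit rather than relying on dict order.
batting_order = sorted(batsman_list_i1.items(), key=lambda item: item[1])

for player, position in batting_order:
    print(position, player)
```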

Finally, the whole code looks like this:

# import pandas
import pandas as pd

# read the dataset
df = pd.read_csv('ipl_csv2/335982.csv')

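# sort the data and renumber the rows 0..n-1 in sorted order,
# because the loop below looks rows up by their index label
df = df.sort_values(['innings', 'ball'], ascending=True).reset_index(drop=True)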

# create a col for batting pos
df['batting_pos'] = None

# initialize a dict to store batting pos for both innings
batsman_list_i1 = {}
batsman_list_i2 = {}

# next batting position to hand out in each innings, starting at 1
count_i1 = 1
count_i2 = 1

for i in range(len(df)):

    if df.at[i, 'innings'] == 1:

        striker = df.at[i, 'striker']
        non_striker = df.at[i, 'non_striker']

        # first time we see this striker: give them the next position
        if striker not in batsman_list_i1:
            batsman_list_i1[striker] = count_i1
            count_i1 += 1

        # write with .at so the value lands in df itself, not in a temporary copy
        df.at[i, 'batting_pos'] = batsman_list_i1[striker]

        # add non striker to the list
        if non_striker not in batsman_list_i1:
            batsman_list_i1[non_striker] = count_i1
            count_i1 += 1

    elif df.at[i, 'innings'] == 2:

        striker = df.at[i, 'striker']
        non_striker = df.at[i, 'non_striker']

        # first time we see this striker: give them the next position
        if striker not in batsman_list_i2:
            batsman_list_i2[striker] = count_i2
            count_i2 += 1

        # write with .at so the value lands in df itself, not in a temporary copy
        df.at[i, 'batting_pos'] = batsman_list_i2[striker]

        # add non striker to the list
        if non_striker not in batsman_list_i2:
            batsman_list_i2[non_striker] = count_i2
            count_i2 += 1
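
As an aside, the two near-identical branches can be folded into one helper that keeps a separate lineup per innings. This is a sketch of an alternative arrangement, not the code used above: the function name `add_batting_pos` and the sample frame are made up for illustration, and it assumes the same `innings`, `ball`, `striker` and `non_striker` columns.

```python
import pandas as pd


def add_batting_pos(df):
    """Return a sorted copy of df with a batting_pos column for the striker."""
    out = df.sort_values(["innings", "ball"]).reset_index(drop=True)
    lineups = {}    # innings number -> {player: batting position}
    positions = []  # striker's position for each row, in sorted order

    for innings, striker, non_striker in zip(
        out["innings"], out["striker"], out["non_striker"]
    ):
        lineup = lineups.setdefault(innings, {})
        # Striker first, then non-striker, as in the walkthrough.
        for player in (striker, non_striker):
            if player not in lineup:
                lineup[player] = len(lineup) + 1
        positions.append(lineup[striker])

    out["batting_pos"] = positions
    return out


# Tiny hand-made frame, not real match data.
sample = pd.DataFrame({
    "innings": [1, 1, 1, 2, 2],
    "ball": [0.1, 0.2, 0.3, 0.1, 0.2],
    "striker": ["A", "A", "C", "X", "Y"],
    "non_striker": ["B", "C", "A", "Y", "X"],
})

result = add_batting_pos(sample)
print(result[["innings", "ball", "striker", "batting_pos"]])
```

Because the lineups are keyed by innings number, the same function would also cope with a file that contains more than two innings, such as a super over, assuming those rows carry their own innings number.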
        
