Original Post on 538

Lucky Derby

The Kentucky Derby is on Saturday, and a field of 20 horses is slated to run “the fastest two minutes in sports” in pursuit of the right to be draped with a blanket of roses. But let’s consider, instead, the Lucky Derby, where things are a little more bizarre:

The bugle sounds, and 20 horses make their way to the starting gate for the first annual Lucky Derby. These horses, all trained at the mysterious Riddler Stables, are special. Each second, every Riddler-trained horse takes one step. Each step is exactly one meter long. But what these horses exhibit in precision, they lack in sense of direction. Most of the time, their steps are forward (toward the finish line) but the rest of the time they are backward (away from the finish line). As an avid fan of the Lucky Derby, you’ve done exhaustive research on these 20 competitors. You know that Horse One goes forward 52 percent of the time, Horse Two 54 percent of the time, Horse Three 56 percent, and so on, up to the favorite filly, Horse Twenty, who steps forward 90 percent of the time. The horses’ steps are taken independently of one another, and the finish line is 200 meters from the starting gate.

Handicap this race and place your bets! In other words, what are the odds (a percentage is fine) that each horse wins?

Solution

We will be generating random numbers to simulate results of random chance, so import the necessary python module

In [1]:
from __future__ import print_function, division
import random

First, we need to create objects to track each horse's attributes, and track position, and determine whether the horse has completed the race. This will make it a bit easier to do bookkeeping, although the object-oriented focus isn't strictly necessary.

In [2]:
class Horse(object):
    def __init__(self, forward_chance, race_length):
        # Save the forward chance % into the object so we can use it later
        self.forward_chance = forward_chance
        # Initialize the horse's distance at the beginning of the race
        self.distance = 0
        # Save the full distance of the race, so we can determine whether this horse has won
        self.race_length = race_length
        
    def take_step(self):
        # Generate random number and compare against random chance of moving forward
        if random.random() <= self.forward_chance:
            self.distance += 1
        else:
            self.distance -= 1
            
    def finished(self):
        # Determine whether horse has moved equal or more than the total length of the race
        if self.distance >= self.race_length:
            return True
        else:
            return False

With our Horse object created, we can now create all the horses, and run a race by simulating each second and waiting until at least one horse has completed the full length of the race.

In [3]:
def run_race(length, n_horses=20):
    timer = 0 # Just for fun, we can keep track of how long it takes this race to complete
    horses = [Horse(0.52 + x * 0.02, length) for x in range(n_horses)]

    # Run race until at least one horse has completed
    while len([h for h in horses if h.finished()]) == 0:
        # Move all the horses
        for h in horses:
            h.take_step()
        # Increment the counter
        timer += 1
    
    # Once race is complete, print the winning horse and race duration
    winner = [h.forward_chance for h in horses if h.finished()]
    return winner, timer

winner, timer = run_race(200)
print(winner, timer)
[0.9] 250

Now that we can simulate one race, we need to just loop a number of times through the same simulation and keep track of who wins each time.

In [25]:
# Create a dictionary with each horse's forward percentage, and increment the counter when a horse wins
results = {0.52 + x * 0.02: 0 for x in range(20)}
timers = []
for i in range(100000):
    winners, timer = run_race(200)
    timers.append(timer)
    # There could be ties, so we split up a full win between all winners in this case
    for w in winners:
        results[w] += 1 / len(winners)
        
# Print out each horse, and the percentage of all races that they won
for k in sorted(results):
    print('%.2f: %.3f%%' % (k, results[k] / sum(results.values()) * 100))
0.52: 0.000%
0.54: 0.000%
0.56: 0.000%
0.58: 0.000%
0.60: 0.000%
0.62: 0.000%
0.64: 0.000%
0.66: 0.000%
0.68: 0.000%
0.70: 0.000%
0.72: 0.000%
0.74: 0.000%
0.76: 0.000%
0.78: 0.002%
0.80: 0.006%
0.82: 0.116%
0.84: 0.856%
0.86: 5.005%
0.88: 21.833%
0.90: 72.183%

So there we have it. Horses with a forward chance < 78% have a less than 1:100,000 shot of winning the race (since we didn't observe a single victory in our 10,000 simulations), while the favored victor has a nearly 3:4 chance to win.

Extra Mile

The nice thing about having this simulation, is that you can tweak the input variables slightly and see how that alters the outcome. What would the percentages have been if we only had the worst 10 horses involved instead of all 20?

In [4]:
# Create a dictionary with each horse's forward percentage, and increment the counter when a horse wins
results = {0.52 + x * 0.02: 0 for x in range(10)}
timers = []
for i in range(100000):
    winners, timer = run_race(200, n_horses=10)
    timers.append(timer)
    # There could be ties, so we split up a full win between all winners in this case
    for w in winners:
        results[w] += 1 / len(winners)
        
# Print out each horse, and the percentage of all races that they won
for k in sorted(results):
    print('%.2f: %.3f%%' % (k, results[k] / sum(results.values()) * 100))
0.52: 0.000%
0.54: 0.000%
0.56: 0.000%
0.58: 0.000%
0.60: 0.001%
0.62: 0.041%
0.64: 0.667%
0.66: 4.833%
0.68: 22.451%
0.70: 72.006%

The results here are nearly identical to those above, with the top horse winning roughly 72% of the time, the next winning 22% of the time, and the third horse winning 5% of the time, with others winning smaller amounts. It seems as though only the top 6 horses will win more than 1:100,000 times, regardless of the size of the field.


Now what happens if we change the length of the race to 50m instead of 200? Because the higher forward percentage should compound as the race gets longer, shortening the challenge might alter the winning percentages. Our simulation can check.

In [5]:
# Create a dictionary with each horse's forward percentage, and increment the counter when a horse wins
results = {0.52 + x * 0.02: 0 for x in range(20)}
timers = []
for i in range(100000):
    winners, timer = run_race(50, n_horses=20)
    timers.append(timer)
    # There could be ties, so we split up a full win between all winners in this case
    for w in winners:
        results[w] += 1 / len(winners)
        
# Print out each horse, and the percentage of all races that they won
for k in sorted(results):
    print('%.2f: %.3f%%' % (k, results[k] / sum(results.values()) * 100))
0.52: 0.000%
0.54: 0.000%
0.56: 0.000%
0.58: 0.000%
0.60: 0.000%
0.62: 0.000%
0.64: 0.000%
0.66: 0.001%
0.68: 0.001%
0.70: 0.006%
0.72: 0.024%
0.74: 0.078%
0.76: 0.231%
0.78: 0.553%
0.80: 1.392%
0.82: 3.212%
0.84: 6.668%
0.86: 13.652%
0.88: 25.802%
0.90: 48.379%

As expected, changing the length of the field lowers the skew, and allows more underdogs to have a fighting chance at winning.

What are ways to improve the performance of this simulation? What are other scenarios we could alter the race to find something new? Using only this simulation, can you determine the true mathematical solution for each horse's winning percentage as a function of forward percentage, number of horses, and length of race?