Note: This post is also on my blog with my fellow programmers Eric and Young. It’s called YES Programming, using our initials. Check it out if you have time!

Hello! Sang checking-in. Here’s an interesting problem I found:

Given a function which produces a random integer in the range 1 to 5, write a function which produces a random integer in the range 1 to 7.

Seems really challenging and interesting! I thought of several approaches. First of all, I absolutely know that when someone interviews me with this question, they’re not looking for a solution. They’re looking for how I approach this problem. In fact, solving this with code is NOT the hardest part. We can use simple rejection sampling to solve it.

How does this solution work? So, say we are given a task to write a program that outputs random numbers from1 to 25 in uniform. For this, we can return 5 * (rand5() – 1) + rand5() as in the code.  Now, if I want to return random numbers between 1 and 21 in uniform, I can create a filter so that numbers between 22 to 25 are rejected. So I use this 1~21 then I can filter it so that for each element i, I output i % 7 + 1. This will give me random numbers from 1 to 7. First of all, 1/7 is an infinite decimal in base 5, so I know there will not be any solution that will be in constant time.  

So now let’s get down to the real problem! When a company asks you this question, it does not just end by coding the solution. You need to know that the real problem is in a solution whether you want a perfect distribution with unbounded worst case runtime, or an imperfect distribution with a bounded runtime.

This is, of course, because of what I have said earlier. All powers of 5 are not divisible by 7. For example, say if you have 5^n equal sequences of length n, we cannot assign each sequence a number from [1,7] such that each of 1~7 is equally probable.

I seem like I’m not making sense, so I’ll rephrase. Suppose there exist a solution running in O(1), guaranteed to make no more than N calls to rand5() in a worst case scenario.

This means that there are 5^N possible outcomes of the sequence of calls to rand5, each of which has an output between 1 to 7. Now, add all of the possible sequences of calls whose output is i for each 1≤i≤7.
Then, the probability that the output is i is k/5^N, where m is the number of such sequences. So, k/5^N = 1/7, but there are no possible integer solutions such that the answer is (N,k). Therefore, we reached a contradiction. 

Update: I found this neat explanation at Stack Overflow a few years later. Much better at explaining than I did.

Imagine tacking it up to a dart board and randomly throwing darts at it. If you hit a non-zero value, it’s a statistically random value between 1 and 7, since there are an equal number of non-zero values to choose from. If you hit a zero, just keep throwing the dart until you hit a non-zero.