
Project B.4: Approximating uniformly distributed events

You are aware that:

  1. if you flip a fair coin, heads and tails each have a one-in-two chance of appearing,
  2. if you roll a fair die, each face has a one-in-six chance of coming out on top,
  3. if you pick a random card from a standard deck, each card has a one-in-52 chance of appearing, and
  4. if you occasionally, say once an hour, look up at the clock and note how many seconds you are into the current minute, each value between $0$ and $59$ is equally likely to appear: 1 in 60.

In each case, every event (the coin landing on heads, rolling a six, or choosing the jack of diamonds) has an equal chance, and because there are only a finite number $n$ of different possibilities, each event has a $\frac{1}{n}$ chance of appearing. You may have shown this using bar charts where the height of each bar is the probability of the corresponding event occurring.

Of course, if you are waiting for a bus and 1) you are late for an appointment and 2) the bus is late, then you will be looking at your watch every few seconds. If your current glance showed that you were $40\ {\rm s}$ into the current minute, then it is unlikely that your next check will fall between $20\ {\rm s}$ and $40\ {\rm s}$ into the next minute, as you are far more likely to check again before then. This is because these events (checking your watch to see how late you are) are not independent.

In all of these cases, however, the different events (a head appearing on the coin, a 5 appearing on the die face, a queen of hearts being drawn from the deck, or it being 32 seconds into the current minute) are discrete and distinct. Some measurable quantities, however, can take on any real value: for instance, the height of a human being. The problem is that heights are not uniformly distributed: more people are close to the average height, and outliers (the very tall or the very short) become progressively rarer in the human population.

There are, however, natural phenomena where every single value on an interval is equally likely:

  1. If a bus comes every half hour, and you get to the bus stop without trying to time your arrival, you are just as likely to wait between $0$ and $1$ minute as you are to wait between $25$ and $26$ minutes.
  2. If you measure the height of a person to the closest centimetre, your error will be some value between $-0.5\ {\rm cm}$ and $0.5\ {\rm cm}$, and it is just as likely that your error will be between $-0.3\ {\rm cm}$ and $-0.1\ {\rm cm}$ as it will be between $0.2\ {\rm cm}$ and $0.4\ {\rm cm}$.
  3. If you forcefully spin the spinner in the game Twister and measure the angle at which the needle comes to rest, then the needle is just as likely to come to rest between 10° and 20° as between 250° and 260°, and it is twice as likely to come to rest between 170° and 190°, since that interval is twice as wide.
  4. An analog-to-digital converter (ADC) takes a voltage and converts it into a binary number, usually through truncation. Thus, a voltage of $4.2\ {\rm V}$ may be converted to the truncated binary value $100.001$, equalling $4.125$ (the short sketch following this list checks this arithmetic). If the ADC produces three bits beyond the radix point, then the error will be uniformly distributed between $0$ and $0.125$.
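To make the last item concrete, here is a minimal sketch that truncates the $4.2\ {\rm V}$ reading from item 4 to three fractional bits and prints the resulting quantization error:

	#include <cmath>
	#include <iostream>

	int main() {
		double const voltage{4.2};

		// Truncate to three bits beyond the radix point: scale by 2^3 = 8,
		// discard the fractional part, then scale back down
		double const quantized{std::floor( voltage*8.0 )/8.0};

		std::cout << "quantized: " << quantized            // 4.125
		          << ", error: "   << voltage - quantized  // approximately 0.075
		          << std::endl;

		return 0;
	}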

Now, engineers must deal with the real world, and one of the best ways to test an algorithm is to feed it real-world input. Sometimes, however, it is difficult to get access to sufficient real-world data, and therefore engineers will often approximate real-world events with simulated ones. These simulations must be sufficiently true to reality; otherwise, they are useless. Consider, for example, a flight simulator for training pilots whose behavior differed significantly from that of the actual aircraft. This would be worse than no simulation at all, for the pilots trained in the simulator would then perform the wrong responses at critical moments in actual flight. Similarly, simulating the wrong voltage fluctuations from a voltage source may suggest that your circuit design is robust; when the circuit encounters real-world voltages, however, the result may be, for example, an exploding battery.

Suppose you wanted to simulate a uniform distribution on the interval $[a, b]$. For this, you should produce a number that never falls outside the interval and is equally likely to fall within any two sub-intervals of equal width.

Now, the std::rand() function returns a random integer between 0 and RAND_MAX inclusive. Thus, if we convert the value produced to a double-precision floating-point number and divide by RAND_MAX, we get a value on the interval $[0, 1]$:

	// Generate a random value on the interval [0, 1]
	double x{((double) rand())/RAND_MAX};

Next, we want to map this to an interval $[a, b]$. Now, if we take every point $x$ on the interval $[0, 1]$ and map it to the linear polynomial $cx + d$, we get the interval $[d, c + d]$. If we want this to equal $[a, b]$, then $d = a$ and $c + d = b$, so substituting the first into the second, we get that $c = b - a$:

	// Generate a random value on the interval [a, b]
	double x{(b - a)*((double) rand())/RAND_MAX + a};

This will give us a random point on the interval $[a, b]$.
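As an aside, the C++ <random> library (C++11 and later) provides this mapping directly. Here is a minimal sketch, where the seed 42 and the interval $[2, 5]$ are arbitrary choices; note that std::uniform_real_distribution produces values on the half-open interval $[a, b)$:

	#include <iostream>
	#include <random>

	int main() {
		// A Mersenne twister engine with an arbitrary fixed seed
		std::mt19937 engine{42};

		// Values uniformly distributed on [2, 5); here a = 2 and b = 5
		std::uniform_real_distribution<double> distribution{2.0, 5.0};

		for ( int k{0}; k < 5; ++k ) {
			std::cout << distribution( engine ) << std::endl;
		}

		return 0;
	}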

A little mathematics

For discrete events, the result can only be one of a fixed number of outcomes. For continuous events, like those described above, the probability of any one exact outcome occurring is zero. For example, if you asked "Will I have to wait exactly $13\ {\rm min}\ 32.5923\ {\rm s}$?", the likelihood is zero. However, if you ask "Will I have to wait $13\ {\rm min}\ 32\ {\rm s}$?", meaning "Will I have to wait at least $13\ {\rm min}\ 32\ {\rm s}$ but no more than $13\ {\rm min}\ 33\ {\rm s}$?", then the answer is one in 1800.
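To spell out that last figure: the bus comes every half hour, so the waiting time $T$ is uniformly distributed over $30 \times 60 = 1800$ seconds, and $13\ {\rm min}\ 32\ {\rm s} = 812\ {\rm s}$; hence

	$\Pr( 812\ {\rm s} \leq T < 813\ {\rm s} ) = \frac{813 - 812}{1800 - 0} = \frac{1}{1800}$.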

Questions and practice:

1. Write a function double uniform( double a, double b ) that returns a random point on the interval $[a, b]$.

2. In some cases, it may be absolutely necessary to ensure that the value returned is never one of the end points. Write a new function double uniform_open( double a, double b ) that returns a random point on the open interval $(a, b)$.

For this question, this is the wrong implementation:

	int r;

	do {
		r = rand();
	} while ( (r != 0) && (r != RAND_MAX) );

	double x = (b - a)*((double) r)/RAND_MAX + a;
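To see why: this loop continues while r is neither 0 nor RAND_MAX, so it rejects every interior value and terminates only when rand() happens to return an endpoint, the exact opposite of what is needed (and, on average, this takes roughly RAND_MAX/2 calls). A minimal sketch of the repaired rejection loop might read:

	int r;

	// Keep sampling until r is strictly between 0 and RAND_MAX
	do {
		r = rand();
	} while ( (r == 0) || (r == RAND_MAX) );

	double x = (b - a)*((double) r)/RAND_MAX + a;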

3. Test your function by writing a test file that divides the interval $[a, b]$ into 10 sub-intervals of equal width $\frac{b - a}{10}$. You will then generate $N$ random values in the interval $[a, b]$ and keep track of which sub-interval each falls into. You will then print the proportion of points that fall into each sub-interval. For example, if $N = 100$, your output may look like

             0.09 0.07 0.12 0.11 0.14 0.08 0.11 0.07 0.09 0.12
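What follows is a minimal sketch of one such test file, assuming the uniform( a, b ) function from Question 1, the interval $[0, 1]$, and $N = 100$; the seeding call and the bin computation are one choice among many:

	#include <cstdlib>
	#include <ctime>
	#include <iostream>

	double uniform( double a, double b ) {
		// The solution to Question 1, as derived above
		return (b - a)*((double) std::rand())/RAND_MAX + a;
	}

	int main() {
		// Seed the generator so that each run produces different values
		std::srand( (unsigned) std::time( nullptr ) );

		double const a{0.0};
		double const b{1.0};
		int const N{100};
		int counts[10]{};

		for ( int k{0}; k < N; ++k ) {
			double const x{uniform( a, b )};

			// Determine which of the 10 sub-intervals of width (b - a)/10 contains x
			int bin{(int) (10.0*(x - a)/(b - a))};

			if ( bin == 10 ) {
				// x == b falls on the boundary; count it in the last sub-interval
				bin = 9;
			}

			++counts[bin];
		}

		// Print the proportion of points that fell into each sub-interval
		for ( int k{0}; k < 10; ++k ) {
			std::cout << ((double) counts[k])/N << " ";
		}

		std::cout << std::endl;

		return 0;
	}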

4. Following up on Question 3 with $N = 100$, how many times do you have to run your test before you get exactly the following distribution?

             0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1

Hint: you will probably find that you never actually get this ideal distribution.

5. Following up on Question 4, how many times do you have to run your test before all of the values are one of $0.09$, $0.1$, or $0.11$, but no others?