Skip to the content of the web site.

Project H.1: Sieve of Eratosthenes

The problem

Suppose you wanted to find all prime numbers less than or equal to a given number $N$. The easiest way to do this is to start at $p = 2$ and then to flag all multiples of $2$ as not being prime. You then search for the next highest number $p$ that is not flagged as being not prime, and then flag all multiples of $p$ that are greater than or equal to $p^2$ as being not prime (any multiples between $2p$, $3p$, ..., $(p - 1)p$ have been flagged by previous multiples). Repeat until you find a $p$ that exceeds $p > \sqrt{N}$ (that is, $p^2 > N$). For example, suppose we want to find all prime numbers from $2$ to $150$. First we list all the numbers from $2$ to $150$, as is shown in Figure 1.


Figure 1. The numbers from $2$ up to $N = 150$.

We will iterate up to $\left \lfloor \sqrt{150} \right \rfloor = 12$ starting with $p = 2$. We take out all multiples of $2$ after and including $2^2 = 4$, as is shown in Figure 2.


Figure 2. All multiples of $p = 2$ greater than or equal to $2^2 = 4$ are marked as not prime.

Continuing, the next number flagged as not prime is $p = 3$, and so we flag as not prime all multiples of $3$ after and including $3^2 = 9$, as is shown in Figure 3.


Figure 3. All multiples of $p = 3$ greater than or equal to $3^2 = 9$ are marked as not prime.

Continuing, the next number flagged as not prime is $p = 5$, and so we flag as not prime all multiples of $5$ after and including $5^2 = 25$, as is shown in Figure 4.


Figure 4. All multiples of $p = 5$ greater than or equal to $5^2 = 25$ are marked as not prime.

Continuing, the next number flagged as not prime is $p = 7$, and so we flag as not prime all multiples of $7$ after and including $7^2 = 49$, as is shown in Figure 5.


Figure 5. All multiples of $p = 7$ greater than or equal to $7^2 = 49$ are marked as not prime.

Continuing, the next number flagged as not prime is $p = 11$, and so we flag as not prime all multiples of $11$ after and including $11^2 = 121$, as is shown in Figure 6.


Figure 6. All multiples of $p = 11$ greater than or equal to $11^2 = 121$ are marked as not prime.

After this, $12$ is flagged as not prime, and $13 > \sqrt{150}$, so we are finished, and all remaining unflagged entries must be prime, as shown in Figure 7.


Figure 7. All remaining unflagged entries are also prime, including $13$, $17$, $19$, $23$, $29$, $31$, $37$, $41$, $43$, $47$, $53$, $59$, $61$, $67$, $71$, $73$, $79$, $83$, $89$, $97$, $101$, $103$, $107$, $109$, $113$, $127$, $131$, $137$, $139$, $149$.

Note, we did not have to visit $2 \times 13$, $3 \times 13$, etc., because $26$ would have been flagged when $p = 2$, $39$ would have been flagged when $p = 3$, etc.

This method called the Sieve of Eratosthenes (err-a-toss—then-nees) and it is very fast. It requires an array of capacity $N + 1$, but in total, no more than $N \ln(\ln(N))$ numbers will be flagged (although some numbers that are not prime will be flagged multiple times).

For example, there are $50 847 534$ prime numbers less than one billion. To run the Sieve of Eratosthenes, this requires that only $2 549 489 372$ non-primes be flagged, although some will be flagged multiple times.

Minimal requirements

To complete this project, you must be familiar with the built-in types, integer arithmetic, for-loops and arrays.

Projects

The possible projects include:

  1. Project a. Printing out all primes up to and including $N$.
  2. Project b. Returning a dynamically allocated Boolean-valued array of capacity $N + 1$ with all entries corresponding to prime numbers assigned true with all others assigned false.
  3. Project c. Return a dynamically allocated instance of an array_t data structure containing all prime numbers up to and including $N$.
  4. Project d. Return a dynamically allocated instance of an Array class containing all prime numbers up to and including $N$.
  5. Project e. Repeat the previous project internally use the std::vector<bool> class and return an instance of the std::vector<unsigned long> class.

Optimization

If you wish, you can implement the following optimizations into any of the previous projects:

  1. Optimization i. Skipping even numbers.

Additional comments

1. As an aside, why is $1$ not prime? In general, many of the most interesting properties of prime numbers are not true if $1$ is considered to be prime. For example, that every number has a unique prime factorization is false if $1$ is a prime, for $6 = 2 \times 3 = 2 \times 3 \times 1 = 2 \times 3 \times 1 \times 1 \times \cdots$. Thus, if $1$ were said to be prime, then almost every interesting theorem about prime numbers would have to be worded as all prime numbers except for $1$.

2. The number of primes less than or equal to $N$ will be approximately $\frac{N}{\ln(N)}$. This estimates there are approximately $72 382$ prime numbers less than or equal to one million. There are in fact $78 498$ primes less than or equal to one million, so this approximation is not unreasonable; however, it is not exact. This same formula predicts that there are $48 254 942$ primes less than one billion.

3. The number of integers that will be flagged as being not prime, including duplicate flaggings, will be less than $N \ln(\ln(N))$. Therefore, finding all prime numbers less than one million will require less than three million integers to be flagged. As this is the vast majority of the work that his being done by this algorithm, essentially says that: if we find all prime numbers up to one million and it takes $20 {\rm ms}$, then finding all prime numbers up to ten million will take $200 {\rm s}$, and finding all prime numbers up to one billion will take only slightly more than $20 {\rm s}$ (assuming you have enough memory).

4. From the previous two points, we may therefore deduce that each non-prime number will be flagged an average of $\ln(\ln(N))$ times. Thus, applying this algorithm to find all prime numbers less than or equal to one billion will see non-prime numbers flagged on average $3.03$ times; which on the whole is not that bad, and if you were to implement Optimization i, this would reduce to each non-prime being flagged on average $1.515$ times.