John Derbyshire has another cute end-of-the-month maths puzzle
. Here's how he puts it:
Consider the powers of two: 2, 4, 8, 16, 32, 64, 128... Their right-most digits follow a simple repetitive pattern: 2, 4, 8, 6, 2, 4, 8, 6, 2, 4, 8, 6... What about their left-most digits, though? Here are the first 40-odd [actually, it's the first 149 - Ed.]. (Read right to left, line by line. I've just jammed the digits together, leaving out the courtesy commas, to save space.)
There is a pattern there, but it keeps breaking down. When you get up into really big powers of 2, in fact, it breaks down altogether. For the ten powers of 2 from the 30th to the 39th, for instance, we have lead digits 1, 2, 4, 8, 1, 3, 6, 1, 2, 5. For the ten powers from the billionth to the 1,000,000,009th, by contrast, we have lead digits 4, 9, 1, 3, 7, 1, 2, 5, 1, 2. Every digit (except, of course, zero) shows up, though.
I am going to refer to this infinite sequence of digits as "the sequence." Now I am going to ask the following two questions: In the first N digits of the sequence, how many occurrences of 3 shall I find? And how many occurrences of 4?
Call the first number U(N), the second V(N). The first few values of U(N) and V(N), for N from 1 to 15, look like this, as you can easily verify:
U(N) = 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2,...
V(N) = 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2,...
Show that for sufficiently large N, U(N) will always be bigger than V(N). Find the limit of the ratio U(N) to V(N) as N tends to infinity.
To answer these questions we need to know how to calculate the leading (leftmost) digits of the powers of two. We could just start off by raising two to succesively higher powers: 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131071,... but this won't tell us anything about the distribution of that leading digit in general. To get a direct formula for the leading digit we will need to use logarithms.
First let's review a few properties of logarithms (or logs). To start, here's the definition. The logarithm to base b
is that number that when you raise b
to it, you get x
. It is written logb x
. A couple of examples: log8
512 = 3, because 83
= 512. Similarly, log3.66
12.8901 = 1.97035 beacuse 3.661.97035
= 12.8901. And blogb x
A second important property is that addition of logs is like multiplication. log (x y) = log x + log y. So, log2
32 = 5 = log2
4 + log2
8 = 2 + 3.
Thirdly, raising a number to a power is like multiplying a log by a constant. For example, logb xy
OK, so now we can derive our formula for the leading digit of 2x
. If we take log10
, we get x
2 (in other words, 10x log10 2
). This will be some decimal number, like 4.21442 (log10
16384). We can split this into two pieces, by our second property: the integer part and the fractional part. The integer part represents a power of ten, in our example 104
= 10,000. The fractional part represents the amount by which you have to multiply this power of ten to get our original number back, in this case 100.21442
= 1.6384. And therein lies our answer. Let's introduce two new functions, called floor
. The 'floor' of x
is the largest integer that is smaller than x
. So floor(3.2) is 3, floor(-4.6) is -5. The 'frac' function just takes the fractional part of a number: frac(4.386743) = 0.386743.
If we raise 10 to the power frac(x
2), we get some number. If we then take the floor of this number, we get the leading digit of 2x
. So our final function is floor(10frac(x log10 2)
The frac function returns a value between 0 and 1. But does frac(x
2) take on all values between 0 and 1? Yes. The reason for this is highly theoretical and depends on a thing called Weyl's criterion. Without going into too much detail, Weyl's criterion says that frac(x y
is an integer) takes on all values between 0 and 1 as x
gets bigger if y
is irrational (the technical term is frac(x y
) is equidistributed and dense in [0,1]). Another property of logarithms is that logb a
is irrational if a
are integers and one of them has a prime factor that the other does not. So log10
2 is irrational. If we raise 10 to the power of the frac function as x
gets bigger and bigger, we eventually get a smooth curve that varies between 1 and 10. Applying the floor function turns it into a 'staircase', with risers of height 1 and treads of varying widths. Each tread is a power of two with the corresponding lead digit. The widths decrease as we go up. As they get narrower, the chance that a random power of x
will have that leading digit goes down. So we have the answer to Derb's first question: for sufficiently large N
, powers of two start with a three more often than a four. How much more often? About 1.29 times more often, but that will have to wait for an update.
What are the probabilities of a power of two having a given leading digit?. In other words, what are the widths of the treads in the staircase function we obtained by applying floor to 10x
? The function first takes a step up when 10x
= 2, or x
2. That value is approximately 0.30103. So roughly 30% of powers of two have leading digit 1. The next step has width log10
3 - log10
2 = log10
3/2 ~ 0.1761, and so on. The ratio of the third step to the fourth step (in Derb's question, the ratio of U(N) to V(N)) is log10
4/3 divided by log10
5/4 ~ 1.2892.
If we actually calculate the leading digits, for the first 100,000 powers of two the number of results with leading digit 3 and 4 are 12492 and 9692 respectively, which is a ratio of 1.2889, pretty close to the theoretical value. As we take more and more powers, the result will get even closer (although working with the results gets hard: 21000000
has around 300,000 digits).