How to count each digit in a range of integers?

Question

There’s a clear mathematical solution to a problem like this. Let’s assume the value is zero-padded to the maximum number of digits (it’s not, but we’ll compensate for that later), and reason through it:

From 0-9, each digit occurs once
From 0-99, each digit occurs 20 times (10x in position 1 and 10x in position 2)
From 0-999, each digit occurs 300 times (100x in P1, 100x in P2, 100x in P3)

The obvious pattern for any given digit, if the range is from 0 to a power of 10, is N * 10^N-1, where N is the power of 10.

What if the range is not a power of 10? Start with the lowest power of 10, then work up. The easiest case to deal with is a maximum like 399. We know that for each multiple of 100, each digit occurs at least 20 times, but we have to compensate for the number of times it appears in the most-significant-digit position, which is going to be exactly 100 for digits 0-3, and exactly zero for all other digits. Specifically, the extra amount to add is 10^N for the relevant digits.

Putting this into a formula, for upper bounds that are 1 less than some multiple of a power of 10 (i.e. 399, 6999, etc.) it becomes: M * N * 10^N-1 + iif(d <= M, 10^N, 0)

Now you just have to deal with the remainder (which we’ll call R). Take 445 as an example. This is whatever the result is for 399, plus the range 400-445. In this range, the MSD occurs R more times, and all digits (including the MSD) also occur at the same frequencies they would from range [0 – R].

Now we just have to compensate for the leading zeros. This pattern is easy – it’s just:

10^N + 10^N-1 + 10^N-2 + … + **10⁰

Update: This version correctly takes into account “padding zeros”, i.e. the zeros in middle positions when dealing with the remainder ([400, 401, 402, …]). Figuring out the padding zeros is a bit ugly, but the revised code (C-style pseudocode) handles it:

function countdigits(int d, int low, int high) {
    return countdigits(d, low, high, false);
}

function countdigits(int d, int low, int high, bool inner) {
    if (high == 0)
        return (d == 0) ? 1 : 0;

    if (low > 0)
        return countdigits(d, 0, high) - countdigits(d, 0, low);

    int n = floor(log10(high));
    int m = floor((high + 1) / pow(10, n));
    int r = high - m * pow(10, n);
    return
        (max(m, 1) * n * pow(10, n-1)) +                             // (1)
        ((d < m) ? pow(10, n) : 0) +                                 // (2)
        (((r >= 0) && (n > 0)) ? countdigits(d, 0, r, true) : 0) +   // (3)
        (((r >= 0) && (d == m)) ? (r + 1) : 0) +                     // (4)
        (((r >= 0) && (d == 0)) ? countpaddingzeros(n, r) : 0) -     // (5)
        (((d == 0) && !inner) ? countleadingzeros(n) : 0);           // (6)
}

function countleadingzeros(int n) {
      int tmp= 0;
      do{
         tmp= pow(10, n)+tmp;
         --n;
         }while(n>0);
         return tmp;
         }

function countpaddingzeros(int n, int r) {
    return (r + 1) * max(0, n - max(0, floor(log10(r))) - 1);
}

As you can see, it’s gotten a bit uglier but it still runs in O(log n) time, so if you need to handle numbers in the billions, this will still give you instant results. 🙂 And if you run it on the range [0 – 1000000], you get the exact same distribution as the one posted by High-Performance Mark, so I’m almost positive that it’s correct.

FYI, the reason for the inner variable is that the leading-zero function is already recursive, so it can only be counted in the first execution of countdigits.

Update 2: In case the code is hard to read, here’s a reference for what each line of the countdigits return statement means (I tried inline comments but they made the code even harder to read):

Frequency of any digit up to highest power of 10 (0-99, etc.)
Frequency of MSD above any multiple of highest power of 10 (100-399)
Frequency of any digits in remainder (400-445, R = 45)
Additional frequency of MSD in remainder
Count zeros in middle position for remainder range (404, 405…)
Subtract leading zeros only once (on outermost loop)

Leave a Comment Cancel reply