Easy interview question got harder: given numbers 1..100, find the missing number(s) given exactly k are missing

Question

Here’s a summary of Dimitris Andreou’s link.

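Keep track of the sums of the i-th powers of the numbers you see, for i = 1, 2, …, k. Subtracting them from the corresponding power sums of 1..n gives the power sums b_i of the k missing numbers a₁, …, a_k, which reduces the problem to solving the system of equations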

a₁ + a₂ + … + a_k = b₁

a₁² + a₂² + … + a_k² = b₂

…

a₁^k + a₂^k + … + a_k^k = b_k
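
As a rough illustration of how the right-hand sides b_i could be obtained, here is a minimal Python sketch. The function name `missing_power_sums` and its parameters are made up for this example, and it uses exact Python integers, so it only addresses the small, constant-k case.

```python
def missing_power_sums(present, n, k):
    """Return [b_1, ..., b_k], the power sums of the numbers missing from 1..n."""
    # Start from the power sums of the full range 1..n.
    b = [sum(x ** i for x in range(1, n + 1)) for i in range(1, k + 1)]
    # One pass over the input, keeping only k running totals.
    for x in present:
        p = 1
        for i in range(k):
            p *= x          # p is now x^(i+1)
            b[i] -= p
    return b
```

For n = 10 with 3 and 7 removed, this gives b₁ = 3 + 7 = 10 and b₂ = 9 + 49 = 58.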

Using Newton’s identities, the b_i let you compute the elementary symmetric polynomials

c₁ = a₁ + a₂ + … + a_k

c₂ = a₁a₂ + a₁a₃ + … + a_{k-1}a_k

…

c_k = a₁a₂ … a_k

If you expand the polynomial (x-a₁)…(x-a_k), the coefficients will be exactly c₁, …, c_k up to sign (the coefficient of x^(k-i) is (-1)^i c_i); see Viète’s formulas. Since every polynomial factors uniquely (the ring of polynomials over a field is a Euclidean domain), this means the a_i are uniquely determined, up to permutation.

This completes the proof that remembering the power sums is enough to recover the numbers. For constant k, this is a good approach.
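
The Newton's identities step might look like the following Python sketch. It uses the recurrence i·c_i = c_{i-1}b₁ - c_{i-2}b₂ + … ± c₀b_i with c₀ = 1; the function name `elementary_from_power_sums` is hypothetical, and the arithmetic is over plain integers, where the division by i is always exact.

```python
def elementary_from_power_sums(b):
    """Given power sums [b_1, ..., b_k], return [c_1, ..., c_k]."""
    k = len(b)
    c = [1]  # c[0] = 1 by convention
    for i in range(1, k + 1):
        total = 0
        for j in range(1, i + 1):
            term = c[i - j] * b[j - 1]
            # Signs alternate: +, -, +, ...
            total += term if j % 2 == 1 else -term
        c.append(total // i)  # exact over the integers
    return c[1:]
```

For the numbers 3, 7, 9 the power sums are 19, 139, 1099, and the sketch returns c₁ = 19, c₂ = 111, c₃ = 189, matching 3+7+9, 3·7+3·9+7·9 and 3·7·9.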

However, when k varies, computing c₁,…,c_k directly is prohibitively expensive, since e.g. c_k is the product of all missing numbers and can be as large as n!/(n-k)!. To overcome this, perform the computations in the field Z_q, where q is a prime such that n <= q < 2n; such a prime exists by Bertrand’s postulate. The proof doesn’t need to change, since the formulas still hold and factorization of polynomials is still unique. You also need an algorithm for factoring polynomials over finite fields, for example the one by Berlekamp or Cantor-Zassenhaus.

High level pseudocode for constant k:

Compute the sums of i-th powers of the given numbers, for i = 1, …, k.
Subtract them from the power sums of 1..n to get the sums of i-th powers of the unknown numbers. Call the sums b_i.
Use Newton’s identities to compute coefficients from b_i; call them c_i. Basically, c₁ = b₁; c₂ = (c₁b₁ – b₂)/2; see Wikipedia for exact formulas.
Factor the polynomial x^k - c₁x^(k-1) + c₂x^(k-2) - … + (-1)^k c_k.
The roots of the polynomial are the needed numbers a₁, …, a_k.
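
The steps above can be put together roughly as follows. This is only a sketch under a few assumptions: `find_missing` is a made-up name, `present` must be a list (it is iterated more than once), and instead of a general factoring routine it finds the roots by evaluating the polynomial at every candidate 1..n, which works here because the roots are known to be integers in that range.

```python
def find_missing(present, n, k):
    """Return the k numbers of 1..n that do not occur in `present`."""
    # Steps 1-2: power sums b_1..b_k of the missing numbers.
    b = [sum(x ** i for x in range(1, n + 1)) - sum(x ** i for x in present)
         for i in range(1, k + 1)]
    # Step 3: Newton's identities, with c[0] = 1.
    c = [1]
    for i in range(1, k + 1):
        total = sum((-1) ** (j - 1) * c[i - j] * b[j - 1]
                    for j in range(1, i + 1))
        c.append(total // i)  # exact over the integers
    # Step 4: coefficients of x^k - c1 x^(k-1) + ... + (-1)^k c_k,
    # highest degree first.
    coeffs = [(-1) ** i * c[i] for i in range(k + 1)]

    def value(x):
        # Horner evaluation of the polynomial at x.
        acc = 0
        for a in coeffs:
            acc = acc * x + a
        return acc

    # Step 5: roots by trial evaluation (a stand-in for factoring).
    return [x for x in range(1, n + 1) if value(x) == 0]
```

With n = 100 and 13, 42, 77 removed from the input, the sketch recovers exactly those three numbers.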

For varying k, find a prime q with n <= q < 2n, using e.g. Miller-Rabin to test candidates for primality, and perform the steps with all numbers reduced modulo q.
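
A modular variant might look like the sketch below. Several choices here are assumptions rather than part of the answer: the names `next_prime_above` and `find_missing_mod_q` are made up, q is taken as the smallest prime strictly greater than n (so that every number 1..n is a nonzero residue and every divisor 1..k is invertible), primality is checked by simple trial division rather than Miller-Rabin, and roots are found by trying every candidate 1..n instead of running Berlekamp or Cantor-Zassenhaus. The modular inverse `pow(i, -1, q)` needs Python 3.8 or later.

```python
def next_prime_above(n):
    """Smallest prime q > n (trial division; fine for a sketch)."""
    def is_prime(m):
        if m < 2:
            return False
        d = 2
        while d * d <= m:
            if m % d == 0:
                return False
            d += 1
        return True

    q = n + 1
    while not is_prime(q):
        q += 1
    return q


def find_missing_mod_q(present, n, k):
    """Return the k numbers of 1..n missing from `present`, working mod q."""
    q = next_prime_above(n)
    b = [0] * k
    # Power sums of 1..n (added) and of the input (subtracted), mod q.
    for sign, values in ((1, range(1, n + 1)), (-1, present)):
        for x in values:
            p = 1
            for i in range(k):
                p = p * x % q
                b[i] = (b[i] + sign * p) % q
    # Newton's identities; division by i becomes multiplication
    # by the inverse of i modulo q.
    c = [1]
    for i in range(1, k + 1):
        total = 0
        for j in range(1, i + 1):
            term = c[i - j] * b[j - 1]
            total += term if j % 2 == 1 else -term
        c.append(total * pow(i, -1, q) % q)
    # Coefficients of x^k - c1 x^(k-1) + ... + (-1)^k c_k, highest first.
    coeffs = [c[i] if i % 2 == 0 else -c[i] for i in range(k + 1)]

    def value(x):
        acc = 0
        for a in coeffs:
            acc = (acc * x + a) % q
        return acc

    # A degree-k polynomial over Z_q has at most k roots, and the k
    # missing numbers are all roots, so this list is exactly the answer.
    return [x for x in range(1, n + 1) if value(x) == 0]
```

Every intermediate value stays below q², so the numbers never grow the way c_k does over the integers.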

EDIT: The previous version of this answer stated that instead of Z_q, where q is prime, it is possible to use a finite field of characteristic 2 (q=2^(log n)). This is not the case, since Newton’s formulas require division by numbers up to k.
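
To see concretely what goes wrong, here is a small worked case for k = 2 (an illustration added for clarity, not taken from the linked write-up). In any field of characteristic 2 we have 2 = 0, so the second power sum is just the square of the first:

```latex
b_2 = a_1^2 + a_2^2 = (a_1 + a_2)^2 - 2 a_1 a_2 = (a_1 + a_2)^2 = b_1^2
```

So b₂ carries no information beyond b₁, and the identity 2c₂ = c₁b₁ – b₂ collapses to 0 = 0, leaving c₂ = a₁a₂ undetermined.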
