Finding the hundred largest numbers in a file of a billion

Obviously the interviewers want you to point out two key facts:

  • You cannot read the whole list of integers into memory, since it is too large. So you will have to read it one by one.
  • You need an efficient data structure to hold the 100 largest elements. This data structure must support the following operations:
    • Get-Size: Get the number of values in the container.
    • Find-Min: Get the smallest value.
    • Delete-Min: Remove the smallest value to replace it with a new, larger value.
    • Insert: Insert another element into the container.

By evaluating the requirements for the data structure, a computer science professor would expect you to recommend using a Heap (Min-Heap), since it is designed to support exactly the operations we need here.

For example, for Fibonacci heaps, the operations Get-Size, Find-Min and Insert all are O(1) and Delete-Min is O(log n) (with n <= 100 in this case).

In practice, you could use a priority queue from your favorite language’s standard library (e.g. priority_queue from#include <queue> in C++) which is usually implemented using a heap.

Leave a Comment

Hata!: SQLSTATE[HY000] [1045] Access denied for user 'divattrend_liink'@'localhost' (using password: YES)