Efficient way to search a stream for a string

There are three good solutions here:

  1. If you want something that is easy and reasonably fast, go with no buffer, and instead implement a simple nondeterminstic finite-state machine. Your state will be a list of indices into the string you are searching, and your logic looks something like this (pseudocode):

    String needle;
    n = needle.length();
    
    for every input character c do
      add index 0 to the list
      for every index i in the list do
        if c == needle[i] then
          if i + 1 == n then
            return true
          else
            replace i in the list with i + 1
          end
        else
          remove i from the list
        end
      end
    end
    

    This will find the string if it exists and you will never need a
    buffer.

  2. Slightly more work but also faster: do an NFA-to-DFA conversion that figures out in advance what lists of indices are possible, and assign each one to a small integer. (If you read about string search on Wikipedia, this is called the powerset construction.) Then you have a single state and you make a state-to-state transition on each incoming character. The NFA you want is just the DFA for the string preceded with a state that nondeterministically either drops a character or tries to consume the current character. You’ll want an explicit error state as well.

  3. If you want something faster, create a buffer whose size is at least twice n, and user Boyer-Moore to compile a state machine from needle. You’ll have a lot of extra hassle because Boyer-Moore is not trivial to implement (although you’ll find code online) and because you’ll have to arrange to slide the string through the buffer. You’ll have to build or find a circular buffer that can ‘slide’ without copying; otherwise you’re likely to give back any performance gains you might get from Boyer-Moore.

Leave a Comment

Hata!: SQLSTATE[HY000] [1045] Access denied for user 'divattrend_liink'@'localhost' (using password: YES)