Efficient way to search a stream for a string

Question

There are three good solutions here:

If you want something that is easy and reasonably fast, go with no buffer, and instead implement a simple nondeterminstic finite-state machine. Your state will be a list of indices into the string you are searching, and your logic looks something like this (pseudocode):
```
String needle;
n = needle.length();

for every input character c do
  add index 0 to the list
  for every index i in the list do
    if c == needle[i] then
      if i + 1 == n then
        return true
      else
        replace i in the list with i + 1
      end
    else
      remove i from the list
    end
  end
end
```
This will find the string if it exists and you will never need a
buffer.
Slightly more work but also faster: do an NFA-to-DFA conversion that figures out in advance what lists of indices are possible, and assign each one to a small integer. (If you read about string search on Wikipedia, this is called the powerset construction.) Then you have a single state and you make a state-to-state transition on each incoming character. The NFA you want is just the DFA for the string preceded with a state that nondeterministically either drops a character or tries to consume the current character. You’ll want an explicit error state as well.
If you want something faster, create a buffer whose size is at least twice n, and user Boyer-Moore to compile a state machine from needle. You’ll have a lot of extra hassle because Boyer-Moore is not trivial to implement (although you’ll find code online) and because you’ll have to arrange to slide the string through the buffer. You’ll have to build or find a circular buffer that can ‘slide’ without copying; otherwise you’re likely to give back any performance gains you might get from Boyer-Moore.

Leave a Comment Cancel reply