How does a ‘diff’ algorithm work, e.g. in VCDIFF and DiffMerge? [closed]

"An O(ND) Difference Algorithm and Its Variations" (1986, Eugene W. Myers) is a fantastic paper and a good place to start. It includes pseudo-code and a nice visualization of the graph traversals involved in doing the diff.
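To give a feel for it, here is a minimal Python sketch of the basic greedy search from the paper, without any of the later refinements. It only computes the length of the shortest edit script (insertions plus deletions), not the script itself. The function name `myers_distance` and the use of a dict for the per-diagonal state are my own choices for illustration, not something the paper prescribes.

```python
def myers_distance(a, b):
    """Length of the shortest edit script (insertions + deletions) turning a into b.

    Sketch of the basic greedy O(ND) search; names and data layout are illustrative.
    """
    n, m = len(a), len(b)
    max_d = n + m
    # v[k] = furthest x reached so far on diagonal k, where k = x - y.
    # The sentinel v[1] = 0 lets the d = 0 round start from (0, 0).
    v = {1: 0}
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            # Pick the better neighbouring diagonal to extend from.
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]        # step down: take an element from b (insertion)
            else:
                x = v[k - 1] + 1    # step right: drop an element from a (deletion)
            y = x - k
            # Follow the "snake": free diagonal moves while elements match.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return max_d


if __name__ == "__main__":
    # The two sequences used as the running example in the paper.
    print(myers_distance("ABCABBA", "CBABAC"))
```

To get the actual edit script you would also need to remember the state of `v` after each round of `d` and walk back from the end point, which is left out here to keep the sketch short.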

Section 4 of the paper introduces some refinements to the algorithm that make it very effective.

Successfully implementing this will leave you with a very useful tool in your toolbox (and probably some excellent experience as well).

Generating the output format you need can sometimes be tricky, but if you understand the algorithm's internals, you should be able to output anything you need. You can also introduce heuristics to affect the output and make certain tradeoffs.
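As a rough illustration of how the edit script and the output format are separate concerns, here is a small sketch. Note the assumptions: it uses a plain LCS table instead of the Myers search (simpler to read, but quadratic in time and space), and `edit_script` and `render` are hypothetical names. The tie-break in the walk, which prefers deletions over insertions, is an example of the kind of heuristic that changes the output without changing its size.

```python
def edit_script(a, b):
    """Return [(op, item), ...] with op in {' ', '-', '+'}, using a plain LCS table.

    Stand-in for a real diff core; not the Myers algorithm.
    """
    n, m = len(a), len(b)
    # lcs[i][j] = length of the longest common subsequence of a[i:] and b[j:].
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    script, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            script.append((" ", a[i]))
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            # Heuristic: on a tie, emit the deletion before the insertion.
            script.append(("-", a[i]))
            i += 1
        else:
            script.append(("+", b[j]))
            j += 1
    # Whatever is left on either side is pure deletion or pure insertion.
    script.extend(("-", item) for item in a[i:])
    script.extend(("+", item) for item in b[j:])
    return script


def render(script):
    """Format the script one item per line, prefixed with ' ', '-' or '+'."""
    return "\n".join(op + item for op, item in script)


if __name__ == "__main__":
    print(render(edit_script(["a", "b", "c"], ["a", "x", "c"])))
```

Once you have a list like this, producing another format (side by side, HTML, a patch) is just a different `render` over the same data.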

Here is a page that includes a bit of documentation, full source code, and examples of a diff algorithm using the techniques described in the paper above.

The source code appears to follow the basic algorithm closely and is easy to read.

There’s also a bit on preparing the input, which you may find useful. The output differs hugely depending on whether you diff by character or by token (word).
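A quick way to see the effect of input preparation is to run the same diff over two tokenizations. In this sketch the diff core is Python's `difflib.SequenceMatcher`, which is not the Myers algorithm (it uses a different matching strategy), so treat it purely as a stand-in. The helper names `by_chars`, `by_words` and `changes` are made up for the example.

```python
import difflib
import re


def by_chars(text):
    """Every character is its own token."""
    return list(text)


def by_words(text):
    """Words, runs of whitespace, and single punctuation marks as tokens."""
    return re.findall(r"\s+|\w+|[^\w\s]", text)


def changes(a, b):
    """Non-equal opcodes between token lists a and b as (tag, old, new)."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    old = "I like reading books"
    new = "I like trading books"
    print("chars:", changes(by_chars(old), by_chars(new)))
    print("words:", changes(by_words(old), by_words(new)))
```

At character level the change shows up as small edits inside the word, while at word level the whole word is reported as replaced, which is usually what a human reader wants for prose.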
