Writing a syntax highlighter

Syntax highlighters can work in two very general ways. The first implements a full lexer and parser for the language(s) being highlighted, exactly identifying each token’s type (keyword, class name, instance name, variable type, preprocessor directive…). This provides all the information needed to exactly highlight the code according to some specification (keywords in red, class names in blue, what have you).

The second way is something like the one Google Code Prettify employs, where instead of implementing one lexer/parser per language, a couple of very general parsers are used that can do a decent job on most syntaxes. This highlighter, for example, will be able to parse and highlight reasonably well any C-like language, because its lexer/parser can identify the general components of those kinds of languages.

This also has the advantage that, as a result, you don’t need to explicitely specify the language, as the engine will determine by itself which of its generic parsers can do the best job. The downside of course is that highlighting is less perfect than when a language-specific parser is used.

Leave a Comment

Hata!: SQLSTATE[HY000] [1045] Access denied for user 'divattrend_liink'@'localhost' (using password: YES)