What does an escaped ampersand mean in Haskell?

It escapes… no character. It is useful to “break” an escape sequence that would otherwise run on into the characters that follow it. For instance, we might want to express “\12” ++ “3” as a single string literal. If we try the obvious approach, the lexer reads all three digits as one decimal escape, so we get “\123” ==> “{”. We can, however, write “\12\&3” to get the intended result. Also, “\SOH” and “\SO” are both valid single …
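To make the run-on behaviour concrete, here is a minimal sketch in Python of how a Haskell-style lexer might decode the body of a string literal. `decode_escapes` is a hypothetical helper, not part of any library, and as a simplifying assumption it handles only decimal escapes and the empty escape `\&` (no `\n`, `\x41`, ASCII mnemonics such as `\SOH`, or string gaps).

```python
def decode_escapes(body: str) -> str:
    """Decode a Haskell-style string literal body (without the quotes).

    Hypothetical helper: only decimal escapes (e.g. \\123) and the empty
    escape \\& are supported.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        nxt = body[i + 1]
        if nxt == "&":
            # The empty escape: emits nothing, it only terminates the escape.
            i += 2
        elif nxt in "0123456789":
            # Maximal munch: a decimal escape consumes every digit that follows.
            j = i + 1
            while j < len(body) and body[j] in "0123456789":
                j += 1
            out.append(chr(int(body[i + 1:j])))
            i = j
        else:
            raise ValueError(f"unsupported escape: \\{nxt}")
    return "".join(out)


print(decode_escapes(r"\123"))          # {
print(repr(decode_escapes(r"\12\&3")))  # '\x0c3' (form feed, then "3")
```

Without the `\&`, the digit loop would swallow the “3” as part of the escape; with it, the escape ends after “12”.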

Poor man’s “lexer” for C#

The original version I posted here as an answer had a problem: it only worked while more than one “Regex” matched the current expression. That is, as soon as only one Regex matched, it would return a token, whereas most people want the Regex to be “greedy”. This was …
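The “greedy” behaviour is usually called maximal munch: at each position, try every rule and keep the longest match. Below is a minimal sketch of that loop, written in Python rather than C#, with a hypothetical rule table. As an assumption of this sketch, ties between equally long matches go to the rule listed first, so a keyword beats an identifier of the same length.

```python
import re
from typing import List, Tuple

# Hypothetical token rules; on equal-length matches, earlier rules win.
RULES = [
    ("KEYWORD", re.compile(r"if|else|while")),
    ("IDENT", re.compile(r"[A-Za-z_]\w*")),
    ("NUMBER", re.compile(r"\d+")),
    ("OP", re.compile(r"==|=|<=|<|\+|-")),
    ("SPACE", re.compile(r"\s+")),
]


def tokenize_greedy(text: str) -> List[Tuple[str, str]]:
    tokens = []
    pos = 0
    while pos < len(text):
        best_kind, best_text = None, ""
        # Try every rule at this position and keep the longest match.
        for kind, pattern in RULES:
            m = pattern.match(text, pos)
            if m and len(m.group()) > len(best_text):
                best_kind, best_text = kind, m.group()
        if best_kind is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if best_kind != "SPACE":  # whitespace separates tokens but is dropped
            tokens.append((best_kind, best_text))
        pos += len(best_text)
    return tokens


print(tokenize_greedy("if iffy == 42"))
# [('KEYWORD', 'if'), ('IDENT', 'iffy'), ('OP', '=='), ('NUMBER', '42')]
```

Stopping at the first rule that matches would split “iffy” into the keyword “if” plus “fy”; comparing match lengths is what makes the lexer greedy.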

When parsing JavaScript, what determines the meaning of a slash?

It’s actually fairly easy, but it requires making your lexer a little smarter than usual. The division operator must follow an expression, and a regular expression literal can’t follow an expression, so in all other cases you can safely assume you’re looking at a regular expression literal. You already have to identify Punctuators as multiple-character …
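The rule above amounts to checking the previous significant token before deciding what a “/” starts. Below is a minimal sketch in Python of that check inside a toy lexer. It is an illustration under several simplifying assumptions: only identifiers, numbers, single-character punctuators and regex literals without character classes are recognised. The keyword set is an illustrative subset. Treating `)` and `}` as always ending an expression is a known approximation, since for example `if (x) /re/.test(s)` has a regex after `)`.

```python
import re
from typing import List, Optional, Tuple

Token = Tuple[str, str]

IDENT = re.compile(r"[A-Za-z_$][\w$]*")
NUMBER = re.compile(r"\d+(?:\.\d+)?")
# Regex literal: escaped characters or anything except "/", "\" or newline,
# followed by optional flags. Character classes like [/] are not handled.
REGEX = re.compile(r"/(?:\\.|[^/\\\n])+/[a-z]*")
# Illustrative subset of keywords after which an expression may begin.
KEYWORDS_BEFORE_EXPR = {"return", "typeof", "case", "do", "else", "in",
                        "instanceof", "new", "delete", "void", "throw"}
# Punctuators assumed (simplification) to end an expression.
EXPR_END_PUNCT = {")", "]", "}"}


def regex_allowed(prev: Optional[Token]) -> bool:
    """True if a "/" after token `prev` starts a regex, not a division."""
    if prev is None:
        return True  # start of input: no expression precedes the slash
    kind, text = prev
    if kind in ("number", "regex"):
        return False  # literals end an expression, so "/" divides
    if kind == "identifier":
        return text in KEYWORDS_BEFORE_EXPR
    return text not in EXPR_END_PUNCT


def lex_js(src: str) -> List[Token]:
    tokens: List[Token] = []
    pos = 0
    while pos < len(src):
        ch = src[pos]
        if ch.isspace():
            pos += 1
            continue
        prev = tokens[-1] if tokens else None
        if ch == "/" and regex_allowed(prev):
            m = REGEX.match(src, pos)
            if m is None:
                raise SyntaxError(f"unterminated regex at {pos}")
            tokens.append(("regex", m.group()))
            pos = m.end()
            continue
        for kind, pattern in (("identifier", IDENT), ("number", NUMBER)):
            m = pattern.match(src, pos)
            if m:
                tokens.append((kind, m.group()))
                pos = m.end()
                break
        else:
            tokens.append(("punctuator", ch))  # single-character only
            pos += 1
    return tokens


print(lex_js("a / b / c"))
print(lex_js("x = /ab+c/g"))
```

In the first call both slashes follow an identifier, so they are division punctuators; in the second, the slash follows `=`, so it starts the regex literal `/ab+c/g`.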

Lexer written in JavaScript? [closed]

Something like http://jscc.phorward-software.com/, maybe? JS/CC is the first available parser development system for JavaScript and ECMAScript derivatives. It has been developed both with the intention of building a productive compiler development system and with the intention of creating an easy-to-use academic environment for people interested in how parse table generation is done in general in bottom-up parsing. …

Where can I learn the basics of writing a lexer?

There are basically two main approaches to writing a lexer: writing one by hand, in which case I recommend this small tutorial, or using a lexer generator tool such as lex, in which case I recommend reading the tutorials for your particular tool of choice. I would also recommend the Kaleidoscope tutorial from the …
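For the hand-written route, the core is a loop that looks at the current character and decides which kind of token starts there. Below is a minimal sketch in Python, assuming a toy language of integers, identifiers, the operators `+ - * /` and parentheses. `lex_by_hand` is a hypothetical name, not taken from any of the tutorials mentioned.

```python
from typing import List, Tuple


def lex_by_hand(src: str) -> List[Tuple[str, str]]:
    tokens = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1  # skip whitespace between tokens
        elif ch.isdigit():
            start = i
            while i < len(src) and src[i].isdigit():
                i += 1  # consume the whole run of digits
            tokens.append(("NUMBER", src[start:i]))
        elif ch.isalpha() or ch == "_":
            start = i
            while i < len(src) and (src[i].isalnum() or src[i] == "_"):
                i += 1  # identifiers may contain digits after the first char
            tokens.append(("IDENT", src[start:i]))
        elif ch in "+-*/()":
            tokens.append(("OP", ch))
            i += 1
        else:
            raise SyntaxError(f"unexpected {ch!r} at {i}")
    return tokens


print(lex_by_hand("x1 + 42*(y)"))
```

A generator such as lex produces essentially this kind of loop for you, driven by a table built from regular expressions instead of hand-written branches.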

Looking for a clear definition of what a “tokenizer”, “parser” and “lexer” are and how they are related to each other and used?

A tokenizer breaks a stream of text into tokens, usually by looking for whitespace (tabs, spaces, newlines). A lexer is basically a tokenizer, but it usually attaches extra context to the tokens: this token is a number, that token is a string literal, this other token is an equality operator. A parser takes …
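The difference between the first two stages can be sketched in a few lines of Python. This is an illustration, not a definitive implementation: the token kinds (`NUMBER`, `STRING`, `EQ_OP`, `IDENT`, `UNKNOWN`) are hypothetical. Splitting on whitespace is kept deliberately naive to match the description, which is also why it would break a string literal containing a space.

```python
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Tokenizer: split on whitespace only; the pieces carry no type.
    return text.split()


def classify(token: str) -> str:
    # Lexer step: attach a (hypothetical) kind to each piece of text.
    if token.isdigit():
        return "NUMBER"
    if len(token) >= 2 and token[0] == token[-1] == '"':
        return "STRING"
    if token == "==":
        return "EQ_OP"
    if token.isidentifier():
        return "IDENT"
    return "UNKNOWN"


def lex(text: str) -> List[Tuple[str, str]]:
    return [(classify(t), t) for t in tokenize(text)]


print(tokenize('name == "bob"'))  # plain strings
print(lex('name == "bob"'))       # (kind, text) pairs a parser can use
```

A parser would then consume the `(kind, text)` pairs, checking, for example, that an `EQ_OP` sits between two operands.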

lexers vs parsers

What parsers and lexers have in common: they read symbols of some alphabet from their input. Hint: the alphabet doesn’t necessarily consist of letters, but it does have to consist of symbols that are atomic for the language understood by the parser/lexer. Symbols for the lexer: ASCII characters. Symbols for the parser: the particular tokens, …
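To show that the parser’s alphabet is tokens rather than characters, here is a minimal recursive-descent sketch in Python for the hypothetical grammar `NUMBER (("+" | "-") NUMBER)*`. The token format `(kind, text)` and the function name `parse_sum` are assumptions made for illustration. Note that the parser only ever compares token kinds; a multi-digit number such as "12" arrives as one atomic symbol.

```python
from typing import List, Tuple

Token = Tuple[str, str]


def parse_sum(tokens: List[Token]):
    """Parse NUMBER (("+" | "-") NUMBER)* into a left-associative tuple AST."""
    pos = 0

    def expect(kind: str) -> str:
        # Consume one token of the given kind and return its text.
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise SyntaxError(f"expected {kind} at token {pos}")
        text = tokens[pos][1]
        pos += 1
        return text

    tree = ("num", int(expect("NUMBER")))
    while pos < len(tokens):
        op = expect("OP")
        if op not in ("+", "-"):
            raise SyntaxError(f"unexpected operator {op!r}")
        tree = (op, tree, ("num", int(expect("NUMBER"))))
    return tree


toks = [("NUMBER", "12"), ("OP", "+"), ("NUMBER", "2"), ("OP", "-"), ("NUMBER", "3")]
print(parse_sum(toks))
```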