Python regex parse stream

I had the same problem. My first thought was to implement a LazyString class that acts like a string but only reads as much data from the stream as is currently needed (I did this by reimplementing __getitem__ and __iter__ to fetch and buffer characters up to the highest position accessed…).

This didn’t work out (I got a “TypeError: expected string or buffer” from re.match), so I looked a bit into the implementation of the re module in the standard library.
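To make the failure concrete, here is a minimal sketch of what such a wrapper might look like. The class body is my reconstruction, not the original code, and the helper name _fill is hypothetical. On Python 3 the message reads "expected string or bytes-like object" rather than "expected string or buffer", but the outcome is the same: re.match rejects anything that is not a real str or bytes-like object.

```python
import io
import re


class LazyString:
    """Hypothetical string-like wrapper that reads from a stream on demand."""

    def __init__(self, stream):
        self._stream = stream
        self._buffer = ""

    def _fill(self, upto):
        # Read until the buffer holds index `upto` or the stream is exhausted.
        while len(self._buffer) <= upto:
            chunk = self._stream.read(1)
            if not chunk:
                break
            self._buffer += chunk

    def __getitem__(self, index):
        # Only plain non-negative integer indices are handled in this sketch.
        self._fill(index)
        return self._buffer[index]

    def __iter__(self):
        i = 0
        while True:
            self._fill(i)
            if i >= len(self._buffer):
                return
            yield self._buffer[i]
            i += 1


lazy = LazyString(io.StringIO("hello world"))
first = lazy[0]  # indexing works and only reads one character

try:
    re.match(r"\w+", lazy)
    rejected = False
except TypeError:
    # The C matcher checks the argument type and never calls __getitem__.
    rejected = True
```

Indexing and iteration behave as intended, but the regex engine never gets as far as calling them.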

Unfortunately, running regexes directly on a stream does not seem to be possible. The core of the module is implemented in C, and this implementation expects the whole input to be in memory at once (I guess mainly for performance reasons). There seems to be no easy way around this.
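If the matches are known to be short, one workaround is to keep only a sliding window of the stream in memory and run the regex over that. This is not something I tried, just a sketch under stated assumptions: the function name iter_matches and the parameters chunk_size and max_match_len are made up for illustration, no single match may be longer than max_match_len characters, and the pattern must not depend on context before the match (anchors like ^, \b at the match start, or lookbehind can misbehave once the buffer is trimmed).

```python
import io
import re


def iter_matches(stream, pattern, chunk_size=4096, max_match_len=256):
    """Yield matched strings from a text stream without loading it whole.

    Assumption: no single match is longer than max_match_len characters.
    """
    regex = re.compile(pattern)
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        at_eof = not chunk
        buffer += chunk
        # A match ending in the last max_match_len characters might still
        # grow once more data arrives, so it is held back until later.
        safe_end = len(buffer) if at_eof else len(buffer) - max_match_len
        keep_from = max(safe_end, 0)
        for m in regex.finditer(buffer):
            if m.end() > safe_end:
                # Possibly incomplete: keep it in the buffer and retry.
                keep_from = m.start()
                break
            yield m.group(0)
            keep_from = max(keep_from, m.end())
        if at_eof:
            return
        # Drop everything that can no longer be part of a future match.
        buffer = buffer[keep_from:]


# Tiny chunk size to force matches to straddle chunk boundaries.
sample = io.StringIO("id=12 x id=345 y id=6")
found = list(iter_matches(sample, r"id=\d+", chunk_size=4, max_match_len=8))
print(found)
```

The memory used is bounded by roughly chunk_size plus max_match_len, at the price of rescanning a held-back match on the next round.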

I also had a look at PLY (Python Lex-Yacc), but its lexer uses re internally, so this wouldn't solve the issue.

One possibility is ANTLR, which has a Python backend. It generates the lexer as pure Python code and seems to be able to operate on input streams. Since the problem is not that important for me (I do not expect my input to be extremely large…), I will probably not investigate this further, but it might be worth a look.
