Why is bottom-up parsing more common than top-down parsing?

If you choose a powerful parser generator, you can write your grammar without worrying about its peculiar properties. (LA)LR means you don’t have to worry about left recursion: one less headache. GLR means you don’t have to worry about local ambiguity or limited lookahead. And bottom-up parsers tend to be pretty efficient. So, once you’ve paid …
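Here is a minimal sketch of why left recursion is harmless bottom-up. The hand-written shift-reduce parser below handles the hypothetical toy grammar E → E '+' NUM | NUM. It reduces E '+' NUM back to E on its stack. A naive recursive-descent parser for the same rule would instead recurse forever. This is not generated LALR tables, only the same shift/reduce idea on a toy grammar, with simple length checks standing in for the LR automaton’s states. `parse_sum` is a hypothetical name.

```python
def parse_sum(tokens):
    """Shift-reduce parser for the toy grammar  E -> E '+' NUM | NUM.

    tokens is a list of ints (NUM) and "+" strings. Returns a left-nested
    tree such as ("+", ("+", 1, 2), 3), or raises SyntaxError.
    """
    stack = []
    for tok in tokens:
        stack.append(tok)  # shift
        # Reduce E -> NUM: only a NUM at the very bottom of the stack becomes E.
        if len(stack) == 1 and isinstance(stack[0], int):
            stack[0] = ("E", stack[0])
        # Reduce E -> E '+' NUM: the left-recursive rule, folded on the stack.
        elif (len(stack) == 3
              and isinstance(stack[0], tuple)
              and stack[1] == "+"
              and isinstance(stack[2], int)):
            left, _, right = stack
            stack[:] = [("E", ("+", left[1], right))]
    # Accept only if the whole input reduced to a single E.
    if len(stack) == 1 and isinstance(stack[0], tuple):
        return stack[0][1]
    raise SyntaxError(f"cannot reduce input; stack is {stack!r}")
```

Each reduction folds the existing E into a new node, so the tree comes out left-associative. That is usually what a left-recursive rule is written to express.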

Practical difference between parser rules and lexer rules in ANTLR?

… what are the practical differences between these two statements in ANTLR … MY_RULE (uppercase) is a lexer rule: it is used to tokenize your input source and represents a fundamental building block of your language. my_rule (lowercase) is a parser rule: it is called from the parser and consists of zero or more other parser rules or tokens produced by the lexer. That’s the difference. …
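To make the split concrete outside ANTLR, here is a hypothetical Python sketch. It is not ANTLR’s API. `tokenize` plays the role of the uppercase lexer rules and turns characters into typed tokens. `assignment` plays the role of a lowercase parser rule that only ever sees those tokens. The token names and the one-rule grammar are assumptions for illustration.

```python
import re

# "Lexer rules": each token type is a regex over characters, playing the role
# of ANTLR's UPPERCASE rules (NUMBER, ID, ...).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("EQ", r"="),
    ("WS", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Turn raw characters into (type, text) tokens, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "WS":  # like `-> skip` on an ANTLR WS rule
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


# "Parser rule": assignment : ID EQ NUMBER ;  It never looks at characters,
# only at the token types the lexer produced, like a lowercase ANTLR rule.
def assignment(tokens):
    """Parse exactly one `name = number` statement into (name, value)."""
    types = [kind for kind, _ in tokens]
    if types != ["ID", "EQ", "NUMBER"]:
        raise SyntaxError(f"expected ID EQ NUMBER, got {types}")
    return tokens[0][1], int(tokens[2][1])
```

For example, `assignment(tokenize("x = 42"))` gives `("x", 42)`.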

Parsing YAML, return with line number

Here’s an improved version of puzzlet’s answer:

    import yaml
    from yaml.loader import SafeLoader

    class SafeLineLoader(SafeLoader):
        def construct_mapping(self, node, deep=False):
            mapping = super(SafeLineLoader, self).construct_mapping(node, deep=deep)
            # start_mark.line is 0-based; add 1 so line numbering starts at 1
            mapping['__line__'] = node.start_mark.line + 1
            return mapping

You can use it like this:

    data = yaml.load(whatever, Loader=SafeLineLoader)

Writing a parser from scratch in Haskell

It’s actually surprisingly easy to build Parsec from scratch. The real library code is heavily generalized and optimized, which contorts the core abstraction, but if you’re just building things from scratch to understand what’s going on, you can write it in just a few lines of code. I’ll build a slightly weaker Applicative parser …
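The answer’s code is Haskell. As a rough illustration of the same idea in Python, here is a minimal applicative parser sketch. A parser is a function from the input string to either None or (result, remaining input). `pure`, `fmap`, `ap` and `alt` mirror Haskell’s pure, <$>, <*> and <|>. This is neither the answer’s code nor Parsec’s API, and every name is hypothetical.

```python
from typing import Any, Callable, Optional, Tuple

# A parser takes the input and returns None on failure,
# or (result, remaining input) on success.
Parser = Callable[[str], Optional[Tuple[Any, str]]]


def satisfy(pred) -> Parser:
    """Consume one character if pred accepts it."""
    def run(s):
        if s and pred(s[0]):
            return s[0], s[1:]
        return None
    return run


def pure(value) -> Parser:
    """Succeed with value without consuming input (like Haskell's pure)."""
    return lambda s: (value, s)


def fmap(f, p: Parser) -> Parser:
    """Apply f to p's result (like Haskell's <$>)."""
    def run(s):
        r = p(s)
        return None if r is None else (f(r[0]), r[1])
    return run


def ap(pf: Parser, px: Parser) -> Parser:
    """Run pf, then px on the rest, and apply pf's function to px's value (like <*>)."""
    def run(s):
        r1 = pf(s)
        if r1 is None:
            return None
        f, rest = r1
        r2 = px(rest)
        if r2 is None:
            return None
        x, rest2 = r2
        return f(x), rest2
    return run


def alt(p1: Parser, p2: Parser) -> Parser:
    """Try p1; if it fails, run p2 on the same input (like <|>)."""
    def run(s):
        r = p1(s)
        return r if r is not None else p2(s)
    return run


def some(p: Parser) -> Parser:
    """One or more: some p = (:) <$> p <*> many p."""
    return lambda s: ap(fmap(lambda x: lambda xs: [x] + xs, p), many(p))(s)


def many(p: Parser) -> Parser:
    """Zero or more: many p = some p <|> pure []."""
    return lambda s: alt(some(p), pure([]))(s)


# Example: an unsigned decimal integer built only from the combinators above.
number = fmap(lambda digits: int("".join(digits)), some(satisfy(str.isdigit)))
```

`some` and `many` refer to each other, so each wraps its body in a lambda. The recursion then only unfolds when input is actually parsed.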

How do I parse an ISO 8601 date (with optional milliseconds) to a struct tm in C++?

New answer for an old question. Rationale: updated tools. Using this free, open-source library, one can parse into a std::chrono::time_point<system_clock, milliseconds>, which, unlike a tm, can hold millisecond precision. And if you really need to, you can continue on to the C API via system_clock::to_time_t (losing the milliseconds along …
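The answer uses a C++ library. A Python analogue of the same trade-off (an assumption, not the answer’s code) works like this: `datetime` keeps sub-second precision, while `time.struct_time`, like struct tm, has no field for it. This sketch handles only the `YYYY-MM-DDTHH:MM:SS` form with an optional fractional part and no time-zone suffix. `parse_iso8601` is a hypothetical name.

```python
from datetime import datetime


def parse_iso8601(s):
    """Parse 'YYYY-MM-DDTHH:MM:SS' with an optional '.fff' fraction.

    Assumed subset: no time-zone designator. %f accepts 1 to 6 digits,
    so '.789' is read as 789000 microseconds.
    """
    for fmt in ("%Y-%m-%dT%H:%M:%S.%f", "%Y-%m-%dT%H:%M:%S"):
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            continue  # try the next format (the fraction is optional)
    raise ValueError(f"unsupported timestamp: {s!r}")


dt = parse_iso8601("2024-03-05T12:34:56.789")
millis = dt.microsecond // 1000   # 789: the precision datetime keeps
tm = dt.timetuple()               # time.struct_time, like struct tm: no sub-second field
```

Converting to `struct_time` drops the milliseconds, much as `to_time_t` does on the C++ side.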

How to get all HTML after all scripts and page loading are done? (Puppeteer)

If you want the full HTML, the same markup you see with Inspect in the browser’s DevTools, here it is:

    const puppeteer = require('puppeteer');

    (async function main() {
      try {
        const browser = await puppeteer.launch();
        const [page] = await browser.pages();

        // Treat navigation as finished once there have been no network connections for 500 ms
        await page.goto('https://example.org/', { waitUntil: 'networkidle0' });

        // Serialize the root <html> element, including changes made by scripts
        const data = await page.evaluate(() => document.querySelector('*').outerHTML);
        console.log(data);

        await browser.close();
      } catch (err) {
        console.error(err);
      }
    })();

DateTime parsing

Consider using this line:

    DateTime.ParseExact(Log.Date, "MMM d HH:mm:ss", CultureInfo.InvariantCulture, DateTimeStyles.AllowWhiteSpaces);

Notice that I removed one of the spaces between the month and the day. That’s because AllowWhiteSpaces literally means: “Specifies that s may contain leading, inner, and trailing white spaces not defined by format.”
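For comparison, Python shows a similar behavior. This is an analogue, not .NET behavior. CPython’s `strptime` turns each run of whitespace in the format into a pattern matching one or more whitespace characters. A single space in the format therefore also accepts the double space of a padded day. `parse_log_date` and its `year` parameter are hypothetical. The year is needed because the stamp has none. English month abbreviations (the default C locale) are assumed, much as InvariantCulture is used above.

```python
from datetime import datetime


def parse_log_date(stamp, year=2024):
    """Parse a syslog-style 'MMM d HH:mm:ss' stamp such as 'Jan  5 12:34:56'.

    year is a hypothetical parameter: the stamp carries no year, and
    supplying one avoids strptime's default of 1900.
    """
    # CPython's strptime turns each whitespace run in the format into "one or
    # more whitespace characters", so the single spaces below also match the
    # padded "Jan  5". strip() handles leading/trailing blanks.
    return datetime.strptime(f"{year} {stamp.strip()}", "%Y %b %d %H:%M:%S")
```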

XPath to Parse “SRC” from IMG tag?

You are so close to answering this yourself that I am somewhat reluctant to answer it for you. However, the following XPath should provide what you want (provided the source is XHTML, of course):

    //img[@class="photo-large"]/@src

For further tips, check out W3Schools. They have excellent tutorials on such things and a great reference too.
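You may need this from Python’s standard library. Note that `xml.etree.ElementTree` implements only a subset of XPath and cannot select an attribute with a trailing `/@src`. This sketch therefore selects the elements and reads the attribute instead. It assumes well-formed markup without a default XHTML namespace. With `xmlns="http://www.w3.org/1999/xhtml"` the tag would have to be written as `{http://www.w3.org/1999/xhtml}img`. `large_photo_srcs` is a hypothetical name. Like the XPath above, `[@class='photo-large']` matches the attribute value exactly, so `class="photo-large featured"` would not match.

```python
import xml.etree.ElementTree as ET


def large_photo_srcs(markup):
    """Return the src of every <img class="photo-large"> in well-formed markup.

    ElementTree supports only a subset of XPath: predicates like [@class='x']
    work, but a trailing /@src does not, so the attribute is read with .get().
    """
    root = ET.fromstring(markup)
    return [img.get("src") for img in root.iterfind(".//img[@class='photo-large']")]


page = """<html><body>
  <img class="photo-large" src="big.jpg"/>
  <img class="thumb" src="small.jpg"/>
</body></html>"""
print(large_photo_srcs(page))  # ['big.jpg']
```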

Find a value in JSON using Python

You have to iterate over the list of dictionaries and search for the one with the given id_number. Once you find it, you can print the rest of its data and break, assuming id_number is unique.

    data = [
        {"id_number": "SA4784", "name": "Mark", "birthdate": None},
        {"id_number": "V410Z8", "name": "Vincent", "birthdate": "15/02/1989"},
        …
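Here is a minimal sketch of that loop. It uses the two records visible in the answer, loaded from JSON text so that `null` becomes `None`. `find_by_id` is a hypothetical name, and the rest of the original list is not reproduced.

```python
import json


def find_by_id(records, id_number):
    """Return the first record with a matching "id_number", or None."""
    for record in records:
        if record.get("id_number") == id_number:
            return record  # stop at the first match; ids assumed unique
    return None


# The two records visible in the answer, as JSON text (null loads as None).
data = json.loads("""[
  {"id_number": "SA4784", "name": "Mark", "birthdate": null},
  {"id_number": "V410Z8", "name": "Vincent", "birthdate": "15/02/1989"}
]""")

match = find_by_id(data, "V410Z8")
if match is not None:
    print(match["name"], match["birthdate"])  # Vincent 15/02/1989
```

Returning from inside the loop does the same job as `break`. Returning `None` makes the "not found" case explicit for the caller.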