parsing – Page 4 – Tarik Billa

Converting a hexadecimal string to a decimal integer

March 27, 2023 by Tarik

In most cases, you want to parse more than one hex byte at once; in those cases, use the hex crate. To parse a hex string into an integer, use from_str_radix, which is implemented on the integer types: fn main() { let z = i64::from_str_radix("1f", 16); println!("{:?}", z); } This prints Ok(31). If your strings actually have …
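If you'd rather not pull in the hex crate, from_str_radix also scales to several bytes. This is a minimal sketch: decode_hex is a hypothetical helper name, and returning None for odd-length or non-hex input is a choice made for this example, not a description of what the hex crate does.

```rust
/// Hypothetical helper: decode a string of hex digit pairs into bytes using
/// only the standard library (the hex crate provides this ready-made).
fn decode_hex(s: &str) -> Option<Vec<u8>> {
    let bytes = s.as_bytes();
    // Every output byte needs exactly two hex digits, and nothing else is
    // allowed (from_str_radix on its own would also accept a leading '+').
    if bytes.len() % 2 != 0 || !bytes.iter().all(u8::is_ascii_hexdigit) {
        return None;
    }
    bytes
        .chunks(2)
        .map(|pair| {
            // Every byte is an ASCII hex digit, so this conversion succeeds.
            let digits = std::str::from_utf8(pair).ok()?;
            u8::from_str_radix(digits, 16).ok()
        })
        // Collecting Option<u8> items gives None if any pair failed.
        .collect()
}

fn main() {
    // Single value, as in the answer above.
    println!("{:?}", i64::from_str_radix("1f", 16));
    // Several bytes at once.
    println!("{:?}", decode_hex("1f2a"));
    // Odd number of digits is rejected.
    println!("{:?}", decode_hex("1f2"));
}
```

This prints Ok(31), Some([31, 42]) and None.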

Can this language be described by a non-ambiguous BNF grammar?

March 21, 2023 by Tarik

It’s not always easy (or even possible) to demonstrate that a grammar is ambiguous, but if there is a short ambiguous sentence, it can be found by brute-force enumeration, which I believe is what that tool does. The output is revealing: the shortest ambiguous sentence is the empty string. So it remains only …
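Here is a rough sketch of that kind of brute-force search. It rests on assumptions of my own: nonterminals are single uppercase letters, and the search explores leftmost derivations up to a fixed number of steps. Each parse tree corresponds to exactly one leftmost derivation, so any sentence reached by two different leftmost derivations is a witness of ambiguity. Because of the bound, finding nothing proves nothing. Both example grammars are hypothetical; the second one shows the empty string being ambiguous.

```rust
use std::collections::BTreeMap;

// Hypothetical grammar encoding: uppercase ASCII letters are nonterminals,
// every other char is a terminal, and "" is an epsilon production.
struct Grammar {
    start: char,
    rules: Vec<(char, &'static str)>,
}

/// Count leftmost derivations (at most `max_steps` rewrites long) of every
/// terminal string. A count of 2 or more proves ambiguity; longer
/// witnesses beyond the bound can be missed.
fn count_derivations(g: &Grammar, max_steps: usize) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    let mut stack = vec![(g.start.to_string(), 0)]; // (sentential form, steps used)
    while let Some((form, steps)) = stack.pop() {
        let leftmost = form.char_indices().find(|(_, c)| c.is_ascii_uppercase());
        match leftmost {
            // No nonterminals left: one complete derivation of this sentence.
            None => *counts.entry(form).or_insert(0) += 1,
            Some((i, nt)) if steps < max_steps => {
                for &(lhs, rhs) in &g.rules {
                    if lhs == nt {
                        // Nonterminals are one ASCII byte, so i + 1 is a char boundary.
                        let next = format!("{}{}{}", &form[..i], rhs, &form[i + 1..]);
                        stack.push((next, steps + 1));
                    }
                }
            }
            Some(_) => {} // step budget exhausted: abandon this branch
        }
    }
    counts
}

/// Shortest sentence with two or more derivations, if the bounded search finds one.
fn shortest_ambiguous(g: &Grammar, max_steps: usize) -> Option<String> {
    count_derivations(g, max_steps)
        .into_iter()
        .filter(|&(_, n)| n >= 2)
        .map(|(s, _)| s)
        .min_by_key(|s| s.len())
}

fn main() {
    // E -> E+E | n
    let expr = Grammar { start: 'E', rules: vec![('E', "E+E"), ('E', "n")] };
    println!("{:?}", shortest_ambiguous(&expr, 5));
    // S -> A | B, A -> ε, B -> ε
    let eps = Grammar { start: 'S', rules: vec![('S', "A"), ('S', "B"), ('A', ""), ('B', "")] };
    println!("{:?}", shortest_ambiguous(&eps, 4));
}
```

For E -> E+E | n the shortest witness is "n+n+n" (two groupings); for the second grammar it is the empty string.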

How do I know if PDF pages are color or black-and-white?

March 11, 2023 by Tarik

This is one of the most interesting questions I’ve seen! I agree with some of the other posts that rendering to a bitmap and then analyzing the bitmap will be the most reliable solution. For simple PDFs, here’s a faster but less complete approach: parse each PDF page, then look for color directives (g, rg, k, …
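As an illustration of the operator scan, here is a simplified sketch. It assumes the page's content stream has already been decompressed to text, and it only inspects the DeviceRGB (rg/RG) and DeviceCMYK (k/K) operators. RGB counts as gray when r = g = b, and CMYK counts as gray when cyan, magenta and yellow are all zero. Images, shadings and the sc/scn operators are not examined, so a false result is not proof of a black-and-white page.

```rust
/// Rough color check over an already-decompressed PDF content stream.
/// Simplification (assumption): only the device color operators are
/// inspected; images, shadings, patterns and sc/scn/cs are ignored.
fn uses_color(content: &str) -> bool {
    let mut operands: Vec<f64> = Vec::new();
    for token in content.split_whitespace() {
        // Numbers are operands; they accumulate until the next operator.
        if let Ok(n) = token.parse::<f64>() {
            operands.push(n);
            continue;
        }
        let colorful = match (token, operands.as_slice()) {
            // DeviceRGB (fill rg, stroke RG): gray only when r == g == b.
            ("rg" | "RG", [r, g, b]) => !(r == g && g == b),
            // DeviceCMYK (fill k, stroke K): gray only when c, m, y are all 0.
            ("k" | "K", [c, m, y, _]) => *c != 0.0 || *m != 0.0 || *y != 0.0,
            // g/G set a gray level; other operators are irrelevant here.
            _ => false,
        };
        if colorful {
            return true;
        }
        operands.clear(); // the operator consumed its operands
    }
    false
}

fn main() {
    println!("{}", uses_color("1 0 0 rg 0 0 100 100 re f"));
    println!("{}", uses_color("0.5 0.5 0.5 rg 0 g 0 0 0 1 k 0 0 100 100 re f"));
}
```

The first stream sets a pure red fill and prints true; the second uses only gray levels and prints false.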

How to identify whether a grammar is LL(1), LR(0) or SLR(1)?

February 8, 2023 by Tarik

To check whether a grammar is LL(1), one option is to construct the LL(1) parsing table and check it for conflicts. These conflicts can be FIRST/FIRST conflicts, where two different productions would have to be predicted for the same nonterminal/terminal pair, or FIRST/FOLLOW conflicts, where two different productions are predicted, one representing that some production should be …
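The FIRST/FIRST half of that check can be sketched with a fixpoint computation of FIRST sets. The grammar encoding (uppercase letters as nonterminals, "" for ε) and the helper names are hypothetical. FIRST/FOLLOW conflicts are not detected here, since that would also need FOLLOW sets.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical grammar encoding: uppercase ASCII = nonterminal,
// any other char = terminal, "" = epsilon production.
type Rules = Vec<(char, &'static str)>;

/// FIRST set of a symbol string, plus whether the whole string can derive ε.
fn first_of(
    rhs: &str,
    first: &BTreeMap<char, BTreeSet<char>>,
    nullable: &BTreeSet<char>,
) -> (BTreeSet<char>, bool) {
    let mut out = BTreeSet::new();
    for sym in rhs.chars() {
        if !sym.is_ascii_uppercase() {
            out.insert(sym); // a terminal ends the scan
            return (out, false);
        }
        if let Some(f) = first.get(&sym) {
            out.extend(f.iter().copied());
        }
        if !nullable.contains(&sym) {
            return (out, false); // this nonterminal cannot vanish
        }
    }
    (out, true) // every symbol (possibly none) can derive ε
}

/// Iterate to a fixpoint: FIRST sets and the nullable set only ever grow.
fn first_sets(rules: &Rules) -> (BTreeMap<char, BTreeSet<char>>, BTreeSet<char>) {
    let mut first: BTreeMap<char, BTreeSet<char>> = BTreeMap::new();
    let mut nullable = BTreeSet::new();
    let mut changed = true;
    while changed {
        changed = false;
        for &(lhs, rhs) in rules {
            let (set, rhs_nullable) = first_of(rhs, &first, &nullable);
            let entry = first.entry(lhs).or_default();
            let before = entry.len();
            entry.extend(set);
            changed |= entry.len() != before;
            if rhs_nullable {
                changed |= nullable.insert(lhs);
            }
        }
    }
    (first, nullable)
}

/// Alternatives of the same nonterminal whose FIRST sets overlap: one token
/// of lookahead cannot choose between them (a FIRST/FIRST conflict).
fn first_first_conflicts(rules: &Rules) -> Vec<(char, &'static str, &'static str)> {
    let (first, nullable) = first_sets(rules);
    let mut conflicts = Vec::new();
    for (i, &(a, r1)) in rules.iter().enumerate() {
        for &(b, r2) in &rules[i + 1..] {
            if a != b {
                continue;
            }
            let (f1, _) = first_of(r1, &first, &nullable);
            let (f2, _) = first_of(r2, &first, &nullable);
            if !f1.is_disjoint(&f2) {
                conflicts.push((a, r1, r2));
            }
        }
    }
    conflicts
}

fn main() {
    // Left recursion like this produces a FIRST/FIRST conflict on 'n'.
    let left_recursive: Rules = vec![('E', "E+T"), ('E', "T"), ('T', "n")];
    println!("{:?}", first_first_conflicts(&left_recursive));
}
```

The left-recursive grammar reports one conflict, ('E', "E+T", "T"). A right-recursive rewrite such as E -> nX, X -> +nX | ε reports none.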

What is parsing in terms that a new programmer would understand? [closed]

January 26, 2023 by Tarik

I’d explain parsing as the process of turning some kind of data into another kind of data. In practice, for me this is almost always turning a string, or binary data, into a data structure inside my program. For example, turning “:Nick!User@Host PRIVMSG #channel :Hello!” into (C) struct irc_line { char *nick; char *user; char …
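To make the idea concrete, here is a rough Rust counterpart of that example. The original struct is cut off, so only nick and user come from it. The remaining fields, and the assumption that every line looks like ":nick!user@host COMMAND target :message", are mine. Real IRC messages take more forms than this handles.

```rust
/// Hypothetical Rust version of the irc_line struct; fields after
/// `nick` and `user` are assumptions.
#[derive(Debug, PartialEq)]
struct IrcLine<'a> {
    nick: &'a str,
    user: &'a str,
    host: &'a str,
    command: &'a str,
    target: &'a str,
    message: &'a str,
}

/// Parse ":nick!user@host COMMAND target :message" into an IrcLine that
/// borrows from the input; None for anything that doesn't match that shape.
fn parse_irc_line(line: &str) -> Option<IrcLine<'_>> {
    let rest = line.strip_prefix(':')?; // the prefix starts with ':'
    let (prefix, rest) = rest.split_once(' ')?;
    let (nick, userhost) = prefix.split_once('!')?;
    let (user, host) = userhost.split_once('@')?;
    let (command, rest) = rest.split_once(' ')?;
    let (target, message) = rest.split_once(" :")?; // trailing parameter
    Some(IrcLine { nick, user, host, command, target, message })
}

fn main() {
    println!("{:?}", parse_irc_line(":Nick!User@Host PRIVMSG #channel :Hello!"));
}
```

The string goes in, and a structured value with named fields comes out. That transformation is the parsing.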

Writing a parser like Flex/Bison that is usable on 8-bit embedded systems

January 24, 2023 by Tarik

If you want an easy way to code parsers, or you are tight on space, you should hand-code a recursive descent parser; these are essentially LL(1) parsers. This is especially effective for languages that are as “simple” as Basic (I did several of these back in the 70s!). The good news is these don’t contain …
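Here is a small sketch of such a hand-coded parser. It uses a hypothetical arithmetic-expression grammar rather than full Basic. Each grammar rule gets one function, a single byte of lookahead decides the next step, and there is no heap allocation, which matters when space is tight. Overflow, division by zero and malformed input all simply make eval return None.

```rust
/// Hand-coded recursive descent for a hypothetical expression grammar,
/// one method per rule:
///   expr   := term   (('+' | '-') term)*
///   term   := factor (('*' | '/') factor)*
///   factor := digits | '(' expr ')'
/// It reads the input bytes in place and does no heap allocation.
struct Parser<'a> {
    input: &'a [u8],
    pos: usize,
}

impl Parser<'_> {
    /// One byte of lookahead, which is all an LL(1) parser needs.
    fn peek(&self) -> Option<u8> {
        self.input.get(self.pos).copied()
    }

    fn expr(&mut self) -> Option<i32> {
        let mut value = self.term()?;
        while let Some(op @ (b'+' | b'-')) = self.peek() {
            self.pos += 1;
            let rhs = self.term()?;
            value = if op == b'+' { value.checked_add(rhs)? } else { value.checked_sub(rhs)? };
        }
        Some(value)
    }

    fn term(&mut self) -> Option<i32> {
        let mut value = self.factor()?;
        while let Some(op @ (b'*' | b'/')) = self.peek() {
            self.pos += 1;
            let rhs = self.factor()?;
            // checked_div also returns None on division by zero.
            value = if op == b'*' { value.checked_mul(rhs)? } else { value.checked_div(rhs)? };
        }
        Some(value)
    }

    fn factor(&mut self) -> Option<i32> {
        if self.peek() == Some(b'(') {
            self.pos += 1;
            let value = self.expr()?;
            if self.peek() != Some(b')') {
                return None; // unbalanced parenthesis
            }
            self.pos += 1;
            return Some(value);
        }
        let start = self.pos;
        let mut value: i32 = 0;
        while let Some(d @ b'0'..=b'9') = self.peek() {
            value = value.checked_mul(10)?.checked_add(i32::from(d - b'0'))?;
            self.pos += 1;
        }
        if self.pos > start { Some(value) } else { None } // need at least one digit
    }
}

/// Evaluate a complete expression (no whitespace support); None on any error.
fn eval(src: &str) -> Option<i32> {
    let mut p = Parser { input: src.as_bytes(), pos: 0 };
    let value = p.expr()?;
    if p.pos == src.len() { Some(value) } else { None } // reject trailing input
}

fn main() {
    println!("{:?}", eval("2+3*(4-1)"));
}
```

The while loops make + and - (and * and /) left-associative, so "10-4-3" evaluates to 3, not 9.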

Difference between an LL and Recursive Descent parser?

January 20, 2023 by Tarik

LL is usually a more efficient parsing technique than recursive descent. In fact, a naive (backtracking) recursive-descent parser can take O(k^n) time in the worst case, where n is the input size and k > 1 is a constant. Some techniques, such as memoization (which yields a Packrat parser), can improve this as well as extend the class of grammars accepted by the parser, …
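Both the blow-up and the memoization fix show up in a toy example. This sketch uses a hypothetical two-rule PEG where backtracking forces the same T to be parsed twice. When memo is present, each (rule, position) pair is evaluated at most once, which is the Packrat idea. The calls counter only exists to compare the two modes.

```rust
use std::collections::HashMap;

/// Hypothetical PEG (ordered choice, first match wins):
///   S := T '+' S / T
///   T := '(' S ')' / 'n'
/// Each rule returns the position just after its match, or None on failure.
struct Peg<'a> {
    input: &'a [u8],
    memo: Option<HashMap<(char, usize), Option<usize>>>, // None = naive backtracking
    calls: usize, // how many rule bodies actually ran
}

impl Peg<'_> {
    fn rule(&mut self, name: char, pos: usize) -> Option<usize> {
        if let Some(memo) = &self.memo {
            if let Some(&cached) = memo.get(&(name, pos)) {
                return cached; // packrat: reuse the earlier result, success or failure
            }
        }
        self.calls += 1;
        let result = if name == 'S' { self.s(pos) } else { self.t(pos) };
        if let Some(memo) = &mut self.memo {
            memo.insert((name, pos), result);
        }
        result
    }

    fn eat(&self, pos: usize, byte: u8) -> Option<usize> {
        (self.input.get(pos) == Some(&byte)).then(|| pos + 1)
    }

    fn s(&mut self, pos: usize) -> Option<usize> {
        // First alternative: T '+' S.
        if let Some(p) = self.rule('T', pos) {
            if let Some(p) = self.eat(p, b'+') {
                if let Some(p) = self.rule('S', p) {
                    return Some(p);
                }
            }
        }
        // Backtrack to `pos` and try T alone, re-parsing the same T.
        self.rule('T', pos)
    }

    fn t(&mut self, pos: usize) -> Option<usize> {
        if let Some(p) = self.eat(pos, b'(') {
            if let Some(p) = self.rule('S', p) {
                if let Some(p) = self.eat(p, b')') {
                    return Some(p);
                }
            }
        }
        self.eat(pos, b'n')
    }
}

/// Parse all of `src`; returns (accepted?, number of rule evaluations).
fn parse(src: &str, packrat: bool) -> (bool, usize) {
    let mut p = Peg { input: src.as_bytes(), memo: packrat.then(HashMap::new), calls: 0 };
    let accepted = p.rule('S', 0) == Some(src.len());
    (accepted, p.calls)
}

fn main() {
    // Ten levels of nested parentheses around a single n.
    let src = format!("{}n{}", "(".repeat(10), ")".repeat(10));
    println!("naive:   {:?}", parse(&src, false));
    println!("packrat: {:?}", parse(&src, true));
}
```

With ten levels of nesting, the naive mode runs 6141 rule bodies and the memoized mode runs 22. Each extra level roughly doubles the first number and adds 2 to the second.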

What is the difference between an Abstract Syntax Tree and a Concrete Syntax Tree?

December 27, 2022 by Tarik

A concrete syntax tree represents the source text exactly in parsed form. In general, it conforms to the context-free grammar defining the source language. However, the concrete grammar and tree have a lot of things that are necessary to make source text unambiguously parseable, but do not contribute to actual meaning. For example, to implement …
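Here is a toy illustration of the distinction for a hypothetical grammar with numbers, + and parentheses. The CST stores every token, so it can reproduce the source. The AST drops the parentheses and the exact spelling of the numbers. The type and function names are mine and do not come from any particular library.

```rust
/// Hypothetical grammar: expr := NUM | expr '+' expr | '(' expr ')'
/// The CST keeps every token as written, including parentheses and the
/// original spelling of numbers.
enum Cst {
    Num(String),
    Paren { open: char, inner: Box<Cst>, close: char },
    Add { lhs: Box<Cst>, plus: char, rhs: Box<Cst> },
}

/// The AST keeps only what contributes to meaning.
#[derive(Debug, PartialEq)]
enum Ast {
    Num(i64),
    Add(Box<Ast>, Box<Ast>),
}

/// Because the CST retains every token, it can reproduce the source text
/// (this sketch does not store whitespace, so spacing would be lost).
fn to_source(cst: &Cst) -> String {
    match cst {
        Cst::Num(text) => text.clone(),
        Cst::Paren { open, inner, close } => format!("{}{}{}", open, to_source(inner), close),
        Cst::Add { lhs, plus, rhs } => format!("{}{}{}", to_source(lhs), plus, to_source(rhs)),
    }
}

/// Lowering drops the syntax-only parts: grouping is already encoded in
/// the tree's shape, and "02" and "2" denote the same number.
fn to_ast(cst: &Cst) -> Ast {
    match cst {
        Cst::Num(text) => Ast::Num(text.parse().expect("CST holds only valid digits")),
        Cst::Paren { inner, .. } => to_ast(inner),
        Cst::Add { lhs, rhs, .. } => Ast::Add(Box::new(to_ast(lhs)), Box::new(to_ast(rhs))),
    }
}

/// CST for the source text "(1+02)".
fn example() -> Cst {
    Cst::Paren {
        open: '(',
        inner: Box::new(Cst::Add {
            lhs: Box::new(Cst::Num("1".to_string())),
            plus: '+',
            rhs: Box::new(Cst::Num("02".to_string())),
        }),
        close: ')',
    }
}

fn main() {
    let cst = example();
    println!("{}", to_source(&cst));
    println!("{:?}", to_ast(&cst));
}
```

This prints (1+02) for the CST and Add(Num(1), Num(2)) for the AST.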

Why doesn’t Haskell’s Prelude.read return a Maybe?

December 25, 2022 by Tarik

Edit: As of GHC 7.6, readMaybe is available in the Text.Read module in the base package, along with readEither: http://hackage.haskell.org/packages/archive/base/latest/doc/html/Text-Read.html#v:readMaybe Great question! The type of read itself isn’t changing anytime soon because that would break lots of things. However, there should be a maybeRead function. Why isn’t there? The answer is “inertia”. There was a …

Difference between constituency parser and dependency parser

December 18, 2022 by Tarik

A constituency parse tree breaks a text into sub-phrases. Non-terminals in the tree are types of phrases, the terminals are the words in the sentence, and the edges are unlabeled. For a simple sentence “John sees Bill”, a constituency parse would have Sentence at the root with two children: a Noun Phrase, whose only child is John, and a Verb Phrase, which branches again into two children …
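The two kinds of parse can be put side by side as plain data. This sketch assumes the usual analysis, in which the verb phrase splits into the verb sees and the noun phrase Bill. The dependency labels nsubj and dobj are Stanford-style names, also an assumption. What matters is the shape: unlabeled phrase nodes in one structure, labeled word-to-word arcs in the other.

```rust
/// Constituency tree: internal nodes are phrase types, leaves are words,
/// and the edges carry no labels.
enum Tree {
    Phrase(&'static str, Vec<Tree>),
    Word(&'static str),
}

/// Dependency arc: a labeled, directed edge from a head word to a dependent.
struct DepArc {
    head: &'static str,
    label: &'static str,
    dependent: &'static str,
}

/// Constituency parse of "John sees Bill" (S = Sentence, NP = Noun Phrase,
/// VP = Verb Phrase, V = Verb); the split of the VP is assumed.
fn constituency() -> Tree {
    use Tree::{Phrase, Word};
    Phrase("S", vec![
        Phrase("NP", vec![Word("John")]),
        Phrase("VP", vec![Phrase("V", vec![Word("sees")]), Phrase("NP", vec![Word("Bill")])]),
    ])
}

/// Dependency parse of the same sentence; Stanford-style labels (assumed):
/// nsubj = nominal subject, dobj = direct object.
fn dependency() -> Vec<DepArc> {
    vec![
        DepArc { head: "sees", label: "nsubj", dependent: "John" },
        DepArc { head: "sees", label: "dobj", dependent: "Bill" },
    ]
}

/// Render a constituency tree in bracket notation.
fn bracket(t: &Tree) -> String {
    match t {
        Tree::Word(w) => w.to_string(),
        Tree::Phrase(name, kids) => {
            let inner: Vec<String> = kids.iter().map(bracket).collect();
            format!("({} {})", name, inner.join(" "))
        }
    }
}

fn main() {
    println!("{}", bracket(&constituency()));
    for arc in dependency() {
        println!("{}({}, {})", arc.label, arc.head, arc.dependent);
    }
}
```

The constituency side prints (S (NP John) (VP (V sees) (NP Bill))). The dependency side has no phrase nodes at all, only nsubj(sees, John) and dobj(sees, Bill).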