The looseness of HTML can be accommodated by inferring the missing open and close tags as you go. This is essentially what a cleanup tool like HTML Tidy does.
You’ll keep a stack of the current context (perhaps implicitly, as the path from the root of your tree to the current node). For example, {<html>, <body>} means you’re currently in the body of the HTML document. When you encounter a new tag, you compare that element’s requirements to what’s currently on the stack.
Suppose your stack is currently just {<html>}. You encounter a <p> tag. You look up <p> in a table that tells you a paragraph must be inside the <body>. Since you’re not in the body, you implicitly push <body> onto your stack (or add a body node to your tree). Then you can put the <p> into the tree.
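A minimal Python sketch of that implicit-open step, assuming a hypothetical REQUIRED_PARENT table that maps each tag to the single element it must sit inside. open_element is likewise a made-up name, and the table covers only the paragraph example:

```python
# Hypothetical rule table: the element each tag must be nested inside.
# Real HTML has many more rules; this covers only the example above.
REQUIRED_PARENT = {
    "head": "html",
    "body": "html",
    "p": "body",
}


def open_element(stack, tag):
    """Push `tag`, first pushing any ancestors the rules require.

    Returns the implied (not written in the source) tags that were
    pushed, outermost first.
    """
    implied = []
    parent = REQUIRED_PARENT.get(tag)
    # Walk up the chain of required ancestors until one is already open.
    while parent is not None and parent not in stack:
        implied.append(parent)
        parent = REQUIRED_PARENT.get(parent)
    implied.reverse()  # push the outermost missing ancestor first
    stack.extend(implied)
    stack.append(tag)
    return implied


stack = ["html"]
print(open_element(stack, "p"))  # ['body']
print(stack)                     # ['html', 'body', 'p']
```

Starting from an empty stack, the same call would imply both <html> and <body> before the <p>.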
Now suppose you see another <p>. Your rules tell you that a paragraph cannot be nested inside another paragraph, so you pop the current <p> off the stack (as though you had seen its close tag) before pushing the new paragraph onto the stack.
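The implicit-close step can be sketched the same way. CLOSED_BY and close_implied are hypothetical names, and the table models only the paragraph-in-paragraph rule (in real HTML, many other start tags also close an open <p>):

```python
# Hypothetical rule table: for each open element, the start tags that
# implicitly close it. Only the <p>-inside-<p> rule is modeled here.
CLOSED_BY = {
    "p": {"p"},
}


def close_implied(stack, new_tag):
    """Pop elements that cannot contain `new_tag`, as if their close
    tags had been seen. Returns the popped tags, innermost first."""
    closed = []
    while stack and new_tag in CLOSED_BY.get(stack[-1], ()):
        closed.append(stack.pop())
    return closed


stack = ["html", "body", "p"]
closed = close_implied(stack, "p")  # behaves as if </p> had been seen
stack.append("p")                   # now push the new paragraph
print(closed)  # ['p']
print(stack)   # ['html', 'body', 'p']
```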
At the end of your document, you pop each remaining element off your stack, as though you had seen a close tag for each one.
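Putting the steps together, including the final pass that closes everything still open, might look like the toy below. It assumes the input has already been split into hypothetical ("start", tag), ("end", tag) and ("text", s) tuples; normalize and render are made-up names, the rule tables are tiny, text is passed through wherever it appears, and a close tag with no matching open element is simply dropped:

```python
# Hypothetical, deliberately tiny rule tables: the parent each tag needs,
# and which start tags implicitly close an open element.
REQUIRED_PARENT = {"head": "html", "body": "html", "p": "body"}
CLOSED_BY = {"p": {"p"}}


def normalize(tokens):
    """Turn ("start", tag) / ("end", tag) / ("text", s) tokens into a
    balanced token list, inserting the implied open and close tags."""
    stack, out = [], []

    def push(tag):
        # Implicit close: pop open elements that this start tag closes.
        while stack and tag in CLOSED_BY.get(stack[-1], ()):
            out.append(("end", stack.pop()))
        # Implicit open: add missing required ancestors, outermost first.
        chain, parent = [], REQUIRED_PARENT.get(tag)
        while parent is not None and parent not in stack:
            chain.append(parent)
            parent = REQUIRED_PARENT.get(parent)
        for ancestor in reversed(chain):
            stack.append(ancestor)
            out.append(("start", ancestor))
        stack.append(tag)
        out.append(("start", tag))

    for kind, value in tokens:
        if kind == "start":
            push(value)
        elif kind == "end":
            if value in stack:
                # Close everything opened inside `value`, then `value` itself.
                while True:
                    tag = stack.pop()
                    out.append(("end", tag))
                    if tag == value:
                        break
            # An end tag with no matching open element is dropped.
        else:
            out.append((kind, value))  # text passes through unchanged

    # End of document: close whatever is still open, innermost first.
    while stack:
        out.append(("end", stack.pop()))
    return out


def render(tokens):
    """Serialize a token list back into markup."""
    parts = []
    for kind, value in tokens:
        if kind == "start":
            parts.append(f"<{value}>")
        elif kind == "end":
            parts.append(f"</{value}>")
        else:
            parts.append(value)
    return "".join(parts)


tokens = [("start", "p"), ("text", "one"), ("start", "p"), ("text", "two")]
print(render(normalize(tokens)))
# <html><body><p>one</p><p>two</p></body></html>
```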
The trick is to find a good way to represent the context requirements for each element.
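One possible representation (an assumption, not the only option) is a small content model per element: the tags it may directly contain, the parent to imply when it shows up out of place, and whether its end tag may be omitted. A single table like that can drive both the implicit closes and the implicit opens. Rule, RULES, required_ancestors and actions_for are hypothetical names, and the model is drastically reduced compared with the real HTML rules:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    parent: Optional[str]       # element to imply when this tag is out of place
    children: frozenset         # tags this element may directly contain
    end_optional: bool = False  # may be closed implicitly by a later tag


# Hypothetical, drastically reduced content model (not the HTML spec's).
RULES = {
    "html": Rule(None, frozenset({"head", "body"})),
    "head": Rule("html", frozenset({"title"}), end_optional=True),
    "title": Rule("head", frozenset()),
    "body": Rule("html", frozenset({"p", "ul"}), end_optional=True),
    "ul": Rule("body", frozenset({"li"})),
    "li": Rule("ul", frozenset({"p"}), end_optional=True),
    "p": Rule("body", frozenset(), end_optional=True),
}


def required_ancestors(tag):
    """Chain of implied parents for `tag`, innermost first."""
    chain, parent = [], RULES[tag].parent
    while parent is not None:
        chain.append(parent)
        parent = RULES[parent].parent
    return chain


def actions_for(stack, tag):
    """Update `stack` for a start tag and return the steps taken,
    as ("close", t) and ("open", t) pairs in order."""
    actions = []
    needed = required_ancestors(tag)
    # Implicitly close elements that cannot hold `tag`, have an optional
    # end tag, and are not themselves ancestors that `tag` requires.
    while stack:
        top = stack[-1]
        rule = RULES[top]
        if top in needed or tag in rule.children or not rule.end_optional:
            break
        actions.append(("close", stack.pop()))
    # Implicitly open the missing required ancestors (stops at the root).
    missing, t = [], tag
    while not (stack and t in RULES[stack[-1]].children) and RULES[t].parent is not None:
        t = RULES[t].parent
        missing.append(t)
    for ancestor in reversed(missing):
        stack.append(ancestor)
        actions.append(("open", ancestor))
    stack.append(tag)
    actions.append(("open", tag))
    return actions


print(actions_for(["html", "head"], "p"))
# [('close', 'head'), ('open', 'body'), ('open', 'p')]
print(actions_for(["html", "body"], "li"))
# [('open', 'ul'), ('open', 'li')]
```

The `needed` check keeps a start tag from closing an element it depends on, which is why an <li> arriving directly in <body> opens a <ul> instead of closing the body.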