The easiest way is to use a text-mode browser's dump option, which prints the rendered page as plain text (in short, the text version of the viewable HTML).
Remote file:
lynx --dump www.google.com > file.txt
links -dump www.google.com
Local file:
lynx --dump ./1.html > file.txt
links -dump ./1.htm
With charset conversion to UTF-8:
lynx -dump -display_charset UTF-8 ./1.htm
links -dump -codepage UTF-8 ./1.htm
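If you don't know which of the two browsers is installed, the commands above can be wrapped in a small function. This is only a sketch: `html_to_text` is a name made up for this example, and it assumes that a `links` binary on your system accepts `-codepage` as shown above (a different browser installed under that name may not).

```shell
#!/bin/sh
# html_to_text: hypothetical helper, not part of lynx or links.
# Prints the text rendering of a local HTML file or a URL as UTF-8,
# using whichever of the two browsers is found first on PATH.
html_to_text() {
    src=$1
    if command -v lynx >/dev/null 2>&1; then
        lynx -dump -display_charset UTF-8 "$src"
    elif command -v links >/dev/null 2>&1; then
        # Assumes this binary understands -codepage, as in the answer above.
        links -dump -codepage UTF-8 "$src"
    else
        echo "html_to_text: neither lynx nor links is installed" >&2
        return 127
    fi
}
```

Usage would look like `html_to_text ./1.htm > file.txt`.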