Convert (render) HTML to Text with correct line-breaks

The code below works correctly with the example provided and even handles some odd markup like <div><br></div>. There are still some things to improve, but the basic idea is there. See the comments. public static string FormatLineBreaks(string html) { //first – remove all the existing '\n' from HTML //they mean nothing in HTML, but break our … Read more

HTML agility pack – removing unwanted tags without removing content?

I wrote an algorithm based on Oded’s suggestions. Here it is. Works like a charm. It removes all tags except strong, em, u and raw text nodes. internal static string RemoveUnwantedTags(string data) { if (string.IsNullOrEmpty(data)) return string.Empty; var document = new HtmlDocument(); document.LoadHtml(data); var acceptableTags = new String[] { "strong", "em", "u" }; var nodes = new … Read more

HtmlAgilityPack and HtmlDecode

The Html Agility Pack is equipped with a utility class called HtmlEntity. It has a static method with the following signature: /// <summary> /// Replace known entities by characters. /// </summary> /// <param name="text">The source text.</param> /// <returns>The result text.</returns> public static string DeEntitize(string text) It supports well-known entities (like &nbsp;) and encoded characters such … Read more

HTML Agility pack – parsing tables

How about something like: Using HTML Agility Pack HtmlDocument doc = new HtmlDocument(); doc.LoadHtml(@"<html><body><p><table id=""foo""><tr><th>hello</th></tr><tr><td>world</td></tr></table></body></html>"); foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table")) { Console.WriteLine("Found: " + table.Id); foreach (HtmlNode row in table.SelectNodes("tr")) { Console.WriteLine("row"); foreach (HtmlNode cell in row.SelectNodes("th|td")) { Console.WriteLine("cell: " + cell.InnerText); } } } Note that you can make it prettier with LINQ-to-Objects if … Read more

Html Agility Pack get all elements by class

(Updated 2018-03-17) The problem: As you’ve spotted, String.Contains does not perform a word-boundary check, so Contains("float") will return true for both "foo float bar" (correct) and "unfloating" (which is incorrect). The solution is to ensure that "float" (or whatever your desired class-name is) appears alongside a word-boundary at both ends. A … Read more
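
The excerpt above is cut off before it shows its own fix, so what follows is only a minimal sketch of one way to get the word-boundary behaviour it describes: split the class attribute value on whitespace and compare whole tokens. The names CssClassMatcher and HasClass are hypothetical and are not part of Html Agility Pack. The sketch uses only the .NET base class library and assumes a case-sensitive comparison.

```csharp
using System;

public static class CssClassMatcher
{
    // Hypothetical helper (not an Html Agility Pack API).
    // Returns true only when className is a whole whitespace-separated
    // token inside classAttribute, so "float" does not match "unfloating".
    public static bool HasClass(string classAttribute, string className)
    {
        if (string.IsNullOrWhiteSpace(classAttribute) || string.IsNullOrWhiteSpace(className))
            return false;

        // Whitespace characters that can separate class names in an attribute value.
        char[] separators = { ' ', '\t', '\r', '\n', '\f' };
        string[] tokens = classAttribute.Split(separators, StringSplitOptions.RemoveEmptyEntries);

        foreach (string token in tokens)
        {
            // Ordinal comparison: case-sensitive, culture-independent (an assumption).
            if (string.Equals(token, className, StringComparison.Ordinal))
                return true;
        }
        return false;
    }
}
```

With the strings from the excerpt, HasClass("foo float bar", "float") is true and HasClass("unfloating", "float") is false. A helper like this could be applied to the class attribute value of each node when filtering elements, whereas a plain substring test would accept both inputs.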
