25. October 2011 05:05 by JP Hellemons
I recently stumbled upon a small bug in a piece of C# code that cleans up an HTML string coming from a database. This string is output on the web and therefore needs to be W3C-valid and tidy!
I have always used Tidy.Net for this. I really liked it, and while doing some code maintenance I decided to check for a new version of the library. Its latest release dates from June 2005! That’s over 6 years old!
So I decided to look for a better solution and found the TidyManaged project from June 2010. I wasn’t immediately convinced that I should migrate to this library, so my next step was a showdown between the two.
I fired up Visual Studio 2010 and started a new console application, because ‘the numbers tell the tale’. I used the Stopwatch class, which is awesome! I downloaded the HTML source of a website and passed it to both libraries.
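static void Main(string[] args)
{
    WebClient wc = new WebClient();
    string testInput = wc.DownloadString("http://www.jphellemons.nl");
    Stopwatch sw = Stopwatch.StartNew();
    // ... run Tidy.Net on testInput here ...
    sw.Stop();
    Console.WriteLine("Tidy.Net lib from 2005 took: " + sw.ElapsedTicks);
    sw.Restart(); // reset to zero before timing the second library
    // ... run TidyManaged on testInput here ...
    sw.Stop();
    Console.WriteLine("TidyManaged lib from 2010 took: " + sw.ElapsedTicks);
    Console.ReadKey(); // to keep console open
}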
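A single Stopwatch reading can be skewed by JIT compilation and by the first load of a DLL, so a comparison like this one may be a bit fairer when each library is run several times. Below is a minimal sketch of that idea, not the code from my sample project: the names TimingSketch and AverageMilliseconds are made up for illustration, and the string workload is only a placeholder for a call into either tidy library.

```csharp
using System;
using System.Diagnostics;

static class TimingSketch
{
    // Hypothetical helper: runs `action` once as a warm-up, then `runs`
    // more times, and returns the average time per run in milliseconds.
    public static double AverageMilliseconds(Action action, int runs)
    {
        if (runs <= 0) throw new ArgumentOutOfRangeException("runs");

        action(); // warm-up: JIT compilation and first-use DLL loading

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
        {
            action();
        }
        sw.Stop();

        // Elapsed is a TimeSpan, so the result does not depend on
        // Stopwatch.Frequency the way raw ElapsedTicks does.
        return sw.Elapsed.TotalMilliseconds / runs;
    }

    static void Main()
    {
        // Placeholder workload; swap in a call to either tidy library.
        string input = new string('x', 100000);
        double ms = AverageMilliseconds(() => input.ToUpperInvariant(), 20);
        Console.WriteLine("Average per run: " + ms + " ms");
    }
}
```

Passing the work as an Action keeps the timing loop identical for both libraries, so the only thing that differs between the two measurements is the library call itself.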
So this managed-code wrapper for the unmanaged tidy project’s DLL is always a lot faster! I have the tidy DLL placed in “C:\Windows\system”.
That DLL is 323 KB and the managed wrapper DLL is 25 KB (together 348 KB). Tidy.Net, which is an older .NET port of the 323 KB DLL, is 188 KB. So that is smaller, but it is an older library.
If you look at the output of both libraries, you will see that Tidy.Net produces smaller HTML files than TidyManaged. But TidyManaged takes inline CSS styles and combines them in the head of your document.
I will attach my sample project, so that you can test the difference yourself.