Microsoft Advertising for Linux?

"Disraeli was pretty close: actually, there are Lies, Damn lies, Statistics, Benchmarks, and Delivery dates." (from fortune)

The recent tests done at ZDLabs turned up some interesting results. They were, of course, presented under the banner of "Windows NT is faster than Linux," which was, strictly speaking, true. Now, it doesn't really matter what the testing software was, or what the testing hardware was. I don't really care, for the moment at least, how honest the test was; I expect it was at least somewhat honest, since some Red Hat people were on the scene. What I'm interested in is how we interpret the results.

Now, there is the face value of the results: that Windows NT is faster than Linux, thus better, and hence that in any given situation NT is the better choice. There's also the option, of course, of actually looking at what the tests found. What are some of the actual facts the tests came up with? Here are some important ones that I found (pretty color graphs aside):

Linux looks pretty slow, doesn't it? Who would use it for any real application? Well, let's examine this situation a bit more closely than just comparatively. First off, let's look at an approximation of the situation these numbers represent:

So Linux/Apache should be able to handle your site on a 4-CPU, 1 GB RAM box if you get 159 million hits per day or less. If you get only a measly 113 million hits/day, then a single-CPU box with 256 MB of RAM should be able to host your site. Of course, this only works if your access is 100% even, which is extremely unrealistic. Let's assume that your busy times get ten times more hits per second than your average. That means a single-CPU Linux box with 256 MB of RAM should work for you if you get about 11 million hits every day. Heck, let's be more conservative: say your busy times get 100 times more hits per second than your average. Then if you get 1.1 million hits per day or less, that same box will serve your site just fine.
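The arithmetic above is easy to check. Here is a quick sketch of it; the per-second rates (about 1,314 requests/sec for the single-CPU box and about 1,840 requests/sec for the quad box) are my assumption, inferred from the daily totals quoted above:

```python
# Back-of-the-envelope check of the hits-per-day figures in the text.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def daily_capacity(hits_per_second, peak_factor=1):
    """Daily hit capacity if busy periods run peak_factor times the average rate."""
    return hits_per_second * SECONDS_PER_DAY // peak_factor

single_cpu = 1314  # req/s; assumed from the ~113 million hits/day figure
quad_cpu = 1840    # req/s; assumed from the ~159 million hits/day figure

print(daily_capacity(quad_cpu))         # 158,976,000 -> ~159 million hits/day
print(daily_capacity(single_cpu))       # 113,529,600 -> ~113 million hits/day
print(daily_capacity(single_cpu, 10))   # ~11 million hits/day with 10x peaks
print(daily_capacity(single_cpu, 100))  # ~1.1 million hits/day with 100x peaks
```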

OK, there's that way of looking at it, but it's not really a good way. It's a very coarse approximation of access patterns and what a site needs. Let's try another way of looking at this.

Let's do some simple calculations to see what sort of bandwidth these numbers imply. Bandwidth is a better and more constant way of determining to whom these numbers apply than guessed-at hit ratios.

The ZDNet page said that the files served were of "varying sizes", so we'll have to make some assumptions about the average size of the files being served. Since over 1000 files were served per second in all of the tests, it's pretty safe to work by averages. Some numbers:

Just for reference: a T1 line is worth approximately 1.5 Mbit/s, these numbers don't include TCP/IP and HTTP overhead, and this document is approximately 12K.

Now, what does this tell us? Well, that if you are serving up 1,314 pages per second where the average page is only 1 kilobyte, you'll need about 7 T1 lines or the equivalent before the computer becomes the limiting factor. What site on earth is going to get a sustained >1,000 hits per second for 1-kilobyte files? Certainly not one with any graphics in it. Let's assume you're running a site with graphics and that your average file is 5 kilobytes - not too conservative or too liberal. This means that if you're serving up 1,314 of them a second, you'll need 53 Mbit of bandwidth. And there are no peak issues here: you can't peak out past your bandwidth.
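For the skeptical, the bandwidth figures work out like this (a sketch ignoring TCP/IP and HTTP overhead, as in the text, and taking 1 KB = 1024 bytes):

```python
# Bandwidth implied by serving 1,314 files/sec at a given average file size.
T1_MBITS = 1.5  # one T1 is roughly 1.5 Mbit/s

def required_mbits(files_per_second, avg_file_kb):
    """Raw bandwidth in Mbit/s, overhead ignored, 1 KB = 1024 bytes."""
    return files_per_second * avg_file_kb * 1024 * 8 / 1e6

print(required_mbits(1314, 1))             # ~10.8 Mbit/s for 1 KB files
print(required_mbits(1314, 1) / T1_MBITS)  # ~7 T1 lines' worth
print(required_mbits(1314, 5))             # ~53.8 Mbit/s for 5 KB files
```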

Let's go at it another way, this time starting with our available bandwidth:

note: these numbers don't include TCP/IP or HTTP overhead.

I am assuming that the tests ZD made were meant to mean something, so I won't entertain the idea that they used an average file size of less than 1K. Given that, it is clear that the numbers ZD's tests produced are only significant when you have the equivalent bandwidth of more than 6 T1 lines. Let's be clear about this: if you have only 5 T1 lines or less, a single-CPU Linux box with 256 MB RAM will wait on your internet connection and never get to serve at its full potential. Let me reemphasize this: ZD's tests prove that a single-CPU Linux box with 256 MB RAM running Apache will run faster than your internet connection! Put another way, if your site runs on 5 T1 lines or less, a single-CPU Linux box with 256 MB RAM will more than fulfill your needs, with CPU cycles left over.
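To see the crossover from the other direction, here is how many 1 KB files per second a handful of T1 lines can actually carry, compared with the roughly 1,314 files/sec the single-CPU box served in the benchmark (line rate rounded to 1.5 Mbit/s, overhead ignored):

```python
# Files/sec that n T1 lines can push, vs. the ~1,314 files/sec the box served.
def t1_capacity_files_per_sec(n_t1_lines, avg_file_kb=1):
    """Files per second n T1 lines (~1.5 Mbit/s each) can carry, overhead ignored."""
    return n_t1_lines * 1.5e6 / (avg_file_kb * 1024 * 8)

print(t1_capacity_files_per_sec(5))  # ~915 files/s: five T1s fill up before the box does
print(t1_capacity_files_per_sec(5) < 1314)  # True
```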

That was just if the ZD numbers were valid for files of only 1K in size. Let's assume that you either (a) have pages with more than about a screen of text or (b) have black and white pictures, making your average file size 5K, and that ZD's tests accurately reflect this condition. Given this, ZD's tests would indicate that a single-CPU Linux box with only 256 MB RAM running Apache would be constantly waiting on your T3 line. In other words, a single-CPU Linux box with 256 MB RAM will serve your needs with room to grow if your site is served by a T3 line or less.

One might also conclude that if you serve things like color pictures (other than small buttons and doodads), and thus your average file size is 25K, a single-CPU Linux box with 256 MB RAM will serve your site just fine even if you are served by an OC3 line that you have all to yourself. I personally wouldn't bet that ZD's tests used such large average file sizes, though. It was a benchmark, after all.
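Both of those claims reduce to the same comparison: at 1,314 files/sec, does the implied bandwidth exceed the line rate? A sketch, using round line rates of 45 Mbit/s for a T3 and 155 Mbit/s for an OC3:

```python
# Thresholds for the two scenarios above: 5 KB average files vs. a T3,
# and 25 KB average files vs. an OC3.
T3_MBITS, OC3_MBITS = 45, 155

def required_mbits(files_per_second, avg_file_kb):
    """Raw bandwidth in Mbit/s, overhead ignored, 1 KB = 1024 bytes."""
    return files_per_second * avg_file_kb * 1024 * 8 / 1e6

print(required_mbits(1314, 5) > T3_MBITS)    # True: 5 KB files saturate a T3 first
print(required_mbits(1314, 25) > OC3_MBITS)  # True: 25 KB files saturate even an OC3
```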

So far, I have been addressing only internet-based web serving. I'm not really going to address intranets, but let me ask you this: is it a good sign if your intranet is getting a sustained 1,300 hits per second? Assuming that each page view accounts for 10 hits (the associated pictures, etc.) and that no employee views more than 1 page every five seconds, that means that over any given five-second interval, 650 of your employees have viewed a page on your intranet webserver. If this really is sustained behavior, why are 650 of your employees looking at your intranet webserver at any given five-second point in time? Aren't they supposed to be doing something more productive than surfing your intranet?
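The 650 figure follows directly from those assumptions:

```python
# Intranet sanity check, using the assumptions stated in the text.
hits_per_second = 1300
hits_per_page_view = 10   # the page plus its images, etc.
interval_seconds = 5      # no employee views more than 1 page per 5 seconds

views_per_interval = hits_per_second * interval_seconds / hits_per_page_view
print(views_per_interval)  # 650.0 -> 650 employees browsing in any 5-second window
```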

Now, I'm not saying that this is never going to happen, but if you have enough employees to generate that sort of hit count, are they really all going to be on the same LAN, all hitting the same intranet webserver?

So far I haven't mentioned Windows NT at all. I mean this paper to be a piece of Linux advocacy and also a bit of realism. Basic math skills can often be helpful in cutting through hype, as I think has been shown so far. But I can't resist one last piece of information that puts these tests in a rather different light than damning to Linux and Apache.

If the ZD numbers about NT's performance are to be believed, and I see no reason to disbelieve them, the NT server that ZD tested should be able to serve 359.9 million hits per day. According to Microsoft, Support Online gets approximately 2.3 million page views a day. Even supposing that each of those page views generates 100 hits, Microsoft Support Online would still get only 230 million hits per day, well under what the tested NT server can do. Theoretically, just one NT server like the one ZD tested should be able to handle this load. Microsoft uses 6.
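The comparison in numbers (the 100 hits/view multiplier is, as above, a deliberately generous assumption):

```python
# NT's benchmarked capacity vs. Microsoft Support Online's actual load.
nt_hits_per_day = 359_900_000   # implied by ZD's numbers for the tested server
page_views_per_day = 2_300_000  # Support Online's reported daily page views
hits_per_view = 100             # deliberately generous assumption

support_hits_per_day = page_views_per_day * hits_per_view
print(support_hits_per_day)                    # 230,000,000 hits/day
print(support_hits_per_day < nt_hits_per_day)  # True: one such server would suffice
```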

What's that I hear someone scream? That Microsoft Support Online involves dynamic content? Well, the ZD test was only about static content. I'm so glad to know that it was relevant to the real world. Aren't you?

Appendix 1

Since I wrote this article, I received the following email message:
Here is a useful number that may help your now-famous web page out:

The mean file size in ZDNet's WebBench (you can download the benchmark
from their site, if you wish) is 10342.3 bytes, or 10kb.

In other words, the "lesser" Linux box can handle a bandwidth of 107 Mbps,
and the quad box can handle a bandwidth of 150 Mbps.

- Sam (
added @ 3:38 EST jun 28, 1999.
Christopher Lansdown