Manageability

The Ultimate Java Versus C# Benchmark

2003-05-20 07:33:53

The problems with Cameron Purdy's benchmark are:

  1. It's not a "Real Life" benchmark.
  2. It's overly simplistic.
  3. It emphasizes numerical computation over symbolic manipulation. 
  4. It doesn't accentuate the performance advantage of Java over C#.

So, without further ado, I present the "Ultimate Java versus C# Benchmark". What program could be more "Real Life" than a program about the building blocks of Life itself (i.e. DNA)? This program matches a DNA sequence against a million DNA patterns expressed as regular expressions. It is definitely a more complex problem than computing the average salary of many people, and more symbolic (after all, if you want to do computation, do it in Fortran). Best of all, you'll love the results.

The C# code:

using System.Text.RegularExpressions;
using System.Text;
using System;

namespace TestRegex
{
public class RegexBench2
{
  private static String _doc =
  "CGAATCTAAAAATAGATTCGGACGTGATGTAGTCGTACAAATGAAAAAGTAAGCC";
  private static int ITERATIONS = 1000000;

  public static void Main()
  {
    long start = System.DateTime.Now.Ticks / 10000;
    long end;
    int length = 1;
    for( int i = ITERATIONS; i <= ITERATIONS * 2; i++ )
    {
      length = (int) (Math.Log((double)i)/Math.Log(4));

      String matchthis = generateWord(i, length + 1);
      Regex regexpr = new Regex(matchthis, RegexOptions.Compiled);
      Boolean b = regexpr.IsMatch(_doc);
      if( b )
      {
        end = System.DateTime.Now.Ticks / 10000;
        Console.WriteLine("found {0} at {1} it took {2} miliseconds",
          matchthis, i, end - start );
      }
    }
    end = System.DateTime.Now.Ticks / 10000;
    Console.WriteLine(".NET regex took {0} miliseconds",end - start);
  }

  public static String generateWord(int value, int length )
  {
    StringBuilder buf = new StringBuilder();
    int current = value;
    for(int i = 0; i < length; i++ )
    {
      int v = current % 4;
      current = current / 4;
      buf.Append( convert(v) );
    }
    return buf.ToString();
  }

  private static String convert(int value)
  {
    switch(value)
    {
      case 0: return "A";
      case 1: return "G";
      case 2: return "T";
      case 3: return "C";
      default:
        return "0";
    }
  }
}
}

The Java code:

import java.text.*;
import java.util.regex.*;

public class RegexBench2
{
  static String _doc = "CGAATCTAAAAATAGATTCGGACGTGATGTAGTCGTACAAATGAAAAAGTAAGCC";
  static int ITERATIONS = 1000000;

  public static void main(String args[]) {
    long start = System.currentTimeMillis();
    int length = 1;
    for( int i = ITERATIONS; i <= ITERATIONS * 2; i++ ) {
      length = (int) (Math.log((double)i)/Math.log(4));
      String matchthis = generateWord(i, length + 1);
      Pattern regexpr = Pattern.compile(matchthis); Matcher matcher = regexpr.matcher(_doc);
      boolean b = matcher.find();
      if(b){
        long end = System.currentTimeMillis();
        System.out.println( MessageFormat.format("found {0} at {1} it took {2} miliseconds",
            new Object[] {matchthis, "" + i, "" + (end-start) } )); }
    }
    long end = System.currentTimeMillis();
    System.out.println("Java regex took " + (end - start) + " miliseconds");
  }

  static String generateWord(int value, int length ) {
    StringBuffer buf = new StringBuffer(); int current = value;
    for(int i = 0; i < length; i++ ) {
      int v = current % 4; current = current / 4; buf.append( convert(v) );
    }
    return buf.toString();
  }

  static String convert(int value) {
    switch(value) {
      case 0: return "A"; case 1: return "G"; case 2: return "T"; case 3: return "C";  default: return "0"; }
  }
}

The code can generally be applied in several real applications. You could build a spam filter, an RSS categorization engine or even a rule-based message broker. Just to make sure nobody is cheating, matches must be found at 1000000 and 2000000. Also, don't even think of replacing the regex match with a string search; that's tantamount to dumbing down the requirements.
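To see where the two required matches come from, here is a minimal sketch that lifts the pattern generator out of the benchmark. It assumes the same digit mapping as convert() above (0=A, 1=G, 2=T, 3=C); the class name PatternEncodingSketch and the helper wordFor are made up for illustration and are not part of the original listing. Each loop index is written out in base 4, least significant digit first, and the digits are spelled as DNA letters.

```java
// Sketch only: reproduces the post's pattern generator to show why
// matches are expected at i = 1000000 and i = 2000000.
final class PatternEncodingSketch {
    // Same digit-to-base mapping as convert() in the post.
    static String convert(int value) {
        switch (value) {
            case 0: return "A";
            case 1: return "G";
            case 2: return "T";
            case 3: return "C";
            default: return "0";
        }
    }

    // Emits the base-4 digits of value, least significant digit first.
    static String generateWord(int value, int length) {
        StringBuilder buf = new StringBuilder();
        int current = value;
        for (int i = 0; i < length; i++) {
            buf.append(convert(current % 4));
            current /= 4;
        }
        return buf.toString();
    }

    // Hypothetical helper: the pattern the benchmark loop builds for index i.
    static String wordFor(int i) {
        int length = (int) (Math.log((double) i) / Math.log(4));
        return generateWord(i, length + 1);
    }

    public static void main(String[] args) {
        System.out.println(wordFor(1000000)); // prints AAAGTAAGCC
        System.out.println(wordFor(2000000)); // prints AAATAGATTCG
    }
}
```

AAAGTAAGCC is the last ten letters of _doc, and AAATAGATTCG occurs near its start, so both endpoints of the loop have to report a match.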

Here are the results for the Java run:

found AAAGTAAGCC at 1000000 it took 0 miliseconds
found AAATGAAAAAG at 1048960 it took 578 miliseconds
found GAAAAAGTAAG at 1085441 it took 922 miliseconds
found TCTAAAAATAG at 1179694 it took 1844 miliseconds
found ACGTGATGTAG at 1204636 it took 2094 miliseconds
found AATAGATTCGG at 1548576 it took 5328 miliseconds
found TCGTACAAATG at 1576094 it took 5578 miliseconds
found CGGACGTGATG at 1599255 it took 5813 miliseconds
found ATTCGGACGTG at 1689064 it took 6657 miliseconds
found AGATTCGGACG at 1859204 it took 8235 miliseconds
found TGATGTAGTCG at 1984902 it took 9423 miliseconds
found AAATAGATTCG at 2000000 it took 9563 miliseconds
Java regex took 9563 miliseconds

If you run main() inside a loop using the HotSpot server VM, the time drops to 6516 miliseconds (oops! spelled that wrong again!). Also, the memory footprint required for Java is slightly over 7 MB.

The C# results? Well, they're just too embarrassing to post. Suffice it to say, it's several orders of magnitude slower! Furthermore, the memory usage exploded to who knows where! I can't believe that people may be thinking of deploying mission-critical applications on this virtual machine!

Just like Cameron's benchmark, everything is self-contained: you can cut and paste it, compile it and run it yourself. Better yet, send it to your resident C# expert and have him tweak it as much as he can. Bring a digital camera; capturing the anguish on his face when he realizes the truth would be priceless.

(BTW, did anyone notice? The Java version actually took fewer lines of code than the C# version, hmmm?)

[update] Cameron speculates about the numbers and guesses that Java may be 100x faster. The real reason I didn't post the numbers was that I couldn't get them in time for the post. Well, here they are:

found AAAGTAAGCC at 1000000 it took 31 miliseconds
found AAATGAAAAAG at 1048960 it took 4470117 miliseconds

Unhandled Exception: OutOfMemoryException.

In summary, Java is at least 7,733x faster than C#. Sorry, Cameron! Better yet, it completes the task and doesn't gag on memory!
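The post does not say how the 7,733x figure was computed. A plausible reading, offered here only as an assumption, is that it is the ratio of the two elapsed times at the second match (i = 1048960), the last point the C# run reached before running out of memory. The class and constant names below are made up for illustration.

```java
// Sketch only: one way to arrive at the 7,733x figure, assuming it is
// the ratio of the elapsed times at the second match in both runs.
final class SpeedupSketch {
    // Elapsed milliseconds at i = 1048960, copied from the two result listings.
    static final long CSHARP_MS = 4470117L;
    static final long JAVA_MS = 578L;

    // Integer division truncates the fractional part of the ratio.
    static long speedup() {
        return CSHARP_MS / JAVA_MS;
    }

    public static void main(String[] args) {
        System.out.println(speedup() + "x"); // prints 7733x
    }
}
```

The "at least" follows from the C# run never completing, so the ratio over the full run is unknown.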


Last modified 2003-08-17 07:36 AM

C# code

Posted by Anonymous User at 2003-10-23 04:41 AM

I am sorry to say it, but your C# code wouldn't even compile, let alone execute. For your information, if you tried to compile the C# code, the compiler would complain about the switch/case statement, because each case in C# has to finish with a break statement.

correct implementation for .NET

Posted by Anonymous User at 2003-11-04 11:14 AM

... is:

Boolean b = Regex.IsMatch(_doc,matchthis,RegexOptions.None );

So, it's a static function instead of an object (object creation is expensive!). Now it's comparable to Java. But it's still slow (5 times slower). When you use IndexOf(), you are 2 times faster than the Java regexp.

So, you got another reason why Java is better than .NET: Better regexp-implementation!!! :-)

Crappy benchmark

Posted by Anonymous User at 2004-01-11 11:39 AM

As you pointed out in your pathetic 101-list (you linked to MSDN, where one is told that regexes should not be compiled when many different regexes are used), .NET currently cannot unload appdomains.

Your example uses several thousand different regexes with compilation. On every iteration .NET has to generate the code for the individual regex, then compile it, load it, JIT it and execute it. No wonder that memory usage goes sky-high and it is slow as hell.

A fair comparison either uses no compilation OR lets the Java version execute javac on every iteration (actually .NET has the poor behaviour of creating a DLL in the temp folder and loading that assembly).

just getting started

Posted by Anonymous User at 2005-06-07 12:31 PM

>> (BTW, did anyone notice? The Java version actually took less lines of code than the C# version, hmmm?)

Just got shown this today. The first dishonesty in the comparison is the fact that when you format the code the same way in both languages, the C# is 8 lines longer, with that dramatic increase coming from the extra usings and the namespace block statement. That's huge, isn't it? I'm still working on the performance details, but I'm sure I will find similar trickery.

Some more results ....

Posted by Anonymous User at 2005-07-07 07:27 AM

Couldn't resist conducting my own little test. Here is what I got ;)

  • Java 1.5: 9467 ms
  • .NET 1.1: finds the first one, then crashes with the OutOfMemoryException
  • .NET 2.0 beta 2: 31 minutes! (hey, at least it completes the task)
  • Mono 1.1.8: 1½ minutes!

LOL! Mono's implementation is faster than Microsoft's own!

Incorrect use of regex

Posted by Anonymous User at 2006-04-09 03:50 AM

As already pointed out, it is no good to run the C# with the regex option RegexOptions.Compiled on. When run with no options, the C# version finished in 8890 ms. Additionally, I replaced the switch in convert with a string[] { "A", "G", "T", "C" }, so the values can be accessed directly by index instead of going through a switch.

The Java version finished in 5031 ms. (standard J2SE 5.0) With the switch replaced with an array, it finishes in 4391ms.

So what does that tell me?

  1. The above C# code is poorly written.
  2. Java is about twice as fast as C#.

So, go java! =D
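The commenter's modified code is not shown, so the following is only a guess at what the array-based convert might look like on the Java side. The class name ArrayConvertSketch and the constant BASES are made up for illustration. The lookup relies on the argument always being in the range 0 to 3, which holds in the benchmark because it is always current % 4 for a non-negative current.

```java
// Sketch only: convert() as an array lookup instead of a switch.
final class ArrayConvertSketch {
    // Index i holds the base for digit i: 0=A, 1=G, 2=T, 3=C.
    private static final String[] BASES = {"A", "G", "T", "C"};

    // Assumes 0 <= value <= 3; anything else throws ArrayIndexOutOfBoundsException.
    static String convert(int value) {
        return BASES[value];
    }

    // Same loop as generateWord() in the post, using the lookup above.
    static String generateWord(int value, int length) {
        StringBuilder buf = new StringBuilder();
        int current = value;
        for (int i = 0; i < length; i++) {
            buf.append(convert(current % 4));
            current /= 4;
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(generateWord(1000000, 10)); // prints AAAGTAAGCC
    }
}
```

Unlike the original switch, this version has no default branch that returns "0", so an out-of-range digit fails loudly rather than producing a pattern that can never match.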

Perhaps im missing the point...

Posted by Anonymous User at 2006-04-23 05:23 PM

However, if you are doing a straight-up text sequence match (which is what the example code is doing), then you would use the .IndexOf method of the string object.

Of course I'm probably missing the point... but you used the term Real World up there...

In Main, make the following changes:

Comment out: Regex regexpr = new Regex(matchthis.ToString(), RegexOptions.Compiled);

Comment out: Boolean b = regexpr.IsMatch(_doc);

Replace: if( b ) with: if ( _doc.IndexOf( matchthis.ToString()) > -1 )

When rerun with no other changes, the final line reads approximately ".NET regex took 1687 miliseconds".

I would be interested in seeing the equivalent Java change plus its performance, with the understanding that Java's equivalent of .IndexOf is probably implemented significantly differently than .NET's.

The .NET version was 1.1.
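For readers curious about the Java side of that change, here is a minimal sketch that swaps the regex match for String.indexOf. It departs from the original post's rule against string search, and the class and helper names (IndexOfSketch, wordFor, foundByIndexOf, foundByRegex) are made up for illustration. Because the generated patterns contain only the letters A, G, T and C, with no regex metacharacters, both approaches should report the same matches; no timing is claimed here, since it depends on the machine and JVM.

```java
import java.util.regex.Pattern;

// Sketch only: the benchmark loop with String.indexOf in place of the regex.
final class IndexOfSketch {
    static final String DOC = "CGAATCTAAAAATAGATTCGGACGTGATGTAGTCGTACAAATGAAAAAGTAAGCC";

    // Base-4 digits of value, least significant first, mapped 0=A, 1=G, 2=T, 3=C.
    static String generateWord(int value, int length) {
        StringBuilder buf = new StringBuilder();
        int current = value;
        for (int i = 0; i < length; i++) {
            buf.append("AGTC".charAt(current % 4));
            current /= 4;
        }
        return buf.toString();
    }

    // The pattern the benchmark loop builds for index i.
    static String wordFor(int i) {
        int length = (int) (Math.log((double) i) / Math.log(4));
        return generateWord(i, length + 1);
    }

    // Plain substring search, as the commenter suggests for .NET.
    static boolean foundByIndexOf(int i) {
        return DOC.indexOf(wordFor(i)) > -1;
    }

    // The original approach, kept for comparison.
    static boolean foundByRegex(int i) {
        return Pattern.compile(wordFor(i)).matcher(DOC).find();
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        int hits = 0;
        for (int i = 1000000; i <= 2000000; i++) {
            if (foundByIndexOf(i)) {
                hits++;
            }
        }
        long end = System.currentTimeMillis();
        System.out.println(hits + " matches, Java indexOf took " + (end - start) + " milliseconds");
    }
}
```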

Confusing C# versus Java benchmark - an intention or ignorance?

Posted by Anonymous User at 2006-05-11 05:15 AM

The C# help says that the RegexOptions.Compiled option specifies that the regular expression is compiled into an assembly. This yields faster execution but increases startup time. In this confusing example the compiler was called 1000000 times. Certainly, that took some time.

After changing the code to: Regex regexpr = new Regex(matchthis); the C# speed is between 33% and 40% of Java's for this peculiar example (on my 2.4 GHz Pentium).

OK, C# regular expressions are slower when 1000000 different strings are searched for in the same text. I do not think this is a very frequent situation. When I parsed files, I always searched for a few string patterns in many texts. Therefore, the benchmark is artificial and misleading.
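The usage pattern this comment describes, a few patterns applied to many texts, can be sketched in Java as follows. The pattern is compiled once and reused, so the compilation cost is paid a single time no matter how many texts are scanned. The class name, the sample pattern and the sample texts are all made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Sketch only: compile a pattern once, then match it against many texts.
final class ReusePatternSketch {
    // Counts the texts in which the precompiled pattern occurs at least once.
    static int countMatching(Pattern pattern, List<String> texts) {
        int hits = 0;
        for (String text : texts) {
            if (pattern.matcher(text).find()) {
                hits++;
            }
        }
        return hits;
    }

    // Hypothetical demo data: the benchmark's sequence plus two invented texts.
    static int demoCount() {
        Pattern pattern = Pattern.compile("GATTCG"); // compiled exactly once
        List<String> texts = Arrays.asList(
            "CGAATCTAAAAATAGATTCGGACGTGATGTAGTCGTACAAATGAAAAAGTAAGCC",
            "AAAA",
            "TTGATTCGTT");
        return countMatching(pattern, texts);
    }

    public static void main(String[] args) {
        System.out.println(demoCount()); // prints 2
    }
}
```

The benchmark in the post does the opposite: it builds a new pattern on every iteration and uses each one exactly once, which is the case where per-pattern compilation cost dominates.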

 
