Differentia Javaica

I started a new project called differentia-javaica. Here is the quote from the original description:

The aim of this project is to compare two Java source files and check whether they are equal. It is not a simple diff. It uses ANTLR to construct two Abstract Syntax Trees for Java types and then compares these trees. As a consequence, whitespace and comments do not affect the comparison. Reordering of elements in the source code is treated as a difference, though.

This kind of comparison is especially helpful when writing unit tests for Java source code generators. Given the expected source code, we can check whether it is equal to the generated source code.
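To make the idea of a whitespace- and comment-insensitive comparison concrete, here is a minimal sketch. It is not how differentia-javaica works: instead of ANTLR parse trees it compares flat token lists produced by the JDK's `java.io.StreamTokenizer`, which is a much cruder technique. The class `TokenCompare` and its methods are hypothetical names chosen for this illustration only.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of differentia-javaica.
class TokenCompare {

    // Splits source text into tokens, dropping whitespace and comments.
    static List<String> tokens(String source) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(source));
        st.resetSyntax();                 // start with every character "ordinary"
        st.wordChars('a', 'z');
        st.wordChars('A', 'Z');
        st.wordChars('0', '9');           // digits are plain word characters here
        st.wordChars('_', '_');
        st.wordChars('$', '$');
        st.whitespaceChars(0, ' ');       // control characters and space
        st.quoteChar('"');
        st.quoteChar('\'');
        st.slashSlashComments(true);      // skip // comments
        st.slashStarComments(true);       // skip /* */ comments
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else if (st.ttype == '"' || st.ttype == '\'') {
                    // keep the quote character so "a" and a stay distinct
                    out.add((char) st.ttype + st.sval + (char) st.ttype);
                } else {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    // Returns the index of the first differing token, or -1 if both sources match.
    static int firstDifference(String expected, String actual) {
        List<String> e = tokens(expected);
        List<String> a = tokens(actual);
        int n = Math.min(e.size(), a.size());
        for (int i = 0; i < n; i++) {
            if (!e.get(i).equals(a.get(i))) {
                return i;
            }
        }
        return e.size() == a.size() ? -1 : n;
    }

    public static void main(String[] args) {
        String expected = "class A { int x = 1; // counter\n }";
        String generated = "class A {\n  /* generated */ int x=1;\n}";
        System.out.println(firstDifference(expected, generated));
    }
}
```

A token list knows nothing about the structure of the code, so unlike a tree comparison it cannot be extended to treat reordered modifiers or members as equal.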

We are writing some Java source code generators right now at NCDC and this tool is quite helpful in unit testing. Thus we want to share it with the community.


  1. What's the practical difference between this and just removing whitespaces and comments? What if one source has variable $counter (well, probably not Java-like example), and the other has all occurrences of this variable changed to $NumberOfSomething - will it show the difference?

  2. "What's the practical difference between this and just removing whitespaces and comments?"

    It depends on what questions one wants to be answered:

    1. are two source codes different?
    2. how do two source codes differ?

    Our case is the 2nd one. During development of a source code generator we want to know exactly which lines differ in relatively long generated classes.

    With Differentia Javaica's approach of using Abstract Syntax Trees, or to be more specific Parse Trees (these are concrete), we have a semantic representation of the whole source. Although this is not implemented yet, we can add some additional constraints in the future. For example, reordering some elements of the source code has no effect on its meaning. For instance, when defining a public static field in Java one can write:

    public static final String FOO = "foo";

    as well as:

    final static public String FOO = "foo";

    Thus we can easily compare such fragments semantically. Of course, we could compile the sources into bytecode and then compare them via reflection, but then we would probably not have the positions in the source files :)

    Regarding your second question: it would be much harder, although still possible. This case is much easier to handle when comparing compiled classes.
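The compiled-class route mentioned in the reply above can be sketched with plain reflection. This is only an illustration, not something differentia-javaica does: the classes `Expected` and `Generated` and the helper `sameField` are hypothetical names. Once the sources are compiled, the order in which the modifiers were written is gone, so both declarations of `FOO` look identical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical demo, not part of differentia-javaica.
class ModifierOrderDemo {

    static class Expected {
        public static final String FOO = "foo";
    }

    static class Generated {
        final static public String FOO = "foo"; // same modifiers, different order
    }

    // True if both classes declare a field with this name, the same type
    // and the same set of modifiers.
    static boolean sameField(Class<?> a, Class<?> b, String name) {
        try {
            Field fa = a.getDeclaredField(name);
            Field fb = b.getDeclaredField(name);
            return fa.getModifiers() == fb.getModifiers()
                    && fa.getType().equals(fb.getType());
        } catch (NoSuchFieldException e) {
            return false; // the field is missing on one side
        }
    }

    public static void main(String[] args) throws NoSuchFieldException {
        System.out.println(sameField(Expected.class, Generated.class, "FOO"));
        // Modifier.toString prints the modifiers in one canonical order
        int mods = Generated.class.getDeclaredField("FOO").getModifiers();
        System.out.println(Modifier.toString(mods));
    }
}
```

As the reply points out, the price of this approach is that a mismatch can no longer be traced back to a position in the source file.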