Open Source XML Diff Written in Java


Jon Udell has a column about “Structured Change Detection” in which he mentions some existing XML diff tools. The tools he mentioned are proprietary, so I was curious whether I could find open source ones. Fortunately, I found a whole bunch of them:

  1. VMTools – The toolkit contains tools for automatically generating the differences between two XML documents. The generated difference document is optimized for minimal size. It supports markup-style documents in addition to data documents.
  2. 3DM – 3DM is a tool for performing 3-way merging and differencing of XML files. Unlike line-based tools such as diff and diff3, 3DM is aware of the structure of the XML documents it processes. It is not limited to update/insert/delete operations; it also handles moves and copies of entire subtrees. 3DM does not rely on edit histories; the only inputs it needs are the XML files.
  3. diffxml – Standard UNIX tools exist for comparing (diff) and patching (patch) files; they operate on a line-by-line basis, using well-studied methods for computing the longest common subsequence (LCS). This project provides diff and patch utilities for XML, and also contains an implementation of the Delta Update Language (DUL).
  4. diffmk – Converts the two documents into lists of nodes (text and/or element nodes) and attempts to find the longest common subsequence of nodes. Put another way, it finds the smallest number of additions and deletions to each list required to make the two lists the same.
  5. XMLUnit – XMLUnit for Java provides two JUnit extension classes, XMLAssert and XMLTestCase, and a set of supporting classes (e.g. Diff, DetailedDiff) that allow assertions to be made about the differences between two pieces of XML. XMLUnit for Java can also treat HTML content (even badly-formed HTML) as valid XML, so these assertions can be made about the content of web pages too.
  6. OpenSHORE XML Merger – A tool to insert XML tags from different sources into one or more text files. The Java program reads a very simple file format (*.xmlm files) with one XML command per line. XMLM sorts these commands, removes duplicates, ensures correct tag structure, and generates XML files from the listed files.
  7. XOperator – XOperator is a scriptable command-line tool and library to compare, merge and synchronize XML documents, a framework to formulate and evaluate algebraic expressions on XML trees and a framework to express object-oriented inheritance (and more) in pure XML.
  8. JXyDiff – JXyDiff is based on XyDiff. It was originally developed at INRIA. It employs a novel change model tailored to XML data. It is a fast, tree-oriented algorithm that can detect whether a node has been moved or updated.
  9. DiffX – DiffX is an open source Java API for comparing XML documents by analyzing the sequence of XML events. When comparing XML data, it is more interesting to know that a word in the text of chapter X, paragraph Y has changed than to know that line Z is different. DiffX can ignore the order of attributes, whitespace used for indentation, and namespace prefixes.
  10. XMLPatch – XMLPatch, developed at Nokia, is a framework that uses XML Path Language (XPath) selectors to apply a set of patches to a document. The framework includes a simple xml-diff utility.
  11. XMerge – The XMerge SDK provides a framework for converting documents between different formats using conversion plugins to read and write each format.
    In addition to format conversion, the XMerge SDK provides a framework for merging changes in one document format into an original document.
    XMerge provides plugins that are designed to support the XML file format. The framework supports “chaining” of conversions, also known as “Any-to-Any” conversion, e.g., converting a Palm document to a PocketPC document through the intermediate XML file format.
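Several of these tools (diffxml and diffmk in particular) build on the longest common subsequence. To make that concrete, here is a minimal, stdlib-only Java sketch of the diffmk-style approach described above: flatten each document into a list of element and text tokens, fill in an LCS table, and report whatever falls outside the common subsequence as additions or deletions. The class and method names (NodeListDiff, flatten, parse, diff) are my own and this is not code from either project; a real tool would also track attributes and positions and emit a patchable delta rather than plain strings.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical illustration of an LCS diff over flattened node lists.
public class NodeListDiff {

    // Walk the DOM in document order: "<name>" for each element,
    // trimmed text for each non-blank text node.
    static void flatten(Node node, List<String> out) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                out.add("<" + child.getNodeName() + ">");
                flatten(child, out);
            } else if (child.getNodeType() == Node.TEXT_NODE) {
                String text = child.getNodeValue().trim();
                if (!text.isEmpty()) out.add(text);
            }
        }
    }

    static List<String> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> tokens = new ArrayList<>();
        flatten(doc, tokens);
        return tokens;
    }

    // Classic dynamic-programming LCS; tokens not on the LCS become
    // "- " deletions (from a) or "+ " additions (from b).
    static List<String> diff(List<String> a, List<String> b) {
        int n = a.size(), m = b.size();
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--)
            for (int j = m - 1; j >= 0; j--)
                lcs[i][j] = a.get(i).equals(b.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
        List<String> edits = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (a.get(i).equals(b.get(j))) { i++; j++; }
            else if (lcs[i + 1][j] >= lcs[i][j + 1]) edits.add("- " + a.get(i++));
            else edits.add("+ " + b.get(j++));
        }
        while (i < n) edits.add("- " + a.get(i++));
        while (j < m) edits.add("+ " + b.get(j++));
        return edits;
    }

    public static void main(String[] args) throws Exception {
        List<String> oldDoc = parse("<doc><p>hello</p><p>world</p></doc>");
        List<String> newDoc = parse("<doc><p>hello</p><p>there</p></doc>");
        diff(oldDoc, newDoc).forEach(System.out::println);
    }
}
```

In the main example the element tokens and "hello" line up, so only the changed paragraph text is reported, as one deletion and one addition. The LCS table costs O(n·m) time and memory, which is one reason tree-oriented tools like JXyDiff aim for something faster on large documents.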
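DiffX's idea of comparing the sequence of XML events, while ignoring attribute order, indentation whitespace, and namespace prefixes, can be approximated with the StAX API that ships with the JDK. The sketch below is my own illustration and not DiffX's API (the EventSequence class and events method are hypothetical): it reduces each document to a list of normalized event strings, with elements keyed by namespace URI and local name instead of prefix, attributes sorted, and whitespace-only text dropped. Two documents that differ only in those respects then produce equal lists, and feeding the lists to an LCS routine would give an event-level diff.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration of normalized XML event sequences.
public class EventSequence {

    static List<String> events(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, true);
        // Merge adjacent character chunks so text is reported as one event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, true);
        XMLStreamReader r = factory.createXMLStreamReader(new StringReader(xml));
        List<String> out = new ArrayList<>();
        while (r.hasNext()) {
            switch (r.next()) {
                case XMLStreamConstants.START_ELEMENT: {
                    // Key attributes by {namespaceURI}localName in a sorted map,
                    // so neither attribute order nor prefixes affect the result.
                    TreeMap<String, String> attrs = new TreeMap<>();
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        String attrUri = r.getAttributeNamespace(i);
                        attrs.put("{" + (attrUri == null ? "" : attrUri) + "}"
                                + r.getAttributeLocalName(i), r.getAttributeValue(i));
                    }
                    String elemUri = r.getNamespaceURI();
                    out.add("start {" + (elemUri == null ? "" : elemUri) + "}"
                            + r.getLocalName() + " " + attrs);
                    break;
                }
                case XMLStreamConstants.END_ELEMENT:
                    out.add("end");
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA: {
                    // Drop whitespace-only text, which is typically indentation.
                    String text = r.getText().trim();
                    if (!text.isEmpty()) out.add("text " + text);
                    break;
                }
                default:
                    break; // comments, processing instructions, end of document
            }
        }
        r.close();
        return out;
    }

    public static void main(String[] args) throws Exception {
        String a = "<x:doc xmlns:x='urn:d'><p id='1' lang='en'>Hi</p></x:doc>";
        String b = "<y:doc xmlns:y='urn:d'>\n  <p lang='en' id='1'>Hi</p>\n</y:doc>";
        System.out.println(events(a).equals(events(b)));
    }
}
```

Trimming all text is a blunt rule: it also hides leading or trailing spaces inside mixed content, which a real comparison tool would probably want to keep configurable.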
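XMLPatch's use of XPath selectors to target changes can likewise be sketched with javax.xml.xpath from the JDK. The Patch class and its REPLACE_TEXT and REMOVE operations below are hypothetical and do not reflect XMLPatch's actual patch format; the point is only that each change is addressed by an XPath expression rather than by a line number, so a patch keeps applying as long as its selector still matches.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathPatch {

    // Hypothetical operations for illustration; not XMLPatch's own vocabulary.
    enum Op { REPLACE_TEXT, REMOVE }

    static final class Patch {
        final Op op;
        final String selector; // XPath expression choosing the target nodes
        final String value;    // new text for REPLACE_TEXT; ignored for REMOVE

        Patch(Op op, String selector, String value) {
            this.op = op;
            this.selector = selector;
            this.value = value;
        }
    }

    static String apply(String xml, Patch... patches) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        for (Patch p : patches) {
            NodeList hits = (NodeList) xpath.evaluate(p.selector, doc, XPathConstants.NODESET);
            // Copy the matches first so mutating the DOM cannot disturb iteration.
            List<Node> targets = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) targets.add(hits.item(i));
            for (Node target : targets) {
                switch (p.op) {
                    case REPLACE_TEXT:
                        target.setTextContent(p.value);
                        break;
                    case REMOVE:
                        if (target.getNodeType() == Node.ATTRIBUTE_NODE) {
                            Attr attr = (Attr) target; // attributes have no parent node
                            attr.getOwnerElement().removeAttributeNode(attr);
                        } else {
                            target.getParentNode().removeChild(target);
                        }
                        break;
                }
            }
        }
        // Serialize the patched DOM back to a string, without an XML declaration.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String patched = apply("<doc><title>Old</title><note>draft</note></doc>",
                new Patch(Op.REPLACE_TEXT, "/doc/title", "New"),
                new Patch(Op.REMOVE, "//note", null));
        System.out.println(patched);
    }
}
```

A diff tool that emits selectors like these, instead of line ranges, is what makes such a patch format pair naturally with the structure-aware diff utilities listed above.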

Let me know what I should have found!
