Open Source Document Repository Solutions Written in Java


“Content Management System” (CMS) has always been a nebulous term. That’s one reason I’ve never included this category in my reviews of Java open source projects.

One of the primary features of a CMS is its support for repository services. In recent months, a couple of CMS implementations have appeared that support JSR-170 (i.e., a standard content repository specification). In general, these capabilities are orthogonal to the presentation features available in many CMSs. The primary intent of document repositories is to support scalable management of large documents. This review covers CMS implementations that place more emphasis on this than on the presentation layer.
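To make “repository services” a little more concrete, here is a minimal in-memory sketch of a path-addressed, versioned document store. It is only an illustration under stated assumptions: the class and method names (`ToyRepository`, `put`, `get`, `list`) are hypothetical, and this is not the `javax.jcr` API that JSR-170 defines (that API works with `Repository`, `Session`, `Node`, and `Property` objects). Real repositories add persistence, access control, locking, and query on top of this basic idea.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical in-memory sketch of a path-addressed, versioned document store.
 * It only echoes the hierarchical model of JSR-170; it is NOT the javax.jcr API.
 */
public class ToyRepository {
    // Absolute path -> stored revisions, oldest first. TreeMap keeps paths sorted.
    private final Map<String, List<byte[]>> store = new TreeMap<>();

    /** Stores a new revision at an absolute path and returns its 1-based version number. */
    public int put(String path, byte[] content) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("path must be absolute: " + path);
        }
        List<byte[]> revisions = store.computeIfAbsent(path, p -> new ArrayList<>());
        revisions.add(content.clone()); // defensive copy so callers can't rewrite history
        return revisions.size();
    }

    /** Returns a copy of the requested 1-based revision. */
    public byte[] get(String path, int version) {
        List<byte[]> revisions = store.get(path);
        if (revisions == null || version < 1 || version > revisions.size()) {
            throw new IllegalArgumentException("no such revision: " + path + " v" + version);
        }
        return revisions.get(version - 1).clone();
    }

    /** Lists the distinct immediate children (documents or sub-folders) under a folder path. */
    public List<String> list(String folder) {
        String prefix = folder.endsWith("/") ? folder : folder + "/";
        List<String> children = new ArrayList<>();
        for (String path : store.keySet()) {
            if (!path.startsWith(prefix)) continue;
            String rest = path.substring(prefix.length());
            int slash = rest.indexOf('/');
            String child = slash < 0 ? rest : rest.substring(0, slash);
            if (!children.contains(child)) children.add(child);
        }
        return children;
    }

    public static void main(String[] args) {
        ToyRepository repo = new ToyRepository();
        repo.put("/reports/q1.txt", "draft".getBytes(StandardCharsets.UTF_8));
        int latest = repo.put("/reports/q1.txt", "final".getBytes(StandardCharsets.UTF_8));
        repo.put("/reports/2024/q2.txt", "notes".getBytes(StandardCharsets.UTF_8));

        System.out.println(latest); // prints 2
        System.out.println(new String(repo.get("/reports/q1.txt", 1), StandardCharsets.UTF_8)); // prints draft
        System.out.println(repo.list("/reports")); // prints [2024, q1.txt]
    }
}
```

Keeping every revision rather than overwriting is the simplest way to show versioning; production systems usually store deltas or use the underlying database’s own history instead.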

Here’s the list of “Document Repository” implementations that I have found:

  • DSpace Federation – DSpace is a digital library system to capture, store, index, preserve, and redistribute intellectual output in digital formats. It was developed jointly by MIT Libraries and Hewlett-Packard (HP).
  • Fedora – The Flexible Extensible Digital Object Repository Architecture. The system is designed to be a foundation upon which interoperable web-based digital libraries, institutional repositories, and other information management systems can be built. It demonstrates how a distributed digital library architecture can be deployed using web-based technologies, including XML and Web services. The Fedora project was funded by the Andrew W. Mellon Foundation.
  • FlexStore – FLEX-db is an enterprise-wide digital asset management product built on Java and an object-based, extensible Enterprise JavaBeans (EJB) architecture.
  • Magnolia – Magnolia supports JSR-170. It features a very flexible structure, platform independence through the use of Java and XML, a simple-to-use API, easy templating through JSP, JSTL, and a custom tag library, and automatic generation of the administrative UI.
  • Daisy – Daisy is a comprehensive content management application framework, consisting of a standalone repository server accessible through HTTP/XML and a high-level (remote) Java API, plus DaisyWiki, an extensive application for browsing and editing Daisy content.
  • Slide – The main module of the Slide project is a content repository, which can be seen as a low-level content management framework. Conceptually, it provides a hierarchical organization of binary content that can be stored in arbitrary, heterogeneous, distributed data stores. In addition, Slide integrates security, locking, and versioning, and it offers WebDAV access.
  • Jackrabbit – The Jackrabbit project has been formed to develop an open source implementation of the Content Repository for Java Technology API (JCR), which is being specified within the Java Community Process as JSR-170. Day Software, the JSR-170 specification lead, has licensed an initial implementation of the JCR reference implementation for use as seed code for this project.
  • MMBase – MMBase is a Web content management system with strong multimedia features and advanced portal functionality. MMBase has a large installed base in the Netherlands and is used by major Dutch broadcasters, publishers, educational institutions, and national and local governments.
  • SpaceMapper DataStore – DataStore is a Java-based document repository server for storing, querying, and fetching XML-based documents. It was built to meet practical needs, allowing the storage of both semi-structured and unstructured documents. Documents are stored in a conventional relational database. Because it is built on top of the Avalon Phoenix framework, server components can be easily developed, deployed, and shared. The documents are managed through a BEEP and/or XML-RPC interface using a subset of the SEP (Simple Exchange Profile) protocol.
  • KnowledgeTree – KnowledgeTree is a feature-rich document management system. Its features include knowledge management, document version control, hierarchical document management, support for common file formats, extensible metadata, custom document types, application-managed document links that guarantee consistent data and eliminate the need to email documents, easy publication of documents, subscription agents, archiving by expiry date, expiry time period, or utilisation for enhanced speed, and much more.
  • Xinco – Xinco DMS is a powerful Web service-based information and document management system (DMS) for files, text, URLs, and contacts, featuring ACLs, version control, full-text search, and an FTP-like client.
  • Open Harmonise – Harmonise is a metadata, taxonomy, and content management system written in Java and based on the WebDAV standard. Unlike traditional CMSs, it focuses on metadata management and integrates it with content management and publishing.
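Slide and Open Harmonise both expose content over WebDAV (RFC 4918), the HTTP extension for authoring remote resources. As a rough, product-neutral sketch, the `java.net.http` client in Java 11+ can issue a WebDAV `PROPFIND` to list a collection. The server URL below is a hypothetical placeholder, the class and method names are illustrative, and authentication, which real repositories typically require, is left out.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class WebDavPropfind {
    // PROPFIND body asking the server for all properties (the RFC 4918 "allprop" form).
    static final String ALLPROP =
            "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
            + "<D:propfind xmlns:D=\"DAV:\"><D:allprop/></D:propfind>";

    /** Builds a PROPFIND request for a collection and its immediate members (Depth: 1). */
    static HttpRequest buildPropfind(URI collection) {
        return HttpRequest.newBuilder(collection)
                .timeout(Duration.ofSeconds(10))
                .header("Depth", "1")
                .header("Content-Type", "application/xml; charset=utf-8")
                // HttpRequest.Builder accepts extension methods such as PROPFIND.
                .method("PROPFIND", HttpRequest.BodyPublishers.ofString(ALLPROP))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical default URL; pass a real WebDAV collection as the first argument.
        URI uri = URI.create(args.length > 0 ? args[0] : "http://localhost:8080/webdav/");
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildPropfind(uri), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

A WebDAV server typically answers with `207 Multi-Status` and an XML `multistatus` body describing each member; `Depth: 1` limits the listing to the collection itself and its immediate children.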

Note that I moved several entries here from my “Knowledge Management (KM)” list. What distinguishes that list is that KM refers to repositories that support richer semantics and “documents” of finer granularity. Please let me know if there are other worthwhile projects that I should include in this list.
