Open Source Document Management Solutions Written in Java
Content Management System (CMS) has always been a nebulous term. That's one reason I've never included this category in my reviews of Java open source projects.
One of the primary features of a CMS is its support for repository services. In recent months, a couple of CMS implementations have appeared that support JSR-170 (i.e., a standard content repository specification). In general, these capabilities are orthogonal to the presentation features available in many CMSs. The primary intent of document repositories is to support scalable management of large documents. This review covers CMS implementations that place more emphasis on repository services than on the presentation layer.
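To make "repository services" a little more concrete, here is a minimal Java sketch of the kind of model a JSR-170 style repository exposes: a tree of named nodes addressed by path, each carrying properties (which may hold binary content), plus simple check-in versioning. It is a hypothetical, in-memory illustration only. `ContentNode`, `checkIn` and `getVersionedProperty` are invented names, not the `javax.jcr` API, and a real repository adds persistence, querying, locking and access control.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of a hierarchical content repository.
// NOT the javax.jcr API: ContentNode, checkIn and getVersionedProperty are illustrative names.
final class ContentNode {
    private final String name;
    private final ContentNode parent;
    private final Map<String, ContentNode> children = new LinkedHashMap<>();
    private final Map<String, Object> properties = new LinkedHashMap<>();
    // Each checkIn() appends a shallow, read-only snapshot of the current properties.
    private final List<Map<String, Object>> versions = new ArrayList<>();

    ContentNode(String name, ContentNode parent) {
        this.name = name;
        this.parent = parent;
    }

    // Returns the named child, creating it if it does not exist yet.
    ContentNode addNode(String childName) {
        return children.computeIfAbsent(childName, n -> new ContentNode(n, this));
    }

    // Values may be strings, numbers or byte[] for binary content.
    void setProperty(String key, Object value) {
        properties.put(key, value);
    }

    Object getProperty(String key) {
        return properties.get(key);
    }

    // Absolute path such as "/docs/report.pdf"; the root node's path is "/".
    String getPath() {
        if (parent == null) {
            return "/";
        }
        String parentPath = parent.getPath();
        return (parentPath.equals("/") ? "" : parentPath) + "/" + name;
    }

    // Resolves a relative path like "docs/report.pdf"; returns null if a segment is missing.
    ContentNode getNode(String relPath) {
        ContentNode current = this;
        for (String segment : relPath.split("/")) {
            if (segment.isEmpty()) {
                continue;
            }
            current = current.children.get(segment);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    // Stores the current properties as a new version and returns its 1-based number.
    int checkIn() {
        versions.add(Collections.unmodifiableMap(new LinkedHashMap<>(properties)));
        return versions.size();
    }

    // Reads a property value as it was when the given version was checked in.
    Object getVersionedProperty(int version, String key) {
        return versions.get(version - 1).get(key);
    }
}
```

Versioning here is modelled as simple property snapshots; JSR-170 defines its own, richer versioning model, which this sketch does not attempt to reproduce.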
Here's the list of "Document Repository" implementations that I have found:
- DSpace Federation - DSpace is a digital library system to capture, store, index, preserve, and redistribute intellectual output in digital formats. It was developed jointly by MIT Libraries and Hewlett-Packard (HP).
- Fedora - The Flexible Extensible Digital Object Repository Architecture. The system is designed to be a foundation upon which interoperable web-based digital libraries, institutional repositories, and other information management systems can be built. It demonstrates how a distributed digital library architecture can be deployed using web-based technologies, including XML and Web services. The Fedora project was funded by the Andrew W. Mellon Foundation.
- FlexStor - FLEX-db is an enterprise-wide digital asset management product built upon Java and an object-based, extensible Enterprise JavaBeans (EJB) architecture.
- Magnolia - Magnolia supports JSR-170. It features a very flexible structure, platform independence through the use of Java and XML, a simple-to-use API, easy templating through JSP, JSTL, and a custom tag library, and automatic generation of the administrative UI.
- Daisy - Daisy is a comprehensive content management application framework, consisting of a standalone repository server accessible through HTTP/XML and a high-level (remote) Java API, plus DaisyWiki, an extensive browsing and editing application.
- Slide - The Slide project's main module is a content repository, which can be seen as a low-level content management framework. Conceptually, it provides a hierarchical organization of binary content that can be stored in arbitrary, heterogeneous, distributed data stores. In addition, Slide integrates security, locking, and versioning, and it offers WebDAV access.
- Jackrabbit - The Jackrabbit project has been formed to develop an open source implementation of the Content Repository for Java Technology API (JCR), which is being specified within the Java Community Process as JSR-170. Day Software, the JSR-170 specification lead, has licensed an initial implementation of the JCR reference implementation for use as seed code for this project.
- MMBase - MMBase is a Web Content Management System with strong multimedia features and advanced portal functionality. MMBase has a large installed base in the Netherlands and is used by major Dutch broadcasters, publishers, educational institutes, and national and local governments.
- SpaceMapper DataStore - DataStore is a Java-based document repository server for storing, querying, and fetching XML-based documents. It was built around practical needs and allows the storage of semi-structured and unstructured documents. Documents are stored in a conventional relational database. Built on top of the Avalon Phoenix framework, it allows server components to be easily developed, deployed, and shared. The documents are managed through a BEEP and/or XML-RPC interface using a subset of the SEP (Simple Exchange Profile) protocol.
- KnowledgeTree - KnowledgeTree is a feature-rich document management system offering knowledge management, document version control, hierarchical document management, support for common file formats, extensible metadata, creation of custom document types, application-managed document links that guarantee consistent data and eliminate emailing documents, easy publication of documents, subscription agents, archiving according to expiry date, expiry time period, or utilisation for enhanced speed, and much more.
- Xinco - Xinco DMS is a powerful Web Service-based Information and Document Management System (DMS) for files, text, URLs, and contacts, featuring ACLs, version control, full-text search, and an FTP-like client.
- Open Harmonise - Harmonise is a metadata, taxonomy, and content management system written in Java and based upon the WebDAV standard. Unlike traditional CMSs, it focuses on metadata management and integrates it with content management and publishing.
- Apache Lenya - Apache Lenya is an open source Java/XML Content Management System that comes with revision control, site management, scheduling, search, WYSIWYG editors, and workflow. It is based on Apache Cocoon.
- Contineo - Contineo is a web-based document management system. It supports its users by managing documents in most popular formats, and it aims to cover all phases of the document lifecycle. You create and develop documents with office software; with Contineo itself, you can publish and search documents and manage their versions. You can also communicate with other users directly or via e-mail.
- Alfresco - Alfresco is a modern content repository with an out-of-the-box portal framework for managing and using content, designed to work with standard portals, and a groundbreaking Common Internet File System (CIFS) interface. The Alfresco system is developed using JBoss 4.0, JBoss Portal 2.0, Spring 1.2, Hibernate 3.0, MyFaces 1.0, Lucene 1.4, and Java 1.5. The system is architected to use aspects and Aspect-Oriented Programming wherever possible. Aspects make every part of the repository configurable, including typing, versioning, process separation, and security.
- Archimede - A Canadian software solution for institutional repositories inspired by the DSpace model. Archimede has been developed from a multilingual perspective, with internationalization as a focus. Archimede allows searching on metadata as well as on full text. The system is OAI compliant and uses a Dublin Core metadata set. The search engine is based on the open source Lucene, using LIUS (Lucene Index Update and Search). LIUS can index many document formats: XML, HTML, PDF, RTF, MS Word, MS Excel, and JavaBeans. It also permits mixed indexing, combining, for example, XML metadata with full text from PDF, HTML, etc. in the same record.
- jLibrary - jLibrary is a Document Management System oriented toward both personal and enterprise use. This dual approach makes jLibrary unique. Using jLibrary, you can classify your documents, videos, or any other media type. You can export that content to static web pages based on templates, search it, add comments, and categorize it. jLibrary features teamwork support, version management, offline document editing, document locking, security constraints based on roles, users, and groups, easy web access, etc.
- Nuxeo 5 - Nuxeo 5 addresses requirements for enterprise-wide content management such as Document Management, Collaboration, Compliance, Records Management, Business Process Management, Business Rules Management, Retention Management, Indexing, Search, and File Transformation. Nuxeo 5 is built on Apache Jackrabbit JCR, the JBoss application server, JBoss Seam, jBPM, JBoss Rules, JSF, and EJB3.
- dotCMS - dotCMS is an enterprise-level open source J2EE/Java Web Content Management System (wCMS). While the dotCMS includes the features you'd expect in a complete CMS, including true separation of content and design and ease of editing, it also includes many features you wouldn't expect such as calendar and events management, e-communications tools and more.
- Sling - Apache Sling is a web framework that uses a Java Content Repository to store and manage content. Sling applications use either scripts or Java servlets, selected based on simple name conventions, to process HTTP requests in a RESTful way.
- MeshCMS - MeshCMS is a web-based editing system written in Java. It provides a set of features usually included in a CMS, but it uses a more traditional approach: pages are stored in regular HTML files and all additional features are file-based, so no database is needed.
- Hippo CMS - Hippo CMS is targeted at medium to large organisations managing content for multi-channel distribution such as websites, intranets, PDAs, and print (also called Enterprise Content Management). It attempts to follow internationally accepted open standards. Hippo CMS is built to integrate content from external sources into a single system.
- Graffito - Graffito is a framework used to build content-based applications such as content management, document management, forums, blogs, and wikis. Graffito integrates content repositories, workflow, collaboration, and personalization via existing open source projects and powerful standards (JCR, WebDAV). Graffito includes features like taxonomy, version control, fine-grained access control, collaborative editing, publication workflow, scheduling, indexing, searching, and more. It will also support many document types such as XML, HTML, PDF, MS Office, Open Office, RDF, etc.
- OpenCms - OpenCms provides a browser-based UI that features configurable editors for structured content. Content can also be created using an integrated WYSIWYG editor. A sophisticated template engine enforces a site-wide corporate layout and W3C standard compliance for all content. OpenCms is based on Java, J2EE, XML, and RDBMS technology.
- AtLeap - AtLeap is a multilingual free Java CMS with a full-text search engine. Blandware AtLeap is a framework that lets you rapidly start your own Web application. The AtLeap project was initially based on AppFuse. It does not require an EJB application server to run. It is based on Hibernate, Spring, XDoclet, Struts, FCKEditor, Lucene, Quartz, Acegi, and TinyMCE.
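Sling's idea of selecting scripts "based on simple name conventions" can be illustrated with a simplified Java sketch. It assumes a request path such as `/content/news.print.a4.html` is split into selectors (`print`, `a4`) and an extension (`html`), and that scripts live under a hypothetical `/apps/<resourceType>/` folder. The resolver tries the most specific script name first and falls back to the extension and finally the HTTP method. The ordering is a simplification for illustration, not Sling's exact resolution algorithm, and `ScriptResolver` is a made-up name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Simplified, hypothetical illustration of name-convention script selection.
// Not Sling's actual resolution algorithm; ScriptResolver and the /apps layout are assumptions.
final class ScriptResolver {

    // Builds candidate script names, most specific first.
    // "/content/news.print.a4.html" -> selectors [print, a4], extension "html".
    static List<String> candidates(String requestPath, String method, String scriptExt) {
        String last = requestPath.substring(requestPath.lastIndexOf('/') + 1);
        String[] parts = last.split("\\.");
        String extension = parts.length > 1 ? parts[parts.length - 1] : "";
        List<String> selectors = parts.length > 2
                ? Arrays.asList(parts).subList(1, parts.length - 1)
                : Collections.<String>emptyList();

        List<String> result = new ArrayList<>();
        // Drop trailing selectors one at a time: "print/a4", then "print".
        for (int n = selectors.size(); n > 0; n--) {
            String prefix = String.join("/", selectors.subList(0, n));
            if (!extension.isEmpty()) {
                result.add(prefix + "." + extension + "." + scriptExt);
            }
            result.add(prefix + "." + scriptExt);
        }
        // Fall back to the extension, then to the HTTP method name.
        if (!extension.isEmpty()) {
            result.add(extension + "." + scriptExt);
        }
        result.add(method + "." + scriptExt);
        return result;
    }

    // Returns the first candidate found under /apps/<resourceType>/, or null if none exists.
    static String resolve(String resourceType, String requestPath, String method,
                          String scriptExt, Set<String> availableScripts) {
        for (String candidate : candidates(requestPath, method, scriptExt)) {
            String scriptPath = "/apps/" + resourceType + "/" + candidate;
            if (availableScripts.contains(scriptPath)) {
                return scriptPath;
            }
        }
        return null;
    }
}
```

In this sketch, the script is chosen purely from the URL and the resource type, so adding a print view amounts to placing a `print.jsp` file in the resource type's folder, with no separate routing configuration.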
Note that I moved several entries here from my "Knowledge Management (KM)" list. What distinguishes that list is that KM refers to repositories that support richer semantics and "documents" of smaller granularity. Please let me know if there are other worthwhile projects that I should include in this list.
[update] Frank Sommers has written a detailed explanation of why document repositories differ from other kinds of databases.
JXTA Content Management System
http://download.jxta.org/build/release/2.3.1/ (under the CMS heading)
eXo Platform JCR
eXo JCR is another JSR-170 implementation.
DSpace development
PoI: DSpace is now developed by an international open source development community. MIT and HP are still involved.
Corendal DocSide
We use Corendal DocSide at work. It integrates in real time with our Active Directory accounts and groups for access rights and such. http://sourceforge.net/projects/corendaldocs/


Though the indexing part is written in Java.