Understanding CMIS
CMIS (Content Management Interoperability Services) is a standardization effort managed by the Organization for the Advancement of Structured Information Standards (OASIS). The latest version is 1.1 (http://docs.oasis-open.org/cmis/CMIS/v1.1/CMIS-v1.1.html), which was approved in May 2013. Version 1.0, approved in May 2010, is mature and already specifies most of the functionality. Some content servers might not yet support Version 1.1, so we will point out when a feature is only available in Version 1.1. CMIS is all about being able to access and manage content in a so-called content repository in a standard way. You can think of a content repository as something that can be used to store files in a folder hierarchy.
The CMIS interface consists of two parts: a number of repository services for things such as content navigation and content creation, and a repository query language for content search. The standard also defines what protocols can be used to communicate with a repository and what formats should be used in requests and responses via these protocols.
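To make the query language part more concrete, here is a minimal Python sketch of what a CMIS query statement looks like. The type and property names (cmis:document, cmis:name, cmis:objectId) are defined by the standard; the helper function build_cmis_query is hypothetical and exists only for illustration, and the backslash escaping of quotes is an assumption based on the CMIS specification that you should verify against your repository.

```python
def build_cmis_query(type_id, columns, name=None):
    """Build a SQL-like CMIS query statement as a plain string.

    Hypothetical helper for illustration only; it is not part of any
    CMIS client library.
    """
    statement = "SELECT {} FROM {}".format(", ".join(columns), type_id)
    if name is not None:
        # Assumption: string literals escape backslashes and single
        # quotes with a backslash, as described in the CMIS query grammar.
        escaped = name.replace("\\", "\\\\").replace("'", "\\'")
        statement += " WHERE cmis:name = '{}'".format(escaped)
    return statement


# All documents, returning two standard properties per result row.
all_documents = build_cmis_query("cmis:document", ["cmis:name", "cmis:objectId"])

# Only documents with a specific name.
one_document = build_cmis_query("cmis:document", ["cmis:objectId"], name="Bob's report.pdf")

print(all_documents)
print(one_document)
```

The resulting statement would be handed to a repository through one of the repository services; how that call looks depends on the client library and protocol in use.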
To really explain what CMIS is, and why it came about, one has to look at how the implementation of content management systems has evolved. If we go back 15-20 years, most large companies had one content management system installed for Document Management (DM) and workflow. This meant that all the content was available in one system via a single Application Programming Interface (API), making it easy for other enterprise systems to integrate with it and access content. For example, the Swedish nuclear power plant that I worked for in the mid-90s had one big installation of Documentum that everyone used.
In the last 5-10 years, there has been an explosion in the number of content management systems used by companies; most companies now have multiple content management systems in use, sometimes running into double digits.
Note
You might be thinking that this cannot be true: companies with five content management systems? It is true. According to the Association for Information and Image Management (AIIM), which is the main Enterprise Content Management (ECM) industry organization, 72 percent of large organizations have three or more ECM, Document Management, or Record Management systems, while 25 percent have five or more (as mentioned in State of the ECM Industry, AIIM, 2011).
This is because these days we not only manage documents, but we also manage records (known as Record Management), images and media files (known as Digital Asset Management), advanced workflows, web content (known as Web Content Management), and many other types of content. Quite often, one content management system is better than another at handling a particular type of content, such as records or web content, so a company ends up buying multiple content management systems to manage different types of content.
A new type of content management system has also emerged, which is open source and easily accessible for everyone to try out. Each one of these systems has its own API and can be implemented in a different language and on a different type of platform. All this means that a lot of companies have ended up with many content silos/islands that do not communicate with each other and that sometimes hold duplicated content.
What this means is that when it comes to implementing the following kind of services, we might have a problem choosing what API to work with:
Enterprise service that should aggregate content from several of these systems
Content transfer from one system to another
UI client that should display content from more than one of these systems
It would then be necessary to learn about a whole lot of APIs and platforms. Most of the proprietary APIs were also not based on HTTP, so if you wanted a service or client to be outside the firewall, you would have to open up new ports in the firewall, deal with the security implications, and so on.
Any company that wants to develop tools or clients to access content management systems would also have to support many different protocols and formats, making it difficult to work with more than a handful of the seasoned CMS players. This led people to think about some sort of standard interface and protocol to access content management systems.
The first established standard covering the content management area is Web Distributed Authoring and Versioning (WebDAV), which was proposed in February 1999 with RFC 2518 (refer to ftp://ftp.isi.edu/in-notes/rfc2518.txt). It is supported by most content management systems, including Alfresco, and is usually used to map a drive to access the content management system via, for example, Windows Explorer or Mac Finder. The problem with this way of accessing content is that most of the valuable features of a content management system cannot be used, such as setting custom metadata, managing versions, setting fine-grained permissions, controlling relationships, and searching for content.
This led to more comprehensive standards such as the Java Content Repository (JCR) API, which is managed by the Java Community Process as JSR-170 (https://www.jcp.org/en/jsr/detail?id=170) and JSR-283 (https://www.jcp.org/en/jsr/detail?id=283) and was first developed in 2002. The JCR standard has been supported by Alfresco for a long time, but it has never really taken off, as it is Java-centric and excludes content management systems such as SharePoint and Drupal.
Something needed to be done to come up with a new standard that would be easy to learn and adopt. This is where CMIS comes into the picture. CMIS provides a standard API and query language that can be used to talk to any content management system that implements the CMIS standard. The following figure illustrates how a client application that adheres to the CMIS standard can talk to many different content management systems through one standard service-oriented interface:
The preceding figure shows how each one of the content management systems offers access to its proprietary content and metadata via the standard CMIS service interface. The CMIS interface is web-based, which means that it can be accessed via HTTP through the Internet. Even cloud-based content management systems such as the Alfresco Cloud installation can be accessed via CMIS.
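The idea of one client talking to several systems over HTTP can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the host names, URL paths, and credentials are made up, the function build_service_request is hypothetical, and the example assumes that each server exposes an AtomPub-style service document protected by HTTP Basic authentication. The requests are built but not sent, so the sketch runs without any server.

```python
import base64
from urllib.request import Request

# Hypothetical CMIS entry points; real URLs depend on each product
# and installation.
ENDPOINTS = {
    "alfresco": "http://alfresco.example.com/cmis/atom",
    "sharepoint": "http://sharepoint.example.com/cmis/atom",
    "documentum": "http://documentum.example.com/cmis/atom",
}


def build_service_request(url, user, password):
    """Build (but do not send) an HTTP GET request for a service document."""
    credentials = "{}:{}".format(user, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = Request(url, method="GET")
    request.add_header("Authorization", "Basic " + token)
    # Media type of an Atom Publishing Protocol service document.
    request.add_header("Accept", "application/atomsvc+xml")
    return request


# The same client code is reused for every system; only the URL differs.
requests_by_system = {
    name: build_service_request(url, "admin", "secret")
    for name, url in ENDPOINTS.items()
}

for name, request in requests_by_system.items():
    print(name, request.get_method(), request.full_url)
```

The point of the sketch is that nothing in build_service_request is specific to one vendor, which is the property that a shared HTTP-based interface is meant to provide.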