In my role as a developer evangelist at Alfresco, I regularly speak with developers who have never heard of the tool and are not familiar with the concept of content management. This short post will give you enough of an introduction to know what capabilities these tools provide and when it is worth researching them further. It is the first in a short series on developing with Alfresco.
Alfresco is a tool for enterprise content management (ECM). I compare it to a database for unstructured business content like audio, video, office documents, PDFs, images, or anything else you want to hand it. When asked to handle these opaque file formats, many developers either shove them into a database as a binary blob or stick them on the filesystem. Really crazy people stick them into source control. They then work around the limitations of their storage in order to implement the capabilities requested for controlling their content, like version history, permissions, metadata, workflow, and transformation. Then they realize that they also need to provide a way to back up and restore, lock content, and search. If it continues, this process results in lots of custom code, brittle systems, and maintenance headaches. Eventually most developers realize that these are common problems and discover that ECM repositories address them.
Enterprise content management systems provide a set of content services that can be used to manage opaque binary content. My team put together the following short list of critical content services a modern repository should provide:
- Interface (UI, API, and integration with authoring tools)
- Persistence / Data Model / Metadata
- Business Process / Workflow / Rules Execution / Scheduler
- Library Services (Upload / Download, Versioning, Check-in / Check-out)
- Permissions and Security
- Transformation / Rendition / Thumbnails
- Tagging / Categorization
- Transfer / Publication
- Activity Streams / Notification
- Auditing and Reporting
- Records Retention / Disposition Schedules / Legal Holds
- File System View (CIFS, FTP, NFS, WebDAV)
Your project will not require all of these services, but since you don't have to write or maintain them it is good to know they are there to use when your requirements evolve. The goal of a content repository is to have a single "system of record" where the official copy of each piece of content is stored. That content can then be accessed through services by every system that needs to use it.
There is an industry-standard API for working with most of these services called Content Management Interoperability Services (CMIS). CMIS provides a vendor-neutral, lowest-common-denominator API for accessing a content repository. CMIS-compliant repositories have to expose a SOAP interface and a REST interface (AtomPub or JSON) for the basic Create, Read, Update, Delete (CRUD) operations on content. CMIS also defines operations for traversing the folder hierarchy, managing permissions, reading and editing metadata, and versioning. There is also a SQL-like query language for searching the content. Whenever you deal with content, you should base your interaction on CMIS and extend it with vendor-specific APIs only where you need to. That gives you a certain amount of vendor neutrality and also allows a number of integrations between various products. I'll explain more about CMIS in a separate post.
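To give a feel for the SQL-like query language mentioned above, here is a minimal sketch in Python that only builds a query string; it does not talk to a repository. The helper names `escape_cmis_literal` and `build_document_query` are hypothetical and not part of any CMIS client library. The `cmis:document` type and the `cmis:objectId` and `cmis:name` properties come from the CMIS standard, and the backslash escaping follows my reading of the specification, so check it against your repository before relying on it.

```python
def escape_cmis_literal(value: str) -> str:
    """Escape backslashes and single quotes for use inside a CMIS string literal.

    Assumption: backslash escaping, as described in the CMIS query grammar.
    """
    return value.replace("\\", "\\\\").replace("'", "\\'")


def build_document_query(name_pattern: str,
                         columns=("cmis:objectId", "cmis:name")) -> str:
    """Build a CMIS query for documents whose name matches a LIKE pattern."""
    select_list = ", ".join(columns)
    literal = escape_cmis_literal(name_pattern)
    return (
        f"SELECT {select_list} FROM cmis:document "
        f"WHERE cmis:name LIKE '{literal}'"
    )


if __name__ == "__main__":
    # Find every document whose name ends in ".pdf".
    print(build_document_query("%.pdf"))
```

The resulting string is what you would hand to whichever CMIS client or binding you use; the query itself stays the same across compliant repositories, which is the point of building on the standard.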
There are content management systems optimized for specific industries, such as finance, publishing, construction, eCommerce, or healthcare. There are also solutions optimized for specific types of content, like digital asset management, inventory management, or web content management. These systems already have definitions for what metadata should be tracked for each content type (the content models), they are configured with the essential content workflows, and they often contain special tools for dealing with their specific content types (like video editing or digital rights management tools). But there are trade-offs: they are usually hard to customize to your specific environment, have limited integration points, and lack capabilities that don't immediately apply to the problem they are focused on solving. Solution repositories are often quicker to deploy than a solution built on an ECM repository, but they will not be as flexible over time as a general repo. Many general ECM vendors will sell specific content solutions, but beware that sometimes these solutions are poorly integrated with the core ECM repository and are less flexible than running a separate system (this mostly happens when the solution is the result of an acquisition by the vendor). If you are evaluating a web use case, or trying to select between a solution-specific and a general content repository, then you should read my next post in this series on WCM vs ECM.
Alfresco is a general-purpose ECM repository. The specific solutions sold by Alfresco are configurations of that core repository to meet specific use cases. The architecture of Alfresco is straightforward:
- The repository runs as a Java WAR (we bundle Tomcat).
- Content is stored on the file system.
- Metadata and permissions are tracked in a relational database (we bundle PostgreSQL).
- User accounts can be provisioned from the database or a back-end directory service (LDAP).
- Search indexes are maintained in a Solr instance.
- LibreOffice is used for text extraction and office document transformation.
- ImageMagick is used for image transformation.
This makes administering backups and configuring high availability (Enterprise Edition only) straightforward.
Alfresco doesn't care what type of file format you hand it. Any file it receives will be wrapped with metadata, version history, and permissions. Based on where the file sits in the folder hierarchy, Alfresco can execute rules against it or initiate a workflow. But Alfresco will also look at the MIME type of the file and see if it knows how to do something more. For example, it can:
- Extract EXIF information from JPEGs,
- Stream MPEGs,
- Extract text from MS Office documents for searching,
- Transform TIFFs to JPEGs,
- Transform DOCX to HTML,
- Transform PPT to PDF,
- Preview anything it knows how to turn into PDF.
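The idea of "every file gets the basics, and some MIME types get extras" can be sketched as a simple lookup table. This is only an illustration of the concept in Python, not how Alfresco implements it: the names `BASE_SERVICES`, `EXTRA_ACTIONS`, and `services_for` are hypothetical, and the table contains nothing beyond the examples listed above.

```python
# Services every file receives, regardless of format (from the paragraph above).
BASE_SERVICES = ["metadata", "version history", "permissions"]

# Hypothetical table: MIME type -> extra actions, mirroring the examples above.
EXTRA_ACTIONS = {
    "image/jpeg": ["extract EXIF metadata"],
    "video/mpeg": ["stream"],
    "image/tiff": ["transform to JPEG"],
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": [
        "extract text for search",
        "transform to HTML",
    ],
    "application/vnd.ms-powerpoint": [
        "extract text for search",
        "transform to PDF",
    ],
}


def services_for(mime_type: str) -> list:
    """Return the base services plus any extras known for this MIME type."""
    services = BASE_SERVICES + EXTRA_ACTIONS.get(mime_type.lower(), [])
    # Anything that can be turned into PDF can also be previewed.
    if "transform to PDF" in services:
        services.append("preview")
    return services


if __name__ == "__main__":
    for mime in ("application/zip", "image/jpeg", "application/vnd.ms-powerpoint"):
        print(mime, "->", services_for(mime))
```

An unknown type such as `application/zip` still gets the base services, which matches the point that the repository accepts any format and only adds behavior where it recognizes the type.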
Alfresco also provides a REST API for interacting with your content. The REST API can be extended with your own REST endpoints using Alfresco webscripts. You can learn more about these capabilities in my post on development with Alfresco.
Alfresco provides all of the capabilities of an enterprise content repository. Not only is it powerful, it is also free to use. You can begin using Alfresco in the Cloud for free without installing anything, or you can deploy and customize Alfresco Community Edition, which is open source.
Let me know if you have questions you want me to address in future posts.