Alfresco is format-agnostic. It will store any bytes you hand it, and along with storage you get powerful functionality such as permission enforcement, categorization, versioning, automation rules, workflow, check-in / check-out, metadata, and metadata search (among other capabilities). In addition to these core content-agnostic capabilities, Alfresco will also examine the MIME type to see if additional operations can be performed, such as full content indexing, thumbnailing, transformation, automated metadata extraction, and content preview in Alfresco Share.
I am regularly asked to provide a list of formats for which Alfresco supports these extra capabilities. Unfortunately, such a list depends on the underlying technologies that we use and it changes very quickly. In this post I hope to provide some guidelines on how Alfresco will treat various formats. These lists are not exhaustive, as I have only listed formats that I have personally tested.
Alfresco is designed to be very extensible. If Alfresco does not have a capability that you would like for a specific format, it is worth checking with us to see if we are aware of an Alfresco Partner or community project that has provided the desired functionality.
If Alfresco can convert a MIME type to text, it will extract that text and index it for full content searching. Share will also display a preview of the text, including formatting where possible. Alfresco uses OpenOffice for conversions that retain formatting, and Apache Tika for conversions that require clean text. Apache Tika is also used for conversions from Word to HTML in Web QuickStart.
- Text based files (obviously) such as TXT, CSV, RTF, XML, and HTML. Single-byte and multi-byte encodings work fine.
- File formats that are readable by OpenOffice, such as MS Office and Open XML formats (XLS, DOC, PPT, XLSX, DOCX, and PPTX), Word Perfect files (WPD), and Open Document files (ODS, ODT, ODP).
- The text layer of PDF files (be aware that some PDF files just contain an image of the text).
Previews are created by transforming the content to a PDF using OpenOffice, then using the open source utility pdf2swf (part of SWFTools) to convert it to a Flash movie. Because the conversion goes through OpenOffice, the formatting of some MS Office documents will not be rendered correctly in the preview. In our Swift release it will be possible to incorporate a third-party transformer (such as MS Office) to help with these extractions and transformations.
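To make the chaining idea concrete, here is a minimal sketch of how a preview path could be found by stringing single-step transformers together. This is not Alfresco's code: the `TRANSFORMERS` registry, its entries, and the `find_chain` function are all hypothetical names I made up for illustration, and the real transformer selection is more involved.

```python
from collections import deque

# Hypothetical registry of single-step transformers, written as
# (source MIME type, target MIME type). Illustrative entries only,
# not Alfresco's actual configuration.
TRANSFORMERS = {
    ("application/msword", "application/pdf"),              # e.g. via OpenOffice
    ("application/pdf", "application/x-shockwave-flash"),   # e.g. via a PDF-to-SWF tool
    ("application/pdf", "text/plain"),                      # e.g. text extraction
}


def find_chain(source, target, transformers=TRANSFORMERS):
    """Return the shortest list of MIME types from source to target, or None.

    Uses a breadth-first search over the transformer edges, so the first
    chain found is also the one with the fewest conversion steps.
    """
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        # sorted() only makes the traversal order deterministic.
        for src, dst in sorted(transformers):
            if src == path[-1] and dst not in seen:
                seen.add(dst)
                queue.append(path + [dst])
    return None


if __name__ == "__main__":
    print(find_chain("application/msword", "application/x-shockwave-flash"))
    print(find_chain("application/vnd.visio", "application/x-shockwave-flash"))
```

Under these assumptions, a Word document reaches Flash in two steps by way of PDF, while a format with no registered transformer returns `None`, which corresponds to "no preview available".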
Alfresco can do image transformations, such as thumbnail generation. If Alfresco can convert an image format to PDF, then it will also display the image for preview in Share as described above. ImageMagick is the library used under the hood for transformations, and there are very few file formats that do not work.
Apache Tika is used for extracting metadata such as EXIF data.
Audio and Video
Alfresco Share does not currently have a previewer for audio and video, but will in the Swift release.
In the Swift release, one of three previewers will be selected depending on the content MIME type and browser capability. If the browser has Flash 10+ available, Strobe from the OSMF will be used. Flashfox will be used for older versions of Flash. If Flash is not available, or if neither player supports the content MIME type, HTML5 will be used to display the content. The available players will be configuration-driven. Previews use progressive downloading so that playback can start before the whole video has been downloaded.
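The selection rules above can be sketched as a small decision function. This is only my reading of the rules, not the Share implementation: the function name, the player labels, and the two sets of supported MIME types are hypothetical, and I am assuming that a browser with Flash 10+ can fall back to Flashfox when Strobe does not handle the type.

```python
def select_previewer(mime_type, flash_version, strobe_types, flashfox_types):
    """Pick a previewer name for the given content and browser.

    mime_type      -- MIME type of the content, e.g. "video/mp4"
    flash_version  -- major Flash version as an int, or None if Flash is absent
    strobe_types   -- MIME types the Strobe player is configured to handle (assumed)
    flashfox_types -- MIME types the Flashfox player is configured to handle (assumed)
    """
    candidates = []
    if flash_version is not None:
        if flash_version >= 10:
            # Flash 10+: prefer Strobe. Falling back to Flashfox here is an assumption.
            candidates.append(("strobe", strobe_types))
        candidates.append(("flashfox", flashfox_types))

    for name, supported in candidates:
        if mime_type in supported:
            return name

    # No Flash, or no Flash player supports this MIME type.
    return "html5"


if __name__ == "__main__":
    strobe = {"video/mp4"}       # made-up configuration
    flashfox = {"video/x-flv"}   # made-up configuration
    print(select_previewer("video/mp4", 10, strobe, flashfox))
    print(select_previewer("video/x-flv", 9, strobe, flashfox))
    print(select_previewer("video/mp4", None, strobe, flashfox))
```

Passing the supported types in as arguments mirrors the point that the available players are configuration-driven; which formats each player actually accepts is something I have not verified.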
There are no transformers bundled for audio and video, but the Share Extras project shows how to define transformers based on FFmpeg, which we can’t ship due to license concerns.
Though I have seen the audio and video preview demoed, I haven’t played with it yet so I can’t tell you what video and audio formats will be supported. Based on the underlying libraries, I would expect common modern formats will work just fine.
Most metadata extraction is done with Apache Tika. Only a limited number of extractors are wired out of the box, but the hooks are available for customers to wire additional extraction.
I hesitate to call them “unsupported” formats, because Alfresco can manage these files just fine (even really large AutoCAD files), but you won’t get previews or metadata extraction. Of course this is not a complete list, but I thought it would be helpful to specify common requests that we do not currently fulfill.
- MS Visio
- MS Project
- AutoCAD (though a partner plugin can provide preview, thumbnails, and metadata)
- iWork formats (though engineering is currently working on this one)
Though the details of this post will quickly become out of date, I hope that the information presented will continue to guide you in determining what capabilities Alfresco will have for a given format. Our open-source approach of leveraging the best underlying libraries the community has to offer means that our capabilities expand quickly. By looking at the tools we harness you can get a sense for what a specific release can do.
Feel free to list in the comments any additional formats you have tried so that others benefit. I apologize for the cumbersome comments system, as spammers have caused me problems in the past.