Citeck ECOS Guide
General Information
Abstract
This document is a technical description of the Citeck ECOS system and of the Alfresco system on which Citeck ECOS is based. The versions described are Citeck ECOS 2.8.1 and Alfresco Community 4.2.c.
List of Abbreviations
Table 1. Abbreviations
Abbreviation | Abbreviation Expansion |
AD | Microsoft Active Directory (Microsoft Directory Service) |
SW | Software |
OS | Operating System |
DB | Database |
DBMS | Database Management System |
EDMS | Electronic Document Management System |
IS | Information System |
SSO | Single Sign-On (transparent authentication) |
JDBC | Java DataBase Connectivity (Java DB access library) |
General System Terms
Content Application Server is a Java Enterprise Edition software component that launches and executes Java applications.
Alfresco Structure
Three-tier Structure
Alfresco has a three-tier architecture (Picture 1):
- Physical storage;
- Alfresco Content Application Server;
- Alfresco Client.
Picture 1. Alfresco Tier Structure
The Physical Storage tier consists of a file system and a relational DB. The file system stores the content of files and of document versions. All other information (metadata) is stored in the DB. The Alfresco DB schema is not a public API: it often changes from version to version, so it should not be used to interact with Alfresco. It is recommended to use one of the public services (Embedded API, Remote API, see below) instead.
Content Application Server offers advanced document management services; the business logic of data management lives in this tier. It exposes several external protocols for working with different clients, such as CMIS, REST API, CIFS, IMAP, etc. Content Application Server connects to the Physical Storage through the DBMS (via JDBC) and the OS file system mechanisms. Any DBMS and any file system can serve as Physical Storage, provided that the necessary JDBC driver and OS support are available.
The Alfresco Client tier comprises the various Alfresco client applications, including browser-based web clients, mobile and desktop applications, and even access through file system mechanisms (CIFS, FTP, WebDAV).
The architecture described above makes it possible to implement different types of document-centric applications, such as Document Management (DM), Web Content Management (WCM), Records Management (RM) and others.
Structure of Content Application Server
Alfresco Content Application Server is a Java web application consisting of different components. Its structure at the highest level is presented below (Picture 2).
Picture 2. Alfresco Content Application Server Structure
Content Application Server can be regarded as an extended DBMS offering a large set of document management services:
- Content Services – modeling, search, version management, multilingual support, export/import, content conversion, content classification through categories and tags, metadata extraction;
- Control Services – business processes, rules and policies, access permissions, access auditing, preview generation, publication;
- Collaboration Services – favorites, likes, activity feed, wiki, blogs, forums.
These services are provided through the following APIs and protocols:
- Embedded APIs – interfaces for Java, JavaScript, FreeMarker (templates), and the content modeling and business process languages;
- Remote APIs – interface for web services (SOAP), web scripts (REST) and CMIS API for SOAP and REST;
- Protocols – CIFS, WebDAV, FTP, IMAP, SharePoint.
The application server architecture makes it possible to implement different types of modules and extensions on top of the standard set of interfaces:
- content models;
- business processes;
- additional services for Java, JavaScript, FreeMarker;
- rules, activities and policies;
- additional web scripts (REST API).
Interaction with Adjacent Systems
Content Application Server relies on the services of adjacent systems to implement its own services. Adjacent systems are used in particular for:
- indexing and search (Lucene, SOLR);
- authentication (LDAP, NTLM, Kerberos, External);
- content conversion (LibreOffice, ImageMagick, SWF Tools).
Indexing and search are extremely important functions in most Alfresco applications. There are two implementations of the search subsystem: the embedded Lucene indexing library, or the remote SOLR web application, which itself uses Lucene. The two implementations have different capabilities; they are compared below (Table 2).
Table 2. Comparison of the Lucene and SOLR search implementations
Property | Lucene | SOLR |
Indexing in transaction | Supported, can be turned off | Not supported |
Offloading the Alfresco server | No, runs on the same server | Yes, can be moved to another server |
Clustering for improving performance | Alfresco clustering | Independent SOLR clustering |
Indexing in transaction (atomic content indexing) keeps the index current at all times, which some applications require. When this is not necessary, deferred indexing reduces transaction time and therefore improves performance. Indexing in transaction is practically the only advantage of Lucene; otherwise, Alfresco strongly recommends SOLR.
Whichever search implementation is used, the search subsystem provides several features that are relevant for applications:
- access permissions are checked before search results are shown to the client;
- different search languages are supported, including Lucene, CMIS and Full Text Search.
The authentication subsystems delegate user verification to external systems. A directory service such as Active Directory is most frequently used for integration with the enterprise infrastructure. With such integration, user information is imported from AD into Alfresco, and authentication is performed through one or more of the LDAP, NTLM and Kerberos protocols. The authentication protocols are compared below (Table 3).
Table 3. Comparison of authentication protocols
Property | LDAP | NTLM | Kerberos |
Version | 3 | 1 | 5 |
Encryption | Only SSL | Embedded | Embedded |
Transparent authentication (SSO) | No | Yes | Yes |
CIFS authentication | No | Yes | Yes |
SharePoint authentication | Not SSO | Yes | Yes |
When NTLM is used, the Alfresco server acts as a “man in the middle”, in effect exploiting a vulnerability of the protocol. For this reason, only NTLM version 1 is supported. NTLM version 2 is not supported in interaction with AD, because it does not allow authentication to be passed through an intermediate service. Kerberos is the recommended authentication protocol for integration into the enterprise infrastructure.
Content conversion is used for preview and icon generation, printing, image scaling and other purposes. LibreOffice converts office formats (and PDF), ImageMagick converts images (and PDF), and SWF Tools converts PDF to SWF for the preview.
The services mentioned above can run as Windows services, but the Alfresco distribution does not set them up as Windows services by default.
Necessary Resources for Alfresco Functioning
Hardware
A rough estimate of the Alfresco system requirements is as follows:
For 50 concurrent or 500 registered users:
1.5 GB JVM RAM, 2x server CPU (or 1x dual-core)
For 100 concurrent or 1000 registered users:
1.5 GB JVM RAM, 4x server CPU (or 2x dual-core)
For 200 concurrent or 2000 registered users:
2.5 GB JVM RAM, 8x server CPU (or 4x dual-core)
If the Alfresco server is to be deployed in a virtual environment, the figures above should be doubled.
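The sizing tiers and the doubling rule for virtual environments can be expressed as a small lookup. This is only an illustrative sketch: the function name and the decision to pick the smallest tier that fits are assumptions, and the guide gives no figures beyond 200 concurrent users, so the helper refuses to extrapolate.

```python
# Rough sizing tiers from the estimate in this section:
# (max concurrent users, JVM RAM in GB, server CPUs).
SIZING_TIERS = [
    (50, 1.5, 2),
    (100, 1.5, 4),
    (200, 2.5, 8),
]


def estimate_hardware(concurrent_users, virtualized=False):
    """Return (jvm_ram_gb, cpus) for the smallest tier that fits.

    Hypothetical helper, not part of Alfresco. It raises ValueError
    above 200 concurrent users because no figure is given for that.
    """
    for max_users, ram_gb, cpus in SIZING_TIERS:
        if concurrent_users <= max_users:
            # Figures are doubled for a virtual environment.
            factor = 2 if virtualized else 1
            return ram_gb * factor, cpus * factor
    raise ValueError("no sizing figure for more than 200 concurrent users")


print(estimate_hardware(80))                    # physical server
print(estimate_hardware(80, virtualized=True))  # virtual environment
```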
For detailed information please refer to: http://wiki.Alfresco.com/wiki/JVM_Tuning#JVM_Memory_and_CPU_Hardware_for_multiple_users
The DB size depends on the approximate number of document cards. For example: 1000 documents × 20 fields × 10 B average field size × 5 versions on average × indexing factor 2 = 1000 × 20 × 10 × 5 × 2 = 2 000 000 B = 2 MB.
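The same estimate can be written as a one-line formula in code, which makes it easy to plug in other numbers. The function and parameter names below are illustrative assumptions; the 10 B figure is treated as the average size of one stored value.

```python
def estimate_db_size_bytes(documents, fields_per_document, avg_value_bytes,
                           avg_versions, indexing_factor=2):
    """Rough DB size estimate: the product of all factors, in bytes."""
    return (documents * fields_per_document * avg_value_bytes
            * avg_versions * indexing_factor)


size = estimate_db_size_bytes(
    documents=1000,
    fields_per_document=20,
    avg_value_bytes=10,
    avg_versions=5,
)
print(size, "B =", size / 1_000_000, "MB")
```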
If the documents also have started processes, recorded history or additional stored information, these should be taken into account as well.
The disk space depends on the size of the document content. For example: 1000 documents × 1 MB average size × 5 versions on average = 1000 × 1 × 5 = 5 000 MB = 5 GB.
The disk space for the index (Lucene, SOLR) is calculated in the same way and is approximately one third of the content size.
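A minimal sketch of the disk space estimate, with the index taken as roughly one third of the content size. The function names and the `ratio` parameter are assumptions made for illustration.

```python
def estimate_content_disk_mb(documents, avg_size_mb, avg_versions):
    """Disk space for document content, in MB."""
    return documents * avg_size_mb * avg_versions


def estimate_index_disk_mb(content_mb, ratio=1 / 3):
    """Disk space for the Lucene/SOLR index, about a third of the content."""
    return content_mb * ratio


content_mb = estimate_content_disk_mb(documents=1000, avg_size_mb=1, avg_versions=5)
index_mb = estimate_index_disk_mb(content_mb)
print(f"content: {content_mb} MB, index: about {index_mb:.0f} MB")
```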
Software
On Windows, the Alfresco installer creates the following services:
- AlfrescoPostgreSQL (starts PostgreSQL) – the DBMS serving the Alfresco DB;
- AlfrescoTomcat (starts Apache Tomcat) – the servlet container serving the Alfresco web applications.
If these names are already in use, other names must be chosen during installation, for example AlfrescoPostgreSQL-1 or AlfrescoTomcatnum1.
On Linux, the installer creates a single Alfresco service, which launches all other Alfresco components (by default PostgreSQL and Tomcat).
By default, Alfresco components use the TCP ports listed below (Table 4).
Table 4. Default TCP Ports of Alfresco Components
Port | Component | Protocol | Description |
5432 | PostgreSQL | PostgreSQL | |
8080 | Tomcat | HTTP | Web applications port |
8443 | Tomcat | HTTPS | Web applications port |
8009 | Tomcat | AJP | |
8005 | Tomcat | | Shutdown port |
8000 | Java | | Java debug port (off by default) |
7070 | VTI | SharePoint | Online editing |
21 | Alfresco | FTP | May be turned off in alfresco-global.properties |
445 | Alfresco | CIFS | May be turned on in alfresco-global.properties |
50500 | Alfresco | RMI | Applies equally to the other RMI ports |
8100 | LibreOffice | LibreOffice | |
If these ports are already in use, other ports for PostgreSQL, Tomcat and VTI should be specified during installation. Alfresco ports are set in the alfresco-global.properties file. To change the ports after installation, refer to the Component Setting Guide.
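Before installation it can be useful to check which of the default ports are already taken on the target host. The following is a minimal sketch using only the Python standard library; it is not an Alfresco tool, the helper names are hypothetical, and it only detects ports on which something currently accepts TCP connections.

```python
import socket

# Default ports from Table 4 (the Java debug port is off by default
# and is therefore left out).
DEFAULT_PORTS = {
    5432: "PostgreSQL",
    8080: "Tomcat HTTP",
    8443: "Tomcat HTTPS",
    8009: "Tomcat AJP",
    8005: "Tomcat shutdown",
    7070: "VTI (SharePoint)",
    21: "FTP",
    445: "CIFS",
    50500: "RMI",
    8100: "LibreOffice",
}


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


def busy_ports(ports=DEFAULT_PORTS, host="127.0.0.1"):
    """Return the subset of ports that are already taken."""
    return {port: name for port, name in ports.items()
            if port_in_use(port, host)}


if __name__ == "__main__":
    for port, name in sorted(busy_ports().items()):
        print(f"port {port} ({name}) is already in use")
```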
Methods of Alfresco extension and adjustment
Alfresco is distributed as one or more WAR files (WAR – Web Application Archive, a Java archive format), which are installed on a Java application server (for example, Apache Tomcat or JBoss Application Server). Extensions and modules are usually packaged inside the web applications, while configuration elements reside in the so-called shared classpath (for Tomcat this is the tomcat/shared folder).
Alfresco supports the following packaging types for extensions:
- unpacked files;
- ZIP files;
- JAR files;
- AMP files.
JAR files are recommended for packaging simple extensions, AMP files for complex ones.
A JAR file (JAR – Java Archive) is a Java archive format supported by all Java application servers. JAR files are installed into the shared classpath or directly into the “WEB-INF/lib” folder of the web application. However, with JAR files the integrity of the installed files is not guaranteed, and for this reason it is recommended to package files in AMP archives.
An AMP file (AMP – Alfresco Module Package) is the Alfresco module format: a renamed ZIP archive with special conventions for its internal structure. When an AMP file is installed, its content is merged into the WAR file.
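Because an AMP file is a ZIP archive, it can be inspected with any ZIP library. The sketch below reads the module descriptor from an archive. It assumes, beyond what this guide states, that the descriptor is a module.properties file in the archive root with keys such as module.id and module.version; the sample archive is built in memory and its values are made up.

```python
import io
import zipfile


def read_module_properties(amp_file):
    """Parse module.properties from an AMP (ZIP) archive into a dict.

    Simplified parser: handles only plain key=value lines and comments,
    not the full Java properties syntax.
    """
    with zipfile.ZipFile(amp_file) as amp:
        text = amp.read("module.properties").decode("latin-1")
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


# Build a tiny in-memory archive so the sketch runs without a real AMP file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as amp:
    amp.writestr("module.properties",
                 "# hypothetical module\nmodule.id=example-module\nmodule.version=1.0\n")
buffer.seek(0)
print(read_module_properties(buffer))
```

The same function accepts a path to an AMP file on disk instead of the in-memory buffer.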
Configuration elements reside in the shared classpath. In particular, alfresco-global.properties, the key Alfresco configuration file, is in the root of the shared classpath (for Tomcat this is the tomcat/shared/classes folder). The configuration of other extensions is under classpath:alfresco/extension (for the Alfresco repository) and classpath:alfresco/web-extension (for Alfresco Share). Commonly used configuration files are listed below:
- alfresco/extension/*-log4j.properties – log4j logging configuration files;
- alfresco/extension/subsystems/Authentication – configuration files of the authentication subsystem (for example, for interaction with MS Active Directory);
- alfresco/extension/custom-vti* – configuration files of the VTI module (SharePoint Protocol support for online editing);
- alfresco/web-extension/share-config-custom.xml – custom Share configuration.
Alfresco Log Files (Event Log)
Depending on the Alfresco version, the Alfresco log files are located in the Alfresco root folder, in tomcat/bin, or in the file system root (Linux). Alfresco generates one log file per web application, so a standard installation should have three log files:
- alfresco.log – Alfresco repository event log;
- share.log – Alfresco Share (web interface) event log;
- solr.log – SOLR (indexing service) event log.
In addition, Apache Tomcat writes its own logs to tomcat/logs:
- catalina.out on Linux, Alfrescotomcat-stdout.YYYY-MM-DD.log on Windows – standard output of Apache Tomcat;
- localhost_access_log.YYYY-MM-DD.txt – log of served requests.
Rotation is configured for all these files: every day a new file is created and used, but old files are not deleted. To have old files deleted, the Alfresco and Tomcat settings must be changed.
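As an alternative to changing the Alfresco and Tomcat settings, old rotated files can be removed by an external scheduled script. The following is a minimal sketch of that approach, not a feature of Alfresco or Tomcat. It assumes that rotated files carry a YYYY-MM-DD stamp in their name, so files without a date stamp (for example catalina.out) are never touched; the function names and the dry-run default are illustrative choices.

```python
import re
import time
from pathlib import Path

# Rotated files are assumed to carry a YYYY-MM-DD stamp in their name.
DATE_STAMP = re.compile(r"\d{4}-\d{2}-\d{2}")


def find_old_logs(log_dir, max_age_days, now=None):
    """Return date-stamped files in log_dir last modified before the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(
        path for path in Path(log_dir).iterdir()
        if path.is_file()
        and DATE_STAMP.search(path.name)
        and path.stat().st_mtime < cutoff
    )


def delete_old_logs(log_dir, max_age_days, dry_run=True):
    """Delete old rotated logs; with dry_run=True only report them."""
    victims = find_old_logs(log_dir, max_age_days)
    if not dry_run:
        for path in victims:
            path.unlink()
    return victims
```

A call such as `delete_old_logs("tomcat/logs", 30)` only lists the candidates; passing `dry_run=False` actually removes them.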
Citeck ECOS Add On Modules
Three-Tier Module Structure
The goal of extension modules is to add new functionality to the Alfresco system. To maximize reuse of functionality, a three-tier structure is used:
- core modules – contain basic functionality used in many applications (like Alfresco itself);
- application modules – contain functionality used only in certain records management applications, for example management of contracts, attorneys, orders, etc.;
- custom modules – contain functionality used only in one specific deployment in a particular organization. Every organization has its own set of modules.
A standard deployment requires core modules, custom modules and, optionally, one or more application modules.
The Alfresco infrastructure makes it possible to override the implementation and configuration of general modules in more specific ones. In particular, core and application modules can be overridden in a custom module.
Core Module Configuration and Functions
The Citeck ECOS system core includes the following modules:
- 1st-override-repo;
- 1st-override-share;
- idocs-repo;
- idocs-share.
“-repo” modules are installed into the alfresco.war web application (Alfresco repository); “-share” modules are installed into the share.war web application (Alfresco Share, the web interface).
“1st-override-” modules are intended for overriding Alfresco files. “idocs-” modules contain the basic functionality of the Citeck ECOS system core.
Citeck ECOS adds the following functions to Alfresco:
Logs. Provide a document-centric view and search for documents and other objects in the system. A distinguishing feature is that different content types are handled, but only the relevant attributes are displayed in each case.
Structure. Models the structure of the organization where the system is used by means of the built-in Alfresco group mechanism. A distinguishing feature is that groups can be marked with tags corresponding to various officials and departments. The group mechanism makes it possible to grant rights to departments and officials and to assign tasks to officials.
Templates. Content templates generate document content from a template. Card templates generate related documents such as approval sheets, access history, etc. Notification templates are used to send notifications for specified events. Automatic numbering templates generate document numbers according to a specified pattern. Templates in MS Word 2007 format are also supported.
Advanced process capabilities. Rights can be granted automatically for the duration of a task and revoked when the task ends. Documents can be attached to tasks. Proxy support.
Lifecycles. Document lifecycles are described as a set of document states and transitions. A distinguishing feature is that lifecycles are simple to implement and to extend, even after they have started. The basic business processes that make up document lifecycles (approval, signing, etc.) are already implemented.
Reporting. Automatic export of information to an external database to facilitate reporting with third-party tools.
Case management. The ability to organize cases, i.e. special containers with arbitrary attachments.
Integration. The ability to synchronize Alfresco directories with external sources such as SQL-compatible DBs, XML and flat files (for content import). Exporting information to external storage is possible as well.
Document infobox. Can be composed of different cardlets. Cardlets can be rearranged, and arbitrary conditions for displaying them can be specified.
User interface. Various visual components that improve system usability and add functionality to Alfresco Share.
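The lifecycle idea from the list above (a set of document states and transitions that can be extended even after the lifecycle has started) can be illustrated with a small state machine. This is a conceptual sketch only: the class, the state names and the event names are hypothetical and do not reflect the actual Citeck ECOS lifecycle format or API.

```python
class Lifecycle:
    """A document lifecycle as a table of (state, event) -> next state."""

    def __init__(self, initial_state, transitions):
        self.state = initial_state
        self.transitions = dict(transitions)

    def add_transition(self, state, event, next_state):
        # Transitions can be added at any time, even on a running lifecycle.
        self.transitions[(state, event)] = next_state

    def fire(self, event):
        key = (self.state, event)
        if key not in self.transitions:
            raise ValueError(
                f"no transition for {event!r} from state {self.state!r}")
        self.state = self.transitions[key]
        return self.state


contract = Lifecycle("draft", {
    ("draft", "send_for_approval"): "approval",
    ("approval", "approve"): "signing",
    ("approval", "reject"): "draft",
    ("signing", "sign"): "active",
})
contract.fire("send_for_approval")
contract.fire("approve")
# Extend the lifecycle after it has started.
contract.add_transition("signing", "recall", "draft")
contract.fire("recall")
print(contract.state)
```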
Application Module Configuration and Options
Application modules include:
- contracts management;
- attorneys management;
- orders management.
Application modules deploy sites (a contracts site, an attorneys site, etc.) and logs for managing records. They include model and form definitions for these records, special policies, preinstalled template definitions, and some default settings that can be overridden in custom modules.
Sync Service Description
The synchronization (sync) service synchronizes data between different data stores, such as:
- Alfresco repository;
- external databases;
- folders with XML;
- folders with arbitrary files.
The sync service uses the following abstractions:
- Object DAO – a service providing access to a data store; there are Source DAOs (data sources) and Target DAOs (data sinks);
- Object Type – the type of object an Object DAO works with. Every Object DAO has its own Object Type, for example repository objects, DB records, XML elements, etc.;
- Object Info – information about a real or potential object of an Object Type; an Object DAO can obtain Object Info from an object of its Object Type and create or update such an object from Object Info;
- Object Converter – converts Object Info from the Source DAO representation to the Target DAO representation;
- Sync Configuration – the synchronization parameters: the start point (Source DAO), how to convert (Object Converter) and the target point (Target DAO).
The scheme of a sync configuration is presented below (Picture 3). Data flows through it as follows:
- Source DAO retrieves the objects to sync (either all objects or those updated since the last sync);
- Source DAO converts these objects to Object Info;
- Object Converter maps the Object Info from the Source DAO representation to the Target DAO representation; additional Object Converter objects can be used to convert individual fields;
- Target DAO creates or updates objects according to the received information.
Picture 3. Sync configuration
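The data flow of a sync configuration can be sketched in a few lines. The classes below are hypothetical stand-ins written for illustration, not the actual Citeck ECOS interfaces: the source is an in-memory list instead of a database, Object Info is a plain dictionary, and the converter only renames fields.

```python
class ListSourceDAO:
    """Source DAO over an in-memory list of records (stand-in for a DB)."""

    def __init__(self, records):
        self.records = records

    def fetch(self, since=None):
        # Either all objects or only those updated after the last sync,
        # each returned as Object Info (here: a plain dict copy).
        return [dict(r) for r in self.records
                if since is None or r["updated"] > since]


class FieldMappingConverter:
    """Object Converter that renames source fields to target fields."""

    def __init__(self, mapping):
        self.mapping = mapping  # source field -> target field

    def convert(self, info):
        return {target: info[source]
                for source, target in self.mapping.items()}


class DictTargetDAO:
    """Target DAO that creates or updates objects in a dict keyed by id."""

    def __init__(self, key_field):
        self.key_field = key_field
        self.objects = {}

    def save(self, info):
        self.objects.setdefault(info[self.key_field], {}).update(info)


def run_sync(source, converter, target, since=None):
    """Run one sync pass and return the number of objects processed."""
    count = 0
    for info in source.fetch(since):
        target.save(converter.convert(info))
        count += 1
    return count


source = ListSourceDAO([
    {"id": 1, "title": "Contract A", "updated": 10},
    {"id": 2, "title": "Contract B", "updated": 20},
])
converter = FieldMappingConverter({"id": "doc_id", "title": "name"})
target = DictTargetDAO(key_field="doc_id")
print(run_sync(source, converter, target))            # full sync
print(run_sync(source, converter, target, since=15))  # incremental sync
```

Here the triple (source, converter, target) plays the role of a Sync Configuration.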
The sync service supports linking of objects by means of special Object Converter implementations that find and/or create the linked objects according to the configuration.
To increase import/export speed, the sync service supports multithreading and the merging of several objects into a single transaction. The maximum number of simultaneous transactions (i.e. the number of data flows) and the maximum number of objects per transaction are configurable.
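The two tuning parameters can be illustrated with a thread pool that processes objects in batches. This is a generic sketch of the technique, not the sync service's implementation: the function and parameter names are assumptions, and `sync_batch` stands for whatever callable synchronizes one batch inside one transaction.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def sync_in_batches(objects, sync_batch,
                    max_transactions=4, max_objects_per_transaction=100):
    """Process batches concurrently; each batch models one transaction.

    max_transactions caps the number of batches processed at once,
    max_objects_per_transaction caps the size of each batch.
    Results are returned in batch order.
    """
    batches = list(chunked(objects, max_objects_per_transaction))
    with ThreadPoolExecutor(max_workers=max_transactions) as pool:
        return list(pool.map(sync_batch, batches))


# Stand-in for real work: report how many objects each transaction handled.
print(sync_in_batches(list(range(250)), len))
```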