Citeck ECOS Guide

General Information

Abstract

This document is a technical description of the Citeck ECOS system and of the Alfresco system on which Citeck ECOS is based. The versions described are Citeck ECOS 2.8.1 and Alfresco Community 4.2.c.

List of Abbreviations

Table 1. Abbreviations


Abbreviation | Expansion
-------------|----------------------------------------------------------
AD           | Microsoft Active Directory (Microsoft directory service)
SW           | Software
OS           | Operating System
DB           | Database
DBMS         | Database Management System
EDMS         | Electronic Document Management System
IS           | Information System
SSO          | Single Sign-On (transparent authentication)
JDBC         | Java Database Connectivity (Java DB access library)

General System Terms

Content Application Server is a Java Enterprise Edition software component that launches and runs Java applications.

Alfresco Structure

Three-tier Structure 

Alfresco has a three-tier architecture (Picture 1):

  • Physical storage;
  • Alfresco Content Application Server;
  • Alfresco Client. 

 

Picture 1. Alfresco Tiers’ Structure

The Physical Storage tier consists of a file system and a relational DB. The file system stores the content of files and document versions. All other information (metadata) is stored in the DB. The Alfresco DB schema is not a public API; it often changes from version to version, so it should not be used to interact with Alfresco. Use one of the public interfaces (Embedded API or Remote API, described below) instead of the DB schema.

Content Application Server offers advanced services for document management; the business logic of data management lives in this tier. It exposes a range of external protocols and APIs for working with different clients, such as CMIS, REST API, CIFS and IMAP. Physical Storage and Content Application Server are connected through DBMS (JDBC) and OS file system mechanisms. Any DBMS implementation and any file system can be used as Physical Storage, provided that the necessary JDBC driver and OS support are available.

The Alfresco Client tier comprises the various Alfresco client applications: browser-based web clients, mobile and desktop applications, and even access through file system mechanisms (CIFS, FTP, WebDAV).

The architecture described above makes it possible to implement different types of document-centric applications, such as Document Management (DM), Web Content Management (WCM), Records Management (RM) and others.

Structure of Content Application Server

Alfresco Content Application Server is a Java web application made up of several components. Its top-level structure is shown below (Picture 2).

Picture 2. Alfresco Content Application Server Structure

Content Application Server can be thought of as an extended DBMS offering a large set of document management services:

  • Content Services – modeling, search, version management, multilingual support, export/import, content conversion, content classification through categories and tags, metadata extraction;
  • Control Services – business processes, rules and policies, access permissions, access auditing, preview generation, publication;
  • Collaboration Services – favorites, likes, activity feed, wiki, blogs, forums.

These services are exposed through the following APIs and protocols:

  • Embedded APIs – interfaces for Java, JavaScript, FreeMarker (templates), and the content modeling and business process definition languages;
  • Remote APIs – interfaces for web services (SOAP), web scripts (REST) and the CMIS API over SOAP and REST;
  • Protocols – CIFS, WebDAV, FTP, IMAP, SharePoint.

This service architecture makes it possible to implement different types of modules and extensions against the standard set of interfaces:

  • content models;
  • business processes;
  • additional services for Java, JavaScript, FreeMarker;
  • rules, activities and policies;
  • additional web scripts (REST API).

Interaction with Adjacent Systems

Content Application Server relies on adjacent systems to implement some of its own services. In particular, adjacent systems are used for:

  • indexing and search (Lucene, SOLR);
  • authentication (LDAP, NTLM, Kerberos, External);
  • content conversion (LibreOffice, ImageMagick, SWF Tools).

Indexing and search are critical functions in most Alfresco applications. The search subsystem has two implementations: the embedded Lucene indexing library, or SOLR, a remote web application that itself uses Lucene. The two implementations offer different capabilities, which are compared below (Table 2).

Table 2. Comparison of Lucene and SOLR search implementations

Property                            | Lucene                       | SOLR
------------------------------------|------------------------------|------------------------------------
Indexing in transaction             | Supported, can be turned off | Not supported
Offloading from the Alfresco server | No, runs on the same server  | Yes, can be moved to another server
Clustering for better performance   | Alfresco clustering          | Independent SOLR clustering


Indexing in transaction (atomic content indexing) keeps the index up to date at all times, which some applications require. When an always-current index is not required, deferred indexing lowers transaction time and therefore improves performance. Indexing in transaction is practically the only advantage of Lucene; otherwise, SOLR is strongly recommended by Alfresco.

Whichever search implementation is used, the search subsystem provides several features that matter to applications:

  • Access permissions are verified before search results are shown to the client;
  • Several query languages are supported, including Lucene, CMIS and Full Text Search.

Authentication subsystems delegate user verification to external systems. A directory service such as Active Directory is most often used for integration with the enterprise infrastructure. With such integration, user information is imported from AD into Alfresco, and authentication is performed through one or more of the LDAP, NTLM and Kerberos protocols. The authentication protocols are compared below (Table 3).

Table 3. Comparison of authentication protocols

Property                         | LDAP             | NTLM     | Kerberos
---------------------------------|------------------|----------|---------
Protocol version                 | 3                | 1        | 5
Encryption                       | Only via SSL     | Built in | Built in
Transparent authentication (SSO) | No               | Yes      | Yes
CIFS authentication              | No               | Yes      | Yes
SharePoint authentication        | Yes, without SSO | Yes      | Yes


When NTLM is used, the Alfresco server acts as a “man in the middle”, in effect exploiting a weakness of the protocol. For this reason, only NTLM version 1 is supported. NTLM version 2 is not supported when working with AD, because that version of the protocol does not allow authentication through an intermediary service. Kerberos is the recommended authentication protocol for integration into an enterprise infrastructure.

Content conversion is used for generating previews and thumbnails, printing, image scaling and other purposes. LibreOffice converts office formats (and PDF), ImageMagick converts images (and PDF), and SWF Tools converts PDF to SWF for the preview.

The services mentioned above can run as Windows services, but the Alfresco distribution does not set them up as Windows services by default.

Necessary Resources for Alfresco Functioning

Hardware

Rough estimates of the Alfresco system requirements are as follows:

  • For 50 concurrent or 500 registered users: 1.5 GB JVM RAM, 2x server CPU (or 1x dual-core);
  • For 100 concurrent or 1000 registered users: 1.5 GB JVM RAM, 4x server CPU (or 2x dual-core);
  • For 200 concurrent or 2000 registered users: 2.5 GB JVM RAM, 8x server CPU (or 4x dual-core).

If the Alfresco server is to be deployed in a virtual environment, the figures above should be doubled.
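
The estimates above can be turned into a small lookup. This is only a sketch: the function name and the tier table are illustrative, and it assumes that "doubling" for virtual environments applies to both the RAM and the CPU figures.

```python
# Rough sizing tiers taken from the estimates in this section:
# (max concurrent users, JVM RAM in GB, server CPUs).
SIZING_TIERS = [
    (50, 1.5, 2),
    (100, 1.5, 4),
    (200, 2.5, 8),
]


def estimate_hardware(concurrent_users, virtual=False):
    """Return (jvm_ram_gb, cpus) for the smallest listed tier that fits."""
    for max_users, ram_gb, cpus in SIZING_TIERS:
        if concurrent_users <= max_users:
            # Assumption: both figures are doubled in a virtual environment.
            factor = 2 if virtual else 1
            return ram_gb * factor, cpus * factor
    raise ValueError("more than 200 concurrent users: outside the listed estimates")


print(estimate_hardware(80))                # (1.5, 4)
print(estimate_hardware(80, virtual=True))  # (3.0, 8)
```

For loads above the largest tier the sketch deliberately raises an error instead of extrapolating, since this guide gives no figures for that range.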

For detailed information please refer to: http://wiki.Alfresco.com/wiki/JVM_Tuning#JVM_Memory_and_CPU_Hardware_for_multiple_users

The DB size depends on the approximate number of document cards. For example: 1000 documents × 20 fields × 10 B average field size × 5 versions on average × indexing factor 2 = 1000 × 20 × 10 × 5 × 2 = 2 000 000 B = 2 MB.
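
The same arithmetic can be written as a reusable function. This is a minimal sketch; the function and parameter names are illustrative, and the default indexing factor of 2 is simply the value used in the example above.

```python
def estimate_db_size_bytes(documents, fields_per_document, avg_size_bytes,
                           avg_versions, indexing_factor=2):
    """Multiply the factors from the worked example to get a rough DB size in bytes."""
    return (documents * fields_per_document * avg_size_bytes
            * avg_versions * indexing_factor)


size = estimate_db_size_bytes(1000, 20, 10, 5)
print(size)           # 2000000
print(size / 10**6)   # 2.0 (MB, decimal megabytes as in the text)
```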

If business processes are started on documents, or signing history or other additional information is stored for them, this data should also be taken into account.

Disk space depends on the size of the document content. For example: 1000 documents × 1 MB average size × 5 versions on average = 1000 × 1 × 5 = 5 000 MB = 5 GB.

Disk space for the index (Lucene, SOLR) is calculated in the same way and is approximately one third of the content size.
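
The content and index estimates can be sketched in the same way. The function names are illustrative, and the one-third ratio is the approximation stated above, not a measured value.

```python
def estimate_content_size_mb(documents, avg_size_mb, avg_versions):
    """Rough disk space for document content, in MB."""
    return documents * avg_size_mb * avg_versions


def estimate_index_size_mb(content_size_mb, index_ratio=1 / 3):
    """Rough disk space for the search index: about one third of the content size."""
    return content_size_mb * index_ratio


content_mb = estimate_content_size_mb(1000, 1, 5)
index_mb = estimate_index_size_mb(content_mb)
print(content_mb)        # 5000
print(round(index_mb))   # 1667
```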

Software

On Windows, the Alfresco installer creates the following services:

  • AlfrescoPostgreSQL (starts PostgreSQL) – the DBMS serving the Alfresco DB;
  • AlfrescoTomcat (starts Apache Tomcat) – the servlet container serving the Alfresco web applications.

If these names are already in use, the installer chooses other names, for example AlfrescoPostgreSQL-1 or AlfrescoTomcatnum1.

On Linux, the installer creates a single Alfresco service, which launches all other Alfresco components (by default PostgreSQL and Tomcat).

Alfresco components use the default TCP ports listed below (Table 4).

Table 4. Default TCP ports of Alfresco components

Port  | Component   | Protocol    | Description
------|-------------|-------------|------------------------------------------------
5432  | PostgreSQL  | PostgreSQL  |
8080  | Tomcat      | HTTP        | Web applications port
8443  | Tomcat      | HTTPS       | Web applications port
8009  | Tomcat      | AJP         |
8005  | Tomcat      |             | Shutdown port
8000  | Java        |             | Java debug port (off by default)
7070  | VTI         | SharePoint  | Online editing
21    | Alfresco    | FTP         | May be turned off in alfresco-global.properties
445   | Alfresco    | CIFS        | May be turned on in alfresco-global.properties
50500 | Alfresco    | RMI         | Likewise for the other RMI ports
8100  | LibreOffice | LibreOffice |
 

If these ports are already in use at installation time, other ports should be specified for PostgreSQL, Tomcat and VTI. Alfresco ports are set in the alfresco-global.properties file. If ports need to be changed after installation, refer to the Component Setting Guide.
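
Before installation it can be useful to check which of the default ports are already taken. The following is a sketch only, not part of Alfresco: the function names are illustrative, the port list is copied from Table 4 (the Java debug port is omitted because it is off by default), and a port is treated as "in use" when something accepts TCP connections on it locally.

```python
import socket

# Default ports from Table 4 (labels are informal).
DEFAULT_PORTS = {
    5432: "PostgreSQL",
    8080: "Tomcat HTTP",
    8443: "Tomcat HTTPS",
    8009: "Tomcat AJP",
    8005: "Tomcat shutdown",
    7070: "VTI (SharePoint)",
    21: "FTP",
    445: "CIFS",
    50500: "RMI",
    8100: "LibreOffice",
}


def is_port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


def busy_ports(ports=DEFAULT_PORTS):
    """Return the subset of ports that are already taken."""
    return {port: name for port, name in ports.items() if is_port_in_use(port)}


if __name__ == "__main__":
    for port, name in sorted(busy_ports().items()):
        print(f"port {port} ({name}) is already in use")
```

Any port reported here would need a different value during installation, as described above.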

Methods of Alfresco extension and adjustment

Alfresco is distributed as one or more WAR files (WAR, Web Application Archive, is a Java archive format) that are installed on Java application servers (for example, Apache Tomcat or JBoss Application Server). Extensions and modules are usually packed into the web applications, while configuration elements are placed in the so-called shared classpath (for Tomcat, the folder tomcat/shared).

Alfresco supports the following packaging types for extensions:

  • unpacked files;
  • ZIP files;
  • JAR files;
  • AMP files.

JAR files are recommended for packaging simple extensions, AMP files for complex ones.

A JAR file (JAR – Java Archive) is a Java archive format supported by all Java application servers. JAR files are installed into the shared classpath or directly into the WEB-INF/lib folder of the web application. However, with a JAR the integrity of the installed files is not guaranteed, so it is recommended to pack files into AMP archives.

An AMP file (AMP – Alfresco Module Package) is the Alfresco module format: a renamed ZIP archive with special conventions for its internal structure. When an AMP file is installed, its content is merged into the WAR file.
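
Because an AMP is a ZIP archive, its content can be inspected with any ZIP tool. The sketch below uses Python's standard zipfile module; the helper name describe_amp and the module id example-module are made up for illustration, and the demo archive is built in memory rather than taken from a real module.

```python
import io
import zipfile


def describe_amp(amp_file):
    """List the entries of an AMP and report whether module.properties is present."""
    with zipfile.ZipFile(amp_file) as amp:
        names = amp.namelist()
    return {"entries": names, "has_module_properties": "module.properties" in names}


# Build a tiny in-memory archive for the demo; a real AMP would be opened by path.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as amp:
    amp.writestr("module.properties", "module.id=example-module\nmodule.version=1.0\n")
    amp.writestr("config/alfresco/module/example-module/module-context.xml", "<beans/>")
buffer.seek(0)

print(describe_amp(buffer))
```

A real module would be checked with describe_amp("path/to/module.amp"); the path here is a placeholder.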

Configuration elements are located in the shared classpath. In particular, alfresco-global.properties, the key Alfresco configuration file, is in the root of the shared classpath (for Tomcat, the folder tomcat/shared/classes). Configuration of other extensions is located in classpath:alfresco/extension (for the Alfresco repository) and classpath:alfresco/web-extension (for Alfresco Share). Commonly used configuration files are listed below:

  • alfresco/extension/*-log4j.properties – log4j logging configuration file;
  • alfresco/extension/subsystems/Authentication – configuration files of the authentication subsystem (for example, for interaction with MS Active Directory);
  • alfresco/extension/custom-vti* – configuration file of the VTI module (SharePoint Protocol support for online editing);
  • alfresco/web-extension/share-config-custom.xml – custom configuration of Share.

Alfresco Log Files (Event Log)

Alfresco log files are located in the Alfresco root folder, in tomcat/bin, or in the file system root (Linux), depending on the Alfresco version. Alfresco generates one log file per web application, so a standard installation should have three log files:

  • alfresco.log – Alfresco repository event log
  • share.log – Alfresco Share event log (web interface)
  • solr.log – SOLR event log (indexing service)

In addition, Apache Tomcat writes its own logs, located in tomcat/logs:

  • catalina.out on Linux, Alfrescotomcat-stdout.YYYY-MM-DD.log on Windows – Apache Tomcat standard output
  • localhost_access_log.YYYY-MM-DD.txt – served requests log

Rotation is configured for all these files: a new file is created and used every day, but old files are not deleted. To have old files deleted, the Alfresco and Tomcat settings must be changed.
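
An alternative that this guide does not describe is a scheduled cleanup script run outside Alfresco. The sketch below is one possible approach under stated assumptions: the function names and file name patterns are illustrative, age is judged by file modification time, and the default mode is a dry run that deletes nothing.

```python
import time
from pathlib import Path


def find_old_logs(log_dir, max_age_days, patterns=("*.log*", "*.txt"), now=None):
    """Return log files in log_dir whose modification time is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    old = set()
    for pattern in patterns:
        for path in Path(log_dir).glob(pattern):
            if path.is_file() and path.stat().st_mtime < cutoff:
                old.add(path)
    return sorted(old)


def delete_old_logs(log_dir, max_age_days=30, dry_run=True):
    """Delete old log files; with dry_run=True only report what would be deleted."""
    old = find_old_logs(log_dir, max_age_days)
    if not dry_run:
        for path in old:
            path.unlink()
    return old
```

Running delete_old_logs on the log folder first with dry_run=True shows which files would be removed before anything is deleted. The currently active log files are skipped because they have a recent modification time.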

Citeck ECOS Add On Modules

Three-Tier Module Structure

The goal of extension modules is to add new functionality to the Alfresco system. To maximize reuse of functionality, a three-tier structure is used:

  1. core modules – contain basic functionality used in many applications (as in Alfresco);
  2. application modules – contain functionality used only in certain records-management applications, for example contracts, attorneys or orders management;
  3. custom modules – contain functionality used only in one specific system deployment at a particular organization. Every organization has its own set of modules.

A standard system deployment needs core modules, custom modules and, optionally, one or more application modules.

The Alfresco infrastructure makes it possible to override the implementation and configuration of general modules in more specific ones. In particular, core and application modules can be overridden in a custom module.
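
The precedence between the three tiers can be pictured as layered lookups, where the most specific layer wins. The sketch below is only an analogy using Python's ChainMap; it does not reproduce how Alfresco actually resolves overrides, and all setting names and values are hypothetical.

```python
from collections import ChainMap

# Hypothetical settings contributed by each tier.
core = {"notification.template": "core-default", "numbering.template": "DOC-{n}"}
application = {"numbering.template": "CONTRACT-{n}"}
custom = {"notification.template": "acme-branded"}

# Lookups go custom -> application -> core, so more specific tiers override general ones.
effective = ChainMap(custom, application, core)

print(effective["notification.template"])  # acme-branded
print(effective["numbering.template"])     # CONTRACT-{n}
```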

Core Module Configuration and Functions

The Citeck ECOS system core includes the following modules:

  • 1st-override-repo;
  • 1st-override-share;
  • idocs-repo;
  • idocs-share.

The “-repo” modules are installed into the alfresco.war web application (Alfresco repository); the “-share” modules are installed into the share.war web application (Alfresco Share, the web interface).

The “1st-override-” modules are intended for overriding Alfresco files. The “idocs-” modules contain the basic functionality of the Citeck ECOS system core.

Citeck ECOS adds the following new functions to Alfresco:

Logs. Provide document-centric views and search for documents and other objects in the system. A distinguishing feature is that different types of content are handled, but only the relevant attributes are displayed in each case.

Structure. Models the structure of the organization where the system is used, by means of the built-in Alfresco group mechanism. A distinguishing feature is that groups can be marked with different tags corresponding to various officials and departments. The group mechanism makes it possible to grant rights to departments and officials and to assign tasks to officials.

Templates. Content templates generate document content from a template. Card templates generate related documents from templates, such as approval sheets, access history, etc. Notification templates send notifications for specified events. Automatic numbering templates generate document numbers from a specified template. Document templates in MS Word 2007 format are also supported.

Advanced process capabilities. Rights can be granted automatically for the duration of a task and are revoked when the task ends. Documents can be attached to tasks. Proxies are supported.

Lifecycles. A document lifecycle can be described as a set of document states and transitions. A distinguishing feature is that lifecycles are simple to implement and to extend, even after they have been started. The basic business processes that make up document lifecycles (approval, signing, etc.) are provided.

Reporting. Automatically uploads information to an external database to facilitate reporting with external tools.

Case management. Makes it possible to organize cases, i.e. special containers with arbitrary attachments.

Integration. Makes it possible to synchronize Alfresco directories with external sources such as SQL-compatible DBs, XML and flat files (for content import). Uploading information to external storages is possible as well.

Document infobox. Can be composed of different cardlets. Cardlets can be rearranged, and arbitrary display conditions can be set for each of them.

User interface. Various visual components improving system usability and adding functionality to Alfresco Share.
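
The lifecycle notion mentioned in the list above, a set of document states and transitions, can be illustrated with a small transition table. This is a generic sketch and not the Citeck ECOS lifecycle format: the state names, event names and the next_state helper are all hypothetical.

```python
# Hypothetical lifecycle: (current state, event) -> next state.
TRANSITIONS = {
    ("draft", "send_for_approval"): "on_approval",
    ("on_approval", "approve"): "on_signing",
    ("on_approval", "reject"): "draft",
    ("on_signing", "sign"): "signed",
}


def next_state(state, event):
    """Return the state reached from `state` on `event`, or fail if not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}") from None


state = "draft"
for event in ("send_for_approval", "approve", "sign"):
    state = next_state(state, event)
print(state)  # signed
```

Keeping the transitions in a plain table means a lifecycle can be extended by adding entries, without changing the code that walks it.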

Application Module Configuration and Options

Application modules include:

  • contracts management;
  • attorneys management;
  • orders management.

Application modules deploy sites (contracts site, attorneys site, etc.) and logs for managing records. They include model and form definitions for these records, special policies, preinstalled template definitions, and some default settings that can be overridden in custom modules.

Sync Service Description

The synchronization (sync) service synchronizes data between different data storages, such as:

  • Alfresco repository;
  • external databases;
  • folders with XML;
  • folders with arbitrary files.

The sync service uses the following abstractions:

  • Object DAO – a service providing access to a data storage; there are Source DAOs (data sources) and Target DAOs (data sinks);
  • Object Type – the type of object handled by an Object DAO. Every Object DAO has its own Object Type, for example repository objects, DB records, XML elements, etc.;
  • Object Info – information about an actual or potential object of an Object Type; an Object DAO can obtain Object Info from an object and create (or update) an object from Object Info;
  • Object Converter – converts Object Info from the Source DAO representation to the Target DAO representation;
  • Sync Configuration – synchronization parameters: where to read from (Source DAO), how to convert (Object Converter) and where to write to (Target DAO).

The sync configuration is shown below (Picture 3). Data flows through it as follows:

  1. Source DAO receives objects for sync (either all objects or those updated after the last sync);
  2. Source DAO converts these objects to Object Info;
  3. Object Converter maps Object Info between the Source DAO and the Target DAO; additional Object Converters can be used to convert individual fields;
  4. Target DAO creates or updates objects according to the received information.
                                                                                         


Picture 3. Sync configuration
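
The four-step data flow listed before Picture 3 can be sketched in a few lines. This is an illustration of the abstractions only, not the Citeck ECOS API: every class, method and field name below is hypothetical, and in-memory lists and dictionaries stand in for real storages.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectInfo:
    """Storage-neutral description of one object."""
    object_id: str
    attributes: dict = field(default_factory=dict)


class ListSourceDAO:
    """Source DAO over an in-memory list of rows (stands in for a DB or XML folder)."""
    def __init__(self, rows):
        self.rows = rows

    def read(self, modified_after=None):
        # Steps 1 and 2: select objects (all, or only recently updated) and wrap them.
        for row in self.rows:
            if modified_after is None or row["modified"] > modified_after:
                yield ObjectInfo(row["id"], dict(row))


class FieldMappingConverter:
    """Object Converter that renames source fields to target fields."""
    def __init__(self, mapping):
        self.mapping = mapping  # source field -> target field

    def convert(self, info):
        # Step 3: map Object Info from the source layout to the target layout.
        attrs = {target: info.attributes[source] for source, target in self.mapping.items()}
        return ObjectInfo(info.object_id, attrs)


class DictTargetDAO:
    """Target DAO that creates or updates entries in a dictionary."""
    def __init__(self):
        self.store = {}

    def save(self, info):
        # Step 4: create the object if missing, otherwise update it.
        self.store.setdefault(info.object_id, {}).update(info.attributes)


def run_sync(source, converter, target, modified_after=None):
    """One sync configuration: source -> converter -> target. Returns the object count."""
    count = 0
    for info in source.read(modified_after):
        target.save(converter.convert(info))
        count += 1
    return count


rows = [
    {"id": "c-1", "title": "Contract A", "modified": 10},
    {"id": "c-2", "title": "Contract B", "modified": 20},
]
target = DictTargetDAO()
synced = run_sync(ListSourceDAO(rows), FieldMappingConverter({"title": "name"}),
                  target, modified_after=15)
print(synced, target.store)  # 1 {'c-2': {'name': 'Contract B'}}
```

The three arguments of run_sync correspond to the three parts of a Sync Configuration, so swapping one storage for another only means passing a different DAO.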

The sync service supports linking of objects through special Object Converter implementations, which can find and/or create linked objects according to the configuration.

To increase import/export speed, the sync service supports multithreading and batching the sync of several objects together. The maximum number of simultaneous transactions (i.e. the number of data flows) and the maximum number of objects per transaction are configurable.
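
The two limits can be illustrated with a thread pool that processes fixed-size batches. This is a generic sketch, not the sync service implementation: the function and parameter names are hypothetical, and the sync_batch callback stands in for whatever runs one transaction.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def sync_in_batches(objects, sync_batch, max_transactions=4, max_objects_per_transaction=100):
    """Run sync_batch on each batch, with at most max_transactions batches in flight."""
    batches = list(chunked(objects, max_objects_per_transaction))
    with ThreadPoolExecutor(max_workers=max_transactions) as pool:
        # pool.map returns results in the same order as the batches.
        return list(pool.map(sync_batch, batches))


# Demo: the "transaction" just counts the objects it was given.
results = sync_in_batches(list(range(250)), sync_batch=len,
                          max_transactions=2, max_objects_per_transaction=100)
print(results)  # [100, 100, 50]
```

Larger batches reduce per-transaction overhead, while more simultaneous transactions increase parallelism at the cost of load on both storages.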