Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Table of Contents

Introduction

When the import and export are used

The ECM system is most often introduced into the existing infrastructure of the organization, and the existing data is already stored somewhere. Therefore, the issue is to import data from external sources and to export  data to external receivers.

...

Regular import and export tasks are typically required for normal interaction of the system with its environment. For example, it is possible to regularly import a reference from the available database or to load files from a folder. An example of a regular export task is to export the database for reporting. Citeck EcoS Citeck ECOS system contains a synchronization service designed for regular implementation of the import and export.

Managing the synchronization service

The synchronization service operates the synchronization configurations. Each configuration is a set of three components:

...

  1. The Source DAO receives a recurrent set of objects to synchronize (all or only updated since the last synchronization);
  2. The Source DAO converts the received objects in the Object Info format;
  3. Object Converter provides the coordination of Object Info formats between the Source DAO and Target DAO; additional objects of the Object Converter can be used to convert the individual fields;
  4. The Target DAO creates or updates the objects in accordance with the information received.

Description of the synchronization capabilities

In Citeck EcoSECOS, there are several implementations available for data sources and receivers:

...

  • constant value
  • string replacement by a regular expression
  • conversion of a map to a string by a template
  • conversion of a map to a repository object
  • conversion of a foreign key to a repository object
  • conversion of a repository object to a map
  • conversion of a map to another map
  • conversion of a string to a map with a single key
  • conversion of a string to date
  • current date
  • converters chain invocation

Examples of synchronization configurations

The simplest synchronization configuration

The synchronization configuration is performed in the Spring context-file (*-context.xml files). The reference import from an external database table in Alfresco is configured in the following example. Each object of the configuration (the source, the converter, the receiver and the configuration itself) is set in the Spring Context XML-notation.

We begin with the data source. First, the usual JDBC Data Source shall be configured; it specifies the parameters of the database connection:

...


Code Block
<bean id="test-import-datasource"
      class="org.springframework.jdbc.datasource.DriverManagerDataSource">
    <property name="driverClassName" value="org.postgresql.Driver"/>
    <property name="url" value="jdbc:postgresql://localhost:5432/testdata"/>
    <property name="username" value="alfresco"/>
    <property name="password" value="password"/>
</bean>

...

We configure the data receiver:

...


Code Block
<bean id="test-import-target" parent="ExternalSync.DAO.Node">
    <property name="nodeType"
        value="{http://www.citeck.ru/model/test/1.0}testRecord" />
    <property name="newNodesRootPath"
        value="/app:company_home/st:sites/cm:test/cm:documentLibrary" />
    <property name="keys">
        <set><value>extsync:id</value></set>
    </property>
</bean>

...


As can be seen from the configuration, the repository data type that corresponds to the records in the external table, as well as the path name, where new objects will be created, and a set of key properties are defined in the data source. Altogether, that works as follows: when the data to be stored in the repository are coming from an external source, we still do not know if they exist in the repository or not. The Lucene query is made for this purpose of the form:

...

Code Block
<bean id="test-import-config" parent="ExternalSync.Config">
    <property name="name" value="test-import" />
    <property name="sourceDAO" ref="test-import-source" />
    <property name="targetDAO" ref="test-import-target" />
    <property name="converter" ref="test-import-converter" />
</bean>

...


In addition to the source, receiver, and converter, its name is shown in the configuration. At synchronization starting, that name and the synchronization mode (full or differential) are pointed out.

Options to start the synchronization

There are the following options to start the synchronization:

...

When the synchronization starts the synchronization progress is captured in the log file as follows:

...


Code Block
test-import: starting full synchronization
test-import: current version: {version=1}
test-import: received 4 objects to update
test-import: synchronization finished with success, created: 0, updated: 4, deleted: 0 objects

...

Code Block
<bean id="test-import-job" class="org.alfresco.util.CronTriggerBean">
    <property name="jobDetail">
        <bean class="org.springframework.scheduling.quartz.JobDetailBean">
            <property name="jobClass">
                <value>ru.citeck.alfresco.acm.sync.ExternalSyncJob</value>
            </property>
            <property name="jobDataAsMap">
                <map>
                    <entry key="ExternalSyncService" 
                           value-ref="ExternalSyncService" />
                    <entry key="SyncConfigName" value="test-import" />
                    <entry key="FullSync" value="true" />
                </map>
            </property>
        </bean>
    </property>
    <property name="enabled" value="true" />
    <property name="scheduler" ref="schedulerFactory" />
    <property name="cronExpression" value="0 0 * * * ?" />
</bean>

...


Synchronization of linked reference tables

In the previous section, the property “test:author” was filled with a string from the “name” attribute of the external table. What if the “test: author” is not a property but association, and the string is some foreign key, for example, the user name. In such a case, the system shall find (or create) the linked object by a foreign key in the repository and create an association to the main object instead of setting the system property during the synchronization.

This can be done using the converter ExternalSync.Converter.Assoc. This converter allows to convert an arbitrary value to the repository object. As a preliminary, we need to define a separate receiver (Target DAO) for linked objects:

...


Code Block
<bean id="test-import-people" parent="ExternalSync.DAO.Node">
    <property name="nodeType"
        value="cm:person" />
    <property name="keys">
        <set><value>cm:userName</value></set>
    </property>
</bean>

...

Code Block
<bean id="test-import-people-converter"
      parent="ExternalSync.Converter.Assoc">
    <property name="targetDAO" ref="test-import-people" />
    <property name="converter">
        <bean parent="ExternalSync.Converter.ImportRecord">
            <property name="propertyMapping">
                <map><entry key="cm:userName" value="username" /></map>
            </property>
        </bean>
    </property>
</bean>

...


The ExternalSync.Converter.Assoc converter configuration is similar to the synchronization configuration: it identifies the receiver (targetDAO) and converter (converter). The source is not required because the data during the conversion is already retrieved from the source (in this case, one shall convert the foreign key to the repository object). The converter works as follows:

...

The configured converter test-import-people-converter waits for the map with “username” key at the input. It can be used as a part of the converter test-import-converter previously defined in order to organize an association:

...


Code Block
...
<property name="associationMapping">
    <map>
        <entry key="{http://www.citeck.ru/model/test/1.0}author"
               value="name" />
    </map>
</property>
<property name="associationConverters">
    <map>
        <entry key="{http://www.citeck.ru/model/test/1.0}author">
            <bean parent="ExternalSync.Converter.Sequence">
                <property name="converters">
                    <list>
                        <bean parent="ExternalSync.Converter.MakeMap">
                            <property name="key" value="username" />
                        </bean>
                        <ref bean="test-import-people-converter" />
                    </list>
                </property>
            </bean>
        </entry>
    </map>
</property>

Importing files from a folder

To import files from a folder, the source “ExternalSync.DAO.File” is used. For this source, one can specify the folder, from which files will be imported, as well as (optionally) the folder, into which the successfully imported files will be moved, and the folder, into which the files that cannot be imported will be moved.

...


Code Block
<bean id="test-import-file-source" parent="ExternalSync.DAO.File">
    <property name="sourceDir" value="/mnt/import-files/source"/>
    <property name="errorDir" value="/mnt/import-files/errors"/>
    <property name="successDir" value="/mnt/import-files/archive"/>
</bean>

...


Each object in this data source is a map with two keys:

...

In the simplest case, one can use propertyMapping, which maps these keys to the corresponding Alfresco properties:

...


Code Block
<property name="propertyMapping">
    <map>
        <entry key="cm:name" value="contentFileName" />
        <entry key="cm:content" value="contentFile"/>
    </map>
</property>

...


Further, one can continue to use this source on the analogy of the examples above.

...

  • Use single folder. The source stores the time when the file was modified as the version (file attribute in the file system), and the files that have changed later than last time are requested during the differential synchronization. This option is suitable if the number of files is small.
  • Use the folders successDir and errorDir. After the file is imported successfully it is moved to the folder successDir (if specified). If the import fails, the file is moved to the folder errorDir (if specified), along with the error description file (.error file). After fixing the errors in the file, it can be returned to the folder sourceDir and the import will be repeated. In this scenario, one shall use a full synchronization, i.e. read all files from the folder.

Importing XML files

To import XML files into EcoSECOS, a special data source is implemented. By analogy with the source of files (see previous section), one can specify the properties “successDir”, “errorDir”, and “sourceDir”, as well as a set of properties that defines the structure of the XML file:

...

 We assume that input XML files have the following format:

...


Code Block
<?xml version="1.0" encoding="UTF-8"?>
<documents>
    <document id="1">
        <author>Sergey</author>
        <text>Hi, I am Sergey</text>
    </document>
    <document ...>
        ...
    </document>
    ...
</documents>

...

Code Block
<bean id="test-import-xml-source" parent="ExternalSync.DAO.XML">
    <property name="sourceDir" value="/mnt/import-xml/source"/>
    <property name="errorDir" value="/mnt/import-xml/errors"/>
    <property name="successDir" value="/mnt/import-xml/archive"/>
    <property name="objectPath" value="/def:documents/def:document"/>
    <property name="attributesMap">
        <map>
            <entry key="id"   value="./@id"/>
            <entry key="name" value="./def:name/text()"/>
            <entry key="text" value="./def:text/text()"/>
        </map>
    </property>
    <property name="defaultNamespace"
              value="http://www.citeck.ru/schema/test/1.0" />
    <property name="xpathNamespaces">
        <map>
            <entry key="def" value="http://www.citeck.ru/schema/test/1.0" />
        </map>
    </property>
</bean>

...


The data source configured that way returns maps with “id”, “name”, and “text” keys, i.e. the same as JDBC source in the first example, so it is possible to determine the import configuration, which uses the same converter and receiver, as in the first example: 


Code Block
<bean id="test-import-xml-config" parent="ExternalSync.Config">
    <property name="name" value="test-import-xml" />
    <property name="sourceDAO" ref="test-import-xml-source" />
    <property name="targetDAO" ref="test-import-target" />
    <property name="converter" ref="test-import-converter" />
</bean>

...


Directory of synchronization components

Available synchronization components (sources, receivers, converters) are documented in the form of abstract or specific beans in the sync-services-context.xml file. Each of them contains some values by default, and also lists which properties should be specified for sure, and which can be ignored.

If the bean is declared abstract (abstract="true"), one shall create a child bean to use it:

...


Code Block
<bean id="..." parent="abstract-bean-id">
    <property name="..." value="..." />
    ...
</bean>

...

Code Block
<property name="..." ref="concrete-bean-id" />
<property name="...">
    <ref bean="concrete-bean-id" />
</property>
<property name="...">
    <map>
        <entry key="..." value-ref="concrete-bean-id" />
    </map>
</property>

...


Exporting data for reporting

In the EcoS ECOS system, the pre-defined configurations are done to export data to an external reporting database:

...

Using the DB access settings by default, the export occurs into the “alfrescoreporting” database on the same DBMS, with the same login and password as the primary database. To enable scheduled export, it is enough to specify in the configuration file alfresco-global properties:

 


Code Block
reporting.sync-nodes.enabled=true
reporting.sync-people.enabled=true

...

Code Block
CREATE DATABASE alfrescoreporting;
GRANT ALL ON DATABASE alfrescoreporting TO alfresco;
\c testdata
CREATE TABLE nodes (sys_node_uuid CHAR(36) PRIMARY KEY);
CREATE TABLE people (sys_node_uuid CHAR(36) PRIMARY KEY);

...


The format of the reporting tables is the following: a separate column in the table is made for each property, the column is called a property short name (all invalid characters are replaced with character _, for example, sys:node-uuid is displayed in the column sys_node_uuid). When new properties are occurred they are added as additional columns in the table. The mirroring properties are presented for associations in the system, such as the association letters:addressee will appear in column letters_addressee_added. Finally, the table has several special columns:

  • noderef contains nodeRef of an object;
  • parent contains nodeRef of a parent object;
  • type contains a short name of the type.

Tools for one-time import and export

Importing the organizational structure from an XML file

A Web script for a simple one-time organizational structure import from an XML file is implemented in the system. The input XML file shall be in the following format:

...


Code Block
<orgstruct>
    <record>
        <company>Рога и копыта</company>
        <branch>Филиал в г. Тула</branch>
        <department>Бухгалтерия</department>
        <employee>Бухгалтер</employee>
        <userid>kuznetsova</userid>
    </record>
    ...
</orgstruct>

...

Code Block
<config>
    <recordName>record</recordName>
    <groupTypes>
        <groupType id="company"              xmlField="company" 
                   shortName="{type}{index}" displayName="{name}"  
                   groupType="branch"        groupSubtype="company"    
                   scope="global" />
        <groupType ... />
        ...
    </groupTypes>
    <userTypes>
        <userType id="user" xmlField="userid" alfField="cm:userName"
                  scope="employee,manager" />
    </userTypes>
</config>

...


An example of a complete configuration of the organizational structure import can be viewed in the file

...

The ID of the loaded XMl file is specified in the attribute “nodeRef”; the number of the first entry to load (inclusive) is specified in the attribute “from”, the number of the last entry to load (not inclusive) is specified in the attribute “to”. The actual loading consists of sequential activation of Web scripts with different indexes:

 


Code Block
...&from=0&to=100
...&from=100&to=200
...&from=200&to=300
...&from=300&to=400
...

...


The import shall be stopped when the Web script returns an empty data structure, i.e. there are no entries with the specified indexes.

That import tool is one-time (i.e. it cannot be used for synchronization of the organizational structure on a regular basis). The first reason for this is that the group name, which can often vary, is used as a group ID. As a result, several different groups will be made for the same branch (role) that will lead to the incorrect organizational structure imported. Relocations of users and groups are also not tracked in this approach: if the user (group) has been moved to a different group, the user (group) will be in the old and new groups simultaneously as a result of the re-import.

Importing the organizational structure from Excel

The organizational structure also can be imported from an Excel file one-time. For this purpose, it is converted to an XML file and then the procedure described in the previous section is executed. The Excel file preview, similar to the previous XML file, is as follows:

...

First, it is necessary to describe the schema of the XML file in XSD format:

 


Code Block
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
   targetNamespace="http://www.citeck.ru/orgstruct/test/1.0"
   xmlns:dd="http://www.citeck.ru/orgstruct/test/1.0"
   elementFormDefault="qualified" version="0.1">
 
    <xs:complexType name="record">
        <xs:sequence>
            <xs:element name="company"     type="xs:string" minOccurs="0" />
            <xs:element name="branch"      type="xs:string" minOccurs="0" />
            <xs:element name="department"  type="xs:string" minOccurs="0" />
            <xs:element name="subdivision" type="xs:string" minOccurs="0" />
            <xs:element name="employee"    type="xs:string" minOccurs="0" />
            <xs:element name="userid"      type="xs:string" minOccurs="0" />
        </xs:sequence>
    </xs:complexType>
   
    <xs:element name="orgstruct">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="record" type="dd:record"
                            maxOccurs="unbounded" />
            </xs:sequence>
        </xs:complexType>
    </xs:element>
 
</xs:schema>

...

Next, it is necessary to remove the substrings ns1: in it and the XML file to import the organizational structure will be ready: it can be used to import the organizational structure, as described in the previous section.