
Importer Manual / Version 2010

4.3 XML Transformation

If your XML files were not created specifically for the CoreMedia import, they probably have a different format and must first be transformed into the CoreMedia XML format. The importer supports this with a configurable, multi-stage transformation.

If the configuration file defines no transformers, the sets of source documents delivered by next are imported directly. Of course, this only works if the documents are already CoreMedia XML documents that match the document types of the Content Server. Usually the documents must be transformed to obtain the correct format. This is the subject of the next section.

If your documents are not in XML format but have a regular text structure, you can create XML documents from the text using regular expressions in the style of the Perl 5 programming language. You can also use regular expressions to turn the contents of PCDATA sections into structured XML.
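As a rough illustration of the regular-expression technique, the following standalone Java sketch uses java.util.regex, whose syntax is largely modeled on Perl 5, to turn a plain-text record into XML. It is not the importer's own configuration syntax. The record layout (a Title line, an Author line, a blank line, then the body) and the element names article, title, author and body are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Standalone sketch, NOT the importer's configuration syntax.
// Hypothetical input layout: "Title: ..." line, "Author: ..." line,
// a blank line, then free body text.
public class RegexToXml {

    // '.' does not match line breaks, so groups 1 and 2 stay on one line;
    // [\s\S]* lets the body (group 3) span several lines.
    private static final Pattern RECORD = Pattern.compile(
            "Title:[ \\t]*(.*)\\RAuthor:[ \\t]*(.*)\\R\\R([\\s\\S]*)");

    // Escapes the characters that are special in XML character data.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Returns the XML form of the record, or null if the text does not match.
    static String toXml(String text) {
        Matcher m = RECORD.matcher(text);
        if (!m.matches()) {
            return null;
        }
        return "<article>"
                + "<title>" + escape(m.group(1).trim()) + "</title>"
                + "<author>" + escape(m.group(2).trim()) + "</author>"
                + "<body>" + escape(m.group(3).trim()) + "</body>"
                + "</article>";
    }

    public static void main(String[] args) {
        String text = "Title: Q3 results & outlook\nAuthor: J. Smith\n\nRevenue grew.\nCosts fell.";
        System.out.println(toXml(text));
    }
}
```

Text that does not follow the expected layout yields null here; a real setup would decide whether to skip or report such documents.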

If your XML documents do not yet conform to the CoreMedia XML format, they can be transformed with an XSLT style sheet. You only have to provide the style sheet; the importer carries out the transformation automatically.
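To show what such a style sheet does, here is a minimal standalone sketch that applies one through the standard JAXP API (javax.xml.transform). The importer performs this step itself once the style sheet is configured, so the Java code only stands in for that. Both the source vocabulary (story, headline, text) and the target structure (document type="Article") are hypothetical; the real target depends on the CoreMedia XML format and your document types.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Standalone JAXP sketch of a single XSLT step.
// Source and target vocabularies are hypothetical.
public class XsltSketch {

    // Hypothetical style sheet: maps <story><headline/><text/></story> onto a
    // hypothetical <document type="Article"> target element.
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
          + "<xsl:template match=\"/story\">"
          + "<document type=\"Article\">"
          + "<title><xsl:value-of select=\"headline\"/></title>"
          + "<text><xsl:value-of select=\"text\"/></text>"
          + "</document>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    // Applies STYLESHEET to one source document and returns the result as a string.
    static String transform(String sourceXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(sourceXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<story><headline>Hello</headline><text>World</text></story>"));
    }
}
```

Note that each run of this style sheet sees exactly one source document, which is the limitation discussed next.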

XSLT has its limits, however. It no longer helps once a transformation has to span multiple documents (for example, to establish relationships between articles and their teasers) or once the source documents are not XML at all. In addition, some transformations within a single XML document are awkward to express in XSLT because they run counter to its declarative paradigm. For such cases, the Importer API lets you integrate your own custom transformers into the importer.

A custom transformer can process each source document individually or the complete set at once. It can read a source document either as a Stream or as a DOM tree, and it can return the transformed document in either form. DOM access, however, requires the source document to be XML, whereas Stream access works for every document.
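The cross-document case can be sketched in plain Java with the standard DOM API. The class, its methods and the vocabulary (article/@id, teaser/@for, link/@href) are hypothetical and are NOT the CoreMedia Importer API; the sketch only shows why such a transformer needs the complete set of documents, and why converting a stream to a DOM tree fails for non-XML input.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical cross-document transformer; NOT the CoreMedia Importer API.
// Vocabulary (article/@id, teaser/@for, link/@href) is invented for illustration.
public class TeaserLinker {

    // Stream -> DOM conversion. Fails for input that is not well-formed XML,
    // which is why DOM access requires XML sources while stream access does not.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not an XML document", e);
        }
    }

    // Processes the complete set at once: a teaser can only be linked if the
    // article it refers to is visible, which a per-document XSLT run cannot see.
    static void link(List<Document> docs) {
        // Pass 1: collect the ids of all articles in the set.
        Set<String> articleIds = new HashSet<>();
        for (Document d : docs) {
            Element root = d.getDocumentElement();
            if ("article".equals(root.getTagName())) {
                articleIds.add(root.getAttribute("id"));
            }
        }
        // Pass 2: add a <link href="..."/> to each teaser whose target article exists.
        for (Document d : docs) {
            Element root = d.getDocumentElement();
            String target = root.getAttribute("for");
            if ("teaser".equals(root.getTagName()) && articleIds.contains(target)) {
                Element link = d.createElement("link");
                link.setAttribute("href", target);
                root.appendChild(link);
            }
        }
    }

    public static void main(String[] args) {
        Document article = parse("<article id=\"a1\"><title>Budget</title></article>");
        Document teaser = parse("<teaser for=\"a1\">Read about the budget</teaser>");
        link(Arrays.asList(article, teaser));
        System.out.println(teaser.getElementsByTagName("link").getLength()); // prints 1
    }
}
```

The two-pass structure (index first, then resolve) is one simple design choice; a real transformer might also have to handle teasers whose target article is missing from the current set.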

As already mentioned, you can chain multiple transformers to arrive at the desired result in CoreMedia XML. The order of the transformers is defined in the importer's configuration file, and at run time each transformer starts from the result of its predecessor. The first transformer works directly on the documents delivered by the document generator (see previous section). The position in the chain does not restrict how a transformer accesses its input (Stream or DOM, individual documents or the complete set). For example, the first transformer can return a deeply nested MultiResult of DOM trees, while the second processes individual Stream documents. The importer performs the necessary conversion automatically.
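The chaining principle can be modeled in a few lines of Java. This is a hypothetical sketch, not the importer's implementation: the names Doc, Stage, streamStage, domStage and run are invented, only single documents are handled (no MultiResult nesting), and conversion between text and DOM happens on demand, which mirrors the automatic reformatting described above.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical transformer chain; NOT the CoreMedia Importer API.
public class TransformerChain {

    // A document held either as text (stream form) or as a DOM tree.
    static final class Doc {
        private final String text;
        private final Document dom;
        Doc(String text) { this.text = text; this.dom = null; }
        Doc(Document dom) { this.text = null; this.dom = dom; }
        // Each accessor converts on demand, so a stage never cares which form it received.
        String asText() { return text != null ? text : serialize(dom); }
        Document asDom() { return dom != null ? dom : parse(text); }
    }

    // A stage consumes its predecessor's result and produces a new one.
    interface Stage { Doc apply(Doc in); }

    static Stage streamStage(UnaryOperator<String> f) {
        return in -> new Doc(f.apply(in.asText()));
    }

    static Stage domStage(UnaryOperator<Document> f) {
        return in -> new Doc(f.apply(in.asDom()));
    }

    // Runs the stages in configured order; an empty chain returns the input unchanged.
    static Doc run(List<Stage> stages, Doc input) {
        Doc current = input;
        for (Stage stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("DOM access requires an XML document", e);
        }
    }

    static String serialize(Document dom) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(dom), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<Stage> chain = Arrays.asList(
            // Stage 1 works on plain text (e.g. regex-based structuring).
            streamStage(s -> "<article><title>" + s.trim() + "</title></article>"),
            // Stage 2 works on the DOM tree parsed from stage 1's text output.
            domStage(d -> { d.getDocumentElement().setAttribute("type", "Article"); return d; }));
        System.out.println(run(chain, new Doc("  Hello  ")).asText());
    }
}
```

Converting lazily keeps each stage independent of its neighbors: a text stage followed by a DOM stage costs one parse, and two consecutive DOM stages share the same tree without reserializing it.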

The following sections deal with these transformation possibilities in detail.
