Importer Manual / Version 2406.1
The Extensible Markup Language, XML, is a standard published by the World Wide Web Consortium (W3C) for describing structured files and data independently of platform and software. Most content suppliers recognize XML as a simple yet powerful exchange format and deliver their content in XML.
CoreMedia naturally also supports XML and provides an importer that reads XML files of any format and maps them to your CoreMedia content types.
An XML import consists of the following steps:
Read in the XML files
Transform to CoreMedia XML (configurable)
Check consistency
Submit the documents to CoreMedia CMS
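The four steps can be pictured as a small pipeline in which the consistency check gates the submit step. The following is a minimal sketch under that assumption; `ImportPipeline` and its methods are hypothetical names, not part of the CoreMedia Importer API, and a plain list stands in for the CMS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Hypothetical sketch of the four import steps; none of these names are CoreMedia API.
final class ImportPipeline<S, D> {
    private final Supplier<List<S>> reader;              // step 1: read in the source files
    private final Function<List<S>, List<D>> transform;  // step 2: convert to the target format
    private final Predicate<List<D>> consistent;         // step 3: consistency check
    private final List<D> submitted = new ArrayList<>(); // step 4: stand-in for the CMS

    ImportPipeline(Supplier<List<S>> reader,
                   Function<List<S>, List<D>> transform,
                   Predicate<List<D>> consistent) {
        this.reader = reader;
        this.transform = transform;
        this.consistent = consistent;
    }

    /** Runs one import; returns false and submits nothing if the check fails. */
    boolean runOnce() {
        List<D> documents = transform.apply(reader.get());
        if (!consistent.test(documents)) {
            return false;
        }
        submitted.addAll(documents);
        return true;
    }

    List<D> submitted() {
        return submitted;
    }

    public static void main(String[] args) {
        ImportPipeline<String, String> pipeline = new ImportPipeline<>(
                () -> List.of("a", "b"),
                src -> src.stream().map(String::toUpperCase).collect(Collectors.toList()),
                docs -> docs.stream().noneMatch(String::isEmpty));
        System.out.println(pipeline.runOnce() + " " + pipeline.submitted()); // true [A, B]
    }
}
```

The point of the structure is that submission happens only after the whole batch has passed the check, so an inconsistent batch leaves the repository untouched.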
The importer can be started either as a command-line tool that runs a single import, or as a permanently running background application that checks at regular intervals whether new documents have arrived for import. If there are new documents, it reads them in. See Section 3.3, “Deployment and Operation of a Standalone Importer” and Section 3.4, “Deployment and Operation of an Importer in Docker” for details.
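The background mode boils down to polling an inbox at a fixed interval. The sketch below illustrates only that idea with plain Java SE; the class `InboxPoller`, the default directory name `inbox`, and the 30-second interval are assumptions for illustration, not the importer's actual configuration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the background mode: poll an inbox directory at a fixed interval.
final class InboxPoller {
    private final Path inbox;
    private final Set<Path> seen = new HashSet<>(); // files already handed to an import run

    InboxPoller(Path inbox) {
        this.inbox = inbox;
    }

    /** Returns the regular files that appeared since the previous call. */
    List<Path> pollOnce() throws IOException {
        List<Path> fresh = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(inbox)) {
            for (Path file : files) {
                if (Files.isRegularFile(file) && seen.add(file)) {
                    fresh.add(file);
                }
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        InboxPoller poller = new InboxPoller(Path.of(args.length > 0 ? args[0] : "inbox"));
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // The non-daemon scheduler thread keeps the JVM running, like a permanent service.
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                List<Path> fresh = poller.pollOnce();
                if (!fresh.isEmpty()) {
                    System.out.println("would import: " + fresh); // start an import run here
                }
            } catch (IOException e) {
                // Catching here keeps later polls scheduled after a failed one.
                System.err.println("poll failed: " + e.getMessage());
            }
        }, 0, 30, TimeUnit.SECONDS);
    }
}
```

Remembering processed files in memory keeps the sketch short; a real deployment would more likely move or delete files after a successful import so that the state survives a restart.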
For every import, the source documents must first be gathered. A source document can, for instance, exist as a file, be downloaded over the network, or be generated dynamically. See Section 4.2, “Source Documents” for details about the source documents. How source documents are gathered is configurable via a Java programming interface (hereafter called the Importer API, or API for short). The standard case, documents that exist as files, is already covered by the classes of the API. Source documents that were not originally created for CoreMedia import do not yet conform to the CoreMedia XML format that the importer supports directly.
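To illustrate the idea of gathering source documents from different origins, the sketch below wraps both files and dynamically generated content as JAXP `Source` objects. It uses only standard JAXP classes; `SourceCollector` and its methods are hypothetical and do not reproduce the classes of the Importer API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: collect source documents as JAXP Source objects.
final class SourceCollector {
    /** Every *.xml file in the directory, in name order, as a StreamSource. */
    static List<Source> fromDirectory(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".xml"))
                    .sorted()
                    .map(p -> (Source) new StreamSource(p.toFile())) // system ID = file URI
                    .collect(Collectors.toList());
        }
    }

    /** A source document generated in memory; the system ID identifies it in error messages. */
    static Source generated(String xml, String systemId) {
        return new StreamSource(new StringReader(xml), systemId);
    }
}
```

Treating every origin as a `Source` with a system ID means later steps need not care whether a document came from disk, the network, or memory.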
After the source documents have been read in, a configurable conversion into the CoreMedia XML format is carried out. See Section 4.1, “The CoreMedia XML Format” for details about the CoreMedia XML format. Out of the box, the importer supports XSLT transformations and transformations based on regular expressions. The Importer API also lets you plug in your own transformers that conform to the Java API for XML Processing (JAXP, a standard API from Oracle).
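The two built-in kinds of transformation can be approximated with plain Java SE: an XSLT stylesheet run through JAXP, and a regular-expression substitution on raw text. This is a minimal sketch, not the importer's configuration; the source element `<story>` and the target element `<document>` are made-up placeholders, since the real target elements are defined by the CoreMedia XML format.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Sketch: one XSLT step and one regex step, both plain Java SE; not the CoreMedia API.
final class TransformSteps {
    // Hypothetical stylesheet: maps a supplier's <story> to a placeholder <document> element.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/story'>"
      + "<document name='{@id}'><xsl:value-of select='headline'/></document>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet above to one source document. */
    static String xslt(String sourceXml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(sourceXml)), new StreamResult(out));
        return out.toString();
    }

    /** Regex step on raw text: replaces the HTML entity &nbsp;, which XML does not predefine. */
    static String regex(String text) {
        return text.replaceAll("&nbsp;", "&#160;");
    }
}
```

A regex step like this is typically useful before parsing, to repair input that would otherwise not be well-formed XML.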
Transformers can process the source documents either in Stream or in DOM format. The Stream format in particular makes it possible to plug in parsers for documents that are not in XML format. Furthermore, a transformer can either process each source document individually (as an XSLT transformer does) or transform all source documents in one step. The latter goes beyond what JAXP supports, so the Importer API defines some extensions for it. See Section 4.3, “XML Transformation” for details about transformers.
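As an illustration of a parser for non-XML input, the sketch below reads a made-up `key=value` line format from a character stream and builds a DOM that stays close to the source. It uses only standard JAXP and W3C DOM classes; `KeyValueParser` and the `record`/`field` element names are hypothetical and not part of the Importer API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical "thin parser": turns key=value lines (a made-up non-XML format) into a DOM
// that stays close to the source format; a later XSLT step can map it further.
final class KeyValueParser {
    /** Reads the stream line by line; the caller remains responsible for closing the Reader. */
    static DOMSource parse(Reader in) throws IOException, ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("record");
        doc.appendChild(root);
        BufferedReader lines = new BufferedReader(in);
        for (String line = lines.readLine(); line != null; line = lines.readLine()) {
            int eq = line.indexOf('=');
            if (eq <= 0) {
                continue; // skip blank or malformed lines
            }
            Element field = doc.createElement("field");
            // Keys go into an attribute, so they need not be valid XML element names.
            field.setAttribute("name", line.substring(0, eq).trim());
            field.setTextContent(line.substring(eq + 1).trim());
            root.appendChild(field);
        }
        return new DOMSource(doc);
    }
}
```

The input side is a character stream and the output is a `DOMSource`, so a DOM-based transformer can consume the result directly.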
When configuring an importer, you can chain multiple transformers. On import they are executed in sequence, each receiving the output of the previous transformer as its input document. For a non-XML format, for example, you can use a thin parser as the first transformer to convert the document into an XML format close to the source format. An XSLT transformer can then transform this XML into CoreMedia XML, and finally a CoreMedia filter can assign the document to the correct repository path.
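The chaining principle, where each transformer receives the previous result as input, can be sketched with plain JAXP by turning each `DOMResult` into the next `DOMSource`. This is an illustration under that assumption only; `TransformerChain` is a hypothetical name and not the Importer API's own chaining mechanism.

```java
import java.io.StringReader;
import java.util.List;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Node;

// Sketch of a transformer chain using plain JAXP: each step's DOMResult becomes the
// next step's DOMSource. Not the Importer API's own chaining mechanism.
final class TransformerChain {
    /** Runs the steps in order and returns the last result node (a Document). */
    static Node run(Source input, List<Transformer> steps) throws TransformerException {
        if (steps.isEmpty()) {
            throw new IllegalArgumentException("a chain needs at least one transformer");
        }
        Source current = input;
        Node last = null;
        for (Transformer step : steps) {
            DOMResult result = new DOMResult(); // JAXP creates a new Document to hold the output
            step.transform(current, result);
            last = result.getNode();
            current = new DOMSource(last);      // output of this step is input of the next
        }
        return last;
    }

    /** Compiles an XSLT stylesheet given as a string. */
    static Transformer xslt(String stylesheet) throws TransformerException {
        return TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
    }
}
```

Passing DOM trees between steps avoids serializing and re-parsing the intermediate XML at every stage.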
After the last transformer has run, the documents must be in CoreMedia XML format. The details of this format depend on the content type, so the transformers must be adjusted accordingly.
The configurable phase of the importer ends once the CoreMedia XML format has been created. The importer then validates the structure of the resulting content items. This includes consistency checks that go beyond the DTD, in particular whether the content items conform to their content types and whether the references between them are intact (referential integrity).
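Referential integrity can be illustrated with a simple rule: every link target must either be part of the current import batch or already exist in the repository. The sketch below assumes that rule and reduces documents to string IDs with outgoing links; `ReferenceCheck` is hypothetical, and the importer's actual checks may be stricter or differ in detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical referential-integrity check: every link target must either be part of the
// current import batch or already exist in the repository. IDs are plain strings here.
final class ReferenceCheck {
    /** Returns one "source -> target" entry per dangling link; empty means consistent. */
    static List<String> danglingLinks(Map<String, Set<String>> linksById, Set<String> repositoryIds) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Set<String>> doc : linksById.entrySet()) {
            for (String target : doc.getValue()) {
                if (!linksById.containsKey(target) && !repositoryIds.contains(target)) {
                    problems.add(doc.getKey() + " -> " + target);
                }
            }
        }
        return problems;
    }
}
```

Collecting all problems instead of stopping at the first one makes it easier to report every broken reference in a batch at once.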
If the documents prove consistent, the import is executed, that is, the documents are incorporated into the CoreMedia CMS.
Section 4.4, “Example” gives you a complete example of the import and transformation of an XML file.