Importer Manual / Version 2204

4.4.2 XSLT Transformation

In this step, you create a style sheet that transforms XMLNewsStory documents into CoreMedia XML documents of the document type Text. The following example document shows which information from the source format should be transferred to the target format; the style sheet is then built step by step.

<nitf>
  <head>
    <title>Snow, Freezing Rain Batter U.S. Northeast</title>
    <base href="http://cool.dot.com/news/xmlnews.xml"/>
  </head>

  <body>
    <body.head>
      <hedline>
        <hl1>
          Snow, Freezing Rain Batter 
          <location>
            <country>U.S.</country>
            <region>Northeast</region>
          </location>
        </hl1>
      </hedline>
      <byline>
        <bytag>By Matthew Lewis</bytag>
      </byline>
      <dateline>
        <location>
          <city>HARTFORD</city> 
          ,
          <state>Conn.</state>
        </location>
        <story.date>Friday January 15 12:27 PM ET</story.date>
      </dateline>
    </body.head>
    <body.content>
      <p>Snow and freezing rain punished the 
        <location>
          northeastern
          <country>United States</country>
        </location> 
         for a second straight day on 
        <chron norm="19990115">Friday</chron>
         , causing at least five weather-related deaths, 
         closing airports and spreading misery from
        <location>
          <city>Washington</city>
           , 
          <state>D.C.</state>
        </location>
         , to
        <location>
          <country>Canada</country>
        </location>
         .
      </p>
    </body.content>
  </body>
</nitf>

Example 4.24. An XmlNews document

To keep the style sheet simple, it is assumed that some elements are always present, although the DTD declares them optional. The style sheet should perform the following obvious mappings:

The heading results from the contents of /nitf/body/body.head/hedline/hl1.
The contents of /nitf/body/body.content should be imported as text.
Importing a document requires a name and an ID. Because no element of the document has suitable content for this, the filename is used for both.

The content should not simply be adopted as plain text but be mapped sensibly to the structures of the coremedia-richtext-1.0.dtd DTD:

The <p> paragraphs of the XmlNews document are adopted 1:1 as <p> elements in coremedia-richtext-1.0.dtd.

While paragraphs and headings are standard building blocks of any document, the XmlNews inline markup within <p>, for example <location>, is application-specific. Therefore, coremedia-richtext-1.0.dtd has no adequate elements for it. Nevertheless, you want to keep the information:

The XmlNews inline markup is mapped to <span> elements whose class attribute is set to the name of the original element (such as location).

This should be enough to demonstrate the functionality. The remaining fields of the Text document are filled with default values.

Before you deal with the actual transformation, define a utility template and a variable that holds the filename:

<!-- Returns the part of $filename after the last "/" -->
<xsl:template name="fetchFilename">
  <xsl:param name="filename">unknown</xsl:param>
  <xsl:choose>
    <!-- A "/" is left: drop everything up to and including the first one, then recurse -->
    <xsl:when test="contains($filename,'/')">
      <xsl:call-template name="fetchFilename">
        <xsl:with-param name="filename">
          <xsl:value-of select="substring-after($filename,'/')"/>
        </xsl:with-param>
      </xsl:call-template>
    </xsl:when>
    <!-- No "/" left: the remainder is the filename -->
    <xsl:otherwise>
      <xsl:value-of select="$filename"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

<xsl:variable name="filename">
  <xsl:call-template name="fetchFilename">
    <xsl:with-param name="filename"> 
      <xsl:value-of select="/nitf/head/base/@href"/>
    </xsl:with-param>
  </xsl:call-template>
</xsl:variable>

Example 4.25. Filename

The href attribute of the element /nitf/head/base contains the complete URL of the document. The fetchFilename template calls itself recursively, each time cutting off everything up to and including the first "/", until only the filename is left.
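To make the recursion concrete, the following sketch calls fetchFilename with the URL from the example document as a literal string and lists the value of the parameter in each successive call. The call is only an illustration and is assumed to sit inside some template; it is not part of the style sheet built in this section.

```xml
<!-- Illustrative call; the inner single quotes make the value a string literal -->
<xsl:call-template name="fetchFilename">
  <xsl:with-param name="filename"
      select="'http://cool.dot.com/news/xmlnews.xml'"/>
</xsl:call-template>

<!-- Value of $filename in each successive call:
       http://cool.dot.com/news/xmlnews.xml
       /cool.dot.com/news/xmlnews.xml
       cool.dot.com/news/xmlnews.xml
       news/xmlnews.xml
       xmlnews.xml     (contains no "/", so this value is written out)
-->
```

Note that a URL ending in "/" would yield an empty string, because nothing remains after the last separator.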

Now you begin top-down with the transformation templates. Use the root as the entry point for generating the <coremedia> element:

<xsl:template match="/">
  <coremedia>
    <xsl:apply-templates select="nitf"/>
  </coremedia>
</xsl:template>

Example 4.26. coremedia element
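The listings in this section are fragments. A minimal sketch of the enclosing style sheet file might look as follows; the XSLT version 1.0 and the output encoding are assumptions, not settings taken from the manual.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Serialize the result as XML; the encoding is an assumption -->
  <xsl:output method="xml" encoding="UTF-8"/>

  <!-- Top-level declarations from this section go here, in any order:
       the fetchFilename template, the filename variable,
       and the match templates for "/", nitf, body and so on. -->

</xsl:stylesheet>
```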

The <nitf> element is mapped to a <document> element:

<xsl:template match="nitf">
  <document>
    <xsl:attribute name="type">Text</xsl:attribute>
    <xsl:attribute name="path">Test/Xmlnews</xsl:attribute>
    <xsl:attribute name="name">
      <xsl:value-of select="$filename"/>
    </xsl:attribute>
    <xsl:attribute name="id">
      <xsl:value-of select="$filename"/>
    </xsl:attribute>

    <xsl:apply-templates select="body"/>
  </document>
</xsl:template>

Example 4.27. nitf / document

You decided on the document type Text at the beginning. For simplicity, the fixed path Test/Xmlnews is set as the target directory. The name and ID are set to the filename extracted above.
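The same element can be written more compactly with attribute value templates, a standard XSLT feature. The following is a sketch of an equivalent variant, not the form used in the manual; it assumes the filename variable defined earlier.

```xml
<xsl:template match="nitf">
  <!-- {$filename} is an attribute value template: the expression in
       braces is evaluated and its string value becomes the attribute value -->
  <document type="Text" path="Test/Xmlnews"
      name="{$filename}" id="{$filename}">
    <xsl:apply-templates select="body"/>
  </document>
</xsl:template>
```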

Proceed down to <body> and generate a <version>.

<xsl:template match="body">
  <version>
    <xsl:attribute name="number">1</xsl:attribute>
    <xsl:apply-templates select="body.head/hedline/hl1"/>
    <xsl:apply-templates select="body.content"/>
    <xsl:element name="integer">
      <xsl:attribute name="name">Priority</xsl:attribute>
      <xsl:attribute name="value">42</xsl:attribute>
    </xsl:element>
    <xsl:element name="string">
      <xsl:attribute name="name">Source</xsl:attribute>
      <xsl:attribute name="value">known to editor</xsl:attribute>
    </xsl:element>
    <xsl:element name="date">
      <xsl:attribute name="name">AutoDeletedate</xsl:attribute>
    </xsl:element>
    <xsl:element name="date">
      <xsl:attribute name="name">AutoPdatum</xsl:attribute>
    </xsl:element>
    <xsl:element name="linklist">
      <xsl:attribute name="name">Images</xsl:attribute>
    </xsl:element>
  </version>
</xsl:template>

Example 4.28. body / version

The version number, given in the number attribute, is irrelevant for the import and is simply set to 1.

In the version, a corresponding field element must be generated for every field of the document type Text. Priority and Source are set to default values, while AutoDeletedate, AutoPdatum and Images are left empty. In this example, only the heading and the actual content are taken from the source document. For this purpose, the templates for body.head/hedline/hl1 and for body.content are applied.
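Because all element and attribute names of the field elements are fixed, they can also be written as literal result elements instead of xsl:element and xsl:attribute. The following sketch should produce the same output as the body template above; it is an alternative notation, not the version from the manual.

```xml
<xsl:template match="body">
  <version number="1">
    <!-- Fields taken from the source document -->
    <xsl:apply-templates select="body.head/hedline/hl1"/>
    <xsl:apply-templates select="body.content"/>
    <!-- Fields with default values -->
    <integer name="Priority" value="42"/>
    <string name="Source" value="known to editor"/>
    <!-- Fields left empty -->
    <date name="AutoDeletedate"/>
    <date name="AutoPdatum"/>
    <linklist name="Images"/>
  </version>
</xsl:template>
```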

<xsl:strip-space elements="hl1"/>

<xsl:template match="hl1">
  <string>
    <xsl:attribute name="name">Heading</xsl:attribute>
    <xsl:attribute name="value">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </string>
</xsl:template>

Example 4.29. Heading

The heading results directly from the textual content of the <hl1> element. Inline markup is not taken into consideration here.
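In the example document, the content of <hl1> is spread over several indented lines, so its string value can contain line breaks and runs of spaces. One possible variant, which is an assumption and not part of the original style sheet, collapses this whitespace with the standard XPath function normalize-space():

```xml
<xsl:template match="hl1">
  <!-- normalize-space() trims leading and trailing whitespace and
       replaces each internal run of whitespace with a single space -->
  <string name="Heading" value="{normalize-space(.)}"/>
</xsl:template>
```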

<xsl:template match="body.content">
  <text>
    <xsl:attribute name="name">Text</xsl:attribute>
    <div>
      <xsl:apply-templates select="p"/>
    </div>
  </text>
</xsl:template>

Example 4.30. Content

At this point, the transition from the CoreMedia DTD to coremedia-richtext-1.0.dtd occurs. The template generates the CoreMedia field element for the document field Text and the coremedia-richtext-1.0.dtd element <div>, which is filled with <p> elements.
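The listings in this section do not include a template that writes the <p> element itself; without one, the built-in rules would pass on only the text of each paragraph. A minimal sketch of such a template, assumed here rather than taken from the original style sheet, could be:

```xml
<xsl:template match="p">
  <!-- Adopt the paragraph 1:1 and continue with its text and child elements -->
  <p>
    <xsl:apply-templates/>
  </p>
</xsl:template>
```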

Because the XSLT built-in templates work through elements recursively and copy text, the style sheet already produces correct CoreMedia XML in this version. However, you still want to transform the inline markup into <span> elements, which requires a further template.

<xsl:template match="p/*">
  <span>
    <xsl:attribute name="class">
      <xsl:value-of select="local-name()"/>
    </xsl:attribute>
    <xsl:apply-templates/>
  </span>
</xsl:template>

Example 4.31. Inline Markup

If you are familiar with XPath, you will have noticed that match="p/*" handles only the direct child elements of <p>. This is deliberate: span elements in coremedia-richtext-1.0.dtd must not be nested, so nested markup cannot be taken into account with such simple means.

Our style sheet is ready now. Even with little XSLT experience, it should now be quite simple to obtain the author from the <bytag> element, for example, and place it in a string field element of your document.
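As a sketch of that extension: the field name Author below is hypothetical and would have to exist in your document type Text. The template writes the content of <bytag> unchanged, including the leading "By".

```xml
<!-- In the body template, inside <version>, add:
       <xsl:apply-templates select="body.head/byline/bytag"/> -->

<!-- New top-level template; "Author" is a hypothetical field name -->
<xsl:template match="bytag">
  <string name="Author" value="{normalize-space(.)}"/>
</xsl:template>
```

For the example document, this would add a string field with the value "By Matthew Lewis" to the version.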

To have the importer execute the style sheet automatically, enter it in the importer's configuration file:

import.transformer.20.class=XsltTransformerFactory
import.transformer.20.name=XmlNews to CoreMedia
import.transformer.20.property.stylesheet=/path/to/xmlnews.xsl

Example 4.32. Configuration

When the style sheet is applied to our XML example document, the following file results:

<?xml version="1.0" encoding="UTF-8"?>

<coremedia>
  <document type="Text"
      path="Test/Xmlnews" name="xmlnews.xml" id="xmlnews.xml">
    <version number="1">
      <string name="Heading"
          value="Snow, Freezing Rain Batter U.S.Northeast"/>
      <text name="Text">
        <div>
          <p>Snow and freezing rain punished the
            <span class="location">northeastern United States</span>
            for a second straight day on
            <span class="chron">Friday</span>, causing at least
            five weather-related deaths, closing airports and
            spreading misery from
            <span class="location">Washington, D.C.</span>, to
            <span class="location">Canada</span>.
          </p>
        </div>
      </text>
      <integer name="Priority" value="42"/>
      <string name="Source" value="known to editor"/>
      <date name="AutoDeletedate"/>
      <date name="AutoPdatum"/>
      <linklist name="Images"/>
    </version>
  </document>
</coremedia>

Example 4.33. A document
