Importer Manual / Version 2101
In this step, you create a style sheet that transforms XMLNewsStory documents into CoreMedia XML documents of the document type Text.
The following example document is used to determine which information from the source format should be transferred to the target format. The style sheet is then created on that basis.
<nitf>
  <head>
    <title>Snow, Freezing Rain Batter U.S. Northeast</title>
    <base href="http://cool.dot.com/news/xmlnews.xml"/>
  </head>
  <body>
    <body.head>
      <hedline>
        <hl1>
          Snow, Freezing Rain Batter
          <location>
            <country>U.S.</country>
            <region>Northeast</region>
          </location>
        </hl1>
      </hedline>
      <byline>
        <bytag>By Matthew Lewis</bytag>
      </byline>
      <dateline>
        <location>
          <city>HARTFORD</city>,
          <state>Conn.</state>
        </location>
        <story.date>Friday January 15 12:27 PM ET</story.date>
      </dateline>
    </body.head>
    <body.content>
      <p>Snow and freezing rain punished the
        <location>
          northeastern
          <country>United States</country>
        </location>
        for a second straight day on
        <chron norm="19990115">Friday</chron>,
        causing at least five weather-related deaths, closing airports
        and spreading misery from
        <location>
          <city>Washington</city>,
          <state>D.C.</state>
        </location>,
        to
        <location>
          <country>Canada</country>
        </location>.
      </p>
    </body.content>
  </body>
</nitf>
Example 4.24. An XmlNews document
To keep the style sheet simple, it is assumed that some elements are always present, although they are optional according to the DTD. The style sheet should perform the following straightforward mappings:
The heading results from the contents of /nitf/body/body.head/hedline/hl1.
The contents of /nitf/body/body.content should be imported as text.
To import a document, you need a name and an ID. Since the document contains no element with suitable content for this, you simply take the filename.
The content should not simply be adopted as plain text but be sensibly mapped to the structures of the coremedia-richtext-1.0.dtd DTD:
The <p> paragraphs of the xmlnews document are adopted 1:1 as <p> in coremedia-richtext-1.0.dtd.
While paragraphs and headings are standard building blocks of any document, the xmlnews inline markup within <p>, for example <location>, is application-specific. Therefore, coremedia-richtext-1.0.dtd has no equivalent elements for it. Nevertheless, you want to preserve the information:
The xmlnews inline markup is mapped to <span> elements whose class attribute is set to the name of the original element (such as location).
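As an illustration of this mapping, the following fragment places one piece of inline markup from the example document next to the rich text it is intended to become. The wrapper elements <illustration>, <source> and <target> are hypothetical and only keep the fragment well-formed; they belong to neither DTD.

```xml
<!-- Hypothetical wrapper, used only to show source and target side by side -->
<illustration>
  <!-- Source: xmlnews inline markup inside <p> -->
  <source>
    <p>for a second straight day on <chron norm="19990115">Friday</chron></p>
  </source>
  <!-- Target: the original element name is kept in the class attribute -->
  <target>
    <p>for a second straight day on <span class="chron">Friday</span></p>
  </target>
</illustration>
```

Note that in this simple mapping only the element name is preserved. Attributes of the original element, such as norm, are not carried over.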
This should be enough for an example of the functionality. The remaining components of our Text document will be filled with default values.
Before you deal with the actual transformation, define a utility template and a variable that stores the filename:
<!-- Recursively strips everything up to and including each "/" -->
<xsl:template name="fetchFilename">
  <xsl:param name="filename">unknown</xsl:param>
  <xsl:choose>
    <xsl:when test="contains($filename,'/')">
      <!-- Path separators remain: cut off one level and recurse -->
      <xsl:call-template name="fetchFilename">
        <xsl:with-param name="filename">
          <xsl:value-of select="substring-after($filename,'/')"/>
        </xsl:with-param>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <!-- No separator left: this is the bare filename -->
      <xsl:value-of select="$filename"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

<!-- Global variable holding the filename taken from the base URL -->
<xsl:variable name="filename">
  <xsl:call-template name="fetchFilename">
    <xsl:with-param name="filename">
      <xsl:value-of select="/nitf/head/base/@href"/>
    </xsl:with-param>
  </xsl:call-template>
</xsl:variable>
Example 4.25. Filename
The href attribute of the /nitf/head/base element contains the complete URL of the document. The fetchFilename template recursively cuts off one level of the path at each "/" character until only the filename is left.
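To try out the recursion in isolation, a small stand-alone style sheet can call fetchFilename with a literal URL. The following is a sketch under assumptions: the XSLT 1.0 wrapper, the text output method and the hard-coded parameter value are not part of the importer style sheet and serve only as a test harness.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Test harness (assumption): runs against any well-formed input document -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Same logic as the fetchFilename template of this section -->
  <xsl:template name="fetchFilename">
    <xsl:param name="filename">unknown</xsl:param>
    <xsl:choose>
      <xsl:when test="contains($filename,'/')">
        <xsl:call-template name="fetchFilename">
          <xsl:with-param name="filename"
                          select="substring-after($filename,'/')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$filename"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Successive values of $filename for the example URL:
         http://cool.dot.com/news/xmlnews.xml
         /cool.dot.com/news/xmlnews.xml
         cool.dot.com/news/xmlnews.xml
         news/xmlnews.xml
         xmlnews.xml                      (no "/" left, emitted) -->
  <xsl:template match="/">
    <xsl:call-template name="fetchFilename">
      <xsl:with-param name="filename"
                      select="'http://cool.dot.com/news/xmlnews.xml'"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

Here the parameter is passed through the select attribute of xsl:with-param instead of a nested xsl:value-of. Both forms hand the same string to the template.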
Now begin with the transformation templates, working top-down. Use the root as the entry point for generating the <coremedia> element:
<xsl:template match="/">
  <coremedia>
    <xsl:apply-templates select="nitf"/>
  </coremedia>
</xsl:template>
Example 4.26. coremedia element
The <nitf> element is mapped to a <document> element:
<xsl:template match="nitf">
  <document>
    <xsl:attribute name="type">Text</xsl:attribute>
    <xsl:attribute name="path">Test/Xmlnews</xsl:attribute>
    <xsl:attribute name="name">
      <xsl:value-of select="$filename"/>
    </xsl:attribute>
    <xsl:attribute name="id">
      <xsl:value-of select="$filename"/>
    </xsl:attribute>
    <xsl:apply-templates select="body"/>
  </document>
</xsl:template>
Example 4.27. nitf / document
You decided on the document type Text at the beginning. For simplicity, set the fixed path Test/Xmlnews as the target directory. Set the name and ID to the filename extracted earlier.
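The same <document> element can also be written more compactly with a literal result element and XSLT attribute value templates. The following sketch is intended as an equivalent alternative to the nitf template, not as an addition to it, since two templates matching nitf with the same priority would conflict.

```xml
<xsl:template match="nitf">
  <!-- Curly braces mark attribute value templates: the XPath expression
       inside is evaluated and its string value is inserted -->
  <document type="Text" path="Test/Xmlnews"
            name="{$filename}" id="{$filename}">
    <xsl:apply-templates select="body"/>
  </document>
</xsl:template>
```

Which form to use is largely a matter of taste. The xsl:attribute form is more verbose, but it can be wrapped in conditions such as xsl:if, which a literal attribute cannot.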
Proceed down to <body> and generate a <version>.
<xsl:template match="body">
  <version>
    <xsl:attribute name="number">1</xsl:attribute>
    <xsl:apply-templates select="body.head/hedline/hl1"/>
    <xsl:apply-templates select="body.content"/>
    <xsl:element name="integer">
      <xsl:attribute name="name">Priority</xsl:attribute>
      <xsl:attribute name="value">42</xsl:attribute>
    </xsl:element>
    <xsl:element name="string">
      <xsl:attribute name="name">Source</xsl:attribute>
      <xsl:attribute name="value">known to editor</xsl:attribute>
    </xsl:element>
    <xsl:element name="date">
      <xsl:attribute name="name">AutoDeletedate</xsl:attribute>
    </xsl:element>
    <xsl:element name="date">
      <xsl:attribute name="name">AutoPdatum</xsl:attribute>
    </xsl:element>
    <xsl:element name="linklist">
      <xsl:attribute name="name">Images</xsl:attribute>
    </xsl:element>
  </version>
</xsl:template>
Example 4.28. body / version
The version number (the number attribute) is irrelevant for the import and is simply set to 1.
In the version, a corresponding field element must be generated for every field of your document type Text. Set Priority and Source to default values, while AutoDeletedate, AutoPdatum and Images are left empty. In this example, only the heading and the actual content are taken from the source document. For this purpose, the templates for body.head/hedline/hl1 and for body.content are applied.
<xsl:strip-space elements="hl1"/>

<xsl:template match="hl1">
  <string>
    <xsl:attribute name="name">Heading</xsl:attribute>
    <xsl:attribute name="value">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </string>
</xsl:template>
Example 4.29. Heading
The heading results directly from the textual content of the <hl1> element. Inline markup is not taken into consideration here.
<xsl:template match="body.content">
  <text>
    <xsl:attribute name="name">Text</xsl:attribute>
    <div>
      <xsl:apply-templates select="p"/>
    </div>
  </text>
</xsl:template>
Example 4.30. Content
At this point, the transition from the CoreMedia DTD to coremedia-richtext-1.0.dtd occurs. The template generates the CoreMedia field element for the document field Text and the coremedia-richtext-1.0.dtd element <div>, which is filled with <p> elements.
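The rule that <p> paragraphs are adopted 1:1 can be written as a template of its own. The following is a minimal sketch rather than one of the numbered examples of this section. It assumes that no other rule in the style sheet already emits <p>. Without a matching rule, the XSLT built-in rules pass on only the text content of an element, not the element itself.

```xml
<!-- Sketch (assumption): copies each source paragraph as a rich text <p> -->
<xsl:template match="p">
  <p>
    <!-- Text and inline markup inside the paragraph are processed
         by the remaining templates and the built-in rules -->
    <xsl:apply-templates/>
  </p>
</xsl:template>
```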
Because the XSLT default templates work through elements recursively and copy text, the style sheet already produces correct CoreMedia XML in this version. However, you still want to transform the inline markup into <span> elements and need a further template for that.
<xsl:template match="p/*">
  <span>
    <xsl:attribute name="class">
      <xsl:value-of select="local-name()"/>
    </xsl:attribute>
    <xsl:apply-templates/>
  </span>
</xsl:template>
Example 4.31. Inline Markup
If you are familiar with XPath, you will have noticed that match="p/*" handles only the direct child elements of <p>. This is deliberate, because span elements in coremedia-richtext-1.0.dtd must not be nested, and therefore nested markup cannot be taken into account with such simple means.
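What happens to nested markup follows from the built-in template rules of XSLT 1.0. Written out as ordinary templates, they are equivalent to the sketch below. They are shown only for reference and do not need to be added to the style sheet, because the processor applies them implicitly to every node that has no matching template.

```xml
<!-- Built-in rule for the root and for elements:
     emit nothing for the node itself, continue with its children -->
<xsl:template match="*|/">
  <xsl:apply-templates/>
</xsl:template>

<!-- Built-in rule for text and attribute nodes: copy the string value -->
<xsl:template match="text()|@*">
  <xsl:value-of select="."/>
</xsl:template>
```

For a fragment such as <location> northeastern <country>United States</country> </location>, the outer <location> is a direct child of <p> and becomes a <span>. The nested <country> matches no template of the style sheet, so only its text is passed through.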
The style sheet is now complete. Even if you have little experience with XSLT, it should now be quite simple to obtain, for example, the author from the <bytag> element and place it in a string field element of your document.
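Such an extension could look like the following sketch. The field name Author is hypothetical: it assumes that the document type Text has been given a string field of that name, which is not the case in the example above.

```xml
<!-- Sketch only: "Author" is a hypothetical field of the document type.
     To activate the template, the body template would additionally need
       <xsl:apply-templates select="body.head/byline/bytag"/>
     inside its <version> element. -->
<xsl:template match="bytag">
  <!-- normalize-space() trims leading and trailing whitespace and
       collapses internal runs of whitespace into single spaces -->
  <string name="Author" value="{normalize-space(.)}"/>
</xsl:template>
```

For the example document, the value attribute would contain the text of <bytag>, that is, By Matthew Lewis.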
To have the importer apply the style sheet automatically, enter it in the importer's configuration file:
import.transformer.20.class=XsltTransformerFactory
import.transformer.20.name=XmlNews to CoreMedia
import.transformer.20.property.stylesheet=/path/to/xmlnews.xsl
Example 4.32. Configuration
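The file referenced by the stylesheet property has to be a complete style sheet, whereas the examples in this section show only individual top-level elements. The following skeleton sketches how they might be assembled. The version="1.0" attribute is an assumption, and only the entry-point template is repeated in full.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of the overall file layout; XSLT version 1.0 is an assumption -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Top-level declarations such as xsl:strip-space, the fetchFilename
       template and the filename variable are direct children of
       xsl:stylesheet -->
  <xsl:strip-space elements="hl1"/>

  <!-- Entry point, as in Example 4.26 -->
  <xsl:template match="/">
    <coremedia>
      <xsl:apply-templates select="nitf"/>
    </coremedia>
  </xsl:template>

  <!-- The remaining templates (nitf, body, hl1, body.content, p/*)
       follow here as further children of xsl:stylesheet -->

</xsl:stylesheet>
```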
When the style sheet is applied to our XML example document, the following file results:
<?xml version="1.0" encoding="UTF-8"?>
<coremedia>
  <document type="Text" path="Test/Xmlnews" name="xmlnews.xml" id="xmlnews.xml">
    <version number="1">
      <string name="Heading" value="Snow, Freezing Rain Batter U.S.Northeast"/>
      <text name="Text">
        <div>
          <p>Snow and freezing rain punished the
            <span class="location">northeastern United States </span>
            for a second straight day on <span class="chron">Friday</span>,
            causing at least five weather-related deaths, closing airports
            and spreading misery from
            <span class="location"> Washington, D.C.</span>, to
            <span class="location">Canada</span>.
          </p>
        </div>
      </text>
      <integer name="Priority" value="42"/>
      <string name="Source" value="known to editor"/>
      <date name="AutoDeletedate"/>
      <date name="AutoPdatum"/>
      <linklist name="Images"/>
    </version>
  </document>
</coremedia>
Example 4.33. A document