Search Manual / Version 2010
The Content Feeder can be configured to index content properties into special index fields. You can search for content in these fields if your Search Engine indexes these fields. To this end, the fields must be added to the file schema.xml in the Apache Solr config set for the Content Feeder in directory <solr-home>/configsets/content/conf. Please refer to the Apache Solr documentation for more information.
Note
Configuration not mandatory: By default, all content properties are indexed in the index field textbody. They are also indexed in fields whose name starts with cm and ends with the lowercase name of the property, if such fields exist in the index. For example, a property Headline is indexed in the field cmheadline. The configuration described in this section allows you to use different index field names.
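To illustrate the default naming rule, the following sketch shows a schema.xml field that would receive the values of a property named Headline without any further Content Feeder configuration. The field type and the attribute values are assumptions copied from the myfield example in this section; your schema may need different settings.

```xml
<!-- Default naming: "cm" + lowercase property name, here for a property "Headline" -->
<!-- type and attribute values are assumptions, not required settings -->
<field name="cmheadline" type="text_general" indexed="true" stored="true"/>
```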
The Content Feeder supports two types of field configuration, the PropertyField and the FeedablePopulator. A PropertyField maps a content property to an index field and controls whether the property value is also indexed in the field textbody. The more flexible FeedablePopulator interface allows you to populate a Feedable object from a given content.
If you configure a new field in the Solr schema.xml, you can search for text in that specific field. Note that searching in specific fields is not possible in the Site Manager and CoreMedia Studio, but only in custom search applications using CoreMedia APIs or native Search Engine APIs.
The following example adds a field with the name myfield to the Apache Solr schema.xml. Fields must be configured with the attributes indexed="true" to enable support for searching, and stored="true" (or at least docValues="true") to support partial updates. For more information, see the Apache Solr documentation.
<fields>
  ...
  <field name="myfield" type="text_general" stored="true" indexed="true"/>
</fields>
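The docValues alternative mentioned above could look like the following sketch. The field name myfield_exact is hypothetical, and the example assumes that your schema defines a non-analyzed string field type. Solr does not support docValues on analyzed TextField types such as text_general, so a different field type is used here.

```xml
<!-- Hypothetical field: values are not stored, but docValues="true" keeps partial updates possible -->
<!-- Assumes a field type "string" (solr.StrField) is defined in the schema -->
<field name="myfield_exact" type="string" indexed="true" stored="false" docValues="true"/>
```

Because a string field is not tokenized, searches in such a field match the whole value rather than individual words.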
Configuring PropertyField Beans
Beans of type PropertyField are configured in a customize:append element in file applicationContext.xml. A PropertyField bean requires the attributes name, doctype and property. Attribute name specifies the index field name as configured in the Solr schema.xml. Attribute doctype specifies the name of the content type and attribute property specifies the name of the content property, which is mapped to the index field. Furthermore, it's possible to configure whether the property's value should also be indexed in the field textbody. By default, it will be indexed in textbody, but you can disable this by setting the attribute textBody="false". Another optional attribute ignoreIfEmpty configures whether a missing or empty property value should be indexed. The default value is false, meaning an empty value is indexed.
Note that excluded content types will not be indexed even if a matching PropertyField is configured. The following example configures indexing of the property headline of content type Article into the index field myfield. It is not indexed in field textbody and empty values are ignored:
<customize:append id="addFeedableProperties" bean="contentConfiguration" property="propertyFields">
  <list>
    <bean class="com.coremedia.cms.feeder.content.PropertyField">
      <property name="name" value="myfield"/>
      <property name="doctype" value="Article"/>
      <property name="property" value="headline"/>
      <property name="textBody" value="false"/>
      <property name="ignoreIfEmpty" value="true"/>
    </bean>
  </list>
</customize:append>
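For comparison, the following minimal sketch sets only the three required attributes and relies on the defaults described above, so the value is also indexed in textbody and empty values are indexed. The index field mysubtitle, the property subtitle and the id of the customize:append element are hypothetical. The field would also have to exist in schema.xml, and the fragment assumes an enclosing Spring beans element that declares the customize namespace.

```xml
<customize:append id="addSubtitlePropertyField" bean="contentConfiguration" property="propertyFields">
  <list>
    <bean class="com.coremedia.cms.feeder.content.PropertyField">
      <!-- Hypothetical index field and content property -->
      <property name="name" value="mysubtitle"/>
      <property name="doctype" value="Article"/>
      <property name="property" value="subtitle"/>
      <!-- textBody and ignoreIfEmpty are omitted, so their defaults apply -->
    </bean>
  </list>
</customize:append>
```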
Configuring FeedablePopulator Beans
FeedablePopulator Spring beans are configured in the list property feedablePopulators and/or in the list property partialUpdateFeedablePopulators of Spring bean index using a customize:append element, for example in file applicationContext.xml. There are some existing FeedablePopulator public API classes that you may use. For example:
PropertyPathFeedablePopulator: Indexes specific values from a struct content property.
XPathFeedablePopulator: Extracts a text fragment from an XML content property.
ImageDimensionFeedablePopulator: Sets image attributes like image orientation, dimension, and size category.
ContentStatusFeedablePopulator: Sets the content status (approved, deleted, etc.).
Your own populator classes just need to implement the FeedablePopulator interface and can then be configured the same way. The method FeedablePopulator#populate will be called with a com.coremedia.cap.content.Content object. That is, the type parameter T of FeedablePopulator implementations must be Content or a supertype of Content.
Populators registered at property feedablePopulators of Spring bean index are called when a content gets added or updated and the whole content data is sent to the search engine. Populators registered at property partialUpdateFeedablePopulators are called for partial updates, when only content metadata is sent to the search engine. You can also register a custom FeedablePopulator at both list properties and use method isPartialUpdate of the passed-in Feedable to detect whether a partial update is being processed. Method getUpdatedAspects returns which aspects of the index document are changed with a partial update.
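Registering one populator at both list properties could be sketched as follows. The class com.example.feeder.MyFeedablePopulator, the bean id and the ids of the customize:append elements are hypothetical placeholders for your own implementation. The fragment assumes an enclosing Spring beans element that declares the customize namespace.

```xml
<!-- Hypothetical custom populator; replace the class with your own FeedablePopulator implementation -->
<bean id="myFeedablePopulator" class="com.example.feeder.MyFeedablePopulator"/>

<!-- Called when the whole content data is sent to the search engine -->
<customize:append id="addMyFeedablePopulator" bean="index" property="feedablePopulators">
  <list>
    <ref bean="myFeedablePopulator"/>
  </list>
</customize:append>

<!-- Called for partial updates, when only content metadata is sent -->
<customize:append id="addMyPartialUpdateFeedablePopulator" bean="index" property="partialUpdateFeedablePopulators">
  <list>
    <ref bean="myFeedablePopulator"/>
  </list>
</customize:append>
```

Defining the bean once and referencing it twice keeps a single instance, which can then branch on isPartialUpdate as described above.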
Caution
When you configure a FeedablePopulator for a Solr index field, you must make sure that the type of the index field matches the possible values. For example, you should never configure a PropertyPathFeedablePopulator or an XPathFeedablePopulator to set a numeric or date index field. Even if a nested struct property at the configured path is typically used for dates, some content may contain a text value and cause indexing errors. In such a case, you should use a custom FeedablePopulator implementation and check the value type instead.
PropertyPathFeedablePopulator
The PropertyPathFeedablePopulator is configured with a dot-separated property path to index a specific property value from a struct content property. The first name in the property path denotes the struct property itself while the following names specify nested properties of the struct. The constructor argument type selects the type of the content. The argument element maps to the field name in the index. Furthermore, the property textBody configures whether the value should also be indexed in the field textbody. By default, it is not indexed in the textbody field, but you can enable this by setting the property textBody to true.
The following example configures a populator to feed the index field author from a localSettings.metadata.author struct property path of Article contents.
<customize:append id="addAuthorFeedablePopulator" bean="index" property="feedablePopulators">
  <list>
    <ref bean="authorFeedablePopulator"/>
  </list>
</customize:append>

<bean id="authorFeedablePopulator" class="com.coremedia.cap.feeder.populate.PropertyPathFeedablePopulator">
  <constructor-arg index="0" name="type" value="Article"/>
  <constructor-arg index="1" name="propertyPath" value="localSettings.metadata.author"/>
  <constructor-arg index="2" name="element" value="author"/>
</bean>
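As a variant, the following sketch additionally sets the textBody property described above, so that the author value is also searchable through the textbody field. The bean id is hypothetical. The bean would still have to be registered at the index bean with a matching ref element, as in the example above.

```xml
<!-- Hypothetical bean id; register it with <ref bean="authorWithTextBodyFeedablePopulator"/> -->
<bean id="authorWithTextBodyFeedablePopulator"
      class="com.coremedia.cap.feeder.populate.PropertyPathFeedablePopulator">
  <constructor-arg index="0" name="type" value="Article"/>
  <constructor-arg index="1" name="propertyPath" value="localSettings.metadata.author"/>
  <constructor-arg index="2" name="element" value="author"/>
  <!-- Also index the value in textbody; without this property it is not indexed there -->
  <property name="textBody" value="true"/>
</bean>
```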
XPathFeedablePopulator
XPathFeedablePopulators extract the text of a fragment from an XML property. The fragment is specified with an XPath expression in the property XPath. The required property element maps to the field name in the index. The property contentType selects the type of the content and the property property selects the content property. Furthermore, it's possible to configure whether the property's value should also be indexed in the field textbody. By default, it will be indexed in textbody, but you can disable this by setting the property textBody to false. The namespaces property defines namespaces which can be used in the XPath expression.
The following example configures a populator to feed the index field tabletext from Text properties in Article contents.
<customize:append id="addFeedablePopulators" bean="index" property="feedablePopulators">
  <list>
    <bean class="com.coremedia.cap.feeder.populate.XPathFeedablePopulator">
      <property name="element" value="tabletext"/>
      <property name="contentType" value="Article"/>
      <property name="property" value="Text"/>
      <property name="textBody" value="false"/>
      <property name="XPath" value="/r:div/r:table"/>
      <property name="namespaces">
        <map>
          <entry key="r" value="http://www.coremedia.com/2003/richtext-1.0"/>
        </map>
      </property>
    </bean>
  </list>
</customize:append>
ImageDimensionFeedablePopulator
The ImageDimensionFeedablePopulator is used to detect the orientation (portrait, square, landscape), dimension (width, height) and size category (small, medium, large) of an image. After detection, the following index fields are set:
imageOrientation: portrait (value=0), square (value=1) or landscape (value=2).
imageSizeCategory: small (value=0), medium (value=1) or large (value=2).
imageWidth: image width in pixels.
imageHeight: image height in pixels.
imageMaxLength: maximum of imageWidth and imageHeight.
An image is in portrait mode if its height is larger than its width, and in landscape mode if its width is larger than its height. If width and height are equal, it is in square mode. An image is categorized as large if its width is larger than or equal to the configured largeWidth property and its height is also larger than or equal to the configured largeHeight property. Otherwise, it is categorized as medium if its width is larger than or equal to mediumWidth and its height is also larger than or equal to mediumHeight. The image is small if its width is smaller than mediumWidth or its height is smaller than mediumHeight.
To categorize image orientation (portrait, square, landscape) and image size (small, medium, large), some filter properties must be configured:
docType: the type of the content to be indexed, including subtypes
widthPropertyName: the property name of the content which holds the width value
heightPropertyName: the property name of the content which holds the height value
dataPropertyName: the property name of the content which holds the image data. The value of this property must be of type com.coremedia.cap.common.Blob.
You must set either widthPropertyName and heightPropertyName, or dataPropertyName, or both. If the two dimension properties do not exist, the blob data is read to determine the dimension.
largeWidth: lower bound width of large images
largeHeight: lower bound height of large images
mediumWidth: lower bound width of medium images
mediumHeight: lower bound height of medium images
The following example shows an ImageDimensionFeedablePopulator configuration.
<customize:append id="addFeedablePopulators" bean="index" property="feedablePopulators">
  <list>
    <bean class="com.coremedia.cap.feeder.populate.ImageDimensionFeedablePopulator">
      <property name="largeWidth" value="${feeder.populator.imageDimension.largeWidth}"/>
      <property name="largeHeight" value="${feeder.populator.imageDimension.largeHeight}"/>
      <property name="mediumWidth" value="${feeder.populator.imageDimension.mediumWidth}"/>
      <property name="mediumHeight" value="${feeder.populator.imageDimension.mediumHeight}"/>
      <property name="docType" value="${feeder.populator.imageDimension.docType}"/>
      <property name="widthPropertyName" value="${feeder.populator.imageDimension.widthPropertyName}"/>
      <property name="heightPropertyName" value="${feeder.populator.imageDimension.heightPropertyName}"/>
      <property name="dataPropertyName" value="${feeder.populator.imageDimension.dataPropertyName}"/>
    </bean>
  </list>
</customize:append>
The property values of the populator bean are filtered from a property file.
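To make the categorization rules concrete, the following sketch uses literal values instead of placeholders. All values are hypothetical: the thresholds, the content type Picture, the property names width, height and data, and the id of the customize:append element are assumptions chosen for illustration, not product defaults.

```xml
<customize:append id="addImageDimensionFeedablePopulator" bean="index" property="feedablePopulators">
  <list>
    <bean class="com.coremedia.cap.feeder.populate.ImageDimensionFeedablePopulator">
      <!-- Hypothetical lower bounds in pixels -->
      <property name="largeWidth" value="1024"/>
      <property name="largeHeight" value="768"/>
      <property name="mediumWidth" value="400"/>
      <property name="mediumHeight" value="300"/>
      <!-- Hypothetical content type and property names -->
      <property name="docType" value="Picture"/>
      <property name="widthPropertyName" value="width"/>
      <property name="heightPropertyName" value="height"/>
      <property name="dataPropertyName" value="data"/>
    </bean>
  </list>
</customize:append>
```

With these assumed thresholds, an 800x600 image would be indexed with imageOrientation=2 (landscape), imageSizeCategory=1 (medium, because 800 is below largeWidth but both dimensions reach the medium bounds) and imageMaxLength=800. A 300x500 image would get imageOrientation=0 (portrait) and imageSizeCategory=0 (small, because its width is below mediumWidth).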
ContentStatusFeedablePopulator
The ContentStatusFeedablePopulator classifies a content in one of four status categories:
0: in production (not approved and not deleted)
1: approved (place and content)
2: published (place and content)
3: deleted
After classification, the status value of the content is stored in the index field status. The following example shows a ContentStatusFeedablePopulator configuration:
<customize:append id="addFeedablePopulators" bean="index" property="feedablePopulators">
  <list>
    <bean class="com.coremedia.cap.feeder.populate.ContentStatusFeedablePopulator"/>
  </list>
</customize:append>
Caution
Note that the Content Feeder does not update already processed contents after you change the fields to index. A configuration change only affects newly processed contents. If you want to update all contents, you must reindex as described in Section 3.5, “Reindexing”.