Search Manual / Version 2204

4.2.2.3 Configuring Fields to Index in

The Content Feeder can be configured to index content properties into special index fields. You can search for content in these fields if your Search Engine indexes these fields. To this end, the fields must be added to the file schema.xml in the Apache Solr config set for the Content Feeder in directory <solr-home>/configsets/content/conf. Please refer to the Apache Solr documentation for more information.

Note

Configuration not mandatory: By default, all content properties are indexed in the index field textbody. They are also indexed in fields whose name starts with cm and ends with the lowercase name of the property - if such fields exist in the index. For example, a property Headline is indexed in the field cmheadline. This configuration allows you to use different index field names.
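As a minimal sketch of this default naming, the following schema.xml entry would let the Content Feeder index the Headline property into its own field without any further Feeder configuration. The field type text_general is an assumption taken from the schema example in this section; use a type that exists in your config set.

```xml
<fields>
  ...
  <!-- "cm" + lowercase property name "headline" -->
  <field name="cmheadline" type="text_general"
         stored="true" indexed="true"/>
</fields>
```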

The Content Feeder supports two types of field configuration, the PropertyField and the FeedablePopulator. A PropertyField maps a content property to an index field and specifies whether the property value should also be indexed in the field textbody. The more flexible FeedablePopulator interface allows you to populate a Feedable object from a given content.

If you configure a new field in the Solr schema.xml, you can search for text in that specific field. Note that searching in specific fields is not possible in the Site Manager or CoreMedia Studio, but only in custom search applications that use CoreMedia APIs or native Search Engine APIs.

The following example adds a field with the name myfield to the Apache Solr schema.xml. Fields must be configured with the attribute indexed="true" to enable support for searching, and with stored="true" (or at least docValues="true") to support partial updates. For more information, see the Apache Solr documentation.

<fields>
  ...
  <field name="myfield" type="text_general"
                        stored="true" indexed="true"/>
</fields>
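For comparison, a field that relies on docValues="true" instead of stored="true" might look like the following sketch. The field name mykeyword is hypothetical, and the sketch assumes that your schema defines a field type named string (solr.StrField). Analyzed text types such as solr.TextField typically do not support docValues, so check the Apache Solr documentation for the type you use.

```xml
<fields>
  ...
  <!-- Hypothetical field: not stored, but docValues keeps the
       value available for partial updates -->
  <field name="mykeyword" type="string"
         indexed="true" stored="false" docValues="true"/>
</fields>
```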

Configuring PropertyField Beans

Beans of type PropertyField are configured in a customize:append element in the file applicationContext.xml. A PropertyField bean requires the attributes name, doctype, and property. The attribute name specifies the index field name as configured in the Solr schema.xml, doctype specifies the name of the content type, and property specifies the name of the content property that is mapped to the index field. You can also configure whether the property's value should additionally be indexed in the field textbody. By default, it is indexed in textbody, but you can disable this by setting the attribute textBody="false". The optional attribute ignoreIfEmpty configures whether a missing or empty property value is ignored. The default value is false, meaning that an empty value is indexed.

Note that excluded content types will not be indexed even if a matching PropertyField is configured. The following example configures indexing of the property headline of content type Article into the index field myfield. It is not indexed in field textbody and empty values are ignored:

<customize:append id="addFeedableProperties" 
bean="contentConfiguration" property="propertyFields">
  <list>
    <bean class="com.coremedia.cms.feeder.content.PropertyField">
      <property name="name" value="myfield"/>
      <property name="doctype" value="Article"/>
      <property name="property" value="headline"/>
      <property name="textBody" value="false"/>
      <property name="ignoreIfEmpty" value="true"/>
    </bean>
  </list>
</customize:append>
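When the defaults are acceptable, only the three required attributes need to be set. The following sketch assumes a hypothetical index field mytitle and a hypothetical property title of content type Article. Because textBody and ignoreIfEmpty are omitted, the value is also indexed in textbody and empty values are indexed.

```xml
<customize:append id="addTitlePropertyField"
 bean="contentConfiguration" property="propertyFields">
  <list>
    <bean class="com.coremedia.cms.feeder.content.PropertyField">
      <!-- Index field name; must exist in the Solr schema.xml -->
      <property name="name" value="mytitle"/>
      <property name="doctype" value="Article"/>
      <property name="property" value="title"/>
      <!-- textBody defaults to true, ignoreIfEmpty to false -->
    </bean>
  </list>
</customize:append>
```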

Configuring FeedablePopulator Beans

FeedablePopulator Spring beans are configured in the list property feedablePopulators and/or in the list property partialUpdateFeedablePopulators of Spring bean index using a customize:append element, for example in file applicationContext.xml. There are some existing FeedablePopulator public API classes that you may use. For example:

PropertyPathFeedablePopulator: Indexes specific values from a struct content property.
XPathFeedablePopulator: Extracts a text fragment from an XML content property.
ImageDimensionFeedablePopulator: Sets image attributes like image orientation, dimension, and size category.
ContentStatusFeedablePopulator: Sets the content status (approved, deleted, etc.).

Your own populator classes just need to implement the FeedablePopulator interface and can then be configured in the same way. The method FeedablePopulator#populate will be called with a com.coremedia.cap.content.Content object. This means that the type parameter T of FeedablePopulator implementations must be Content or a supertype of Content.

Populators registered at property feedablePopulators of Spring bean index are called when a content gets added or updated and the whole content data is sent to the search engine. Populators registered at property partialUpdateFeedablePopulators are called for partial updates, when only content metadata is sent to the search engine. You can also register a custom FeedablePopulator at both list properties and use method isPartialUpdate of the passed in Feedable to detect whether a partial update is being processed. Method getUpdatedAspects returns which aspects of the index document are changed with a partial update.
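Registering one populator at both list properties could be sketched as follows. The bean id myFeedablePopulator and the class com.example.MyFeedablePopulator are hypothetical placeholders for your own FeedablePopulator implementation; only the bean name index and the two list property names come from the description above.

```xml
<!-- Hypothetical custom populator implementing FeedablePopulator -->
<bean id="myFeedablePopulator"
      class="com.example.MyFeedablePopulator"/>

<!-- Called when the whole content data is sent to the search engine -->
<customize:append id="addMyFeedablePopulator"
 bean="index" property="feedablePopulators">
  <list>
    <ref bean="myFeedablePopulator"/>
  </list>
</customize:append>

<!-- Called for partial updates, when only metadata is sent -->
<customize:append id="addMyPartialUpdateFeedablePopulator"
 bean="index" property="partialUpdateFeedablePopulators">
  <list>
    <ref bean="myFeedablePopulator"/>
  </list>
</customize:append>
```

With this setup, the populator itself has to check isPartialUpdate on the passed-in Feedable to decide which fields to set.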

Caution

When you configure a FeedablePopulator for a Solr index field, you must make sure that the type of the index field matches the possible values. For example, you should never configure a PropertyPathFeedablePopulator or an XPathFeedablePopulator to set a numeric or date index field. Even if a nested struct property at the configured path is typically used for dates, some content may contain a text value and cause indexing errors. In such a case, you should use a custom FeedablePopulator implementation and check the value type instead.

PropertyPathFeedablePopulator

The PropertyPathFeedablePopulator is configured with a dot-separated property path to index a specific property value from a struct content property. The first name in the property path denotes the struct property itself while the following names specify nested properties of the struct. The constructor argument type selects the type of the content. The argument element maps to the field name in the index. Furthermore, it's possible to configure whether the value should also be indexed in the field textbody using the property textBody. By default, it will not be indexed in the textbody field but you can enable this by setting the property textBody to true.

The following example configures a populator to feed the index field author from a localSettings.metadata.author struct property path of Article contents.

<customize:append id="addAuthorFeedablePopulator"
 bean="index" property="feedablePopulators">
  <list>
    <ref bean="authorFeedablePopulator"/>
  </list>
</customize:append>

<bean id="authorFeedablePopulator" class=
"com.coremedia.cap.feeder.populate.PropertyPathFeedablePopulator">
  <constructor-arg index="0" name="type" value="Article"/>
  <constructor-arg index="1" name="propertyPath"
                   value="localSettings.metadata.author"/>
  <constructor-arg index="2" name="element" value="author"/>
</bean>
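To also index the value in the field textbody, the property textBody can be set on the populator bean. The following variant is a sketch under the assumption that textBody is set as a regular bean property, as described above. The bean id is hypothetical, and the bean still has to be registered at the index bean like any other populator.

```xml
<bean id="authorWithTextBodyFeedablePopulator" class=
"com.coremedia.cap.feeder.populate.PropertyPathFeedablePopulator">
  <constructor-arg index="0" name="type" value="Article"/>
  <constructor-arg index="1" name="propertyPath"
                   value="localSettings.metadata.author"/>
  <constructor-arg index="2" name="element" value="author"/>
  <!-- Also index the value in textbody (default: false) -->
  <property name="textBody" value="true"/>
</bean>
```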

XPathFeedablePopulator

XPathFeedablePopulators extract text of a fragment from an XML property. The fragment is specified with an XPath expression in the property XPath. The required property element maps to the field name in the index. The property contentType selects the type of the content and the property property selects the content property. Furthermore, it's possible to configure whether the property's value should also be indexed in the field textbody. By default, it will be indexed in textbody but you can disable this by setting the property textBody to false. The namespaces property defines namespaces which can be used in the XPath expression.

The following example configures a populator to feed the index field tabletext from Text properties in Article contents.

<customize:append id="addFeedablePopulators" 
 bean="index" property="feedablePopulators">
  <list>
    <bean class=
"com.coremedia.cap.feeder.populate.XPathFeedablePopulator">
      <property name="element" value="tabletext"/>
      <property name="contentType" value="Article"/>
      <property name="property" value="Text"/>
      <property name="textBody" value="false"/>
      <property name="XPath" value="/r:div/r:table"/>
      <property name="namespaces">
        <map>
          <entry key="r"
           value="http://www.coremedia.com/2003/richtext-1.0"/>
        </map>
      </property>
    </bean>
  </list>
</customize:append>

ImageDimensionFeedablePopulator

The ImageDimensionFeedablePopulator is used to detect the orientation (portrait, square, landscape), dimension (width, height) and size category (small, medium, large) of an image. After detection the following index fields are set:

imageOrientation: portrait (value=0), square (value=1), or landscape (value=2) mode.
imageSizeCategory: small (value=0), medium (value=1), or large (value=2) mode.
imageWidth: image width in pixels.
imageHeight: image height in pixels.
imageMaxLength: maximum of imageWidth and imageHeight.

An image has portrait mode if its height is larger than its width, and landscape mode if its width is larger than its height. If width and height are equal, it has square mode. An image is categorized as large if its width is larger than or equal to the configured largeWidth property and its height is larger than or equal to the configured largeHeight property. Otherwise, it is categorized as medium if its width is larger than or equal to mediumWidth and its height is larger than or equal to mediumHeight. The image is small if its width is smaller than mediumWidth or its height is smaller than mediumHeight.

To categorize image orientation (portrait, square, landscape) and image size (small, medium, large), some filter properties must be configured:

docType: the type of the content to be indexed, including subtypes
widthPropertyName: the property name of the content which holds the width value
heightPropertyName: the property name of the content which holds the height value
dataPropertyName: the property name of the content which holds the image data. The value of this object must be of type com.coremedia.cap.common.Blob.

You must set either widthPropertyName and heightPropertyName or dataPropertyName or both. If the two dimension properties do not exist, the blob data is read to determine the dimension.

largeWidth: lower bound width of large images
largeHeight: lower bound height of large images
mediumWidth: lower bound width of medium images
mediumHeight: lower bound height of medium images

The following example shows an ImageDimensionFeedablePopulator configuration.

<customize:append id="addFeedablePopulators" 
 bean="index" property="feedablePopulators">
  <list>
    <bean 
     class=
"com.coremedia.cap.feeder.populate.ImageDimensionFeedablePopulator">
      <property name="largeWidth" 
       value="${feeder.populator.imageDimension.largeWidth}"/>
      <property name="largeHeight" 
       value="${feeder.populator.imageDimension.largeHeight}"/>
      <property name="mediumWidth" 
       value="${feeder.populator.imageDimension.mediumWidth}"/>
      <property name="mediumHeight" 
       value="${feeder.populator.imageDimension.mediumHeight}"/>
      <property name="docType" 
       value="${feeder.populator.imageDimension.docType}"/>
      <property name="widthPropertyName" 
       value="${feeder.populator.imageDimension.widthPropertyName}"/>
      <property name="heightPropertyName" 
       value="${feeder.populator.imageDimension.heightPropertyName}"/>
      <property name="dataPropertyName" 
       value="${feeder.populator.imageDimension.dataPropertyName}"/>
    </bean>  
  </list>
</customize:append>

The ${...} placeholders in the property values of the populator bean are replaced with values from a property file.
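If you prefer literal values, the same bean could be configured as in the following sketch. All values are hypothetical: the content type Picture, the blob property data, and the pixel thresholds are assumptions for illustration only. The sketch sets only dataPropertyName, so the dimensions would be read from the blob data.

```xml
<customize:append id="addImageDimensionFeedablePopulator"
 bean="index" property="feedablePopulators">
  <list>
    <bean class=
"com.coremedia.cap.feeder.populate.ImageDimensionFeedablePopulator">
      <!-- Hypothetical thresholds in pixels -->
      <property name="largeWidth" value="1200"/>
      <property name="largeHeight" value="800"/>
      <property name="mediumWidth" value="600"/>
      <property name="mediumHeight" value="400"/>
      <!-- Hypothetical content type and blob property -->
      <property name="docType" value="Picture"/>
      <property name="dataPropertyName" value="data"/>
    </bean>
  </list>
</customize:append>
```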

ContentStatusFeedablePopulator

The ContentStatusFeedablePopulator classifies a content into one of four status categories:

0: in production (not approved and not deleted)
1: approved (place and content)
2: published (place and content)
3: deleted

After classification, the status value of the content is stored in the index field status. The following example shows a ContentStatusFeedablePopulator configuration:

<customize:append id="addFeedablePopulators" 
bean="index" property="feedablePopulators">
  <list>
    <bean class=
"com.coremedia.cap.feeder.populate.ContentStatusFeedablePopulator"/>
  </list>
</customize:append>