5.2. Configuring the CAE Feeder

This section describes common configuration tasks. See Section 6.3, “CAE Feeder Configuration” for a detailed description of configuration settings. All properties can be configured in the file WEB-INF/application.properties of the CAE Feeder web application.

Configuring the Content Server

The CAE Feeder can be used to index content beans for content from the Content Management Server or a Live Server. Configure the Content Server for the CAE Feeder as in the following example:

repository.url=http://localhost:44441/coremedia/ior
repository.user=webserver
repository.password=webserver
repository.domain=

Example 5.1. Configure the Content Server


The property repository.url specifies the URL of the Content Server. The properties repository.user, repository.password and repository.domain define the user account that the CAE Feeder uses to log in to the Content Server.

Configuring the Database

The CAE Feeder persists the feeding state in a database. Configure the connection to the database with the following properties:

jdbc.driver
    Specifies the class of the database driver.
jdbc.url
    Contains the URL of the database.
jdbc.user
    Specifies the account name of the database user.
jdbc.password
    Specifies the account password of the database user.

For example:

jdbc.driver=oracle.jdbc.driver.OracleDriver
jdbc.url=jdbc:oracle:thin:@localhost:1521:oracle
jdbc.user=username
jdbc.password=password

Example 5.2. Configure the database
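
For comparison, the same four properties might look as follows for another database. This is only a sketch, assuming a PostgreSQL database and that the PostgreSQL JDBC driver is available to the web application. This section does not state which databases are supported, and the host, port, database name caefeeder and credentials are placeholders.

```properties
# Assumption: PostgreSQL with its standard JDBC driver class.
# Host, port, database name and credentials are placeholders.
jdbc.driver=org.postgresql.Driver
jdbc.url=jdbc:postgresql://localhost:5432/caefeeder
jdbc.user=username
jdbc.password=password
```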


Caution

Do not run multiple CAE Feeder applications on the same database schema.

Configuring the Search Engine

To configure the connection to the CoreMedia Search Engine, set the host name and port of the installed search engine and the name of the target Solr core with the properties feeder.solr.url and feeder.solr.collection. Each feeding application needs its own index: do not use the same index for multiple instances of the CAE Feeder or the Content Feeder.

If the Apache Solr web application has been secured and needs HTTP basic authentication, you must also configure the required user name and password in the properties feeder.solr.username and feeder.solr.password.

feeder.solr.url=http://localhost:8001/solr/preview
feeder.solr.username=
feeder.solr.password=
feeder.solr.collection=preview

Example 5.3. Configure the Search Engine for Apache Solr
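
If Solr is secured with HTTP basic authentication, the same configuration with credentials filled in could look like the following sketch. The user name feeder and the password secret are placeholders chosen for illustration, not defaults.

```properties
# Same connection as in Example 5.3, with HTTP basic authentication.
# User name and password are placeholders, not product defaults.
feeder.solr.url=http://localhost:8001/solr/preview
feeder.solr.username=feeder
feeder.solr.password=secret
feeder.solr.collection=preview
```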


Configuring Tika

Apache Tika is used to extract text from blob properties for indexing. It provides parsers for various formats, which can be customized in an Apache Tika XML configuration file. The default configuration covers typical formats, so a custom configuration is rarely needed. If you need to fine-tune Apache Tika, see the Apache Tika documentation for the format of the Tika Config XML file. Configure the location of this file with the Spring configuration property feeder.tika.config, whose value is a Spring Resource location. The following example configures an Apache Tika Config file from the local file system:

Example

feeder.tika.config=file:/opt/path/tika-config.xml
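
Because the value is a Spring Resource location, other resource prefixes should work as well. As a sketch, the following loads the file from the classpath instead of the file system. The path com/example/tika-config.xml is a hypothetical location, and the file is assumed to be on the classpath of the web application.

```properties
# Assumption: tika-config.xml is packaged on the web application's classpath
# under the hypothetical path com/example/.
feeder.tika.config=classpath:/com/example/tika-config.xml
```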

Configuring Tika metadata extraction

In addition to extracting body text, Tika can extract metadata for some binary formats such as the creator of a Microsoft Word file. You can use the following properties to extract and index metadata from binary formats:

  • feeder.tika.appendMetadata

  • feeder.tika.copyMetadata

The property feeder.tika.appendMetadata takes a comma-separated list of metadata identifiers. The CAE Feeder simply appends the matching metadata values to the indexed body text when Apache Tika extracts such a value.
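
A minimal sketch of such a setting, using the creator identifier that also appears in the copyMetadata example in this section. The second identifier, title, is an assumption for illustration only. Check the Apache Tika documentation for the identifiers available in your version.

```properties
# Append the extracted creator to the indexed body text.
feeder.tika.appendMetadata=creator

# Several identifiers are separated by commas, for example
# (the identifier "title" is an assumption):
# feeder.tika.appendMetadata=creator,title
```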

The property feeder.tika.copyMetadata takes a comma-separated list where each entry consists of a metadata identifier followed by an equals sign (=) and the name of the index field the metadata should be copied to. When a matching metadata value is found, it is stored in the configured index field. Note that with Apache Solr, target index fields must be defined as multiValued="true" to avoid indexing errors if there are multiple metadata values with the same identifier. See also Section 5.4.4, “Modifying the Search Index”.

Example

feeder.tika.copyMetadata=creator=author

The above example configures the CAE Feeder to store the creator as extracted from the metadata in the index field author. You have to declare the index field in the Solr schema for this to work.
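
Since the property takes a comma-separated list, several mappings can be combined in one value. The following sketch adds a second entry. The identifier title and the index field doctitle are hypothetical names, and both target fields are assumed to be declared in the Solr schema.

```properties
# creator -> index field "author" (the mapping from the example above)
# title   -> index field "doctitle" (hypothetical identifier and field name)
feeder.tika.copyMetadata=creator=author,title=doctitle
```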

Metadata identifiers are specific to Apache Tika. You can find some of them in the API documentation of the Apache Tika class org.apache.tika.metadata.TikaCoreProperties.

Configuring Error Handling

The CAE Feeder automatically retries operations after certain communication problems with the Solr Search Server. The following properties configure the retry behavior:

Property: feeder.solr.sendRetryDelay
Value: time in seconds
Default: 30
Description: The delay between a failed attempt to send a batch and the next try.

Property: feeder.solr.connection.timeout
Value: time in milliseconds
Default: 0
Description: The connection timeout set on the SolrJ SolrServer. It determines how long the client waits to establish a connection without any response from the server. The default value 0 means that the client waits forever.

Property: feeder.solr.socket.timeout
Value: time in milliseconds
Default: 600000 (10 minutes)
Description: The socket timeout set on the SolrJ SolrServer. It determines how long the client waits for a response from the server after the connection has been established and the request has been sent.

Table 5.1.  Properties for retry on Solr server
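
As an illustration of how these properties fit together, the following sketch lengthens the retry delay and sets a finite connection timeout while keeping the default socket timeout. The values 60 and 10000 are arbitrary examples, not recommendations.

```properties
# Wait 60 seconds (default: 30) before resending a failed batch.
feeder.solr.sendRetryDelay=60
# Stop trying to connect after 10 seconds (default 0: wait forever).
feeder.solr.connection.timeout=10000
# Keep the default socket timeout of 10 minutes.
feeder.solr.socket.timeout=600000
```

Note the different units: the retry delay is given in seconds, the two timeouts in milliseconds.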