5.2. Configuring the CAE Feeder

This section describes common configuration tasks. See Section 6.3, “CAE Feeder Configuration” for a detailed description of configuration settings. All properties can be configured in the file WEB-INF/application.properties of the CAE Feeder web application.

Configuring the Content Server

The CAE Feeder can index content beans for content from the Content Management Server or from a Live Server. Configure the connection to the Content Server as in the following example:

repository.url=http://localhost:44441/coremedia/ior
repository.user=webserver
repository.password=webserver
repository.domain=

Example 5.1. Configure the Content Server

The property repository.url specifies the URL of the Content Server. The properties repository.user, repository.password and repository.domain define the account that the CAE Feeder uses to log in to the Content Server.
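The following minimal sketch shows how the `repository.*` entries above could be checked before startup. It reads them with `java.util.Properties`, which uses the same `key=value` syntax as `application.properties`. The class `RepositorySettings` and its methods are hypothetical and are not part of the CAE Feeder. Treating `repository.domain` as optional is an assumption based on the empty value in the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of the CAE Feeder.
final class RepositorySettings {
    // repository.domain may be empty (as in the example), so it is not required here.
    static final List<String> REQUIRED =
        Arrays.asList("repository.url", "repository.user", "repository.password");

    /** Parses text in the key=value format of WEB-INF/application.properties. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Cannot happen for a StringReader, but load() declares it.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the required repository.* keys that are missing or blank. */
    static List<String> missingKeys(Properties props) {
        return REQUIRED.stream()
            .filter(key -> props.getProperty(key, "").trim().isEmpty())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Properties props = load(String.join("\n",
            "repository.url=http://localhost:44441/coremedia/ior",
            "repository.user=webserver",
            "repository.password=webserver",
            "repository.domain="));
        System.out.println("missing keys: " + missingKeys(props));
        // "repository.domain=" is read as an empty string, not as null.
        System.out.println("domain is empty: " + props.getProperty("repository.domain").isEmpty());
    }
}
```

The colons in the URL value do not need escaping. `Properties` splits each line at the first unescaped `=`, `:` or whitespace, and that separator here is the `=` after the key.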

Configuring the Database

The CAE Feeder persists the feeding state in a database. Configure the connection to the database with the following properties:

jdbc.driver: Specifies the class of the database driver
jdbc.url: Contains the URL of the database
jdbc.user: Specifies the account name of the database user
jdbc.password: Specifies the account password of the database user

For example:

jdbc.driver=oracle.jdbc.driver.OracleDriver
jdbc.url=jdbc:oracle:thin:@localhost:1521:oracle
jdbc.user=username
jdbc.password=password

Example 5.2. Configure the database

	Caution
	Do not run multiple CAE Feeder applications on the same database schema.
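To check the four `jdbc.*` settings outside the CAE Feeder, you could open a connection with plain JDBC, as in the following sketch. The class `FeederDatabase` is hypothetical, and this is not how the CAE Feeder itself manages its database connection. Running `main` requires the Oracle JDBC driver on the classpath and a reachable database. Allowing an empty password is a choice of this sketch.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

// Hypothetical sketch for checking the jdbc.* settings; not the CAE Feeder's own code.
final class FeederDatabase {
    /** Reads a mandatory property and fails fast if it is missing or blank. */
    static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing property: " + key);
        }
        return value.trim();
    }

    /** Opens a JDBC connection from the jdbc.driver, jdbc.url, jdbc.user and jdbc.password settings. */
    static Connection open(Properties props) throws ClassNotFoundException, SQLException {
        // Explicit loading is only required for pre-JDBC 4.0 drivers;
        // newer drivers register themselves automatically.
        Class.forName(require(props, "jdbc.driver"));
        return DriverManager.getConnection(
            require(props, "jdbc.url"),
            require(props, "jdbc.user"),
            props.getProperty("jdbc.password", "")); // empty password allowed in this sketch
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("jdbc.driver", "oracle.jdbc.driver.OracleDriver");
        props.setProperty("jdbc.url", "jdbc:oracle:thin:@localhost:1521:oracle");
        props.setProperty("jdbc.user", "username");
        props.setProperty("jdbc.password", "password");
        try (Connection connection = open(props)) {
            System.out.println("connected: " + connection.isValid(5));
        } catch (ClassNotFoundException | SQLException e) {
            System.out.println("connection failed: " + e);
        }
    }
}
```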

Configuring the Search Engine

Configuring the connection to the CoreMedia Search Engine means setting the host name and port of the installed search engine and the name of the target Solr core. Use the properties feeder.solr.url and feeder.solr.collection for this. Each feeding application needs its own index: do not share an index between multiple CAE Feeder instances, or between a CAE Feeder and the Content Feeder.

If the Apache Solr web application is secured with HTTP basic authentication, you must also configure the user name and password in the properties feeder.solr.username and feeder.solr.password.

feeder.solr.url=http://localhost:8001/solr/preview
feeder.solr.username=
feeder.solr.password=
feeder.solr.collection=preview

Example 5.3. Configure the Search Engine for Apache Solr
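With HTTP basic authentication, the client sends the configured user name and password in an `Authorization` header, as defined in RFC 7617. The following sketch shows how such a header value is derived from feeder.solr.username and feeder.solr.password. It only illustrates the mechanism: the class `SolrBasicAuth` is hypothetical, and the CAE Feeder sets the credentials through its own Solr client. Skipping authentication when the user name is empty mirrors the empty values in the example, which is an assumption about their meaning.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Optional;

// Hypothetical illustration of HTTP basic authentication; not CAE Feeder code.
final class SolrBasicAuth {
    /**
     * Builds the value of an HTTP "Authorization" header, or returns empty
     * if no user name is configured (feeder.solr.username left empty).
     */
    static Optional<String> authorizationHeader(String username, String password) {
        if (username == null || username.isEmpty()) {
            return Optional.empty();
        }
        String credentials = username + ":" + (password == null ? "" : password);
        // UTF-8 is an assumption; for ASCII credentials it makes no difference.
        String encoded = Base64.getEncoder()
            .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return Optional.of("Basic " + encoded);
    }

    public static void main(String[] args) {
        System.out.println(authorizationHeader("solr", "secret").orElse("<no authentication>"));
        System.out.println(authorizationHeader("", "").orElse("<no authentication>"));
    }
}
```

Base64 is an encoding, not encryption. The credentials are therefore only protected if Solr is reached over HTTPS.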

Configuring Tika

Apache Tika is used to extract text from blob properties for indexing. It provides parsers for various formats, which can be customized in an Apache Tika XML configuration file. The default configuration covers typical formats, so a custom configuration is rarely needed. If you need to fine-tune Apache Tika, see the Apache Tika documentation for the format of the Tika Config XML file. Set the location of this file with the Spring configuration property feeder.tika.config. The value of this property is a Spring Resource location. The following example configures an Apache Tika Config file from the local file system:

Example

feeder.tika.config=file:/opt/path/tika-config.xml
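Because the value of feeder.tika.config is a Spring Resource location, its prefix determines where the file is read from. The following simplified sketch shows how a `file:` location, and for comparison a `classpath:` location, could be turned into an input stream. The class `ResourceLocations` is hypothetical and handles only these two prefixes. Spring's own resource loading supports further location types, such as URLs, and is what the CAE Feeder actually uses.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Simplified, hypothetical resolver; Spring's resource loading supports more location types.
final class ResourceLocations {
    static InputStream open(String location) throws IOException {
        if (location.startsWith("file:")) {
            // "file:/opt/path/tika-config.xml" -> /opt/path/tika-config.xml
            return Files.newInputStream(Paths.get(location.substring("file:".length())));
        }
        if (location.startsWith("classpath:")) {
            String name = location.substring("classpath:".length());
            if (name.startsWith("/")) {
                name = name.substring(1); // class loader resource names have no leading slash
            }
            InputStream in = ResourceLocations.class.getClassLoader().getResourceAsStream(name);
            if (in == null) {
                throw new FileNotFoundException("Not found on classpath: " + name);
            }
            return in;
        }
        throw new IllegalArgumentException("Unsupported resource location: " + location);
    }

    public static void main(String[] args) {
        String location = args.length > 0 ? args[0] : "file:/opt/path/tika-config.xml";
        try (InputStream in = open(location)) {
            System.out.println("read " + in.readAllBytes().length + " bytes from " + location);
        } catch (IOException e) {
            System.out.println("cannot read " + location + ": " + e);
        }
    }
}
```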

Configuring Tika metadata extraction

In addition to extracting body text, Tika can extract metadata from some binary formats, such as the creator of a Microsoft Word file. You can use the following properties to extract and index metadata from binary formats:

feeder.tika.appendMetadata
feeder.tika.copyMetadata

The property feeder.tika.appendMetadata takes a comma-separated list of metadata identifiers. Whenever Apache Tika extracts a value for one of these identifiers, the CAE Feeder appends that value to the indexed body text.
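The following sketch models the behavior of feeder.tika.appendMetadata. It parses the comma-separated list and appends every extracted value of a listed identifier to the body text. Several parts are assumptions of this sketch, not CAE Feeder internals: the class `AppendMetadata`, the representation of extracted metadata as a map from identifier to values, and the use of a single space as separator.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical model of feeder.tika.appendMetadata; not the CAE Feeder's implementation.
final class AppendMetadata {
    /** Splits a comma-separated property value such as "creator,title". */
    static List<String> parseIdentifiers(String value) {
        return Arrays.stream(value.split(","))
            .map(String::trim)
            .filter(s -> !s.isEmpty())
            .collect(Collectors.toList());
    }

    /** Appends the extracted values of the listed identifiers to the body text. */
    static String appendMetadata(String body, Map<String, List<String>> metadata,
                                 List<String> identifiers) {
        StringBuilder text = new StringBuilder(body);
        for (String id : identifiers) {
            for (String value : metadata.getOrDefault(id, List.of())) {
                text.append(' ').append(value); // separator is an assumption
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new HashMap<>();
        metadata.put("creator", List.of("Jane Doe"));
        metadata.put("title", List.of("Annual Report"));
        List<String> ids = parseIdentifiers("creator, keywords");
        // "keywords" was not extracted, so only the creator is appended.
        System.out.println(appendMetadata("Body text", metadata, ids));
    }
}
```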

The property feeder.tika.copyMetadata takes a comma-separated list in which each entry consists of a metadata identifier, followed by an equals sign (=) and the name of the index field that the metadata should be copied to. When a matching metadata value is found, it is stored in the configured index field. Note that with Apache Solr, target index fields must be defined with multiValued="true" to avoid indexing errors when there are multiple metadata values with the same identifier. See also Section 5.4.4, “Modifying the Search Index”.

Example

feeder.tika.copyMetadata=creator=author

The above example configures the CAE Feeder to store the creator, as extracted from the metadata, in the index field author. For this to work, the field author must be declared in the Solr schema.
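The following sketch models the identifier=field mapping of feeder.tika.copyMetadata. Every value of a matching identifier is collected into a list, which is why the target Solr field must be multi-valued. Several parts are hypothetical: the class `CopyMetadata`, the map-based metadata model and the error handling for malformed entries. When the whole line `feeder.tika.copyMetadata=creator=author` is read as a property, only the first `=` separates key and value, so the value is `creator=author`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of feeder.tika.copyMetadata; not the CAE Feeder's implementation.
final class CopyMetadata {
    /** Parses "creator=author,title=doc_title" into identifier -> index field. */
    static Map<String, String> parseMappings(String value) {
        Map<String, String> mappings = new LinkedHashMap<>();
        for (String entry : value.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int eq = trimmed.indexOf('=');
            if (eq <= 0 || eq == trimmed.length() - 1) {
                throw new IllegalArgumentException("Expected identifier=field but got: " + trimmed);
            }
            mappings.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
        }
        return mappings;
    }

    /** Copies all values of matching identifiers into their (multi-valued) index fields. */
    static Map<String, List<String>> copyMetadata(Map<String, List<String>> metadata,
                                                  Map<String, String> mappings) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        mappings.forEach((id, field) -> {
            List<String> values = metadata.getOrDefault(id, List.of());
            if (!values.isEmpty()) {
                fields.computeIfAbsent(field, f -> new ArrayList<>()).addAll(values);
            }
        });
        return fields;
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new HashMap<>();
        metadata.put("creator", List.of("Jane Doe", "John Roe")); // two values, one identifier
        System.out.println(copyMetadata(metadata, parseMappings("creator=author")));
    }
}
```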

Metadata identifiers are specific to Apache Tika. You can find some of them in the API documentation of Apache Tika class org.apache.tika.metadata.TikaCoreProperties.

Configuring Error Handling

The CAE Feeder automatically retries the operation after certain communication problems with the Solr Search Server. The following properties configure the retry behavior and the related timeouts:

Property	Unit	Default	Description
`feeder.solr.sendRetryDelay`	time in seconds	30	The delay between a failed attempt to send a batch and the next attempt.
`feeder.solr.connection.timeout`	time in milliseconds	0	The connection timeout set on the SolrJ `SolrServer`. It determines how long the client waits for the server to respond while establishing a connection. The default value 0 means that the client waits indefinitely.
`feeder.solr.socket.timeout`	time in milliseconds	600000 (10 minutes)	The socket timeout set on the SolrJ `SolrServer`. It determines how long the client waits for a response from the server after the connection has been established and the request has been sent.

Table 5.1. Properties for retry on Solr server
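The following sketch illustrates the retry behavior configured by feeder.solr.sendRetryDelay. A failed send is followed by a fixed delay and another attempt. The interfaces `BatchSender` and `Sleeper` and the class `RetryingSender` are hypothetical. The `maxAttempts` limit is an addition of this sketch so that it terminates; the manual does not say whether the CAE Feeder limits the number of retries.

```java
import java.io.IOException;
import java.time.Duration;

// Hypothetical model of the retry loop; not the CAE Feeder's implementation.
final class RetryingSender {
    /** Stand-in for "send one batch of documents to Solr". */
    interface BatchSender {
        void send() throws IOException;
    }

    /** Abstraction over Thread.sleep so that tests can skip the real delay. */
    interface Sleeper {
        void sleep(Duration delay) throws InterruptedException;
    }

    /** Sends a batch, waiting retryDelay after each failure; returns the attempts needed. */
    static int sendWithRetry(BatchSender sender, Duration retryDelay, int maxAttempts,
                             Sleeper sleeper) throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                sender.send();
                return attempt;
            } catch (IOException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and report the last failure
                }
                sleeper.sleep(retryDelay);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        BatchSender flaky = () -> {
            if (++calls[0] < 3) {
                throw new IOException("Solr not reachable");
            }
        };
        // The default of feeder.solr.sendRetryDelay is 30 seconds; shortened for the demo.
        int attempts = sendWithRetry(flaky, Duration.ofMillis(100), 5,
            delay -> Thread.sleep(delay.toMillis()));
        System.out.println("sent after " + attempts + " attempts");
    }
}
```

Injecting the `Sleeper` keeps the delay policy separate from the sending logic. A test can then record the requested delays instead of waiting 30 seconds per retry.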

CoreMedia Search Manual, Version 7.5.45-10 Chapter 5. Searching for CAE Content Beans | 5.2. Configuring the CAE Feeder