This section describes common configuration tasks. See Section 6.3, “CAE Feeder Configuration” for a detailed description of configuration settings. All properties can be configured in the file WEB-INF/application.properties of the CAE Feeder web application.
Configuring the Content Server
The CAE Feeder can be used to index content beans for content from the Content Management Server or a Live Server. Configure the Content Server for the CAE Feeder as in the following example:
repository.url=http://localhost:44441/coremedia/ior
repository.user=webserver
repository.password=webserver
repository.domain=
Example 5.1. Configure the Content Server
The property repository.url specifies the URL of the Content Server. The properties repository.user, repository.password and repository.domain define the account of the user used by the CAE Feeder to log in to the Content Server.
Configuring the Database
The CAE Feeder persists the feeding state in a database. Configure the connection to the database with the following properties:
jdbc.driver - Specifies the class of the database driver
jdbc.url - Contains the URL of the database
jdbc.user - Specifies the account name of the database user
jdbc.password - Specifies the account password of the database user
For example:
jdbc.driver=oracle.jdbc.driver.OracleDriver
jdbc.url=jdbc:oracle:thin:@localhost:1521:oracle
jdbc.user=username
jdbc.password=password
Example 5.2. Configure the database
Caution: Do not run multiple CAE Feeder applications on the same database schema.
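As an illustration of the caution above, the following sketch shows two CAE Feeder applications that connect to the same Oracle instance but log in as different database users, so each one works in its own schema. The user names caefeeder_preview and caefeeder_live are hypothetical, and the passwords are placeholders; the driver and URL are taken from Example 5.2.

```properties
# WEB-INF/application.properties of the first CAE Feeder
# (hypothetical database user "caefeeder_preview" with its own schema)
jdbc.driver=oracle.jdbc.driver.OracleDriver
jdbc.url=jdbc:oracle:thin:@localhost:1521:oracle
jdbc.user=caefeeder_preview
jdbc.password=password

# WEB-INF/application.properties of the second CAE Feeder
# (hypothetical database user "caefeeder_live" with its own schema)
jdbc.driver=oracle.jdbc.driver.OracleDriver
jdbc.url=jdbc:oracle:thin:@localhost:1521:oracle
jdbc.user=caefeeder_live
jdbc.password=password
```

The two fragments belong in separate files, one per web application. They are shown together here only for comparison.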
Configuring the Search Engine
To connect the CAE Feeder to the CoreMedia Search Engine, configure the host name and port of the installed search engine and the name of the target Solr core with the properties feeder.solr.url and feeder.solr.collection. Each feeding application needs a different index. Do not use the same index for multiple instances of the CAE Feeder or the Content Feeder.
If the Apache Solr web application has been secured and needs HTTP basic authentication, you must also configure the required user name and password in the properties feeder.solr.username and feeder.solr.password.
feeder.solr.url=http://localhost:8001/solr/preview
feeder.solr.username=
feeder.solr.password=
feeder.solr.collection=preview
Example 5.3. Configure the Search Engine for Apache Solr
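Because each feeding application needs its own index, a second CAE Feeder would point to a different Solr core. The sketch below mirrors the pattern of Example 5.3 for a second instance. The core name live is an assumption made for illustration, and the core must already exist in the Solr installation.

```properties
# WEB-INF/application.properties of a second CAE Feeder instance.
# "live" is a hypothetical core name; it must differ from the
# index used by any other CAE Feeder or Content Feeder.
feeder.solr.url=http://localhost:8001/solr/live
feeder.solr.username=
feeder.solr.password=
feeder.solr.collection=live
```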
Configuring Tika
Apache Tika is used to extract text from blob properties for indexing. It provides parsers for various formats, which can be customized in a special Apache Tika XML configuration file. The default configuration covers typical formats so that a custom configuration is rarely needed. If you need to fine-tune the configuration of Apache Tika, see the Apache Tika documentation for the format of the Tika Config XML file. The location of this file can be configured with the Spring configuration property feeder.tika.config. The value of this property is a Spring Resource location.
The following example configures an Apache Tika Config file from the local file system:
Example
feeder.tika.config=file:/opt/path/tika-config.xml
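Since the value is a Spring Resource location, other Spring resource prefixes such as classpath: should work as well. The following sketch assumes that the Tika Config file is packaged with the web application at a hypothetical classpath location.

```properties
# Load the Tika Config file from the classpath instead of the file system.
# The path /com/example/tika-config.xml is a hypothetical location.
feeder.tika.config=classpath:/com/example/tika-config.xml
```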
Configuring Tika metadata extraction
In addition to extracting body text, Tika can extract metadata for some binary formats such as the creator of a Microsoft Word file. You can use the following properties to extract and index metadata from binary formats:
feeder.tika.appendMetadata
feeder.tika.copyMetadata
The property feeder.tika.appendMetadata takes a comma-separated list of metadata identifiers. When Apache Tika extracts a value for one of these identifiers, the CAE Feeder appends that value to the indexed body text.
The property feeder.tika.copyMetadata takes a comma-separated list where each entry consists of a metadata identifier followed by an equal sign (=) and the name of the index field the metadata should be copied to. When a matching metadata value is found, it is stored in the configured index field. Note that with Apache Solr, target index fields must be defined as multiValued="true" to avoid indexing errors if there are multiple metadata values with the same identifier. See also Section 5.4.4, “Modifying the Search Index”.
Example
feeder.tika.copyMetadata=creator=author
The above example configures the CAE Feeder to store the creator as extracted from the metadata in the index field author. You have to declare the index field in the Solr schema for this to work.
Metadata identifiers are specific to Apache Tika. You can find some of them in the API documentation of the Apache Tika class org.apache.tika.metadata.TikaCoreProperties.
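The two metadata properties can be combined, and both accept more than one entry. The sketch below is an illustration only: the identifier title and the index field doctitle are assumptions that are not taken from this manual. Check the identifier against the Apache Tika documentation and declare the field in the Solr schema before using it.

```properties
# Append the extracted creator to the indexed body text.
feeder.tika.appendMetadata=creator

# Copy the creator to the index field "author" and, as an assumed
# second entry, the title to a hypothetical index field "doctitle".
# Both target fields should be declared as multiValued="true" in the Solr schema.
feeder.tika.copyMetadata=creator=author,title=doctitle
```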
Configuring Error Handling
The CAE Feeder automatically retries operations after some communication problems with the Solr Search Server. The following properties configure the retry behavior:
Property | Value | Default | Description |
---|---|---|---|
feeder.solr.sendRetryDelay | time in seconds | 30 | The delay between a failed attempt to send a batch and the next try. |
feeder.solr.connection.timeout | time in milliseconds | 0 | The connection timeout set on the SolrJ SolrServer. It determines how long the client waits to establish a connection without any response from the server. The default value 0 means that it will wait forever. |
feeder.solr.socket.timeout | time in milliseconds | 600000 (10 minutes) | The socket timeout set on the SolrJ SolrServer. It determines how long the client waits for a response from the server after the connection was established and the request was already sent. |
Table 5.1. Properties for retry on Solr server
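As a sketch of how these properties might be set together, the following fragment lengthens the retry delay and replaces the unlimited connection timeout with a finite one. The chosen values are illustrative assumptions, not recommendations. Note the different units: seconds for the retry delay, milliseconds for the timeouts.

```properties
# Wait 60 seconds (instead of the default 30) before re-sending a failed batch.
feeder.solr.sendRetryDelay=60

# Stop waiting for a connection after 10 seconds (10000 ms)
# instead of waiting forever (default 0).
feeder.solr.connection.timeout=10000

# Keep the default socket timeout of 10 minutes (600000 ms).
feeder.solr.socket.timeout=600000
```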