6.3. CAE Feeder Configuration

This reference section describes the configuration properties of the CAE Feeder.

Property Value Default Description

repository.user

user name none The name of the user to connect to the CoreMedia Content Server.

repository.password

password none The password of the user to connect to the CoreMedia Content Server.

repository.domain

domain none The domain of the user to connect to the CoreMedia Content Server. Empty String for a built-in user.

repository.url

URL none The URL to the IOR of the CoreMedia Content Server.

jdbc.driver

driver class none The class of the database driver. For example: oracle.jdbc.driver.OracleDriver

jdbc.url

URL none The URL to connect to the database.

jdbc.user

user name none The name of the user to connect to the database.

jdbc.password

password none The password of the user to connect to the database.

feeder.contentSelector.basePath

String /Sites A comma-separated list of base folders for which content beans are indexed.

feeder.contentSelector.contentTypes

String Document_ A comma-separated list of content types for which content beans are indexed.

feeder.contentSelector.includeSubTypes

Boolean true Specifies whether the subtypes of the content types configured with the property feeder.contentSelector.contentTypes are selected as well.

feeder.executorQueueCapacity

int 2000 Capacity of the CAE Feeder's executor queue, which is internally used to transfer evaluated values.

feeder.executorRetryDelay

milliseconds 60000 The delay in milliseconds that the CAE Feeder waits before it retries to access the source data after a failed attempt.

feeder.maxBatchBytes

bytes 20971520 (20 MB) The maximum size of a batch in bytes. The CAE Feeder sends a batch to the Search Engine if its maximum size would be exceeded when adding more entries. Note that the byte computation is only a rough estimate.

feeder.maxBatchSize

int 500 The maximum number of entries in a batch. If the maximum number is reached, the CAE Feeder sends the batch to the Search Engine.

feeder.maxOpenBatches

int 5 The maximum number of batches indexed in parallel. This setting is not used with the default integration of Apache Solr but only with custom implementations of the com.coremedia.cap.feeder.index.async.AsyncIndexer interface. The CAE Feeder does not call the index method of the AsyncIndexer interface to index another batch if the maximum number of parallel batches has been reached. The method will not be called until a callback about the persistence of one of these batches has been received.

feeder.maxProcessedBatches

int 1 The maximum number of batches processed by the Indexer in parallel. This setting is not used with the default integration of Apache Solr but only with custom implementations of the com.coremedia.cap.feeder.index.async.AsyncIndexer interface. The CAE Feeder does not call the index method of the AsyncIndexer interface to index another batch if the configured number of currently processed batches has been reached. The method will not be called until a callback about completed processing or persistence of one of these batches has been received.

feeder.retrySendIdleDelay

milliseconds 60000 The CAE Feeder sends a batch which only contains retried entries and is not full with regard to the feeder.maxBatchSize and feeder.maxBatchBytes properties after the CAE Feeder was idle for the time configured in this property. A retried entry is an entry which was sent to the Search Engine before but could not be indexed successfully. If the batch contains entries which are not retried, the value of property feeder.sendIdleDelay is used instead.

feeder.retrySendMaxDelay

milliseconds 600000 The maximum time in milliseconds between the time the CAE Feeder received an error from the Search Engine and the time the CAE Feeder tries to send the failed entry to the Search Engine again as part of a batch. The time may be exceeded if an error occurs while contacting the Search Engine. If the batch contains entries which are not retried, the value of property feeder.sendMaxDelay is used instead.

feeder.beanPropertyMaxBytes

number of bytes -1 The maximum size in bytes for the value of a bean property or -1 for no limitation. Larger values are ignored and will not be sent to the Search Engine.

feeder.beanMapping.mimeType.includes

comma-separated list of included MIME types */*

List of included MIME types for blob properties configured for indexing at the BeanMappingFeedablePopulator. For details, see the API documentation of method setMimeTypeIncludes of com.coremedia.cap.feeder.bean.BeanMappingFeedablePopulator.

Example

feeder.beanMapping.mimeType.includes=text/*

Only indexes blobs of MIME type text/*.

feeder.beanMapping.mimeType.excludes

comma-separated list of excluded MIME types (empty)

List of excluded MIME types for blob properties configured for indexing at the BeanMappingFeedablePopulator. For details, see the API documentation of method setMimeTypeExcludes of com.coremedia.cap.feeder.bean.BeanMappingFeedablePopulator.

Example

feeder.beanMapping.mimeType.excludes=text/xml

Indexes all blobs except blobs of MIME type text/xml.

feeder.sendIdleDelay

milliseconds 10000 The CAE Feeder sends a batch which is not full with regard to the feeder.maxBatchSize and feeder.maxBatchBytes properties after the CAE Feeder was idle for the configured time in milliseconds.

feeder.sendMaxDelay

milliseconds 120000 The maximum time in milliseconds after which the CAE Feeder sends a batch which is not full with regard to the feeder.maxBatchSize and feeder.maxBatchBytes properties. The time may be exceeded if an error occurs while contacting the Search Engine or if the CAE Feeder is under high load.

feeder.tika.config

location of Apache Tika Config XML (empty)

The location of an optional custom Apache Tika Config XML file with custom Tika parsers. The value is a Spring Resource location, for example a value such as file:/path/tika-config.xml can be used to reference a local file. Use an empty value for the default configuration.

feeder.tika.appendMetadata

comma-separated list of metadata identifiers (empty)

Comma-separated list of metadata identifiers extracted from blob properties by Apache Tika that are appended to the extracted body text. See Section 5.2, “Configuring the CAE Feeder”.

feeder.tika.copyMetadata

comma-separated list of entries for the format <metadata identifier>=<index field name> (empty)

Comma-separated list of metadata identifiers extracted from blob properties by Apache Tika and index field names to copy the metadata to. See Section 5.2, “Configuring the CAE Feeder”.

feeder.tika.timeout.milliseconds

milliseconds 120000 (2 minutes) The maximum time after which text extraction from binary data with Apache Tika fails. If extraction fails, the binary data is skipped for the index document. Lower values prevent the Feeder from being blocked in text extraction for a long time.

feeder.tika.warn.milliseconds

milliseconds 15000 (15 seconds) The time after which a warning is logged if text extraction from binary data with Apache Tika takes longer than this.

proactiveengine.senders.evaluators

number of threads 50 Number of evaluator threads in the CAE Feeder. The number of threads influences performance not only because evaluations can execute concurrently but also because higher values increase the probability that the CAE Feeder writes the state of multiple evaluations to the database in one database transaction.

proactiveengine.senders.delay

milliseconds 0 Minimum delay in milliseconds between notifications of the Feeder by the internal Proactive Engine sub component. Higher values lead to reduced throughput.

proactiveengine.senders.idledelay

milliseconds 10000 Delay in milliseconds between notifications of the Feeder by the internal Proactive Engine sub component if the application is idle. Smaller values can be configured to reduce the latency of the CAE Feeder but may lead to increased load on the database.

dependencyStore.maxTransactionWeight

maximum number of changed keys per database transaction 2500 The maximum weight of a database transaction to change stored dependencies. The weight is interpreted as the number of changed keys, that is, a transaction with one deleted key has weight 1. Multiple transactions will be used to process an event that causes the invalidation of more keys.

Table 6.14. Configuration of general properties independent from the type of the search engine
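
The properties in Table 6.14 could be combined as in the following sketch. It is an illustration only: host names, credentials, the Content Server and database URLs, and the copyMetadata mapping are placeholders or hypothetical values that do not come from this reference. All other values repeat the defaults or examples listed in the table.

```properties
# Connection to the CoreMedia Content Server.
# The URL, user name and password are placeholders.
repository.url=http://cms.example.com/coremedia/ior
repository.user=feeder
repository.password=secret
# An empty domain denotes a built-in user.
repository.domain=

# Database connection. The driver class is the example from the table;
# the JDBC URL and credentials are placeholders.
jdbc.driver=oracle.jdbc.driver.OracleDriver
jdbc.url=jdbc:oracle:thin:@db.example.com:1521:ORCL
jdbc.user=caefeeder
jdbc.password=secret

# Content selection (defaults from the table).
feeder.contentSelector.basePath=/Sites
feeder.contentSelector.contentTypes=Document_
feeder.contentSelector.includeSubTypes=true

# Batching (defaults from the table): a batch is sent when it holds
# 500 entries, when adding entries would exceed about 20 MB, after
# 10 seconds of idle time, or after 2 minutes at the latest.
feeder.maxBatchSize=500
feeder.maxBatchBytes=20971520
feeder.sendIdleDelay=10000
feeder.sendMaxDelay=120000

# Blob indexing: only text/* blobs, but no text/xml blobs.
feeder.beanMapping.mimeType.includes=text/*
feeder.beanMapping.mimeType.excludes=text/xml

# Apache Tika: custom config as a Spring Resource location
# (example location from the table), default timeouts.
feeder.tika.config=file:/path/tika-config.xml
feeder.tika.timeout.milliseconds=120000
feeder.tika.warn.milliseconds=15000
# Hypothetical mapping in the format
# <metadata identifier>=<index field name>; both names are assumptions.
feeder.tika.copyMetadata=dc:creator=author
```

Comments are placed on their own lines because the properties format does not support comments after a value on the same line.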


The following properties are only used for a CoreMedia Search Engine based on Apache Solr:

Property Value Default Description

feeder.solr.url

URL http://localhost:8082/solr/coremedia The URL where the CAE Feeder can reach the Search Engine. The URL points to the Apache Solr core for the CAE Feeder.

feeder.solr.collection

collection name coremedia The collection that should be used by the CAE Feeder.

feeder.solr.username

user name or empty (empty) User name for HTTP Basic authentication when connecting to the Apache Solr web application. Leave empty for no authentication.

feeder.solr.password

password or empty (empty) Password for HTTP Basic authentication when connecting to the Apache Solr web application.

feeder.solr.sendRetryDelay

milliseconds 30000 The delay in milliseconds to wait before sending a batch to the Search Engine again after sending failed with an error in the Search Engine.

feeder.solr.connection.timeout

time in milliseconds 0 The connection timeout set on the SolrJ SolrServer. It determines how long the client waits to establish a connection without any response from the server. The default value of 0 means it will wait forever.

feeder.solr.socket.timeout

time in milliseconds 600000 (10 minutes) The socket timeout set on the SolrJ SolrServer. It determines how long the client waits for a response from the server after the connection was established and the request was already sent. The value of 0 means it will wait forever.

Table 6.15. Configuration properties for Apache Solr
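
As a sketch of how the Apache Solr properties in Table 6.15 might be set together: the URL, collection, retry delay and socket timeout below are the defaults from the table, while the credentials and the connection timeout of 10 seconds are hypothetical values chosen only for illustration.

```properties
# Location of the Search Engine (defaults from the table).
feeder.solr.url=http://localhost:8082/solr/coremedia
feeder.solr.collection=coremedia

# HTTP Basic authentication. Placeholder credentials;
# leave both values empty for no authentication.
feeder.solr.username=feeder
feeder.solr.password=secret

# Wait 30 seconds before sending a failed batch again (default).
feeder.solr.sendRetryDelay=30000

# Hypothetical value: give up connecting after 10 seconds instead of
# waiting forever, which is the behavior of the default value 0.
feeder.solr.connection.timeout=10000
# Wait up to 10 minutes for a response (default).
feeder.solr.socket.timeout=600000
```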