In this reference chapter you will find a description of the CAE Feeder configuration properties.
Property | Value | Default | Description |
---|---|---|---|
| | user name | none | The name of the user to connect to the CoreMedia Content Server. |
| | password | none | The password of the user to connect to the CoreMedia Content Server. |
| | domain | none | The domain of the user to connect to the CoreMedia Content Server. Use an empty string for a built-in user. |
| | URL | none | The URL of the IOR of the CoreMedia Content Server. |
| | driver class | none | The class of the database driver, for example oracle.jdbc.driver.OracleDriver. |
| | URL | none | The URL to connect to the database. |
| | user name | none | The name of the user to connect to the database. |
| | password | none | The password of the user to connect to the database. |
| | String | /Sites | A comma-separated list of base folders for which content beans are indexed. |
| | String | | A comma-separated list of content types for which content beans are indexed. |
| | Boolean | | Specifies whether subtypes of the content types configured with property feeder.contentSelector.contentTypes are selected as well. |
| | int | 2000 | Capacity of the CAE Feeder's executor queue, which is used internally to transfer evaluated values. |
| | milliseconds | 60000 | The delay in milliseconds to wait before the CAE Feeder retries to access the source data after a failure to do so. |
| | bytes | 20971520 (20 MB) | The maximum size of a batch in bytes. The CAE Feeder sends a batch to the Search Engine if adding another entry would exceed this maximum size. Note that the byte computation is only a rough estimate. |
| | int | 500 | The maximum number of entries in a batch. When the maximum number is reached, the CAE Feeder sends the batch to the Search Engine. |
| | int | 5 | The maximum number of batches indexed in parallel. This setting is not used with the default integration of Apache Solr, only with custom implementations of the com.coremedia.cap.feeder.index.async.AsyncIndexer interface. The CAE Feeder does not call the index method of the AsyncIndexer interface to index another batch if the maximum number of parallel batches has been reached. The method is not called again until a callback about the persistence of one of these batches has been received. |
| | int | 1 | The maximum number of batches processed by the Indexer in parallel. This setting is not used with the default integration of Apache Solr, only with custom implementations of the com.coremedia.cap.feeder.index.async.AsyncIndexer interface. The CAE Feeder does not call the index method of the AsyncIndexer interface to index another batch if the configured number of currently processed batches has been reached. The method is not called again until a callback about completed processing or persistence of one of these batches has been received. |
| | milliseconds | 60000 | The CAE Feeder sends a batch that contains only retried entries and is not full with regard to the feeder.maxBatchSize and feeder.maxBatchBytes properties after the CAE Feeder has been idle for the time configured in this property. A retried entry is an entry that was sent to the Search Engine before but could not be indexed successfully. If the batch contains any entries that are not retried, the value of property feeder.sendIdleDelay is used instead. |
| | milliseconds | 600000 | The maximum time in milliseconds between the CAE Feeder receiving an error from the Search Engine and the CAE Feeder trying again to send the failed entry to the Search Engine as part of a batch. The time can be exceeded if an error occurs while contacting the Search Engine. If the batch contains any entries that are not retried, the value of property feeder.sendMaxDelay is used instead. |
| | number of bytes | -1 | The maximum size in bytes for the value of a bean property, or -1 for no limitation. Larger values are ignored and are not sent to the Search Engine. |
| | comma-separated list of included MIME types | \*/\* | List of included MIME types for blob properties configured for indexing at the BeanMappingFeedablePopulator. For details, see the API documentation of the corresponding BeanMappingFeedablePopulator method. |
| | comma-separated list of excluded MIME types | | List of excluded MIME types for blob properties configured for indexing at the BeanMappingFeedablePopulator. For details, see the API documentation of the corresponding BeanMappingFeedablePopulator method. |
| | milliseconds | 10000 | The CAE Feeder sends a batch that is not full with regard to the feeder.maxBatchSize and feeder.maxBatchBytes properties after the CAE Feeder has been idle for the configured time in milliseconds. |
| | milliseconds | 120000 | The maximum time in milliseconds after which the CAE Feeder sends a batch that is not full with regard to the feeder.maxBatchSize and feeder.maxBatchBytes properties. The time may be exceeded if an error occurs while contacting the Search Engine or if the CAE Feeder is under high load. |
feeder.tika.config | location of Apache Tika Config XML | (empty) | The location of an optional custom Apache Tika Config XML file with custom Tika parsers. The value is a Spring Resource location. |
| | comma-separated list of metadata identifiers | (empty) | Comma-separated list of metadata identifiers that Apache Tika extracts from blob properties and that are appended to the extracted body text. See Section 5.2, “Configuring the CAE Feeder”. |
| | comma-separated list of entries in the format \<metadata identifier\>=\<index field name\> | (empty) | Comma-separated list of metadata identifiers that Apache Tika extracts from blob properties, each with the index field name that the metadata is copied to. See Section 5.2, “Configuring the CAE Feeder”. |
| | milliseconds | 120000 (2 minutes) | The maximum time after which text extraction from binary data with Apache Tika fails. If extraction fails, the binary data is skipped for the index document. Lower values prevent the CAE Feeder from being blocked in text extraction for a long time. |
| | milliseconds | 15000 (15 seconds) | The time after which a warning is logged if text extraction from binary data with Apache Tika is still running. |
| | number of threads | 50 | Number of evaluator threads in the CAE Feeder. The number of threads influences performance for two reasons: evaluations can execute concurrently, and higher values increase the probability that the CAE Feeder writes the state of multiple evaluations to the database in one database transaction. |
| | milliseconds | 0 | Minimum delay in milliseconds between notifications of the CAE Feeder by the internal Proactive Engine subcomponent. Higher values reduce throughput. |
| | milliseconds | 10000 | Delay in milliseconds between notifications of the CAE Feeder by the internal Proactive Engine subcomponent while the application is idle. Smaller values reduce the latency of the CAE Feeder but may increase the load on the database. |
| | maximum number of changed keys per database transaction | 2500 | The maximum weight of a database transaction that changes stored dependencies. The weight is the number of changed keys; for example, a transaction with one deleted key has weight 1. An event that invalidates more keys than this maximum is processed in multiple transactions. |
Table 6.14. Configuration of general properties independent from the type of the search engine
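The size-based and time-based rules for sending batches in Table 6.14 fit together as follows. The sketch below is a hypothetical illustration, not the CAE Feeder's implementation. The class Batcher and its methods are invented for this example, and the byte size of each entry is passed in as a rough estimate. For feeder.sendMaxDelay, the batch age is assumed to be measured from the first entry in the batch. The separate delays for batches that contain only retried entries are not modeled.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the batching rules (not CoreMedia code).
 * The size limits correspond to feeder.maxBatchSize and feeder.maxBatchBytes.
 */
final class Batcher {
    private final int maxBatchSize;    // feeder.maxBatchSize, default 500
    private final long maxBatchBytes;  // feeder.maxBatchBytes, default 20971520
    private final List<List<String>> sent = new ArrayList<>();
    private List<String> current = new ArrayList<>();
    private long currentBytes = 0;

    Batcher(int maxBatchSize, long maxBatchBytes) {
        this.maxBatchSize = maxBatchSize;
        this.maxBatchBytes = maxBatchBytes;
    }

    /** Adds an entry whose size in bytes is only a rough estimate. */
    void add(String entry, long estimatedBytes) {
        // Send the current batch first if this entry would exceed the byte limit.
        if (!current.isEmpty() && currentBytes + estimatedBytes > maxBatchBytes) {
            send();
        }
        current.add(entry);
        currentBytes += estimatedBytes;
        // Send as soon as the maximum number of entries is reached.
        if (current.size() >= maxBatchSize) {
            send();
        }
    }

    /** Sends the pending batch, for example when a time-based rule fires. */
    void send() {
        if (current.isEmpty()) {
            return;
        }
        sent.add(current); // stands in for the request to the Search Engine
        current = new ArrayList<>();
        currentBytes = 0;
    }

    /**
     * Time-based rule for a batch that is not full: send it after the feeder was idle
     * for sendIdleDelayMs, or once the batch is sendMaxDelayMs old
     * (assumption: the age is measured from the first entry in the batch).
     */
    static boolean nonFullBatchDue(long idleMs, long batchAgeMs,
                                   long sendIdleDelayMs, long sendMaxDelayMs) {
        return idleMs >= sendIdleDelayMs || batchAgeMs >= sendMaxDelayMs;
    }

    List<List<String>> sentBatches() {
        return sent;
    }

    int pendingCount() {
        return current.size();
    }
}
```

Consider maxBatchSize 3 and maxBatchBytes 100. Three 10-byte entries are sent together because the entry limit is reached. If a 60-byte entry is followed by another 60-byte entry, the first one is sent alone, because adding the second would bring the batch to 120 bytes.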
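The included and excluded MIME type lists can be read as a filter over the MIME type of each blob. The following is a minimal sketch under these assumptions:

- A blob is indexed if its type matches at least one include pattern and no exclude pattern.
- An asterisk acts as a wildcard for the type or subtype, as in the default value for included types.
- MIME type parameters such as a charset are ignored.

The class MimeTypeFilter is hypothetical. The exact matching rules are defined in the BeanMappingFeedablePopulator API documentation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** Hypothetical include/exclude filter for blob MIME types (not CoreMedia code). */
final class MimeTypeFilter {
    private final List<String> includes;
    private final List<String> excludes;

    MimeTypeFilter(String includeList, String excludeList) {
        this.includes = parse(includeList);
        this.excludes = parse(excludeList);
    }

    /** Splits a comma-separated list into trimmed, lower-case patterns. */
    private static List<String> parse(String commaSeparated) {
        if (commaSeparated == null || commaSeparated.isBlank()) {
            return List.of();
        }
        return Arrays.stream(commaSeparated.split(","))
                .map(s -> s.trim().toLowerCase(Locale.ROOT))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    /** Matches "type/subtype" against a pattern in which either part may be "*". */
    static boolean matches(String pattern, String mimeType) {
        String[] p = pattern.split("/", 2);
        String[] m = mimeType.split("/", 2);
        if (p.length != 2 || m.length != 2) {
            return false;
        }
        return (p[0].equals("*") || p[0].equals(m[0]))
                && (p[1].equals("*") || p[1].equals(m[1]));
    }

    /** True if the blob would be indexed under the assumed include/exclude semantics. */
    boolean accepts(String mimeType) {
        // Drop parameters such as "; charset=UTF-8" before matching.
        String base = mimeType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        boolean included = includes.stream().anyMatch(p -> matches(p, base));
        boolean excluded = excludes.stream().anyMatch(p -> matches(p, base));
        return included && !excluded;
    }
}
```

With this logic, an include list of text/\* restricts indexing to textual blobs. An exclude list of image/\* combined with the default include list indexes every blob except images.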
The following properties are only used for a CoreMedia Search Engine based on Apache Solr:
Property | Value | Default | Description |
---|---|---|---|
feeder.solr.url | URL | http://localhost:8082/solr/coremedia | The URL where the CAE Feeder can reach the Search Engine. The URL points to the Apache Solr core for the CAE Feeder. |
| | collection name | coremedia | The collection that the CAE Feeder uses. |
| | user name or empty | (empty) | User name for HTTP Basic authentication when connecting to the Apache Solr web application. Leave empty for no authentication. |
| | password or empty | (empty) | Password for HTTP Basic authentication when connecting to the Apache Solr web application. |
| | milliseconds | 30000 | The delay in milliseconds to wait before sending a batch to the Search Engine again after sending failed with an error in the Search Engine. |
feeder.solr.connection.timeout | time in milliseconds | 0 | The connection timeout set on the SolrJ SolrServer. It determines how long the client waits for a connection to be established when the server does not respond. The default value 0 means that the client waits indefinitely. |
feeder.solr.socket.timeout | time in milliseconds | 600000 (10 minutes) | The socket timeout set on the SolrJ SolrServer. It determines how long the client waits for a response from the server after the connection has been established and the request has been sent. A value of 0 means that the client waits indefinitely. |
Table 6.15. Configuration properties for Apache Solr
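The sketch below maps the Solr connection settings onto the JDK's java.net.http client to show what the two timeouts and Basic authentication control. It is an approximation for illustration only, with these caveats:

- The CAE Feeder configures a SolrJ client, not this one.
- The class SolrHttpSettings is hypothetical.
- The user name and password are passed in as plain parameters because their property names are not listed here.
- HttpRequest.timeout bounds the whole wait for a response rather than acting as a pure socket read timeout.

Because the JDK rejects a zero Duration, a value of 0 is translated into "no timeout set", which corresponds to "waits indefinitely" in the table.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Base64;
import java.util.Properties;

/** Hypothetical mapping of feeder.solr.* settings onto java.net.http (not the CAE Feeder's SolrJ setup). */
final class SolrHttpSettings {

    /** Applies feeder.solr.connection.timeout; 0 (the default) leaves the connect timeout unset. */
    static HttpClient buildClient(Properties props) {
        long connectMs = Long.parseLong(
                props.getProperty("feeder.solr.connection.timeout", "0"));
        HttpClient.Builder builder = HttpClient.newBuilder();
        if (connectMs > 0) { // the JDK rejects zero durations, so 0 means "wait indefinitely"
            builder.connectTimeout(Duration.ofMillis(connectMs));
        }
        return builder.build();
    }

    /**
     * Builds a GET request below feeder.solr.url. feeder.solr.socket.timeout is approximated
     * by a response timeout; user and password are hypothetical parameters.
     */
    static HttpRequest buildRequest(Properties props, String path, String user, String password) {
        String baseUrl = props.getProperty("feeder.solr.url", "http://localhost:8082/solr/coremedia");
        long socketMs = Long.parseLong(props.getProperty("feeder.solr.socket.timeout", "600000"));
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(baseUrl + path)).GET();
        if (socketMs > 0) {
            builder.timeout(Duration.ofMillis(socketMs));
        }
        if (user != null && !user.isEmpty()) { // an empty user name means no authentication
            String credentials = user + ":" + (password == null ? "" : password);
            String token = Base64.getEncoder()
                    .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
            builder.header("Authorization", "Basic " + token);
        }
        return builder.build();
    }
}
```

Building the client and request does not contact the server, so the effect of each setting can be inspected through HttpClient.connectTimeout() and HttpRequest.timeout() before any request is sent.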