Deployment Manual / Version 2207
Properties for general configuration
| |
Value | user name |
Default | none |
Description | The name of the user to connect to the CoreMedia Content Server. |
| |
Value | password |
Default | none |
Description | The password of the user to connect to the CoreMedia Content Server. |
| |
Value | domain |
Default | none |
Description | The domain of the user to connect to the CoreMedia Content Server. Empty String for a built-in user. |
| |
Value | URL |
Default | none |
Description | The URL to the IOR of the CoreMedia Content Server. |
| |
Value | driver class |
Default | none |
Description | The class of the database driver, for example oracle.jdbc.driver.OracleDriver. |
| |
Value | URL |
Default | none |
Description | The URL to connect to the database. |
| |
Value | user name |
Default | none |
Description | The name of the user to connect to the database. |
| |
Value | the user name for the database login |
Default | value of jdbc.user |
Description | The user name for the database login. If not set, the value of jdbc.user is used to log in to the database. In some cases the login user name differs from the actual user, for example with PostgreSQL on Azure, where a postfix on the user name is required to log in. Set this property in addition to jdbc.user, for example jdbc.login-user-name=username@domain together with jdbc.user=username. |
| |
Value | password |
Default | none |
Description | The password of the user to connect to the database. |
| |
Value | String |
Default | /Sites |
Description | A comma-separated list of base folders for which content beans are indexed. Changing this property will not trigger any re-indexing of already indexed content. See Section 5.3.2, “Resetting” in Search Manual for details on re-indexing. |
| |
Value | String |
Default | |
Description | A comma-separated list of content types for which content beans are indexed. Changing this property will not trigger any re-indexing of already indexed content. See Section 5.3.2, “Resetting” in Search Manual for details on re-indexing. |
| |
Value | Boolean |
Default | |
Description | Specifies whether the subtypes of the content types configured with the property feeder.contentSelector.contentTypes are selected as well. Changing this property will not trigger any re-indexing of already indexed content. See Section 5.3.2, “Resetting” in Search Manual for details on re-indexing. |
| |
Value | int |
Default | 2000 |
Description | Capacity of the CAE Feeder's executor queue, which is used internally to transfer evaluated values. |
| |
Value | milliseconds |
Default | 60000 |
Description | The delay in milliseconds to wait before the CAE Feeder retries to access the source data after failures to do so. |
| |
Value | bytes |
Default | 20971520 (20 MB) |
Description | The maximum size of a batch in bytes. The CAE Feeder sends a batch to the Search Engine if adding more entries would exceed its maximum size. Note that the byte computation is only a rough estimate. |
| |
Value | int |
Default | 500 |
Description | The maximum number of entries in a batch. If the maximum number is reached, the CAE Feeder sends the batch to the Search Engine. |
| |
Value | int |
Default | 5 |
Description | The maximum number of batches indexed in parallel. This setting is not used with the default integration of Apache Solr but only with custom implementations of the com.coremedia.cap.feeder.index.async.AsyncIndexer interface. The CAE Feeder does not call the index method of the AsyncIndexer interface to index another batch if the maximum number of parallel batches has been reached. The method will not be called until a callback about the persistence of one of these batches has been received. |
| |
Value | int |
Default | 1 |
Description | The maximum number of batches processed by the Indexer in parallel. This setting is not used with the default integration of Apache Solr but only with custom implementations of the com.coremedia.cap.feeder.index.async.AsyncIndexer interface. The CAE Feeder does not call the index method of the AsyncIndexer interface to index another batch if the configured number of currently processed batches has been reached. The method will not be called until a callback about completed processing or persistence of one of these batches has been received. |
| |
Value | milliseconds |
Default | 60000 |
Description | The idle time in milliseconds after which the CAE Feeder sends a batch that contains only retried entries and is not full with regard to the feeder.batch.max-size and feeder.batch.max-bytes properties. A retried entry is an entry that was sent to the Search Engine before but could not be indexed successfully. If the batch contains entries that are not retried, the value of the property feeder.batch.send-idle-delay is used instead. |
| |
Value | milliseconds |
Default | 600000 |
Description | The maximum time in milliseconds between the moment the CAE Feeder receives an error from the Search Engine and the moment it tries to send the failed entry to the Search Engine again as part of a batch. The time may be exceeded if an error occurs while contacting the Search Engine. If the batch contains entries that are not retried, the value of the property feeder.batch.send-max-delay is used instead. |
| |
Value | number of bytes |
Default | 5242880 (5 MB) |
Description | The maximum size in bytes for the value of a bean property or -1 for no limitation. Larger values are ignored and will not be sent to the Search Engine. |
| |
Value | comma-separated list of included MIME types |
Default | */* |
Description |
List of included MIME types for blob properties configured for indexing at the BeanMappingFeedablePopulator.
For details, see the API documentation of method Example
Only indexes blobs of MIME type |
| |
Value | comma-separated list of excluded MIME types |
Default | |
Description |
List of excluded MIME types for blob properties configured for indexing at the
BeanMappingFeedablePopulator.
For details, see the API documentation of method Example
Indexes all blobs except blobs of MIME type |
| |
Value | milliseconds |
Default | 10000 |
Description | The idle time in milliseconds after which the CAE Feeder sends a batch that is not full with regard to the feeder.batch.max-size and feeder.batch.max-bytes properties. |
| |
Value | milliseconds |
Default | 120000 |
Description | The maximum time in milliseconds after which the CAE Feeder sends a batch which is not full with regard to the feeder.batch.max-size and feeder.batch.max-bytes properties. The time may be exceeded if an error occurs while contacting the Search Engine or if the CAE Feeder is under high load. |
| |
Value | seconds |
Default | 600 |
Description | The interval in seconds at which the CAE Feeder logs statistics about its progress, including the number of keys that are currently invalid and still need to be computed. |
| |
Value | number of threads |
Default | 50 |
Description | Number of evaluator threads in the CAE Feeder. The number of threads influences performance not only because evaluations can execute concurrently but also because higher values increase the probability that the CAE Feeder writes the state of multiple evaluations to the database in one database transaction. |
| |
Value | milliseconds |
Default | 0 |
Description | Minimum delay in milliseconds between notifications of the Feeder by the internal Proactive Engine sub component. Higher values lead to reduced throughput. |
| |
Value | milliseconds |
Default | 10000 |
Description | Delay in milliseconds between notifications of the Feeder by the internal Proactive Engine sub component if the application is idle. Smaller values can be configured to reduce the latency of the CAE Feeder but may lead to increased load on the database. |
| |
Value | maximum number of changed keys per database transaction |
Default | 2500 |
Description | The maximum weight of a database transaction to change stored dependencies. The weight is interpreted as the number of changed keys, that is, a transaction with one deleted key has weight 1. Multiple transactions will be used to process an event that causes the invalidation of more keys. |
Table 3.48. Configuration of general properties independent from the type of the search engine
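The following fragment is a minimal sketch of how some of the properties above might be combined in a properties file. It uses only property names that are mentioned in the descriptions of this table. The pairing of feeder.batch.max-size, feeder.batch.max-bytes, feeder.batch.send-idle-delay and feeder.batch.send-max-delay with the listed defaults is an assumption derived from the row descriptions, and the user names are the placeholder values from the description above.

```properties
# Database login where the login name differs from the actual user
# (placeholder values taken from the description above).
jdbc.user=username
jdbc.login-user-name=username@domain

# Batch limits: maximum number of entries and approximate maximum size
# in bytes. The name-to-default pairing is an assumption (see lead-in).
feeder.batch.max-size=500
feeder.batch.max-bytes=20971520

# Delays in milliseconds for sending batches that are not full:
# after the Feeder was idle, and at the latest after the maximum delay.
feeder.batch.send-idle-delay=10000
feeder.batch.send-max-delay=120000
```

Lowering the idle delay would presumably reduce indexing latency for small changes at the cost of more, smaller batches being sent to the Search Engine.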
Properties to configure Apache Tika
You can customize text extraction with Apache Tika using the following properties:
feeder.tika.append-metadata
| |
Type | java.lang.String |
Default | |
Description | Comma-separated list of metadata identifiers returned by Apache Tika to append to the extracted body text. |
feeder.tika.config
| |
Type | org.springframework.core.io.Resource |
Default | |
Description | The location of a custom Tika Config XML, for example to customize the default Tika parsers. See Apache Tika documentation for details on configuring Tika. The value of this property must be a Spring Resource location (e.g. file:/path/to/local/file) or empty for defaults. |
feeder.tika.copy-metadata
| |
Type | java.lang.String |
Default | |
Description | Comma-separated list of metadata identifiers returned by Apache Tika and names of Feedable elements to copy the metadata to. Entries in the comma separated list have the following format: "metadata identifier"="element name". With Apache Solr, target index fields must be defined as multiValued="true" to avoid indexing errors if there are multiple metadata values with the same identifier. |
feeder.tika.timeout
| |
Type | java.time.Duration |
Default | 2m |
Description | The maximum time after which text extraction from binary data with Apache Tika fails. If extraction fails, the binary data is skipped for the index document. Lower values prevent the Feeder from being blocked for a long time during text extraction. |
feeder.tika.warn-time-threshold
| |
Type | java.time.Duration |
Default | 15s |
Description | The time threshold after which a warning is logged if text extraction from binary data with Apache Tika takes longer than this value. |
feeder.tika.zip-bomb-prevention.enabled
| |
Type | java.lang.Boolean |
Default | true |
Description | Sets whether Apache Tika's "Zip bomb" prevention is enabled. When a "Zip bomb" is detected, no text will be extracted from the Blob, but a warning will be logged. Note that "Zip bombs" are not restricted to ZIP files but also apply to PDFs or other formats. Disabling "Zip bomb" prevention carries the risk of OutOfMemoryErrors. Note that false positives are possible. |
feeder.tika.zip-bomb-prevention.maximum-compression-ratio
| |
Type | java.lang.Long |
Default | -1 |
Description | Sets the ratio between output characters and input bytes for the Apache Tika "Zip bomb" prevention. If this ratio is exceeded (after the output threshold has been reached) then no text will be extracted and a warning will be logged. Set to -1 to use the default of Apache Tika. |
feeder.tika.zip-bomb-prevention.maximum-depth
| |
Type | java.lang.Integer |
Default | -1 |
Description | Sets the maximum XML element nesting level for the Apache Tika "Zip bomb" prevention. If this depth level is exceeded then no text will be extracted, and a warning will be logged. Set to -1 to use the default of Apache Tika. |
feeder.tika.zip-bomb-prevention.maximum-package-entry-depth
| |
Type | java.lang.Integer |
Default | -1 |
Description | Sets the maximum package entry nesting level for the Apache Tika "Zip bomb" prevention. If this depth level is exceeded then no text will be extracted, and a warning will be logged. Set to -1 to use the default of Apache Tika. |
Table 3.49. Feeder Tika Configuration Properties
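As an illustration of the Tika properties above, the following fragment is a sketch of a stricter extraction setup. The duration values use the same short notation as the defaults in the table (2m, 15s), and the file location is the placeholder from the description of feeder.tika.config, not a real path.

```properties
# Fail text extraction after one minute instead of the default 2m,
# and log a warning once an extraction takes longer than 10 seconds.
feeder.tika.timeout=1m
feeder.tika.warn-time-threshold=10s

# Keep "Zip bomb" prevention enabled (the default) and leave the
# limits at -1 so that the defaults of Apache Tika apply.
feeder.tika.zip-bomb-prevention.enabled=true
feeder.tika.zip-bomb-prevention.maximum-compression-ratio=-1
feeder.tika.zip-bomb-prevention.maximum-depth=-1
feeder.tika.zip-bomb-prevention.maximum-package-entry-depth=-1

# Optional custom Tika Config XML as a Spring Resource location
# (placeholder path; leave empty to use the defaults).
feeder.tika.config=file:/path/to/local/file
```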
Properties for Solr configuration
The following properties are only used for a CoreMedia Search Engine based on Apache Solr:
feeder.solr.nested-documents.enabled
| |
Type | java.lang.Boolean |
Default | true |
Description | Whether storing nested feedables as nested documents is supported in Solr. This requires that the Solr schema contains a _root_ field. Note that if you add that field to the schema, you have to recreate the index from scratch. |
feeder.solr.nested-documents.skip-index-check
| |
Type | java.lang.Boolean |
Default | false |
Description | If feeder.solr.nested-documents.enabled is true, the Feeder checks whether the Solr index schema contains the _root_ field. If feeding of nested documents is attempted but the index does not support it, the Feeder logs a warning and does not use nested documents. You can set this property to true to skip checking the index schema. |
feeder.solr.send-retry-delay
| |
Type | java.time.Duration |
Default | 30s |
Description | The delay to wait before the Feeder retries to send data after failures from Solr. |
solr.cae.collection
| |
Type | java.lang.String |
Default | |
Description | The name of the Solr collection for web site search. This property does not have a default. It's typically set to 'preview' or 'live'. |
solr.cae.config-set
| |
Type | java.lang.String |
Default | cae |
Description | The name of the Solr config set to use when creating the CAE collection. This property is used by the CAE Feeder. |
solr.cloud
| |
Type | java.lang.Boolean |
Default | false |
Description | Whether to connect to SolrCloud. If true, connect to a SolrCloud cluster. SolrCloud connection details must be set either as ZooKeeper addresses (solr.zookeeper.addresses) or, if the former is unset or empty, as HTTP URLs (solr.url). If false, connect to stand-alone Solr nodes via HTTP URLs (solr.url). |
solr.connection-timeout
| |
Type | java.lang.Integer |
Default | 0 |
Description | Connection timeout in milliseconds, or 0 for no timeout, or a negative value to use SolrClient default. |
solr.index-data-directory
| |
Type | java.lang.String |
Default | data |
Description | Value for the "dataDir" parameter of the Solr CoreAdmin API / Collection API request to create a Solr index. |
solr.password
| |
Type | java.lang.String |
Default | |
Description | Password for HTTP basic authentication, used if a non-empty solr.username has been specified. The value may have been encrypted with the tool "cm encryptpasswordproperty". |
solr.socket-timeout
| |
Type | java.lang.Integer |
Default | 600000 |
Description | Socket timeout in milliseconds, or 0 for no timeout, or a negative value to use SolrClient default. |
solr.url
| |
Type | java.util.List<java.lang.String> |
Default | http://localhost:40080/solr |
Description | The list of Solr URLs to connect to. These URLs are ignored if connecting to SolrCloud (solr.cloud=true) and non-empty ZooKeeper addresses (solr.zookeeper.addresses) have been set. For a Feeder application that is not connected to a SolrCloud cluster, a single URL to the Solr leader must be configured. |
solr.use-xml-response-writer
| |
Type | java.lang.Boolean |
Default | false |
Description | Whether SolrJ should use XML response format instead of Javabin format. |
solr.username
| |
Type | java.lang.String |
Default | |
Description | Username for HTTP basic authentication, or empty string for no authentication. |
solr.zookeeper.addresses
| |
Type | java.util.List<java.lang.String> |
Default | |
Description | ZooKeeper addresses for connecting to SolrCloud. Only used if solr.cloud=true. |
solr.zookeeper.chroot
| |
Type | java.lang.String |
Default | |
Description | Optional ZooKeeper chroot path for Solr. ZooKeeper chroot support makes it possible to isolate the SolrCloud tree in a ZooKeeper instance. Only used if solr.cloud=true and solr.zookeeper.addresses is set to a non-empty value. |
solr.zookeeper.client-timeout
| |
Type | java.lang.Integer |
Default | 10000 |
Description | Client timeout for ZooKeeper in milliseconds, or a negative value to use the SolrClient default. Only used if solr.cloud=true and solr.zookeeper.addresses is set to a non-empty value. |
solr.zookeeper.connect-timeout
| |
Type | java.lang.Integer |
Default | 10000 |
Description | Connect timeout for ZooKeeper in milliseconds, or a negative value to use the SolrClient default. Only used if solr.cloud=true and solr.zookeeper.addresses is set to a non-empty value. |
Table 3.50. CAE Feeder Solr Configuration Properties
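The two connection modes described above can be sketched as follows. This is an illustrative fragment only: the ZooKeeper host names, the chroot path and the user name are hypothetical, and writing the list-valued properties as comma-separated values is an assumption about how the properties are bound.

```properties
# Variant A: stand-alone Solr via HTTP (solr.cloud defaults to false).
# A single URL to the Solr leader is configured.
solr.url=http://localhost:40080/solr
solr.cae.collection=live

# Variant B: SolrCloud via ZooKeeper. With solr.cloud=true and non-empty
# ZooKeeper addresses, solr.url is ignored.
# Host names and chroot path are hypothetical.
# solr.cloud=true
# solr.zookeeper.addresses=zk1.example.com:2181,zk2.example.com:2181
# solr.zookeeper.chroot=/solr
# solr.cae.collection=live

# Optional HTTP basic authentication; the password is only used if a
# non-empty user name is set. The user name is hypothetical.
# solr.username=feeder
# solr.password=
```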