Deployment Manual / Version 2412.0

3.11.2 CAE Feeder Properties

General Properties

repository.user

Value user name
Default none
Description The name of the user to connect to the CoreMedia Content Server.

repository.password

Value password
Default none
Description The password of the user to connect to the CoreMedia Content Server.

repository.domain

Value domain
Default none
Description The domain of the user to connect to the CoreMedia Content Server. Empty String for a built-in user.

repository.url

Value URL
Default none
Description The URL to the IOR of the CoreMedia Content Server.
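
The four connection properties above could be combined as in the following sketch. The host name, port, user name, and password are placeholders chosen for illustration and are not values from this manual. The empty repository.domain follows the note on built-in users.

```properties
# Hypothetical values; replace with your own Content Server and user
repository.url=http://content-server.example.com:8080/ior
repository.user=feeder-user
repository.password=change-me
# Empty string: the user is a built-in user
repository.domain=
```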

feeder.contentSelector.basePath

Value String
Default /Sites
Description A comma-separated list of base folders for which content beans are indexed. Changing this property does not trigger re-indexing of already indexed content. See Section 5.3.2, “Resetting” in the Search Manual for details on re-indexing.

feeder.contentSelector.contentTypes

Value String
Default Document_
Description A comma-separated list of content types for which content beans are indexed. Changing this property does not trigger re-indexing of already indexed content. See Section 5.3.2, “Resetting” in the Search Manual for details on re-indexing.

feeder.contentSelector.includeSubTypes

Value Boolean
Default true
Description Specifies whether the subtypes of the content types configured with property feeder.contentSelector.contentTypes are selected as well. Changing this property does not trigger re-indexing of already indexed content. See Section 5.3.2, “Resetting” in the Search Manual for details on re-indexing.
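
As an illustration of the three content selector properties, the following sketch restricts indexing to two site folders and two content types without their subtypes. The folder and content type names are assumptions made for this example; use the names from your own content type model.

```properties
# Hypothetical site folders and content type names
feeder.contentSelector.basePath=/Sites/SiteA,/Sites/SiteB
feeder.contentSelector.contentTypes=CMArticle,CMPicture
# Select only the listed types, not their subtypes
feeder.contentSelector.includeSubTypes=false
```

As stated in the descriptions, changing these values does not re-index content that is already in the index.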

feeder.core.executor-queue-capacity

Value int
Default 2000
Description Capacity of the CAE Feeder's executor queue, which is used internally to transfer evaluated values.

feeder.core.executor-retry-delay

Value milliseconds
Default 60000
Description The delay in milliseconds to wait before the CAE Feeder retries to access the source data after failures to do so.

feeder.batch.max-bytes

Value bytes
Default 20971520 (20 MB)
Description The maximum size of a batch in bytes. The CAE Feeder sends a batch to the Search Engine if its maximum size would be exceeded when adding more entries. Note that the byte computation is only a rough estimate.

feeder.batch.max-size

Value int
Default 500
Description The maximum number of entries in a batch. If the maximum number is reached, the CAE Feeder sends the batch to the Search Engine.
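
The two batch limits work together: a batch is sent as soon as either one is reached. A minimal sketch with smaller batches than the defaults follows. The numbers are illustrative only and not a tuning recommendation.

```properties
# 10 MB = 10 * 1024 * 1024 bytes (rough estimate, see description above)
feeder.batch.max-bytes=10485760
# At most 250 entries per batch
feeder.batch.max-size=250
```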

feeder.batch.max-open

Value int
Default 5
Description The maximum number of batches indexed in parallel. This setting is not used with the default integration of Apache Solr but only with custom implementations of the com.coremedia.cap.feeder.index.async.AsyncIndexer interface. The CAE Feeder does not call the index method of the AsyncIndexer interface to index another batch if the maximum number of parallel batches has been reached. The method will not be called until a callback about the persistence of one of these batches has been received.

feeder.batch.max-processed

Value int
Default 1
Description The maximum number of batches processed by the Indexer in parallel. This setting is not used with the default integration of Apache Solr but only with custom implementations of the com.coremedia.cap.feeder.index.async.AsyncIndexer interface. The CAE Feeder does not call the index method of the AsyncIndexer interface to index another batch if the configured number of currently processed batches has been reached. The method will not be called until a callback about completed processing or persistence of one of these batches has been received.

feeder.batch.retry-send-idle-delay

Value milliseconds
Default 60000
Description The CAE Feeder sends a batch which only contains retried entries and is not full with regard to the feeder.batch.max-size and feeder.batch.max-bytes properties after the CAE Feeder was idle for the time configured in this property. A retried entry is an entry which was sent to the Search Engine before but could not be indexed successfully. If the batch contains entries which are not retried, the value of property feeder.batch.send-idle-delay is used instead.

feeder.batch.retry-send-max-delay

Value milliseconds
Default 600000
Description The maximum time in milliseconds between the moment the CAE Feeder receives an error from the Search Engine and the moment it tries to send the failed entry to the Search Engine again as part of a batch. The time is exceeded if an error occurs while contacting the Search Engine. If the batch contains entries which are not retried, the value of property feeder.batch.send-max-delay is used instead.

feeder.beanPropertyMaxBytes

Value number of bytes
Default 5242880 (5 MB)
Description The maximum size in bytes for the value of a bean property or -1 for no limitation. Larger values are ignored and will not be sent to the Search Engine.
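
For example, the limit could be lowered to 1 MB or disabled entirely. Both values below are illustrations, not recommendations.

```properties
# 1 MB = 1024 * 1024 bytes; larger bean property values are ignored
feeder.beanPropertyMaxBytes=1048576
# Alternative: no limitation
# feeder.beanPropertyMaxBytes=-1
```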

feeder.beanMapping.mimeType.includes

Value comma-separated list of included MIME types
Default */*
Description

List of included MIME types for blob properties configured for indexing at the BeanMappingFeedablePopulator. For details, see the API documentation of the setMimeTypeIncludes method of com.coremedia.cap.feeder.bean.BeanMappingFeedablePopulator.

Example

feeder.beanMapping.mimeType.includes=text/*

Only indexes blobs of MIME type text/*.

feeder.beanMapping.mimeType.excludes

Value comma-separated list of excluded MIME types
Default  
Description

List of excluded MIME types for blob properties configured for indexing at the BeanMappingFeedablePopulator. For details, see the API documentation of the setMimeTypeExcludes method of com.coremedia.cap.feeder.bean.BeanMappingFeedablePopulator.

Example

feeder.beanMapping.mimeType.excludes=text/xml

Indexes all blobs except blobs of MIME type text/xml.

feeder.batch.send-idle-delay

Value milliseconds
Default 10000
Description The CAE Feeder sends a batch which is not full with regard to the feeder.batch.max-size and feeder.batch.max-bytes properties after the CAE Feeder was idle for the configured time in milliseconds.

feeder.batch.send-max-delay

Value milliseconds
Default 120000
Description The maximum time in milliseconds after which the CAE Feeder sends a batch which is not full with regard to the feeder.batch.max-size and feeder.batch.max-bytes properties. The time may be exceeded if an error occurs while contacting the Search Engine or if the CAE Feeder is under high load.
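
A sketch of how the two send delays could be shortened so that partially filled batches reach the Search Engine sooner. The values are assumptions chosen for illustration.

```properties
# Send a partially filled batch after 5 seconds of idle time
feeder.batch.send-idle-delay=5000
# Send a partially filled batch after 60 seconds, unless errors or high load delay it
feeder.batch.send-max-delay=60000
```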

proactiveengine.log.progress.interval.seconds

Value seconds
Default 600
Description The interval in seconds at which the CAE Feeder logs progress statistics, including the number of keys that are currently invalid and still need to be computed.

proactiveengine.senders.evaluators

Value number of threads
Default 50
Description Number of evaluator threads in the CAE Feeder. The number of threads influences performance not only because evaluations can execute concurrently but also because higher values increase the probability that the CAE Feeder writes the state of multiple evaluations to the database in one database transaction.

proactiveengine.senders.delay

Value milliseconds
Default 0
Description Minimum delay in milliseconds between notifications of the Feeder by the internal Proactive Engine sub component. Higher values lead to reduced throughput.

proactiveengine.senders.idledelay

Value milliseconds
Default 10000
Description Delay in milliseconds between notifications of the Feeder by the internal Proactive Engine sub component if the application is idle. Smaller values can be configured to reduce the latency of the CAE Feeder but may lead to increased load on the database.
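
The Proactive Engine settings above might be combined as follows. This is a sketch with illustrative values: fewer evaluator threads, a shorter idle delay (lower latency at the cost of more database load, as described above), and progress logging every five minutes.

```properties
# Illustrative values only
proactiveengine.senders.evaluators=20
proactiveengine.senders.idledelay=2000
proactiveengine.log.progress.interval.seconds=300
```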

dependencyStore.maxTransactionWeight

Value maximum number of changed keys per database transaction
Default 2500
Description The maximum weight of a database transaction to change stored dependencies. The weight is interpreted as the number of changed keys, that is, a transaction with one deleted key has weight 1. Multiple transactions will be used to process an event that causes the invalidation of more keys.

Table 3.52. Configuration of general properties independent from the type of the search engine


Database Properties

The properties in this section configure the database, also known as the data source. This is only an excerpt of the available properties. For more, consult the Spring Boot documentation for spring.datasource properties ("SQL Databases" and the "Data Properties" appendix), which are available for the CAE Feeder under the application-specific prefix caefeeder.datasource and are backed by the Spring Boot class org.springframework.boot.autoconfigure.jdbc.DataSourceProperties.

HikariCP is used as database connection pool. To fine-tune its settings, see the Spring Boot documentation for spring.datasource.hikari properties, which are available for the CAE Feeder under the name caefeeder.datasource.hikari.

caefeeder.datasource.password
Type String
Description

Login password of the database, possibly encrypted.

caefeeder.datasource.url
Type String
Description

JDBC URL of the database connection for the CAE Feeder.

caefeeder.datasource.username
Type String
Description

Login username of the database.
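
A minimal sketch of a data source configuration, assuming a MySQL database. The host, schema name, and credentials are placeholders. The pool size line assumes that Spring Boot's spring.datasource.hikari.maximum-pool-size is exposed under the caefeeder.datasource.hikari prefix, as the introduction to this section suggests.

```properties
# Hypothetical MySQL database; host, schema, and credentials are placeholders
caefeeder.datasource.url=jdbc:mysql://db.example.com:3306/cm_caefeeder
caefeeder.datasource.username=cm_caefeeder
caefeeder.datasource.password=change-me
# Assumed mapping of spring.datasource.hikari.maximum-pool-size
caefeeder.datasource.hikari.maximum-pool-size=10
```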

jdbc.driver
Description

Fully qualified name of the JDBC driver (ignored).

Deprecation

This property has been deprecated since 2412.0.0 and will be removed in a future version.

Use caefeeder.datasource.driver-class-name instead.

Reason:

The database driver class name does not need to be specified anymore, because it gets auto-detected for the JDBC URL. If really needed, the auto-detected driver class name can still be overridden with property 'caefeeder.datasource.driver-class-name'.

jdbc.login-user-name
Description

Login username of the database.

Deprecation

This property has been deprecated since 2412.0.0 and will be removed in a future version.

Use caefeeder.datasource.username instead.

Reason:

The property was renamed to use the 'caefeeder.datasource' prefix, which exposes more data source properties from org.springframework.boot.autoconfigure.jdbc.DataSourceProperties. Use 'caefeeder.datasource.username' instead with the full login username.

jdbc.password
Description

Login password of the database, possibly encrypted.

Deprecation

This property has been deprecated since 2412.0.0 and will be removed in a future version.

Use caefeeder.datasource.password instead.

Reason:

The property was renamed to use the 'caefeeder.datasource' prefix, which exposes more data source properties from org.springframework.boot.autoconfigure.jdbc.DataSourceProperties.

jdbc.url
Description

JDBC URL of the database.

Deprecation

This property has been deprecated since 2412.0.0 and will be removed in a future version.

Use caefeeder.datasource.url instead.

Reason:

The property was renamed to use the 'caefeeder.datasource' prefix, which exposes more data source properties from org.springframework.boot.autoconfigure.jdbc.DataSourceProperties.

jdbc.user
Description

Login username of the database.

Deprecation

This property has been deprecated since 2412.0.0 and will be removed in a future version.

Use caefeeder.datasource.username instead.

Reason:

The property was renamed to use the 'caefeeder.datasource' prefix, which exposes more data source properties from org.springframework.boot.autoconfigure.jdbc.DataSourceProperties. Use 'caefeeder.datasource.username' instead with the full login username.
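
The deprecations above amount to a rename of each jdbc.* property. A migration could look like the following sketch, in which the connection values are placeholders and the MySQL driver class is only an example.

```properties
# Before (deprecated since 2412.0.0)
# jdbc.url=jdbc:mysql://db.example.com:3306/cm_caefeeder
# jdbc.user=cm_caefeeder
# jdbc.password=change-me
# jdbc.driver=com.mysql.cj.jdbc.Driver

# After
caefeeder.datasource.url=jdbc:mysql://db.example.com:3306/cm_caefeeder
caefeeder.datasource.username=cm_caefeeder
caefeeder.datasource.password=change-me
# Only needed if the auto-detected driver must be overridden
# caefeeder.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
```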

Table 3.53. CAE Feeder Data Source Properties


Apache Tika Properties

You can customize text extraction with Apache Tika using the following properties:

feeder.tika.append-metadata
Type String
Description

Comma-separated list of metadata identifiers returned by Apache Tika to append to the extracted body text.

feeder.tika.config
Type org.springframework.core.io.Resource
Description

The location of a custom Tika Config XML, for example to customize the default Tika parsers. See Apache Tika documentation for details on configuring Tika. The value of this property must be a Spring Resource location (e.g. file:/path/to/local/file) or empty for defaults.

feeder.tika.copy-metadata
Type String
Description

Comma-separated list of metadata identifiers returned by Apache Tika and names of Feedable elements to copy the metadata to. Entries in the comma separated list have the following format: "metadata identifier"="element name". With Apache Solr, target index fields must be defined as multiValued="true" to avoid indexing errors if there are multiple metadata values with the same identifier.
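
The two metadata properties could be set as in the following sketch. It assumes that the quotation marks in the format description are placeholders rather than literal characters, and the target element name tika_title is hypothetical. The identifiers dc:title and dc:creator are standard Tika metadata keys, but whether they are present depends on the document being parsed.

```properties
# Append the values of these Tika metadata identifiers to the extracted body text
feeder.tika.append-metadata=dc:title,dc:creator
# Copy the dc:title metadata value to a Feedable element named tika_title (hypothetical name)
feeder.tika.copy-metadata=dc:title=tika_title
```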

feeder.tika.timeout
Type Duration
Default 2m
Description

The maximum time after which text extraction from binary data with Apache Tika fails. If extraction fails, the binary data is skipped for the index document. Lower values prevent the Feeder from being blocked for a long time in text extraction.

feeder.tika.warn-time-threshold
Type Duration
Default 15s
Description

The time after which a warning is logged if text extraction from binary data with Apache Tika is still running.
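
Both Tika time limits take duration values in the same notation as their defaults (2m, 15s). A sketch with stricter limits follows. The values are illustrative only.

```properties
# Give up on text extraction after 30 seconds
feeder.tika.timeout=30s
# Log a warning once extraction has been running for 5 seconds
feeder.tika.warn-time-threshold=5s
```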

feeder.tika.zip-bomb-prevention.enabled
Type Boolean
Default true
Description

Sets whether Apache Tika's "Zip bomb" prevention is enabled. When a "Zip bomb" is detected, no text is extracted from the blob, but a warning is logged. Note that "Zip bombs" are not restricted to ZIP files but can also occur in PDFs or other formats. Disabling "Zip bomb" prevention bears the risk of OutOfMemoryErrors. Note that false positives are possible.

feeder.tika.zip-bomb-prevention.maximum-compression-ratio
Type Long
Default -1
Description

Sets the ratio between output characters and input bytes for the Apache Tika "Zip bomb" prevention. If this ratio is exceeded (after the output threshold has been reached) then no text will be extracted and a warning will be logged. Set to -1 to use the default of Apache Tika.

feeder.tika.zip-bomb-prevention.maximum-depth
Type Integer
Default -1
Description

Sets the maximum XML element nesting level for the Apache Tika "Zip bomb" prevention. If this depth level is exceeded then no text will be extracted, and a warning will be logged. Set to -1 to use the default of Apache Tika.

feeder.tika.zip-bomb-prevention.maximum-package-entry-depth
Type Integer
Default -1
Description

Sets the maximum package entry nesting level for the Apache Tika "Zip bomb" prevention. If this depth level is exceeded then no text will be extracted, and a warning will be logged. Set to -1 to use the default of Apache Tika.

Table 3.54. Feeder Tika Configuration Properties


Solr Properties

The following properties are only used for a CoreMedia Search Engine based on Apache Solr:

feeder.solr.nested-documents.enabled
Type Boolean
Default true
Description

Whether storing nested feedables as nested documents is supported in Solr. This requires that the Solr schema contains a _root_ field. Note that if you add that field to the schema, you have to recreate the index from scratch.

feeder.solr.nested-documents.skip-index-check
Type Boolean
Default false
Description

If feeder.solr.nested-documents.enabled is true, the Feeder checks whether the Solr index schema contains the _root_ field. If feeding of nested documents is attempted but the index does not support it, the Feeder logs a warning and does not use nested documents. Set this property to true to skip the index schema check.

feeder.solr.send-retry-delay
Type Duration
Default 30s
Description

The delay to wait before the Feeder retries to send data after failures from Solr.

solr.cae.collection
Type String
Description

The name of the Solr collection for web site search. This property does not have a default. It's typically set to 'preview' or 'live'.

solr.cae.config-set
Type String
Default cae
Description

The name of the Solr config set to use when creating the CAE collection. This property is used by the CAE Feeder.

solr.cloud
Type Boolean
Default false
Description

Whether to connect to SolrCloud. If true, connect to a SolrCloud cluster. SolrCloud connection details must be set either as ZooKeeper addresses (solr.zookeeper.addresses) or, if the former is unset or empty, as HTTP URLs (solr.url). If false, connect to stand-alone Solr nodes via HTTP URLs (solr.url).

solr.connection-timeout
Type Duration
Default 0
Description

The connection timeout set on the SolrJ SolrClient. It determines how long the client waits to establish a connection without any response from the server.

The default value 0 means that the client waits forever. Set a negative value to use the SolrClient default. (Default unit is milliseconds.)

solr.index-data-directory
Type String
Default data
Description

Value for the "dataDir" parameter of the Solr CoreAdmin API / Collection API request to create a Solr index.

solr.password
Type String
Description

Password for HTTP basic authentication, used if a non-empty solr.username has been specified. The value may have been encrypted with the tool "cm encryptpasswordproperty".

solr.proxy-host
Type String
Description

Proxy host for Solr communication that needs to be set if a proxy should be used.

solr.proxy-is-secure
Type Boolean
Default false
Description

Secure flag for Solr proxy.

solr.proxy-is-socks4
Type Boolean
Default false
Description

SOCKS 4 flag for Solr proxy.

solr.proxy-port
Type Integer
Default 0
Description

Proxy port for Solr communication that needs to be set if a proxy should be used.

solr.socket-timeout
Type Duration
Default 10m
Description

The socket timeout set on the SolrJ SolrClient. It determines how long the client waits for a response from the server after the connection was established and the request was already sent.

Set to 0 for no timeout, or to a negative value to use the SolrClient default. (Default unit is milliseconds.)

solr.url
Type List<String>
Default http://localhost:40080/solr
Description

The list of Solr URLs to connect to. These URLs are ignored if connecting to SolrCloud (solr.cloud=true) and non-empty ZooKeeper addresses (solr.zookeeper.addresses) have been set. For a Feeder application that is not connected to a SolrCloud cluster, a single URL to the Solr leader must be configured.
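
For a stand-alone Solr setup, the connection properties described so far might be combined as in this sketch. The host name and port are placeholders, and the timeout values are illustrative.

```properties
# Stand-alone Solr: a single URL to the Solr leader (hypothetical host)
solr.cloud=false
solr.url=http://solr.example.com:8983/solr
# Typically 'preview' or 'live'
solr.cae.collection=live
# Fail after 5 seconds if no connection can be established
solr.connection-timeout=5s
# Wait up to 5 minutes for a response
solr.socket-timeout=5m
```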

solr.use-http1
Type Boolean
Default false
Description

Whether HTTP/1 (true) or HTTP/2 (false) shall be used by Solr clients.

Deprecation

This property has been deprecated and will be removed in a future version.

solr.use-xml-response-writer
Type Boolean
Default false
Description

Whether SolrJ should use XML response format instead of Javabin format.

solr.username
Type String
Description

Username for HTTP basic authentication, or empty string for no authentication.

solr.zookeeper.addresses
Type List<String>
Description

ZooKeeper addresses for connecting to SolrCloud. Only used if solr.cloud=true.

solr.zookeeper.chroot
Type String
Description

Optional ZooKeeper chroot path for Solr. ZooKeeper chroot support makes it possible to isolate the SolrCloud tree in a ZooKeeper instance. Only used if solr.cloud=true and solr.zookeeper.addresses is set to a non-empty value.

solr.zookeeper.client-timeout
Type Duration
Default 10s
Description

Client-timeout duration for ZooKeeper. Set to a negative value to use the SolrClient default. (Default unit is milliseconds.)

Only used if solr.cloud=true and solr.zookeeper.addresses is set to a non-empty value.

solr.zookeeper.connect-timeout
Type Duration
Default 10s
Description

Connect-timeout duration for ZooKeeper. Set to a negative value to use the SolrClient default. (Default unit is milliseconds.)

Only used if solr.cloud=true and solr.zookeeper.addresses is set to a non-empty value.
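
A SolrCloud connection via ZooKeeper could be configured as in the following sketch. The ZooKeeper host names, the chroot path, and the timeout values are assumptions made for illustration.

```properties
# Connect to a SolrCloud cluster through ZooKeeper (hypothetical hosts)
solr.cloud=true
solr.zookeeper.addresses=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
# Optional chroot path under which the SolrCloud tree is stored
solr.zookeeper.chroot=/solr
solr.zookeeper.connect-timeout=15s
solr.zookeeper.client-timeout=15s
# Typically 'preview' or 'live'
solr.cae.collection=live
```

With non-empty ZooKeeper addresses, any configured solr.url values are ignored, as described for solr.url.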

Table 3.55. CAE Feeder Solr Configuration Properties

