Deployment Manual / Version 2307

3.10.2 CAE Feeder Properties

Properties for general configuration

repository.user

Value user name
Default none
Description The name of the user to connect to the CoreMedia Content Server.

repository.password

Value password
Default none
Description The password of the user to connect to the CoreMedia Content Server.

repository.domain

Value domain
Default none
Description The domain of the user to connect to the CoreMedia Content Server. Empty String for a built-in user.

repository.url

Value URL
Default none
Description The URL to the IOR of the CoreMedia Content Server.
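
As an illustration, the repository connection properties above might be combined as follows. This is only a sketch: the host name, port, user name, and password are hypothetical placeholders, and the /ior path is an assumption about where the Content Server publishes its IOR.

```properties
# Hypothetical values; replace them with those of your Content Server
repository.url=http://content-server.example.com:8080/ior
repository.user=example-user
repository.password=secret
# Empty value for a built-in user
repository.domain=
```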

jdbc.driver

Value driver class
Default none
Description The class of the database driver. For example: oracle.jdbc.driver.OracleDriver

jdbc.url

Value URL
Default none
Description The URL to connect to the database.

jdbc.user

Value user name
Default none
Description The name of the user to connect to the database.

jdbc.login-user-name

Value the user name for the database login
Default value of jdbc.user
Description The user name for the database login. If not set, the value of jdbc.user is used to log in to the database. In some cases the login user name differs from the actual user name; for example, PostgreSQL on Azure requires a suffix on the user name to log in. In such cases, set this property in addition to jdbc.user (for example, jdbc.login-user-name=username@domain and jdbc.user=username).

jdbc.password

Value password
Default none
Description The password of the user to connect to the database.
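
The database properties above could be combined as in the following sketch for a PostgreSQL database that requires a suffixed login name. The host, database name, and credentials are hypothetical; org.postgresql.Driver is the driver class of the PostgreSQL JDBC driver.

```properties
# Hypothetical host, database, and credentials
jdbc.driver=org.postgresql.Driver
jdbc.url=jdbc:postgresql://db.example.com:5432/exampledb
jdbc.user=username
# Login name with the suffix required by the database service
jdbc.login-user-name=username@domain
jdbc.password=secret
```

If jdbc.login-user-name is omitted, the value of jdbc.user is used for the login.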

feeder.contentSelector.basePath

Value String
Default /Sites
Description A comma-separated list of base folders for which content beans are indexed. Changing this property will not trigger any re-indexing of already indexed content. See Section 5.3.2, “Resetting” in Search Manual for details on re-indexing.

feeder.contentSelector.contentTypes

Value String
Default Document_
Description A comma-separated list of content types for which content beans are indexed. Changing this property will not trigger any re-indexing of already indexed content. See Section 5.3.2, “Resetting” in Search Manual for details on re-indexing.

feeder.contentSelector.includeSubTypes

Value Boolean
Default true
Description Specifies whether subtypes of the content types configured with the property feeder.contentSelector.contentTypes are selected as well. Changing this property will not trigger any re-indexing of already indexed content. See Section 5.3.2, “Resetting” in Search Manual for details on re-indexing.
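
A possible way to narrow the selection with the three feeder.contentSelector properties is sketched below. The folder paths and content type names are hypothetical and only illustrate the comma-separated syntax.

```properties
# Hypothetical folders and content type names
feeder.contentSelector.basePath=/Sites/ExampleSite,/Settings/Example
feeder.contentSelector.contentTypes=ExampleArticle,ExamplePage
# Select only the listed types, not their subtypes
feeder.contentSelector.includeSubTypes=false
```

As stated in the descriptions, changing these values does not re-index content that has already been indexed.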

feeder.core.executor-queue-capacity

Value int
Default 2000
Description Capacity of the CAE Feeder's executor queue, which is internally used to transfer evaluated values

feeder.core.executor-retry-delay

Value milliseconds
Default 60000
Description The delay in milliseconds to wait before the CAE Feeder retries to access the source data after failures to do so.

feeder.batch.max-bytes

Value bytes
Default 20971520 (20 MB)
Description The maximum size of a batch in bytes. The CAE Feeder sends a batch to the Search Engine if its maximum size would be exceeded when adding more entries. Note that the byte computation is only a rough estimate.

feeder.batch.max-size

Value int
Default 500
Description The maximum number of entries in a batch. If the maximum number is reached, the CAE Feeder sends the batch to the Search Engine.

feeder.batch.max-open

Value int
Default 5
Description The maximum number of batches indexed in parallel. This setting is not used with the default integration of Apache Solr but only with custom implementations of the com.coremedia.cap.feeder.index.async.AsyncIndexer interface. The CAE Feeder does not call the index method of the AsyncIndexer interface to index another batch if the maximum number of parallel batches has been reached. The method will not be called until a callback about the persistence of one of these batches has been received.

feeder.batch.max-processed

Value int
Default 1
Description The maximum number of batches processed by the Indexer in parallel. This setting is not used with the default integration of Apache Solr but only with custom implementations of the com.coremedia.cap.feeder.index.async.AsyncIndexer interface. The CAE Feeder does not call the index method of the AsyncIndexer interface to index another batch if the configured number of currently processed batches has been reached. The method will not be called until a callback about completed processing or persistence of one of these batches has been received.

feeder.batch.retry-send-idle-delay

Value milliseconds
Default 60000
Description If a batch contains only retried entries and is not full with regard to the feeder.batch.max-size and feeder.batch.max-bytes properties, the CAE Feeder sends it after having been idle for the time configured in this property. A retried entry is an entry that was sent to the Search Engine before but could not be indexed successfully. If the batch contains entries that are not retried, the value of the property feeder.batch.send-idle-delay is used instead.

feeder.batch.retry-send-max-delay

Value milliseconds
Default 600000
Description The maximum time in milliseconds between the moment the CAE Feeder receives an error from the Search Engine and the moment it tries to send the failed entry to the Search Engine again as part of a batch. The time may be exceeded if an error occurs while contacting the Search Engine. If the batch contains entries that are not retried, the value of the property feeder.batch.send-max-delay is used instead.

feeder.beanPropertyMaxBytes

Value number of bytes
Default 5242880 (5 MB)
Description The maximum size in bytes for the value of a bean property or -1 for no limitation. Larger values are ignored and will not be sent to the Search Engine.

feeder.beanMapping.mimeType.includes

Value comma-separated list of included MIME types
Default */*
Description

List of included MIME types for blob properties configured for indexing at the BeanMappingFeedablePopulator. For details, see the API documentation of the method setMimeTypeIncludes of com.coremedia.cap.feeder.bean.BeanMappingFeedablePopulator.

Example

feeder.beanMapping.mimeType.includes=text/*

Only indexes blobs of MIME type text/*.

feeder.beanMapping.mimeType.excludes

Value comma-separated list of excluded MIME types
Default  
Description

List of excluded MIME types for blob properties configured for indexing at the BeanMappingFeedablePopulator. For details, see the API documentation of the method setMimeTypeExcludes of com.coremedia.cap.feeder.bean.BeanMappingFeedablePopulator.

Example

feeder.beanMapping.mimeType.excludes=text/xml

Indexes all blobs except blobs of MIME type text/xml.

feeder.batch.send-idle-delay

Value milliseconds
Default 10000
Description The CAE Feeder sends a batch which is not full with regard to the feeder.batch.max-size and feeder.batch.max-bytes properties after the CAE Feeder was idle for the configured time in milliseconds.

feeder.batch.send-max-delay

Value milliseconds
Default 120000
Description The maximum time in milliseconds after which the CAE Feeder sends a batch which is not full with regard to the feeder.batch.max-size and feeder.batch.max-bytes properties. The time may be exceeded if an error occurs while contacting the Search Engine or if the CAE Feeder is under high load.
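
The batch properties interact: a batch is sent when it reaches feeder.batch.max-size entries or would exceed feeder.batch.max-bytes, and a batch that is not full is sent after the idle delay or, at the latest, after the maximum delay. The following sketch shows settings that would send smaller batches sooner than the defaults. The values are illustrative assumptions, not recommendations.

```properties
# Illustrative values only
feeder.batch.max-size=100
# 5 MB
feeder.batch.max-bytes=5242880
# Send a batch that is not full after 2 seconds of idle time
feeder.batch.send-idle-delay=2000
# Send a batch that is not full after 30 seconds at the latest
feeder.batch.send-max-delay=30000
```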

proactiveengine.log.progress.interval.seconds

Value seconds
Default 600
Description The interval in seconds at which statistics about the progress are logged, including the number of keys that are currently invalid and still need to be computed.

proactiveengine.senders.evaluators

Value number of threads
Default 50
Description Number of evaluator threads in the CAE Feeder. The number of threads influences performance not only because evaluations can execute concurrently but also because higher values increase the probability that the CAE Feeder writes the state of multiple evaluations to the database in one database transaction.

proactiveengine.senders.delay

Value milliseconds
Default 0
Description Minimum delay in milliseconds between notifications of the Feeder by the internal Proactive Engine sub component. Higher values lead to reduced throughput.

proactiveengine.senders.idledelay

Value milliseconds
Default 10000
Description Delay in milliseconds between notifications of the Feeder by the internal Proactive Engine sub component if the application is idle. Smaller values can be configured to reduce the latency of the CAE Feeder but may lead to increased load on the database.
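
A sketch of how the Proactive Engine properties above might be adjusted, for example to reduce latency while idle and to log progress more often. The values are illustrative assumptions; suitable values depend on the database and the load of the installation.

```properties
# Illustrative values only
proactiveengine.senders.evaluators=20
# Shorter idle delay: lower latency, possibly more database load
proactiveengine.senders.idledelay=5000
# Log progress statistics every 5 minutes
proactiveengine.log.progress.interval.seconds=300
```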

dependencyStore.maxTransactionWeight

Value maximum number of changed keys per database transaction
Default 2500
Description The maximum weight of a database transaction to change stored dependencies. The weight is interpreted as the number of changed keys, that is, a transaction with one deleted key has weight 1. Multiple transactions will be used to process an event that causes the invalidation of more keys.

Table 3.49. Configuration of general properties independent from the type of the search engine


Properties to configure Apache Tika

You can customize text extraction with Apache Tika using the following properties:

feeder.tika.append-metadata
Type java.lang.String
Default  
Description

Comma-separated list of metadata identifiers returned by Apache Tika to append to the extracted body text.

feeder.tika.config
Type org.springframework.core.io.Resource
Default  
Description

The location of a custom Tika Config XML, for example to customize the default Tika parsers. See Apache Tika documentation for details on configuring Tika. The value of this property must be a Spring Resource location (e.g. file:/path/to/local/file) or empty for defaults.

feeder.tika.copy-metadata
Type java.lang.String
Default  
Description

Comma-separated list of metadata identifiers returned by Apache Tika and names of Feedable elements to copy the metadata to. Entries in the comma-separated list have the following format: "metadata identifier"="element name". With Apache Solr, target index fields must be defined as multiValued="true" to avoid indexing errors if there are multiple metadata values with the same identifier.

feeder.tika.timeout
Type java.time.Duration
Default 2m
Description

The maximum time after which text extraction from binary data with Apache Tika fails. If extraction fails, the binary data will be skipped for the index document. Lower values prevent the Feeder from being blocked for a long time during text extraction.

feeder.tika.warn-time-threshold
Type java.time.Duration
Default 15s
Description

The time after which a warning is logged when text extraction from binary data with Apache Tika has not yet completed.
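
The following sketch combines a custom Tika configuration with stricter time limits. The file location repeats the placeholder from the feeder.tika.config description, and the duration values are illustrative assumptions written in the same notation as the defaults (2m, 15s).

```properties
# Placeholder location of a custom Tika configuration file
feeder.tika.config=file:/path/to/local/file
# Fail text extraction after one minute instead of two
feeder.tika.timeout=1m
# Log a warning if extraction takes longer than ten seconds
feeder.tika.warn-time-threshold=10s
```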

feeder.tika.zip-bomb-prevention.enabled
Type java.lang.Boolean
Default true
Description

Sets whether Apache Tika's "Zip bomb" prevention is enabled. When a "Zip bomb" is detected, no text is extracted from the blob and a warning is logged. Note that "Zip bombs" are not restricted to ZIP files but can also occur in PDFs and other formats. Disabling "Zip bomb" prevention carries the risk of OutOfMemoryErrors. Note that false positives are possible.

feeder.tika.zip-bomb-prevention.maximum-compression-ratio
Type java.lang.Long
Default -1
Description

Sets the ratio between output characters and input bytes for the Apache Tika "Zip bomb" prevention. If this ratio is exceeded (after the output threshold has been reached) then no text will be extracted and a warning will be logged. Set to -1 to use the default of Apache Tika.

feeder.tika.zip-bomb-prevention.maximum-depth
Type java.lang.Integer
Default -1
Description

Sets the maximum XML element nesting level for the Apache Tika "Zip bomb" prevention. If this depth level is exceeded then no text will be extracted, and a warning will be logged. Set to -1 to use the default of Apache Tika.

feeder.tika.zip-bomb-prevention.maximum-package-entry-depth
Type java.lang.Integer
Default -1
Description

Sets the maximum package entry nesting level for the Apache Tika "Zip bomb" prevention. If this depth level is exceeded then no text will be extracted, and a warning will be logged. Set to -1 to use the default of Apache Tika.

Table 3.50. Feeder Tika Configuration Properties


Properties for Solr configuration

The following properties are only used for a CoreMedia Search Engine based on Apache Solr:

feeder.solr.nested-documents.enabled
Type java.lang.Boolean
Default true
Description

Whether storing nested feedables as nested documents is supported in Solr. This requires that the Solr schema contains a _root_ field. Note that if you add that field to the schema, you have to recreate the index from scratch.

feeder.solr.nested-documents.skip-index-check
Type java.lang.Boolean
Default false
Description

If feeder.solr.nested-documents.enabled is true, the Feeder checks whether the Solr index schema contains the _root_ field. If feeding of nested documents is attempted but the index does not support it, the Feeder logs a warning and does not use nested documents. You can set this property to true to skip checking the index schema.

feeder.solr.send-retry-delay
Type java.time.Duration
Default 30s
Description

The delay to wait before the Feeder retries to send data after failures from Solr.

solr.cae.collection
Type java.lang.String
Default  
Description

The name of the Solr collection for web site search. This property does not have a default. It's typically set to 'preview' or 'live'.

solr.cae.config-set
Type java.lang.String
Default cae
Description

The name of the Solr config set to use when creating the CAE collection. This property is used by the CAE Feeder.

solr.cloud
Type java.lang.Boolean
Default false
Description

Whether to connect to SolrCloud. If true, the Feeder connects to a SolrCloud cluster; the connection details must then be set either as ZooKeeper addresses (solr.zookeeper.addresses) or, if those are unset or empty, as HTTP URLs (solr.url). If false, it connects to stand-alone Solr nodes via HTTP URLs (solr.url).

solr.connection-timeout
Type java.lang.Integer
Default 0
Description

Connection timeout in milliseconds, or 0 for no timeout, or a negative value to use SolrClient default.

solr.index-data-directory
Type java.lang.String
Default data
Description

Value for the "dataDir" parameter of the Solr CoreAdmin API / Collection API request to create a Solr index.

solr.password
Type java.lang.String
Default  
Description

Password for HTTP basic authentication, used if a non-empty solr.username has been specified. The value may have been encrypted with the tool "cm encryptpasswordproperty".

solr.proxy-host
Type java.lang.String
Default  
Description

Proxy host for Solr communication that needs to be set if a proxy should be used.

solr.proxy-is-secure
Type java.lang.Boolean
Default false
Description

Secure flag for Solr proxy.

solr.proxy-is-socks4
Type java.lang.Boolean
Default false
Description

SOCKS 4 flag for Solr proxy.

solr.proxy-port
Type java.lang.Integer
Default 0
Description

Proxy port for Solr communication that needs to be set if a proxy should be used.
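
If Solr is only reachable through a proxy, the proxy properties above might be set as in this sketch. The host name and port are hypothetical.

```properties
# Hypothetical proxy host and port
solr.proxy-host=proxy.example.com
solr.proxy-port=3128
solr.proxy-is-secure=false
solr.proxy-is-socks4=false
```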

solr.socket-timeout
Type java.lang.Integer
Default 600000
Description

Socket timeout in milliseconds, or 0 for no timeout, or a negative value to use SolrClient default.

solr.url
Type java.util.List<java.lang.String>
Default http://localhost:40080/solr
Description

The list of Solr URLs to connect to. These URLs are ignored if connecting to SolrCloud (solr.cloud=true) and non-empty ZooKeeper addresses (solr.zookeeper.addresses) have been set. For a Feeder application that is not connected to a SolrCloud cluster, a single URL to the Solr leader must be configured.
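
For a stand-alone Solr setup, a minimal configuration could look like the following sketch. The URL is the documented default; choosing 'live' as the collection name is an assumption based on the typical values named for solr.cae.collection.

```properties
# Connect to a stand-alone Solr leader via HTTP
solr.cloud=false
solr.url=http://localhost:40080/solr
# No default exists for the collection; typically 'preview' or 'live'
solr.cae.collection=live
```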

solr.use-xml-response-writer
Type java.lang.Boolean
Default false
Description

Whether SolrJ should use XML response format instead of Javabin format.

solr.username
Type java.lang.String
Default  
Description

Username for HTTP basic authentication, or empty string for no authentication.
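
HTTP basic authentication is enabled by setting a non-empty user name together with a password, as in this sketch. Both values are hypothetical. According to the description of solr.password, the password value may instead be one encrypted with the tool "cm encryptpasswordproperty".

```properties
# Hypothetical credentials
solr.username=solr-user
solr.password=secret
```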

solr.zookeeper.addresses
Type java.util.List<java.lang.String>
Default  
Description

ZooKeeper addresses for connecting to SolrCloud. Only used if solr.cloud=true.

solr.zookeeper.chroot
Type java.lang.String
Default  
Description

Optional ZooKeeper chroot path for Solr. ZooKeeper chroot support makes it possible to isolate the SolrCloud tree in a ZooKeeper instance. Only used if solr.cloud=true and solr.zookeeper.addresses is set to a non-empty value.

solr.zookeeper.client-timeout
Type java.lang.Integer
Default 10000
Description

Client timeout for ZooKeeper in milliseconds, or a negative value to use the SolrClient default. Only used if solr.cloud=true and solr.zookeeper.addresses is set to a non-empty value.

solr.zookeeper.connect-timeout
Type java.lang.Integer
Default 10000
Description

Connect timeout for ZooKeeper in milliseconds, or a negative value to use the SolrClient default. Only used if solr.cloud=true and solr.zookeeper.addresses is set to a non-empty value.
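
A connection to SolrCloud via ZooKeeper could be configured as in the following sketch. The ZooKeeper host names, the chroot path, and the collection name are hypothetical; 2181 is ZooKeeper's default client port.

```properties
# Connect to a SolrCloud cluster via ZooKeeper
solr.cloud=true
# Hypothetical ZooKeeper ensemble
solr.zookeeper.addresses=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
# Optional: hypothetical chroot path that isolates the SolrCloud tree
solr.zookeeper.chroot=/solr
solr.cae.collection=live
```

With non-empty ZooKeeper addresses, the URLs in solr.url are ignored.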

solr.use-http1
Type java.lang.Boolean
Default false
Description

Whether HTTP/1 (true) or HTTP/2 (false) shall be used by Solr clients.

Table 3.51. CAE Feeder Solr Configuration Properties

