Deployment Manual / Version 2506.0

3.11 Content Feeder Properties

Properties for the Content Feeder

`feeder.blob.enabled`
Type	`Map<org.springframework.util.MimeType,Boolean>`
Description	The MIME-types of indexed blobs. This property maps included MIME-types to true, and excluded MIME-types to false. Entries for more specific types like text/xml override mappings for less specific types like text/*. Example configuration: `feeder.blob.enabled[application/pdf]=true`. If the map does not contain any enabled MIME-type, then the catch-all type */* is enabled implicitly. See also property feeder.blob.max-size, which restricts blobs based on their size. Blobs are only included if allowed by both feeder.blob.enabled and feeder.blob.max-size.
`feeder.blob.max-size`
Type	`Map<org.springframework.util.MimeType,org.springframework.util.unit.DataSize>`
Description	The maximum size of indexed blobs per MIME-type. Larger blob values will be skipped. This property maps MIME-types to their maximum size. Values must be non-negative. Entries for more specific types like text/xml override mappings for less specific types like text/*. Example configuration: `feeder.blob.max-size[application/pdf]=5MB`. If the map does not contain a configuration for an enabled MIME-type, then blobs of that MIME-type are indexed regardless of their size. See also property feeder.blob.enabled, which restricts blob values based on type. Blobs are only included if allowed by both feeder.blob.enabled and feeder.blob.max-size.
`feeder.content.background-feed-delay`
Type	`Duration`
Default	3s
Description	The minimum time between sending editorial changes to the search engine and the start of background feeding, for example to process rights-rule changes or for periodic issue reindexing. This delay prioritizes feeding of editorial changes over background feeding. It should not be necessary to change the default setting.
`feeder.content.index-deleted`
Type	`Boolean`
Default	true
Description	Whether contents in the trash should be indexed. If you do not need to find contents in the trash and want to keep your index smaller, you can change this to false.
`feeder.content.index-groups`
Type	`Boolean`
Default	true
Description	Whether the IDs of groups with potential rights to read the content are indexed in the field "groups". This set of groups is then used to narrow a user's search to the content items that the user might be allowed to read. This is an optimization to get smaller search results for some queries and content structures and to get more accurate search suggestion counts. The client still has to check read rights. For details, see also the description of the field "groups" in the Solr schema.xml. If set to false, then you must also configure Studio Server and Content Server not to add a query condition for the indexed groups. To this end, set the Studio property "studio.rest.search-service.use-groups-filter-query" and the Content Server property "solr.use-groups-filter-query" to "false".
`feeder.content.index-name-in-textbody`
Type	`Boolean`
Default	true
Description	Whether the content name should be indexed in field "textbody". It can make sense to disable this if lots of content names contain unique identifiers (from third-party systems, for example) to avoid problems with too many unique terms in field "textbody".
`feeder.content.index-referrers`
Type	`Boolean`
Default	false
Description	Whether a content is reindexed after its referrers have changed.
`feeder.content.index-translation-state`
Type	`Boolean`
Default	true
Description	Whether the translation state should be indexed. Computing the translation state can be an expensive operation when there are many derived sites and a lot of content is changed, for example when many derived sites that contain many content items are deleted.
`feeder.content.issues.index`
Type	`Boolean`
Default	true
Description	Whether to index content issues.
`feeder.content.issues.initial-feeding`
Type	`Boolean`
Default	false
Description	Whether content issues are already part of the initial feeding of an empty index. This property does not have any effect if feeder.content.issues.index is set to false. If true, initial feeding may take longer. If false, feeding of content issues starts after initial feeding has been completed.
`feeder.content.issues.reindex-after`
Type	`Duration`
Default	1d
Description	The duration after which indexed issues are considered outdated and become subject to periodic reindexing. This property does not have any effect if feeder.content.issues.index or feeder.content.issues.reindex-periodically are set to false.
`feeder.content.issues.reindex-periodically`
Type	`Boolean`
Default	true
Description	Whether content issues are reindexed periodically. Note that issue reindexing is performed with low priority, and will not block feeding of editorial changes. Issue reindexing will be paused as long as editorial changes need to be processed. This property does not have any effect if feeder.content.issues.index is set to false.
`feeder.content.issues.reindex-time-max-percentage`
Type	`Integer`
Default	100
Description	The maximum percentage of time used to trigger issue reindexing. If set to a value below 100, periodic issue reindexing will try to pause and stay inactive for some time, so that it does not use more than the configured percentage of a time window, even if issues are older than configured in feeder.content.issues.reindex-after. This only applies to issue reindexing and the Content Feeder may still perform other tasks. The configured value must be in the range of 1 to 100. Note that issue reindexing is always performed with low priority, and will be paused as long as editorial changes need to be processed, even if this property is set to 100. This property does not have any effect if feeder.content.issues.index or feeder.content.issues.reindex-periodically are set to false.
`feeder.content.issues.reindex-time-window`
Type	`Duration`
Default	10m
Description	The time window used with feeder.content.issues.reindex-time-max-percentage. Larger values for the time window lead to fewer but longer pauses. This property does not have any effect if feeder.content.issues.index or feeder.content.issues.reindex-periodically are set to false, or if feeder.content.issues.reindex-time-max-percentage is 100.
`feeder.content.management.password`
Type	`String`
Default	feeder
Description	The password to be used in the HTTP authentication of the administration page of the Content Feeder.
`feeder.content.management.user`
Type	`String`
Default	feeder
Description	The user name to be used in the HTTP authentication of the administration page of the Content Feeder. This is not an account from the user management of the Content Server.
`feeder.content.partial-update-aspects`
Type	`List<String>`
Default	*
Description	Configures the aspects of index documents that can be updated with a partial update, provided that the connected Indexer supports partial updates (for example, feeder.solr.partial-updates.enabled=true for Solr). Multiple values are separated by commas. Use the special value "*" to use partial updates for all aspects where possible. An empty value means that partial updates are not used. See the API documentation of Feedable.isPartialUpdate, FeedableAspect and ContentFeedableAspect in package com.coremedia.cap.feeder for more details.
`feeder.content.property-type.blob-max-size`
Type	`org.springframework.util.unit.DataSize`
Description	Configure the maximum size of indexed blob properties. Larger blob values will be skipped. This configuration can be overridden for specific MIME-types by customizing Spring bean "feederContentBlobMaxSizePerMimeType".
Deprecation	This property has been deprecated since 2506.0.0 and will be removed in a future version. Use `feeder.blob.max-size[*/*]` instead. Reason: The property was replaced by 'feeder.blob.max-size', which allows configuration by MIME-type and also limits blobs that are added by custom FeedablePopulator implementations.
`feeder.content.property-type.blob-mime-type.excludes`
Type	`List<String>`
Description	List of MIME-types of "Blob" properties excluded from indexing. You can exclude a more specific type (e.g. text/xml) while including the corresponding primary type (e.g. text/*).
Deprecation	This property has been deprecated since 2506.0.0 and will be removed in a future version. Use `feeder.blob.enabled` instead. Reason: The property was replaced by 'feeder.blob.enabled', which takes a map value instead of a comma-separated list and also limits blobs that are added by custom FeedablePopulator implementations.
`feeder.content.property-type.blob-mime-type.includes`
Type	`List<String>`
Default	*/*
Description	List of MIME-types of indexed "Blob" properties. If you don't configure any MIME-types in the includes property, no blob properties will be indexed.
Deprecation	This property has been deprecated since 2506.0.0 and will be removed in a future version. Use `feeder.blob.enabled` instead. Reason: The property was replaced by 'feeder.blob.enabled', which takes a map value instead of a comma-separated list and also limits blobs that are added by custom FeedablePopulator implementations.
`feeder.content.property-type.date`
Type	`Boolean`
Default	false
Description	Whether properties of type "Date" are indexed.
`feeder.content.property-type.integer`
Type	`Boolean`
Default	false
Description	Whether properties of type "Integer" are indexed.
`feeder.content.property-type.link-list`
Type	`Boolean`
Default	false
Description	Whether properties of type "LinkList" are indexed.
`feeder.content.property-type.string`
Type	`Boolean`
Default	true
Description	Whether properties of type "String" are indexed.
`feeder.content.property-type.struct`
Type	`Boolean`
Default	false
Description	Whether properties of type "Struct" are indexed.
`feeder.content.property-type.xml-grammars`
Type	`List<String>`
Default	coremedia-richtext-1.0
Description	The list of grammars of indexed "Markup" properties (as used in the document type definition as attribute "Name" of element "XmlGrammar").
`feeder.content.retry-connect-to-index-delay`
Type	`Duration`
Default	10s
Description	The time to wait between retries to connect to the search engine on startup.
`feeder.content.type.excludes`
Type	`List<String>`
Default	[Preferences, EditorPreferences, Dictionary, Query]
Description	List of abstract or concrete content types excluded from feeding. With the configuration of some type, all of its subtypes are excluded implicitly, if not configured otherwise. Note that it is an error to configure the same content type in this property and in feeder.content.type.includes. Rules for more specific types override rules for less specific types. Regular expressions are not supported.
`feeder.content.type.includes`
Type	`List<String>`
Default	Content_
Description	List of abstract or concrete content types included for feeding. With the configuration of some type, all of its subtypes are included implicitly, if not configured otherwise. Note that it is an error to configure the same content type in this property and in feeder.content.type.excludes. Rules for more specific types override rules for less specific types. Regular expressions are not supported.
`feeder.content.update-groups-immediately`
Type	`Boolean`
Default	false
Description	If feeder.content.index-groups is true, configures whether the field "groups" is updated immediately after a change of a folder's rights rule. It is recommended to keep this set to false, and let the Content Feeder update the index field in the background with lower priority than updates for editorial changes. Setting this to true is quite expensive, because all contents below the folder would be reindexed.

Table 3.49. Content Feeder Configuration Properties
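
The following application.properties fragment is an illustrative sketch of how feeder.blob.enabled and feeder.blob.max-size interact. The chosen MIME-types and the 5MB limit are example values only, not recommendations, and the bracket notation for map keys follows the example configurations given in the property descriptions above.

```properties
# Index PDFs and all text/* blobs.
feeder.blob.enabled[application/pdf]=true
feeder.blob.enabled[text/*]=true
# The more specific text/xml entry overrides text/*, so XML blobs are skipped.
feeder.blob.enabled[text/xml]=false
# Because some types are mapped to true, */* is NOT enabled implicitly:
# blobs of any other type (for example image/jpeg) are not indexed.

# Skip PDFs larger than 5 MB. No limit is configured for text/*,
# so those blobs are indexed regardless of their size.
feeder.blob.max-size[application/pdf]=5MB
```

With this configuration, an 8 MB PDF would be skipped: it is enabled by feeder.blob.enabled, but blobs are only included if both properties allow them, and feeder.blob.max-size rejects it.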

Solr specific properties for Content Feeder

`feeder.content.issues.solr.fetch-size`
Type	`Integer`
Default	1000
Description	The maximum number of results to fetch with a single paginated Solr query when retrieving content items with outdated issues. If more results are available, multiple queries with Solr cursor pagination will be used, and each one will be restricted to this configured maximum number of results.
`feeder.content.issues.solr.filter`
Type	`String`
Default	types:Document_
Description	Solr filter query to restrict the content items for which outdated issues are reindexed.
`feeder.content.issues.solr.query-min-delay`
Type	`Duration`
Default	10s
Description	The minimum time to wait before Solr is queried again for content items with outdated issues after the last query. This delay is not used for paginated queries that just retrieve the next page for a previous query.
`feeder.solr.nested-documents.enabled`
Type	`Boolean`
Default	true
Description	Whether storing nested feedables as nested documents is supported in Solr. This requires that the Solr schema contains a _root_ field. Note that if you add that field to the schema, you have to recreate the index from scratch.
`feeder.solr.nested-documents.skip-index-check`
Type	`Boolean`
Default	false
Description	If feeder.solr.nested-documents.enabled is true, the Feeder checks whether the Solr index schema contains the _root_ field. If feeding of nested documents is attempted but the index does not support it, the Feeder logs a warning and does not use nested documents. You can set this property to true to skip checking the index schema.
`feeder.solr.partial-updates.enabled`
Type	`Boolean`
Default	true
Description	Whether partial updates are supported for updating content metadata in Solr. This requires that all fields in the Solr index are configured as stored="true" or docValues="true" except fields that are copyField destinations, which must be configured as stored="false". This is because partial updates are applied to the index document reconstructed from the existing stored field values.
`feeder.solr.partial-updates.skip-index-check`
Type	`Boolean`
Default	false
Description	If feeder.solr.partial-updates.enabled is true, the Feeder analyzes the Solr index schema to check whether fields are stored as required for partial updates. If the index does not seem to support partial updates, the Feeder logs a warning and does not use them. You can set this property to true to skip the check.
`feeder.solr.send-retry-delay`
Type	`Duration`
Default	30s
Description	The delay to wait before the Feeder retries to send data after failures from Solr.
`solr.cloud`
Type	`Boolean`
Default	false
Description	Whether to connect to SolrCloud. If true, connect to a SolrCloud cluster. SolrCloud connection details must be set either as ZooKeeper addresses (solr.zookeeper.addresses) or, if those are unset or empty, as HTTP URLs (solr.url). If false, connect to stand-alone Solr nodes via HTTP URLs (solr.url).
`solr.connection-timeout`
Type	`Duration`
Default	0
Description	The connection timeout set on the SolrJ SolrClient. It determines how long the client waits to establish a connection without any response from the server. The default value 0 means that it waits forever. Set a negative value to use the SolrClient default. (Default unit is milliseconds.)
`solr.content.collection`
Type	`String`
Default	studio
Description	The name of the Solr collection for editorial search.
`solr.content.config-set`
Type	`String`
Default	content
Description	The name of the Solr config set to use when creating the collection for editorial search. This property is used by the Content Feeder.
`solr.index-data-directory`
Type	`String`
Default	data
Description	Value for the "dataDir" parameter of the Solr CoreAdmin API / Collection API request to create a Solr index.
`solr.password`
Type	`String`
Description	Password for HTTP basic authentication, used if a non-empty solr.username has been specified. The value may have been encrypted with the tool "cm encryptpasswordproperty".
`solr.proxy-host`
Type	`String`
Description	Proxy host for Solr communication that needs to be set if a proxy should be used.
`solr.proxy-is-secure`
Type	`Boolean`
Default	false
Description	Secure flag for Solr proxy.
`solr.proxy-is-socks4`
Type	`Boolean`
Default	false
Description	SOCKS 4 flag for Solr proxy.
`solr.proxy-port`
Type	`Integer`
Default	0
Description	Proxy port for Solr communication that needs to be set if a proxy should be used.
`solr.socket-timeout`
Type	`Duration`
Default	10m
Description	The socket timeout set on the SolrJ SolrClient. It determines how long the client waits for a response from the server after the connection was established and the request was already sent. Set to 0 for no timeout, or to a negative value to use the SolrClient default. (Default unit is milliseconds.)
`solr.url`
Type	`List<String>`
Default	http://localhost:40080/solr
Description	The list of Solr URLs to connect to. These URLs are ignored if connecting to SolrCloud (solr.cloud=true) and non-empty ZooKeeper addresses (solr.zookeeper.addresses) have been set. For a Feeder application that is not connected to a SolrCloud cluster, a single URL to the Solr leader must be configured.
`solr.use-http1`
Type	`Boolean`
Default	false
Description	Whether HTTP/1 (true) or HTTP/2 (false) shall be used by Solr clients.
Deprecation	This property has been deprecated and will be removed in a future version.
`solr.use-xml-response-writer`
Type	`Boolean`
Default	false
Description	Whether SolrJ should use XML response format instead of Javabin format.
`solr.username`
Type	`String`
Description	Username for HTTP basic authentication, or empty string for no authentication.
`solr.zookeeper.addresses`
Type	`List<String>`
Description	ZooKeeper addresses for connecting to SolrCloud. Only used if solr.cloud=true.
`solr.zookeeper.chroot`
Type	`String`
Description	Optional ZooKeeper chroot path for Solr. ZooKeeper chroot support makes it possible to isolate the SolrCloud tree within a ZooKeeper instance, for example one that is shared with other applications. Only used if solr.cloud=true and solr.zookeeper.addresses is set to a non-empty value.
`solr.zookeeper.client-timeout`
Type	`Duration`
Default	10s
Description	Client-timeout duration for ZooKeeper. Set to a negative value to use the SolrClient default. (Default unit is milliseconds.) Only used if solr.cloud=true and solr.zookeeper.addresses is set to a non-empty value.
`solr.zookeeper.connect-timeout`
Type	`Duration`
Default	10s
Description	Connect-timeout duration for ZooKeeper. Set to a negative value to use the SolrClient default. (Default unit is milliseconds.) Only used if solr.cloud=true and solr.zookeeper.addresses is set to a non-empty value.

Table 3.50. Content Feeder Solr Configuration Properties
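
As a sketch, a Content Feeder that connects to SolrCloud through ZooKeeper might use settings like the following. The ZooKeeper host names, the chroot path and the credentials are hypothetical placeholders, and the timeout value is an example only. The stand-alone alternative with a single leader URL is shown in comments.

```properties
# Connect to SolrCloud via ZooKeeper; solr.url is then ignored.
solr.cloud=true
solr.zookeeper.addresses=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
# Hypothetical chroot path under which the SolrCloud tree is stored
solr.zookeeper.chroot=/solr
# HTTP basic authentication (hypothetical credentials). The password may
# be encrypted with the tool "cm encryptpasswordproperty".
solr.username=feeder
solr.password=changeme
# Wait at most 5 minutes for a Solr response instead of the default 10 minutes
solr.socket-timeout=5m

# Alternative for stand-alone Solr: a single URL to the Solr leader
# solr.cloud=false
# solr.url=http://solr-leader.example.com:40080/solr
```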

Batch configuration properties for Content Feeder

With these properties you can configure the processing of batches.

`feeder.batch.max-bytes`
Type	`org.springframework.util.unit.DataSize`
Description	The maximum batch size in bytes. The Feeder sends a batch to the search engine if its maximum size would be exceeded when adding more entries. Note that the byte computation is only a rough estimate. A smaller batch may be sent if the maximum number of index documents is reached first, or if configured delays are reached. The Content Feeder default value is 5 MB. The CAE Feeder default value is 20 MB.
`feeder.batch.max-open`
Type	`Integer`
Default	5
Description	The maximum number of batches indexed in parallel. This setting is not used with the default integration of Apache Solr but only with custom implementations of the com.coremedia.cap.feeder.index.async.AsyncIndexer interface. The Feeder does not call the index method of the AsyncIndexer interface to index another batch if the maximum number of parallel batches has been reached. The method will not be called until a callback about the persistence of one of these batches has been received.
`feeder.batch.max-processed`
Type	`Integer`
Default	1
Description	The maximum number of batches processed by the Indexer in parallel. This setting is not used with the default integration of Apache Solr but only with custom implementations of the com.coremedia.cap.feeder.index.async.AsyncIndexer interface. The Feeder does not call the index method of the AsyncIndexer interface to index another batch if the configured number of currently processed batches has been reached. The method will not be called until a callback about completed processing or persistence of one of these batches has been received.
`feeder.batch.max-size`
Type	`Integer`
Default	500
Description	The maximum number of index documents in a batch. If the maximum number is reached, the Feeder sends the batch to the search engine. A smaller batch may be sent if the maximum byte size is reached first, or if configured delays are reached.
`feeder.batch.retry-send-idle-delay`
Type	`Duration`
Default	1m
Description	The time to wait before retrying to send index documents to the search engine after failures. This delay is used if the feeder is idle.
`feeder.batch.retry-send-max-delay`
Type	`Duration`
Default	10m
Description	The maximum time to wait before retrying to send index documents to the search engine after failures. This delay is used if the feeder is not idle. The setting is typically larger than retry-send-idle-delay.
`feeder.batch.send-idle-delay`
Type	`Duration`
Description	The time between adding an index document to a batch and sending that batch to the search engine, if the batch is not yet full according to the max-size and max-bytes configuration properties, and if the feeder is idle. If a change needs to be sent to the search engine, and no further changes were made within the specified time, then an index document for the change will be sent after that time to the search engine. A small delay ensures low latency for changes to become visible in the search engine, as long as the system is not too busy. The Content Feeder default value is 3 seconds. The CAE Feeder default value is 10 seconds.
`feeder.batch.send-max-delay`
Type	`Duration`
Description	The maximum time between adding an index document to a batch and sending that batch to the search engine. This setting is typically larger than send-idle-delay to allow batches to grow and increase throughput, for example when large amounts of content are created by an import process. The configured value may still be exceeded under high load, or if there are problems connecting to the search engine. The Content Feeder default value is 20 seconds. The CAE Feeder default value is 2 minutes.

Table 3.51. Feeder Batch Configuration Properties
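
For example, during a period with large imports, the batch properties of the Content Feeder might be tuned for throughput as sketched below. The values are illustrative only and should be validated against your own load. With the default Apache Solr integration, feeder.batch.max-open and feeder.batch.max-processed are not used and are therefore omitted.

```properties
# A batch is sent when it contains 1000 index documents or would exceed
# roughly 10 MB (a rough estimate), whichever happens first.
feeder.batch.max-size=1000
feeder.batch.max-bytes=10MB
# Keep latency low while the Feeder is idle (Content Feeder default) ...
feeder.batch.send-idle-delay=3s
# ... but allow batches to grow for up to 1 minute under load.
feeder.batch.send-max-delay=1m
```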

Apache Tika Properties

You can customize text extraction with Apache Tika using the following properties:

`feeder.tika.append-metadata`
Type	`String`
Description	Comma-separated list of metadata identifiers returned by Apache Tika to append to the extracted body text.
`feeder.tika.config`
Type	`org.springframework.core.io.Resource`
Description	The location of a custom Tika Config XML, for example to customize the default Tika parsers. See Apache Tika documentation for details on configuring Tika. The value of this property must be a Spring Resource location (e.g. file:/path/to/local/file) or empty for defaults.
`feeder.tika.copy-metadata`
Type	`String`
Description	Comma-separated list of metadata identifiers returned by Apache Tika and names of Feedable elements to copy the metadata to. Entries in the comma separated list have the following format: "metadata identifier"="element name". With Apache Solr, target index fields must be defined as multiValued="true" to avoid indexing errors if there are multiple metadata values with the same identifier.
`feeder.tika.parse`
Type	`Map<org.springframework.util.MimeType,Boolean>`
Description	The MIME-types of blobs in Feedable elements to parse with Apache Tika for plain text and/or metadata extraction. This property maps included MIME-types to true, and excluded MIME-types to false. Entries for more specific types like application/pdf override entries for less specific types like application/*. If the property does not map any MIME-type to true, then the catch-all type */* is included implicitly. Blobs of included MIME-types are parsed with Apache Tika, and are replaced with their extracted plain text and optional metadata in Feedable elements, so that the indexer receives string values. Blobs of excluded MIME-types are not parsed with Apache Tika, and are kept unchanged in Feedable elements, so that the indexer receives the original binary data. See also the related property feeder.blob.enabled, which generally restricts feeding of blobs by MIME-type. Blobs that have been excluded with that property are neither parsed with Apache Tika, nor passed as binary data to the indexer. Example configuration to parse all blobs except JPEG images: `feeder.tika.parse[image/jpeg]=false`. Note that MIME-type */* is included implicitly in this example, because no type is explicitly mapped to true.
`feeder.tika.timeout`
Type	`Duration`
Default	2m
Description	The maximum time after which text extraction from binary data with Apache Tika fails. If extraction fails, the binary data is skipped for the index document. Lower values prevent the Feeder from being blocked in text extraction for a long time.
`feeder.tika.warn-time-threshold`
Type	`Duration`
Default	15s
Description	The duration of text extraction from binary data with Apache Tika after which a warning is logged.
`feeder.tika.zip-bomb-prevention.enabled`
Type	`Boolean`
Default	true
Description	Sets whether Apache Tika's "Zip bomb" prevention is enabled. When a "Zip bomb" is detected, no text is extracted from the Blob, and a warning is logged. Note that "Zip bombs" are not restricted to ZIP files but also apply to PDFs or other formats. Disabling "Zip bomb" prevention bears the risk of OutOfMemoryErrors. Note that false positives are possible.
`feeder.tika.zip-bomb-prevention.maximum-compression-ratio`
Type	`Long`
Default	-1
Description	Sets the ratio between output characters and input bytes for the Apache Tika "Zip bomb" prevention. If this ratio is exceeded (after the output threshold has been reached) then no text will be extracted and a warning will be logged. Set to -1 to use the default of Apache Tika.
`feeder.tika.zip-bomb-prevention.maximum-depth`
Type	`Integer`
Default	-1
Description	Sets the maximum XML element nesting level for the Apache Tika "Zip bomb" prevention. If this depth level is exceeded then no text will be extracted, and a warning will be logged. Set to -1 to use the default of Apache Tika.
`feeder.tika.zip-bomb-prevention.maximum-package-entry-depth`
Type	`Integer`
Default	-1
Description	Sets the maximum package entry nesting level for the Apache Tika "Zip bomb" prevention. If this depth level is exceeded then no text will be extracted, and a warning will be logged. Set to -1 to use the default of Apache Tika.

Table 3.52. Feeder Tika Configuration Properties
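
The following sketch combines several Tika properties: it parses all blobs except JPEG images, appends and copies metadata, and shortens the extraction timeout. It assumes that Apache Tika reports the Dublin Core identifiers dc:title and dc:creator for the parsed formats. The Feedable element name author is a hypothetical example; with Apache Solr, the corresponding index field would have to be defined as multiValued="true".

```properties
# Parse all blobs except JPEG images. */* remains included implicitly,
# because no MIME-type is explicitly mapped to true.
feeder.tika.parse[image/jpeg]=false
# Append the title reported by Tika to the extracted body text
feeder.tika.append-metadata=dc:title
# Copy Tika's dc:creator metadata to the hypothetical element "author"
# (format: metadata identifier=element name)
feeder.tika.copy-metadata=dc:creator=author
# Give up text extraction after 1 minute instead of the default 2 minutes
feeder.tika.timeout=1m
```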

Feeder Core Properties

You can use the following properties to customize some internal settings of the Content Feeder.

`feeder.core.executor-queue-capacity`
Type	`Integer`
Description	Maximum capacity of the Feeder's executor queue, which is internally used to transfer evaluated values. The Content Feeder default value is 100. The CAE Feeder default value is 2000.
`feeder.core.executor-retry-delay`
Type	`Duration`
Default	1m
Description	The delay to wait before the Feeder retries to access the source data after failures.

Table 3.53. Feeder Core Configuration Properties