Deployment Manual / Version 2010

4.9.1 Content Feeder Properties

Solr specific properties for Content Feeder

`feeder.solr.partial-updates.enabled`
Type	java.lang.Boolean
Default	true
Description	Whether partial updates are supported for updating content metadata in Solr. This requires that all fields in the Solr index are configured as stored="true" or docValues="true" except fields that are copyField destinations, which must be configured as stored="false". This is because partial updates are applied to the index document reconstructed from the existing stored field values.
`feeder.solr.partial-updates.skip-index-check`
Type	java.lang.Boolean
Default	false
Description	If feeder.solr.partial-updates.enabled is true, the Feeder analyzes the Solr index schema to check whether fields are stored as required for partial updates. If the index does not appear to support partial updates, the Feeder logs a warning and does not use them. Set this property to true to skip the check.
`feeder.solr.send-retry-delay`
Type	java.time.Duration
Default	30s
Description	The delay before the Feeder retries sending data after a failure reported by Solr.
`solr.cloud`
Type	java.lang.Boolean
Default	false
Description	Whether to connect to SolrCloud. If true, the Feeder connects to a SolrCloud cluster. In this case, the connection details must be set either as ZooKeeper addresses (solr.zookeeper.addresses) or, if those are unset or empty, as HTTP URLs (solr.url). If false, the Feeder connects to stand-alone Solr nodes via HTTP URLs (solr.url).
`solr.connection-timeout`
Type	java.lang.Integer
Default	0
Description	Connection timeout in milliseconds, 0 for no timeout, or a negative value to use the SolrClient default.
`solr.content.collection`
Type	java.lang.String
Default	studio
Description	The name of the Solr collection for editorial search.
`solr.content.config-set`
Type	java.lang.String
Default	content
Description	The name of the Solr config set to use when creating the collection for editorial search. This property is used by the Content Feeder.
`solr.index-data-directory`
Type	java.lang.String
Default	data
Description	Value for the "dataDir" parameter of the Solr CoreAdmin API / Collection API request to create a Solr index.
`solr.password`
Type	java.lang.String
Default
Description	Password for HTTP basic authentication, used if a non-empty solr.username has been specified. The value may have been encrypted with the tool "cm encryptpasswordproperty".
`solr.socket-timeout`
Type	java.lang.Integer
Default	600000
Description	Socket timeout in milliseconds, 0 for no timeout, or a negative value to use the SolrClient default.
`solr.url`
Type	java.util.List<java.lang.String>
Default	http://localhost:40080/solr
Description	The list of Solr URLs to connect to. These URLs are ignored if connecting to SolrCloud (solr.cloud=true) and non-empty ZooKeeper addresses (solr.zookeeper.addresses) have been set. For a Feeder application that is not connected to a SolrCloud cluster, a single URL to the Solr master must be configured.
`solr.use-xml-response-writer`
Type	java.lang.Boolean
Default	false
Description	Whether SolrJ should use XML response format instead of Javabin format.
`solr.username`
Type	java.lang.String
Default
Description	Username for HTTP basic authentication, or empty string for no authentication.
`solr.zookeeper.addresses`
Type	java.util.List<java.lang.String>
Default
Description	ZooKeeper addresses for connecting to SolrCloud. Only used if solr.cloud=true.
`solr.zookeeper.chroot`
Type	java.lang.String
Default
Description	Optional ZooKeeper chroot path for Solr. ZooKeeper chroot support makes it possible to isolate the SolrCloud tree in a ZooKeeper instance that is also used by other applications. Only used if solr.cloud=true and solr.zookeeper.addresses is set to a non-empty value.
`solr.zookeeper.client-timeout`
Type	java.lang.Integer
Default	10000
Description	Client timeout for ZooKeeper in milliseconds, or a negative value to use the SolrClient default. Only used if solr.cloud=true and solr.zookeeper.addresses is set to a non-empty value.
`solr.zookeeper.connect-timeout`
Type	java.lang.Integer
Default	10000
Description	Connect timeout for ZooKeeper in milliseconds, or a negative value to use the SolrClient default. Only used if solr.cloud=true and solr.zookeeper.addresses is set to a non-empty value.

Table 4.34. Content Feeder Solr Configuration Properties
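
As a minimal sketch of how the connection properties above combine, the following fragments show a stand-alone setup and a SolrCloud setup. The host names, ports and the chroot path are hypothetical placeholders. The fragments also assume that the properties are set in a Spring Boot style properties file, where list values are written comma-separated.

```properties
# Stand-alone Solr (solr.cloud defaults to false).
# Hypothetical host; the Feeder needs a single URL to the Solr master.
solr.url=http://solr-master.example.com:40080/solr
solr.content.collection=studio
```

```properties
# SolrCloud via ZooKeeper.
# Hypothetical ZooKeeper hosts and chroot path.
solr.cloud=true
solr.zookeeper.addresses=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
solr.zookeeper.chroot=/solr
# solr.url is ignored here because ZooKeeper addresses are set.
```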

Login properties for Content Feeder

The following properties define the login data for the Content Server and for the administration page of the Content Feeder.

`feeder.management.user`
Value	user name
Default	feeder
Description	The user name to be used in the HTTP authentication of the administration page of the Content Feeder. This is not an account from the user management of the Content Server.
`feeder.management.password`
Value	password
Default	feeder
Description	The password to be used in the HTTP authentication of the administration page of the Content Feeder.
`repository.user`
Value	user name
Default	feeder
Description	The user account the Content Feeder uses to read content.
`repository.password`
Value	password
Default	feeder
Description	The password for the user account of the Content Feeder.

Table 4.35. Properties for login
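
A short sketch of overriding the default credentials might look as follows. All values are hypothetical placeholders, and the fragment assumes a plain properties file as the configuration source.

```properties
# HTTP authentication for the Content Feeder administration page.
# Hypothetical values replacing the defaults (feeder/feeder).
feeder.management.user=feeder-admin
feeder.management.password=change-me

# Content Server account the Content Feeder uses to read content.
# Hypothetical password.
repository.user=feeder
repository.password=change-me
```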

Partial Update specific Properties for Content Feeder

With this property you can configure which aspects of index documents are updated with partial updates, provided that the connected Indexer supports them. For Solr, this support is configured with the property feeder.solr.partial-updates.enabled.

`feeder.partialUpdate.aspects`
Value	comma-separated list of index document aspects or *
Default	*
Description	The aspects of index documents that can be updated with a partial update, provided that the connected Indexer supports partial updates (for example, `feeder.solr.partial-updates.enabled=true` for Solr). Multiple values are separated by comma. Use the special value * to use partial updates for all aspects, if possible. An empty value means that partial updates are not used. See the API documentation of `Feedable.isPartialUpdate`, `FeedableAspect` and `ContentFeedableAspect` in package `com.coremedia.cap.feeder` for more details.

Table 4.36. Partial update configuration
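
The interplay of the two partial update properties can be sketched as follows. The first fragment simply spells out the documented defaults. The second one uses an empty value to switch partial updates off on the Feeder side.

```properties
# Solr side: partial updates enabled (the default).
feeder.solr.partial-updates.enabled=true
# Feeder side: use partial updates for all aspects, if possible (the default).
feeder.partialUpdate.aspects=*
```

```properties
# An empty value means that partial updates are not used.
feeder.partialUpdate.aspects=
```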

Batch configuration properties for Content Feeder

With these properties you can configure the processing of batches.

`feeder.batch.max-bytes`
Type	org.springframework.util.unit.DataSize
Default	5MB
Description	The maximum batch size in bytes. The Feeder sends a batch to the search engine if its maximum size would be exceeded when adding more entries. Note that the byte computation is only a rough estimate. A smaller batch may be sent if the maximum number of index documents is reached first, or if configured delays are reached.
`feeder.batch.max-open`
Type	java.lang.Integer
Default	5
Description	The maximum number of batches indexed in parallel. This setting is not used with the default integration of Apache Solr but only with custom implementations of the com.coremedia.cap.feeder.index.async.AsyncIndexer interface. The Feeder does not call the index method of the AsyncIndexer interface to index another batch if the maximum number of parallel batches has been reached. The method will not be called until a callback about the persistence of one of these batches has been received.
`feeder.batch.max-processed`
Type	java.lang.Integer
Default	1
Description	The maximum number of batches processed by the Indexer in parallel. This setting is not used with the default integration of Apache Solr but only with custom implementations of the com.coremedia.cap.feeder.index.async.AsyncIndexer interface. The Feeder does not call the index method of the AsyncIndexer interface to index another batch if the configured number of currently processed batches has been reached. The method will not be called until a callback about completed processing or persistence of one of these batches has been received.
`feeder.batch.max-size`
Type	java.lang.Integer
Default	500
Description	The maximum number of index documents in a batch. If the maximum number is reached, the Feeder sends the batch to the search engine. A smaller batch may be sent if the maximum byte size is reached first, or if configured delays are reached.
`feeder.batch.retry-send-idle-delay`
Type	java.time.Duration
Default	1m
Description	The time to wait before retrying to send index documents to the search engine after failures. This delay is used if the feeder is idle.
`feeder.batch.retry-send-max-delay`
Type	java.time.Duration
Default	10m
Description	The maximum time to wait before retrying to send index documents to the search engine after failures. This delay is used if the feeder is not idle. The setting is typically larger than retry-send-idle-delay.
`feeder.batch.send-idle-delay`
Type	java.time.Duration
Default	3s
Description	The time between adding an index document to a batch and sending that batch to the search engine, if the batch is not yet full according to the max-size and max-bytes configuration properties, and if the feeder is idle. If a change needs to be sent to the search engine, and no further changes were made within the specified time, then an index document for the change will be sent after that time to the search engine. A small delay ensures low latency for changes to become visible in the search engine, as long as the system is not too busy.
`feeder.batch.send-max-delay`
Type	java.time.Duration
Default	20s
Description	The maximum time between adding an index document to a batch and sending that batch to the search engine. This setting is typically larger than send-idle-delay to allow batches to grow and increase throughput, for example when large amounts of content are created by an import process. The configured value may still be exceeded under high load, or if there are problems connecting to the search engine.

Table 4.37. Feeder Batch Configuration Properties
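
As an illustration of how the size and delay limits work together, the following fragment sketches a configuration that lets batches grow larger during a bulk import. The values are illustrative assumptions, not recommendations. Durations and data sizes use the same notation as the defaults in the table above.

```properties
# Allow larger batches, by document count and by estimated bytes.
feeder.batch.max-size=1000
feeder.batch.max-bytes=10MB

# Keep latency low while the Feeder is idle ...
feeder.batch.send-idle-delay=3s
# ... but give batches up to one minute to fill under load.
feeder.batch.send-max-delay=1m
```

A batch is sent as soon as any one of these limits is reached, so raising only one of them may have little effect.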

Solr specific properties to define Feedable for Content Feeder

You can use the following properties to define which elements the Content Feeder should feed to the Search Engine.

`feeder.indexDeleted`
Value	`true` or `false`
Default	`true`
Description	`true` if contents in the trash should be indexed. If you do not need to find contents in the trash and want to keep your index smaller, you can change this to `false`.
`feeder.indexReferrers`
Value	`true` or `false`
Default	`false`
Description	`true` to reindex a content after its referrers have changed.
`feeder.indexNameInTextBody`
Value	`true` or `false`
Default	`true`
Description	Configures whether the content name should be indexed in index field textbody. It can make sense to disable this if lots of content names contain unique identifiers (from third-party systems, for example) to avoid problems with too many unique terms in field textbody.
`feeder.indexGroups`
Value	`true` or `false`
Default	`true`
Description	`true` to index the groups with potential read rights for the content in the index field `groups`. This set of groups is then used to narrow a user's search to the contents for which the user might have read rights. This is an optimization to get smaller search results for some queries and content structures and to get more accurate search suggestion counts. The client has to check for read rights anyway. If set to `false`, then you must also configure Studio and Content Server to not add a query condition for the indexed groups. To this end, set the Studio property `studio.rest.searchService.useGroupsFilterQuery` and the Content Server property `solr.useGroupsFilterQuery` to `false`.
`feeder.updateGroups.immediately`
Value	`true` or `false`
Default	`false`
Description	If `feeder.indexGroups` is `true`, configures whether the field `groups` is updated immediately after a change of a folder's rights rule. It is recommended to keep this set to `false` and let the Content Feeder update the index field `groups` in the background with lower priority than updates for editorial changes. Setting this to `true` is quite expensive because all contents below the folder will be reindexed.

Table 4.38. Properties to feed additional items
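
Because disabling `feeder.indexGroups` requires matching settings in two other applications, a sketch of the combined change may help. The three fragments belong to the configuration of three different applications. How each application loads its properties is not covered here.

```properties
# Content Feeder: do not index groups, and keep trashed content out of the index.
feeder.indexGroups=false
feeder.indexDeleted=false
```

```properties
# Studio: do not add a query condition for the indexed groups.
studio.rest.searchService.useGroupsFilterQuery=false
```

```properties
# Content Server: do not add a query condition for the indexed groups.
solr.useGroupsFilterQuery=false
```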

Properties to define content types for feeding

You can restrict the indexed contents by their type using the includes and excludes properties.

`feeder.content.type.includes`
Value	content type name
Default	Content_
Description	The name of the abstract or concrete content type whose contents should be indexed. Regular expressions are not allowed.
`feeder.content.type.excludes`
Value	content type name
Default	Preferences, EditorPreferences, Dictionary, Query
Description	The name of the abstract or concrete content type whose contents should not be indexed. Regular expressions are not allowed.

Table 4.39. Properties to specify content types.
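
For example, the following sketch keeps the default include and adds one more excluded type to the default exclude list. `InternalNote` is a hypothetical content type name used for illustration only.

```properties
# Index all content types derived from the default root type ...
feeder.content.type.includes=Content_
# ... except the default excludes plus one hypothetical custom type.
feeder.content.type.excludes=Preferences, EditorPreferences, Dictionary, Query, InternalNote
```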

Properties to define property types for feeding

The default configuration feeds all properties for all specified content types. For configuration of indexed properties by their name, see the section for XML configuration below.

Property types to feed

A content property can only be selected for indexing if its property type is enabled by the following properties.

`feeder.content.propertyType.string`
Value	`true` or `false`
Default	`true`
Description	Set this property to `false` in order to exclude `String` properties from indexing.
`feeder.content.propertyType.integer`
Value	`true` or `false`
Default	`false`
Description	Set this property to `true` in order to include `Integer` properties when indexing.
`feeder.content.propertyType.date`
Value	`true` or `false`
Default	`false`
Description	Set this property to `true` in order to include `Date` properties when indexing.
`feeder.content.propertyType.linkList`
Value	`true` or `false`
Default	`false`
Description	Set this property to `true` in order to include `LinkList` properties when indexing.
`feeder.content.propertyType.struct`
Value	`true` or `false`
Default	`false`
Description	Set this property to `true` in order to include `Struct` properties when indexing.
`feeder.content.propertyType.xmlGrammars`
Value	List of included grammar names separated by comma
Default	`coremedia-richtext-1.0`
Description	You can define which XML properties should be indexed by specifying their grammar. Example: `feeder.content.propertyType.xmlGrammars=coremedia-richtext-1.0`
`feeder.content.propertyType.blobMimeType.includes`
Value	List of included MIME types separated by comma
Default	`text/*,application/pdf,application/msword,application/vnd.openxmlformats-officedocument.wordprocessingml.document`
Description	You can define which blob properties are indexed, depending on the MIME type. Example: `feeder.content.propertyType.blobMimeType.includes=text/*` All blobs with a MIME type matching `text/*` are indexed.
`feeder.content.propertyType.blobMimeType.excludes`
Value	List of excluded MIME types separated by comma
Default	(empty)
Description	Exclude some blobs from indexing depending on the MIME type. If you have included a primary MIME type such as `text/*` or even the catch-all type `*/*`, you can exclude some concrete types with this property. Example: `feeder.content.propertyType.blobMimeType.excludes=text/plain` Blobs of MIME type `text/plain` will not be indexed.
`feeder.content.propertyType.blobMaxSize`
Value	size in bytes
Default	5242880 (5 MB)
Description	Configure the maximum size of indexed blob properties. Larger blobs will be skipped. This configuration can be overridden in a Spring XML configuration file where you can configure the maximum size per MIME type by customizing the bean `feederContentBlobMaxSizePerMimeType`. See XML configuration for an example.

Table 4.40. Include property types
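
The property type switches above can be combined as in the following sketch. The chosen values are illustrative assumptions. The fragment enables `Integer` and `Date` properties in addition to the defaults, narrows the indexed blob MIME types and raises the blob size limit.

```properties
# Include Integer and Date properties in addition to String properties.
feeder.content.propertyType.integer=true
feeder.content.propertyType.date=true

# Index text blobs and PDFs, but skip plain-text blobs.
feeder.content.propertyType.blobMimeType.includes=text/*,application/pdf
feeder.content.propertyType.blobMimeType.excludes=text/plain

# 10 MB expressed in bytes (10 * 1024 * 1024).
feeder.content.propertyType.blobMaxSize=10485760
```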

Properties to configure Apache Tika

You can customize text extraction with Apache Tika using the following properties:

`feeder.tika.append-metadata`
Type	java.lang.String
Default
Description	Comma-separated list of metadata identifiers returned by Apache Tika to append to the extracted body text.
`feeder.tika.config`
Type	org.springframework.core.io.Resource
Default
Description	The location of a custom Tika Config XML, for example to customize the default Tika parsers. See Apache Tika documentation for details on configuring Tika. The value of this property must be a Spring Resource location (e.g. file:/path/to/local/file) or empty for defaults.
`feeder.tika.copy-metadata`
Type	java.lang.String
Default
Description	Comma-separated list of metadata identifiers returned by Apache Tika and names of Feedable elements to copy the metadata to. Entries in the comma separated list have the following format: "metadata identifier"="element name". With Apache Solr, target index fields must be defined as multiValued="true" to avoid indexing errors if there are multiple metadata values with the same identifier.
`feeder.tika.timeout`
Type	java.time.Duration
Default	2m
Description	The maximum time after which text extraction from binary data with Apache Tika fails. If extraction fails, the binary data will be skipped for the index document. Lower values prevent the Feeder from being blocked in text extraction for a long time.
`feeder.tika.warn-time-threshold`
Type	java.time.Duration
Default	15s
Description	The time after which a warning is logged if text extraction from binary data with Apache Tika is still running.
`feeder.tika.zip-bomb-prevention.enabled`
Type	java.lang.Boolean
Default	true
Description	Sets whether Apache Tika's "Zip bomb" prevention is enabled. When a "Zip bomb" is detected, no text will be extracted from the Blob, but a warning will be logged. Note that "Zip bombs" are not restricted to ZIP files but can also occur in PDFs or other formats. Disabling "Zip bomb" prevention bears the risk of OutOfMemoryErrors. Note that false positives are possible.
`feeder.tika.zip-bomb-prevention.maximum-compression-ratio`
Type	java.lang.Long
Default	-1
Description	Sets the ratio between output characters and input bytes for the Apache Tika "Zip bomb" prevention. If this ratio is exceeded (after the output threshold has been reached) then no text will be extracted and a warning will be logged. Set to -1 to use the default of Apache Tika.
`feeder.tika.zip-bomb-prevention.maximum-depth`
Type	java.lang.Integer
Default	-1
Description	Sets the maximum XML element nesting level for the Apache Tika "Zip bomb" prevention. If this depth level is exceeded then no text will be extracted, and a warning will be logged. Set to -1 to use the default of Apache Tika.
`feeder.tika.zip-bomb-prevention.maximum-package-entry-depth`
Type	java.lang.Integer
Default	-1
Description	Sets the maximum package entry nesting level for the Apache Tika "Zip bomb" prevention. If this depth level is exceeded then no text will be extracted, and a warning will be logged. Set to -1 to use the default of Apache Tika.

Table 4.41. Feeder Tika Configuration Properties
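
A minimal sketch of a stricter Tika setup could look like this. The file path is a hypothetical placeholder and the timing values are illustrative assumptions.

```properties
# Fail text extraction after one minute instead of the default two.
feeder.tika.timeout=1m
# Log a warning once extraction has been running for ten seconds.
feeder.tika.warn-time-threshold=10s

# Hypothetical location of a custom Tika configuration, as a Spring Resource location.
feeder.tika.config=file:/opt/feeder/tika-config.xml

# Keep "Zip bomb" prevention enabled and use Apache Tika's own default depth limit.
feeder.tika.zip-bomb-prevention.enabled=true
feeder.tika.zip-bomb-prevention.maximum-depth=-1
```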

Feeder Core Properties

You can use the following properties to customize some internal settings of the Content Feeder.

`feeder.core.executor-queue-capacity`
Type	java.lang.Integer
Default	100
Description	Maximum capacity of the Feeder's executor queue, which is internally used to transfer evaluated values.
`feeder.core.executor-retry-delay`
Type	java.time.Duration
Default	1m
Description	The delay before the Feeder retries accessing the source data after failures.

Table 4.42. Feeder Core Configuration Properties

Error behavior specific Properties for Content Feeder

You can use the following properties to customize the Content Feeder behavior in case of errors.

`feeder.retryConnectToIndexDelay.seconds`
Value	time in seconds
Default	10
Description	The time to wait between retries to connect to the search engine on startup.

Table 4.43. Error Handling Configuration Properties

Renamed Content Feeder Properties

Deprecated Name	New Name
`feeder.executorQueueCapacity`	`feeder.core.executor-queue-capacity`
`feeder.executorRetryDelay`	`feeder.core.executor-retry-delay`
`feeder.maxBatchByteSize`	`feeder.batch.max-bytes`
`feeder.maxBatchBytes`	`feeder.batch.max-bytes`
`feeder.maxBatchSize`	`feeder.batch.max-size`
`feeder.maxOpenBatches`	`feeder.batch.max-open`
`feeder.maxProcessedBatches`	`feeder.batch.max-processed`
`feeder.retrySendIdleDelay`	`feeder.batch.retry-send-idle-delay`
`feeder.retrySendMaxDelay`	`feeder.batch.retry-send-max-delay`
`feeder.sendIdleDelay`	`feeder.batch.send-idle-delay`
`feeder.sendMaxDelay`	`feeder.batch.send-max-delay`
`solr.partialUpdates`	`feeder.solr.partial-updates.enabled`
`solr.partialUpdatesSkipIndexCheck`	`feeder.solr.partial-updates.skip-index-check`
`feeder.tika.timeout.milliseconds`	`feeder.tika.timeout`
`feeder.tika.warn.milliseconds`	`feeder.tika.warn-time-threshold`
`solr.collection.content`	`solr.content.collection`
`solr.configSet`	`solr.cae.config-set` (CAE Feeder), `solr.content.config-set` (Content Feeder)

Table 4.44. Renamed Content Feeder Configuration Properties
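
To illustrate the renaming, the following sketch rewrites three settings from the deprecated names to the new ones. The values are examples only. Note that `feeder.tika.timeout` is typed as a duration, so the former millisecond value is written here with an explicit unit.

```properties
# Deprecated names, shown as comments for comparison:
#   feeder.maxBatchSize=500
#   feeder.tika.timeout.milliseconds=120000
#   solr.partialUpdates=true

# The same settings with the new names:
feeder.batch.max-size=500
# 120000 ms written as a duration of two minutes.
feeder.tika.timeout=2m
feeder.solr.partial-updates.enabled=true
```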
