Deployment Manual / Version 2010

4.9.1 Content Feeder Properties

Solr specific properties for Content Feeder
feeder.solr.partial-updates.enabled
Type java.lang.Boolean
Default true
Description

Whether partial updates are supported for updating content metadata in Solr. This requires that all fields in the Solr index are configured as stored="true" or docValues="true" except fields that are copyField destinations, which must be configured as stored="false". This is because partial updates are applied to the index document reconstructed from the existing stored field values.

feeder.solr.partial-updates.skip-index-check
Type java.lang.Boolean
Default false
Description

If feeder.solr.partial-updates.enabled is true, the Feeder analyzes the Solr index schema to check whether fields are stored as required for partial updates. If the index does not seem to support partial updates, the Feeder logs a warning and does not use them. Set this property to true to skip the check.

feeder.solr.send-retry-delay
Type java.time.Duration
Default 30s
Description

The time the Feeder waits before it retries sending data after failures from Solr.

solr.cloud
Type java.lang.Boolean
Default false
Description

Whether to connect to SolrCloud. If true, the Feeder connects to a SolrCloud cluster. SolrCloud connection details must be set either as ZooKeeper addresses (solr.zookeeper.addresses) or, if those are unset or empty, as HTTP URLs (solr.url). If false, the Feeder connects to stand-alone Solr nodes via HTTP URLs (solr.url).

solr.connection-timeout
Type java.lang.Integer
Default 0
Description

Connection timeout in milliseconds, or 0 for no timeout, or a negative value to use SolrClient default.

solr.content.collection
Type java.lang.String
Default studio
Description

The name of the Solr collection for editorial search.

solr.content.config-set
Type java.lang.String
Default content
Description

The name of the Solr config set to use when creating the collection for editorial search. This property is used by the Content Feeder.

solr.index-data-directory
Type java.lang.String
Default data
Description

Value for the "dataDir" parameter of the Solr CoreAdmin API / Collection API request to create a Solr index.

solr.password
Type java.lang.String
Default (empty)
Description

Password for HTTP basic authentication, used if a non-empty solr.username has been specified. The value may have been encrypted with the tool "cm encryptpasswordproperty".

solr.socket-timeout
Type java.lang.Integer
Default 600000
Description

Socket timeout in milliseconds, or 0 for no timeout, or a negative value to use SolrClient default.

solr.url
Type java.util.List<java.lang.String>
Default http://localhost:40080/solr
Description

The list of Solr URLs to connect to. These URLs are ignored if connecting to SolrCloud (solr.cloud=true) and non-empty ZooKeeper addresses (solr.zookeeper.addresses) have been set. For a Feeder application that is not connected to a SolrCloud cluster, a single URL to the Solr master must be configured.

solr.use-xml-response-writer
Type java.lang.Boolean
Default false
Description

Whether SolrJ should use XML response format instead of Javabin format.

solr.username
Type java.lang.String
Default (empty)
Description

Username for HTTP basic authentication, or empty string for no authentication.

solr.zookeeper.addresses
Type java.util.List<java.lang.String>
Default (empty)
Description

ZooKeeper addresses for connecting to SolrCloud. Only used if solr.cloud=true.

solr.zookeeper.chroot
Type java.lang.String
Default (empty)
Description

Optional ZooKeeper chroot path for Solr. ZooKeeper chroot support makes it possible to isolate the SolrCloud tree within a ZooKeeper instance. Only used if solr.cloud=true and solr.zookeeper.addresses is set to a non-empty value.

solr.zookeeper.client-timeout
Type java.lang.Integer
Default 10000
Description

Client timeout for ZooKeeper in milliseconds, or a negative value to use the SolrClient default. Only used if solr.cloud=true and solr.zookeeper.addresses is set to a non-empty value.

solr.zookeeper.connect-timeout
Type java.lang.Integer
Default 10000
Description

Connect timeout for ZooKeeper in milliseconds, or a negative value to use the SolrClient default. Only used if solr.cloud=true and solr.zookeeper.addresses is set to a non-empty value.

Table 4.34. Content Feeder Solr Configuration Properties
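
The SolrCloud-related properties above can be combined as in the following sketch. It assumes that the properties are set in a Java properties file and that list values are written comma-separated; the ZooKeeper host names and the chroot path are placeholders, not values from this manual.

```properties
# Connect to a SolrCloud cluster through ZooKeeper (host names are placeholders)
solr.cloud=true
solr.zookeeper.addresses=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
# Optional chroot path (placeholder); only used with solr.cloud=true and non-empty addresses
solr.zookeeper.chroot=/solr
# ZooKeeper timeouts in milliseconds (documented defaults)
solr.zookeeper.connect-timeout=10000
solr.zookeeper.client-timeout=10000
# Keep partial updates enabled and let the Feeder check the index schema (documented defaults)
feeder.solr.partial-updates.enabled=true
feeder.solr.partial-updates.skip-index-check=false

# Alternative for stand-alone Solr: a single URL to the Solr master
# solr.cloud=false
# solr.url=http://localhost:40080/solr
```

With solr.cloud=true and non-empty ZooKeeper addresses, any configured solr.url is ignored, as described for solr.url above.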


Login properties for Content Feeder

The following properties define the login data for the Content Server and for the administration page of the Content Feeder.

feeder.management.user
Value user name
Default feeder
Description The user name to be used in the HTTP authentication of the administration page of the Content Feeder. This is not an account from the user management of the Content Server.
feeder.management.password
Value password
Default feeder
Description The password to be used in the HTTP authentication of the administration page of the Content Feeder.
repository.user
Value user name
Default feeder
Description The user account the Content Feeder uses to read content.
repository.password
Value password
Default feeder
Description The password for the user account of the Content Feeder.

Table 4.35. Properties for login
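
As an illustration, the login properties might be set as follows. The user name feeder-admin and the password placeholders are assumptions for this sketch; since all four properties default to feeder, replacing at least the passwords is probably advisable outside of development setups.

```properties
# HTTP authentication for the Content Feeder administration page
# (not an account from the Content Server user management)
feeder.management.user=feeder-admin
feeder.management.password=<management-password>

# Content Server account the Content Feeder uses to read content
repository.user=feeder
repository.password=<repository-password>
```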


Partial Update specific Properties for Content Feeder

With this property you can configure the use of partial updates, provided that the connected Indexer supports them. For Solr, this is controlled by the property feeder.solr.partial-updates.enabled.

feeder.partialUpdate.aspects
Value comma-separated list of index document aspects or *
Default *
Description The aspects of index documents that can be updated with a partial update, provided that the connected Indexer supports partial updates (for example, feeder.solr.partial-updates.enabled=true for Solr). Multiple values are separated by comma. Use the special value * to use partial updates for all aspects, if possible. An empty value means that partial updates are not used. See the API documentation of Feedable.isPartialUpdate, FeedableAspect and ContentFeedableAspect in package com.coremedia.cap.feeder for more details.

Table 4.36. Partial update configuration
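
The two special values described above look like this in a properties file. This is only a sketch; names of individual aspects are not listed in this section and are therefore not shown here.

```properties
# Default: use partial updates for all aspects, if possible
feeder.partialUpdate.aspects=*

# Alternative: an empty value means that partial updates are not used
# feeder.partialUpdate.aspects=
```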


Batch configuration properties for Content Feeder

With these properties you can configure the processing of batches.

feeder.batch.max-bytes
Type org.springframework.util.unit.DataSize
Default 5MB
Description

The maximum batch size in bytes. The Feeder sends a batch to the search engine if its maximum size would be exceeded when adding more entries. Note that the byte computation is only a rough estimate. A smaller batch may be sent if the maximum number of index documents is reached first, or if configured delays are reached.

feeder.batch.max-open
Type java.lang.Integer
Default 5
Description

The maximum number of batches indexed in parallel. This setting is not used with the default integration of Apache Solr but only with custom implementations of the com.coremedia.cap.feeder.index.async.AsyncIndexer interface. The Feeder does not call the index method of the AsyncIndexer interface to index another batch if the maximum number of parallel batches has been reached. The method will not be called until a callback about the persistence of one of these batches has been received.

feeder.batch.max-processed
Type java.lang.Integer
Default 1
Description

The maximum number of batches processed by the Indexer in parallel. This setting is not used with the default integration of Apache Solr but only with custom implementations of the com.coremedia.cap.feeder.index.async.AsyncIndexer interface. The Feeder does not call the index method of the AsyncIndexer interface to index another batch if the configured number of currently processed batches has been reached. The method will not be called until a callback about completed processing or persistence of one of these batches has been received.

feeder.batch.max-size
Type java.lang.Integer
Default 500
Description

The maximum number of index documents in a batch. If the maximum number is reached, the Feeder sends the batch to the search engine. A smaller batch may be sent if the maximum byte size is reached first, or if configured delays are reached.

feeder.batch.retry-send-idle-delay
Type java.time.Duration
Default 1m
Description

The time to wait before retrying to send index documents to the search engine after failures. This delay is used if the feeder is idle.

feeder.batch.retry-send-max-delay
Type java.time.Duration
Default 10m
Description

The maximum time to wait before retrying to send index documents to the search engine after failures. This delay is used if the feeder is not idle. The setting is typically larger than retry-send-idle-delay.

feeder.batch.send-idle-delay
Type java.time.Duration
Default 3s
Description

The time between adding an index document to a batch and sending that batch to the search engine, if the batch is not yet full according to the max-size and max-bytes configuration properties, and if the feeder is idle. If a change needs to be sent to the search engine, and no further changes were made within the specified time, then an index document for the change will be sent after that time to the search engine. A small delay ensures low latency for changes to become visible in the search engine, as long as the system is not too busy.

feeder.batch.send-max-delay
Type java.time.Duration
Default 20s
Description

The maximum time between adding an index document to a batch and sending that batch to the search engine. This setting is typically larger than send-idle-delay to allow batches to grow and increase throughput, for example when large amounts of content are created by an import process. The configured value may still be exceeded under high load, or if there are problems connecting to the search engine.

Table 4.37. Feeder Batch Configuration Properties
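
The following sketch shows how the batch properties might be adjusted to favor throughput, for example during a large import. The values are illustrative assumptions, not recommendations from this manual. Durations and data sizes are written in the same notation as the documented defaults (3s, 1m, 5MB).

```properties
# Allow larger batches than the defaults (500 documents, 5MB)
feeder.batch.max-size=1000
feeder.batch.max-bytes=10MB

# Keep latency low while the feeder is idle (documented default)
feeder.batch.send-idle-delay=3s
# Let batches grow longer under load than the default of 20s
feeder.batch.send-max-delay=60s

# Retry delays after failures (documented defaults)
feeder.batch.retry-send-idle-delay=1m
feeder.batch.retry-send-max-delay=10m
```

A batch is still sent earlier if either the document limit or the estimated byte limit is reached, so raising only one of the two limits may have little effect.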


Solr specific properties to define Feedable for Content Feeder

You can use the following properties to define which elements the Content Feeder should feed to the Search Engine.

feeder.indexDeleted
Value true or false
Default true
Description true if contents in the trash should be indexed. If you do not need to find contents in the trash and want to keep your index smaller, you can change this to false.
feeder.indexReferrers
Value true or false
Default false
Description true to reindex a content after its referrers have changed.
feeder.indexNameInTextBody
Value true or false
Default true
Description

Configures whether the content name should be indexed in index field textbody. It can make sense to disable this if lots of content names contain unique identifiers (from third-party systems, for example) to avoid problems with too many unique terms in field textbody.

feeder.indexGroups
Value true or false
Default true
Description

true to index the groups with potential read rights for the content in the index field groups. This set of groups is then used to narrow a user's search to the contents for which the user might have read rights. This is an optimization to get smaller search results for some queries and content structures and to get more accurate search suggestion counts. The client has to check for read rights anyway.

If set to false, then you must also configure Studio and Content Server to not add a query condition for the indexed groups. To this end, set the Studio property studio.rest.searchService.useGroupsFilterQuery and the Content Server property solr.useGroupsFilterQuery to false.

feeder.updateGroups.immediately
Value true or false
Default false
Description If feeder.indexGroups is true, configures whether the field groups is updated immediately after a change of a folder's rights rule. It is recommended to keep this set to false and let the Content Feeder update the index field groups in the background with lower priority than updates for editorial changes. Setting this to true is quite expensive because all contents below the folder will be reindexed.

Table 4.38. Properties to feed additional items
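
A possible combination of these properties for a smaller index is sketched below. The chosen values are assumptions for illustration; only the property names and their meaning are taken from the table above.

```properties
# Do not index contents in the trash to keep the index smaller
feeder.indexDeleted=false
# Do not index content names in field textbody, for example if names contain unique identifiers
feeder.indexNameInTextBody=false

# Keep indexing groups and update them in the background (documented defaults)
feeder.indexGroups=true
feeder.updateGroups.immediately=false

# If feeder.indexGroups were set to false, the group filter would also have to be disabled:
#   Studio:          studio.rest.searchService.useGroupsFilterQuery=false
#   Content Server:  solr.useGroupsFilterQuery=false
```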


Properties to define content types for feeding

You can restrict the indexed contents by their type using the includes and excludes properties.

feeder.content.type.includes
Value content type name
Default Content_
Description The name of the abstract or concrete content type whose contents should be indexed. Regular expressions are not allowed.
feeder.content.type.excludes
Value content type name
Default Preferences, EditorPreferences, Dictionary, Query
Description The name of the abstract or concrete content type whose contents should not be indexed. Regular expressions are not allowed.

Table 4.39. Properties to specify content types.
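
The sketch below writes out the documented defaults and adds one further excluded type. MyArchiveType is a hypothetical content type name used only for illustration, and the comma-separated form of the excludes value is assumed from the documented default.

```properties
# Index contents of Content_ and its subtypes (documented default)
feeder.content.type.includes=Content_

# Documented default excludes plus a hypothetical project-specific type
feeder.content.type.excludes=Preferences,EditorPreferences,Dictionary,Query,MyArchiveType
```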


Properties to define property types for feeding

The default configuration feeds all properties for all specified content types. For configuration of indexed properties by their name, see the section for XML configuration below.

Property types to feed

A content property can only be selected for indexing if its property type is enabled by the following properties.

feeder.content.propertyType.string
Value true or false
Default true
Description Set this property to false in order to exclude String properties from indexing.
feeder.content.propertyType.integer
Value true or false
Default false
Description Set this property to true in order to include Integer properties when indexing.
feeder.content.propertyType.date
Value true or false
Default false
Description Set this property to true in order to include Date properties when indexing.
feeder.content.propertyType.linkList
Value true or false
Default false
Description Set this property to true in order to include LinkList properties when indexing.
feeder.content.propertyType.struct
Value true or false
Default false
Description Set this property to true in order to include Struct properties when indexing.
feeder.content.propertyType.xmlGrammars
Value List of included grammar names separated by comma
Default coremedia-richtext-1.0
Description

You can define which XML properties should be indexed by specifying their grammar.

Example

feeder.content.propertyType.xmlGrammars=coremedia-richtext-1.0

feeder.content.propertyType.blobMimeType.includes
Value List of included MIME types separated by comma
Default text/*,application/pdf,application/msword,application/vnd.openxmlformats-officedocument.wordprocessingml.document
Description

You can define which blob properties are indexed, depending on the MIME type.

Example

feeder.content.propertyType.blobMimeType.includes=text/*

All blobs of MIME type text/* are indexed.

feeder.content.propertyType.blobMimeType.excludes
Value List of excluded MIME types separated by comma
Default (empty)
Description

Exclude some blobs from indexing depending on the MIME type. If you have included a primary MIME type such as text/* or even the catch-all type */*, you can exclude some concrete types with this property.

Example

feeder.content.propertyType.blobMimeType.excludes=text/plain

Blobs of MIME type text/plain will not be indexed.

feeder.content.propertyType.blobMaxSize
Value size in bytes
Default 5242880 (5 MB)
Description

Configure the maximum size of indexed blob properties. Larger blobs will be skipped.

This configuration can be overridden in a Spring XML configuration file where you can configure the maximum size per MIME type by customizing the bean feederContentBlobMaxSizePerMimeType. See XML configuration for an example.

Table 4.40. Include property types
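
Taken together, the property type settings might look like the following sketch. The selection of types and the size limit are illustrative assumptions. The value 10485760 is 10 * 1024 * 1024 bytes, in the same notation as the documented default of 5242880 (5 MB).

```properties
# String properties are indexed by default; additionally index Integer and Date properties
feeder.content.propertyType.string=true
feeder.content.propertyType.integer=true
feeder.content.propertyType.date=true

# Index rich text XML properties (documented default)
feeder.content.propertyType.xmlGrammars=coremedia-richtext-1.0

# Index text and PDF blobs, but skip CSS files
feeder.content.propertyType.blobMimeType.includes=text/*,application/pdf
feeder.content.propertyType.blobMimeType.excludes=text/css

# Skip blobs larger than 10 MB
feeder.content.propertyType.blobMaxSize=10485760
```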


Properties to configure Apache Tika

You can customize text extraction with Apache Tika using the following properties:

feeder.tika.append-metadata
Type java.lang.String
Default (empty)
Description

Comma-separated list of metadata identifiers returned by Apache Tika to append to the extracted body text.

feeder.tika.config
Type org.springframework.core.io.Resource
Default (empty)
Description

The location of a custom Tika Config XML, for example to customize the default Tika parsers. See Apache Tika documentation for details on configuring Tika. The value of this property must be a Spring Resource location (e.g. file:/path/to/local/file) or empty for defaults.

feeder.tika.copy-metadata
Type java.lang.String
Default (empty)
Description

Comma-separated list of metadata identifiers returned by Apache Tika and names of Feedable elements to copy the metadata to. Entries in the comma-separated list have the following format: "metadata identifier"="element name". With Apache Solr, target index fields must be defined as multiValued="true" to avoid indexing errors if there are multiple metadata values with the same identifier.

feeder.tika.timeout
Type java.time.Duration
Default 2m
Description

The maximum time after which text extraction from binary data with Apache Tika fails. If extraction fails, the binary data will be skipped for the index document. Lower values prevent the Feeder from being blocked in text extraction for a long time.

feeder.tika.warn-time-threshold
Type java.time.Duration
Default 15s
Description

The time after which a warning is logged if text extraction from binary data with Apache Tika is still running.

feeder.tika.zip-bomb-prevention.enabled
Type java.lang.Boolean
Default true
Description

Sets whether Apache Tika's "Zip bomb" prevention is enabled. When a "Zip bomb" is detected, no text will be extracted from the Blob, but a warning will be logged. Note that "Zip bombs" are not restricted to ZIP files but also apply to PDFs or other formats. Disabling "Zip bomb" prevention bears the risk of OutOfMemoryErrors. Note that false positives are possible.

feeder.tika.zip-bomb-prevention.maximum-compression-ratio
Type java.lang.Long
Default -1
Description

Sets the ratio between output characters and input bytes for the Apache Tika "Zip bomb" prevention. If this ratio is exceeded (after the output threshold has been reached) then no text will be extracted and a warning will be logged. Set to -1 to use the default of Apache Tika.

feeder.tika.zip-bomb-prevention.maximum-depth
Type java.lang.Integer
Default -1
Description

Sets the maximum XML element nesting level for the Apache Tika "Zip bomb" prevention. If this depth level is exceeded then no text will be extracted, and a warning will be logged. Set to -1 to use the default of Apache Tika.

feeder.tika.zip-bomb-prevention.maximum-package-entry-depth
Type java.lang.Integer
Default -1
Description

Sets the maximum package entry nesting level for the Apache Tika "Zip bomb" prevention. If this depth level is exceeded then no text will be extracted, and a warning will be logged. Set to -1 to use the default of Apache Tika.

Table 4.41. Feeder Tika Configuration Properties
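
A stricter Tika setup could be sketched as follows. The shortened timeout and warning threshold are illustrative assumptions; the resource location is the placeholder path from the description of feeder.tika.config.

```properties
# Fail text extraction earlier than the default of 2m so the Feeder is not blocked for long
feeder.tika.timeout=30s
# Log a warning earlier than the default of 15s
feeder.tika.warn-time-threshold=5s

# Keep "Zip bomb" prevention enabled and use Apache Tika's own limits (documented defaults)
feeder.tika.zip-bomb-prevention.enabled=true
feeder.tika.zip-bomb-prevention.maximum-compression-ratio=-1
feeder.tika.zip-bomb-prevention.maximum-depth=-1
feeder.tika.zip-bomb-prevention.maximum-package-entry-depth=-1

# Optional custom Tika Config XML, given as a Spring Resource location (placeholder path)
# feeder.tika.config=file:/path/to/local/file
```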


Feeder Core Properties

You can use the following properties to customize some internal settings of the Content Feeder.

feeder.core.executor-queue-capacity
Type java.lang.Integer
Default 100
Description

Maximum capacity of the Feeder's executor queue, which is internally used to transfer evaluated values.

feeder.core.executor-retry-delay
Type java.time.Duration
Default 1m
Description

The delay to wait before the Feeder retries to access the source data after failures.

Table 4.42. Feeder Core Configuration Properties


Error behavior specific Properties for Content Feeder

You can use the following properties to customize the Content Feeder behavior in case of errors.

feeder.retryConnectToIndexDelay.seconds
Value time in seconds
Default 10
Description The time to wait between retries to connect to the search engine on startup.

Table 4.43. Error Handling Configuration Properties
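
The retry-related properties from this and the preceding tables use different value formats, which the following sketch places side by side. The chosen values are illustrative assumptions.

```properties
# Plain number of seconds: wait between attempts to connect to the search engine on startup
feeder.retryConnectToIndexDelay.seconds=30

# Duration values: retry delays after failures from Solr and when accessing the source data
feeder.solr.send-retry-delay=30s
feeder.core.executor-retry-delay=1m
```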


Renamed Content Feeder Properties
Deprecated Name New Name
feeder.executorQueueCapacity feeder.core.executor-queue-capacity
feeder.executorRetryDelay feeder.core.executor-retry-delay
feeder.maxBatchByteSize feeder.batch.max-bytes
feeder.maxBatchBytes feeder.batch.max-bytes
feeder.maxBatchSize feeder.batch.max-size
feeder.maxOpenBatches feeder.batch.max-open
feeder.maxProcessedBatches feeder.batch.max-processed
feeder.retrySendIdleDelay feeder.batch.retry-send-idle-delay
feeder.retrySendMaxDelay feeder.batch.retry-send-max-delay
feeder.sendIdleDelay feeder.batch.send-idle-delay
feeder.sendMaxDelay feeder.batch.send-max-delay
solr.partialUpdates feeder.solr.partial-updates.enabled
solr.partialUpdatesSkipIndexCheck feeder.solr.partial-updates.skip-index-check
feeder.tika.timeout.milliseconds feeder.tika.timeout
feeder.tika.warn.milliseconds feeder.tika.warn-time-threshold
solr.collection.content solr.content.collection
solr.configSet solr.cae.config-set (CAE Feeder), solr.content.config-set (Content Feeder)

Table 4.44. Renamed Content Feeder Configuration Properties
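
Migrating an existing configuration according to this table might look like the sketch below. The values of the deprecated properties are assumptions inferred from their names and from the documented defaults (for example, 120000 milliseconds corresponds to the default Tika timeout of 2m).

```properties
# Before (deprecated names)
# feeder.maxBatchSize=500
# feeder.tika.timeout.milliseconds=120000
# solr.partialUpdates=true
# solr.collection.content=studio

# After (new names)
feeder.batch.max-size=500
feeder.tika.timeout=2m
feeder.solr.partial-updates.enabled=true
solr.content.collection=studio
```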

