Deployment Manual / Version 2010
Solr specific properties for Content Feeder
feeder.solr.partial-updates.enabled
| |
Type | java.lang.Boolean |
Default | true |
Description | Whether partial updates are supported for updating content metadata in Solr. This requires that all fields in the Solr index are configured as stored="true" or docValues="true" except fields that are copyField destinations, which must be configured as stored="false". This is because partial updates are applied to the index document reconstructed from the existing stored field values. |
feeder.solr.partial-updates.skip-index-check
| |
Type | java.lang.Boolean |
Default | false |
Description | If feeder.solr.partial-updates.enabled is true, the Feeder analyzes the Solr index schema to check whether fields are stored as required for partial updates. If the index does not appear to support partial updates, the Feeder logs a warning and does not use them. Set this property to true to skip the check. |
feeder.solr.send-retry-delay
| |
Type | java.time.Duration |
Default | 30s |
Description | The delay to wait before the Feeder retries to send data after failures from Solr. |
solr.cloud
| |
Type | java.lang.Boolean |
Default | false |
Description | Whether to connect to SolrCloud. If true, connect to a SolrCloud cluster. SolrCloud connection details must be set either as ZooKeeper addresses (solr.zookeeper.addresses) or, if the former is unset or empty, as HTTP URLs (solr.url). If false, connect to stand-alone Solr nodes via HTTP URLs (solr.url). |
solr.connection-timeout
| |
Type | java.lang.Integer |
Default | 0 |
Description | Connection timeout in milliseconds, or 0 for no timeout, or a negative value to use SolrClient default. |
solr.content.collection
| |
Type | java.lang.String |
Default | studio |
Description | The name of the Solr collection for editorial search. |
solr.content.config-set
| |
Type | java.lang.String |
Default | content |
Description | The name of the Solr config set to use when creating the collection for editorial search. This property is used by the Content Feeder. |
solr.index-data-directory
| |
Type | java.lang.String |
Default | data |
Description | Value for the "dataDir" parameter of the Solr CoreAdmin API / Collection API request to create a Solr index. |
solr.password
| |
Type | java.lang.String |
Default | |
Description | Password for HTTP basic authentication, used if a non-empty solr.username has been specified. The value may have been encrypted with the tool "cm encryptpasswordproperty". |
solr.socket-timeout
| |
Type | java.lang.Integer |
Default | 600000 |
Description | Socket timeout in milliseconds, or 0 for no timeout, or a negative value to use SolrClient default. |
solr.url
| |
Type | java.util.List<java.lang.String> |
Default | http://localhost:40080/solr |
Description | The list of Solr URLs to connect to. These URLs are ignored if connecting to SolrCloud (solr.cloud=true) and non-empty ZooKeeper addresses (solr.zookeeper.addresses) have been set. For a Feeder application that is not connected to a SolrCloud cluster, a single URL to the Solr master must be configured. |
solr.use-xml-response-writer
| |
Type | java.lang.Boolean |
Default | false |
Description | Whether SolrJ should use XML response format instead of Javabin format. |
solr.username
| |
Type | java.lang.String |
Default | |
Description | Username for HTTP basic authentication, or empty string for no authentication. |
solr.zookeeper.addresses
| |
Type | java.util.List<java.lang.String> |
Default | |
Description | ZooKeeper addresses for connecting to SolrCloud. Only used if solr.cloud=true. |
solr.zookeeper.chroot
| |
Type | java.lang.String |
Default | |
Description | Optional ZooKeeper chroot path for Solr. ZooKeeper chroot support makes it possible to isolate the SolrCloud tree within a ZooKeeper instance. Only used if solr.cloud=true and solr.zookeeper.addresses is set to a non-empty value. |
solr.zookeeper.client-timeout
| |
Type | java.lang.Integer |
Default | 10000 |
Description | Client-timeout for ZooKeeper in milliseconds, or a negative value to use SolrClient default. Only used if solr.cloud=true and solr.zookeeper.addresses is set to non-empty value. |
solr.zookeeper.connect-timeout
| |
Type | java.lang.Integer |
Default | 10000 |
Description | Connect-timeout for ZooKeeper in milliseconds, or a negative value to use SolrClient default. Only used if solr.cloud=true and solr.zookeeper.addresses is set to non-empty value. |
Table 4.34. Content Feeder Solr Configuration Properties
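As an illustration, the following fragment sketches the two connection modes described in the table above. It assumes that the properties are set in a Spring-style properties file. All host names and the chroot path are hypothetical; only the property names and defaults come from the table.

```properties
# Stand-alone Solr: a single URL is configured (host name is hypothetical)
solr.cloud=false
solr.url=http://solr.example.com:40080/solr
feeder.solr.send-retry-delay=30s

# SolrCloud alternative: solr.url is ignored once ZooKeeper addresses are set
# solr.cloud=true
# solr.zookeeper.addresses=zk1.example.com:2181,zk2.example.com:2181
# solr.zookeeper.chroot=/solr
# solr.zookeeper.client-timeout=10000
# solr.zookeeper.connect-timeout=10000
```

Keeping both variants in one file with one of them commented out is only a convenience for this sketch; a real deployment would normally configure a single mode.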
Login properties for Content Feeder
The following properties are used to define the login data for the Content Server and the administration page of the Search Engine.
feeder.management.user | |
Value | user name |
Default | feeder |
Description | The user name to be used in the HTTP authentication of the administration page of the Content Feeder. This is not an account from the user management of the Content Server. |
feeder.management.password | |
Value | password |
Default | feeder |
Description | The password to be used in the HTTP authentication of the administration page of the Content Feeder. |
repository.user | |
Value | user name |
Default | feeder |
Description | The user account the Content Feeder uses to read content. |
repository.password | |
Value | password |
Default | feeder |
Description | The password for the user account of the Content Feeder. |
Table 4.35. Properties for login
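A minimal sketch of overriding the login defaults, assuming a Spring-style properties file. The user names and passwords shown are placeholders chosen for this example, not recommended values.

```properties
# Credentials for the Content Feeder administration page (placeholder values)
feeder.management.user=feeder-admin
feeder.management.password=change-me

# Account the Content Feeder uses to read content (placeholder password)
repository.user=feeder
repository.password=change-me
```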
Partial Update specific Properties for Content Feeder
With this property you can configure the usage of partial updates, if supported by the connected Indexer, for example for Solr as configured with property feeder.solr.partial-updates.enabled.
feeder.partialUpdate.aspects | |
Value | comma-separated list of index document aspects or * |
Default | * |
Description | The aspects of index documents that can be updated with a partial update, provided that the connected Indexer supports partial updates (for example, feeder.solr.partial-updates.enabled=true for Solr). Multiple values are separated by comma. Use the special value * to use partial updates for all aspects, if possible. An empty value means that partial updates are not used. See the API documentation of Feedable.isPartialUpdate, FeedableAspect and ContentFeedableAspect in package com.coremedia.cap.feeder for more details. |
Table 4.36. Partial update configuration
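The interplay of the two partial update settings can be sketched as follows, assuming a Spring-style properties file. Only the values documented above (`*` and the empty value) are used; specific aspect names are not shown because they are not listed in this section.

```properties
# Solr supports partial updates, and the Feeder uses them for all aspects
feeder.solr.partial-updates.enabled=true
feeder.partialUpdate.aspects=*

# Alternative: keep Solr support enabled but never use partial updates
# feeder.partialUpdate.aspects=
```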
Batch configuration properties for Content Feeder
With these properties you can configure the processing of batches.
feeder.batch.max-bytes
| |
Type | org.springframework.util.unit.DataSize |
Default | 5MB |
Description | The maximum batch size in bytes. The Feeder sends a batch to the search engine if its maximum size would be exceeded when adding more entries. Note that the byte computation is only a rough estimate. A smaller batch may be sent if the maximum number of index documents is reached first, or if configured delays are reached. |
feeder.batch.max-open
| |
Type | java.lang.Integer |
Default | 5 |
Description | The maximum number of batches indexed in parallel. This setting is not used with the default integration of Apache Solr but only with custom implementations of the com.coremedia.cap.feeder.index.async.AsyncIndexer interface. The Feeder does not call the index method of the AsyncIndexer interface to index another batch if the maximum number of parallel batches has been reached. The method will not be called until a callback about the persistence of one of these batches has been received. |
feeder.batch.max-processed
| |
Type | java.lang.Integer |
Default | 1 |
Description | The maximum number of batches processed by the Indexer in parallel. This setting is not used with the default integration of Apache Solr but only with custom implementations of the com.coremedia.cap.feeder.index.async.AsyncIndexer interface. The Feeder does not call the index method of the AsyncIndexer interface to index another batch if the configured number of currently processed batches has been reached. The method will not be called until a callback about completed processing or persistence of one of these batches has been received. |
feeder.batch.max-size
| |
Type | java.lang.Integer |
Default | 500 |
Description | The maximum number of index documents in a batch. If the maximum number is reached, the Feeder sends the batch to the search engine. A smaller batch may be sent if the maximum byte size is reached first, or if configured delays are reached. |
feeder.batch.retry-send-idle-delay
| |
Type | java.time.Duration |
Default | 1m |
Description | The time to wait before retrying to send index documents to the search engine after failures. This delay is used if the feeder is idle. |
feeder.batch.retry-send-max-delay
| |
Type | java.time.Duration |
Default | 10m |
Description | The maximum time to wait before retrying to send index documents to the search engine after failures. This delay is used if the feeder is not idle. The setting is typically larger than retry-send-idle-delay. |
feeder.batch.send-idle-delay
| |
Type | java.time.Duration |
Default | 3s |
Description | The time between adding an index document to a batch and sending that batch to the search engine, if the batch is not yet full according to the max-size and max-bytes configuration properties, and if the feeder is idle. If a change needs to be sent to the search engine, and no further changes were made within the specified time, then an index document for the change will be sent after that time to the search engine. A small delay ensures low latency for changes to become visible in the search engine, as long as the system is not too busy. |
feeder.batch.send-max-delay
| |
Type | java.time.Duration |
Default | 20s |
Description | The maximum time between adding an index document to a batch and sending that batch to the search engine. This setting is typically larger than send-idle-delay to allow batches to grow and increase throughput, for example when large amounts of content are created by an import process. The configured value may still be exceeded under high load, or if there are problems connecting to the search engine. |
Table 4.37. Feeder Batch Configuration Properties
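As a hedged example, the following fragment trades latency for throughput, for instance during a large content import. The concrete values are assumptions chosen for illustration and are not tuning recommendations. The duration and data size notations follow the formats used for the defaults above (such as 3s, 1m and 5MB).

```properties
# Larger batches: send after 1000 documents or roughly 10 MB, whichever comes first
feeder.batch.max-size=1000
feeder.batch.max-bytes=10MB

# While the Feeder is idle, still send quickly so single changes become visible soon
feeder.batch.send-idle-delay=3s

# Under load, let batches grow for up to one minute before sending
feeder.batch.send-max-delay=1m

# After failures, retry after 1 minute when idle and after at most 10 minutes otherwise
feeder.batch.retry-send-idle-delay=1m
feeder.batch.retry-send-max-delay=10m
```

Because the byte size is only an estimate, the actual size of a batch sent to the search engine may differ from the configured limit.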
Solr specific properties to define Feedable for Content Feeder
You can use the following properties to define which elements the Content Feeder should feed to the Search Engine.
feeder.indexDeleted | |
Value | true or false |
Default | true |
Description | true if contents in the trash should be indexed. If you do not need to find contents in the trash and want to keep your index smaller, you can change this to false. |
feeder.indexReferrers | |
Value | true or false |
Default | false |
Description | true to reindex a content after its referrers have changed. |
feeder.indexNameInTextBody | |
Value | true or false |
Default | true |
Description | Configures whether the content name should be indexed in index field textbody. It can make sense to disable this if lots of content names contain unique identifiers (from third-party systems, for example) to avoid problems with too many unique terms in field textbody. |
feeder.indexGroups | |
Value | true or false |
Default | true |
Description |
If set to |
feeder.updateGroups.immediately | |
Value | true or false |
Default | false |
Description | If feeder.indexGroups is true, configures whether the field groups is updated immediately after a change of a folder's rights rule. It is recommended to keep this set to false and let the Content Feeder update the index field groups in the background with lower priority than updates for editorial changes. Setting this to true is quite expensive because all contents below the folder will be reindexed. |
Table 4.38. Properties to feed additional items
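A short sketch of how these switches might be combined to keep the index small, assuming a Spring-style properties file. Whether these choices fit depends on the project; the values are examples only.

```properties
# Do not index contents in the trash
feeder.indexDeleted=false

# Content names contain third-party identifiers, so keep them out of textbody
feeder.indexNameInTextBody=false

# Keep the recommended background update of the groups field
feeder.updateGroups.immediately=false
```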
Properties to define content types for feeding
You can restrict the indexed contents by their type using the includes and excludes properties.
feeder.content.type.includes | |
Value | content type name |
Default | Content_ |
Description | The name of the abstract or concrete content type whose contents should be indexed. Regular expressions are not allowed. |
feeder.content.type.excludes | |
Value | content type name |
Default | Preferences, EditorPreferences, Dictionary, Query |
Description | The name of the abstract or concrete content type whose contents should not be indexed. Regular expressions are not allowed. |
Table 4.39. Properties to specify content types.
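For illustration, the fragment below extends the default exclusion list by one more type. The type name `CMDownload` is hypothetical and stands for any content type in the project's content type model. The comma-separated notation is assumed from the form of the default value above.

```properties
# Index contents of all types below the default root type
feeder.content.type.includes=Content_

# Defaults plus one hypothetical project-specific type
feeder.content.type.excludes=Preferences,EditorPreferences,Dictionary,Query,CMDownload
```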
Properties to define property types for feeding
The default configuration feeds all properties for all specified content types. For configuration of indexed properties by their name, see the section for XML configuration below.
Property types to feed
You can only select a content property from a content type if its property type is specified with the following rules.
feeder.content.propertyType.string | |
Value | true or false |
Default | true |
Description | Set this property to false in order to exclude String properties from indexing. |
feeder.content.propertyType.integer | |
Value | true or false |
Default | false |
Description | Set this property to true in order to include Integer properties when indexing. |
feeder.content.propertyType.date | |
Value | true or false |
Default | false |
Description | Set this property to true in order to include Date properties when indexing. |
feeder.content.propertyType.linkList | |
Value | true or false |
Default | false |
Description | Set this property to true in order to include LinkList properties when indexing. |
feeder.content.propertyType.struct | |
Value | true or false |
Default | false |
Description | Set this property to true in order to include Struct properties when indexing. |
feeder.content.propertyType.xmlGrammars | |
Value | List of included grammar names separated by comma |
Default | coremedia-richtext-1.0
|
Description |
You can define which XML properties should be indexed by specifying their grammar. Example
|
feeder.content.propertyType.blobMimeType.includes | |
Value | List of included MIME types separated by comma |
Default | text/*,application/pdf,application/msword,application/vnd.openxmlformats-officedocument.wordprocessingml.document |
Description |
You can define which blob properties are indexed, depending on the MIME type. Example
All blobs of MIME type |
feeder.content.propertyType.blobMimeType.excludes | |
Value | List of excluded MIME types separated by comma |
Default | (empty) |
Description |
Exclude some blobs from indexing depending on the MIME type. If you've included a primary MIME type such
as Example
Blobs of MIME type |
feeder.content.propertyType.blobMaxSize | |
Value | size in bytes |
Default | 5242880 (5 MB) |
Description |
Configure the maximum size of indexed blob properties. Larger values will be skipped.
This configuration can be overridden in a Spring XML configuration file where you can configure the
maximum size per MIME type by customizing the bean |
Table 4.40. Include property types
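The following fragment is a sketch of a combined property type configuration, assuming a Spring-style properties file. The excluded MIME type `text/css` is an assumption chosen to show how a subtype of an included primary type might be excluded; the blob size limit is likewise only an example.

```properties
# Index Integer and Date properties in addition to the default String properties
feeder.content.propertyType.string=true
feeder.content.propertyType.integer=true
feeder.content.propertyType.date=true

# Index rich text XML properties (the default grammar)
feeder.content.propertyType.xmlGrammars=coremedia-richtext-1.0

# Index text and PDF blobs, but skip style sheets (assumed example)
feeder.content.propertyType.blobMimeType.includes=text/*,application/pdf
feeder.content.propertyType.blobMimeType.excludes=text/css

# Skip blobs larger than 10 MB (10 * 1024 * 1024 bytes)
feeder.content.propertyType.blobMaxSize=10485760
```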
Properties to configure Apache Tika
You can customize text extraction with Apache Tika using the following properties:
feeder.tika.append-metadata
| |
Type | java.lang.String |
Default | |
Description | Comma-separated list of metadata identifiers returned by Apache Tika to append to the extracted body text. |
feeder.tika.config
| |
Type | org.springframework.core.io.Resource |
Default | |
Description | The location of a custom Tika Config XML, for example to customize the default Tika parsers. See Apache Tika documentation for details on configuring Tika. The value of this property must be a Spring Resource location (e.g. file:/path/to/local/file) or empty for defaults. |
feeder.tika.copy-metadata
| |
Type | java.lang.String |
Default | |
Description | Comma-separated list of metadata identifiers returned by Apache Tika and names of Feedable elements to copy the metadata to. Entries in the comma separated list have the following format: "metadata identifier"="element name". With Apache Solr, target index fields must be defined as multiValued="true" to avoid indexing errors if there are multiple metadata values with the same identifier. |
feeder.tika.timeout
| |
Type | java.time.Duration |
Default | 2m |
Description | The maximum time after which text extraction from binary data with Apache Tika fails. If extraction fails, the binary data will be skipped for the index document. Lower values prevent the Feeder from being blocked for a long time in text extraction. |
feeder.tika.warn-time-threshold
| |
Type | java.time.Duration |
Default | 15s |
Description | The time after which a warning is logged if text extraction from binary data with Apache Tika is still running. |
feeder.tika.zip-bomb-prevention.enabled
| |
Type | java.lang.Boolean |
Default | true |
Description | Sets whether Apache Tika's "Zip bomb" prevention is enabled. When a "Zip bomb" is detected, no text will be extracted from the Blob, but a warning will be logged. Note that "Zip bombs" are not restricted to ZIP files but also apply to PDFs or other formats. Disabled "Zip bomb" prevention bears the risk of OutOfMemoryError-s. Note that false positives are possible. |
feeder.tika.zip-bomb-prevention.maximum-compression-ratio
| |
Type | java.lang.Long |
Default | -1 |
Description | Sets the ratio between output characters and input bytes for the Apache Tika "Zip bomb" prevention. If this ratio is exceeded (after the output threshold has been reached) then no text will be extracted and a warning will be logged. Set to -1 to use the default of Apache Tika. |
feeder.tika.zip-bomb-prevention.maximum-depth
| |
Type | java.lang.Integer |
Default | -1 |
Description | Sets the maximum XML element nesting level for the Apache Tika "Zip bomb" prevention. If this depth level is exceeded then no text will be extracted, and a warning will be logged. Set to -1 to use the default of Apache Tika. |
feeder.tika.zip-bomb-prevention.maximum-package-entry-depth
| |
Type | java.lang.Integer |
Default | -1 |
Description | Sets the maximum package entry nesting level for the Apache Tika "Zip bomb" prevention. If this depth level is exceeded then no text will be extracted, and a warning will be logged. Set to -1 to use the default of Apache Tika. |
Table 4.41. Feeder Tika Configuration Properties
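As a hedged example, a Tika configuration could look like the fragment below. The file path, the metadata identifier `dc:creator` and the element name `author` are assumptions for illustration. The exact quoting expected in `feeder.tika.copy-metadata` is not specified beyond the identifier=element pattern described above, so the unquoted form shown here should be verified.

```properties
# Custom Tika configuration as a Spring Resource location (hypothetical path)
feeder.tika.config=file:/path/to/local/file

# Fail extraction after one minute and warn after ten seconds
feeder.tika.timeout=1m
feeder.tika.warn-time-threshold=10s

# Copy a metadata value to a Feedable element (hypothetical names);
# with Solr, the target field must be multiValued="true"
feeder.tika.copy-metadata=dc:creator=author

# Keep "Zip bomb" prevention enabled and use the Apache Tika default limits
feeder.tika.zip-bomb-prevention.enabled=true
feeder.tika.zip-bomb-prevention.maximum-compression-ratio=-1
feeder.tika.zip-bomb-prevention.maximum-depth=-1
feeder.tika.zip-bomb-prevention.maximum-package-entry-depth=-1
```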
Feeder Core Properties
You can use the following properties to customize some internal settings of the Content Feeder.
feeder.core.executor-queue-capacity
| |
Type | java.lang.Integer |
Default | 100 |
Description | Maximum capacity of the Feeder's executor queue, which is internally used to transfer evaluated values. |
feeder.core.executor-retry-delay
| |
Type | java.time.Duration |
Default | 1m |
Description | The delay to wait before the Feeder retries to access the source data after failures. |
Table 4.42. Feeder Core Configuration Properties
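A minimal sketch of overriding the core settings, assuming a Spring-style properties file. The values are examples only and not tuning advice.

```properties
# Allow more evaluated values to queue up (default is 100)
feeder.core.executor-queue-capacity=200

# Retry access to the source data after 30 seconds instead of 1 minute
feeder.core.executor-retry-delay=30s
```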
Error behavior specific Properties for Content Feeder
You can use the following properties to customize the Content Feeder behavior in case of errors.
feeder.retryConnectToIndexDelay.seconds | |
Value | time in seconds |
Default | 10 |
Description | The time to wait between retries to connect to the search engine on startup. |
Table 4.43. Error Handling Configuration Properties
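For example, the startup retry interval could be lengthened as sketched below. Note that, unlike the duration-typed properties above, this value is a plain number of seconds. The value 30 is an arbitrary example.

```properties
# Wait 30 seconds between attempts to connect to the search engine on startup
feeder.retryConnectToIndexDelay.seconds=30
```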
Renamed Content Feeder Properties
Deprecated Name | New Name |
---|---|
feeder.executorQueueCapacity | feeder.core.executor-queue-capacity |
feeder.executorRetryDelay | feeder.core.executor-retry-delay |
feeder.maxBatchByteSize | feeder.batch.max-bytes |
feeder.maxBatchBytes | feeder.batch.max-bytes |
feeder.maxBatchSize | feeder.batch.max-size |
feeder.maxOpenBatches | feeder.batch.max-open |
feeder.maxProcessedBatches | feeder.batch.max-processed |
feeder.retrySendIdleDelay | feeder.batch.retry-send-idle-delay |
feeder.retrySendMaxDelay | feeder.batch.retry-send-max-delay |
feeder.sendIdleDelay | feeder.batch.send-idle-delay |
feeder.sendMaxDelay | feeder.batch.send-max-delay |
solr.partialUpdates | feeder.solr.partial-updates.enabled |
solr.partialUpdatesSkipIndexCheck | feeder.solr.partial-updates.skip-index-check |
feeder.tika.timeout.milliseconds | feeder.tika.timeout |
feeder.tika.warn.milliseconds | feeder.tika.warn-time-threshold |
solr.collection.content | solr.content.collection |
solr.configSet | solr.cae.config-set (CAE Feeder), solr.content.config-set (Content Feeder) |
Table 4.44. Renamed Content Feeder Configuration Properties
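The renaming can be illustrated with a before-and-after sketch for a few of the properties. The old values shown are assumptions. In particular, the millisecond value for the deprecated Tika timeout is inferred from the property name, and it is rewritten here in the duration notation used by the new property.

```properties
# Before (deprecated names, assumed values)
# feeder.maxBatchSize=500
# solr.partialUpdates=true
# feeder.tika.timeout.milliseconds=120000
# solr.collection.content=studio

# After (new names)
feeder.batch.max-size=500
feeder.solr.partial-updates.enabled=true
feeder.tika.timeout=2m
solr.content.collection=studio
```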