Search Manual / Version 2512.0

4.2.3.3 Configuring Tika

The Feeder uses Apache Tika to extract text and metadata from blob properties for indexing.

Extracted text and metadata are cached in heap memory to avoid repeated, potentially expensive processing of the same blob. Caching improves feeding performance when a content item was modified but its blob property did not change, or when the same blob value is used in multiple content items. Use the configuration property cache.capacities.feeder.tika.heap to set the cache capacity in estimated bytes. The Blueprint configures a default capacity of 10 MB (10 × 1024 × 1024 bytes) as follows:

Example

cache.capacities.feeder.tika.heap=10485760
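
Because the capacity is given in bytes, a different size has to be converted first. As a sketch, a hypothetical 64 MB cache works out to 64 × 1024 × 1024 = 67108864 bytes. The 64 MB figure is only an illustrative assumption, not a recommendation. A suitable value depends on the heap available to the Feeder and on the size of the extracted text.

```properties
# Hypothetical value: 64 MB = 64 * 1024 * 1024 = 67108864 bytes.
# Illustrative only; choose a size that fits the Feeder's available heap.
cache.capacities.feeder.tika.heap=67108864
```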

Tika provides parsers for various formats, which can be customized in an Apache Tika XML configuration file. The default configuration covers typical formats, so a custom configuration is rarely needed. If you need to fine-tune Apache Tika, see the Apache Tika documentation for the format of the Tika Config XML file. Set the location of this file with the Spring configuration property feeder.tika.config, whose value is a Spring Resource location. The following example configures an Apache Tika Config file from the local file system:

Example

feeder.tika.config=file:/opt/path/tika-config.xml
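
Since the value is a Spring Resource location, other standard Spring resource prefixes such as classpath: should be usable as well. The following sketch assumes a file named tika-config.xml at the root of the Feeder application's classpath. Both the file name and its location are hypothetical, and this manual section does not confirm that the file is packaged this way.

```properties
# Assumption: tika-config.xml is packaged at the root of the Feeder's classpath.
# "classpath:" is a standard Spring resource prefix, like "file:" in the example above.
feeder.tika.config=classpath:/tika-config.xml
```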
