Search Manual / Version 2512.0
The Feeder uses Apache Tika to extract text and metadata from blob properties for indexing.
Extracted text and metadata are cached in heap memory to avoid repeated, potentially expensive processing
of the same blob. Caching can improve feeding performance if a content item was modified but its blob
property was not changed, or if the same blob value is used in multiple content items. Use the configuration property
cache.capacities.feeder.tika.heap to configure the cache capacity in
estimated bytes. The Blueprint configures a default capacity of 10 MB as follows:
Example
cache.capacities.feeder.tika.heap=10485760
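The value is a plain byte count, so the default of 10 MB corresponds to 10 × 1024 × 1024 = 10485760. The following is a minimal sketch of a larger cache. The 64 MB figure is an illustrative assumption, not a recommendation from this manual:

```properties
# Illustrative value only: 64 MB = 64 * 1024 * 1024 estimated bytes
cache.capacities.feeder.tika.heap=67108864
```

Because the cache lives in heap memory, a larger capacity presumably needs to fit within the heap available to the Feeder's JVM.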
Tika provides parsers for various formats,
which can be customized in a dedicated Apache Tika XML configuration file. The default configuration covers
typical formats, so a custom configuration is rarely needed. If you need to fine-tune the
configuration of Apache Tika, see the Apache Tika documentation for the format of the
Tika Config XML file. The location of this file is configured with the Spring configuration
property feeder.tika.config. The value of this property is a Spring Resource location.
The following example configures an Apache Tika Config file from the local file system:
Example
feeder.tika.config=file:/opt/path/tika-config.xml
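Since the value is a Spring Resource location, other standard Spring resource prefixes such as classpath: should be usable in the same way. The following sketch assumes a hypothetical file named tika-config.xml at the root of the Feeder application's classpath. This manual does not confirm that setup, so verify it in your deployment:

```properties
# Hypothetical: load the Tika Config file from the application classpath
# instead of the local file system (file name and location are assumptions)
feeder.tika.config=classpath:/tika-config.xml
```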


