Search Manual / Version 2401
Apache Tika uses a heuristic to detect 'Zip Bombs', that is, files that expand to a huge amount of text when parsed. Parsing such files can cause severe memory or performance problems in the Feeder. To prevent denial-of-service attacks and problems caused by malfunctioning parsers, 'Zip Bomb' prevention is enabled by default. If Tika classifies a blob as a 'Zip Bomb', no text is extracted from that blob and a warning is logged instead. Note that 'Zip Bomb' attacks are not limited to ZIP files; they can also occur with PDF files, for example.
Normally, there is no need to change the configuration. If you encounter false positives, however, you
may want to tweak the settings for Tika's heuristic or even turn off the prevention. You can disable
'Zip Bomb' detection with the property feeder.tika.zip-bomb-prevention.enabled=false and tweak
the heuristic with the various properties starting with feeder.tika.zip-bomb-prevention.
For details, see Section 3.10.2, “CAE Feeder Properties” in Deployment Manual.
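As a minimal sketch, disabling the detection could look like the following properties fragment. It assumes that the Feeder reads its settings from a standard properties file; where that file lives depends on your deployment. Only the property named in this section is shown, because the names of the heuristic properties are documented in the Deployment Manual and not here.

```properties
# Assumption: this line goes into the properties file of the CAE Feeder application.
# Turns off Tika's 'Zip Bomb' detection entirely; the default is enabled.
feeder.tika.zip-bomb-prevention.enabled=false
```

Turning the detection off also removes the protection against blobs that expand excessively during parsing, so adjusting the heuristic properties is probably the safer first step when dealing with false positives.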