Search Manual / Version 2010

4.2.3.4 Configuring Tika Zip Bomb Prevention

Apache Tika uses a heuristic to detect 'Zip Bombs', that is, files that expand to a huge amount of text when parsed. Parsing such files can cause severe memory or performance problems in the Feeder. To guard against denial-of-service attacks and problems caused by malfunctioning parsers, 'Zip Bomb' prevention is enabled by default. If Tika classifies a blob as a 'Zip Bomb', no text is extracted from that blob and a warning is logged instead. Note that 'Zip Bomb' attacks are not limited to ZIP files; they can also occur with PDF files, for example.

Normally, there is no need to change the configuration. If you encounter false positives, however, you can tune the settings of Tika's heuristic or turn off the prevention entirely. To disable 'Zip Bomb' detection, set the property feeder.tika.zip-bomb-prevention.enabled=false. To tune the heuristic, use the other properties that start with feeder.tika.zip-bomb-prevention. For details, see Section 4.9.1, “Content Feeder Properties” in the Deployment Manual.
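
As a minimal sketch, the property named above could appear in the Feeder's properties file as follows. Where that file lives depends on your deployment and is not covered here. Only the enabled property is taken from this section; the tuning properties are deliberately not spelled out, because their names are documented in the Deployment Manual and not in this section.

```
# Turns off Tika 'Zip Bomb' detection entirely.
# Only consider this if you see false positives and tuning the heuristic does not help.
feeder.tika.zip-bomb-prevention.enabled=false

# Further tuning properties share the prefix feeder.tika.zip-bomb-prevention.
# See Section 4.9.1, "Content Feeder Properties" in the Deployment Manual for their names.
```

Disabling the prevention removes the protection against oversized extraction results for all blobs, so tuning the heuristic is usually the safer first step.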
