When computing the data for a feedable, dependencies on accessed objects are tracked and recorded by the CAE Feeder. Modifications of recorded dependencies will lead to the invalidation of the feedable. The CAE Feeder will then construct a new feedable with recomputed data and send it to the search engine. For example, a content bean will be reindexed after changing some content that was used to compute the feedable for that content bean.
In some cases, however, the invalidation of a dependency does not necessarily lead to a different value for feeding and the overhead of reindexing could be avoided for better performance.
For example, an indexed bean property gets its data from a document with global settings.
Such a document may contain lots of different settings in different properties or in a
single struct property. Imagine that a single setting S1
from this document is accessed during the construction of each indexed feedable. Because of
this, each indexed bean will depend on the properties of the settings document. Now, if
somebody changes the document, for example by changing a different setting,
all indexed beans will be invalidated and reindexed.
This can take some time, even though the indexed data did not change.
Of course, you want to avoid such situations. One possibility is to disable such
expensive dependencies by wrapping the code that creates them with the methods
disableDependencies() and enableDependencies()
of the class com.coremedia.cache.Cache.
But often this is not possible, because sometimes an invalid dependency really indicates
changed data and the index must be updated. To solve this problem, the
CAE Feeder supports fragment keys, which can be
used to revalidate an unchanged result of a computation after some of its dependencies
became invalid. Revalidation means that the CAE Feeder recognizes
that an invalidation of a dependency does not change the result so that expensive reindexing
can be skipped.
Caution
Revalidating fragment keys should be used when it's possible to encapsulate a fragment that is used for the computation of many feedables, and if dependencies get invalidated without changing the feedable's data. You should not use fragment keys, if each fragment is used in just one feedable instance. The overhead of maintaining a lot of fragment keys in the CAE Feeder can be much higher than reindexing a few content beans. The number of fragment keys should be lower than the number of indexed content beans, for which the fragment keys are used.
This section continues with an example of how to use revalidating fragments to avoid unnecessary reindexing.
Example: Using Revalidating Fragments for the Repository Path
In the following example, users should be able to search for articles below a given
repository path. Therefore, the CAE Feeder is configured
to feed the repository path into the field folderpath.
The path is indexed as a path of numeric IDs. For example, for a document that resides in folder /foo/bar,
the value /1/41/43
will be indexed if foo's ID is 41 and bar's ID is 43. /1
represents the root folder here. The advantage of this approach is that folders can be
renamed without the need to reindex documents. To find all articles below the folder
foo, the search application can simply use foo's ID in a query.
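The ID path format can be illustrated with a small, self-contained sketch. The class IdPathDemo and its methods are hypothetical and not part of the CoreMedia API. The sketch assumes that the search application can match whole path segments; how this is expressed as a query depends on the search engine's field configuration.

```java
import java.util.Arrays;
import java.util.List;

public class IdPathDemo {

    // Joins numeric folder IDs, root folder first, into a path such as "/1/41/43".
    static String idPath(List<Integer> folderIds) {
        StringBuilder sb = new StringBuilder();
        for (int id : folderIds) {
            sb.append('/').append(id);
        }
        return sb.toString();
    }

    // Returns true if the indexed path contains the folder ID as a complete segment.
    // Appending "/" makes sure that the last segment is matched as a whole, too.
    static boolean isBelow(String indexedPath, int folderId) {
        return (indexedPath + "/").contains("/" + folderId + "/");
    }

    public static void main(String[] args) {
        // Root folder has ID 1, foo has ID 41, bar has ID 43.
        String path = idPath(Arrays.asList(1, 41, 43));
        System.out.println(path);
        System.out.println(isBelow(path, 41));
        System.out.println(isBelow(path, 4));
    }
}
```

Because only IDs are indexed, renaming foo leaves the indexed value untouched, whereas moving bar to another parent folder changes it.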
The CAE Feeder is configured to index the folder path for content beans of type Article by setting the following property:
and customizing the bean caeFeederBeanPropertiesByClass
    <customize:append id="caeFeederBeanPropertiesByClassCustomizer"
                      bean="caeFeederBeanPropertiesByClass">
      <map>
        <entry key="com.customer.example.beans.Article" value="folderpath"/>
      </map>
    </customize:append>
Without fragment keys the implementation of the Article's bean property might look like:
    public String getFolderPath() {
      Content content = getContent().getParent();
      StringBuilder sb = new StringBuilder();
      // walk up to the root folder and prepend each folder's numeric ID
      while (content != null) {
        sb.insert(0, "/" + IdHelper.parseContentId(content.getId()));
        content = content.getParent();
      }
      return sb.toString();
    }
Content#getParent creates a dependency on the place of the content, which is invalidated if either the name or the parent of the content changes. If the name of a parent folder changes, the article will be reindexed, even though the indexed value has not changed. You can avoid this by using revalidating fragments. Using revalidating fragments in this example consists of the following steps:
1. Implement a fragment key that encapsulates the part of the computation that can be revalidated when collecting data for the feedable.
2. Implement a fragment key factory that returns a fragment key from a serialized version of the key.
3. Register your factory in the Spring context.
4. Inject the factory into the content bean and use the factory to get the fragment key's value.
5. Configure the capacity of the internally used cache.
Implementing a Fragment Key
First, implement a fragment key class that extends
RevalidatingFragmentPersistentCacheKey. This key encapsulates the computation
of the repository path in its evaluate()
method. The computed path constitutes a fragment of the
overall computation of the feedable's data. The implementation uses the
Persistent Cache, which is an internal component of the
CAE Feeder, to recursively get the fragment value for the parent folder.
    package com.customer.example;

    import com.coremedia.cap.content.*;
    import com.coremedia.cap.common.IdHelper;
    import com.coremedia.cap.persistentcache.*;
    import java.io.UnsupportedEncodingException;

    public class IdPathKey extends RevalidatingFragmentPersistentCacheKey<String> {

      static final String PREFIX = "idpath:";

      private final PersistentCache2 persistentCache;
      private final ContentRepository contentRepository;
      private final String contentId;

      public IdPathKey(PersistentCache2 persistentCache,
                       ContentRepository contentRepository,
                       String contentId) {
        this.persistentCache = persistentCache;
        this.contentRepository = contentRepository;
        this.contentId = contentId;
      }

      @Override
      public String getSerialized() {
        return PREFIX + contentId;
      }

      @Override
      public String evaluate() throws Exception {
        Content content = contentRepository.getContent(contentId);
        if (content==null) {
          // the content no longer exists, so the key is invalid
          String s = getSerialized();
          throw new InvalidPersistentCacheKeyException(s);
        }
        return getPath(content.getParent()) + '/' + IdHelper.parseContentId(contentId);
      }

      // recursively gets the fragment value for the parent folder
      private String getPath(Content content) {
        if (content == null) {
          return "";
        }
        IdPathKey key = new IdPathKey(persistentCache, contentRepository, content.getId());
        return (String)persistentCache.getCached(key);
      }

      @Override
      public byte[] getBytesForHashing(String value) {
        try {
          return String.valueOf(value).getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
          throw new RuntimeException("UTF-8 not supported", e);
        }
      }
    }
Example 5.7. Example of a fragment key implementation
To implement a fragment key, the methods getSerialized(),
evaluate() and getBytesForHashing(String)
are implemented. In the following, the methods are described in general.
Method evaluate()
computes the fragment value. It does not take any parameters that
specify the source data for the computation. Such parameters are part of the key's
identity and are passed to its constructor. In the example, the contentId
is such a key parameter.
Method calls on
objects in the implementation of evaluate()
implicitly trigger all relevant
dependencies. These content dependencies are automatically invalidated after corresponding content changes.
There may be situations where you want to avoid content dependencies. To this end, you can use the following pattern to disable dependency tracking for a code block by calling static methods of class com.coremedia.cache.Cache:
    Cache.disableDependencies();
    try {
      // dependencies are disabled for this code block
      ...
    } finally {
      Cache.enableDependencies();
    }
Additional dependencies may be triggered explicitly by calling the following static methods from inside the evaluate() method:
com.coremedia.cache.Cache#cacheFor(long millis): Triggers a relative time dependency making the value become invalid when the time is reached.
com.coremedia.cache.Cache#cacheUntil(Date date): Triggers an absolute time dependency again making the value become invalid when the time is reached.
com.coremedia.cache.Cache#dependencyOn(Object dependent): Triggers an explicit dependency on a certain object. The CAE Feeder only supports dependencies on String values. Dependencies of other types are ignored. Custom dependencies on String values can be invalidated programmatically by invoking the method invalidate(Object) of class com.coremedia.cap.persistentcache.dependencycache.PersistentDependencyCacheManagement on the Spring bean persistentDependencyCacheManager. Alternatively, you can invalidate a String dependency with the JMX operation invalidateSerialized(String) of the PersistentDependencyCache MBean. The parameter of this JMX operation is the String dependency itself, prefixed with "string:" (that is, "string:" + value).
Method getSerialized()
returns the key's serialized form as a String,
as it is stored in the database of the CAE Feeder.
The returned string contains all parameters
that are needed to reconstruct the fragment key instance. It is good practice to use different prefixes for
different types of fragment keys. In the example, the prefix "idpath:"
and the Content ID are
used to create serialized keys such as idpath:coremedia:///cap/content/41.
Keep in mind that the serialized key is stored in the database when making the dependencies persistent. Thus, using short keys will result in less disk space usage.
getBytesForHashing(String value)
Method getBytesForHashing(String)
returns a byte representation for a computed value.
The CAE Feeder computes a hash from these bytes and stores it in its database.
The hash is used to detect if a fragment value has changed after it was recomputed.
The CAE Feeder avoids reindexing if nothing has changed.
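The idea of hash-based revalidation can be sketched independently of the CAE Feeder. The class HashRevalidationDemo below is hypothetical: the in-memory map stands in for the CAE Feeder's database, and SHA-256 is an assumption made for this sketch, since the hash algorithm actually used is not stated here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class HashRevalidationDemo {

    // Hypothetical stand-in for the stored hashes, keyed by serialized fragment key.
    private final Map<String, byte[]> storedHashes = new HashMap<>();

    // Hashes the byte representation of a value (SHA-256 is an assumption).
    static byte[] hash(String value) {
        try {
            byte[] bytes = String.valueOf(value).getBytes(StandardCharsets.UTF_8);
            return MessageDigest.getInstance("SHA-256").digest(bytes);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Stores the hash of the recomputed value and reports whether it differs
    // from the previously stored hash. A first computation counts as a change.
    boolean hasChanged(String serializedKey, String recomputedValue) {
        byte[] newHash = hash(recomputedValue);
        byte[] oldHash = storedHashes.put(serializedKey, newHash);
        return oldHash == null || !Arrays.equals(oldHash, newHash);
    }

    public static void main(String[] args) {
        HashRevalidationDemo demo = new HashRevalidationDemo();
        String key = "idpath:coremedia:///cap/content/43";
        System.out.println(demo.hasChanged(key, "/1/41/43")); // first computation
        System.out.println(demo.hasChanged(key, "/1/41/43")); // recomputed, same value
        System.out.println(demo.hasChanged(key, "/1/42/43")); // recomputed, new value
    }
}
```

Only when hasChanged reports a difference would dependent feedables need to be reindexed; an unchanged hash means that the fragment can be revalidated.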
Implementing a Factory for Fragment Keys
Next, you need a
PersistentCacheKeyFactory, which is used to create
fragment key instances based on the keys' serialized representations. Its method
createKey(String) is the inverse function of the fragment key's method getSerialized().
In an environment where several types of fragment keys and therefore
several PersistentCacheKeyFactory
instances are used, a mechanism for selecting the
right factory needs to be provided. As a convention, a PersistentCacheKeyFactory
may return null
to signal that it is not responsible for a given serialized key. The
CAE Feeder sequentially asks all known PersistentCacheKeyFactories
until a factory returns a non-null result.
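The selection convention can be sketched as follows. This is a simplified illustration, not the CAE Feeder's implementation: KeyFactory is a hypothetical stand-in for the factory interface, keys are represented as plain strings, and the "settings:" prefix belongs to an imaginary second key type.

```java
import java.util.Arrays;
import java.util.List;

public class FactoryChainDemo {

    // Hypothetical stand-in for a key factory; NOT the CoreMedia interface.
    // A plain String describes the created key to keep the sketch self-contained.
    interface KeyFactory {
        String createKey(String serializedKey);
    }

    // Creates a factory that is responsible for serialized keys with the given prefix
    // and returns null for all other keys.
    static KeyFactory forPrefix(String prefix, String keyType) {
        return serializedKey -> serializedKey.startsWith(prefix)
                ? keyType + "[" + serializedKey.substring(prefix.length()) + "]"
                : null;
    }

    // Asks the factories in order and returns the first non-null result,
    // or null if no factory is responsible.
    static String resolve(List<KeyFactory> factories, String serializedKey) {
        for (KeyFactory factory : factories) {
            String key = factory.createKey(serializedKey);
            if (key != null) {
                return key;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<KeyFactory> factories = Arrays.asList(
                forPrefix("settings:", "SettingsKey"), // imaginary second key type
                forPrefix("idpath:", "IdPathKey"));
        System.out.println(resolve(factories, "idpath:coremedia:///cap/content/41"));
        System.out.println(resolve(factories, "unknown:42"));
    }
}
```

Distinct prefixes per key type keep the factories from claiming each other's keys, which is why the order of the factories does not matter in this sketch.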
If the PersistentCacheKeyFactory
is asked to reconstruct a key
whose resources are no longer available, it nevertheless must return a fragment key. This returned key should
throw an InvalidPersistentCacheKeyException
when its evaluate()
method is called. You may use the static method
InvalidPersistentCacheKeyException.wrap(String serializedKey)
for creating such an instance.
In the example, the PersistentCacheKeyFactory
just creates an instance of IdPathKey
with the Content ID extracted from the serialized key. It returns null
if the serialized key does
not start with the correct prefix:
    package com.customer.example;

    import com.coremedia.cap.content.*;
    import com.coremedia.cap.persistentcache.*;

    public class IdPathKeyFactory implements PersistentCacheKeyFactory {

      private PersistentCache2 persistentCache;
      private ContentRepository contentRepository;

      public void setPersistentCache(PersistentCache2 pc) {
        this.persistentCache = pc;
      }

      public void setContentRepository(ContentRepository cr) {
        this.contentRepository = cr;
      }

      public PersistentCacheKey createKey(String serializedKey) {
        if (serializedKey.startsWith(IdPathKey.PREFIX)) {
          int l = IdPathKey.PREFIX.length();
          String contentId = serializedKey.substring(l);
          return keyForContent(contentId);
        }
        // not responsible for this serialized key
        return null;
      }

      private PersistentCacheKey keyForContent(String contentId) {
        return new IdPathKey(persistentCache, contentRepository, contentId);
      }

      // convenience method to get the ID path for a content
      public String get(Content content) {
        String contentId = content.getId();
        PersistentCacheKey key = keyForContent(contentId);
        return (String)persistentCache.getCached(key);
      }
    }
Example 5.8. Example of a PersistentCacheKeyFactory implementation
The PersistentCacheKeyFactory
for creating fragment keys must be defined in the
Spring application context and registered as a fragment key factory. Note that the key factory is initialized with the
persistentDependencyCache bean for the persistentCache property.
It's important to always use the persistentDependencyCache
bean to get fragment values.
    <bean id="idPathKeyFactory"
          class="com.customer.example.IdPathKeyFactory">
      <property name="persistentCache" ref="persistentDependencyCache"/>
      <property name="contentRepository" ref="contentRepository"/>
    </bean>

    <customize:append id="idPathKeyFactoryCustomizer"
                      bean="fragmentPersistentCacheKeyFactory"
                      property="keyFactories">
      <list>
        <ref local="idPathKeyFactory"/>
      </list>
    </customize:append>
Example 5.9. Define and register the factory in the Spring context
Using the Fragment Key Value in a Content Bean
The IdPathKeyFactory
example class contains the convenience method get(Content),
which can be used in the content bean implementation to get the path for a Content:
    package com.customer.example.beans;

    import com.coremedia.cap.content.Content;
    import com.customer.example.IdPathKeyFactory;

    public class ArticleImpl extends ArticleBase implements Article {

      private IdPathKeyFactory factory;

      public void setIdPathKeyFactory(IdPathKeyFactory factory) {
        this.factory = factory;
      }

      public String getFolderPath() {
        Content parent = getContent().getParent();
        if (parent == null) {
          return "";
        }
        // get the ID path of the parent folder from the fragment key
        return factory.get(parent);
      }
    }
Example 5.10. Using the fragment key in the content bean
The content bean definition for the article bean must be configured with the key factory:
    <bean name="contentBeanFactory:Article"
          class="com.customer.example.beans.ArticleImpl"
          scope="prototype"
          parent="abstractContentBean">
      <property name="idPathKeyFactory" ref="idPathKeyFactory"/>
    </bean>
Example 5.11. Configure content bean with factory
This example's content bean implementation depends directly on the PersistentCacheKeyFactory and can only be used in the CAE Feeder. If you want to use the same implementation in the CAE web application, you should extract the logic to compute the path into a strategy interface.
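One possible shape of such a strategy interface is sketched below. All names (FolderPathStrategy, Folder, DirectStrategy, ArticleBean) are hypothetical, and the Folder class merely stands in for a content object so that the sketch is self-contained. In the CAE Feeder, a second strategy implementation could delegate to IdPathKeyFactory#get(Content) instead of walking up the parents.

```java
public class FolderPathStrategyDemo {

    // Hypothetical strategy interface; C stands in for the content type.
    interface FolderPathStrategy<C> {
        String folderPath(C folder);
    }

    // Minimal stand-in for a folder with a numeric ID and a parent folder.
    static final class Folder {
        final int id;
        final Folder parent;

        Folder(int id, Folder parent) {
            this.id = id;
            this.parent = parent;
        }
    }

    // Computes the path directly by walking up the parents, without fragment keys.
    static final class DirectStrategy implements FolderPathStrategy<Folder> {
        @Override
        public String folderPath(Folder folder) {
            StringBuilder sb = new StringBuilder();
            for (Folder f = folder; f != null; f = f.parent) {
                sb.insert(0, "/" + f.id);
            }
            return sb.toString();
        }
    }

    // The bean only knows the strategy, not how the path is computed.
    static final class ArticleBean {
        private final Folder parent;
        private final FolderPathStrategy<Folder> strategy;

        ArticleBean(Folder parent, FolderPathStrategy<Folder> strategy) {
            this.parent = parent;
            this.strategy = strategy;
        }

        String getFolderPath() {
            return parent == null ? "" : strategy.folderPath(parent);
        }
    }

    public static void main(String[] args) {
        Folder root = new Folder(1, null);
        Folder foo = new Folder(41, root);
        Folder bar = new Folder(43, foo);
        System.out.println(new ArticleBean(bar, new DirectStrategy()).getFolderPath());
    }
}
```

With this split, the bean class stays the same in both applications and only the injected strategy differs.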
Getting the Fragment Key Value from the Persistent Cache
IdPathKeyFactory#get(Content) and IdPathKey#getPath(Content) call the persistent cache's
method getCached
to retrieve a fragment value. This method uses in-memory
CacheKeys to cache
fragment values. Cached lookup improves performance if lots of keys access the fragment's value. It not
only avoids the repeated computation of the fragment but also avoids database queries to check whether newly
computed values have changed since the last computation.
In-memory cache keys created by the method getCached
have the default cache class java.lang.Object
and a default cache weight equal to one. You must configure
a reasonable cache capacity for that cache class, for example:
    <bean id="objectClassCacheCapacityConfigurer"
          class="com.coremedia.cache.CacheCapacityConfigurer"
          init-method="init">
      <property name="cache" ref="cache"/>
      <property name="capacities">
        <map>
          <entry key="java.lang.Object" value="10000"/>
        </map>
      </property>
    </bean>
If you forget to configure the cache capacity, the value is not cached and the cache will log warnings about
an unreasonable cache size. If you want to use a different cache class or weight, you can still create an
in-memory CacheKey
yourself, which then calls PersistentCache#get(PersistentCacheKey)
in its evaluate method.
Be careful to not introduce cycles when calling
from another fragment key's evaluate
method. Simple cycles on the same thread will result in an
, for example if
which in turn gets
again. But code might still hang if
multiple threads are involved, for example if one thread gets
which gets
while another thread gets
which gets