5.4.5. Using Revalidating Fragments

When computing the data for a feedable, dependencies on accessed objects are tracked and recorded by the CAE Feeder. Modifications of recorded dependencies will lead to the invalidation of the feedable. The CAE Feeder will then construct a new feedable with recomputed data and send it to the search engine. For example, a content bean will be reindexed after changing some content that was used to compute the feedable for that content bean.

In some cases, however, the invalidation of a dependency does not necessarily lead to a different value for feeding and the overhead of reindexing could be avoided for better performance.

For example, an indexed bean property gets its data from a document with global settings. Such a document may contain lots of different settings in different properties or in a single struct property. Imagine that a single setting S1 from this document is accessed during the construction of each indexed feedable. Because of this, each indexed bean will depend on the properties of the settings document. Now, if somebody changes the document, for example by changing setting S2, all indexed beans will be invalidated and reindexed. This can take some time, even though the indexed data did not change at all.

Of course, you want to avoid such situations. One possibility is to disable such expensive dependencies by wrapping the code that creates them with the methods disableDependencies() and enableDependencies() of the class com.coremedia.cache.Cache. But often this is not possible, because sometimes an invalid dependency really indicates changed data and the index must be updated. To solve this problem, the CAE Feeder supports fragment keys, which can be used to revalidate an unchanged result of a computation after some of its dependencies became invalid. Revalidation means that the CAE Feeder recognizes that an invalidation of a dependency does not change the result so that expensive reindexing can be skipped.

Caution

Revalidating fragment keys should be used when it's possible to encapsulate a fragment that is used for the computation of many feedables, and if dependencies get invalidated without changing the feedable's data.

You should not use fragment keys if each fragment is used in just one feedable instance. The overhead of maintaining a lot of fragment keys in the CAE Feeder can be much higher than reindexing a few content beans. The number of fragment keys should be lower than the number of indexed content beans for which the fragment keys are used.

This section continues with an example of how to use revalidating fragments to avoid unnecessary reindexing.

Example: Using Revalidating Fragments for the Repository Path

In the following example, users should be able to search for articles below a given repository path. Therefore, the CAE Feeder is configured to feed the repository path into the field folderpath. The path is indexed as a path of numeric IDs. For example, for a document that resides in folder /foo/bar, the value /1/41/43/ will be indexed if foo's ID is 41 and bar's ID is 43. The leading /1 represents the root folder. The advantage of this approach is that folders can be renamed without the need to reindex documents. To find all articles below the folder /foo, the search application can simply use foo's ID in a query.
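
The idea of an ID-based path can be illustrated with a small stand-alone sketch. It does not use the CoreMedia API: the class IdPathSketch, its parents map and the isBelow helper are hypothetical names, folders are modelled as plain numeric IDs, and the substring check merely stands in for whatever query the search application would send to the search engine.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a folder hierarchy; not part of the CoreMedia API.
class IdPathSketch {
  // Maps a folder ID to the ID of its parent folder; the root folder (1) has no entry.
  private final Map<Integer, Integer> parents = new HashMap<>();

  void addFolder(int id, int parentId) {
    parents.put(id, parentId);
  }

  // Builds the path of numeric IDs from the root down to the given folder.
  String idPath(int folderId) {
    StringBuilder sb = new StringBuilder("/");
    Integer current = folderId;
    while (current != null) {
      sb.insert(0, "/" + current);
      current = parents.get(current);
    }
    return sb.toString();
  }

  // A document is below a folder if its indexed path contains "/<folderId>/".
  static boolean isBelow(String indexedPath, int folderId) {
    return indexedPath.contains("/" + folderId + "/");
  }
}
```

Because folder names never appear in the map or in the resulting path, renaming a folder leaves the indexed value untouched. Only moving a folder changes the path.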

The CAE Feeder is configured to index the folder path for content beans of type Article by setting the following property:

feeder.contentSelector.contentTypes=Article

and customizing the bean caeFeederBeanPropertiesByClass:

<customize:append id="caeFeederBeanPropertiesByClassCustomizer"
                  bean="caeFeederBeanPropertiesByClass">
  <map>
    <entry key="com.customer.example.beans.Article"
           value="folderpath"/>
  </map>
</customize:append>

Without fragment keys the implementation of the Article's bean property might look like:

public String getFolderPath() {
  Content content =  getContent().getParent();
  StringBuilder sb = new StringBuilder();
  while (content != null) {
    sb.insert(0, "/" + IdHelper.parseContentId(content.getId()));
    content = content.getParent();
  }
  return sb.toString();
}

Content#getParent creates a dependency on the place of the content, which is invalidated if either the name or the parent of the content changes. If the name of a parent folder changes, the article will be reindexed, even though the indexed value has not changed. You can avoid this by using revalidating fragments. Using revalidating fragments in this example consists of the following steps:

  1. Implement a fragment key that encapsulates the part of the computation that can be revalidated when collecting data for the feedable.

  2. Implement a fragment key factory that returns a fragment key from a serialized version of the key.

  3. Register your factory in the Spring context.

  4. Inject the factory into the content bean and use the factory to get the fragment key's value.

  5. Configure the capacity of the internally used cache.

Implementing a Fragment Key

First, implement a fragment key class that extends RevalidatingFragmentPersistentCacheKey. This key encapsulates the computation of the repository path in its evaluate() method. The computed path constitutes a fragment of the overall computation of the feedable's data. The implementation uses the Persistent Cache, which is an internal component of the CAE Feeder, to recursively get the fragment value for the parent folder.

package com.customer.example;
import com.coremedia.cap.content.*;
import com.coremedia.cap.common.IdHelper;
import com.coremedia.cap.persistentcache.*;
import java.io.UnsupportedEncodingException;

public class IdPathKey 
       extends RevalidatingFragmentPersistentCacheKey<String> {

  static final String PREFIX = "idpath:";
  private final PersistentCache2 persistentCache;
  private final ContentRepository contentRepository;
  private final String contentId;

  public IdPathKey(PersistentCache2 persistentCache,
                   ContentRepository contentRepository,
                   String contentId) {
    this.persistentCache = persistentCache;
    this.contentRepository = contentRepository;
    this.contentId = contentId;
  }

  @Override
  public String getSerialized() {
    return PREFIX + contentId;
  }

  @Override
  public String evaluate() throws Exception {
    Content content = contentRepository.getContent(contentId);
    if (content==null) {
      String s = getSerialized();
      throw new InvalidPersistentCacheKeyException(s);
    }
    return getPath(content.getParent()) + '/' + IdHelper.parseContentId(contentId);
  }

  private String getPath(Content content) {
    if (content == null) {
      return "";
    }
    IdPathKey key = new IdPathKey(persistentCache, contentRepository, content.getId());
    return (String)persistentCache.getCached(key);
  }

  @Override
  public byte[] getBytesForHashing(String value) {
    try {
      return String.valueOf(value).getBytes("UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new RuntimeException("UTF-8 not supported", e);
    }
  }
}

Example 5.7. Example of a fragment key implementation


To implement a fragment key, the methods getSerialized(), evaluate() and getBytesForHashing(String) are implemented. In the following, the methods are described in general.

evaluate()

Method evaluate() computes the fragment value. It does not take any parameters that specify the source data for the computation. Such parameters are part of the key's identity and are passed to its constructor. In the example, the contentId is such a key parameter.

Method calls on com.coremedia.cap.content.Content objects in the implementation of evaluate() implicitly trigger all relevant dependencies. These content dependencies are automatically invalidated after corresponding content changes.

There may be situations where you want to avoid content dependencies. To this end, you can use the following pattern to disable dependency tracking for a code block by calling static methods of class com.coremedia.cache.Cache:

Cache.disableDependencies();
try {
  // dependencies are disabled for this code block
  ...
} finally {
  Cache.enableDependencies();
} 

Additional dependencies may be triggered explicitly by calling the following static methods from inside the evaluate() method:

getSerialized()

Method getSerialized() returns the key's serialized form as java.lang.String as it is stored in the database of the CAE Feeder. The returned string contains all parameters that are needed to reconstruct the fragment key instance. It is good practice to use different prefixes for different types of fragment keys. In the example, the prefix "idpath:" and the Content ID are used to create serialized keys such as idpath:coremedia:///cap/content/41.

Keep in mind that the serialized key is stored in the database when making the dependencies persistent. Thus, using short keys will result in less disk space usage.

getBytesForHashing(String value)

Method getBytesForHashing(String) returns a byte representation for a computed value. The CAE Feeder computes a hash from these bytes and stores it in its database. The hash is used to detect if a fragment value has changed after it was recomputed. The CAE Feeder avoids reindexing if nothing has changed.
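
The effect of this hash comparison can be sketched as follows. This is an illustration only, not CAE Feeder code: the class RevalidationSketch and the in-memory map that stands in for the CAE Feeder's database are hypothetical, and SHA-256 is an assumption, since the hash function actually used is not specified here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of revalidation by hash comparison; not CAE Feeder code.
class RevalidationSketch {
  // Stands in for the hashes that the CAE Feeder stores in its database.
  private final Map<String, byte[]> storedHashes = new HashMap<>();

  // Stores the hash of the recomputed value and returns true if it differs
  // from the previously stored hash (or if no hash was stored yet).
  boolean hasChanged(String serializedKey, String recomputedValue) {
    byte[] bytes = String.valueOf(recomputedValue).getBytes(StandardCharsets.UTF_8);
    byte[] newHash = hash(bytes);
    byte[] oldHash = storedHashes.put(serializedKey, newHash);
    return oldHash == null || !Arrays.equals(oldHash, newHash);
  }

  private static byte[] hash(byte[] bytes) {
    try {
      // SHA-256 is an assumption made for this sketch.
      return MessageDigest.getInstance("SHA-256").digest(bytes);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 not available", e);
    }
  }
}
```

In this sketch, a fragment whose dependency was invalidated but whose recomputed value hashes to the same bytes reports no change, which corresponds to the case where reindexing can be skipped.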

Implementing a Factory for Fragment Keys

Next, you need a PersistentCacheKeyFactory, which is used to create fragment key instances based on the keys' serialized representations. Its method createKey(String) is the inverse function for the fragment key's method getSerialized().

In an environment where several types of fragment keys and therefore several PersistentCacheKeyFactory instances are used, a mechanism for selecting the right factory needs to be provided. As a convention, a PersistentCacheKeyFactory may return null to signal that it is not responsible for a given serialized key. The CAE Feeder sequentially asks all known PersistentCacheKeyFactory instances until a factory returns a non-null result.
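
The null convention can be sketched with a simplified, stand-alone model. The names SketchKeyFactory and FactoryChainSketch are hypothetical and keys are represented as plain strings; this is not the CAE Feeder's implementation, and what the real CAE Feeder does when no factory accepts a key is not covered here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for a key factory; not the CoreMedia interface.
interface SketchKeyFactory {
  // Returns a key description, or null if this factory is not responsible.
  String createKey(String serializedKey);
}

class FactoryChainSketch {
  private final List<SketchKeyFactory> factories;

  FactoryChainSketch(SketchKeyFactory... factories) {
    this.factories = Arrays.asList(factories);
  }

  // Asks each factory in turn and returns the first non-null result.
  String createKey(String serializedKey) {
    for (SketchKeyFactory factory : factories) {
      String key = factory.createKey(serializedKey);
      if (key != null) {
        return key;
      }
    }
    return null; // sketch choice: no factory was responsible
  }

  // Creates a factory that only accepts serialized keys with the given prefix.
  static SketchKeyFactory forPrefix(String prefix) {
    return serialized -> serialized.startsWith(prefix)
        ? "key[" + prefix + "](" + serialized.substring(prefix.length()) + ")"
        : null;
  }
}
```

Distinct prefixes per key type are what make this selection unambiguous: each serialized key is accepted by exactly one factory, regardless of the order in which the factories are asked.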

If the PersistentCacheKeyFactory is asked to reconstruct a key whose resources are no longer available, it must nevertheless return a fragment key. This returned key should throw a com.coremedia.cap.persistentcache.InvalidPersistentCacheKeyException when its evaluate() method is called. You may use the static method InvalidPersistentCacheKeyException.wrap(String serializedKey) for creating such an instance.

In the example, the PersistentCacheKeyFactory just creates an instance of IdPathKey with the Content ID extracted from the serialized key. It returns null if the serialized key does not start with the correct prefix:

package com.customer.example;
import com.coremedia.cap.content.*;
import com.coremedia.cap.persistentcache.*;

public class IdPathKeyFactory 
      implements PersistentCacheKeyFactory {
  private PersistentCache2 persistentCache;
  private ContentRepository contentRepository;

  public void setPersistentCache(PersistentCache2 pc) {
    this.persistentCache = pc;
  }

  public void setContentRepository(ContentRepository cr) {
    this.contentRepository = cr;
  }

  public PersistentCacheKey createKey(String serializedKey) {
    if (serializedKey.startsWith(IdPathKey.PREFIX)) {
      int l = IdPathKey.PREFIX.length();
      String contentId = serializedKey.substring(l);
      return keyForContent(contentId);
    }
    return null;
  }

  private PersistentCacheKey keyForContent(String contentId) {
    return new IdPathKey(persistentCache, contentRepository,
                         contentId);
  }

  public String get(Content content) {
    String contentId = content.getId();
    PersistentCacheKey key = keyForContent(contentId);
    return (String)persistentCache.getCached(key);
  }
}

Example 5.8. Example of a PersistentCacheKeyFactory implementation


The PersistentCacheKeyFactory for creating fragment keys must be defined in the Spring application context and registered as a fragment key factory. Note that the key factory is initialized with the persistentDependencyCache bean for the persistentCache property. It's important to always use the persistentDependencyCache bean to get fragment keys.

<bean id="idPathKeyFactory"
      class="com.customer.example.IdPathKeyFactory">
  <property name="persistentCache" 
            ref="persistentDependencyCache"/>
  <property name="contentRepository" 
            ref="contentRepository"/>
</bean>

<customize:append id="idPathKeyFactoryCustomizer"
                  bean="fragmentPersistentCacheKeyFactory"
                  property="keyFactories">
  <list>
    <ref local="idPathKeyFactory"/>
  </list>
</customize:append>

Example 5.9. Define and register the factory in the Spring context


Using the Fragment Key Value in a Content Bean

The IdPathKeyFactory example class contains the convenience method get(Content), which can be used in the content bean implementation to get the path for a Content:

package com.customer.example.beans;

import com.coremedia.cap.content.Content;
import com.customer.example.IdPathKeyFactory;

public class ArticleImpl extends ArticleBase implements Article {
  private IdPathKeyFactory factory;

  public void setIdPathKeyFactory(IdPathKeyFactory factory) {
    this.factory = factory;
  }

  public String getFolderPath() {
    Content parent = getContent().getParent();
    if (parent == null) {
      return "";
    }
    return factory.get(parent);
  }
}

Example 5.10. Using the fragment key in the content bean


The content bean definition for the article bean must be configured with the key factory:

<bean name="contentBeanFactory:Article"
      class="com.customer.example.beans.ArticleImpl"
      scope="prototype" parent="abstractContentBean">
  <property name="idPathKeyFactory" ref="idPathKeyFactory"/>
</bean>

Example 5.11. Configure content bean with factory


This example's content bean implementation depends directly on the PersistentCacheKeyFactory and can only be used in the CAE Feeder. If you want to use the same implementation in the CAE web application, you should extract the logic to compute the path into a strategy interface.
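
Such a strategy interface might look like the following sketch. The names FolderPathStrategy and ArticleSketch are hypothetical, and the type parameter C stands in for com.coremedia.cap.content.Content so that the sketch compiles without the CoreMedia libraries.

```java
// Hypothetical strategy interface; the type parameter C stands in for
// com.coremedia.cap.content.Content so that the sketch compiles on its own.
interface FolderPathStrategy<C> {
  String getFolderPath(C parentFolder);
}

// Simplified stand-in for the content bean: it only knows the strategy,
// not whether the path comes from the Persistent Cache or is computed directly.
class ArticleSketch<C> {
  private final C parentFolder;
  private FolderPathStrategy<C> folderPathStrategy;

  ArticleSketch(C parentFolder) {
    this.parentFolder = parentFolder;
  }

  void setFolderPathStrategy(FolderPathStrategy<C> strategy) {
    this.folderPathStrategy = strategy;
  }

  String getFolderPath() {
    if (parentFolder == null) {
      return "";
    }
    return folderPathStrategy.getFolderPath(parentFolder);
  }
}
```

With this split, the CAE Feeder could inject a strategy that delegates to IdPathKeyFactory#get(Content), while the CAE web application could inject one that walks the parent folders directly, as in the implementation without fragment keys.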

Getting the Fragment Key Value from the Persistent Cache

IdPathKeyFactory#get(Content) and IdPathKey#getPath(Content) use method getCached of com.coremedia.cap.persistentcache.PersistentCache2 to retrieve a fragment value. This method uses in-memory CacheKeys to cache fragment values. Cached lookup improves performance if lots of keys access the fragment's value. It not only avoids the repeated computation of the fragment but also avoids database queries to check whether newly computed values have changed since the last computation.

In-memory cache keys created by the method getCached have the default cache class java.lang.Object and a default cache weight equal to one. You must configure a reasonable cache capacity for that cache class, for example:

<bean id="objectClassCacheCapacityConfigurer"
      class="com.coremedia.cache.CacheCapacityConfigurer"
      init-method="init">
  <property name="cache" ref="cache"/>
  <property name="capacities">
    <map>
      <entry key="java.lang.Object" value="10000"/>
    </map>
  </property>
</bean>

If you forget to configure the cache capacity, the value is not cached and the cache will log warnings about an unreasonable cache size. If you want to use a different cache class or weight, you can still create an in-memory CacheKey yourself which then calls PersistentCache#get(PersistentCacheKey) in its evaluate method.

Be careful not to introduce cycles when calling PersistentCache#get or PersistentCache2#getCached from another fragment key's evaluate method. Simple cycles on the same thread will result in an IllegalStateException, for example if key:1 gets key:2 which in turn gets key:1 again. But code might still hang if multiple threads are involved, for example if one thread gets key:1 which gets key:2 while another thread gets key:2 which gets key:1.
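
The same-thread case can be illustrated with a stand-alone sketch. How the CAE Feeder detects cycles internally is not described here; the class CycleSketch below is hypothetical and only shows one conceivable way in which a nested lookup of a key that is already being evaluated on the current thread leads to an IllegalStateException.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical illustration of same-thread cycle detection; not CAE Feeder code.
class CycleSketch {
  // Keys that are currently being evaluated on this thread.
  private static final ThreadLocal<Deque<String>> IN_PROGRESS =
      ThreadLocal.withInitial(ArrayDeque::new);

  // Evaluates a key and fails if the same key is already being evaluated on this thread.
  static String get(String key, Supplier<String> evaluation) {
    Deque<String> stack = IN_PROGRESS.get();
    if (stack.contains(key)) {
      throw new IllegalStateException("Cycle detected for " + key);
    }
    stack.push(key);
    try {
      return evaluation.get();
    } finally {
      stack.pop();
    }
  }
}
```

A per-thread check like this cannot see what other threads are evaluating, which is why the two-thread scenario described above is not reported as an error and has to be avoided by design, for example by making sure that fragment keys only ever depend on keys further up a hierarchy, as IdPathKey does with parent folders.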