CKEditor 5 Data-Processing By Example - CMCC 12

Last updated 3 months ago

This guide walks you through creating the data-processing rules required to store data from CKEditor 5 as CoreMedia Rich Text 1.0 in CoreMedia CMS. At the end, you will find the complete rule, which you may copy and adapt to your needs.

What you'll learn

Data-processing rules for CKEditor 5

Prerequisites

CoreMedia Blueprint, Studio development experience

Time matters

Should I read this?

This guide is for Developers.

Table of Contents

The Use Case
Bijective Mapping Design
Analyzing Existing Mapping
Save Figure State
The Complete Rule
Applying the Rule
Conclusion
Compatibility Information

The magic bridge between HTML edited in CKEditor 5 (represented in the so-called data-view) and CoreMedia Rich Text 1.0 (called data within the CKEditor 5 architecture) is a mechanism called "data-processing". It is responsible for providing an (in general) bijective mapping from HTML to Rich Text and vice versa.

First insights into a rather simple (but typical) replacement are described in our manual:

2401 Studio Developer Manual / 10.3.11 Customizing ckeditorDefault.ts By Example

In the section about data-processing, you will see how to represent an HTML element unknown to CoreMedia Rich Text 1.0 as a supported element with some "marker class" applied (thus, <mark> represented as <span class="mark"> within Rich Text).

In this article, we will show a somewhat more complex use case that also helps to dive a little deeper into the magic of data-processing.

The Use Case

We want to integrate a pre-fabricated plugin that applies styling to tables by adding classes to the <figure> element, which surrounds tables by default in the view layers of CKEditor 5:

<figure class="table align-left">
<table/>
</figure>

The "table" class comes from the default view representation within CKEditor 5. The "align-left" class is added by our plugin.

Now, we need to ensure a bijective mapping that persists this class in the data layer.

Bijective Mapping Design

Prior to thinking about the implementation, we have to identify a good representation in the data layer (CoreMedia Rich Text 1.0), that:

to data: Is able to hold the relevant information from the data view layer.
to (data) view: Enables us to restore the view representation later on when transforming incoming data to HTML again (similar to what we may want to do when rendering within the CAE).

In this case, we design the mapping to CoreMedia Rich Text 1.0 as follows (toData-processing):

Take any class from the <figure> element and apply it to the nested <table> element, which is supported by CoreMedia Rich Text 1.0.
Prefix these classes with a marker: "align-left" becomes "figure-align-left", so that we know we need to reapply this class to the <figure> element later on.

Thus, for the example above we decide to map it to:

<table class="figure-table figure-align-left"/>

Keep "figure-table"?

The "figure-table" class may be omitted here, as it is not required, at least in the CKEditor 5 context, to restore the data view layer from incoming data (we always add a <figure> around any incoming table). Still, keeping it may make our implementation more straightforward, and we may benefit from this extra information in other contexts like the CAE.
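The class mapping designed above can be sketched as two pure functions on plain string arrays. This is only an illustration of the intended round trip, not part of the actual rule: the helper names toDataClasses and toViewClasses are hypothetical, and only the "figure-" prefix comes from the design above.

```typescript
// Hypothetical helper names; only the "figure-" prefix comes from the design.
const FIGURE_PREFIX = "figure-";

/** toData: classes found at <figure> become prefixed classes at <table>. */
function toDataClasses(figureClasses: string[]): string[] {
  return figureClasses.map((className) => `${FIGURE_PREFIX}${className}`);
}

/** toView: prefixed classes at <table> are restored as classes for <figure>. */
function toViewClasses(tableClasses: string[]): string[] {
  return tableClasses
    .filter((className) => className.startsWith(FIGURE_PREFIX))
    .map((className) => className.slice(FIGURE_PREFIX.length));
}

const stored = toDataClasses(["table", "align-left"]);
// stored: ["figure-table", "figure-align-left"]
const restored = toViewClasses(stored);
// restored: ["table", "align-left"]
```

Because toViewClasses undoes exactly what toDataClasses applies, the mapping is bijective for any class list found at the <figure> element.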

Analyzing Existing Mapping

CoreMedia CKEditor 5 Plugins ship with a default mapping of HTML to CoreMedia Rich Text 1.0. Sometimes, like here, it may be important to understand the default mapping, to know where to intervene.

This can be achieved by starting any application (like CoreMedia Studio) with a hash parameter named #ckdebug as described in our manual: 2401 Studio Developer Manual / 10.4 Debugging CKEditor 5.

It will tell us that, for any table created in CKEditor 5, the surrounding <figure> element is removed in data-processing. A dedicated rule does this, instead of leaving it to a later sanitation process that prevents invalid XML from being stored on the server. This is best practice, as we should take care of any incoming elements from the data view layer and make a conscious decision on what to do with them.

In the default to-data-view mapping, we do not wrap the incoming <table> element with a <figure>. We rely on CKEditor 5 doing this for us in later stages (thus, in the corresponding upcast and downcast handlers).

Save Figure State

Processing Stages and Priorities

Now it is important that we intervene before the existing removal of the <figure> element, to first store the data relevant to us at the <table> element. We have several ways to control this "before" step: we can increase the priority of a rule, or we can use an earlier processing stage.

The processing stages are (for to-data as well as for to-data-view mapping):

prepare
imported
importedWithChildren
appended

More details on these stages are available in the documentation of the CoreMedia CKEditor 5 Plugins (for package @coremedia/ckeditor5-coremedia-richtext).

The second ordering criterion is the priority of each rule. The priorities are:

highest
high
normal (the default for rules we ship in CoreMedia CKEditor 5 Plugins)
low
lowest

Combined, stages and priorities work in this order:

prepare
1. highest
2. high
3. normal
4. low
5. lowest
imported
1. highest
2. …
…

The default rule that is responsible for removing the <figure> element operates in stage imported with normal priority in to-data-processing, while no corresponding rule exists in to-data-view-processing.

Thus, we may apply our adaptation, the propagation of classes from <figure> to <table>, either with a higher priority in the imported stage or in the prepare stage.
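To illustrate the combined ordering, here is a minimal sketch that sorts rule steps by stage first and by priority second. The stage and priority names are taken from the lists above. The RuleStep type, the compareSteps function, and the step ids are hypothetical and do not reflect how the CoreMedia CKEditor 5 Plugins implement this internally.

```typescript
// Stage and priority names as listed above, in execution order.
const stages = ["prepare", "imported", "importedWithChildren", "appended"] as const;
const priorities = ["highest", "high", "normal", "low", "lowest"] as const;

type Stage = (typeof stages)[number];
type Priority = (typeof priorities)[number];

// Hypothetical type: one handler of a rule, bound to a stage and a priority.
interface RuleStep {
  id: string;
  stage: Stage;
  priority: Priority;
}

/** Orders by stage first, then by priority within the same stage. */
function compareSteps(a: RuleStep, b: RuleStep): number {
  const byStage = stages.indexOf(a.stage) - stages.indexOf(b.stage);
  return byStage !== 0
    ? byStage
    : priorities.indexOf(a.priority) - priorities.indexOf(b.priority);
}

const steps: RuleStep[] = [
  // Stands in for the default <figure> removal (imported, normal).
  { id: "default-figure-removal", stage: "imported", priority: "normal" },
  // Option 1: same stage, higher priority.
  { id: "custom-imported-high", stage: "imported", priority: "high" },
  // Option 2: earlier stage, even with the lowest priority.
  { id: "custom-prepare-lowest", stage: "prepare", priority: "lowest" },
];

const executionOrder = [...steps].sort(compareSteps).map((step) => step.id);
// executionOrder: custom-prepare-lowest, custom-imported-high, default-figure-removal
```

Both options run before the default removal. Note that any handler in the prepare stage precedes any handler in the imported stage, regardless of priority.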

toData Processing

Whether to start the design with the toData or the toView processing is a matter of personal taste. As HTML provides the richer API, the more challenging part is often to map it to the restricted CoreMedia Rich Text 1.0 structure. This is why we prefer to start with the toData processing design, keeping in mind that we need to generate data that allows a bijective mapping (thus, to restore the view layer later on).

Given the options above, to intervene prior to the removal of the <figure> element, we decide to apply our changes within the prepare stage for several reasons:

We have both required elements (<figure> and <table>) available.
We do not modify the hierarchy, thus, there is no danger of colliding with the processing order.
We operate on the HTML DOM API rather than the limited XML Element API, which, for example, does not provide access to the classList property.

A typical rule starts by validating whether it is applicable. We decide that the following guard is sufficient for our purpose:

(node: Node): void => {
  if (!(node instanceof HTMLElement) ||
      node.localName !== "figure" ||
      !node.classList.contains("table")) {
    return;
  }
  // ...
}

Thus, we have an HTMLElement, it is a figure (unfortunately, the DOM API provides no dedicated class for it), and it has the marker class "table" applied.

Next, we transform the DOMTokenList returned by classList property to a string array and apply our prefix:

const tableFigureClassNames = [...node.classList]
  .map((className) => `figure-${className}`);

As stated above, we could skip the "table" class, as we do not really need it. But this approach is more straightforward and keeping this information does no harm to our subsequent processing.

We may also decide to remove these now applied classes from the <figure> element, to signal some "already processed" state. But as we remove the element later on anyway, we ignore that option.

Next, we step through all direct child elements and, for each child that is a table, apply the additional classes:

[...node.children].forEach((figureChild) => {
  if (figureChild instanceof HTMLTableElement) {
    figureChild.classList.add(...tableFigureClassNames);
  }
});

Again, we benefit from the HTML DOM API to robustly identify the table element. Note that we expect only one table element in general, but the code also copes with an unexpected DOM structure that may ship with upcoming CKEditor 5 releases. Some robustness is always recommended here.

This is all we need for the toData rule. You will find the complete rule below.

toView Processing

While in the default setup we do not re-apply the <figure> element, we need to do it now, as our plugin expects the classes at the <figure> element, not the <table> element.

We decide to add the element in stage "importedWithChildren" for these reasons:

We already benefit from the rich HTML DOM API, while in the "prepare" stage we have only the XML Element API.
We require the children to be transformed already, as we are going to wrap the <table> element into the <figure>. Doing it in the earlier "imported" stage would cause our child nodes (table rows, for example) to be added directly to the <figure> instead of to the <table>.
Using the alternative "appended" stage is discouraged, as the <figure> to be inserted may collide with its sibling nodes. Thus, it is better to leave the insertion of the node to the underlying framework.

First, we decide that our processing rule also requires access to the conversion API, which will help us to create new nodes for the proper document:

(node, { api }): Node => { ... }

Next, similar to the processing above, we validate whether the rule is applicable to the current node:

(node, { api }): Node => {
  if (!(node instanceof HTMLTableElement)) {
    return node;
  }
  // ...
}

Thus, we are at a <table> element, and we want to wrap it into a <figure>. As a more defensive approach, we could also have checked whether our expected marker classes are present.

Now, we scan for the classes that we want to re-apply to the to-be-created <figure> element:

const rawFigureClasses = [...node.classList]
  .filter((className) => className.startsWith("figure-"));

We need to remember the raw class names, as in subsequent processing we remove them from the <table> element:

removeClass(node, ...rawFigureClasses);

Instead of the standard classList.remove() API, we use a utility function provided by @coremedia/ckeditor5-dom-support. This function also ensures that a possibly left-over empty class attribute is removed. This is not required for CKEditor 5, but it is good practice when doing something similar in other rules within the toData processing, to reduce the size of the generated XML. Otherwise, we would see elements with class attributes like this: class="".
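The described behavior can be illustrated with a small sketch that works on a plain attribute value instead of a DOM element. This is not the implementation from @coremedia/ckeditor5-dom-support: the function removeClassNames and the "zebra" class are hypothetical, and a value of null is assumed to mean "attribute not present".

```typescript
// Illustrative only, not the library implementation.
// Models a class attribute as `string | null`; `null` means "no attribute".
function removeClassNames(classAttr: string | null, ...toRemove: string[]): string | null {
  if (classAttr === null) {
    return null;
  }
  const remaining = classAttr
    .split(/\s+/)
    .filter((name) => name.length > 0 && toRemove.indexOf(name) === -1);
  // Drop the attribute completely instead of keeping class="".
  return remaining.length > 0 ? remaining.join(" ") : null;
}

const emptied = removeClassNames(
  "figure-table figure-align-left",
  "figure-table",
  "figure-align-left",
);
// emptied: null (the attribute is removed, no class="" is left behind)
const kept = removeClassNames("zebra figure-table", "figure-table");
// kept: "zebra"
```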

Next, we remove the "figure-" prefix from the class names:

const figureClasses = rawFigureClasses
  .map((className) => className.replace(/^figure-/, ""));

If CKEditor 5 were our only client manipulating CoreMedia Rich Text, we could directly apply these classes to the to-be-created <figure> element. But in the CoreMedia CMS ecosystem, there are multiple clients that may change the rich text, including some generic Unified API client that removes the required "table" class name. Because of this, we also re-apply the required "table" class:

figureClasses.push("table");

There is no need to remove possible duplicate entries, as the HTML DOM API will do that for us, when adding the classes via classList property.
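The class handling of the last steps can be sketched on plain string arrays, which makes it easy to check the case where another client removed "figure-table" from the stored data. The helper restoreFigureClasses is hypothetical, and a Set merely stands in for the de-duplication that the classList property performs in the real rule.

```typescript
// Hypothetical helper; mirrors the steps above on plain string arrays.
function restoreFigureClasses(tableClasses: string[]): string[] {
  const figureClasses = tableClasses
    .filter((className) => className.startsWith("figure-"))
    .map((className) => className.replace(/^figure-/, ""));
  // Always re-apply the class that CKEditor 5 expects at <figure>.
  figureClasses.push("table");
  // A Set stands in for the de-duplication done by DOMTokenList.
  return Array.from(new Set(figureClasses));
}

const complete = restoreFigureClasses(["figure-table", "figure-align-left"]);
// complete: ["table", "align-left"]
const repaired = restoreFigureClasses(["figure-align-left"]);
// repaired: ["align-left", "table"]
```

In both cases, the resulting list contains "table" exactly once, so the restored <figure> is valid for CKEditor 5 even if the marker class got lost in between.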

Now, it is time to create our <figure> DOM element. We use the conversion API, which ensures that the proper Document is used for creating the nodes:

const figureElement = api.createElement("figure");

And we apply our propagated classes:

figureElement.classList.add(...figureClasses);

The last step, prior to returning the new element, is adding the original <table> element as a child, thus, we wrap the <table> element with our just generated <figure>:

figureElement.append(node);

The Complete Rule

And this is what our complete bijective mapping rule looks like:

const tableFigureClassSupport: RuleConfig = {
  id: "table-figure-class-support",
  toData: {
    id: "table-figure-class-support-toData",
    prepare: (node) => {
      if (!(node instanceof HTMLElement) ||
          node.localName !== "figure" ||
          !node.classList.contains("table")) {
        return;
      }
      const tableFigureClassNames = [...node.classList]
        .map((className) => `figure-${className}`);
      [...node.children].forEach((figureChild) => {
        if (figureChild instanceof HTMLTableElement) {
          figureChild.classList.add(...tableFigureClassNames);
        }
      });
    },
  },
  toView: {
    id: "table-figure-class-support-toView",
    importedWithChildren: (node, { api }): Node => {
      if (!(node instanceof HTMLTableElement)) {
        return node;
      }
      const rawFigureClasses = [...node.classList]
        .filter((className) => className.startsWith("figure-"));
      removeClass(node, ...rawFigureClasses);
      const figureClasses = rawFigureClasses
        .map((className) => className.replace(/^figure-/, ""));
      figureClasses.push("table");
      const figureElement = api.createElement("figure");
      figureElement.classList.add(...figureClasses);
      figureElement.append(node);
      return figureElement;
    },
  },
};

Applying the Rule

As the first option, we may directly adapt the configuration for the created CKEditor 5 instance within the configuration key "coremedia:richtext", similar to this:

ClassicEditor.create(document.querySelector('.editor'), {
  plugins: [ CoreMediaRichText, ... ],
  "coremedia:richtext": {
    rules: [ tableFigureClassSupport ],
  },
});

As an alternative to this, we may move our rule to a custom plugin. This may also aggregate (or provide) the plugin that is required for applying additional classes to the <figure> element.

Here is an example of what this may look like:

export class RichTextDataProcessorIntegration extends Plugin {
  static readonly pluginName: string = "RichTextDataProcessorIntegration";
  static readonly requires = [CoreMediaRichText, MyTableFigurePlugin];

  afterInit(): void {
    const { editor } = this;
    const { processor } = editor.data;

    if (isRichTextDataProcessor(processor)) {
      processor.addRule(tableFigureClassSupport);
    }
  }
}

Conclusion

While several utility methods exist for well-paved paths in data-processing, much more complex operations are possible as well, like manipulating the complete DOM. This requires careful design though, not only to provide a proper bijective mapping but also to intervene at the required stages and priorities. For this, it is always a good idea to come back to the #ckdebug flag, which may help you to better understand the existing processing as well as your adapted processing.

We hope that you found the information provided here useful and that it may assist you in creating your own data-processing rules.

Compatibility Information

The adaptations shown here have been tested with CMCC v12.2401.3, CKEditor 5 41.1.0, and CoreMedia CKEditor 5 Plugins 17.0.0. They should work for any release of CoreMedia CKEditor 5 Plugins since 11.x.
