Working with the Ingest Service

Learn how to work with the Ingest Service

What you'll learn

  • How to import content via the Ingest Service

Prerequisites

  • Access to CoreMedia CloudManager (https://{client}.coremedia.cloud)
  • One or more "Studio Sandboxes" (https://first.sandbox.{client}.coremedia.cloud)
  • One or more higher environments (UAT, pre-prod or prod)

Time matters

Reading time: 10 to 15 minutes

Should I read this?

This guide is for Developers and Administrators.

Overview

The CoreMedia Ingest Service is a cloud-only service that enables you to develop your own importer-like applications for the CoreMedia Content Cloud.

It offers an API with basic methods to create and modify content items. In addition, the content lifecycle is supported with bulk publication, unpublication and deletion methods (see Ingest Service API Reference).

This how-to document introduces you to the following tasks:

  • Getting the Ingest Service
  • Performing a Basic Content Import
  • Handling Binary Objects
  • Bulk Repository Operations
  • Increasing Throughput
  • OpenAPI UI for Testing and Limitations

The CoreMedia Ingest Service is located behind a gateway which limits the maximum response time of an API call to 30 seconds by default. If this limit is exceeded, clients receive a Service Unavailable (503) response from the gateway, even though the request may still finish successfully on the service side.

Two types of API calls are prone to request timeouts: running synchronous bulk operations (see Bulk Repository Operations) and using UrlBlobProperty models for very large binary objects in the range of some hundred megabytes (see Handling Binary Objects).
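
Because of this, it can help to set a matching client-side timeout and to treat a 503 as "outcome unknown" rather than as a definite failure. A minimal sketch in Python, assuming a hypothetical host name, route, and bearer-token authentication (consult the Ingest Service API Reference for the actual API):

    import requests

    # The gateway cuts calls off after 30 seconds by default, so a matching
    # client-side timeout avoids hanging. A 503 means "outcome unknown",
    # not necessarily "failed" -- the request may still finish server-side.
    try:
        resp = requests.get(
            "https://ingest.{client}.coremedia.cloud/api/content",  # hypothetical route
            headers={"Authorization": "Bearer <token>"},
            timeout=30,
        )
        if resp.status_code == 503:
            print("Gateway timeout: the request may still have finished on the service side.")
    except requests.Timeout:
        print("Client-side timeout reached.")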

Getting the Ingest Service

Contact CoreMedia support at support@coremediaoncloud.com to get the Ingest Service activated for your account.

Performing a Basic Content Import

Figure Import Sequence shows a schematic sequence of API calls for importing a single content item to CoreMedia (handling of binary data is omitted; see Handling Binary Objects).

Figure 1. Import Sequence

Additional checks, like verifying the content type before trying to update a content item, could be added.
If the target location is guaranteed to be free of existing content, as is the case when importing into a new subfolder structure for the first time, the initial check for content at the target path can be omitted.
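
A minimal sketch of this sequence in Python, assuming hypothetical routes, payload fields, and a bearer token (the actual routes and models are defined in the Ingest Service API Reference):

    import requests

    # The base URL, routes, and fields below are assumptions for
    # illustration only -- consult the Ingest Service API Reference.
    BASE = "https://ingest.{client}.coremedia.cloud/api"
    HEADERS = {"Authorization": "Bearer <token>"}

    def import_article(path: str, properties: dict) -> None:
        # 1. Check whether content already exists at the target path.
        #    (Skippable when importing into a fresh subfolder structure.)
        resp = requests.get(f"{BASE}/content", params={"path": path},
                            headers=HEADERS, timeout=30)
        if resp.status_code == 200:
            # 2a. Content exists: update its properties.
            content_id = resp.json()["id"]
            requests.put(f"{BASE}/content/{content_id}",
                         json={"properties": properties},
                         headers=HEADERS, timeout=30).raise_for_status()
        else:
            # 2b. Nothing at the path yet: create the content item.
            requests.post(f"{BASE}/content",
                          json={"path": path, "type": "CMArticle",
                                "properties": properties},
                          headers=HEADERS, timeout=30).raise_for_status()

    import_article("/Sites/Demo/Articles/hello", {"title": "Hello"})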

Handling Binary Objects

There are currently three different ways to import CoreMedia content items with blob properties:

createTempBlob (recommended)

Upload the binary data upfront by creating a Temporary Server Side BLOB (createTempBlob).
Then use the property model from the call’s response in the following API call to create or update the content.
This is the recommended way to handle binary objects, since it avoids request timeouts and leaves the client in complete control of handling the data source.
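
A sketch of this two-step flow, again with hypothetical routes; the property model returned by createTempBlob is assumed to be usable as-is in the subsequent create call:

    import requests

    BASE = "https://ingest.{client}.coremedia.cloud/api"  # hypothetical
    HEADERS = {"Authorization": "Bearer <token>"}

    # Step 1: stream the binary data to a temporary server-side BLOB.
    with open("picture.jpg", "rb") as f:
        resp = requests.post(f"{BASE}/createTempBlob", data=f,
                             headers={**HEADERS, "Content-Type": "image/jpeg"})
    resp.raise_for_status()
    blob_property = resp.json()  # property model referencing the temp blob

    # Step 2: create the content item, referencing the uploaded blob.
    requests.post(f"{BASE}/content",
                  json={"path": "/Sites/Demo/Pictures/picture",
                        "type": "CMPicture",
                        "properties": {"data": blob_property}},
                  headers=HEADERS).raise_for_status()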

Upload the binary data using a UrlBlobProperty model.

Because only a URL is passed for the data source, the service has to read the binary data itself when creating the content item.

This method has drawbacks:

  • Depending on the size of the binary data and the available network throughput, reading the source can take a long time, making the request prone to timeouts. To the gateway it appears as if the service is not available, and it terminates the request with a 503 response.

  • The data source is limited to HTTP(S) or S3 URLs which must be accessible from the service. This also means the source cannot be protected by any sophisticated authentication mechanism.
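
For completeness, a sketch of how such a payload might look; the field names of the UrlBlobProperty model here are assumptions, not the documented schema:

    # Hypothetical payload shape: the service fetches the binary itself
    # from the given URL when creating the content item.
    properties = {
        "data": {
            "type": "UrlBlobProperty",  # assumed discriminator field
            "url": "https://example.com/picture.jpg",
            "contentType": "image/jpeg",
        }
    }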

Update a single BLOB property after a content item has been created (postBlobData).

This is recommended only for existing content, and only if a single blob has to be updated. Otherwise, two or more transactions would be involved:

  • First create or update the content with all non-blob properties.

  • Afterwards update every single blob in the content in a separate transaction.

Not only is this slower than the other methods, it also creates superfluous intermediate content versions, possibly with invalid state.
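
For the single-blob case where this method is appropriate, a sketch with a hypothetical route for postBlobData:

    import requests

    BASE = "https://ingest.{client}.coremedia.cloud/api"  # hypothetical
    HEADERS = {"Authorization": "Bearer <token>"}

    # Update one blob property of an existing content item in place.
    with open("picture.jpg", "rb") as f:
        requests.post(f"{BASE}/content/1234/properties/data/blob",  # assumed route
                      data=f,
                      headers={**HEADERS, "Content-Type": "image/jpeg"},
                      ).raise_for_status()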

Bulk Repository Operations

Bulk repository operations can be executed in two modes:

  • synchronous

  • asynchronous (recommended)

Using synchronous mode may at first seem easier from a client perspective, but it is prone to request timeouts if the number of content items in the bulk operation is too large or the publisher queue is overloaded with too many requests.

Asynchronous bulk repository operations are therefore recommended.

Figure Sequence for asynchronous bulk requests shows a simplified schematic sequence for bulk operations.

Figure 2. Sequence for asynchronous bulk requests
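
A sketch of this sequence in Python, with hypothetical routes, response fields, and state values: submit the bulk operation asynchronously, retry while the service is saturated, then poll until the returned state is terminal:

    import time
    import requests

    BASE = "https://ingest.{client}.coremedia.cloud/api"  # hypothetical
    HEADERS = {"Authorization": "Bearer <token>"}

    def publish_async(content_ids: list) -> None:
        # Submit; retry with a delay if worker or queue limits are hit (429).
        while True:
            resp = requests.post(f"{BASE}/bulk/publish",  # assumed route
                                 json={"ids": content_ids, "mode": "async"},
                                 headers=HEADERS)
            if resp.status_code != 429:
                break
            time.sleep(30)
        resp.raise_for_status()
        operation_id = resp.json()["operationId"]  # assumed field

        # Poll until the operation reaches a terminal state. A 200 response
        # alone does not signal success -- always inspect the state itself.
        while True:
            state = requests.get(f"{BASE}/bulk/operations/{operation_id}",
                                 headers=HEADERS).json()
            if state["status"] in ("SUCCEEDED", "FAILED"):  # assumed values
                break
            time.sleep(5)
        if state["status"] != "SUCCEEDED":
            raise RuntimeError(f"Bulk publish failed: {state}")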

When doing bulk operations, keep the following points in mind:

  • Synchronous and asynchronous bulk operations may be rejected with a response of Too Many Requests (429) if the service’s worker or queue limits are reached. When you get this response, let your client wait some time and retry.

  • A response code of 200 does not mean a bulk operation completed successfully (synchronous) or has finished (asynchronous). You always have to examine the state contained in the response object.

  • Bulk operations on folders are executed on every transitive child of the folder. Therefore, always know the number of children and avoid the operation when that number is too high.

  • You do not need to run a withdraw operation before a delete. Unlike in Studio, a delete operation automatically tries to withdraw all content items before moving them to the recycle bin.

Increasing Throughput

To optimize throughput, you have to distinguish between different kinds of operations:

  • Single-item operations like content creation and updates

  • Bulk operations like publication and deletion of content

Single-item operations

Content creation and updates are single-item operations, which you can run concurrently to increase throughput.

Practical experience has shown 10 to 15 concurrent requests to be an upper limit beyond which scaling is greatly reduced. Also take care not to overload a system that is used for productive work.
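
A sketch of bounded concurrency with a thread pool, capping the worker count in that range; import_one stands in for a single create-or-update call as sketched in the basic import example:

    from concurrent.futures import ThreadPoolExecutor, as_completed

    def import_one(item: dict) -> None:
        """One create-or-update call, as in the basic import example."""
        ...

    items: list = []  # the content items to import

    # Cap concurrency around 10; beyond 10 to 15 workers scaling flattens out.
    with ThreadPoolExecutor(max_workers=10) as pool:
        futures = [pool.submit(import_one, item) for item in items]
        for future in as_completed(futures):
            future.result()  # re-raise any request error from a worker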

Bulk operations

Bulk repository operations, that is, publish, unpublish, and delete, do not benefit from concurrent invocation in the same way. The first phase of a bulk operation, like setting the content approval flags, can run concurrently with other bulk operations, but the final publications are executed strictly sequentially.

For bulk operations you can influence the throughput with the size of the passed-in content set. The recommended range lies between 100 and 200 content items. Larger sets do not improve speed any further but block the publisher for longer. They also increase the risk of network problems for synchronous calls (which are not recommended; see Bulk operations).
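
A small helper for splitting a large ID list into batches in that range, submitting each as a separate asynchronous bulk request (publish_async as sketched above):

    def chunks(ids, size=150):
        """Yield slices of at most `size` IDs (within the 100-200 range)."""
        for start in range(0, len(ids), size):
            yield ids[start:start + size]

    all_ids = [str(i) for i in range(1, 1001)]  # example: 1000 content IDs
    for batch in chunks(all_ids):
        publish_async(batch)  # asynchronous bulk call, see Bulk operations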

OpenAPI UI for Testing and Limitations

On cloud environments with the Ingest Service available, logged-in administrators and developers can access the OpenAPI UI to browse the online documentation and try out the API.

For the following API calls, which use direct data streaming, the Try Out mode will not work:

Copyright © 2024 CoreMedia GmbH, CoreMedia Corporation. All Rights Reserved.