close

Filter

loading table of contents...

Solution Overview for Business Users / Version 2506.0

Table Of Contents

7.23 Sitemap

Requirements

If you run a public website, you want to get listed by search engines and therefore give web crawlers hints about the pages they should crawl. http://www.sitemaps.org/ declares an XML format for such sitemaps which is supported by many search engines, especially from Google and Microsoft.

"Sitemap" in terms of http://www.sitemaps.org/ is not to be mistaken with a human readable sitemap which visualizes the structure of a website . It is rather a complete index of all pages of a site.

The size of a sitemap is limited to 50,000 URLs. Larger sites must be split into several sitemap files and a sitemap index file which aggregates the sitemap files.

Solution

A sitemap consists of multiple entities (the index and the sitemap files) and has dependencies on almost the whole repository. If a new content is created, which "coincidentally" occurs in the first sitemap file, the entries of all subsequent sitemap files are shifted.

In border cases even the number of sitemap files may change, which affects the sitemap index file. So you cannot generate single sitemap entities on crawler demand, asynchronously and independent of each other, but you must generate a complete sitemap which represents a snapshot of the repository. Moreover, the exhaustive dependencies make sitemaps practically uncacheable, and the generation is expensive. For these reasons Blueprint does not render sitemaps on demand but pregenerates them periodically. So you must distinguish between sitemap generation and sitemap service. Both are handled by the live web application, though.

CoreMedia Blueprint features separated sitemaps for each site. Sitemap generation depends on some site specific configuration, like the content types to include or paths to exclude, amongst others. This configuration is specified by SitemapSetup Spring beans.

Search Results

Table Of Contents
warning

Your Internet Explorer is no longer supported.

Please use Mozilla Firefox, Google Chrome, or Microsoft Edge.