3.2. Solr Home Directory

In addition to the actual web application, Solr uses a special directory called Solr Home for configuration files, additional libraries and index files. It is configured either via the JVM system property solr.solr.home or via a JNDI lookup of java:comp/env/solr/home, and it must be writable by the Solr process. It has the following general structure:

<solr-home>/
    solr.xml
    configsets/
        <configset1>/
            conf/
                schema.xml
                solrconfig.xml
                ...
        <configset2>/
        ...
    cores/
        <core1>/
            core.properties
            data/
                index/
                    <index files>
                tlog/
                    <transaction log files>
        <core2>/
        ...
    lib/
        <additional jar files>
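To illustrate how the Solr Home location can be determined, the following Java sketch first looks up the JNDI entry java:comp/env/solr/home, then falls back to the system property solr.solr.home, and finally checks whether the resulting directory is writable. The class name SolrHomeResolver is hypothetical. Consulting JNDI before the system property is an assumption based on older Solr releases, so check the behavior of the Solr version you use.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import javax.naming.InitialContext;
import javax.naming.NamingException;

/** Hypothetical sketch: locate Solr Home via JNDI or the solr.solr.home system property. */
public class SolrHomeResolver {

    static final String SYSTEM_PROPERTY = "solr.solr.home";
    static final String JNDI_NAME = "java:comp/env/solr/home";

    /** Looks up the JNDI entry; returns empty if no naming context or entry is available. */
    static Optional<String> lookupJndi() {
        try {
            InitialContext ctx = new InitialContext();
            try {
                return Optional.ofNullable(ctx.lookup(JNDI_NAME)).map(Object::toString);
            } finally {
                ctx.close();
            }
        } catch (NamingException e) {
            // Outside a servlet container there is usually no initial context.
            return Optional.empty();
        }
    }

    /** Assumption: a non-blank JNDI value wins; otherwise the system property is used. */
    static Optional<Path> resolve(Optional<String> jndiValue, String systemValue) {
        Optional<String> value = jndiValue.filter(v -> !v.isBlank());
        if (value.isEmpty() && systemValue != null && !systemValue.isBlank()) {
            value = Optional.of(systemValue);
        }
        return value.map(v -> Paths.get(v));
    }

    public static void main(String[] args) {
        Optional<Path> home = resolve(lookupJndi(), System.getProperty(SYSTEM_PROPERTY));
        if (home.isEmpty()) {
            System.err.println("Solr Home is not configured");
            return;
        }
        // Solr writes index and transaction log files below Solr Home, so it must be writable.
        System.out.println(home.get() + " writable: " + Files.isWritable(home.get()));
    }
}
```

Run it with, for example, -Dsolr.solr.home=/path/to/solr-home to exercise the system property branch.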

The Solr server manages multiple indices with possibly different configurations. Each of these indices is stored as a Lucene index on disk. In Solr terminology, an index managed by a Solr server is called a Solr Core (or simply a core).

solr.xml

The file solr.xml is the central Solr configuration file. It contains only a few settings, which you do not need to change; most of Solr's configuration is placed in other configuration files. However, it enables Solr's core discovery mode, in which available Solr Cores are discovered automatically. In earlier versions of Solr, available cores were listed explicitly in solr.xml. This legacy mode is not used in the CoreMedia CMS.

You can find more information about the solr.xml file in the Solr Reference Guide at https://cwiki.apache.org/confluence/display/solr/Format+of+solr.xml.

Config Sets

Index-specific configuration files are organized as named config sets, which are subdirectories of the configsets directory. A config set defines an index schema with index fields and types in conf/schema.xml, and numerous configuration options for indexing, searching and additional features in conf/solrconfig.xml. The latter file, for example, contains search request handler definitions with default settings such as the default index field to search in.

The CoreMedia Search Engine comes with three config sets: content for Content Feeder indices, cae for CAE Feeder indices, and elastic for Elastic Social indices. They configure different index fields and Solr features, such as search request handlers, as required. Projects may customize these files or create additional config sets according to their needs. Note that some index fields are required for operation. See the comments in the configuration files for details.
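A quick structural check of the config sets can be sketched in Java: list the subdirectories of <solr-home>/configsets and report whether conf/schema.xml or conf/solrconfig.xml is missing. The class name ConfigSetCheck and the default path solr-home are assumptions for illustration. The check does not validate file contents or the index fields that CoreMedia requires.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: report missing core files in each config set. */
public class ConfigSetCheck {

    // Files every config set in this layout is expected to contain.
    static final List<String> REQUIRED = List.of("conf/schema.xml", "conf/solrconfig.xml");

    /** Returns the required files (relative paths) that do not exist in the given config set. */
    static List<String> missingFiles(Path configSetDir) {
        List<String> missing = new ArrayList<>();
        for (String file : REQUIRED) {
            if (!Files.isRegularFile(configSetDir.resolve(file))) {
                missing.add(file);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // First argument: Solr Home (assumed default: ./solr-home).
        Path configSets = Paths.get(args.length > 0 ? args[0] : "solr-home").resolve("configsets");
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(configSets, Files::isDirectory)) {
            for (Path dir : dirs) {
                List<String> missing = missingFiles(dir);
                System.out.println(dir.getFileName() + (missing.isEmpty() ? ": ok" : ": missing " + missing));
            }
        }
    }
}
```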

Cores

The cores directory contains the actual Solr Cores, which are the indices used by your applications. Solr automatically discovers cores by looking for core.properties files below the Solr Home directory. Each directory with a core.properties file represents a Solr Core. The CoreMedia Search Engine comes with three predefined cores:

  • studio: an index of CoreMedia documents used for searching in Studio and Site Manager, which gets its data from the Content Feeder.

  • preview: an index of CoreMedia content beans used for searching in the Content Application Engine of the Content Management Environment (aka preview), which gets its data from the CAE Preview Feeder.

  • live: an index of CoreMedia content beans used for searching in the Content Application Engine of the Content Delivery Environment (aka live), which gets its data from the CAE Live Feeder.
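Core discovery as described above can be approximated with the following Java sketch. It walks the Solr Home directory, treats each directory that contains a core.properties file as a core, and uses the name property as the core name if present, otherwise the directory name. The class CoreDiscovery is hypothetical and simplified; it does not reproduce every detail of Solr's own discovery logic.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: find Solr Cores by their core.properties files. */
public class CoreDiscovery {

    /** Returns core name -> core directory for every core.properties file below solrHome. */
    static Map<String, Path> discover(Path solrHome) throws IOException {
        List<Path> markers;
        try (Stream<Path> paths = Files.walk(solrHome)) {
            markers = paths.filter(p -> p.endsWith("core.properties") && Files.isRegularFile(p))
                           .collect(Collectors.toList());
        }
        Map<String, Path> cores = new TreeMap<>();
        for (Path marker : markers) {
            Path coreDir = marker.getParent();
            Properties props = new Properties();
            try (Reader in = Files.newBufferedReader(marker)) {
                props.load(in);
            }
            // Without an explicit "name" property, the directory name serves as the core name.
            cores.put(props.getProperty("name", coreDir.getFileName().toString()), coreDir);
        }
        return cores;
    }

    public static void main(String[] args) throws IOException {
        Path home = Paths.get(args.length > 0 ? args[0] : "solr-home");
        discover(home).forEach((name, dir) -> System.out.println(name + " -> " + dir));
    }
}
```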

The file core.properties contains Solr core configuration properties, most importantly the name of the config set to use, given by the configSet property. The predefined core studio uses the content config set; the predefined cores preview and live use the cae config set.
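The mapping from a core to its config set can be sketched in Java, assuming that config sets live in <solr-home>/configsets as shown in the directory structure above. The class ConfigSetResolver is hypothetical; it only derives the path from the configSet property and does not check that the directory exists.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.Properties;

/** Hypothetical sketch: resolve a core's config set directory from its core.properties. */
public class ConfigSetResolver {

    /** Returns <solrHome>/configsets/<configSet>/conf, or empty if configSet is not set. */
    static Optional<Path> confDir(Path solrHome, Properties coreProperties) {
        String configSet = coreProperties.getProperty("configSet");
        if (configSet == null || configSet.isBlank()) {
            return Optional.empty(); // the core does not reference a config set
        }
        return Optional.of(solrHome.resolve("configsets").resolve(configSet).resolve("conf"));
    }

    public static void main(String[] args) {
        Path home = Paths.get("solr-home");
        // Values from the text: studio uses "content", preview and live use "cae".
        String[][] cores = {{"studio", "content"}, {"preview", "cae"}, {"live", "cae"}};
        for (String[] core : cores) {
            Properties props = new Properties();
            props.setProperty("configSet", core[1]);
            System.out.println(core[0] + " -> " + confDir(home, props).orElseThrow());
        }
    }
}
```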

Elastic Social applications create Solr Cores for users and comments automatically when they are started for the first time. With CoreMedia Blueprint and tenant media, you will see additional directories blueprint_media_comments and blueprint_media_users for these cores below <solr-home>/cores. These Solr cores use the elastic config set unless configured otherwise with the Elastic Social configuration property elastic.solr.indexConfig.
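The default described above can be expressed as a small Java sketch: the config set name is taken from elastic.solr.indexConfig and falls back to elastic. The class ElasticCoreConfig and its coreProperties helper, which writes name and configSet entries as they might appear in such a core's core.properties, are illustrative assumptions. The sketch does not show how Elastic Social actually creates its cores.

```java
import java.util.Properties;

/** Hypothetical sketch: choose the config set for an Elastic Social core. */
public class ElasticCoreConfig {

    static final String PROPERTY = "elastic.solr.indexConfig";
    static final String DEFAULT_CONFIG_SET = "elastic";

    /** Returns the configured config set name, or "elastic" if the property is not set. */
    static String configSet(Properties appProperties) {
        return appProperties.getProperty(PROPERTY, DEFAULT_CONFIG_SET);
    }

    /** Assumed core.properties entries for a core with the given name. */
    static Properties coreProperties(String coreName, Properties appProperties) {
        Properties core = new Properties();
        core.setProperty("name", coreName);
        core.setProperty("configSet", configSet(appProperties));
        return core;
    }

    public static void main(String[] args) {
        Properties app = new Properties(); // no override configured
        System.out.println(coreProperties("blueprint_media_comments", app));
    }
}
```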

Note

Earlier versions of the CoreMedia CMS used a single shared index for Content Feeder and CAE Feeder applications. Using separate Solr cores has several advantages:

  • It becomes possible to use Solr's runtime administration capabilities such as reloading existing cores after configuration changes, adding new cores and even replacing existing cores.

  • Separate cores provide better performance and use less memory. Solr caches work more efficiently because they only need to store data for the searched index and not for a larger shared index. Also, caches are not invalidated when documents in other indices change.

  • It avoids problems with relevancy scoring. Index-wide statistics, such as the number of documents that contain a term (document frequency), are used to compute the relevancy of search results. In a shared index, unrelated documents may change the scoring unintentionally.

  • It becomes possible to back up and restore indices independently of one another.

  • It becomes possible to move a single index to another Solr installation.

  • Different indices can use different configurations and index schemas.
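The scoring effect of a shared index can be illustrated with a short calculation. The Java sketch below uses the inverse document frequency (IDF) formula of Lucene's BM25Similarity, idf = ln(1 + (N - n + 0.5) / (n + 0.5)), where N is the number of documents and n is the number of documents containing a term. Whether your installation scores with BM25 depends on the Solr version and the similarity configured in the schema, and the document counts are made up for illustration.

```java
/** Sketch: how unrelated documents in a shared index shift a term's IDF (BM25-style). */
public class SharedIndexIdf {

    /** IDF as in Lucene's BM25Similarity: ln(1 + (N - n + 0.5) / (n + 0.5)). */
    static double bm25Idf(long docCount, long docFreq) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    public static void main(String[] args) {
        // Hypothetical: a content index with 1,000 documents, 10 of them containing the term ...
        double separate = bm25Idf(1_000, 10);
        // ... shared with 9,000 unrelated documents, 3,000 of which also contain the term.
        double shared = bm25Idf(1_000 + 9_000, 10 + 3_000);
        System.out.printf("separate core: %.3f, shared index: %.3f%n", separate, shared);
    }
}
```

With these numbers, the IDF of the term drops from about 4.56 in a separate core to about 1.20 in the shared index, so the term carries much less weight relative to other query terms although the searched documents themselves did not change.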

Index Data

Each Solr core has its own data directory with index files and a transaction log. The actual index files are written to the directory data/index. In addition to the index, Solr maintains a transaction log with recent and/or pending changes to the index. The transaction log is stored in the directory data/tlog.
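To see how much disk space a core's index and transaction log use, for example before backing up a core, the following Java sketch sums the file sizes below data/index and data/tlog of a core directory. The class CoreDataUsage and the default path solr-home/cores/studio are assumptions for illustration. On a running core the numbers change while Solr writes and merges index files, and a file that disappears during the walk makes the sketch fail with an exception.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

/** Hypothetical sketch: disk usage of a core's index and transaction log. */
public class CoreDataUsage {

    /** Sums the sizes of all regular files below dir; returns 0 if dir does not exist. */
    static long sizeOf(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return 0;
        }
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        Path core = Paths.get(args.length > 0 ? args[0] : "solr-home/cores/studio");
        System.out.println("index: " + sizeOf(core.resolve("data").resolve("index")) + " bytes");
        System.out.println("tlog:  " + sizeOf(core.resolve("data").resolve("tlog")) + " bytes");
    }
}
```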

Lib Directory

The directory <solr-home>/lib contains additional libraries that are available to all Solr cores but are not included in the Solr web application. These include some required CoreMedia extensions.
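The idea of a shared lib directory can be illustrated by building one class loader over all JAR files in <solr-home>/lib, with the web application's class loader as its parent, so that every core sees the same extra classes. The Java sketch below (class SharedLibLoader) is a hypothetical illustration of that concept, not Solr's actual implementation.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: one class loader over the JARs in <solr-home>/lib. */
public class SharedLibLoader {

    /** Collects the URLs of all *.jar files directly inside libDir. */
    static List<URL> jarUrls(Path libDir) throws IOException {
        List<URL> urls = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : jars) {
                urls.add(jar.toUri().toURL());
            }
        }
        return urls;
    }

    public static void main(String[] args) throws IOException {
        Path lib = Paths.get(args.length > 0 ? args[0] : "solr-home").resolve("lib");
        List<URL> urls = jarUrls(lib);
        // A single loader shared by all cores, delegating to the application's own class loader.
        try (URLClassLoader shared = new URLClassLoader(urls.toArray(new URL[0]),
                SharedLibLoader.class.getClassLoader())) {
            System.out.println("shared loader with " + urls.size() + " jar(s)");
        }
    }
}
```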