Contentful CMS – overcoming content search limitations

3 years ago

Contentful is one of the most popular headless Content Management Systems. It offers a clean interface for Content Managers, which includes visualization tools and content publication scheduling mechanisms. Each content entry has a list of usages and relations, allowing you to strictly define what kind and quality of content your managers can put in each place. Based on our own experience, we can confidently say that your developers will be satisfied. It offers many integrations, an easy API, role management, migrations, environments, and much more. Due to the nature of a headless CMS, consumers have much more flexibility in creating presentation layers on different platforms.

However, every Content Management System is, obviously, focused on the content itself. What does that mean? Specializing in one area often gets in the way of another, especially one built on different fundamentals, so the two are hard to make work hand in hand.

Searching for perfect content

Like most other CMSs, Contentful offers a basic mechanism for searching within your content that covers all the fundamental use cases. However, your developers will start doing acrobatics as soon as they need complex (although seemingly trivial) searches.

Let’s consider the scenario below:

Each product has different variations. Each variation has its own SKU (Stock Keeping Unit) and other fields, such as size, price, or even the administrative zones where it is available or applicable.

Where is the snag? 

A fast, reliable CMS focused on the content experience won’t let you run searches so complex that they consume too many of its resources. This applies to Contentful as well. While it lets you put complex criteria on the fields of related content when the reference is a single element, your developers and your product will be stuck in scenarios like the one above, where a product references many variations.

You will be unable to present personalized “top picks” to a customer whose shoe size you know. Or, even though you know the customer lives in California, you will keep suggesting the wrong type of clothes. Or you will show only high-priced products instead of cheaper counterparts, even though you know the customer’s budget.

Are such criteria even available? Forget about them in any CMS if you want to combine its content with other product-oriented parts of your business, like the warehouse or supply chain.

Solution

We faced the issue as well, so we know the pain. In short, loading denormalized data from various sources into Elasticsearch allowed us to personalize product search in every way described above, and in more ways than we had imagined.

The graph presents a simplified architecture. We also added a Redis cache to remember user-specific searches. These searches are based on the administrative area in which the user is located and on the user’s selected interests in certain product groups. Whenever one of those changed, we automatically invalidated the cache and ran the search again. As a safety measure, we also kept the whole CMS data set in the Redis cache. This let us act quickly when searches had to be rerun because one of the criteria changed on the content side (for example, when an available interest was removed from or added to a given administrative area), and it made rebuilding indexes or existing searches faster.
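The caching behavior described above can be sketched roughly as follows. This is a minimal illustration, not our production code: a plain dictionary stands in for Redis, and the `SearchCache` class, its key format, and the `fake_search` callback are all hypothetical names introduced for this example.

```python
import hashlib
import json


class SearchCache:
    """In-memory stand-in for the Redis cache (assumption: real code would talk to Redis)."""

    def __init__(self):
        self._results = {}     # cache key -> cached search results
        self._active_key = {}  # user id -> cache key of that user's latest search

    @staticmethod
    def key_for(user_id, area, interests):
        # The key depends only on the criteria that drive the search.
        criteria = json.dumps({"area": area, "interests": sorted(interests)})
        digest = hashlib.sha256(criteria.encode("utf-8")).hexdigest()[:16]
        return f"search:{user_id}:{digest}"

    def get_or_search(self, user_id, area, interests, run_search):
        key = self.key_for(user_id, area, interests)
        previous = self._active_key.get(user_id)
        if previous is not None and previous != key:
            # Area or interests changed: invalidate the stale entry.
            self._results.pop(previous, None)
        self._active_key[user_id] = key
        if key not in self._results:
            self._results[key] = run_search(area, interests)
        return self._results[key]


calls = []


def fake_search(area, interests):
    """Hypothetical search backend that records every real search it performs."""
    calls.append(area)
    return [f"{area}/{interest}" for interest in sorted(interests)]


cache = SearchCache()
first = cache.get_or_search("user-1", "California", {"shoes", "hats"}, fake_search)
second = cache.get_or_search("user-1", "California", {"hats", "shoes"}, fake_search)
moved = cache.get_or_search("user-1", "Nevada", {"shoes", "hats"}, fake_search)
```

The second call is served from the cache, while the third one (a different administrative area) drops the old entry and triggers a fresh search.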

Of course, there are other ways to solve the issue. One of them is to “flatten” the content so that it no longer references other content. However, this approach makes Content Managers sweat. Using the example above, they would need to provide the same data for every Product and Variation in a single Content Type. The result is less structured inventory data, which makes it more likely that a human mistake will leave product data out of sync.

(Simplified) Real life example

Let’s assume we have the following Content Types:

Each product may have multiple Variations. Each Variation includes SKU, available sizes, and color.

Time to jump to Contentful CMS’s GraphQL playground to see some content – for brevity, showing only dummy content:

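Searching via the REST API fails in this case: criteria on the fields of related content only work when the reference is a single element, and a Product references multiple Variations.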
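To make the REST limitation concrete, here is a sketch of the query parameters such a reference search would need, using Contentful’s search-on-references parameter syntax. The content type IDs (`product`, `variation`) and field IDs (`variations`, `sizes`, `color`) are assumptions made for this example, and `reference_search_params` is a hypothetical helper, not part of any SDK.

```python
from urllib.parse import urlencode


def reference_search_params(content_type, ref_field, ref_content_type, criteria):
    """Build query parameters that filter entries by fields of a referenced entry."""
    params = {
        "content_type": content_type,
        # The content type of the referenced entries has to be stated explicitly.
        f"fields.{ref_field}.sys.contentType.sys.id": ref_content_type,
    }
    for field, value in criteria.items():
        params[f"fields.{ref_field}.fields.{field}"] = value
    return params


params = reference_search_params(
    "product", "variations", "variation", {"sizes": 36, "color": "red"}
)
query_string = urlencode(params)
```

A request built this way is only useful when the reference field holds a single entry. Because `variations` holds a list of entries, this is exactly the kind of search we could not express against the CMS.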

Let’s take a look at what we can do with Elasticsearch! For brevity, we skip the ETL process (loading the data from the CMS into Elasticsearch) and caching.
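Although the ETL itself is out of scope, the denormalization step is easy to picture. The sketch below flattens one Product with its Variations into one searchable document per Variation. The input shape and the choice to store the SKU in the `variation` field are assumptions made for illustration; only the field names `name`, `variation`, `sizes`, and `color` come from the queries in this article.

```python
def denormalize(product):
    """Flatten a product with nested variations into one flat document per variation."""
    documents = []
    for variation in product["variations"]:
        documents.append({
            "name": product["name"],        # product-level data is copied into each document
            "variation": variation["sku"],  # assumption: the SKU identifies the variation
            "sizes": variation["sizes"],
            "color": variation["color"],
        })
    return documents


# Dummy content, shaped the way we assume it arrives from the CMS.
product = {
    "name": "Runner",
    "variations": [
        {"sku": "RUN-RED-36", "sizes": [36, 37], "color": "red"},
        {"sku": "RUN-BLK-38", "sizes": [38], "color": "black"},
    ],
}
documents = denormalize(product)
```

Each resulting document carries everything a query needs, so the search engine never has to follow a reference.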

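Consider the case of one of our precious customers, who has a shoe size of 36 and likes the color red. Let’s look for shoes that may match these preferences! The query below requires size 36 and discards other results. Because the nested bool query contains only a should clause, that clause has to match too, so the color must also be red.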

{
	"query": {
		"bool": {
			"must": [
				{
					"terms": {
						"sizes": [36]
					}
				},
				{
					"bool": {
						"should": [
							{"match": {
								"color": "red"
							}}
						]
					}
				}
			]
		}
	},
	"fields": ["name", "variation"],
	"_source": false
}

Great! However, we found very few matches, as such a precise search returns only the ideal candidates. The shoe size cannot be relaxed because it is a forced preference, so let’s look for other suggestions among the available shoes that still match the customer’s shoe size!

There are countless ways to work out the score of each result. Again, for brevity, we will use a simple formula to rate the search results.

{
	"query": {
		"function_score": {
			"boost": 5,
			"functions": [
				{
					"filter": { "match": { "color": "red" } },
					"weight": 3
				},
				{
					"filter": { "terms": { "sizes": [36] } },
					"weight": 20
				}
			]
		}
	}
}
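To see how these weights play out, here is a rough model of the scoring in plain Python. It assumes Elasticsearch’s defaults for function_score (matching functions are multiplied together, then multiplied with the boosted query score) and a constant base score of 1 for every document; it is an approximation for intuition, not output taken from a real cluster. The `model_score` helper is hypothetical.

```python
def model_score(matches_red, matches_size, boost=5.0):
    """Approximate the score of one document under the function_score query above."""
    function_score = 1.0  # neutral value when no function's filter matches
    if matches_red:
        function_score *= 3   # weight of the color function
    if matches_size:
        function_score *= 20  # weight of the size function
    base_score = 1.0  # assumption: every document matches with a constant score
    return base_score * boost * function_score


candidates = [
    ("black, size 38", model_score(False, False)),
    ("red, size 38", model_score(True, False)),
    ("black, size 36", model_score(False, True)),
    ("red, size 36", model_score(True, True)),
]
ranking = sorted(candidates, key=lambda item: item[1], reverse=True)
```

In this model, shoes in the right size always outrank shoes that are merely red. Note that the size is only a ranking signal here: the query has no filtering clause of its own, so shoes in other sizes still appear at the bottom of the list. To enforce the size strictly, one option would be to also filter on it inside the function_score query.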

That’s just the tip of the iceberg. Real-life tailoring of similar products that a user would like gets complex relatively quickly as the number of items and their properties grows.

Summary

Does this make your software infrastructure grow in cost as well? Naturally. However, if you consider this a boost to your sales and personalization, which every customer likes, it will be a marginal cost compared to the potential outcome.

About the author

Kamil

NestJS Developer

Hungry for knowledge in Clean Architecture and DDD. Loves to learn and to mentor others. NestJS believer. Fierce fan of board games.