Generate a Sitemap from an Algolia Index

Having great content and UX is only useful if people can find it. Search Engine Optimization (SEO) is a crucial traction strategy for most websites, and sitemaps play a significant role. A sitemap is a file that describes all the pages of your website, so that search engine bots can easily index your content. Sitemaps provide valuable information such as which pages to prioritize, or how often a page updates.

Sitemaps are particularly useful for sites or applications that load content asynchronously. That’s the case for most JavaScript-powered single-page applications and progressive web apps. It’s also the case when you’re using Algolia on the front end.

Thanks to the flexibility of facets, Algolia can power navigation in addition to search result pages, which lets you implement dynamic category pages based on the data in your index. These are great candidates to add to your sitemap.

Prerequisites

Familiarity with Node.js

This tutorial assumes you’re familiar with Node.js, how it works, and how to create and run Node.js scripts. Make sure to install Node.js (v6 or above) in your environment.

If you want to learn more before going further, we recommend you start with the following resources.

Have an Algolia account

For this tutorial, we assume that you already have an Algolia account. If not, you can create an account before getting started.

Dataset

For this tutorial, we’ll use an e-commerce dataset where each result is a product. All records have a categories attribute containing one or more categories.

To follow along, you can download the dataset and import it into your Algolia application.

Install dependencies

Before starting, you need to install algolia-sitemap in your project. This open-source wrapper for algoliasearch lets you dynamically generate sitemaps from your Algolia indices.

npm install algolia-sitemap

Create a sitemap of all the records in your index

We’ll start by creating a sitemap with all our catalog products to make sure search engines know where to find them.

First, you need to provide your Algolia credentials (application ID and API key). The key must have the browse permission, which the default search-only API key doesn’t include. You can generate a suitable key from the API keys tab of your Algolia dashboard.

const algoliaSitemap = require('algolia-sitemap');

const algoliaConfig = {
  appId: 'YourApplicationID',
  apiKey: 'YourSearchOnlyAPIKey', // Must have a `browse` permission
  indexName: 'your_index_name',
};

algoliaSitemap({
  algoliaConfig,
});
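
Hard-coding credentials in a script makes them easy to leak, so one option is to read them from environment variables. The following is a minimal sketch, not part of algolia-sitemap: the helper name loadAlgoliaConfig and the variable names ALGOLIA_APP_ID, ALGOLIA_BROWSE_API_KEY, and ALGOLIA_INDEX_NAME are assumptions chosen for illustration.

```javascript
// Hypothetical environment variable names; adjust them to your own setup.
function loadAlgoliaConfig(env = process.env) {
  const config = {
    appId: env.ALGOLIA_APP_ID,
    apiKey: env.ALGOLIA_BROWSE_API_KEY, // Must have the `browse` permission
    indexName: env.ALGOLIA_INDEX_NAME,
  };

  // Fail early with a readable message instead of a failed API call later.
  const missing = Object.keys(config).filter((key) => !config[key]);
  if (missing.length > 0) {
    throw new Error(`Missing Algolia settings: ${missing.join(', ')}`);
  }

  return config;
}
```

You could then call loadAlgoliaConfig() once at the top of the script and pass the result as algoliaConfig.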

Then, you need to provide a hitToParams callback. We call this function for each record in your index, allowing you to map a record to a sitemap entry. The return value of your callback must be an object whose attributes are the same as those of a <url> entry in a sitemap.xml file.

  • loc (required): The URL of the detail page
  • lastmod: The last modified date (ISO 8601)
  • priority: The priority of this page compared to other pages in your site (between 0 and 1)
  • changefreq: Describes how frequently the page is likely to change
  • alternates: Alternate versions of this link
  • alternates.languages: An array of enabled languages for this link
  • alternates.hitToURL: A function to transform a language into a URL
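
To illustrate how these properties fit together, here is a sketch of a callback that fills in most of them. The record fields slug and updatedAt, the productUrl helper, the URL scheme, and the chosen priority, changefreq, and language values are all assumptions made for this example; your records will likely look different.

```javascript
// Assumed record shape (hypothetical fields): { slug: 'some-product', updatedAt: 1700000000 }
function productUrl(lang, slug) {
  return `https://example.com/${lang}/products/${encodeURIComponent(slug)}`;
}

function hitToParams({ slug, updatedAt }) {
  return {
    loc: productUrl('en', slug),
    // `updatedAt` is assumed to be a Unix timestamp in seconds.
    lastmod: new Date(updatedAt * 1000).toISOString(),
    priority: 0.8,
    changefreq: 'weekly',
    alternates: {
      languages: ['fr', 'de'],
      // Called once per language to build the alternate URL.
      hitToURL: (lang) => productUrl(lang, slug),
    },
  };
}
```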

In our case, we’ll keep it simple and only output the loc property for each product. Make sure to modify the hitToParams function to match the content of your records. You also need to create a sitemaps directory, matching the outputFolder option, to hold the generated sitemaps.

function hitToParams({ url }) {
  return { loc: url };
}

algoliaSitemap({
  algoliaConfig,
  hitToParams,
  sitemapLoc: 'https://example.com/sitemaps',
  outputFolder: 'sitemaps',
});
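
Since the output directory has to exist, you may prefer to create it from the script itself rather than by hand. This is a small sketch using Node.js built-ins only; the helper name ensureOutputFolder is an assumption, and the recursive option requires a more recent Node.js than the minimum mentioned in the prerequisites.

```javascript
const fs = require('fs');
const path = require('path');

// Create the output folder if it's missing and return its absolute path.
// The `recursive` option requires Node.js 10.12 or later.
function ensureOutputFolder(folder) {
  const absolutePath = path.resolve(folder);
  fs.mkdirSync(absolutePath, { recursive: true });
  return absolutePath;
}
```

Calling ensureOutputFolder('sitemaps') before algoliaSitemap would create the folder relative to the directory you run the script from. Calling it again when the folder already exists does nothing.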

We can now run our script with Node.js, which will create sitemaps in the sitemaps directory. There are two types of sitemap files:

  • the sitemap-index file, with a link to each sitemap,
  • and the sitemap files, with links to your products.

To ensure the generated sitemaps are correct, you can use any online sitemap validator, such as XML Sitemap Checker. Note that Algolia doesn’t run this website, so we can’t provide support for it.
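
Before reaching for an external validator, a quick local check can confirm that the files were written and contain roughly the number of URLs you expect. The sketch below is a rough sanity check based on pattern matching, not a real XML validation; the function names countLocs and summarizeSitemaps are assumptions, and it simply scans every .xml file in the folder you pass in.

```javascript
const fs = require('fs');
const path = require('path');

// Count <loc> elements in an XML string with a simple pattern match.
// This is a rough sanity check, not a full XML parse.
function countLocs(xml) {
  const matches = xml.match(/<loc>[^<]+<\/loc>/g);
  return matches ? matches.length : 0;
}

// Report how many URLs each .xml file in `folder` contains.
function summarizeSitemaps(folder) {
  return fs
    .readdirSync(folder)
    .filter((file) => file.endsWith('.xml'))
    .map((file) => ({
      file,
      urls: countLocs(fs.readFileSync(path.join(folder, file), 'utf8')),
    }));
}
```

For example, console.table(summarizeSitemaps('sitemaps')) would print one row per generated file.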

Create a sitemap for categories

Let’s now see how we can generate entries for category pages. Our records have a categories attribute that looks like the following:

{
  "categories": ["Mobile Phones", "Phones & Tablets"]
}

Here, the product belongs to two categories, so let’s assume that we can access each of them at https://example.com/CATEGORY_NAME.

We want to modify our hitToParams function so that it returns an array of sitemap entries, one for each category of the given hit. Since a category likely applies to many records, we need to make sure not to add it to our sitemaps more than once.

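// Categories already written to the sitemap, keyed by name.
const alreadyAdded = {};

function hitToParams({ categories }) {
  // Keep only the categories that no earlier hit has contributed.
  const newCategories = categories.filter(
    (category) => !alreadyAdded[category]
  );

  // Nothing new for this hit: skip it.
  if (!newCategories.length) {
    return false;
  }

  const locs = [];

  newCategories.forEach((category) => {
    alreadyAdded[category] = category;

    locs.push({
      // Encode the name so that spaces and `&` produce a valid URL.
      loc: `https://example.com/${encodeURIComponent(category)}`,
    });
  });

  return locs;
}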

For each hit, we check whether it contains categories that we haven’t added to the sitemap yet, and we add them. This lets us save all our category pages to our sitemap.
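
To see the deduplication at work without querying an index, you can run the same idea over a few in-memory records. This standalone sketch is an illustration only: the function name categoriesToEntries and the sample hits are made up, it tracks seen categories in a Set rather than a plain object, and it URL-encodes the category names.

```javascript
const seenCategories = new Set();

// Same idea as the callback above, written with a Set for the bookkeeping.
function categoriesToEntries({ categories }) {
  const entries = [];
  for (const category of categories) {
    if (seenCategories.has(category)) continue;
    seenCategories.add(category);
    entries.push({
      loc: `https://example.com/${encodeURIComponent(category)}`,
    });
  }
  // Returning false for "nothing to add" follows the callback above.
  return entries.length > 0 ? entries : false;
}

// Hypothetical sample records.
const hits = [
  { categories: ['Mobile Phones', 'Phones & Tablets'] },
  { categories: ['Mobile Phones'] },
  { categories: ['Phones & Tablets', 'Accessories'] },
];

const results = hits.map(categoriesToEntries);
console.log(JSON.stringify(results, null, 2));
```

The first hit yields two entries, the second yields false because its only category was already seen, and the third yields a single entry for Accessories.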

Create a sitemap for both products and categories

We can edit our script to generate a sitemap for both our products and categories. To do so, all we need to do is push the current product along with its categories.

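// Relies on the `alreadyAdded` object declared in the previous example.
function hitToParams({ categories, url }) {
  const newCategories = categories.filter(
    (category) => !alreadyAdded[category]
  );

  // Every hit contributes its own product page exactly once,
  // even when all of its categories were already added.
  const locs = [{ loc: url }];

  newCategories.forEach((category) => {
    alreadyAdded[category] = category;

    locs.push({
      // Encode the name so that spaces and `&` produce a valid URL.
      loc: `https://example.com/${encodeURIComponent(category)}`,
    });
  });

  return locs;
}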

Notify search engines of sitemap changes

Finally, we can let search engines know that our sitemap changed. Most search engines have a ping mechanism to inform them of a new sitemap, so we can do this directly from our script.

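For Google and Bing, all we need to do is send a GET request to a specific endpoint. Note that Google announced in 2023 that it was retiring its sitemap ping endpoint, so check each search engine’s current documentation before relying on this step.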

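// The sitemap URL travels as a query parameter, so it must be URL-encoded.
const sitemapUrl = encodeURIComponent('http://example.com/sitemap.xml');

const endpoints = [
  `http://www.google.com/webmasters/sitemaps/ping?sitemap=${sitemapUrl}`,
  `http://www.bing.com/webmaster/ping.aspx?siteMap=${sitemapUrl}`,
];

// `fetch` is global in Node.js 18+; older versions need a polyfill.
Promise.all(endpoints.map((endpoint) => fetch(endpoint)))
  .then((responses) => {
    // A resolved request can still carry an error status, so log each one.
    responses.forEach(({ url, status }) => console.log(`${status} ${url}`));
  })
  .catch((error) => console.error('Ping failed:', error));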
