Generate a Sitemap from an Algolia Index
Having great content and UX is only useful if people can find it. Search Engine Optimization (SEO) is a crucial traction strategy for most websites, and sitemaps play a significant role. A sitemap is a file that describes all the pages of your website, so that search engine bots can easily index your content. Sitemaps provide valuable information such as which pages to prioritize, or how often a page updates.
Sitemaps are particularly useful for sites or applications that load content asynchronously. That’s the case for most JavaScript-powered single-page applications and progressive web apps. It’s also the case when you’re using Algolia on the front-end.
Thanks to the flexibility of facets, Algolia can power navigation in addition to search result pages, which lets you implement dynamic category pages based on the data in your index. These are great candidates to add to your sitemap.
Prerequisites
Familiarity with Node.js
This tutorial assumes you’re familiar with Node.js, how it works, and how to create and run Node.js scripts. Make sure to install Node.js (v6 or above) in your environment.
If you want to learn more before going further, we recommend you start with the following resources.
Have an Algolia account
For this tutorial, we assume that you already have an Algolia account. If not, you can create an account before getting started.
Dataset
For this tutorial, we’ll use an e-commerce dataset where each result is a product. All records have a categories attribute containing one or more categories.
To follow along, you can download the dataset and import it in your Algolia application.
Install dependencies
Before starting, you need to install algolia-sitemap in your project. This open-source wrapper for algoliasearch lets you dynamically generate sitemaps from your Algolia indices.
npm install algolia-sitemap
Create a sitemap of all the records in your index
We’ll start by creating a sitemap with all our catalog products to make sure search engines know where to find them.
First, you need to provide your Algolia credentials (application ID and search-only API key). Make sure that the key has the browse permission. You can generate one from the API keys tab of your Algolia dashboard.
const algoliaSitemap = require('algolia-sitemap');

const algoliaConfig = {
  appId: 'YourApplicationID',
  apiKey: 'YourSearchOnlyAPIKey', // Must have a `browse` permission
  indexName: 'your_index_name',
};

algoliaSitemap({
  algoliaConfig,
});
Then, you need to provide a hitToParams callback. We call this function for each record in your index, allowing you to map a record to a sitemap entry. The return value of your callback must be an object whose attributes are the same as those of a <url> entry in a sitemap.xml file.
- loc (required): The URL of the detail page
- lastmod: The last modified date (ISO 8601)
- priority: The priority of this page compared to other pages in your site (between 0 and 1)
- changefreq: Describes how frequently the page is likely to change
- alternates: Alternate versions of this link
- alternates.languages: An array of enabled languages for this link
- alternates.hitToURL: A function to transform a language into a URL
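To make the shape of the return value concrete, here is a minimal sketch of a callback that fills in several of these fields. It assumes each record has a url attribute and an updatedAt attribute holding a Unix timestamp in seconds; both names, and the changefreq and priority values, are illustrative and may not match your records.

```javascript
// Sketch only: `url` and `updatedAt` are assumed record attributes,
// not something the dataset in this tutorial is guaranteed to contain.
function hitToParams({ url, updatedAt }) {
  const entry = {
    loc: url, // required
    changefreq: 'daily', // illustrative value
    priority: 0.7, // illustrative value between 0 and 1
  };

  // Only emit lastmod when the record carries a usable timestamp.
  if (typeof updatedAt === 'number') {
    entry.lastmod = new Date(updatedAt * 1000).toISOString(); // ISO 8601
  }

  return entry;
}
```

Leaving out optional fields you can't populate reliably is usually safer than guessing their values.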
In our case, we’ll keep it simple and only output the loc property for each product.
Make sure to modify the hitToParams function to match the content of your records. You also need to create a /sitemaps directory to output all generated sitemaps.
function hitToParams({ url }) {
  return { loc: url };
}

algoliaSitemap({
  algoliaConfig,
  hitToParams,
  sitemapLoc: 'https://example.com/sitemaps',
  outputFolder: 'sitemaps',
});
We can now run our script with Node.js, which will create sitemaps in the /sitemaps directory. There are two types of sitemap files:
- the sitemap-index file, with a link to each sitemap,
- and the sitemap files, with links to your products.
To ensure the generated sitemaps are correct, you can use any online sitemap validator, such as XML Sitemap Checker. Note that Algolia doesn’t run this website, so we can’t provide support for it.
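Before reaching for an online validator, a quick local sanity check can catch obvious mistakes. The sketch below is not a full XML or sitemap validator: it only pulls the contents of every <loc> element out of a sitemap string with a regular expression and reports the ones that aren't absolute http(s) URLs. The function names are hypothetical helpers, not part of algolia-sitemap.

```javascript
// Collect the text of every <loc> element in a sitemap XML string.
function extractLocs(xml) {
  const locs = [];
  const pattern = /<loc>([^<]*)<\/loc>/g;
  let match;
  while ((match = pattern.exec(xml)) !== null) {
    locs.push(match[1].trim());
  }
  return locs;
}

// Return the <loc> values that are not absolute http(s) URLs.
function findInvalidLocs(xml) {
  return extractLocs(xml).filter((loc) => {
    try {
      const { protocol } = new URL(loc);
      return protocol !== 'http:' && protocol !== 'https:';
    } catch (error) {
      return true; // Could not be parsed as an absolute URL
    }
  });
}

// Usage sketch (the file name is an assumption about the generated output):
// const xml = require('fs').readFileSync('sitemaps/sitemap.0.xml', 'utf8');
// console.log(findInvalidLocs(xml));
```

An empty result only means the URLs are well-formed; it doesn't replace a proper validator.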
Create a sitemap for categories
Let’s now see how we can generate entries for category pages. Our records have a categories attribute that looks like the following:
{
  "categories": ["Mobile Phones", "Phones & Tablets"]
}
Here, the product belongs to two categories, so let’s assume that we can access each of them at https://example.com/CATEGORY_NAME.
We want to modify our hitToParams function, so it returns an array of all the categories that belong to the given hit. Since categories likely apply to many records, we need to make sure not to add them to our sitemaps more than once.
// Keeps track of the categories we've already written to the sitemap
const alreadyAdded = {};

function hitToParams({ categories }) {
  // Only keep the categories we haven't seen in a previous record
  const newCategories = categories.filter(
    (category) => !alreadyAdded[category]
  );

  // Nothing new to add for this record
  if (!newCategories.length) {
    return false;
  }

  const locs = [];

  newCategories.forEach((category) => {
    alreadyAdded[category] = category;
    locs.push({
      loc: `https://example.com/${category}`,
    });
  });

  return locs;
}
For each hit, we check whether it contains categories that we haven’t added to the sitemap yet, and we add them. This lets us save all our category pages to our sitemap.
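Category names such as "Phones & Tablets" contain spaces and an ampersand, which aren't safe to drop into a URL path as-is. The following variation is a sketch of the same deduplication idea using a Set and encodeURIComponent. It assumes your category pages are reachable at the percent-encoded name, which may not match your routing; the helper name categoriesToEntries is hypothetical.

```javascript
// Remembers every category that has already produced a sitemap entry.
const seenCategories = new Set();

// Returns sitemap entries for the categories not seen in earlier records.
function categoriesToEntries(categories = []) {
  const entries = [];

  for (const category of categories) {
    if (seenCategories.has(category)) continue;
    seenCategories.add(category);

    // Assumption: the category page lives at the percent-encoded name.
    entries.push({
      loc: `https://example.com/${encodeURIComponent(category)}`,
    });
  }

  return entries;
}
```

A hitToParams callback could then return this array when it's non-empty, and false otherwise, as in the version above.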
Create a sitemap for both products and categories
We can edit our script to generate a sitemap for both our products and categories. To do so, all we need to do is push the current product along with its categories.
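// `alreadyAdded` is the lookup object declared in the previous snippet
function hitToParams({ categories, url }) {
  // Every record is a product, so always add its own page exactly once
  const locs = [{ loc: url }];

  // Add each category page the first time we come across it
  categories.forEach((category) => {
    if (alreadyAdded[category]) {
      return;
    }
    alreadyAdded[category] = category;
    locs.push({
      loc: `https://example.com/${category}`,
    });
  });

  return locs;
}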
Notify search engines of sitemap changes
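Finally, we can let search engines know that our sitemap changed. Most search engines have a ping mechanism to inform them of a new sitemap, so we can do this directly from our script.
For Google and Bing, all we need to do is send a GET request to a specific endpoint. Note that Google and Bing have both announced the retirement of these unauthenticated ping endpoints, so check their current webmaster documentation before relying on them.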
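// Requires a global `fetch` (available in Node.js 18+);
// on older versions, use an HTTP client library instead.
const endpoints = [
  'http://www.google.com/webmasters/sitemaps/ping?sitemap=http://example.com/sitemap.xml',
  'http://www.bing.com/webmaster/ping.aspx?siteMap=http://example.com/sitemap.xml',
];

Promise.all(endpoints.map((endpoint) => fetch(endpoint)))
  .then(() => {
    console.log('Done');
  })
  .catch((error) => {
    console.error('Failed to ping search engines:', error);
  });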
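The sitemap address is passed as a query parameter, so it should be URL-encoded, and it can be useful to know which ping failed rather than stopping at the first error. The sketch below shows one way to do both. The helper names buildPingUrl and pingAll are hypothetical, the sitemap-index.xml file name is an assumption about your generated output, and the endpoints are the ones listed above, which may no longer accept requests.

```javascript
// Appends the sitemap address to a ping endpoint as an encoded query parameter.
function buildPingUrl(endpoint, paramName, sitemapUrl) {
  const url = new URL(endpoint);
  url.searchParams.set(paramName, sitemapUrl);
  return url.toString();
}

// Pings every target and reports a per-endpoint result instead of
// rejecting as soon as one request fails.
// `fetchImpl` defaults to the global fetch (Node.js 18+).
async function pingAll(targets, sitemapUrl, fetchImpl = fetch) {
  const results = await Promise.allSettled(
    targets.map(({ endpoint, param }) =>
      fetchImpl(buildPingUrl(endpoint, param, sitemapUrl))
    )
  );

  return results.map((result, index) => ({
    endpoint: targets[index].endpoint,
    ok: result.status === 'fulfilled' && Boolean(result.value.ok),
  }));
}

// Usage sketch:
// pingAll(
//   [
//     { endpoint: 'http://www.google.com/webmasters/sitemaps/ping', param: 'sitemap' },
//     { endpoint: 'http://www.bing.com/webmaster/ping.aspx', param: 'siteMap' },
//   ],
//   'https://example.com/sitemaps/sitemap-index.xml'
// ).then(console.log);
```

Accepting the fetch implementation as a parameter keeps the helper usable on older Node.js versions with a compatible HTTP client, and makes it easy to exercise without network access.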