MultiClusters API Client
List of methods
- Assign or Move userID: Assign or move a userID to a cluster.
- Get top userID: Get the top 10 userIDs with the highest number of records per cluster.
- Get userID: Returns the userID data stored in the mapping.
- List clusters: List the clusters available in a multi-clusters setup for a single appID.
- List userIDs: List the userIDs assigned to a multi-clusters appID.
- Remove userID: Remove a userID and its associated data from the multi-clusters.
- Search userID: Search for userIDs.
- Batch assign userIDs: Assign multiple userIDs to a cluster.
- Has pending mappings: Get the status of your clusters' migrations or user creations.
A Brief Technical Overview
How to split the data (Logical Split)
The data is split logically. We decided not to use a hash-based split, because it requires aggregating answers from multiple servers and adds network latency to the response time. Normally, the data is partitioned by user, that is, split according to a userID.
Uses a single appID
If we followed the logic of using one appID per cluster, a multi-clusters setup would require many appIDs. This would be difficult to manage, especially when moving data from one cluster to another to balance the load. Our API therefore relies on a single appID: the engine routes each request to a specific destination cluster, using a new HTTP header, X-ALGOLIA-USER-ID, and a mapping that associates a userID to a cluster.
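To make the routing idea concrete, here is a minimal conceptual sketch of a lookup from the header to a cluster. The real routing happens inside the engine; the function name `resolveCluster`, the mapping contents, and the cluster names are all hypothetical and only illustrate the principle.

```php
<?php
// Conceptual sketch only: the real routing is done by the engine.
// The mapping contents and cluster names below are hypothetical.

/**
 * Resolve the destination cluster for a request from its headers.
 *
 * @param array<string,string> $mapping userID => cluster name
 * @param array<string,string> $headers request headers
 */
function resolveCluster(array $mapping, array $headers): string
{
    // HTTP header names are case-insensitive, so normalize before the lookup.
    $normalized = array_change_key_case($headers, CASE_LOWER);
    $userID = $normalized['x-algolia-user-id'] ?? null;

    if ($userID === null) {
        throw new InvalidArgumentException('Missing X-Algolia-User-ID header');
    }
    if (!isset($mapping[$userID])) {
        // A user that is not in the mapping cannot be routed anywhere.
        throw new RuntimeException("No cluster assigned to userID '$userID'");
    }

    return $mapping[$userID];
}

$mapping = ['user123' => 'c1-test', 'user456' => 'c2-test'];
echo resolveCluster($mapping, ['X-ALGOLIA-USER-ID' => 'user123']), PHP_EOL; // c1-test
```

Because the lookup needs only the header and the mapping, moving a user to another cluster changes a single mapping entry rather than the appID used by the client.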
What MCM doesn’t do
As mentioned, the data is split logically, in such a way that a single server can perform a complete search. This API doesn't aggregate responses from multiple clusters. We designed the multi-clusters feature to stay fast even with many clusters in multiple regions.
Shared configuration
With MCM, all operations on settings, rules, synonyms, and API keys are replicated on all the machines, so that every cluster has the same configuration. Only the records stored in the indices differ between two clusters.
Shared data
For some use cases, there are two types of data:
- Public data
- Private user data
The public data can be searched at the same time as private user data. With MCM, you can create public records by using the special userID value *: the record is then replicated on all the clusters and made available for search.
ObjectIDs
The objectIDs need to be unique across userIDs, so that a record of one userID doesn't overwrite a record of another userID. They also need to be unique because of the shared data, which can be retrieved at the same time as the data of one specific customer. We recommend appending the userID of the specific user to the objectID to be sure the objectID is unique.
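As a small illustration of the recommendation above, the following sketch appends the userID to a local identifier. The helper name `buildObjectID` and the `:` separator are assumptions made for this example; they are not part of any API client.

```php
<?php
// Hypothetical helper: the function name and the ':' separator are
// assumptions for illustration, not part of any Algolia API client.
function buildObjectID(string $localID, string $userID): string
{
    // Pick a separator that never appears in your userIDs, otherwise two
    // different (localID, userID) pairs could still produce the same string.
    return $localID . ':' . $userID;
}

$a = buildObjectID('invoice-42', 'user123');
$b = buildObjectID('invoice-42', 'user456');

echo $a, PHP_EOL; // invoice-42:user123
echo $b, PHP_EOL; // invoice-42:user456
```

Two users can then hold a record with the same local identifier without one overwriting the other.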
Number of indices
MCM is designed to work with a small number of indices (< 100). This limitation mainly exists to preserve the performance of user migration. To migrate a user from one cluster to another, the engine needs to enumerate all the records of this user and send them to the destination cluster. To do so, it loops over all the indices, so the cost of the operation is directly linked to the number of indices.
A small number of indices also allows the engine to further optimize indexing by batching the operations of one index together.
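The following sketch only illustrates what grouping operations by index means; the engine does its own batching internally, and nothing here reflects its actual implementation. The function name `groupByIndex`, the index names, and the shape of each operation (`action`, `indexName`, `body`) are assumptions for the example.

```php
<?php
// Illustrative only: groups a flat list of operations by target index.
// The operation shape (action / indexName / body) is assumed for this sketch.
function groupByIndex(array $operations): array
{
    $grouped = [];
    foreach ($operations as $operation) {
        // Operations on the same index end up in the same batch.
        $grouped[$operation['indexName']][] = $operation;
    }
    return $grouped;
}

$operations = [
    ['action' => 'addObject', 'indexName' => 'contacts', 'body' => ['objectID' => 'c1:user123']],
    ['action' => 'addObject', 'indexName' => 'invoices', 'body' => ['objectID' => 'i1:user123']],
    ['action' => 'addObject', 'indexName' => 'contacts', 'body' => ['objectID' => 'c2:user123']],
];

foreach (groupByIndex($operations) as $indexName => $batch) {
    echo $indexName, ': ', count($batch), ' operation(s)', PHP_EOL;
}
```

With few indices, each batch tends to be larger, which is the intuition behind keeping the number of indices small.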
Check out our Tutorial
Perhaps the best way to understand the MultiClusters API is to check out our MCM tutorial, where we explain, with code samples, the most important endpoints.
Limitation v0.1
For v0.1, the assignment of users to clusters won’t be automatic: if a user is not properly assigned, or not found, the call will be rejected.
As you will notice, the documentation currently uses the REST API endpoints directly. We will soon be rolling out our API client methods.
How to get the feature
MCM needs to be enabled on your cluster. You can contact support@algolia.com for more information.
MultiCluster usage
With a multi-cluster setup, the userID needs to be specified for each of the following methods:
- Search index
- Search multiple indices
- Search for facet values
- Browse index
- Add objects
- Delete objects
- Delete by
- Partial update objects
- Get objects
- Custom batch
- Wait for operations
Each of these methods allows you to pass any extra header to the request. We'll make use of the X-Algolia-User-ID header.
Here is an example of the search method, but the principle is the same for all the methods listed above:
// Initialize the index, then pass the userID as an extra header with the query.
$index = $client->initIndex('your_index_name');
$res = $index->search('query string', [
    'X-Algolia-User-ID' => 'user123'
]);
You can find an example of how to pass extra headers for the other methods in their respective documentation.
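Since the same header has to accompany every call listed above, it can help to build the options array in one place. The sketch below is a hypothetical helper, not part of the API client: the name `withUserID` is an assumption, and `hitsPerPage` is used only as a sample option.

```php
<?php
// Hypothetical helper, not part of the Algolia PHP client.
function withUserID(string $userID, array $requestOptions = []): array
{
    if ($userID === '') {
        throw new InvalidArgumentException('userID must not be empty');
    }
    // Other options are preserved; if the header was already present,
    // the userID passed to this helper takes precedence.
    return array_merge($requestOptions, ['X-Algolia-User-ID' => $userID]);
}

$options = withUserID('user123', ['hitsPerPage' => 10]);
print_r($options);

// Assumed usage with an initialized index, mirroring the search example above:
// $res = $index->search('query string', withUserID('user123'));
```

Centralizing the header this way makes it harder to forget the userID on one of the calls, which would otherwise be rejected.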