Algolia and other search engines favor exact matches by default. In some use cases, people can exploit this to take advantage of users' typing mistakes and rank high on a popular search query.
For example, imagine we want to index Twitter users. A good example of typosquatting is the account @BarakObama, which has 15.8k followers but isn't @BarackObama (Barack Obama’s official account). Because Algolia prioritizes exact matches, typing “BarakObama” returns the “BarakObama” record first, regardless of custom ranking.
Not all use cases need to prevent typosquatting. However, if yours does (which often happens when you deal with user-generated content), you may need to put a strategy in place.
Dataset Example
Back to our Twitter example. Let’s say we have an index called twitter_accounts that looks like this:
[
  {
    "twitter_handle": "BarackObama",
    "nb_followers": 103500000
  },
  {
    "twitter_handle": "BarakObama",
    "nb_followers": 15800
  }
]
Even if we set a descending custom ranking on nb_followers, the @BarakObama account would still benefit from traffic coming from users who make a typo when searching for the official Barack Obama account, because Algolia ranks exact matches above the custom ranking.
We can work around this issue with Algolia’s sort-by feature.
Updating the dataset
The recommended solution is to add a boolean attribute that separates popular records from the rest, such as is_verified_account = true or is_popular = true, and sort on that attribute.
For this approach to work well, the records with is_popular or is_verified_account set to true should be a small subset of the dataset (around 1% of it at most), because a sort-by attribute ranks every matching popular record above all other matches.
We already have a popularity metric (nb_followers), so we can use it to define a rule that determines whether a record is popular. In this example, we could say that a user is popular if they have more than a million followers.
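The rule, together with the 1% guideline, can be sketched in plain Python before touching the index. This is a minimal sketch assuming records are dictionaries shaped like the ones above; flag_popular, popular_share, and MAX_POPULAR_SHARE are hypothetical names, and the 1% ceiling mirrors the guideline above rather than a limit Algolia enforces.

```python
from typing import Dict, List

POPULARITY_THRESHOLD = 1_000_000  # "popular" = more than a million followers
MAX_POPULAR_SHARE = 0.01          # hypothetical ceiling: about 1% of the dataset


def flag_popular(records: List[Dict], threshold: int = POPULARITY_THRESHOLD) -> List[Dict]:
    """Return copies of the records with an is_popular boolean added."""
    return [
        {**record, "is_popular": record.get("nb_followers", 0) > threshold}
        for record in records
    ]


def popular_share(records: List[Dict]) -> float:
    """Fraction of records whose is_popular flag is true."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.get("is_popular")) / len(records)


accounts = [
    {"twitter_handle": "BarackObama", "nb_followers": 103_500_000},
    {"twitter_handle": "BarakObama", "nb_followers": 15_800},
]
flagged = flag_popular(accounts)
share = popular_share(flagged)
if share > MAX_POPULAR_SHARE:
    print(f"warning: {share:.0%} of records are popular; consider a higher threshold")
```

On this two-record toy dataset the warning fires (one popular record out of two is 50%), which is expected: the check is meant to run against the full dataset, where it tells you whether the threshold is strict enough.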
We can use the browse method to iterate over every record, set is_popular, and save the records back to the index:
PHP
$records = [];
foreach ($index->browseObjects() as $hit) {
    $hit['is_popular'] = ($hit['nb_followers'] > 1000000);
    $records[] = $hit;
}
$index->saveObjects($records);
Ruby
records = []
index.browse do |hit|
  # Use = rather than ||= so an existing flag is recomputed instead of kept
  hit['is_popular'] = hit['nb_followers'] > 1_000_000
  records << hit
end
index.save_objects(records)
JavaScript
let records = [];

index
  .browseObjects({
    batch(hits) {
      records = records.concat(hits.map(hit => {
        return {
          ...hit,
          is_popular: hit.nb_followers > 1000000
        };
      }));
    }
  })
  .then(() => index.saveObjects(records))
  .then(({ objectIDs }) => {
    console.log(objectIDs);
  });
Python
records = []
for hit in index.browse_objects():
    hit['is_popular'] = hit['nb_followers'] > 1000000
    records.append(hit)
index.save_objects(records)
C#
var hits = index.BrowseAll(new Query());
List<JObject> records = new List<JObject>();
foreach (var hit in hits)
{
    hit["is_popular"] = (long)hit["nb_followers"] > 1000000;
    records.Add(hit);
    // Send the updated records in batches, not the whole browse result
    if (records.Count > 1000)
    {
        index.AddObjects(records);
        records.Clear();
    }
}
index.AddObjects(records);
Java
SearchIndex<Record> index = client.initIndex("YourIndexName", Record.class);
IndexIterable<Record> iterator = index.browseObjects(new BrowseIndexQuery());
ArrayList<Record> records = new ArrayList<>();
iterator.forEach(record -> {
    boolean isPopular = record.getNbFollowers() > 1000000;
    record.setPopular(isPopular);
    records.add(record);
});
index.saveObjects(records);
Go
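// SaveObjects replaces whole records, so User must declare every
// attribute you want to keep (here, twitter_handle too).
type User struct {
	ObjectID      string `json:"objectID"`
	TwitterHandle string `json:"twitter_handle"`
	NbFollowers   int    `json:"nb_followers"`
	IsPopular     bool   `json:"is_popular"`
}

var users []User
it, err := index.BrowseObjects()
if err != nil {
	// error handling
}
for {
	var user User // fresh value per record so no fields leak between records
	_, err := it.Next(&user)
	if err == io.EOF {
		break
	}
	if err != nil {
		// error handling
		break
	}
	user.IsPopular = user.NbFollowers > 1000000
	users = append(users, user)
}
res, err := index.SaveObjects(users)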
Scala
import scala.concurrent.{Await, ExecutionContext, ExecutionContextExecutor, Future}
import scala.concurrent.duration._

// Indexing replaces whole records, so the case class must declare every
// attribute you want to keep (here, twitter_handle too).
case class User(objectID: String, twitter_handle: String, nb_followers: Int, is_popular: Option[Boolean]) extends ObjectID

implicit val ec: ExecutionContextExecutor = ExecutionContext.global
implicit val awaitDuration: FiniteDuration = 10.seconds

val syncHelper = AlgoliaSyncHelper(client)

val futures =
  syncHelper
    .browse[User]("myIndex", Query())
    .map { hits =>
      // copy keeps every other field and only sets is_popular
      hits.map(hit => hit.copy(is_popular = Some(hit.nb_followers > 1000000)))
    }
    .map { batch =>
      client.execute {
        index into "myIndex" objects batch
      }
    }

Await.ready(Future.sequence(futures), awaitDuration)
System.exit(0)
Kotlin
val records = index.browseObjects().flatMap { response ->
    response.hits.map {
        val map = it.toMutableMap()
        val nbFollowers = it.getValue("nb_followers").primitive.long
        map["is_popular"] = JsonLiteral(nbFollowers > 1000000)
        JsonObject(map)
    }
}
index.saveObjects(records)
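The snippets above save every record back with saveObjects, which replaces each record as a whole. If you'd rather send only the records whose flag actually changes, one option is a partial update that touches nothing but is_popular. This is a minimal sketch assuming browsed hits are dictionaries that include objectID and nb_followers; build_popularity_updates is a hypothetical helper and the objectID values are made up. The resulting list could then go to the Python API client's partial_update_objects method (check the name against your client version).

```python
from typing import Dict, Iterable, List

POPULARITY_THRESHOLD = 1_000_000


def build_popularity_updates(hits: Iterable[Dict], threshold: int = POPULARITY_THRESHOLD) -> List[Dict]:
    """Return partial-update payloads only for records whose is_popular value changes.

    Each payload carries just objectID and is_popular, so a partial update
    leaves every other attribute (such as twitter_handle) untouched.
    """
    updates = []
    for hit in hits:
        is_popular = hit.get("nb_followers", 0) > threshold
        if hit.get("is_popular") != is_popular:  # a missing flag counts as a change
            updates.append({"objectID": hit["objectID"], "is_popular": is_popular})
    return updates


hits = [
    {"objectID": "1", "twitter_handle": "BarackObama", "nb_followers": 103_500_000},
    {"objectID": "2", "twitter_handle": "BarakObama", "nb_followers": 15_800, "is_popular": False},
]
updates = build_popularity_updates(hits)
print(updates)  # [{'objectID': '1', 'is_popular': True}]
# With the Python API client, you would then send them, for example:
# index.partial_update_objects(updates)
```

Only the first record is sent: the second already carries the correct flag, so re-running the job after follower counts change touches just the records that crossed the threshold.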
Once updated, our dataset would look like this:
[
  {
    "twitter_handle": "BarackObama",
    "nb_followers": 103500000,
    "is_popular": true
  },
  {
    "twitter_handle": "BarakObama",
    "nb_followers": 15800,
    "is_popular": false
  }
]
By default, the first criterion in Algolia’s ranking formula is typo (which, for the vast majority of use cases, is a sensible default). To prevent typosquatting, you need to add a ranking criterion above typo, so that it’s evaluated first. This is what Algolia commonly refers to as a sort-by attribute.
Once it’s in place, searching for “BarakObama” returns the “BarackObama” record first.
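To see why the order of criteria matters, here is a deliberately simplified model of a tie-breaking ranking formula: records are compared on the first criterion, and only ties move on to the next one. It is not Algolia's engine. It assumes typos can be approximated by the Levenshtein distance between the lowercase query and handle, and it models only three criteria; rank and typo_count are hypothetical helpers.

```python
from typing import Callable, Dict, List


def typo_count(query: str, text: str) -> int:
    """Levenshtein edit distance, used here as a stand-in for Algolia's typo count."""
    a, b = query.lower(), text.lower()
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]


def rank(records: List[Dict], query: str, ranking: List[str]) -> List[str]:
    """Sort records by the criteria in ranking, comparing later ones only on ties."""
    criteria: Dict[str, Callable[[Dict], object]] = {
        "desc(is_popular)": lambda r: not r["is_popular"],          # popular first
        "typo": lambda r: typo_count(query, r["twitter_handle"]),   # fewer typos first
        "custom": lambda r: -r["nb_followers"],                     # desc(nb_followers)
    }
    ordered = sorted(records, key=lambda r: tuple(criteria[c](r) for c in ranking))
    return [r["twitter_handle"] for r in ordered]


accounts = [
    {"twitter_handle": "BarackObama", "nb_followers": 103_500_000, "is_popular": True},
    {"twitter_handle": "BarakObama", "nb_followers": 15_800, "is_popular": False},
]
print(rank(accounts, "BarakObama", ["typo", "custom"]))
# ['BarakObama', 'BarackObama']
print(rank(accounts, "BarakObama", ["desc(is_popular)", "typo", "custom"]))
# ['BarackObama', 'BarakObama']
```

With typo first, the exact match wins before the follower count is ever compared. With the sort-by first, the popular record wins regardless of its typo count, which is also why the set of popular records needs to stay small.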
Using the API
To set a sort-by attribute, add it to the top of the ranking setting with the setSettings method.
PHP
$index->setSettings([
    'ranking' => [
        "desc(is_popular)",
        "typo",
        "geo",
        "words",
        "filters",
        "proximity",
        "attribute",
        "exact",
        "custom"
    ]
]);
Ruby
index.set_settings({
  ranking: [
    'desc(is_popular)',
    'typo',
    'geo',
    'words',
    'filters',
    'proximity',
    'attribute',
    'exact',
    'custom'
  ]
})
JavaScript
index.setSettings({
  ranking: [
    "desc(is_popular)",
    "typo",
    "geo",
    "words",
    "filters",
    "proximity",
    "attribute",
    "exact",
    "custom"
  ]
}).then(() => {
  // done
});
Python
index.set_settings({
    'ranking': [
        'desc(is_popular)',
        'typo',
        'geo',
        'words',
        'filters',
        'proximity',
        'attribute',
        'exact',
        'custom'
    ]
})
Swift
index.setSettings([
  "ranking": [
    "desc(is_popular)",
    "typo",
    "geo",
    "words",
    "filters",
    "proximity",
    "attribute",
    "exact",
    "custom"
  ]
])
Android
boolean forwardToReplicas = false; // set to true to also apply the settings to replicas
List<String> ranking = Arrays.asList(
    "desc(is_popular)",
    "typo",
    "geo",
    "words",
    "filters",
    "proximity",
    "attribute",
    "exact",
    "custom"
);
index.setSettings(new JSONObject().put("ranking", new JSONArray(ranking)), forwardToReplicas, null);
C#
IndexSettings settings = new IndexSettings
{
    // The ranking formula goes in Ranking, not CustomRanking
    Ranking = new List<string>
    {
        "desc(is_popular)",
        "typo",
        "geo",
        "words",
        "filters",
        "proximity",
        "attribute",
        "exact",
        "custom"
    }
};

index.SetSettings(settings);

// Asynchronous
await index.SetSettingsAsync(settings);
Java
index.setSettings(
    new IndexSettings()
        .setRanking(
            Arrays.asList(
                "desc(is_popular)",
                "typo",
                "geo",
                "words",
                "filters",
                "proximity",
                "attribute",
                "exact",
                "custom"
            )
        )
);
Go
index.SetSettings(search.Settings{
	Ranking: opt.Ranking(
		"desc(is_popular)",
		"typo",
		"geo",
		"words",
		"filters",
		"proximity",
		"attribute",
		"exact",
		"custom",
	),
})
Scala
client.execute {
  setSettings of "myIndex" `with` IndexSettings(
    ranking = Some(Seq(
      Ranking.desc("is_popular"),
      typo,
      geo,
      words,
      filters,
      proximity,
      attribute,
      exact,
      custom,
    )),
  )
}
Kotlin
val settings = settings {
    ranking {
        +Desc("is_popular")
        +Typo
        +Geo
        +Words
        +Filters
        +Proximity
        +Attribute
        +Exact
        +Custom
    }
}
index.setSettings(settings)
Using the Dashboard
You can also set a sort-by attribute in your Algolia dashboard.
- Go to your dashboard and select your index.
- Click the Ranking tab.
- In the Ranking Formula & Custom Ranking section, click the Add sort-by attribute button and select is_popular.
- Don’t forget to save your changes.