distinct
Type: integer | boolean
Engine default: 0 (no distinct)
Parameter syntax
'distinct' => 0|1|2|3

Can be used in these methods:

About this parameter

Enables de-duplication or grouping of results.

Distinct functionality is based on one attribute, as defined in attributeForDistinct. Using this attribute, you can limit the number of returned records that contain the same value in that attribute.

For example, suppose the distinct attribute is show_name and several hits (episodes) share the same show_name value, such as “game of thrones”:

  • If distinct is set to 1 (de-duplication):

    • only the most relevant episode (according to the ranking formula) is kept, and the others are not returned. This removes redundant records from your results.
  • If distinct is set to N > 1 (grouping):

    • the N most relevant episodes of each show are kept, and that show's remaining episodes are not returned.
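To make the difference between de-duplication and grouping concrete, here is a minimal local simulation in PHP. It is not how the engine implements distinct: it assumes the hits are already sorted by relevance and that every hit has the distinct attribute. The function name applyDistinct and the sample episodes are hypothetical.

```php
<?php
// Hypothetical local simulation of distinct; NOT the engine's implementation.
// Assumes $hits is already sorted by relevance (best first) and that every
// hit has the distinct attribute.

/**
 * @param array    $hits      relevance-sorted hits
 * @param string   $attribute the distinct attribute (e.g. show_name)
 * @param int|bool $distinct  0/false = off, 1/true = de-duplicate, N > 1 = group
 */
function applyDistinct(array $hits, string $attribute, $distinct): array
{
    $limit = (int) $distinct; // true => 1, false => 0
    if ($limit <= 0) {
        return $hits; // distinct disabled: every hit is kept
    }
    $kept = [];
    $counts = []; // distinct value => number of hits kept so far for that value
    foreach ($hits as $hit) {
        $value = $hit[$attribute];
        $count = $counts[$value] ?? 0;
        if ($count < $limit) {
            // Earlier hits are more relevant, so the first $limit per value win.
            $kept[] = $hit;
            $counts[$value] = $count + 1;
        }
    }
    return $kept;
}

// Sample episodes, listed from most to least relevant.
$hits = [
    ['title' => 'GoT S1E1', 'show_name' => 'game of thrones'],
    ['title' => 'GoT S1E2', 'show_name' => 'game of thrones'],
    ['title' => 'BB S1E1',  'show_name' => 'breaking bad'],
    ['title' => 'GoT S1E3', 'show_name' => 'game of thrones'],
];

// De-duplication: one episode per show.
print_r(array_column(applyDistinct($hits, 'show_name', 1), 'title'));
// Grouping: up to two episodes per show.
print_r(array_column(applyDistinct($hits, 'show_name', 2), 'title'));
```

With distinct set to 1 only GoT S1E1 and BB S1E1 survive; with 2, GoT S1E2 is kept as well, while GoT S1E3 is still dropped.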

Usage notes:

  • For this setting to work, you need to set the distinct attribute in attributeForDistinct.

  • When set to 1, distinct enables de-duplication: among all records that share the same value for the distinct attribute, only the most relevant one is returned. This is similar to the SQL DISTINCT keyword.

  • When set to N (where N > 1), distinct enables grouping: at most N hits are returned for each value of the distinct attribute.

  • If no distinct attribute is configured, distinct is accepted at query time but silently ignored.

  • A value of 0 disables de-duplication and grouping.

  • When using grouping (when distinct > 1):

    • the hitsPerPage parameter controls the number of groups returned, not the number of records. For example, with jobs grouped by company, if hitsPerPage=10 and distinct=3, up to 30 records are returned: 10 companies with at most 3 jobs each. This makes it easy to implement pagination with grouping.
    • the nbHits attribute in the response contains the number of groups.
  • Keep in mind:

    • Distinct is a computationally expensive operation on large data sets, especially when distinct > 1.
    • distinct(true) is the same as distinct(1)
    • distinct(false) is the same as distinct(0), which is the same thing as not specifying distinct at all
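The grouping rules above (hitsPerPage counts groups, nbHits reports the number of groups) can be illustrated with another local PHP sketch. The function paginateGroups, its return shape, and the sample jobs data are hypothetical. It assumes relevance-sorted input and 0-based pages, and it returns each group's records together, which is a simplification rather than a statement about the engine's output order.

```php
<?php
// Hypothetical sketch of pagination with grouping (distinct > 1); not the engine's code.
// Assumes $hits is sorted by relevance (best first) and pages are 0-based.
function paginateGroups(array $hits, string $attribute, int $distinct, int $hitsPerPage, int $page): array
{
    $groups = []; // distinct value => up to $distinct hits, in relevance order
    foreach ($hits as $hit) {
        $value = $hit[$attribute];
        if (!isset($groups[$value])) {
            // The group's first (best) hit fixes the group's position.
            $groups[$value] = [];
        }
        if (count($groups[$value]) < $distinct) {
            $groups[$value][] = $hit;
        }
    }
    // hitsPerPage selects groups, not records.
    $pageGroups = array_slice(array_values($groups), $page * $hitsPerPage, $hitsPerPage);
    return [
        'hits'   => array_merge([], ...$pageGroups), // flatten this page's groups
        'nbHits' => count($groups),                   // number of groups, not records
    ];
}

// 12 companies with 5 jobs each, interleaved by relevance.
$jobs = [];
for ($j = 0; $j < 5; $j++) {
    for ($c = 0; $c < 12; $c++) {
        $jobs[] = ['job' => "job-$c-$j", 'company' => "company-$c"];
    }
}

$page0 = paginateGroups($jobs, 'company', 3, 10, 0);
echo count($page0['hits']), " records, nbHits = ", $page0['nbHits'], PHP_EOL;
// prints "30 records, nbHits = 12"
```

Because pages are cut by groups, the next page (page 1) in this sketch holds the remaining 2 companies, that is 6 records.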

Examples

Set default distinct mode

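$index->setSettings([
  // distinct has no effect unless attributeForDistinct is set
  'attributeForDistinct' => 'show_name',
  'distinct' => 0
  // 'distinct' => 1  // de-duplication: keep the best hit per show_name
  // 'distinct' => 2  // grouping: keep up to 2 hits per show_name
]);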

Override the default distinct mode for a search

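$results = $index->search('query', [
  // Query-time value; overrides the index's default distinct setting for this search only
  'distinct' => 1
  // 'distinct' => 0  // disable distinct for this search
  // 'distinct' => 2  // return up to 2 hits per distinct value
]);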
