java - Elasticsearch: Disable IDF completely for search result scoring
I have this sample data in Elasticsearch:

    {
      "_index": "12_index",
      "_type": "skill_strings",
      "_id": "avkv-km4axmy3feczw9t",
      "_source": { "str": "php php php" }
    },
    {
      "_index": "12_index",
      "_type": "skill_strings",
      "_id": "avkv-knfaxmy3feczw9u",
      "_source": { "str": "javascript php javascript javascript" }
    }

and this query:

    "bool": {
      "must": [
        // conditions
        {"match_phrase": {"str": "php"}}
      ],
      "should": [
        {"match_phrase": {"sentences": "javascript"}}
      ]
    }
Norms are disabled.

In the result set, php (with 16 occurrences) gets a score of 13.65 (rounded off), whereas javascript, with the same number of occurrences in its document, gets a lower score of 9.58.

For my use case, irrespective of how rare a word is or how short or long the field is, I want the same score for the same term frequency.

How can I do this?
If you literally want the first document to score 3.0 for str:php (before score normalization), and the second to score 3.0 for str:javascript (before score normalization), [you should use script_score][1], using the [tf() function][2].
This will bypass (1) length normalization, (2) consideration of 'rarity' (IDF), and (3) normalization of term frequency (TF).
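As a rough sketch of what that could look like for the str:php case: the request below wraps the phrase match in a function_score query and replaces the relevance score with the raw term frequency. It assumes an older Elasticsearch release (1.x era) where the `_index['FIELD']['TERM'].tf()` scripting accessor is available and dynamic scripting is enabled; newer releases changed the script syntax and removed this accessor, so treat this as an illustration, not a tested query for your version.

```json
{
  "query": {
    "function_score": {
      "query": {
        "match_phrase": { "str": "php" }
      },
      "script_score": {
        "script": "_index['str']['php'].tf()"
      },
      "boost_mode": "replace"
    }
  }
}
```

With `"boost_mode": "replace"`, the script's value is used instead of the query's own score, so a document containing php three times in `str` should come out at 3.0 regardless of how rare the term is or how long the field is. The term is hardcoded in the script here, so each term you want to score this way needs its own function (or a script parameter).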