Indexed full-text search count in MongoDB is exremely slow... and what is "TEXT" stage? -


so have mongodb 3.0.7 running on amazon r3.xlarge box (4 cores, 30gb of ram, 80gb ssd) , have stored 40gb worth of semi-complex json data, 7 million documents. looks this:

{   "_id": objectid("5665f1fef8adb3ee597af375"),   "some_other_id": "xxxxxxxxxxxxxxxx",   "meta": {     ...   },   "raw": {     ...     "text": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",     ...   },   "__v": 0,   "created_at": isodate("2015-12-07t20:54:10z") } 

i've indexed 1 text field using following command:

db.messages.createindex({"raw.text":"text"}) 

this index looks like:

    {             "v" : 1,             "key" : {                     "_fts" : "text",                     "_ftsx" : 1             },             "name" : "raw.text_text",             "ns" : "dbdb.messages",             "weights" : {                     "raw.text" : 1             },             "default_language" : "english",             "language_override" : "language",             "textindexversion" : 2     } 

this yielded index size of 3gb 40gb data. in addition, have 4 more indexes on collection, numeric. in total have 5 indexes on collection , total size of index 4gb.

i'm performing count() operation on query such as:

db.messages.find({$text:{$search:"mouse"}}).count() 

what i've been finding time takes count return more or less proportional number of jsons match find() criteria. condition returns bunch of results, 5 million, takes @ least 30 minutes! again, i'm not returning result. i'm trying find out how many of documents have single keyword in it.

this explain(). according link (https://docs.mongodb.org/v3.0/reference/explain-results/) "text" not 1 of stages. don't know means , google search getting me nothing. mention text index i'm hoping full-text index being used in count() operation.

> db.messages.explain().find({$text:{$search:"mouse"}}).count() {         "queryplanner" : {                 "plannerversion" : 1,                 "namespace" : "dbdb.messages",                 "indexfilterset" : false,                 "parsedquery" : {                         "$text" : {                                 "$search" : "mouse",                                 "$language" : ""                         }                 },                 "winningplan" : {                         "stage" : "count",                         "inputstage" : {                                 "stage" : "text",                                 "indexprefix" : {                                  },                                 "indexname" : "raw.text_text",                                 "parsedtextquery" : {                                  }                         }                 },                 "rejectedplans" : [ ]         },         "serverinfo" : {                 "host" : "xxxxxxxxxxxx",                 "port" : 27017,                 "version" : "3.0.7",                 "gitversion" : "6ce7cbe8c6b899552dadd907604559806aa2e9bd"         },         "ok" : 1 } 

i have 30gb of physical ram on box , indexes total 4gb. see when perform these queries, mongod using 8gb. i'm assuming loads indexes entirely in memory. i'd expect every find().count() operation return within seconds if it's running on index. i'm not performing other database operations inserts or other queries. took else out isolate database.

i'm doing wrong here feel i've done according i've read online. i'm new mongodb, though.

finally, the currentop() output after running query 5 mins or so:

> db.currentop() {         "inprog" : [                 {                         "desc" : "conn1",                         "threadid" : "0x316f1e0",                         "connectionid" : 1,                         "opid" : 94,                         "active" : true,                         "secs_running" : 539,                         "microsecs_running" : numberlong(539798074),                         "op" : "query",                         "ns" : "dbdb.messages",                         "query" : {                                 "count" : "messages",                                 "query" : {                                         "$text" : {                                                 "$search" : "mouse"                                         }                                 },                                 "fields" : {                                  }                         },                         "plansummary" : "text {}",                         "client" : "127.0.0.1:37784",                         "numyields" : 3478,                         "locks" : {                                 "global" : "r",                                 "mmapv1journal" : "r",                                 "database" : "r",                                 "collection" : "r"                         },                         "waitingforlock" : false,                         "lockstats" : {                                 "global" : {                                         "acquirecount" : {                                                 "r" : numberlong(6958)                                         }                                 },                                 "mmapv1journal" : {                                         "acquirecount" : {                                                 "r" : numberlong(3479)                                         }                                 },                                 "database" : {                                         "acquirecount" : {                                                 "r" : numberlong(3479)                                         }                                 },                                 "collection" : {                                         "acquirecount" : {                                                 "r" : numberlong(3479)                                         }                                 }                         }                 }         ] } 

edit: adding "executionstats" output edit: upgraded mongodb 3.2.1. these new execution stats. previous execution stats using different search word because "mouse" took way long. time, let "mouse query finish" , can see took 40 mins.

> db.messages.explain("executionstats").find({$text:{$search:"mouse"}}).count() {         "queryplanner" : {                 "plannerversion" : 1,                 "namespace" : "dbdb.messages",                 "indexfilterset" : false,                 "parsedquery" : {                         "$text" : {                                 "$search" : "mouse",                                 "$language" : "english",                                 "$casesensitive" : false,                                 "$diacriticsensitive" : false                         }                 },                 "winningplan" : {                         "stage" : "count",                         "inputstage" : {                                 "stage" : "text",                                 "indexprefix" : {                                  },                                 "indexname" : "raw.text_text",                                 "parsedtextquery" : {                                         "terms" : [                                                 "mous"                                         ],                                         "negatedterms" : [ ],                                         "phrases" : [ ],                                         "negatedphrases" : [ ]                                 },                                 "inputstage" : {                                         "stage" : "text_match",                                         "inputstage" : {                                                 "stage" : "text_or",                                                 "inputstage" : {                                                         "stage" : "ixscan",                                                         "keypattern" : {                                                                 "_fts" : "text",                                                                 "_ftsx" : 1                                                         },                                                         "indexname" : "raw.text_text",                                                         "ismultikey" : true,                                                         "isunique" : false,                                                         "issparse" : false,                                                         "ispartial" : false,                                                         "indexversion" : 1,                                                         "direction" : "backward",                                                         "indexbounds" : {                                                          }                                                 }                                         }                                 }                         }                 },                 "rejectedplans" : [ ]         },         "executionstats" : {                 "executionsuccess" : true,                 "nreturned" : 0,                 "executiontimemillis" : 2305641,                 "totalkeysexamined" : 5984163,                 "totaldocsexamined" : 5984163,                 "executionstages" : {                         "stage" : "count",                         "nreturned" : 0,                         "executiontimemillisestimate" : 2277920,                         "works" : 11968329,                         "advanced" : 0,                         "needtime" : 11968328,                         "needyield" : 0,                         "savestate" : 179094,                         "restorestate" : 179094,                         "iseof" : 1,                         "invalidates" : 0,                         "ncounted" : 5984163,                         "nskipped" : 0,                         "inputstage" : {                                 "stage" : "text",                                 "nreturned" : 5984163,                                 "executiontimemillisestimate" : 2276450,                                 "works" : 11968329,                                 "advanced" : 5984163,                                 "needtime" : 5984165,                                 "needyield" : 0,                                 "savestate" : 179094,                                 "restorestate" : 179094,                                 "iseof" : 1,                                 "invalidates" : 0,                                 "indexprefix" : {                                  },                                 "indexname" : "raw.text_text",                                 "parsedtextquery" : {                                         "terms" : [                                                 "mous"                                         ],                                         "negatedterms" : [ ],                                         "phrases" : [ ],                                         "negatedphrases" : [ ]                                 },                                 "inputstage" : {                                         "stage" : "text_match",                                         "nreturned" : 5984163,                                         "executiontimemillisestimate" : 2275970,                                         "works" : 11968329,                                         "advanced" : 5984163,                                         "needtime" : 5984165,                                         "needyield" : 0,                                         "savestate" : 179094,                                         "restorestate" : 179094,                                         "iseof" : 1,                                         "invalidates" : 0,                                         "docsrejected" : 0,                                         "inputstage" : {                                                 "stage" : "text_or",                                                 "nreturned" : 5984163,                                                 "executiontimemillisestimate" : 2275550,                                                 "works" : 11968329,                                                 "advanced" : 5984163,                                                 "needtime" : 5984165,                                                 "needyield" : 0,                                                 "savestate" : 179094,                                                 "restorestate" : 179094,                                                 "iseof" : 1,                                                 "invalidates" : 0,                                                 "docsexamined" : 5984163,                                                 "inputstage" : {                                                         "stage" : "ixscan",                                                         "nreturned" : 5984163,                                                         "executiontimemillisestimate" : 11520,                                                         "works" : 5984164,                                                         "advanced" : 5984163,                                                         "needtime" : 0,                                                         "needyield" : 0,                                                         "savestate" : 179094,                                                         "restorestate" : 179094,                                                         "iseof" : 1,                                                         "invalidates" : 0,                                                         "keypattern" : {                                                                 "_fts" : "text",                                                                 "_ftsx" : 1                                                         },                                                         "indexname" : "raw.text_text",                                                         "ismultikey" : true,                                                         "isunique" : false,                                                         "issparse" : false,                                                         "ispartial" : false,                                                         "indexversion" : 1,                                                         "direction" : "backward",                                                         "indexbounds" : {                                                          },                                                         "keysexamined" : 5984163,                                                         "dupstested" : 5984163,                                                         "dupsdropped" : 0,                                                         "seeninvalidated" : 0                                                 }                                         }                                 }                         }                 }         },         "serverinfo" : {                 "host" : "xxxxxxxxxxxxxxxxxx",                 "port" : 27017,                 "version" : "3.2.1",                 "gitversion" : "a14d55980c2cdc565d4704a7e3ad37e4e535c1b2"         },         "ok" : 1 } 

i don't believe query using index.

here how looks on machine.

{     "queryplanner" : {         "plannerversion" : 1,         "namespace" : "test.grades",         "indexfilterset" : false,         "parsedquery" : {             "$text" : {                 "$search" : "homework",                 "$language" : "english",                 "$casesensitive" : false,                 "$diacriticsensitive" : false             }         },         "winningplan" : {             "stage" : "count",             "inputstage" : {                 "stage" : "text",                 "indexprefix" : {                  },                 "indexname" : "scores.type_text",                 "parsedtextquery" : {                     "terms" : [                         "homework"                     ],                     "negatedterms" : [ ],                     "phrases" : [ ],                     "negatedphrases" : [ ]                 },                 "inputstage" : {                     "stage" : "text_match",                     "inputstage" : {                         "stage" : "text_or",                         "inputstage" : {                             "stage" : "ixscan",                             "keypattern" : {                                 "_fts" : "text",                                 "_ftsx" : 1                             },                             "indexname" : "scores.type_text",                             "ismultikey" : true,                             "isunique" : false,                             "issparse" : false,                             "ispartial" : false,                             "indexversion" : 1,                             "direction" : "backward",                             "indexbounds" : {                              }                         }                     }                 }             }         },         "rejectedplans" : [ ]     },     "serverinfo" : {         "host" : "xxxxxxxxxxx",         "port" : 27017,         "version" : "3.2.1",         "gitversion" : "a14d55980c2cdc565d4704a7e3ad37e4e535c1b2"     },     "ok" : 1 } 

you can see it's using ixscan. if possible, please drop index , re-create. i'd use ensureindex perform action.

you may want better picture of what's going on more verbosity using db.messages.explain("executionstats").find({$text:{$search:"mouse"}}).count()

edit: index on machine looks like:

{     "v" : 1,     "key" : {         "_fts" : "text",         "_ftsx" : 1     },     "name" : "scores.type_text",     "ns" : "test.grades",     "weights" : {         "scores.type" : 1     },     "default_language" : "english",     "language_override" : "language",     "textindexversion" : 3 } 

Comments

Popular posts from this blog

sublimetext3 - what keyboard shortcut is to comment/uncomment for this script tag in sublime -

java - No use of nillable="0" in SOAP Webservice -

ubuntu - Laravel 5.2 quickstart guide gives Not Found Error -