Indexed full-text search count in MongoDB is exremely slow... and what is "TEXT" stage? -
so have mongodb 3.0.7 running on amazon r3.xlarge box (4 cores, 30gb of ram, 80gb ssd) , have stored 40gb worth of semi-complex json data, 7 million documents. looks this:
{ "_id": objectid("5665f1fef8adb3ee597af375"), "some_other_id": "xxxxxxxxxxxxxxxx", "meta": { ... }, "raw": { ... "text": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", ... }, "__v": 0, "created_at": isodate("2015-12-07t20:54:10z") }
i've indexed 1 text field using following command:
db.messages.createindex({"raw.text":"text"})
this index looks like:
{ "v" : 1, "key" : { "_fts" : "text", "_ftsx" : 1 }, "name" : "raw.text_text", "ns" : "dbdb.messages", "weights" : { "raw.text" : 1 }, "default_language" : "english", "language_override" : "language", "textindexversion" : 2 }
this yielded index size of 3gb 40gb data. in addition, have 4 more indexes on collection, numeric. in total have 5 indexes on collection , total size of index 4gb.
i'm performing count() operation on query such as:
db.messages.find({$text:{$search:"mouse"}}).count()
what i've been finding time takes count return more or less proportional number of jsons match find() criteria. condition returns bunch of results, 5 million, takes @ least 30 minutes! again, i'm not returning result. i'm trying find out how many of documents have single keyword in it.
this explain(). according link (https://docs.mongodb.org/v3.0/reference/explain-results/) "text" not 1 of stages. don't know means , google search getting me nothing. mention text index i'm hoping full-text index being used in count() operation.
> db.messages.explain().find({$text:{$search:"mouse"}}).count() { "queryplanner" : { "plannerversion" : 1, "namespace" : "dbdb.messages", "indexfilterset" : false, "parsedquery" : { "$text" : { "$search" : "mouse", "$language" : "" } }, "winningplan" : { "stage" : "count", "inputstage" : { "stage" : "text", "indexprefix" : { }, "indexname" : "raw.text_text", "parsedtextquery" : { } } }, "rejectedplans" : [ ] }, "serverinfo" : { "host" : "xxxxxxxxxxxx", "port" : 27017, "version" : "3.0.7", "gitversion" : "6ce7cbe8c6b899552dadd907604559806aa2e9bd" }, "ok" : 1 }
i have 30gb of physical ram on box , indexes total 4gb. see when perform these queries, mongod using 8gb. i'm assuming loads indexes entirely in memory. i'd expect every find().count() operation return within seconds if it's running on index. i'm not performing other database operations inserts or other queries. took else out isolate database.
i'm doing wrong here feel i've done according i've read online. i'm new mongodb, though.
finally, the currentop() output after running query 5 mins or so:
> db.currentop() { "inprog" : [ { "desc" : "conn1", "threadid" : "0x316f1e0", "connectionid" : 1, "opid" : 94, "active" : true, "secs_running" : 539, "microsecs_running" : numberlong(539798074), "op" : "query", "ns" : "dbdb.messages", "query" : { "count" : "messages", "query" : { "$text" : { "$search" : "mouse" } }, "fields" : { } }, "plansummary" : "text {}", "client" : "127.0.0.1:37784", "numyields" : 3478, "locks" : { "global" : "r", "mmapv1journal" : "r", "database" : "r", "collection" : "r" }, "waitingforlock" : false, "lockstats" : { "global" : { "acquirecount" : { "r" : numberlong(6958) } }, "mmapv1journal" : { "acquirecount" : { "r" : numberlong(3479) } }, "database" : { "acquirecount" : { "r" : numberlong(3479) } }, "collection" : { "acquirecount" : { "r" : numberlong(3479) } } } } ] }
edit: adding "executionstats" output edit: upgraded mongodb 3.2.1. these new execution stats. previous execution stats using different search word because "mouse" took way long. time, let "mouse query finish" , can see took 40 mins.
> db.messages.explain("executionstats").find({$text:{$search:"mouse"}}).count() { "queryplanner" : { "plannerversion" : 1, "namespace" : "dbdb.messages", "indexfilterset" : false, "parsedquery" : { "$text" : { "$search" : "mouse", "$language" : "english", "$casesensitive" : false, "$diacriticsensitive" : false } }, "winningplan" : { "stage" : "count", "inputstage" : { "stage" : "text", "indexprefix" : { }, "indexname" : "raw.text_text", "parsedtextquery" : { "terms" : [ "mous" ], "negatedterms" : [ ], "phrases" : [ ], "negatedphrases" : [ ] }, "inputstage" : { "stage" : "text_match", "inputstage" : { "stage" : "text_or", "inputstage" : { "stage" : "ixscan", "keypattern" : { "_fts" : "text", "_ftsx" : 1 }, "indexname" : "raw.text_text", "ismultikey" : true, "isunique" : false, "issparse" : false, "ispartial" : false, "indexversion" : 1, "direction" : "backward", "indexbounds" : { } } } } } }, "rejectedplans" : [ ] }, "executionstats" : { "executionsuccess" : true, "nreturned" : 0, "executiontimemillis" : 2305641, "totalkeysexamined" : 5984163, "totaldocsexamined" : 5984163, "executionstages" : { "stage" : "count", "nreturned" : 0, "executiontimemillisestimate" : 2277920, "works" : 11968329, "advanced" : 0, "needtime" : 11968328, "needyield" : 0, "savestate" : 179094, "restorestate" : 179094, "iseof" : 1, "invalidates" : 0, "ncounted" : 5984163, "nskipped" : 0, "inputstage" : { "stage" : "text", "nreturned" : 5984163, "executiontimemillisestimate" : 2276450, "works" : 11968329, "advanced" : 5984163, "needtime" : 5984165, "needyield" : 0, "savestate" : 179094, "restorestate" : 179094, "iseof" : 1, "invalidates" : 0, "indexprefix" : { }, "indexname" : "raw.text_text", "parsedtextquery" : { "terms" : [ "mous" ], "negatedterms" : [ ], "phrases" : [ ], "negatedphrases" : [ ] }, "inputstage" : { "stage" : "text_match", "nreturned" : 5984163, "executiontimemillisestimate" : 2275970, "works" : 11968329, "advanced" : 5984163, "needtime" : 5984165, "needyield" : 0, "savestate" : 179094, "restorestate" : 179094, "iseof" : 1, "invalidates" : 0, "docsrejected" : 0, "inputstage" : { "stage" : "text_or", "nreturned" : 5984163, "executiontimemillisestimate" : 2275550, "works" : 11968329, "advanced" : 5984163, "needtime" : 5984165, "needyield" : 0, "savestate" : 179094, "restorestate" : 179094, "iseof" : 1, "invalidates" : 0, "docsexamined" : 5984163, "inputstage" : { "stage" : "ixscan", "nreturned" : 5984163, "executiontimemillisestimate" : 11520, "works" : 5984164, "advanced" : 5984163, "needtime" : 0, "needyield" : 0, "savestate" : 179094, "restorestate" : 179094, "iseof" : 1, "invalidates" : 0, "keypattern" : { "_fts" : "text", "_ftsx" : 1 }, "indexname" : "raw.text_text", "ismultikey" : true, "isunique" : false, "issparse" : false, "ispartial" : false, "indexversion" : 1, "direction" : "backward", "indexbounds" : { }, "keysexamined" : 5984163, "dupstested" : 5984163, "dupsdropped" : 0, "seeninvalidated" : 0 } } } } } }, "serverinfo" : { "host" : "xxxxxxxxxxxxxxxxxx", "port" : 27017, "version" : "3.2.1", "gitversion" : "a14d55980c2cdc565d4704a7e3ad37e4e535c1b2" }, "ok" : 1 }
i don't believe query using index.
here how looks on machine.
{ "queryplanner" : { "plannerversion" : 1, "namespace" : "test.grades", "indexfilterset" : false, "parsedquery" : { "$text" : { "$search" : "homework", "$language" : "english", "$casesensitive" : false, "$diacriticsensitive" : false } }, "winningplan" : { "stage" : "count", "inputstage" : { "stage" : "text", "indexprefix" : { }, "indexname" : "scores.type_text", "parsedtextquery" : { "terms" : [ "homework" ], "negatedterms" : [ ], "phrases" : [ ], "negatedphrases" : [ ] }, "inputstage" : { "stage" : "text_match", "inputstage" : { "stage" : "text_or", "inputstage" : { "stage" : "ixscan", "keypattern" : { "_fts" : "text", "_ftsx" : 1 }, "indexname" : "scores.type_text", "ismultikey" : true, "isunique" : false, "issparse" : false, "ispartial" : false, "indexversion" : 1, "direction" : "backward", "indexbounds" : { } } } } } }, "rejectedplans" : [ ] }, "serverinfo" : { "host" : "xxxxxxxxxxx", "port" : 27017, "version" : "3.2.1", "gitversion" : "a14d55980c2cdc565d4704a7e3ad37e4e535c1b2" }, "ok" : 1 }
you can see it's using ixscan. if possible, please drop index , re-create. i'd use ensureindex perform action.
you may want better picture of what's going on more verbosity using db.messages.explain("executionstats").find({$text:{$search:"mouse"}}).count()
edit: index on machine looks like:
{ "v" : 1, "key" : { "_fts" : "text", "_ftsx" : 1 }, "name" : "scores.type_text", "ns" : "test.grades", "weights" : { "scores.type" : 1 }, "default_language" : "english", "language_override" : "language", "textindexversion" : 3 }
Comments
Post a Comment