c++ - Why does the unordered_map perform about 100 times slower than a 2d vector for look-ups?


  • I tried implementing the knapsack algorithm for large data sets.

  • The 2d vector solution works for medium data sets of around 100 items.

  • Since the 2d vector won't be feasible for large datasets involving around 1000 items, I decided to use a hashtable and cache only the values required.

  • I have used hash_value(), the boost hash for std::pair, with the unordered_map.

  • But I don't understand why this solution works incredibly slower than the 2d vector solution. Aren't hashtables meant to be super fast for look-ups?

  • Both implementations fail to process the large data set in reasonable time.

  • I've attached the code and both the "medium" and "large" data sets.

  • The code has both the unordered_map and 2d vector implementations, the latter commented out.

  • It would be helpful if you could point out the reason for the slow performance and suggest an optimization so it is able to process the large dataset.

  • The input file is of the following form (e.g.):

6 4 //knapsack weight (capacity), no. of items
3 4
2 3
4 2
4 3

The optimal solution is 8.

Download link for the large dataset (1000 items)

Download link for the medium dataset (100 items)

Download link for the source code

//code follows:

//headers, macros and global variables:

#include<iostream>
#include<vector>
#include<algorithm>
#include<fstream>
#include<string>
#include<sstream>
#include<unordered_map>
#include<boost/functional/hash.hpp>
using namespace std;

typedef vector< vector<long long> > vii;
typedef vector< pair<int,int> > vp;
typedef pair<int,int> mypair;
typedef unordered_map< mypair, long long, boost::hash<mypair> > mymap;

vp elmnts;
//vii arr2d;
mymap arr;

//knapsack function:

long long knapsack(int n, int w)
{
    //arr2d.resize(n+1, vector<long long>(w+1));
    int vi, wi;

    for(int j=0; j<=w; ++j)
    //  arr2d[0][j] = 0;
        arr.emplace(make_pair(0,j), 0);

    for(int i=1; i<=n; ++i)
    {
        vi = elmnts[i-1].first;
        wi = elmnts[i-1].second;

        for(int j=0; j<=w; ++j)
        //  arr2d[i][j] = (wi > j) ? arr2d[i-1][j] : max(arr2d[i-1][j], arr2d[i-1][j-wi] + vi);
            arr.emplace(make_pair(i,j), (wi > j) ? arr[make_pair(i-1,j)] : max(arr[make_pair(i-1,j)], arr[make_pair(i-1,j-wi)] + vi));
    }

    //return arr2d[n][w];
    return arr[make_pair(n,w)];
}

//main function:

int main()
{
    ifstream file("/home/tauseef/desktop/daa2/knapsack1.txt");
    int n, w;
    string line;
    pair<int,int> elmnt;

    getline(file, line);
    stringstream ss(line);
    ss >> w;
    ss >> n;

    while(getline(file, line))
    {
        stringstream ss1(line);
        ss1 >> elmnt.first;
        ss1 >> elmnt.second;
        elmnts.push_back(elmnt);
    }

    cout<<"\nthe optimal solution is: "<<knapsack(n,w)<<endl;
    file.close();
}

I didn't expect the difference to be this huge: on my machine, the array version is about 100 times faster than the hash map version. After thinking about it...

You should expect the map to be slower, because every access carries a lot of overhead: invoking make_pair, constructing the pair object, calculating its hash, searching the map's buckets, constructing the return value, and copying objects back and forth, as opposed to a simple indexed look-up in the array!

On the other hand, you don't profit from the switch to the map at all, because in the end, as coded right now, you have the same elements in the map as in the array. The change would only make sense if you could leave elements out of the map, but you don't.

But the bigger problem in your code is that you use the pseudo-polynomial algorithm (see Wikipedia), which needs O(n*W) memory. That means you would need around 32 GB of memory for the bigger test cases, which means swapping to the hard disk, depending on how big your system is, and getting sloooow.

The solution is to take a version of the algorithm that needs only O(W) memory:

long long knapsack(const vector<mypair> &objects, int cap)
{
    vector<long long> bests(cap+1, -1L); // -1 = weight not reachable
    // the only possible configuration at the start: the empty set, with value 0
    bests[0] = 0;

    for(const mypair &object : objects) // try out all objects
    {
        int v  = object.first;
        int wt = object.second; // must not shadow the capacity parameter!

        // update the possible configurations:
        for(int cur = cap; cur >= 0; --cur) // backwards -> each object at most once in every configuration!
        {
            int next = cur + wt;
            if(bests[cur] != -1 && next <= cap) // consider only reachable configurations and discard too-big weights
                bests[next] = std::max(bests[next], bests[cur] + v);
        }
    }

    return *max_element(bests.begin(), bests.end());
}

The important part is to go backwards through the possible configurations, so that you can update the configurations in place: the configurations you write to have already been passed by the current sweep, so each object is added at most once.

I would guess this version should need less than a minute for the bigger case (which is pretty reasonable considering how big the input is). I don't guarantee it is bug-free, but I hope you can get the gist of it.

