python - Cython dictionary / map
I have a list of (element, label) pairs like this: [(e1, l1), (e2, l2), (e3, l1)]

I have to count how many labels two elements have in common. E.g. in the list above, e1 and e3 both have label l1, so they have 1 label in common.

I have a Python implementation:
```python
from collections import defaultdict

def common_count(e_l_list):
    count = defaultdict(int)
    l_list = defaultdict(set)
    for e1, l in e_l_list:
        for e2 in l_list[l]:
            if e1 == e2:
                continue
            elif e1 > e2:
                count[e1, e2] += 1
            else:
                count[e2, e1] += 1
        l_list[l].add(e1)
    return count
```
It takes a list like the one above and computes a dictionary of element pairs and counts. The result for the list above should be a dict with a single entry counting 1 for the pair (e1, e3).
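As a quick sanity check of the function above, here is a self-contained run on concrete unsigned integers (1, 2, 3 standing in for e1, e2, e3 and 10, 20 for l1, l2 are made-up values for illustration):

```python
from collections import defaultdict

def common_count(e_l_list):
    # same algorithm as above: count shared labels per element pair
    count = defaultdict(int)
    l_list = defaultdict(set)
    for e1, l in e_l_list:
        for e2 in l_list[l]:
            if e1 == e2:
                continue
            elif e1 > e2:
                count[e1, e2] += 1
            else:
                count[e2, e1] += 1
        l_list[l].add(e1)
    return count

# elements 1 and 3 both carry label 10; the code stores the larger element first
result = common_count([(1, 10), (2, 20), (3, 10)])
print(dict(result))  # {(3, 1): 1}
```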
Now I have to scale this to millions of elements and labels, and I thought a Cython solution could save CPU time and memory, but I can't find docs on how to use maps in Cython.

How would I implement the above in pure Cython?

It can be assumed that elements and labels are unsigned integers.

Thanks in advance :-)
I think you are overcomplicating this by creating pairs of elements and storing the number of common labels as the value, when you can create a dict with the element as the key and a list of its labels as the value. When you want to find the common labels, convert the two lists to sets and perform an intersection on them; the resulting set holds the labels the two elements have in common. The average time of an intersection, checked over ~20,000 lists, was 0.006 seconds or faster.
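A minimal sketch of that approach (the function and variable names here are my own, not from the original post):

```python
from collections import defaultdict

def build_label_index(e_l_list):
    # element -> list of labels attached to that element
    labels = defaultdict(list)
    for e, l in e_l_list:
        labels[e].append(l)
    return labels

def common_labels(labels, a, b):
    # convert both label lists to sets and intersect them
    return set(labels[a]) & set(labels[b])

# elements 1 and 3 both carry label 10
idx = build_label_index([(1, 10), (2, 20), (3, 10)])
print(common_labels(idx, 1, 3))       # {10}
print(len(common_labels(idx, 1, 3)))  # 1 label in common
```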
I tested this code:
```python
from collections import defaultdict
import random
import time

l = []
for i in xrange(10000000):
    # with elements in range 0-10000000, dictionary creation time increases to ~16 seconds
    l.append((random.randrange(0, 50000), random.randrange(0, 50000)))

start = time.clock()
d = defaultdict(list)
for i in l:                  # O(n) over the input pairs
    d[i[0]].append(i[1])
print time.clock() - start

times = []
for i in xrange(10000):
    start = time.clock()
    tmp = set(d[random.randrange(0, 50000)])   # picks a random list of labels
    tmp2 = set(d[random.randrange(0, 50000)])  # not guaranteed to be a different list
    common_elements = tmp.intersection(tmp2)
    times.append(time.clock() - start)
print sum(times) / 10000.0
```

with these results:

```
18.6747529999     # creation of the list
4.17812619876     # creation of the dictionary
0.00633531142994  # average intersection time
```
Note: the times change depending on the number of labels. Creating the dict might take long, but in this situation it is a one-time operation.
I highly recommend not creating pairs of elements. If you have 5,000,000 elements and they share at least 1 label, worst case you are looking at 1.24e+13 pairs or, more bluntly, 12.5 trillion pairs. That is ~1700 terabytes, or ~1.7 petabytes, of storage.
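The arithmetic behind those numbers, assuming roughly 140 bytes per dict entry (that per-entry figure is my own ballpark for a tuple key plus int value plus hash-table overhead, not from the original post):

```python
n = 5000000
pairs = n * (n - 1) // 2        # worst case: every element pair shares a label
print(pairs)                    # 12499997500000, i.e. ~1.25e13 or ~12.5 trillion

bytes_per_entry = 140           # assumed overhead of one (tuple key, int) dict entry
petabytes = pairs * bytes_per_entry / 1e15
print(round(petabytes, 2))      # ~1.75 PB under that assumption
```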