python - Cython dictionary / map -


i have list of element, label pairs this: [(e1, l1), (e2, l2), (e3, l1)]

i have count how many labels 2 element have in common - ie. in list above e1and e3have label l1 in common , 1 label in common.

i have python implementation:

def common_count(e_l_list):     count = defaultdict(int)     l_list = defaultdict(set)      e1, l in e_l_list:         e2 in l_list[l]:             if e1 == e2:                 continue             elif e1 > e2:                 count[e1,e2] += 1             else:                 count[e2,e1] += 1           l_list[l].add(e1)      return count 

it takes list 1 above , computes dictionary of element pairs , counts. result list above should give {(e1, e2): 1}

now have scale millions of elements , labels , though cython solution save cpu time , memory. can't find docs on how use maps in cython.

how implement above in pure cython?

it can asumed elements , labels unsigned integers.

thanks in advance :-)

i think trying on complicate creating pairs of elements , storing common labels value when can create dict element key , have list of values associated element. when want find common labels convert lists set , perform intersection on them, resulting set have common labels between two. average time of intersection, checked ~20000 lists, 0.006 or fast

i tested this code

from collections import * import random import time  l =[] in xrange(10000000):     #with element range 0-10000000 dictionary creation time increases ~16 seconds      l.append((random.randrange(0,50000),random.randrange(0,50000)))  start = time.clock()     d = defaultdict(list) in l: #o(n)     d[i[0]].append(i[1]) #o(n)   print time.clock()-start  times = [] in xrange(10000):     start = time.clock()     tmp = set(d[random.randrange(0,50000)]) #picks random list of labels     tmp2 = set(d[random.randrange(0,50000)]) #not guaranteed different list more     times.append(time.clock()-start)     common_elements = tmp.intersection(tmp2)  print sum(times)/100.0  18.6747529999 #creation of list 4.17812619876 #creation of dictionary 0.00633531142994 #intersection 

note: times change depending on number of labels. creating dict might long situation 1 time operation.

i highly not recommend creating pairs of elements. if have 5,000,000 elements , share @ least 1 label, worst case, looking @ 1.24e+13 pairs or, more bluntly, 12.5 trillion. ~1700 terabytes or ~1.7 petabytes


Comments

Popular posts from this blog

sublimetext3 - what keyboard shortcut is to comment/uncomment for this script tag in sublime -

java - No use of nillable="0" in SOAP Webservice -

ubuntu - Laravel 5.2 quickstart guide gives Not Found Error -