python - Finding the most common pair, trio, etc. in a list of 20 random numbers, played 100 times -
so have csv 100 rows, 20 numbers on each row separated comma such:
23, 52, 63, 76, 23, 45, ... 39, 52, 83, 33, 35, 23, ... etc. i write algorithm finds different combinations of pair, , list amount of times has occurred, , same trio, quad, etc.
obviously each line there lot of combinations (since order wouldn't matter)
this answer every line
so 15 combinations per line.. can imagine how many there if 20 instead of 6, let's not worry now.
since there 15, wouldn't want program show me every single combination, "count" higher 1, "count" being number of times have occurred.
so if ran program 2 lines above, 6 numbers instead of 20, program return
23, 52: 2 see how pair occurred more once shown, that's i'd do, easy, not show if count equal 1.
anyway, how started on creating algorithm? don't know start, guess begin scraping every line, , getting every pair, how this?
thank in advance, , no doubt keep hacking @ problem, if find solution post (as code). thank again.
code annotated inline.
import itertools lines = [[23, 52, 63, 76, 23, 45], [39, 52, 83, 33, 35, 23]] # set store unique elements # across lines uniq = set() # list hold dict each line # these dicts contain frequency # of each unique number in line freq = [{} _ in range(len(lines))] index, line in enumerate(lines): j in line: freq[index][j] = freq[index].get(j, 0) + 1 uniq.add(j) # store frequency of each possible pair counter = {} # there k * (k - 1) / 2 combinations, # k number of unique pairs in itertools.combinations(uniq, 2): j in range(len(lines)): freq1 = freq[j].get(i[0], 0) freq2 = freq[j].get(i[1], 0) # multiplying frequencies gives number # of pairs these numbers in line counter[i] = counter.get(i, 0) + (freq1 * freq2) # performs descending sort on pairs sol = sorted(counter.items(), key=lambda value: value[1], reverse=true) print(sol) output:
[((52, 23), 3), ((45, 23), 2), ((23, 63), 2), ((76, 23), 2), ((33, 35), 1), ((76, 45), 1), ((52, 63), 1), ((39, 83), 1), ((76, 52), 1), ((45, 52), 1), ((35, 39), 1), ((39, 23), 1), ((33, 52), 1), ((39, 52), 1), ((33, 23), 1), ((35, 23), 1), ((35, 52), 1), ((76, 63), 1), ((33, 83), 1), ((35, 83), 1), ((33, 39), 1), ((83, 23), 1), ((45, 63), 1), ((83, 52), 1), ((83, 63), 0), ((35, 76), 0), ((33, 76), 0), ((35, 45), 0), ((33, 45), 0), ((45, 83), 0), ((39, 63), 0), ((33, 63), 0), ((39, 76), 0), ((39, 45), 0), ((76, 83), 0), ((35, 63), 0)] beware! algorithm run in ~ o(n^4) if list consists of unique values.
edit: here version should perform better (it maintains uniq set every line, , hence avoids zero-size combinations).
import functools import itertools import operator lines = [[23, 52, 63, 76, 23, 45], [39, 52, 83, 33, 35, 23]] # set variable before use! permlength = 2 uniq = [set() _ in range(len(lines))] freq = [{} _ in range(len(lines))] index, line in enumerate(lines): j in line: freq[index][j] = freq[index].get(j, 0) + 1 uniq[index].add(j) counter = {} print("total number of lines: {}".format(len(lines))) in range(len(lines)): print("line {}...".format(i + 1)) j in itertools.combinations(uniq[i], permlength): freqp = [freq[i].get(j[x], 0) x in range(permlength)] counter[j] = counter.get(j, 0) + functools.reduce(operator.mul, freqp) sol = sorted(counter.items(), key=lambda value: value[1], reverse=true) print(sol) output:
[((52, 23), 3), ((45, 23), 2), ((63, 23), 2), ((76, 23), 2), ((76, 52), 1), ((76, 45), 1), ((52, 63), 1), ((39, 83), 1), ((35, 39), 1), ((45, 52), 1), ((33, 52), 1), ((39, 52), 1), ((33, 23), 1), ((35, 23), 1), ((35, 52), 1), ((39, 23), 1), ((76, 63), 1), ((33, 83), 1), ((33, 39), 1), ((83, 23), 1), ((45, 63), 1), ((83, 52), 1), ((33, 35), 1), ((35, 83), 1)] 
Comments
Post a Comment