pandas - Memory error when running a medium-sized merge function in an IPython (Jupyter) notebook


I'm trying to merge around 100 DataFrames in a loop and am getting a memory error. I'm using an IPython (Jupyter) notebook.

Here is a sample of the data:

        timestamp  namecoin_cap
    0  2013-04-28       5969081
    1  2013-04-29       7006114
    2  2013-04-30       7049003

Each frame is around 1,000 lines long.

Here's the error in detail; I've included the merge function.

My system is using 64% of its memory.

I have searched for similar issues, but they seem to involve large arrays (>1 GB); my data is relatively small in comparison.

Edit: This is suspicious. I wrote a beta program before, to test with 4 DataFrames, and exported the result through pickle; it was 500 KB. When I try to export the 100-frame version I get a memory error, yet it does export a file that is 2 GB. So I suspect that somewhere down the line my code has created some kind of loop, creating a very large file. NB: the 100 frames are stored in a dictionary.

Edit 2: I have exported the script to a .py file:

http://pastebin.com/gqahr7xc

This is the .xlsx that contains the asset names the script needs.

The script fetches data regarding various assets, cleans it up, and saves each asset to a DataFrame in a dictionary.

I'd be really appreciative if someone could have a look and see if there's anything immediately wrong. Otherwise, please advise on what tests I can run.

Edit 3: I'm finding it hard to understand why this is happening. The code worked fine in the beta; all I have done since is add more assets.

Edit 4: I ran a size check on the object (the dict of DataFrames) and it is 1,066,793 bytes.
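For reference, a per-frame size check along these lines can be sketched as follows. This is only a minimal sketch: it assumes the frames sit in a dict named data2 (as in the merge code), the toy frame reuses the sample rows shown earlier, and frame_sizes is a hypothetical helper, not something from the original script.

```python
import pandas as pd

def frame_sizes(frames):
    """Map each name to its DataFrame's memory footprint in bytes."""
    # deep=True also counts the Python strings held in object columns
    return {name: int(df.memory_usage(index=True, deep=True).sum())
            for name, df in frames.items()}

# Toy stand-in for the real dict of frames (rows copied from the sample data)
data2 = {
    "namecoin": pd.DataFrame({
        "timestamp": ["2013-04-28", "2013-04-29", "2013-04-30"],
        "namecoin_cap": [5969081, 7006114, 7049003],
    }),
}

sizes = frame_sizes(data2)
total_bytes = sum(sizes.values())
print(sizes, total_bytes)
```

Running the same helper on the merged frame after each merge step would show at which coin the size starts to grow.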

Edit 5: The problem is in the merge function, at coin 37:

    for coin in coins[:37]:
        data2['merged'] = pd.merge(left=data2['merged'], right=data2[coin], left_on='timestamp', right_on='timestamp', how='left')

This is when the error occurs. 'for coin in coins[:36]:' doesn't produce an error, however 'for coin in coins[:37]:' does. Any ideas?

Edit 6: The 36th element is 'syscoin'. I did coins.remove('syscoin'), but the memory problem still occurs. So it seems to be a problem with the 36th element in coins, no matter what the coin is.
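One test that could be run here is a check for repeated timestamps. When the merge key is not unique, pd.merge emits one row per matching pair, so the row count can multiply with every merge. Whether that is what happens in this script is an assumption, not something the post confirms. The sketch below uses toy data; report_duplicate_keys and the 'dupcoin' frame are hypothetical.

```python
import pandas as pd

def report_duplicate_keys(frames, key="timestamp"):
    """Return {name: count of rows whose key repeats an earlier row}."""
    report = {}
    for name, df in frames.items():
        n_dupes = int(df[key].duplicated().sum())
        if n_dupes:
            report[name] = n_dupes
    return report

# Toy frames: 'dupcoin' (hypothetical) lists the same timestamp twice
frames = {
    "namecoin": pd.DataFrame({
        "timestamp": ["2013-04-28", "2013-04-29"],
        "namecoin_cap": [5969081, 7006114],
    }),
    "dupcoin": pd.DataFrame({
        "timestamp": ["2013-04-28", "2013-04-28"],
        "dupcoin_cap": [1, 2],
    }),
}

dupes = report_duplicate_keys(frames)

# A left merge against the frame with the repeated key returns more rows
# than the left frame had: the 2013-04-28 row matches twice.
merged = pd.merge(frames["namecoin"], frames["dupcoin"],
                  on="timestamp", how="left")
print(dupes, len(frames["namecoin"]), len(merged))
```

If the report comes back empty for the real data, duplicate keys can be ruled out and the cause lies elsewhere.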

Edit 7: The suggestions from gocards seemed to work, but the next part of the code:

    merged = data2['merged']
    merged['total_mc'] = merged.drop('timestamp', axis=1).sum(axis=1)

produces a memory error. I'm stumped.

In regard to storage, I would recommend using a simple CSV over pickle. CSV is a much more generic format. It is human-readable, and you can check your data quality more easily, especially as your data grows.

    # write each DataFrame in the dict to its own CSV file
    file_template_string = '%s.csv'
    for eachkey in dfdict:
        filename = file_template_string % (eachkey)
        dfdict[eachkey].to_csv(filename)

If you need to date the files, you can also put a timestamp in the filename.

    import time
    from datetime import datetime

    cur = time.time()
    cur = datetime.fromtimestamp(cur)
    # the %s placeholder is kept for the key; {0} is filled with the timestamp
    file_template_string = "%s_{0}.csv".format(cur.strftime("%m_%d_%Y_%H_%M_%S"))

There are some obvious errors in your code.

    # lines 61 and 89:
    #     for coin in coins:
    # should be:
    #     for coin in data:

    df = data2['namecoin']  # line 87
    keys = list(data2.keys())  # list() so that .remove() also works on Python 3
    keys.remove('namecoin')
    for coin in keys:
        df = pd.merge(left=df, right=data2[coin], left_on='timestamp', right_on='timestamp', how='left')
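As an alternative to merging in a loop, the frames could be aligned on the timestamp in a single pd.concat call. This is a different technique from the left merge above: it performs an outer join, and it drops repeated timestamps first because concat along axis=1 needs unique index values. Both choices are assumptions about what the data should look like. combine_on_timestamp and the toy frames are hypothetical.

```python
import pandas as pd

def combine_on_timestamp(frames, key="timestamp"):
    """Align all frames on `key` in one step (outer join)."""
    indexed = [
        # keep the first row per timestamp so each index is unique
        df.drop_duplicates(subset=key).set_index(key)
        for df in frames.values()
    ]
    return pd.concat(indexed, axis=1, join="outer").sort_index()

# Toy frames with partly overlapping dates
frames = {
    "a": pd.DataFrame({"timestamp": ["2013-04-28", "2013-04-29"],
                       "a_cap": [1, 2]}),
    "b": pd.DataFrame({"timestamp": ["2013-04-29", "2013-04-30"],
                       "b_cap": [10, 20]}),
}

combined = combine_on_timestamp(frames)
# Row-wise total across all coins; missing values are skipped by sum()
total_mc = combined.sum(axis=1)
print(combined.shape, list(total_mc))
```

Because the timestamp becomes the index, the total can be computed without first dropping a 'timestamp' column.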
