python 2.7 - Convert from irregular frequency to monthly within groupby objects in pandas dataframe -


i have dataframe df containing userid, date of observation (usually quarterly frequency, irregular) , characteristic value, example:

from util.dates import dates, to_date import pandas pd df = pd.dataframe(dict(      refissuerid=[11590] * 3 + [115948] * 4,      availabledate=[to_date(d) d in (20050613, 20050905, 20051214,                     20040924, 20041101, 20050202,20050516)],       characteristic=[0.06, 0.09, 0.07, 0.13, 0.09, 0.06, 0.04]))  userid     date    characteristic 115950  6/13/2005   0.06 115950  9/5/2005    0.09 115950  12/14/2005  0.07 115948  9/24/2004   0.13 115948  11/1/2004   0.09 115948  2/2/2005    0.06 115948  5/16/2005   0.04 

i trying upsample monthly frequency within userid groups. is, looking smth (sorted userid , date)

userid  date    characteristic  month_date 115950  6/13/2005   0.06    6/30/2005 115950  6/13/2005   0.06    7/31/2005 115950  6/13/2005   0.06    8/31/2005 115950  9/5/2005    0.09    9/30/2005 115950  9/5/2005    0.09    10/31/2005 115950  9/5/2005    0.09    11/30/2005 115950  12/14/2005  0.07    12/31/2005 115950  12/14/2005  0.07    1/31/2006 115950  12/14/2005  0.07    2/28/2006 115948  9/24/2004   0.13    9/30/2004 115948  9/24/2004   0.13    10/31/2004 115948  11/1/2004   0.09    11/30/2004 115948  11/1/2004   0.09    12/31/2004 115948  11/1/2004   0.09    1/31/2005 115948  2/2/2005    0.06    2/28/2005 115948  2/2/2005    0.06    3/31/2005 115948  2/2/2005    0.06    4/30/2005 115948  5/16/2005   0.04    5/31/2005 115948  5/16/2005   0.04    6/30/2005 115948  5/16/2005   0.04    7/31/2005 

note record 115948 9/24/2004 0.13 gets upsampled twice because next available date 11/1/2004, generates month_date of 11/30/2004 in upsampled set.

tried applying resample on groupby dataframe:

newdf=df.groupby(['userid']).resample("m",fill_method='ffill') 

but not produce desired result. guidance/advice appreciated.

you can use resample reset_index:

import pandas pd  df_dg = pd.dataframe(dict(      userid=[11590] * 3 + [115948] * 4,      date=[20050613, 20050905, 20051214,                     20040924, 20041101, 20050202,20050516],       characteristic=[0.06, 0.09, 0.07, 0.13, 0.09, 0.06, 0.04]), columns=['userid','date','characteristic'])   df_dg['date'] = pd.to_datetime(df_dg['date'], format="%y%m%d") print df_dg    userid       date  characteristic 0   11590 2005-06-13            0.06 1   11590 2005-09-05            0.09 2   11590 2005-12-14            0.07 3  115948 2004-09-24            0.13 4  115948 2004-11-01            0.09 5  115948 2005-02-02            0.06 6  115948 2005-05-16            0.04   df_dg['date1'] = df_dg['date']  newdf = df_dg.groupby('userid').apply(lambda x: x.set_index('date').resample('m', how='first',fill_method='ffill')).reset_index(drop=true, level=0).reset_index() newdf = newdf.rename(columns={'date':'month_date', 'date1':'date'}) newdf = newdf[['userid','date','characteristic','month_date']] 
print newdf     userid       date  characteristic month_date 0    11590 2005-06-13            0.06 2005-06-30 1    11590 2005-06-13            0.06 2005-07-31 2    11590 2005-06-13            0.06 2005-08-31 3    11590 2005-09-05            0.09 2005-09-30 4    11590 2005-09-05            0.09 2005-10-31 5    11590 2005-09-05            0.09 2005-11-30 6    11590 2005-12-14            0.07 2005-12-31 7   115948 2004-09-24            0.13 2004-09-30 8   115948 2004-09-24            0.13 2004-10-31 9   115948 2004-11-01            0.09 2004-11-30 10  115948 2004-11-01            0.09 2004-12-31 11  115948 2004-11-01            0.09 2005-01-31 12  115948 2005-02-02            0.06 2005-02-28 13  115948 2005-02-02            0.06 2005-03-31 14  115948 2005-02-02            0.06 2005-04-30 15  115948 2005-05-16            0.04 2005-05-31 

Comments

Popular posts from this blog

sublimetext3 - what keyboard shortcut is to comment/uncomment for this script tag in sublime -

java - No use of nillable="0" in SOAP Webservice -

ubuntu - Laravel 5.2 quickstart guide gives Not Found Error -