python - Computing differences accross rows with multiple index columns -

i have dataframe 1 column representing time, , additional columns representing other parts of key.

df = pd.dataframe(data=[(t, l1, l2, t * t * (1 + l2 + l1))                          t in range(3)                          l1 in [3, 4]                          l2 in [10, 100]],                    columns=['t', 'l1', 'l2', 'x'])      t   l1  l2  x 0   0   3   10  0 1   0   3   100 0 2   0   4   10  0 3   0   4   100 0 4   1   3   10  14 5   1   3   100 104 6   1   4   10  15 7   1   4   100 105 8   2   3   10  56 9   2   3   100 416 10  2   4   10  60 11  2   4   100 420

i'm looking difference in 'x' column row previous value of 't', same values 'l1', , 'l2'.

    t   l1  l2  x   t.1 delta_x 0   0   3   10  0   1   nan 1   0   3   100 0   1   nan 2   0   4   10  0   1   nan 3   0   4   100 0   1   nan 4   1   3   10  14  2   14.0 5   1   3   100 104 2   104.0 6   1   4   10  15  2   15.0 7   1   4   100 105 2   105.0 8   2   3   10  56  3   42.0 9   2   3   100 416 3   312.0 10  2   4   10  60  3   45.0 11  2   4   100 420 3   315.0

i can generate frame following code.

df['t.1'] = df.t + 1 df['delta_x'] = df.x - df.merge(df, left_on=['t', 'l1', 'l2'],                                  right_on=['t.1', 'l1', 'l2'],                                  how='left',                                  suffixes=['','.1'])['x.1']

is there cleaner or more efficient way this?

you must use groupby on l1 , l2 columns want compare difference of x column pair of these values(l1, l2) depending on change in value of t column.

by default, diff computes difference between value of (t=1) , (t=0) grouped l1 & l2 , returns result. so, if want find difference in x values between (t=2) , (t=0), need diff(periods=2).

and, use tranform method return computed diffs within each group of group chunk.

in [3]: df['delta_x'] = df.groupby(['l1', 'l2'])['x'].transform(lambda x: x.diff())  in [4]: df out[4]:      t  l1   l2    x  delta_x 0   0   3   10    0      nan 1   0   3  100    0      nan 2   0   4   10    0      nan 3   0   4  100    0      nan 4   1   3   10   14     14.0 5   1   3  100  104    104.0 6   1   4   10   15     15.0 7   1   4  100  105    105.0 8   2   3   10   28     14.0 9   2   3  100  208    104.0 10  2   4   10   30     15.0 11  2   4  100  210    105.0

timing constraints:

in [5]: %timeit df['delta_x'] = df.groupby(['l1', 'l2'])['x'].transform(lambda x: x.diff()) 1000 loops, best of 3: 1.55 ms per loop  in [17]: %timeit df['delta_x'] = df.x - df.merge(df, left_on=['t', 'l1', 'l2'], right_on=['t.1', 'l1', 'l2'],how='left',suffixes=['','.1'])['x.1'] 100 loops, best of 3: 3.33 ms per loop

Search This Blog

Ben

python - Computing differences accross rows with multiple index columns -

Comments

Post a Comment

Popular posts from this blog

sublimetext3 - what keyboard shortcut is to comment/uncomment for this script tag in sublime -

post - imageshack API cURL -

dataset - MPAndroidchart returning no chart Data available -