python - Computing differences accross rows with multiple index columns -
i have dataframe 1 column representing time, , additional columns representing other parts of key.
df = pd.dataframe(data=[(t, l1, l2, t * t * (1 + l2 + l1)) t in range(3) l1 in [3, 4] l2 in [10, 100]], columns=['t', 'l1', 'l2', 'x']) t l1 l2 x 0 0 3 10 0 1 0 3 100 0 2 0 4 10 0 3 0 4 100 0 4 1 3 10 14 5 1 3 100 104 6 1 4 10 15 7 1 4 100 105 8 2 3 10 56 9 2 3 100 416 10 2 4 10 60 11 2 4 100 420
i'm looking difference in 'x' column row previous value of 't', same values 'l1', , 'l2'.
t l1 l2 x t.1 delta_x 0 0 3 10 0 1 nan 1 0 3 100 0 1 nan 2 0 4 10 0 1 nan 3 0 4 100 0 1 nan 4 1 3 10 14 2 14.0 5 1 3 100 104 2 104.0 6 1 4 10 15 2 15.0 7 1 4 100 105 2 105.0 8 2 3 10 56 3 42.0 9 2 3 100 416 3 312.0 10 2 4 10 60 3 45.0 11 2 4 100 420 3 315.0
i can generate frame following code.
df['t.1'] = df.t + 1 df['delta_x'] = df.x - df.merge(df, left_on=['t', 'l1', 'l2'], right_on=['t.1', 'l1', 'l2'], how='left', suffixes=['','.1'])['x.1']
is there cleaner or more efficient way this?
you must use groupby
on l1
, l2
columns want compare difference of x
column pair of these values(l1, l2
) depending on change in value of t
column.
by default, diff
computes difference between value of (t=1
) , (t=0
) grouped l1 & l2
, returns result. so, if want find difference in x
values between (t=2
) , (t=0
), need diff(periods=2)
.
and, use tranform
method return computed diffs within each group of group chunk.
in [3]: df['delta_x'] = df.groupby(['l1', 'l2'])['x'].transform(lambda x: x.diff()) in [4]: df out[4]: t l1 l2 x delta_x 0 0 3 10 0 nan 1 0 3 100 0 nan 2 0 4 10 0 nan 3 0 4 100 0 nan 4 1 3 10 14 14.0 5 1 3 100 104 104.0 6 1 4 10 15 15.0 7 1 4 100 105 105.0 8 2 3 10 28 14.0 9 2 3 100 208 104.0 10 2 4 10 30 15.0 11 2 4 100 210 105.0
timing constraints:
in [5]: %timeit df['delta_x'] = df.groupby(['l1', 'l2'])['x'].transform(lambda x: x.diff()) 1000 loops, best of 3: 1.55 ms per loop in [17]: %timeit df['delta_x'] = df.x - df.merge(df, left_on=['t', 'l1', 'l2'], right_on=['t.1', 'l1', 'l2'],how='left',suffixes=['','.1'])['x.1'] 100 loops, best of 3: 3.33 ms per loop
Comments
Post a Comment