Python & Pandas: series to timedelta
The column m in my DataFrame df holds a number of months.
    m
    1
    0
    15
I am trying to find the number of days between 2015-01-01 and 2015-01-01 + df.m. This is the column I want to get:
    daynum
    31
    0
    456
I know how to do this using a loop and a list:
    int((datetime.strptime("2015-01-01", "%Y-%m-%d") + relativedelta(months=df.m[i]) - datetime.strptime("2015-01-01", "%Y-%m-%d")).days)
Is there a built-in function in pandas that can solve this problem easily?
You can use the same approach as in the question, but with automatic vectorized operations instead of looping.
First convert the Series of integers to relativedeltas:
    In [76]: m = pd.Series([1, 0, 15])

    In [77]: m2 = m.apply(lambda x: dateutil.relativedelta.relativedelta(months=x))

    In [78]: m2
    Out[78]:
    0              relativedelta(months=+1)
    1                       relativedelta()
    2    relativedelta(years=+1, months=+3)
    dtype: object
Then you can do the same calculation:
    In [80]: (pd.Timestamp('2015-01-01') + m2) - pd.Timestamp('2015-01-01')
    Out[80]:
    0    31 days
    1     0 days
    2   456 days
    dtype: timedelta64[ns]
If you want integer values instead of the timedeltas above, you can use .dt.days:
    In [81]: days = (pd.Timestamp('2015-01-01') + m2) - pd.Timestamp('2015-01-01')

    In [82]: days.dt.days
    Out[82]:
    0     31
    1      0
    2    456
    dtype: int64
Reason not to use timedelta
In this case you cannot work with a timedelta, because it does not shift the date by a given number of calendar months; it appears to use a kind of mean month length:
    In [83]: pd.to_timedelta(1, unit='M')
    Out[83]: Timedelta('30 days 10:29:06')

    In [84]: (pd.Timestamp('2015-01-01') + pd.to_timedelta(m, unit='M')) - pd.Timestamp('2015-01-01')
    Out[84]:
    0    30 days 10:29:06
    1     0 days 00:00:00
    2   456 days 13:16:30
    dtype: timedelta64[ns]
So the two approaches give different answers. For example, in this case the timedelta gives 30 days instead of 31 for the first element.
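To make the difference concrete, here is a minimal sketch that reproduces the 30 days 10:29:06 figure without relying on the month unit of to_timedelta (which newer pandas releases may reject). It assumes that the "mean month" is one twelfth of the mean Gregorian year of 365.2425 days; that is an inference from the output above, not something the pandas documentation is quoted on here. The names MEAN_MONTH, fixed_days and calendar_days are made up for the illustration.

```python
from datetime import datetime, timedelta

from dateutil.relativedelta import relativedelta

# Assumption: a "mean month" is 1/12 of the mean Gregorian year
# (365.2425 days = 31,556,952 seconds), i.e. 2,629,746 seconds.
MEAN_MONTH = timedelta(seconds=31_556_952 // 12)

start = datetime(2015, 1, 1)

# Fixed-length shift: lands partway through 31 January.
fixed_days = ((start + MEAN_MONTH) - start).days

# Calendar shift: lands exactly on 1 February.
calendar_days = ((start + relativedelta(months=1)) - start).days

print(MEAN_MONTH)                 # 30 days, 10:29:06
print(fixed_days, calendar_days)  # 30 31
```

The fixed-length shift truncates to 30 whole days, while the calendar shift counts all 31 days of January, which is why the two results disagree for the first element.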
The pandas equivalent of relativedelta
Use DateOffset, in this case e.g. pd.DateOffset(months=1).
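As a rough sketch of how the whole calculation might look with DateOffset, the example below builds one offset per row and subtracts the start date. The DataFrame contents and the daynum column name come from the question; the variable names start and shifted are only illustrative, and this is one possible way to write it rather than the only one.

```python
import pandas as pd

start = pd.Timestamp("2015-01-01")
df = pd.DataFrame({"m": [1, 0, 15]})

# Shift the start date by each month count using a calendar-aware offset.
# int(n) guards against NumPy integer types being passed to DateOffset.
shifted = pd.Series(
    [start + pd.DateOffset(months=int(n)) for n in df["m"]],
    index=df.index,
)

# Subtracting the start date gives timedeltas; .dt.days extracts integers.
df["daynum"] = (shifted - start).dt.days

print(df["daynum"].tolist())  # [31, 0, 456]
```

Each offset is still created row by row, so this is not faster than the relativedelta version; the benefit is that it stays within pandas and avoids the extra dateutil import.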