pandas - Timestamp String to Seconds Conversion in Python -


i'm beginner in python data science. i'm working on clickstream data , want find out duration of session. find start time , end time of session. on subtraction, i'm getting wrong answer same. here data

        sid                    tstamp     itemid  category 0        1  2014-04-07t10:51:09.277z  214536502         0 1        1  2014-04-07t10:54:09.868z  214536500         0 2        1  2014-04-07t10:54:46.998z  214536506         0 3        1  2014-04-07t10:57:00.306z  214577561         0 4        2  2014-04-07t13:56:37.614z  214662742         0 5        2  2014-04-07t13:57:19.373z  214662742         0 6        2  2014-04-07t13:58:37.446z  214825110         0 7        2  2014-04-07t13:59:50.710z  214757390         0 8        2  2014-04-07t14:00:38.247z  214757407         0 9        2  2014-04-07t14:02:36.889z  214551617         0 10       3  2014-04-02t13:17:46.940z  214716935         0 11       3  2014-04-02t13:26:02.515z  214774687         0 12       3  2014-04-02t13:30:12.318z  214832672         0 

i referred question code- timestamp conversion

here code-

k.columns=['sid','tstamp','itemid','category'] k=k.loc[:,('sid','tstamp')]  #find max timestamp idx=k.groupby(['sid'])['tstamp'].transform(max) == k['tstamp']  ah=k[idx].reset_index()  #find min timestamp idy=k.groupby(['sid'])['tstamp'].transform(min) == k['tstamp'] ai=k[idy].reset_index()  #grouping sid , applying count retain distinct sid values kgrp=k.groupby('sid').count()  i=0 temp1,temp2 in zip(ah['tstamp'],ai['tstamp']):     sv1= datetime.datetime.strptime(temp1, "%y-%m-%dt%h:%m:%s.%fz")     sv2= datetime.datetime.strptime(temp2, "%y-%m-%dt%h:%m:%s.%fz")     d1=time.mktime(sv1.timetuple()) + (sv1.microsecond / 1000000.0)     d2=time.mktime(sv2.timetuple()) + (sv2.microsecond / 1000000.0)     kgrp.loc[i,'duration']= d1-d2        i=i+1 

here output.

  kgrp out[5]:        tstamp  duration sid                    1          4   359.275 2          6   745.378 3          3  1034.468 

for session id 2, duration should close 6 minutes i'm getting 12 minutes. reckon i'm making silly mistake here.

also, i'm grouping sid , applying count on sid column , store each duration separate column. there easier method through can store sid (not 'tstamp' count column) , duration values?

you assigning duration value wrong label. in test data sid starts 1 i starts 0:

# sid 1, == 0 kgrp.loc[i,'duration']= d1-d2 i=i+1 

update

a more pythonic way handle :)

def calculate_duration(dt1, dt2):     # calculation here, return duration in seconds  k = k.loc[:, ('sid', 'tstamp')] result = k.groupby(['sid'])['tstamp'].agg({     'duration': lambda x: calculate_duration(x.max(), x.min()),     'count': lambda x: x.count() }) 

Comments

Popular posts from this blog

sublimetext3 - what keyboard shortcut is to comment/uncomment for this script tag in sublime -

java - No use of nillable="0" in SOAP Webservice -

ubuntu - Laravel 5.2 quickstart guide gives Not Found Error -