python - How to compare group sizes in pandas

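Maybe I'm thinking of this the wrong way, but I can't think of an easy way to do this in pandas. I am trying to filter a dataframe by the relation between the number of values above a setpoint and the number below it, and it is further complicated because the comparison has to be made within each group.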

Here is a contrived example: let's say I have a dataset of people and their scores on several tests:

    person | day | test score
    --------------------------
    bob    | 1   | 10
    bob    | 2   | 40
    bob    | 3   | 45
    mary   | 1   | 30
    mary   | 2   | 35
    mary   | 3   | 45

I want to filter the dataframe by the number of test scores >= 40 compared to the total for each person. Let's set the threshold at 50%. Bob has 2/3 of his test scores at or above 40, but Mary has only 1/3, so she would be excluded.
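The ratio described above can be computed directly: comparing each score with the cutoff gives a boolean Series, and the mean of a boolean group is the fraction of True values in it. A minimal sketch on the question's sample data (the cutoff of 40 and the 50% threshold come from the question; the variable names are only illustrative):

```python
import pandas as pd

# Sample data from the question (three tests each for bob and mary)
df = pd.DataFrame({'person': ['bob'] * 3 + ['mary'] * 3,
                   'day': [1, 2, 3, 1, 2, 3],
                   'test_score': [10, 40, 45, 30, 35, 45]})

# True where a score meets the cutoff; grouping by person and taking the
# mean gives each person's share of scores >= 40
share_above = df['test_score'].ge(40).groupby(df['person']).mean()

# Keep only the people whose share reaches the 50% threshold
keep = share_above[share_above >= 0.5].index
print(share_above)
print(list(keep))
```

Here Bob's share is 2/3 and Mary's is 1/3, so only Bob is kept.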

My end goal is to have a groupby object on which I can compute means etc. for the people who matched the threshold. In this case it would look like this:

             test score
    person | above_count | total | score mean
    ------------------------------------------
    bob    | 2           | 3     | 31.67
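One way to get a groupby object restricted to the people who pass the threshold is `DataFrameGroupBy.filter`, which keeps all rows of every group for which the function returns True; the filtered frame can then be grouped again for the summary. A sketch, again assuming the question's data and the ">= 40 in at least 50% of tests" rule:

```python
import pandas as pd

df = pd.DataFrame({'person': ['bob'] * 3 + ['mary'] * 3,
                   'day': [1, 2, 3, 1, 2, 3],
                   'test_score': [10, 40, 45, 30, 35, 45]})

# Keep all rows of each person whose share of scores >= 40 is at least 50%
matched = df.groupby('person').filter(
    lambda g: g['test_score'].ge(40).mean() >= 0.5)

# Regroup the surviving rows to get the desired groupby object
gb = matched.groupby('person')
summary = gb['test_score'].agg(
    above_count=lambda s: s.ge(40).sum(),  # scores at or above the cutoff
    total='count',                         # number of scores per person
    score_mean='mean')                     # average score per person
print(summary)
```

`gb` can be reused for any other per-person statistic, since it only contains the rows of people who matched.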

I have tried the following, but couldn't figure out what to do with the groupby objects.

    import pandas as pd

    df = pd.read_csv("all_data.csv")
    gb = df.groupby('person')
    df2 = df[df['test_score'] >= 40]
    gb2 = df2.groupby('person')
    # This gives me the count for each person, but how do I compare it?
    gb.size()
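Continuing that attempt: both `size()` results are Series indexed by person, so they can be divided directly. A person with no scores >= 40 would be missing from `gb2.size()`, which is why the sketch below aligns it with `reindex(..., fill_value=0)`. The question's sample rows stand in for `all_data.csv` here:

```python
import pandas as pd

# Stand-in for pd.read_csv("all_data.csv") using the question's sample rows
df = pd.DataFrame({'person': ['bob'] * 3 + ['mary'] * 3,
                   'day': [1, 2, 3, 1, 2, 3],
                   'test_score': [10, 40, 45, 30, 35, 45]})

gb = df.groupby('person')
gb2 = df[df['test_score'] >= 40].groupby('person')

# Count per person in each grouping; align the "above" counts to every
# person so that anyone with zero matching scores gets 0 instead of NaN
total = gb.size()
above = gb2.size().reindex(total.index, fill_value=0)

ratio = above / total
selected = ratio[ratio >= 0.5].index

# Group only the selected people's rows
gb_selected = df[df['person'].isin(selected)].groupby('person')
print(gb_selected['test_score'].mean())
```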

For example, with this sample data (note that Mary has a fourth score of 55 here):

    import pandas as pd

    df = pd.DataFrame({'person': ['bob'] * 3 + ['mary'] * 4,
                       'day': [1, 2, 3, 1, 2, 3, 4],
                       'test_score': [10, 40, 45, 30, 35, 45, 55]})

    >>> df
      person  day  test_score
    0    bob    1          10
    1    bob    2          40
    2    bob    3          45
    3   mary    1          30
    4   mary    2          35
    5   mary    3          45
    6   mary    4          55

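In a groupby operation, you can apply several functions to the same column and give each result its own name. Older pandas accepted a dictionary for this, but renaming through a dictionary on a single column was deprecated in pandas 0.20 and raises an error from pandas 1.0 on. Named aggregation (keyword arguments to agg(), available since pandas 0.25) does the same thing:

    result = df.groupby('person').test_score.agg(
        above_count=lambda s: s.ge(40).sum(),  # number of scores >= 40
        total='count',                         # number of scores
        score_mean='mean')                     # average score

    >>> result
            above_count  total  score_mean
    person
    bob               2      3   31.666667
    mary              2      4   41.250000

Then keep only the people for whom more than half of the scores are >= 40:

    >>> result[result.above_count.gt(result.total * .5)]
            above_count  total  score_mean
    person
    bob               2      3   31.666667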
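If the original rows are needed afterwards (for example to compute statistics not included in the aggregation), `transform` broadcasts each person's share back onto that person's rows, which gives a boolean mask for the whole frame. A sketch on the answer's data; Mary's share there is exactly 2/4 = 50%, so whether she passes depends on using a strict `gt` (as in the answer) or an inclusive `ge`:

```python
import pandas as pd

df = pd.DataFrame({'person': ['bob'] * 3 + ['mary'] * 4,
                   'day': [1, 2, 3, 1, 2, 3, 4],
                   'test_score': [10, 40, 45, 30, 35, 45, 55]})

# Per-row value: the share of that person's scores that are >= 40
share = df['test_score'].ge(40).groupby(df['person']).transform('mean')

strict = df[share.gt(0.5)]     # more than half, as in the answer: bob only
inclusive = df[share.ge(0.5)]  # at least half: bob and mary (exactly 2/4)

print(strict.groupby('person')['test_score'].mean())
print(inclusive.groupby('person')['test_score'].mean())
```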


Ben
