python - How to compare group sizes in pandas
Maybe I'm thinking about this the wrong way, but I cannot think of an easy way to do it in pandas. I am trying to get a dataframe that is filtered by the relation between the count of values above a setpoint and the count below it. It is further complicated in practice.
A contrived example: let's say I have a dataset of people and their test scores on several tests:
person | day | test score |
----------------------------
bob      1     10
bob      2     40
bob      3     45
mary     1     30
mary     2     35
mary     3     45
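I want to filter the dataframe by the number of test scores >= 40 compared to the total for each person. Let's set the threshold at 50%. Bob would have 2/3 of his test scores at or above 40, but Mary would have only 1/3 and would be excluded.
My end goal is to have a groupby object to compute means, etc. on the people who matched the threshold. In this case it would look like this: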
                       test score
person | above_count | total | score mean |
-------------------------------------------
bob      2             3       31.67
I have tried the following but couldn't figure out what to do with the groupby object.

    import pandas as pd

    df = pd.read_csv("all_data.csv")
    gb = df.groupby('person')
    df2 = df[df['test_score'] >= 40]
    gb2 = df2.groupby('person')
    # this gives me the count for each person, but how do I compare it?
    gb.size()
    import pandas as pd

    df = pd.DataFrame({'person': ['bob'] * 3 + ['mary'] * 4,
                       'day': [1, 2, 3, 1, 2, 3, 4],
                       'test_score': [10, 40, 45, 30, 35, 45, 55]})

    >>> df
      person  day  test_score
    0    bob    1          10
    1    bob    2          40
    2    bob    3          45
    3   mary    1          30
    4   mary    2          35
    5   mary    3          45
    6   mary    4          55
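In a groupby operation, you can apply different functions to the same column via a dictionary that maps output names to functions. pandas 1.0 and later no longer accept that dictionary as a positional argument on a single column, so it is unpacked into keyword arguments here.

    result = df.groupby('person')['test_score'].agg(**{
        'total': 'count',
        # number of scores at or above the setpoint of 40
        'test_score_above_mean': lambda s: s.ge(40).sum(),
        'score mean': 'mean'})

    >>> result
            total  test_score_above_mean  score mean
    person
    bob         3                      2   31.666667
    mary        4                      2   41.250000

    # keep only the people with more than 50% of their scores at or above 40
    >>> result[result.test_score_above_mean.gt(result.total * .5)]
            total  test_score_above_mean  score mean
    person
    bob         3                      2   31.666667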
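If the end goal is a groupby object over only the people who pass the threshold, rather than a summary table, one possible sketch is to drop the failing groups with `groupby(...).filter` and then group what is left. The names `SCORE_CUTOFF`, `MIN_FRACTION` and `passes_threshold` are made up for illustration, and the data is the same seven-row example (Mary with four scores).

```python
import pandas as pd

df = pd.DataFrame({
    'person': ['bob'] * 3 + ['mary'] * 4,
    'day': [1, 2, 3, 1, 2, 3, 4],
    'test_score': [10, 40, 45, 30, 35, 45, 55],
})

SCORE_CUTOFF = 40    # hypothetical name: the setpoint from the question
MIN_FRACTION = 0.5   # hypothetical name: the 50% threshold


def passes_threshold(group):
    # The mean of a boolean Series is the fraction of True values,
    # i.e. the share of this person's scores at or above the cutoff.
    return group['test_score'].ge(SCORE_CUTOFF).mean() > MIN_FRACTION


# Keep every row belonging to a person whose group passes the test.
kept = df.groupby('person').filter(passes_threshold)

# Group again to get a groupby object restricted to the matching people.
kept_groups = kept.groupby('person')
summary = kept_groups['test_score'].mean()
```

With this data only Bob's rows should remain, since Mary sits at exactly 50% and the comparison is strict. Swapping `>` for `>=` would include her.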