python - 确定谁拥有超过 80% 的给定发货人

我有一个问题。我想调查我的假设是否正确，可以说 receiver X经常收到来自 consignor Y 的 express . 我分组了 receiver 和 consignor 为此目的。

但是，我现在想做的是，我想输出所有receivers 给自己。哪里最频繁consignor包含所有 consignors 的 80% 以上在receiver .

例如我们有receiver c这是从consignor 2, 3收到的什么。来自 consignor 2只有 1 包来自 consignor 3 4 包。因此他总共收到了 5 个数据包。因此 consignor 3接受所有 consignor 的 80%在 receiver c 内. 因此 receiver c应该用表格表示。这应该在下面为所有其他人完成。

简单来说，我想要所有receivers超过 80% 的 consignor .

我该怎么做？

我想要什么

receiver  consignor  count_with_most_consignor  count_all
       c          3                          4          5
       d          2                          4          5

数据框

    consignor receiver
0           1        a
1           2        a
2           1        a
3           2        c
4           3        c
5           1        b
6           3        c
7           3        c
8           3        c
9           2        d
10          2        d
11          2        d
12          1        d
13          2        d

代码

import pandas as pd
d = {'consignor': [1, 2, 1, 2, 3, 1, 3, 3, 3, 2, 2, 2, 1, 2],  
     'receiver': ['a', 'a', 'a', 'c', 'c', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'd', 'd']}
df = pd.DataFrame(data=d)
print(df)
df_  = df.groupby(['receiver','consignor']).size().reset_index(name='counts').sort_values(by=['counts'], ascending=False)
print(df_)

X = 'c'
print(df_[df_['receiver'] == X].shape[1])
print(df_[df_['receiver'] == X])

出去

[OUT]
3
  receiver  consignor  counts
4        c          3       4
3        c          2       1

最佳答案

`交叉表`

# frequency table
s = pd.crosstab(df['receiver'], df['consignor'])

# agg columns along axis=1
d = {
    'consignor': s.idxmax(1),
    'count_with_most_consignor': s.max(1),
    'count_all': s.sum(1)
}

(
    s.assign(**d) # assign the agg columns
    .drop(s.columns, axis=1) # drop the unwanted columns
    .query('count_with_most_consignor / count_all >= .8') # filter the rows
)

结果

consignor  consignor  count_with_most_consignor  count_all
receiver                                                  
b                  1                          1          1
c                  3                          4          5
d                  2                          4          5

https://stackoverflow.com/questions/74170120/

相关文章：

r - R中，如果有多个TRUE答案，选择第一个TRUE答案

python - 将扁平化列表转换为字典列表

c# - ocelot api网关中的System.InvalidOperationExceptio

c - 在分配内存之前对灵活成员数组的第一个元素使用 sizeof 是未定义的行为？

haskell - 如何声明一个参数可以是 (Int, Int) 或 Maybe (Int, Int

r - 如何合并两个数据框并提取列的唯一组合？

c# - .NET 6 将参数检查替换为 ArgumentNullException.ThrowIf

r - 更快地填充矩阵

r - 选定列乘以给定向量

javascript - 左侧条件分配