I am trying to sum integers across multiple lines while keeping the data that precedes the number.
This is my raw data:
"2020-06","28347","Afghanistan","791","anonymous","3128"
"2020-06","28347","Afghanistan","830","anonymous","402"
"2020-06","28347","Afghanistan","10019","anonymous","79"
"2020-06","28347","Afghanistan","10070","anonymous","829"
"2020-06","28347","Afghanistan","10604","anonymous","4319"
"2020-06","28347","Albania","266","anonymous","60"
"2020-06","28347","Albania","824","anonymous","23"
"2020-06","28347","Albania","10163","anonymous","166"
"2020-06","28347","Algeria","267","anonymous","11047"
This is my expected output:
28347,Afghanistan,8757
28347,Albania,249
28347,Algeria,11047
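What I have done so far is extract the second and third columns from the data, then try to loop over each distinct pair with grep and add the values together. Unfortunately, I get one combined total rather than a value per country.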
COUNTRIES=$(awk -F\, '{OFS=",";}{print $2,$3}' file.dat | sort | uniq)
for COUNTRY in "${COUNTRIES[@]}"
do
NUMBER=$(grep $COUNTRY file.dat | awk -F\, '{print $6}' | sed 's/\"//g' | awk '{s+=$1} END {print s}')
echo "$COUNTRY,$NUMBER" | sed 's/\"//g'
done
This gives me:
28347,Afghanistan
28347,Albania
28347,Algeria,20053
I am not sure why it gives me the grand total instead of a total per country. Any ideas?
Best answer
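You can use this awk command. Setting the field separator to `","` leaves $2 and $3 free of quotes; $NF still carries a trailing `"`, but awk ignores it when converting the field to a number for the sum: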
awk -F'","' -v OFS=, '{sums[$2 OFS $3] += $NF} END {for (i in sums) print i, sums[i]}' file
28347,Albania,249
28347,Algeria,11047
28347,Afghanistan,8757
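Note that `for (i in sums)` visits the keys in an unspecified order, which is why the lines above are not alphabetical. If GNU awk is not available, one option is to pipe the result through `sort`. The following is a minimal sketch under that assumption; the scratch file name `sample.dat` and the `result` variable are illustrative and not part of the original answer.

```shell
# Recreate the sample data from the question in a scratch file (name is illustrative).
cat > sample.dat <<'EOF'
"2020-06","28347","Afghanistan","791","anonymous","3128"
"2020-06","28347","Afghanistan","830","anonymous","402"
"2020-06","28347","Afghanistan","10019","anonymous","79"
"2020-06","28347","Afghanistan","10070","anonymous","829"
"2020-06","28347","Afghanistan","10604","anonymous","4319"
"2020-06","28347","Albania","266","anonymous","60"
"2020-06","28347","Albania","824","anonymous","23"
"2020-06","28347","Albania","10163","anonymous","166"
"2020-06","28347","Algeria","267","anonymous","11047"
EOF

# Sum the last field per "id,country" key, then sort the output
# on its second comma-separated field (the country name).
result=$(awk -F'","' -v OFS=, '
  {sums[$2 OFS $3] += $NF}
  END {for (i in sums) print i, sums[i]}' sample.dat | sort -t, -k2,2)

printf '%s\n' "$result"
```

Sorting outside awk keeps the command portable across awk implementations, at the cost of one extra process.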
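If you want the output sorted alphabetically by key (in effect by country name here, since the ID is the same on every line), use this GNU awk variant: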
awk -F'","' -v OFS=, '
{sums[$2 OFS $3] += $NF}
END {
PROCINFO["sorted_in"]="@ind_str_asc"
for (i in sums)
print i, sums[i]
}' file
28347,Afghanistan,8757
28347,Albania,249
28347,Algeria,11047
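As for why the loop in the question printed a single total: `COUNTRIES=$(...)` stores the whole command output in one string rather than an array, so `"${COUNTRIES[@]}"` expands to a single word and the loop body runs only once, with all three keys in `COUNTRY`. If you would rather keep the loop approach, here is a minimal sketch that reads the keys one line at a time. It assumes the data layout shown in the question; the file name `sample.dat` and the variable names are illustrative. The awk one-liner above is still simpler and reads the file only once.

```shell
# Recreate the sample data from the question in a scratch file (name is illustrative).
cat > sample.dat <<'EOF'
"2020-06","28347","Afghanistan","791","anonymous","3128"
"2020-06","28347","Afghanistan","830","anonymous","402"
"2020-06","28347","Afghanistan","10019","anonymous","79"
"2020-06","28347","Afghanistan","10070","anonymous","829"
"2020-06","28347","Afghanistan","10604","anonymous","4319"
"2020-06","28347","Albania","266","anonymous","60"
"2020-06","28347","Albania","824","anonymous","23"
"2020-06","28347","Albania","10163","anonymous","166"
"2020-06","28347","Algeria","267","anonymous","11047"
EOF

result=$(
  # Emit each distinct "id","country" pair on its own line.
  awk -F, -v OFS=, '{print $2, $3}' sample.dat | sort -u |
  while IFS= read -r key; do
    # grep -F treats the key as a fixed string; quoting keeps it a single argument.
    total=$(grep -F -- "$key" sample.dat |
            awk -F'","' '{s += $NF} END {print s + 0}')
    # Drop the double quotes from the key before printing.
    printf '%s,%s\n' "$key" "$total" | tr -d '"'
  done
)

printf '%s\n' "$result"
```

This version rescans the file once per key, so it gets slow on large inputs with many distinct countries.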
https://stackoverflow.com/questions/67939757/