arrays - Distinct count across BigQuery arrays

I want to concatenate arrays across rows and then do a distinct count. Ideally, this would work:

WITH test AS
(
  SELECT
  DATE('2018-01-01') as date,
  2 as value,
  [1,2,3] as key
  UNION ALL
  SELECT
  DATE('2018-01-02') as date,
  3 as value,
  [1,4,5] as key
)
SELECT
  SUM(value) as total_value,
  ARRAY_LENGTH(ARRAY_CONCAT_AGG(DISTINCT key)) as unique_key_count
FROM test

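Unfortunately, the ARRAY_CONCAT_AGG function does not support the DISTINCT operator. I can unnest the array, but then I get a fan-out: each row is repeated once per array element, so the sum of the value column is wrong: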

WITH test AS
(
  SELECT
  DATE('2018-01-01') as date,
  2 as value,
  [1,2,3] as key
  UNION ALL
  SELECT
  DATE('2018-01-02') as date,
  3 as value,
  [1,4,5] as key
)

SELECT
  SUM(value) as total_value,
  COUNT(DISTINCT k) as unique_key_count

FROM test
  CROSS JOIN UNNEST(key) k
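
To see where the fan-out comes from, it can help to list the joined rows before aggregating. A minimal sketch using the same test data (the ORDER BY is only there to make the output easier to read):

```sql
WITH test AS
(
  SELECT
  DATE('2018-01-01') as date,
  2 as value,
  [1,2,3] as key
  UNION ALL
  SELECT
  DATE('2018-01-02') as date,
  3 as value,
  [1,4,5] as key
)

-- One output row per (source row, array element) pair:
-- each source row appears three times, once per key.
SELECT
  date,
  value,
  k
FROM test
  CROSS JOIN UNNEST(key) k
ORDER BY date, k
```

This returns six rows, so SUM(value) over them is 2*3 + 3*3 = 15 rather than the intended 2 + 3 = 5, while COUNT(DISTINCT k) is still the correct 5.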

Is there anything I'm missing that would let me avoid joining against the unnested array?

Best answer

Here is an alternative:

CREATE TEMP FUNCTION DistinctCount(arr ANY TYPE) AS (
  (SELECT COUNT(DISTINCT x) FROM UNNEST(arr) AS x)
);

WITH test AS
(
  SELECT
  DATE('2018-01-01') as date,
  2 as value,
  [1,2,3] as key
  UNION ALL
  SELECT
  DATE('2018-01-02') as date,
  3 as value,
  [1,4,5] as key
)

SELECT
  SUM(value) as total_value,
  DistinctCount(ARRAY_CONCAT_AGG(key)) as unique_key_count
FROM test

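This avoids the need for a subquery in the main query, and it avoids joining the array with the table (which is what produced the duplicate values in the sum). The UNNEST happens inside the function, on the already concatenated array, so each row's value is summed exactly once.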
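
The same idea can also be written without a temporary function, by aggregating first and counting afterwards, at the cost of a scalar subquery. A minimal sketch, assuming the same test data; the CTE name agg and the column name all_keys are made up for illustration:

```sql
WITH test AS
(
  SELECT
  DATE('2018-01-01') as date,
  2 as value,
  [1,2,3] as key
  UNION ALL
  SELECT
  DATE('2018-01-02') as date,
  3 as value,
  [1,4,5] as key
),
-- Step 1: aggregate once per group, so value is not duplicated.
agg AS
(
  SELECT
    SUM(value) as total_value,
    ARRAY_CONCAT_AGG(key) as all_keys
  FROM test
)

-- Step 2: count distinct elements of the concatenated array.
SELECT
  total_value,
  (SELECT COUNT(DISTINCT k) FROM UNNEST(all_keys) AS k) as unique_key_count
FROM agg
```

With the sample data this should give total_value = 5 and unique_key_count = 5 (keys 1, 2, 3, 4, 5). The temporary function keeps the main query shorter; this form may be preferable where temporary functions are not available, such as in a view.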

https://stackoverflow.com/questions/52485871/
