通过


窗口框架子句

适用于:勾选“是” Databricks SQL 勾选“是” Databricks Runtime

指定运行聚合或分析窗口函数的分区中的行的滑动子集。

语法

{ frame_mode frame_start |
  frame_mode BETWEEN frame_start AND frame_end } }

frame_mode
{ RANGE | ROWS }

frame_start
{ UNBOUNDED PRECEDING |
  offset_start PRECEDING |
  CURRENT ROW |
  offset_start FOLLOWING }

frame_end
{ offset_stop PRECEDING |
  CURRENT ROW |
  offset_stop FOLLOWING |
  UNBOUNDED FOLLOWING }

参数

  • frame_mode

    • ROWS

      如果已指定,则滑动窗口框架将以当前行之前或之后的行的形式表示。

    • RANGE

      如果指定,窗口函数必须指定具有ORDER BY单个表达式obExpr的子句。

      然后将滑动窗口的边界表示为从当前行的 obExpr 算起的偏移量。

  • frame_start

    滑动窗口框架相对于当前行的起始位置。

    • 无界前进

      指定窗口框架从分区的开头位置开始。

    • 紧接在offset_start之前的

      如果模式为 ROWS,则 offset_start 是正整数文本数字,用于定义框架从当前行之前的多少行处开始。

      如果模式为 RANGE,则 offset_start 是可以从 obExpr 中减去的正文本类型值。 框架从 obExpr 大于或等于当前行处的 obExpr - offset_start 的分区的第一行处开始。

    • 现行排

      指定框架从当前行处开始。

    • 后面的 offset_start

      如果模式为 ROWS,则 offset_start 是正整数文本数字,用于定义帧从当前行之后的多少行处开始。 如果模式为 RANGE,则 offset_start 是可以与 obExpr 相加的正文本类型值。 框架从 obExpr 大于或等于当前行处的 obExpr + offset_start 的分区的第一行处开始。

  • frame_stop

    滑动窗口框架相对于当前行的结束位置。

    如果未指定,则框架将在当前行处停止。 滑动窗口的结束位置必须大于窗口框架的开始位置。

    • 前面offset_stop

      如果 frame_mode 为 ROWS,则 offset_stop 是正整数文本数字,用于定义框架在当前行之前的多少行处停止。 如果 frame_mode 为 RANGE,则 offset_stop 是与 offset_start 类型相同的正文本值。 框架在 obExpr 小于或等于当前行处的 obExpr - offset_stop 的分区的最后一行处结束。

    • 现行排

      指定框架在当前行处停止。

    • offset_stop关注

      如果 frame_mode 为 ROWS,则 offset_stop 是正整数文本数字,用于定义框架在当前行之后的多少行处结束。 如果 frame_mode 为 RANGE,则 offset_stop 是与 offset_start 类型相同的正文本值。 框架在 obExpr 小于或等于当前行处的 obExpr + offset_stop 的分区的最后一行处结束。

    • 无界追随

      指定窗口框架在分区的末尾处停止。

示例

> CREATE TABLE employees
   (name STRING, dept STRING, salary INT, age INT);
> INSERT INTO employees
   VALUES ('Lisa', 'Sales', 10000, 35),
          ('Evan', 'Sales', 32000, 38),
          ('Fred', 'Engineering', 21000, 28),
          ('Alex', 'Sales', 30000, 33),
          ('Tom', 'Engineering', 23000, 33),
          ('Jane', 'Marketing', 29000, 28),
          ('Jeff', 'Marketing', 35000, 38),
          ('Paul', 'Engineering', 29000, 23),
          ('Chloe', 'Engineering', 23000, 25);

-- ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: cumulative sum of salary within each department.
> SELECT name,
         dept,
         salary,
         SUM(salary) OVER (PARTITION BY dept ORDER BY salary
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
    FROM employees;
  Fred Engineering  21000  21000
 Chloe Engineering  23000  44000
   Tom Engineering  23000  67000
  Paul Engineering  29000  96000
  Jane   Marketing  29000  29000
  Jeff   Marketing  35000  64000
  Lisa       Sales  10000  10000
  Alex       Sales  30000  40000
  Evan       Sales  32000  72000

-- ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING: moving average over three adjacent rows.
> SELECT name,
         dept,
         salary,
         ROUND(AVG(salary) OVER (PARTITION BY dept ORDER BY salary
                                 ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)) AS moving_avg
    FROM employees;
  Fred Engineering  21000  22000
 Chloe Engineering  23000  22333
   Tom Engineering  23000  25000
  Paul Engineering  29000  26000
  Jane   Marketing  29000  32000
  Jeff   Marketing  35000  32000
  Lisa       Sales  10000  20000
  Alex       Sales  30000  24000
  Evan       Sales  32000  31000

-- ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING: sum from the current row to the end of the partition.
> SELECT name,
         dept,
         salary,
         SUM(salary) OVER (PARTITION BY dept ORDER BY salary
                           ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS remaining_total
    FROM employees;
  Fred Engineering  21000  96000
 Chloe Engineering  23000  75000
   Tom Engineering  23000  52000
  Paul Engineering  29000  29000
  Jane   Marketing  29000  64000
  Jeff   Marketing  35000  35000
  Lisa       Sales  10000  72000
  Alex       Sales  30000  62000
  Evan       Sales  32000  32000

-- RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: cumulative sum using value-based range.
-- Unlike ROWS mode, RANGE groups rows with equal `ORDER BY` values together.
> SELECT name,
         dept,
         salary,
         SUM(salary) OVER (PARTITION BY dept ORDER BY salary
                           RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_total
    FROM employees;
  Fred Engineering  21000  21000
 Chloe Engineering  23000  67000
   Tom Engineering  23000  67000
  Paul Engineering  29000  96000
  Jane   Marketing  29000  29000
  Jeff   Marketing  35000  64000
  Lisa       Sales  10000  10000
  Alex       Sales  30000  40000
  Evan       Sales  32000  72000

-- RANGE BETWEEN 5000 PRECEDING AND 5000 FOLLOWING: sum of salaries within +/- 5000 of the current row's salary.
> SELECT name,
         dept,
         salary,
         SUM(salary) OVER (PARTITION BY dept ORDER BY salary
                           RANGE BETWEEN 5000 PRECEDING AND 5000 FOLLOWING) AS nearby_total
    FROM employees;
  Fred Engineering  21000  67000
 Chloe Engineering  23000  67000
   Tom Engineering  23000  67000
  Paul Engineering  29000  75000
  Jane   Marketing  29000  64000
  Jeff   Marketing  35000  35000
  Lisa       Sales  10000  10000
  Alex       Sales  30000  92000
  Evan       Sales  32000  62000

-- Comparing ROWS vs RANGE: the difference is visible when there are duplicate `ORDER BY` values.
-- With ROWS, `Chloe` and `Tom` have different running totals because each row is counted individually.
-- With RANGE, `Chloe` and `Tom` have the same total because they share the same salary value.
> SELECT name,
         salary,
         SUM(salary) OVER (ORDER BY salary
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)  AS rows_total,
         SUM(salary) OVER (ORDER BY salary
                           RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_total
    FROM employees
    WHERE dept = 'Engineering';
  Fred  21000  21000  21000
 Chloe  23000  44000  67000
   Tom  23000  67000  67000
  Paul  29000  96000  96000