不积跬步,无以至千里;不积小流,无以成江海。

Dean's blog

  • Join Us on Facebook!
  • Follow Us on Twitter!
  • LinkedIn
  • Subcribe to Our RSS Feed

Spark SQL 获取前一条不为空和后一条不为空的值

WITH special_date AS(
  SELECT '元旦' AS activity_content, 'A001' AS type_code, '2022-1-1' AS begin_date, '2022-1-3' AS end_date
  UNION ALL SELECT '上班' AS activity_content, 'A002' AS type_code, '2022-1-29' AS begin_date, '2022-1-30' AS end_date
  UNION ALL SELECT '春节' AS activity_content, 'A001' AS type_code, '2022-1-31' AS begin_date, '2022-2-6' AS end_date
  UNION ALL SELECT '上班' AS activity_content, 'A002' AS type_code, '2022-4-2' AS begin_date, '2022-4-2' AS end_date
  UNION ALL SELECT '清明节' AS activity_content, 'A001' AS type_code, '2022-4-3' AS begin_date, '2022-4-5' AS end_date
  UNION ALL SELECT '上班' AS activity_content, 'A002' AS type_code, '2022-4-24' AS begin_date, '2022-4-24' AS end_date
  UNION ALL SELECT '劳动节' AS activity_content, 'A001' AS type_code, '2022-4-30' AS begin_date, '2022-5-4' AS end_date
  UNION ALL SELECT '上班' AS activity_content, 'A002' AS type_code, '2022-5-7' AS begin_date, '2022-5-7' AS end_date
  UNION ALL SELECT '端午节' AS activity_content, 'A001' AS type_code, '2022-6-3' AS begin_date, '2022-6-5' AS end_date
  UNION ALL SELECT '中秋节' AS activity_content, 'A001' AS type_code, '2022-9-10' AS begin_date, '2022-9-12' AS end_date
  UNION ALL SELECT '国庆节' AS activity_content, 'A001' AS type_code, '2022-10-1' AS begin_date, '2022-10-7' AS end_date
  UNION ALL SELECT '上班' AS activity_content, 'A002' AS type_code, '2022-10-8' AS begin_date, '2022-10-9' AS end_date
)
, minDate AS(
  SELECT MAKE_DATE(YEAR(MIN(begin_date)), 1, 1) AS mindate
  FROM special_date
)
, maxDate AS(
  SELECT MAKE_DATE(YEAR(MAX(end_date)), 12, 31) AS maxdate 
  FROM special_date
)
, date_range AS (
  SELECT date_add(nd.mindate, int(diff.id)) as date
  FROM minDate nd
    , maxDate xd
    , RANGE(365*50) diff
  WHERE date_add(nd.mindate, int(diff.id)) <= xd.maxdate
)
, days AS(
  SELECT dr.date
    , year(dr.date) AS year
    , month(dr.date) AS month
    , day(dr.date) AS day
    , dayofweek(dr.date) AS dayofweek
    , dayofyear(dr.date) AS dayofyear
    , CASE 
        WHEN sd.type_code = 'A001' THEN 1        --日期配置表-节假日
        WHEN sd.type_code = 'A002' THEN 0        --日期配置表-工作日
        WHEN dayofweek(dr.date) IN (1, 7) THEN 1 --周末(7周六、1周日)
        ELSE 0                                   --工作日(周一至五)
      END AS is_holiday
  FROM date_range dr
    LEFT JOIN special_date sd ON dr.date BETWEEN sd.begin_date AND sd.end_date
)
, workday AS(
  SELECT date
    , year
    , month
    , ROW_NUMBER() OVER(PARTITION BY year, month ORDER BY date) AS workdayofmonth
    , ROW_NUMBER() OVER(PARTITION BY year ORDER BY date) AS workdayofyear
  FROM days
  WHERE is_holiday = 0
)
, calendar AS(
  SELECT d.*
    , w.workdayofmonth
    , NVL(last_value(w.workdayofmonth, true)  OVER (PARTITION BY d.year, d.month ORDER BY d.date ), 0) as workdayofmonth_previous
    , NVL(FIRST_VALUE(w.workdayofmonth, true) OVER (PARTITION BY d.year, d.month ORDER BY d.date ROWS BETWEEN -1 PRECEDING AND UNBOUNDED FOLLOWING), 999) workdayofmonth_next
    , w.workdayofyear
    , NVL(last_value(w.workdayofyear, true)  OVER (PARTITION BY d.year ORDER BY d.date), 0) as workdayofyear_previous
    , NVL(FIRST_VALUE(w.workdayofyear, true) OVER (PARTITION BY d.year ORDER BY d.date ROWS BETWEEN -1 PRECEDING AND UNBOUNDED FOLLOWING), 999) workdayofyear_next
  FROM days d
    LEFT JOIN workday w ON d.date = w.date
)
SELECT *
FROM calendar
ORDER BY date

LAST_VALUE(w.workdayofmonth, true),当设置为true时,则会排除空值。

FIRST_VALUE的默认范围是  BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING,默认所有记录拿到的都是相同的第一条,这里设置为BETWEEN -1 PRECEDING AND UNBOUNDED FOLLOWING,则从当前行的下一条记录拿,则可以实现取到紧接不为空的下一条记录了。

不允许评论
粤ICP备17049187号-1