HiveSQL实战积累_groupby&unionall与leftjoin效率比较

1.比较sql实例

分别依赖流量明细、关注明细、加购明细三张源表，加工得到店铺粒度的uv、pv、关注、加购四个指标数据。使用groupby&unionall和leftjoin两种写法得到的结果数据是一样的。

1.1 groupby&unionall

使用union all合并不同源表加工数据然后group by sum聚合到一起的sql实例：

SELECT
	a.shop_id,
	SUM(a.uv) uv,
	SUM(a.pv) pv,
	SUM(a.concern) concern,
	SUM(a.cart) cart
FROM
	(
		SELECT
			shop_id,
			COUNT(DISTINCT brws_uniq_id) uv,
			COUNT(brws_uniq_id) pv,
			0 concern,
			0 cart
		FROM
			adm.adm_sch_d14_zs_all_chan_shop_traffics_di
		WHERE
			dt = '2023-03-18'
			AND itm_sku_id > 0
		GROUP BY
			shop_id
		
		UNION ALL
		
		SELECT
			shop_id,
			0 uv,
			0 pv,
			COUNT(DISTINCT itm_sku_id) concern,
			0 cart
		FROM
			gdm.gdm_m13_zs_shop_product_follow_det
		WHERE
			dt = '2023-03-18'
			AND itm_sku_id > 0
		GROUP BY
			shop_id
		
		UNION ALL
		
		SELECT
			shop_id,
			0 uv,
			0 pv,
			0 concern,
			SUM(add_pro_qty) cart
		FROM
			gdm.gdm_m14_zs_shop_product_add_cart_det
		WHERE
			dt = '2023-03-18'
			AND itm_sku_id > 0
			AND add_cart_typ = 1
		GROUP BY
			shop_id
	)
	a
GROUP BY
	a.shop_id

1.2 leftjoin

使用full join合并不同源表加工数据的sql实例：

SELECT
	COALESCE(a.shop_id, b.shop_id, c.shop_id) shop_id,
	a.uv uv,
	a.pv pv,
	b.concern concern,
	c.cart cart
FROM
	(
		SELECT
			shop_id,
			COUNT(DISTINCT brws_uniq_id) uv,
			COUNT(brws_uniq_id) pv
		FROM
			adm.adm_sch_d14_zs_all_chan_shop_traffics_di
		WHERE
			dt = '2023-03-18'
			AND itm_sku_id > 0
		GROUP BY
			shop_id
	)
	a
FULL JOIN
	(
		SELECT
			shop_id,
			COUNT(DISTINCT itm_sku_id) concern
		FROM
			gdm.gdm_m13_zs_shop_product_follow_det
		WHERE
			dt = '2023-03-18'
			AND itm_sku_id > 0
		GROUP BY
			shop_id
	)
	b
ON
	a.shop_id = b.shop_id
FULL JOIN
	(
		SELECT
			shop_id,
			SUM(add_pro_qty) cart
		FROM
			gdm.gdm_m14_zs_shop_product_add_cart_det
		WHERE
			dt = '2023-03-18'
			AND itm_sku_id > 0
			AND add_cart_typ = 1
		GROUP BY
			shop_id
	)
	c
ON
	a.shop_id = c.shop_id

2.explain执行计划比较

2.1 groupby&unionall

groupby&unionall执行计划：

Explain
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1, Stage-3, Stage-4
  Stage-3 is a root stage
  Stage-4 is a root stage
  Stage-0 depends on stages: Stage-2

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: adm_sch_d14_zs_all_chan_shop_traffics_di
            Statistics: Num rows: 3215094512 Data size: 372950963416 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: ((dt = '2023-03-18') and (itm_sku_id > 0)) (type: boolean)
              Statistics: Num rows: 1071698170 Data size: 124316987728 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: shop_id (type: bigint), brws_uniq_id (type: string)
                outputColumnNames: shop_id, brws_uniq_id
                Statistics: Num rows: 1071698170 Data size: 124316987728 Basic stats: COMPLETE Column stats: NONE
                Group By Operator
                  aggregations: count(DISTINCT brws_uniq_id), count(brws_uniq_id)
                  keys: shop_id (type: bigint), brws_uniq_id (type: string)
                  mode: hash
                  outputColumnNames: _col0, _col1, _col2, _col3
                  Statistics: Num rows: 1071698170 Data size: 124316987728 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: _col0 (type: bigint), _col1 (type: string)
                    sort order: ++
                    Map-reduce partition columns: _col0 (type: bigint)
                    Statistics: Num rows: 1071698170 Data size: 124316987728 Basic stats: COMPLETE Column stats: NONE
                    value expressions: _col3 (type: bigint)
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(DISTINCT KEY._col1:0._col0), count(VALUE._col1)
          keys: KEY._col0 (type: bigint)
          mode: mergepartial
          outputColumnNames: _col0, _col1, _col2
          Statistics: Num rows: 535849085 Data size: 62158493864 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: bigint), _col1 (type: bigint), _col2 (type: bigint), UDFToLong(0) (type: bigint), 0 (type: int)
            outputColumnNames: _col0, _col1, _col2, _col3, _col4
            Statistics: Num rows: 535849085 Data size: 62158493864 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-2
    Map Reduce
      Map Operator Tree:
          TableScan
            Union
              Statistics: Num rows: 555371524 Data size: 70055386943 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: _col0 (type: bigint), _col1 (type: bigint), _col2 (type: bigint), _col3 (type: bigint), _col4 (type: bigint)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4
                Statistics: Num rows: 555371524 Data size: 70055386943 Basic stats: COMPLETE Column stats: NONE
                Group By Operator
                  aggregations: sum(_col1), sum(_col2), sum(_col3), sum(_col4)
                  keys: _col0 (type: bigint)
                  mode: hash
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4
                  Statistics: Num rows: 555371524 Data size: 70055386943 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: _col0 (type: bigint)
                    sort order: +
                    Map-reduce partition columns: _col0 (type: bigint)
                    Statistics: Num rows: 555371524 Data size: 70055386943 Basic stats: COMPLETE Column stats: NONE
                    value expressions: _col1 (type: bigint), _col2 (type: bigint), _col3 (type: bigint), _col4 (type: bigint)
          TableScan
            Union
              Statistics: Num rows: 555371524 Data size: 70055386943 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: _col0 (type: bigint), _col1 (type: bigint), _col2 (type: bigint), _col3 (type: bigint), _col4 (type: bigint)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4
                Statistics: Num rows: 555371524 Data size: 70055386943 Basic stats: COMPLETE Column stats: NONE
                Group By Operator
                  aggregations: sum(_col1), sum(_col2), sum(_col3), sum(_col4)
                  keys: _col0 (type: bigint)
                  mode: hash
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4
                  Statistics: Num rows: 555371524 Data size: 70055386943 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: _col0 (type: bigint)
                    sort order: +
                    Map-reduce partition columns: _col0 (type: bigint)
                    Statistics: Num rows: 555371524 Data size: 70055386943 Basic stats: COMPLETE Column stats: NONE
                    value expressions: _col1 (type: bigint), _col2 (type: bigint), _col3 (type: bigint), _col4 (type: bigint)
          TableScan
            Union
              Statistics: Num rows: 555371524 Data size: 70055386943 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: _col0 (type: bigint), _col1 (type: bigint), _col2 (type: bigint), _col3 (type: bigint), _col4 (type: bigint)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4
                Statistics: Num rows: 555371524 Data size: 70055386943 Basic stats: COMPLETE Column stats: NONE
                Group By Operator
                  aggregations: sum(_col1), sum(_col2), sum(_col3), sum(_col4)
                  keys: _col0 (type: bigint)
                  mode: hash
                  outputColumnNames: _col0, _col1, _col2, _col3, _col4
                  Statistics: Num rows: 555371524 Data size: 70055386943 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: _col0 (type: bigint)
                    sort order: +
                    Map-reduce partition columns: _col0 (type: bigint)
                    Statistics: Num rows: 555371524 Data size: 70055386943 Basic stats: COMPLETE Column stats: NONE
                    value expressions: _col1 (type: bigint), _col2 (type: bigint), _col3 (type: bigint), _col4 (type: bigint)
      Reduce Operator Tree:
        Group By Operator
          aggregations: sum(VALUE._col0), sum(VALUE._col1), sum(VALUE._col2), sum(VALUE._col3)
          keys: KEY._col0 (type: bigint)
          mode: mergepartial
          outputColumnNames: _col0, _col1, _col2, _col3, _col4
          Statistics: Num rows: 277685762 Data size: 35027693471 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 277685762 Data size: 35027693471 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: gdm_m13_zs_shop_product_follow_det
            Statistics: Num rows: 6867732 Data size: 2916252794 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: ((dt = '2023-03-18') and (itm_sku_id > 0)) (type: boolean)
              Statistics: Num rows: 2289244 Data size: 972084264 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: shop_id (type: bigint), itm_sku_id (type: bigint)
                outputColumnNames: shop_id, itm_sku_id
                Statistics: Num rows: 2289244 Data size: 972084264 Basic stats: COMPLETE Column stats: NONE
                Group By Operator
                  aggregations: count(DISTINCT itm_sku_id)
                  keys: shop_id (type: bigint), itm_sku_id (type: bigint)
                  mode: hash
                  outputColumnNames: _col0, _col1, _col2
                  Statistics: Num rows: 2289244 Data size: 972084264 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: _col0 (type: bigint), _col1 (type: bigint)
                    sort order: ++
                    Map-reduce partition columns: _col0 (type: bigint)
                    Statistics: Num rows: 2289244 Data size: 972084264 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(DISTINCT KEY._col1:0._col0)
          keys: KEY._col0 (type: bigint)
          mode: mergepartial
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 1144622 Data size: 486042132 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: bigint), UDFToLong(0) (type: bigint), UDFToLong(0) (type: bigint), _col1 (type: bigint), 0 (type: int)
            outputColumnNames: _col0, _col1, _col2, _col3, _col4
            Statistics: Num rows: 1144622 Data size: 486042132 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-4
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: gdm_m14_zs_shop_product_add_cart_det
            Statistics: Num rows: 220533812 Data size: 88930214596 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (((dt = '2023-03-18') and (itm_sku_id > 0)) and (add_cart_typ = 1)) (type: boolean)
              Statistics: Num rows: 36755635 Data size: 14821702298 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: shop_id (type: bigint), add_pro_qty (type: bigint)
                outputColumnNames: shop_id, add_pro_qty
                Statistics: Num rows: 36755635 Data size: 14821702298 Basic stats: COMPLETE Column stats: NONE
                Group By Operator
                  aggregations: sum(add_pro_qty)
                  keys: shop_id (type: bigint)
                  mode: hash
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 36755635 Data size: 14821702298 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: _col0 (type: bigint)
                    sort order: +
                    Map-reduce partition columns: _col0 (type: bigint)
                    Statistics: Num rows: 36755635 Data size: 14821702298 Basic stats: COMPLETE Column stats: NONE
                    value expressions: _col1 (type: bigint)
      Reduce Operator Tree:
        Group By Operator
          aggregations: sum(VALUE._col0)
          keys: KEY._col0 (type: bigint)
          mode: mergepartial
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 18377817 Data size: 7410850947 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: bigint), UDFToLong(0) (type: bigint), UDFToLong(0) (type: bigint), UDFToLong(0) (type: bigint), _col1 (type: bigint)
            outputColumnNames: _col0, _col1, _col2, _col3, _col4
            Statistics: Num rows: 18377817 Data size: 7410850947 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

2.2 leftjoin

fulljoin执行计划：

Explain
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1, Stage-3, Stage-4
  Stage-3 is a root stage
  Stage-4 is a root stage
  Stage-0 depends on stages: Stage-2

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: adm_sch_d14_zs_all_chan_shop_traffics_di
            Statistics: Num rows: 3215094512 Data size: 372950963416 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: ((dt = '2023-03-18') and (itm_sku_id > 0)) (type: boolean)
              Statistics: Num rows: 1071698170 Data size: 124316987728 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: shop_id (type: bigint), brws_uniq_id (type: string)
                outputColumnNames: shop_id, brws_uniq_id
                Statistics: Num rows: 1071698170 Data size: 124316987728 Basic stats: COMPLETE Column stats: NONE
                Group By Operator
                  aggregations: count(DISTINCT brws_uniq_id), count(brws_uniq_id)
                  keys: shop_id (type: bigint), brws_uniq_id (type: string)
                  mode: hash
                  outputColumnNames: _col0, _col1, _col2, _col3
                  Statistics: Num rows: 1071698170 Data size: 124316987728 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: _col0 (type: bigint), _col1 (type: string)
                    sort order: ++
                    Map-reduce partition columns: _col0 (type: bigint)
                    Statistics: Num rows: 1071698170 Data size: 124316987728 Basic stats: COMPLETE Column stats: NONE
                    value expressions: _col3 (type: bigint)
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(DISTINCT KEY._col1:0._col0), count(VALUE._col1)
          keys: KEY._col0 (type: bigint)
          mode: mergepartial
          outputColumnNames: _col0, _col1, _col2
          Statistics: Num rows: 535849085 Data size: 62158493864 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-2
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator
              key expressions: _col0 (type: bigint)
              sort order: +
              Map-reduce partition columns: _col0 (type: bigint)
              Statistics: Num rows: 535849085 Data size: 62158493864 Basic stats: COMPLETE Column stats: NONE
              value expressions: _col1 (type: bigint), _col2 (type: bigint)
          TableScan
            Reduce Output Operator
              key expressions: _col0 (type: bigint)
              sort order: +
              Map-reduce partition columns: _col0 (type: bigint)
              Statistics: Num rows: 1144622 Data size: 486042132 Basic stats: COMPLETE Column stats: NONE
              value expressions: _col1 (type: bigint)
          TableScan
            Reduce Output Operator
              key expressions: _col0 (type: bigint)
              sort order: +
              Map-reduce partition columns: _col0 (type: bigint)
              Statistics: Num rows: 18377817 Data size: 7410850947 Basic stats: COMPLETE Column stats: NONE
              value expressions: _col1 (type: bigint)
      Reduce Operator Tree:
        Join Operator
          condition map:
               Outer Join 0 to 1
               Outer Join 0 to 2
          keys:
            0 _col0 (type: bigint)
            1 _col0 (type: bigint)
            2 _col0 (type: bigint)
          outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6
          Statistics: Num rows: 1178868012 Data size: 136748689464 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: COALESCE(_col0,_col3,_col5) (type: bigint), _col1 (type: bigint), _col2 (type: bigint), _col4 (type: bigint), _col6 (type: bigint)
            outputColumnNames: _col0, _col1, _col2, _col3, _col4
            Statistics: Num rows: 1178868012 Data size: 136748689464 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 1178868012 Data size: 136748689464 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: gdm_m13_zs_shop_product_follow_det
            Statistics: Num rows: 6867732 Data size: 2916252794 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: ((dt = '2023-03-18') and (itm_sku_id > 0)) (type: boolean)
              Statistics: Num rows: 2289244 Data size: 972084264 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: shop_id (type: bigint), itm_sku_id (type: bigint)
                outputColumnNames: shop_id, itm_sku_id
                Statistics: Num rows: 2289244 Data size: 972084264 Basic stats: COMPLETE Column stats: NONE
                Group By Operator
                  aggregations: count(DISTINCT itm_sku_id)
                  keys: shop_id (type: bigint), itm_sku_id (type: bigint)
                  mode: hash
                  outputColumnNames: _col0, _col1, _col2
                  Statistics: Num rows: 2289244 Data size: 972084264 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: _col0 (type: bigint), _col1 (type: bigint)
                    sort order: ++
                    Map-reduce partition columns: _col0 (type: bigint)
                    Statistics: Num rows: 2289244 Data size: 972084264 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(DISTINCT KEY._col1:0._col0)
          keys: KEY._col0 (type: bigint)
          mode: mergepartial
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 1144622 Data size: 486042132 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-4
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: gdm_m14_zs_shop_product_add_cart_det
            Statistics: Num rows: 220533812 Data size: 88930214596 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (((dt = '2023-03-18') and (itm_sku_id > 0)) and (add_cart_typ = 1)) (type: boolean)
              Statistics: Num rows: 36755635 Data size: 14821702298 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: shop_id (type: bigint), add_pro_qty (type: bigint)
                outputColumnNames: shop_id, add_pro_qty
                Statistics: Num rows: 36755635 Data size: 14821702298 Basic stats: COMPLETE Column stats: NONE
                Group By Operator
                  aggregations: sum(add_pro_qty)
                  keys: shop_id (type: bigint)
                  mode: hash
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 36755635 Data size: 14821702298 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: _col0 (type: bigint)
                    sort order: +
                    Map-reduce partition columns: _col0 (type: bigint)
                    Statistics: Num rows: 36755635 Data size: 14821702298 Basic stats: COMPLETE Column stats: NONE
                    value expressions: _col1 (type: bigint)
      Reduce Operator Tree:
        Group By Operator
          aggregations: sum(VALUE._col0)
          keys: KEY._col0 (type: bigint)
          mode: mergepartial
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 18377817 Data size: 7410850947 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

3.比较结论

比较上述两个sql的执行计划，还有将fulljoin换成leftjoin或innerjoin的执行计划，可以发现groupby&unionall与各种join的执行计划都是一样的，多个unionall的sql是并行，多个left的sql也是并行的，所以在效率上其实是差不多的。

那么在实际项目中具体采用哪种写法，主要考虑的就是key的包含关系，以上述例子中的流量数据和关注数据为例：如果存在无流量但是有关注店铺，也存在有流量但是无关注店铺，而最终我们希望只要有流量或者关注其一的店铺就进入结果表，那么这种场景下使用groupby&unionall会方便快捷很多。如果我们能确认有流量店铺的范围一定完全包含了有关注店铺，那么使用groupby&unionall或者流量结果leftjoin关注结果都可以，只不过leftjoin写法会更简洁一些。