Hash Join

名称

散列连接算子

介绍

Hash Join算子下层包含 Hash 算子,通过哈希值的匹配获取连接数据,先使用哈希函数对内表数据按哈希值分区,并创建哈希表,映射哈希键值与分区,然后逐条扫描外表的数据,

使用相同的哈希函数进行探测,找到对应的哈希分区,然后再用分区中的数据进行精准匹配,将匹配的行进行投影操作,然后将投影后的数据返回。

举例

--清理环境
DROP TABLE t1 CASCADE;
DROP TABLE t2 CASCADE;

--创建环境
CREATE TABLE t1(id int, a int);
INSERT INTO t1 SELECT generate_series(1,10),random();
ANALYZE t1;

CREATE TABLE t2(id int,b int);
INSERT INTO t2 SELECT generate_series(1,10),random();
ANALYZE t2;

EXPLAIN ANALYZE SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.id WHERE t1.a!=t2.b;
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Hash Join(Inner Join)  (cost=7.22..14.49 rows=9 width=16) (actual time=0.06..0.07 rows=10 loops=1)
   Hash Cond: ("outer".ID = "inner".ID)
   Join Filter: ("outer".A <> "inner".B)
   ->  Seq Scan on T1  (cost=0.00..7.10 rows=10 width=8) (actual time=0.02..0.02 rows=10 loops=1)
   ->  Hash  (cost=7.10..7.10 rows=10 width=8) (actual time=0.02..0.02 rows=0 loops=1)
         ->  Seq Scan on T2  (cost=0.00..7.10 rows=10 width=8) (actual time=0.01..0.02 rows=10 loops=1)
 Planning Time: 0.46 msec
 Execution Time: 0.27 msec
(8 rows)

EXPLAIN ANALYZE SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.id WHERE RANDOM()>1;
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Hash Join(Left Join)  (cost=7.22..14.51 rows=10 width=16) (actual time=0.05..0.07 rows=5 loops=1)
   Hash Cond: ("outer".ID = "inner".ID)
   Filter: (RANDOM() > 1::double precision)
   ->  Seq Scan on T1  (cost=0.00..7.10 rows=10 width=8) (actual time=0.01..0.02 rows=10 loops=1)
   ->  Hash  (cost=7.10..7.10 rows=10 width=8) (actual time=0.02..0.02 rows=0 loops=1)
         ->  Seq Scan on T2  (cost=0.00..7.10 rows=10 width=8) (actual time=0.01..0.02 rows=10 loops=1)
 Planning Time: 0.36 msec
 Execution Time: 0.21 msec
(8 rows)

EXPLAIN ANALYZE SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.id WHERE a IN (SELECT b FROM t2) AND RANDOM()>1;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Hash Join(Left Semi Join)  (cost=14.45..21.82 rows=3 width=16) (actual time=0.09..0.09 rows=0 loops=1)
   Hash Cond: ("outer".A = "inner".B)
   Startup Filter: (RANDOM() > 1::double precision)
   ->  Hash Join(Left Join)  (cost=7.22..14.46 rows=10 width=16) (actual time=0.04..0.05 rows=10 loops=1)
         Hash Cond: ("outer".ID = "inner".ID)
         ->  Seq Scan on T1  (cost=0.00..7.10 rows=10 width=8) (actual time=0.01..0.02 rows=10 loops=1)
         ->  Hash  (cost=7.10..7.10 rows=10 width=8) (actual time=0.02..0.02 rows=0 loops=1)
               ->  Seq Scan on T2  (cost=0.00..7.10 rows=10 width=8) (actual time=0.01..0.01 rows=10 loops=1)
   ->  Hash  (cost=7.10..7.10 rows=10 width=4) (actual time=0.03..0.03 rows=0 loops=1)
         ->  Seq Scan on T2  (cost=0.00..7.10 rows=10 width=4) (actual time=0.02..0.02 rows=10 loops=1)
 Planning Time: 1.09 msec
 Execution Time: 0.60 msec
(12 rows)
  • Hash Cond 限定条件,表示哈希的探测条件
  • Join Filter 限定条件,连接限定条件
  • Filter 限定条件,一些其他的限定
  • Startup Filter 限定条件,对外表进行限定
  • Startup Anti Filter 限定条件,对内表进行限定