Hash Join¶
名称¶
散列连接算子
介绍¶
Hash Join算子下层包含 Hash 算子,通过哈希值的匹配获取连接数据,先使用哈希函数对内表数据按哈希值分区,并创建哈希表,映射哈希键值与分区,然后逐条扫描外表的数据,
使用相同的哈希函数进行探测,找到对应的哈希分区,然后再用分区中的数据进行精准匹配,将匹配的行进行投影操作,然后将投影后的数据返回。
举例¶
--清理环境
DROP TABLE t1 CASCADE;
DROP TABLE t2 CASCADE;
--创建环境
CREATE TABLE t1(id int, a int);
INSERT INTO t1 SELECT generate_series(1,10),random();
ANALYZE t1;
CREATE TABLE t2(id int,b int);
INSERT INTO t2 SELECT generate_series(1,10),random();
ANALYZE t2;
EXPLAIN ANALYZE SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.id WHERE t1.a!=t2.b;
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Hash Join(Inner Join) (cost=7.22..14.49 rows=9 width=16) (actual time=0.06..0.07 rows=10 loops=1)
Hash Cond: ("outer".ID = "inner".ID)
Join Filter: ("outer".A <> "inner".B)
-> Seq Scan on T1 (cost=0.00..7.10 rows=10 width=8) (actual time=0.02..0.02 rows=10 loops=1)
-> Hash (cost=7.10..7.10 rows=10 width=8) (actual time=0.02..0.02 rows=0 loops=1)
-> Seq Scan on T2 (cost=0.00..7.10 rows=10 width=8) (actual time=0.01..0.02 rows=10 loops=1)
Planning Time: 0.46 msec
Execution Time: 0.27 msec
(8 rows)
EXPLAIN ANALYZE SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.id WHERE RANDOM()>1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Hash Join(Left Join) (cost=7.22..14.51 rows=10 width=16) (actual time=0.05..0.07 rows=5 loops=1)
Hash Cond: ("outer".ID = "inner".ID)
Filter: (RANDOM() > 1::double precision)
-> Seq Scan on T1 (cost=0.00..7.10 rows=10 width=8) (actual time=0.01..0.02 rows=10 loops=1)
-> Hash (cost=7.10..7.10 rows=10 width=8) (actual time=0.02..0.02 rows=0 loops=1)
-> Seq Scan on T2 (cost=0.00..7.10 rows=10 width=8) (actual time=0.01..0.02 rows=10 loops=1)
Planning Time: 0.36 msec
Execution Time: 0.21 msec
(8 rows)
EXPLAIN ANALYZE SELECT * FROM t1 LEFT JOIN t2 ON t1.id = t2.id WHERE a IN (SELECT b FROM t2) AND RANDOM()>1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Hash Join(Left Semi Join) (cost=14.45..21.82 rows=3 width=16) (actual time=0.09..0.09 rows=0 loops=1)
Hash Cond: ("outer".A = "inner".B)
Startup Filter: (RANDOM() > 1::double precision)
-> Hash Join(Left Join) (cost=7.22..14.46 rows=10 width=16) (actual time=0.04..0.05 rows=10 loops=1)
Hash Cond: ("outer".ID = "inner".ID)
-> Seq Scan on T1 (cost=0.00..7.10 rows=10 width=8) (actual time=0.01..0.02 rows=10 loops=1)
-> Hash (cost=7.10..7.10 rows=10 width=8) (actual time=0.02..0.02 rows=0 loops=1)
-> Seq Scan on T2 (cost=0.00..7.10 rows=10 width=8) (actual time=0.01..0.01 rows=10 loops=1)
-> Hash (cost=7.10..7.10 rows=10 width=4) (actual time=0.03..0.03 rows=0 loops=1)
-> Seq Scan on T2 (cost=0.00..7.10 rows=10 width=4) (actual time=0.02..0.02 rows=10 loops=1)
Planning Time: 1.09 msec
Execution Time: 0.60 msec
(12 rows)
- Hash Cond 限定条件,表示哈希的探测条件
- Join Filter 限定条件,连接限定条件
- Filter 限定条件,一些其他的限定
- Startup Filter 限定条件,对外表进行限定
- Startup Anti Filter 限定条件,对内表进行限定