Merge Join¶
名称¶
合并连接算子
介绍¶
合并已排序的数据并进行连接或者通过 Sort 算子对外表和内表分别排序,进行连接。
通常情况下, Hash Join 的效果比 Merge Join 要好,然而如果数据有索引或者已经被排过序时,Merge Join性能会更好。
Merge Join适合大表并且索引列进行关联的情况。
举例¶
--清理环境
DROP TABLE t1 CASCADE;
DROP TABLE t2 CASCADE;
--创建环境
CREATE TABLE t1(id int, a int);
INSERT INTO t1 SELECT generate_series(1,10),random();
ANALYZE t1;
CREATE TABLE t2(id int,b int);
INSERT INTO t2 SELECT generate_series(1,10),random();
ANALYZE t2;
EXPLAIN ANALYZE SELECT * FROM t1 LEFT JOIN t2 ON t1.id>t2.id WHERE t1.a!=t2.b;
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Merge Join(Inner Join) (cost=14.53..15.16 rows=30 width=16) (actual time=0.13..0.16 rows=45 loops=1)
Merge Cond: ("outer".ID > "inner".ID)
Join Filter: ("outer".A <> "inner".B)
-> Sort (cost=7.27..7.29 rows=10 width=8) (actual time=0.08..0.08 rows=10 loops=1)
Sort Key: ID
-> Seq Scan on T1 (cost=0.00..7.10 rows=10 width=8) (actual time=0.03..0.04 rows=10 loops=1)
-> Sort (cost=7.27..7.29 rows=10 width=8) (actual time=0.03..0.03 rows=10 loops=1)
Sort Key: ID
-> Seq Scan on T2 (cost=0.00..7.10 rows=10 width=8) (actual time=0.01..0.02 rows=10 loops=1)
Planning Time: 0.29 msec
Execution Time: 0.27 msec
(11 rows)
EXPLAIN ANALYZE SELECT * FROM t1 LEFT JOIN t2 ON t1.id>t2.id WHERE RANDOM()>1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Merge Join(Left Join) (cost=14.53..15.24 rows=11 width=16) (actual time=0.08..0.11 rows=23 loops=1)
Merge Cond: ("outer".ID > "inner".ID)
Filter: (RANDOM() > 1::double precision)
-> Sort (cost=7.27..7.29 rows=10 width=8) (actual time=0.04..0.04 rows=10 loops=1)
Sort Key: ID
-> Seq Scan on T1 (cost=0.00..7.10 rows=10 width=8) (actual time=0.02..0.02 rows=10 loops=1)
-> Sort (cost=7.27..7.29 rows=10 width=8) (actual time=0.02..0.03 rows=10 loops=1)
Sort Key: ID
-> Seq Scan on T2 (cost=0.00..7.10 rows=10 width=8) (actual time=0.01..0.01 rows=10 loops=1)
Planning Time: 0.41 msec
Execution Time: 0.22 msec
(11 rows)
SET enable_hashjoin=0;
EXPLAIN ANALYZE SELECT * FROM t1 left JOIN t2 ON t1.id>10 WHERE a IN (SELECT b FROM t1) AND RANDOM()=1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Merge Join(Left Semi Join) (cost=23.89..24.09 rows=1 width=16) (actual time=0.11..0.11 rows=0 loops=1)
Merge Cond: ("outer".A = "inner".B)
Startup Filter: (RANDOM() = 1::double precision)
-> Sort (cost=16.63..16.65 rows=10 width=16) (actual time=0.09..0.10 rows=19 loops=1)
Sort Key: A
-> Nested Loop(Left Join) (cost=7.11..16.46 rows=10 width=16) (actual time=0.04..0.07 rows=19 loops=1)
Join Filter: ("outer".ID = 1)
-> Seq Scan on T1 (cost=0.00..7.10 rows=10 width=8) (actual time=0.01..0.02 rows=10 loops=1)
-> Materialize (cost=7.11..7.21 rows=10 width=8) (actual time=0.00..0.00 rows=10 loops=10)
-> Seq Scan on T2 (cost=0.00..7.10 rows=10 width=8) (actual time=0.01..0.02 rows=10 loops=1)
-> Sort (cost=7.27..7.29 rows=10 width=4) (never executed)
Sort Key: B
-> Seq Scan on T2 (cost=0.00..7.10 rows=10 width=4) (never executed)
Planning Time: 0.81 msec
Execution Time: 0.33 msec
(15 rows)
EXPLAIN ANALYZE SELECT * FROM t1 left JOIN t2 ON t1.id=1 WHERE a NOT IN (SELECT b FROM t2) AND RANDOM()=1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Nested Loop(Left Anti Semi Join) (cost=14.22..25.82 rows=10 width=16) (actual time=0.06..0.11 rows=19 loops=1)
Join Filter: ("outer".A = "inner".B)
Startup Anti Filter: (RANDOM() = 1::double precision)
-> Nested Loop(Left Join) (cost=7.11..16.46 rows=10 width=16) (actual time=0.06..0.09 rows=19 loops=1)
Join Filter: ("outer".ID = 1)
-> Seq Scan on T1 (cost=0.00..7.10 rows=10 width=8) (actual time=0.03..0.04 rows=10 loops=1)
-> Materialize (cost=7.11..7.21 rows=10 width=8) (actual time=0.00..0.00 rows=10 loops=10)
-> Seq Scan on T2 (cost=0.00..7.10 rows=10 width=8) (actual time=0.01..0.02 rows=10 loops=1)
-> Materialize (cost=7.11..7.21 rows=10 width=4) (never executed)
-> Seq Scan on T2 (cost=0.00..7.10 rows=10 width=4) (never executed)
Planning Time: 0.55 msec
Execution Time: 0.23 msec
(12 rows)
- Inner Join 算子属性,表示 连接类型
- Merge Cond 限定条件,用于Merge连接的条件
- Join Filter 限定条件,连接限定条件
- Filter 限定条件,一些其他的限定
- Startup Filter 限定条件,对外表进行限定
- Startup Anti Filter 限定条件