Merge Join

名称

合并连接算子

介绍

合并已排序的数据并进行连接或者通过 Sort 算子对外表和内表分别排序,进行连接。

通常情况下, Hash Join 的效果比 Merge Join 要好,然而如果数据有索引或者已经被排过序时,Merge Join性能会更好。

Merge Join适合大表并且索引列进行关联的情况。

举例

--清理环境
DROP TABLE t1 CASCADE;
DROP TABLE t2 CASCADE;

--创建环境
CREATE TABLE t1(id int, a int);
INSERT INTO t1 SELECT generate_series(1,10),random();
ANALYZE t1;

CREATE TABLE t2(id int,b int);
INSERT INTO t2 SELECT generate_series(1,10),random();
ANALYZE t2;

EXPLAIN ANALYZE SELECT * FROM t1 LEFT JOIN t2 ON t1.id>t2.id WHERE t1.a!=t2.b;
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Merge Join(Inner Join)  (cost=14.53..15.16 rows=30 width=16) (actual time=0.13..0.16 rows=45 loops=1)
   Merge Cond: ("outer".ID > "inner".ID)
   Join Filter: ("outer".A <> "inner".B)
   ->  Sort  (cost=7.27..7.29 rows=10 width=8) (actual time=0.08..0.08 rows=10 loops=1)
         Sort Key: ID
         ->  Seq Scan on T1  (cost=0.00..7.10 rows=10 width=8) (actual time=0.03..0.04 rows=10 loops=1)
   ->  Sort  (cost=7.27..7.29 rows=10 width=8) (actual time=0.03..0.03 rows=10 loops=1)
         Sort Key: ID
         ->  Seq Scan on T2  (cost=0.00..7.10 rows=10 width=8) (actual time=0.01..0.02 rows=10 loops=1)
 Planning Time: 0.29 msec
 Execution Time: 0.27 msec
(11 rows)

EXPLAIN ANALYZE SELECT * FROM t1 LEFT JOIN t2 ON t1.id>t2.id WHERE RANDOM()>1;
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Merge Join(Left Join)  (cost=14.53..15.24 rows=11 width=16) (actual time=0.08..0.11 rows=23 loops=1)
   Merge Cond: ("outer".ID > "inner".ID)
   Filter: (RANDOM() > 1::double precision)
   ->  Sort  (cost=7.27..7.29 rows=10 width=8) (actual time=0.04..0.04 rows=10 loops=1)
         Sort Key: ID
         ->  Seq Scan on T1  (cost=0.00..7.10 rows=10 width=8) (actual time=0.02..0.02 rows=10 loops=1)
   ->  Sort  (cost=7.27..7.29 rows=10 width=8) (actual time=0.02..0.03 rows=10 loops=1)
         Sort Key: ID
         ->  Seq Scan on T2  (cost=0.00..7.10 rows=10 width=8) (actual time=0.01..0.01 rows=10 loops=1)
 Planning Time: 0.41 msec
 Execution Time: 0.22 msec
(11 rows)

SET enable_hashjoin=0;
EXPLAIN ANALYZE SELECT * FROM t1 left JOIN t2 ON t1.id>10 WHERE a IN (SELECT b FROM t1) AND RANDOM()=1;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Merge Join(Left Semi Join)  (cost=23.89..24.09 rows=1 width=16) (actual time=0.11..0.11 rows=0 loops=1)
   Merge Cond: ("outer".A = "inner".B)
   Startup Filter: (RANDOM() = 1::double precision)
   ->  Sort  (cost=16.63..16.65 rows=10 width=16) (actual time=0.09..0.10 rows=19 loops=1)
         Sort Key: A
         ->  Nested Loop(Left Join)  (cost=7.11..16.46 rows=10 width=16) (actual time=0.04..0.07 rows=19 loops=1)
               Join Filter: ("outer".ID = 1)
               ->  Seq Scan on T1  (cost=0.00..7.10 rows=10 width=8) (actual time=0.01..0.02 rows=10 loops=1)
               ->  Materialize  (cost=7.11..7.21 rows=10 width=8) (actual time=0.00..0.00 rows=10 loops=10)
                     ->  Seq Scan on T2  (cost=0.00..7.10 rows=10 width=8) (actual time=0.01..0.02 rows=10 loops=1)
   ->  Sort  (cost=7.27..7.29 rows=10 width=4) (never executed)
         Sort Key: B
         ->  Seq Scan on T2  (cost=0.00..7.10 rows=10 width=4) (never executed)
 Planning Time: 0.81 msec
 Execution Time: 0.33 msec
(15 rows)

EXPLAIN ANALYZE SELECT * FROM t1 left JOIN t2 ON t1.id=1 WHERE a NOT IN (SELECT b FROM t2) AND RANDOM()=1;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Nested Loop(Left Anti Semi Join)  (cost=14.22..25.82 rows=10 width=16) (actual time=0.06..0.11 rows=19 loops=1)
   Join Filter: ("outer".A = "inner".B)
   Startup Anti Filter: (RANDOM() = 1::double precision)
   ->  Nested Loop(Left Join)  (cost=7.11..16.46 rows=10 width=16) (actual time=0.06..0.09 rows=19 loops=1)
         Join Filter: ("outer".ID = 1)
         ->  Seq Scan on T1  (cost=0.00..7.10 rows=10 width=8) (actual time=0.03..0.04 rows=10 loops=1)
         ->  Materialize  (cost=7.11..7.21 rows=10 width=8) (actual time=0.00..0.00 rows=10 loops=10)
               ->  Seq Scan on T2  (cost=0.00..7.10 rows=10 width=8) (actual time=0.01..0.02 rows=10 loops=1)
   ->  Materialize  (cost=7.11..7.21 rows=10 width=4) (never executed)
         ->  Seq Scan on T2  (cost=0.00..7.10 rows=10 width=4) (never executed)
 Planning Time: 0.55 msec
 Execution Time: 0.23 msec
(12 rows)
  • Inner Join 算子属性,表示 连接类型
  • Merge Cond 限定条件,用于Merge连接的条件
  • Join Filter 限定条件,连接限定条件
  • Filter 限定条件,一些其他的限定
  • Startup Filter 限定条件,对外表进行限定
  • Startup Anti Filter 限定条件