深度解析查询执行计划

云贝教育 · 2021-11-22 709 PostgreSQL

分享收藏

作者：崔鹏

执行计划

成本计算

索引概述

Auto_explain

目标

1.explian SQL分析工具的使用, 理解explain 的代价计算原理.

2.掌握explain 输出的含义 (如组合行集 , 节点处理 , 合并连接, 哈希连接)

EXPLAIN

EXPLAIN [ ( option [, ...] ) ] statement
n EXPLAIN [ ANALYZE ] [ VERBOSE ] statement
     ANALYZE [ boolean ] -- 执行statement, 得到真实的运行时间以及统计信息
n VERBOSE [ boolean ] -- 输出详细信息
n COSTS [ boolean ] -- 输出cost值, 默认打开
n BUFFERS [ boolean ] -- 输出本次QUERY shared 或 local buffer的信息. 命中,未命中,脏, 写 n
      TIMING [ boolean ] -- 输出时间开销
n FORMAT { TEXT | XML | JSON | YAML } -- 输出格式

EXPLAIN 输出关键字解读

cost=0.00..22.32，0.00代表启动成本，22.32代表返回所有数据的成本。
rows=1032：表示返回多少行。
width=56，表示每行平均宽度。
actual time=0.060..1.167，实际花费的时间。
loops=1，循环的次数
Output，输出的字段名
Buffers，缓冲命中数
shared read，代表数据来自disk(磁盘)而并非cache(缓存)，当再次执行sql，会发现变成shared hit，说明数据已经在cache中
Planning Time，生成执行计划的时间
Execution Time，执行执行计划的时间

EXPLAIN 输出举例说明 1

create table t1(id int primary key,name varchar(20),age int);
insert into t1 values(1,'c1',1);
insert into t1 values(2,'c2',2);
insert into t1 values(3,'c3',3);
insert into t1 values(4,'c4',4);
insert into t1 values(5,'c5',5);
insert into t1 values(6,'c6',6);
explain (analyze, verbose, costs, buffers, timing) select count(*) from t1;

EXPLAIN 输出举例说明 2 - 全表扫描

postgres=# explain (analyze, verbose, costs, buffers, timing) select count(*) from t1;
                                                 QUERY PLAN                                                
-------------------------------------------------------------------------------
Aggregate  (cost=20.75..20.76 rows=1 width=8) (actual time=0.017..0.018 rows=1 loops=1)
这个节点的输出, 聚合, 输出第一行前的开销是20.75
聚合的开销=20.75..20.76
   Output: count(*)
   Buffers: shared hit=1 这个节点以及下级节点的BUFFER统计项
   ->  Seq Scan on public.t1  (cost=0.00..18.60 rows=860 width=0) (actual time=0.011..0.012 rows=6 loops=1)
  这个节点的路径(全表扫描)
  0.00表示输出第一行前的成本, 如这里输出第一行前不需要排序为0.00. 后面是这个节点真实的时间
         Output: id, name, age这个节点输出的列
         Buffers: shared hit=1 这个节点的shared buffer命中1个page
Planning Time: 0.050 ms   SQL解析时间
Execution Time: 0.102 ms 总的执行时间
(8 rows)

EXPLAIN 输出举例说明 3 - union

postgres=# explain (analyze, verbose, costs, buffers, timing) select 2 union select 2; -- union去重复
                                              QUERY PLAN                                             
-------------------------------------------------------------------------------
Unique  (cost=0.06..0.07 rows=2 width=4) (actual time=0.010..0.013 rows=1 loops=1)
  Output: (2)
  ->  Sort  (cost=0.06..0.07 rows=2 width=4) (actual time=0.009..0.010 rows=2 loops=1)
        Output: (2)
        Sort Key: (2)
        Sort Method: quicksort  Memory: 25kB  内存排序 25kB
        ->  Append  (cost=0.00..0.05 rows=2 width=4) (actual time=0.003..0.004 rows=2 loops=1)
        ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=1)
                     Output: 2
            ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)
                     Output: 2
Planning Time: 0.202 ms
Execution Time: 0.147 ms
 
 
EXPLAIN 输出举例说明 4 - 索引扫描
create index idx_t1_age on t1(age);
postgres=#  explain (analyze, verbose, costs, buffers, timing) select  * from t1 where age = 1;
                                             QUERY PLAN                                            
-------------------------------------------------------------------------------
Seq Scan on public.t1  (cost=0.00..1.07 rows=1 width=11) (actual time=0.013..0.015 rows=1 loops=1)
  Output: id, name, age
  Index Cond: (age = 1)
Planning Time: 0.060 ms
Execution Time: 0.080 ms
(5 rows)

索引失效

在表t1.age字段创建btree索引，使用age=1索引生效，但是下面的例子运算、函数、类型转换却导致索引失效了。
where age + 1 = 1
where power(age,2) = 1
where age::varchar = '1'
如何解决呢,可参考前面的表达式索引解决：
create index idx_tl_age on tl ((age+1));
create index idx_t1_age on tl((power(age,2)));
create index idx_tl_age on tl((age::varchar));
 
 
EXPLAIN 输出举例说明 5 - 连接查询 1
create table t2(id int primary key,name varchar(20),age int);
insert into t2 values(1,'c1',1);
insert into t2 values(2,'c2',2);
insert into t2 values(3,'c3',3);
insert into t2 values(4,'c4',4);
insert into t2 values(5,'c5',5);
insert into t2 values(6,'c6',6);

EXPLAIN 输出举例说明 5 - 连接查询 2

postgres=# explain (analyze, verbose, costs, buffers, timing) select * from t1,t2 where t1.id=t2.id;
                                                   QUERY PLAN                                                  
-------------------------------------------------------------------------------
Hash Join  (cost=1.14..22.00 rows=6 width=77) (actual time=0.033..0.037 rows=6 loops=1)
  Output: t1.id, t1.name, t1.age, t2.id, t2.name, t2.age
  Inner Unique: true
  Hash Cond: (t2.id = t1.id)
  Buffers: shared hit=2 dirtied=1
  ->  Seq Scan on public.t2  (cost=0.00..18.60 rows=860 width=66) (actual time=0.007..0.008 rows=6 loops=1)
        Output: t2.id, t2.name, t2.age
        Buffers: shared hit=1 dirtied=1
  ->  Hash  (cost=1.06..1.06 rows=6 width=11) (actual time=0.017..0.017 rows=6 loops=1)
        Output: t1.id, t1.name, t1.age
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        Buffers: shared hit=1
        ->  Seq Scan on public.t1  (cost=0.00..1.06 rows=6 width=11) (actual time=0.003..0.004 rows=6 loops=1)
              Output: t1.id, t1.name, t1.age
              Buffers: shared hit=1
Planning Time: 0.186 ms
Execution Time: 0.114 ms
(17 rows)

PG13新特性 - EXPLAIN (WAL) support 概述

EXPLAIN can now track WAL usage information along with auto_explain, autovacuum, and pg_stat_statements.
EXPLAIN现在可以跟踪 WAL 使用信息以及auto_explain、autovacuum和pg_stat_statements。
注:EXPLAIN的wal选型是PostgreSQL13版本中新增加选项。
EXPLAIN [ ( option [, ...] ) ] statement
EXPLAIN [ ANALYZE ] [ VERBOSE ] statement
where option can be one of:
    ANALYZE [ boolean ]
    VERBOSE [ boolean ]
    COSTS [ boolean ]
    SETTINGS [ boolean ]
    BUFFERS [ boolean ]
    WAL [ boolean ]#PG13中新增选项
    TIMING [ boolean ]
    SUMMARY [ boolean ]
    FORMAT { TEXT | XML | JSON | YAML }

PG13新特性 - EXPLAIN (WAL) support 概念1

EXPLAIN的WAL选项

Include information on WAL record generation. Specifically, include the number of records, number of full page images (fpi) and amount of WAL bytes generated. In text format, only non-zero values are printed. This parameter may only be used when ANALYZE is also enabled. It defaults to FALSE.

包括有关WAL记录生成的信息。具体来说，包括记录数、整页图像数（fpi）和生成的WAL字节数。在文本格式中，仅打印非零值。此参数只能在同时启用ANALYZE时使用。它默认为FALSE。

WAL

事务日志,WAL可以显著降低磁盘的写次数，因为只有日志文件需要被刷出到磁盘以保证事务被提交，而被事务改变的每一个数据文件则不必被刷出。日志文件被按照顺序写入，因此同步日志的代价要远低于刷写数据页面的代价。在处理很多影响数据存储不同部分的小事务的服务器上这一点尤其明显。此外，当服务器在处理很多小的并行事务时，日志文件的一个fsync可以提交很多事务。

PG13新特性 - EXPLAIN (WAL) support 概念2

FPI

FPI(Full Page Image)产生于checkpoint之后第一次变脏的page，在下次checkpoint到了之前，已经输出过PFI的page是不需要再次输出FPI。

因此checkpoint时间间隔越长，FPI产生的频度会越低。如果是checkpoint之后第一次修改页面，则输出整个page的内容(即full page image，简称FPI)。

FPI是full-page-write中xlog记录的状态

举例说明

在T1，数据库成功执行checkpoint；

在T2，执行DML语句，这时候相关的数据会写入到WAL中（此处忽略了WAL buffer）

在T3，提交该事务；

在T4，bgwriter把dirty pages写入到Data file中，但在写入过程中机器出现故障导致Crash（如掉电等），出现了部分写的情况。

为了应对这种情况，PG在T2写入WAL的时候，会把出现变化的page整页写入到WAL中，而不仅仅是tuple data。在数据库重启执行恢复的时候，在Redo point开始回放WAL时，如发现XLOG Record是FPI（full-page-image），则整页替换，通过这种机制解决了部分写的问题。

PG13新特性 - EXPLAIN (WAL) support 新增

postgres=# create table test1
postgres-# (userid int,name text,birthday date,crt_time timestamp without time zone);
新增
postgres=# explain(analyze,wal,verbose,costs) insert into test1 (userid,name,birthday,crt_time)select generate_series(1,100000),'abcdef','2015-08-10',clock_timestamp();
                                                          QUERY PLAN                                                         
-------------------------------------------------------------------------------
Insert on public.test1  (cost=0.00..1750.02 rows=100000 width=48) (actual time=309.442..309.444 rows=0 loops=1)
   WAL: records=100000 bytes=7900000 (产生wal大小和记录数)
   ->  Subquery Scan on "*SELECT*"  (cost=0.00..1750.02 rows=100000 width=48) (actual time=0.027..83.242 rows=100000 loops=1)
         Output: "*SELECT*".generate_series, 'abcdef'::text, '2015-08-10'::date, "*SELECT*".clock_timestamp
         ->  ProjectSet  (cost=0.00..500.02 rows=100000 width=76) (actual time=0.008..29.592 rows=100000 loops=1)
               Output: generate_series(1, 100000), NULL::unknown, NULL::unknown, clock_timestamp()
               ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.003..0.004 rows=1 loops=1)
Planning Time: 0.186 ms
Execution Time: 309.546 ms
(9 rows)

PG13新特性 - EXPLAIN (WAL) support 删除 1

删除

postgres=# explain(analyze,wal,verbose,costs) delete from test1 where userid=1;
                                                   QUERY PLAN                                                  
-------------------------------------------------------------------------------
Delete on public.test1  (cost=0.00..1887.00 rows=1 width=6) (actual time=15.638..15.640 rows=0 loops=1)
   WAL: records=1 fpi=1 bytes=8247 (有1个FPI类型的wal)
   ->  Seq Scan on public.test1  (cost=0.00..1887.00 rows=1 width=6) (actual time=0.022..15.574 rows=1 loops=1)
         Output: ctid
         Filter: (test1.userid = 1)
         Rows Removed by Filter: 99999
Planning Time: 0.178 ms
Execution Time: 15.922 ms
(8 rows)

PG13新特性 - EXPLAIN (WAL) support 删除 2

postgres=# explain(analyze,wal,verbose,costs) delete from test1 where userid>=1 and userid<=100;
                                                    QUERY PLAN                                                   
-------------------------------------------------------------------------------
Delete on public.test1  (cost=0.00..2137.00 rows=86 width=6) (actual time=16.287..16.289 rows=0 loops=1)
   WAL: records=100 bytes=5402(没产生FPI类型的wal)
   ->  Seq Scan on public.test1  (cost=0.00..2137.00 rows=86 width=6) (actual time=0.072..16.136 rows=99 loops=1)
         Output: ctid
         Filter: ((test1.userid >= 1) AND (test1.userid <= 100))
         Rows Removed by Filter: 99900
         WAL: records=1 bytes=56
Planning Time: 0.223 ms
Execution Time: 16.333 ms
(9 rows)

PG13新特性 - EXPLAIN (WAL) support 修改 1

修改

postgres=# explain(analyze,wal,verbose,costs) update test1 set name='123' where userid=101;
                                                   QUERY PLAN                                                   
-----------------------------------------------------------------------------------------------------------------
Update on public.test1  (cost=0.00..1887.00 rows=1 width=54) (actual time=16.967..16.969 rows=0 loops=1)
   WAL: records=1 fpi=1 bytes=3505(第一次更新page一定会产生1个fpi类型的wal)
   ->  Seq Scan on public.test1  (cost=0.00..1887.00 rows=1 width=54) (actual time=0.032..16.925 rows=1 loops=1)
         Output: userid, '123'::text, birthday, crt_time, ctid
         Filter: (test1.userid = 101)
         Rows Removed by Filter: 99899
Planning Time: 0.106 ms
Execution Time: 17.018 ms
(8 rows)

PG13新特性 - EXPLAIN (WAL) support 修改 2

postgres=# explain(analyze,wal,verbose,costs) update test1 set name='123' where userid=101;
                                                   QUERY PLAN                                                    
-------------------------------------------------------------------------------
Update on public.test1  (cost=0.00..1887.00 rows=1 width=54) (actual time=14.997..14.999 rows=0 loops=1)
   WAL: records=1 bytes=68((非第一次更新page不会产生fpi类型的wal))
   ->  Seq Scan on public.test1  (cost=0.00..1887.00 rows=1 width=54) (actual time=0.041..14.962 rows=1 loops=1)
         Output: userid, '123'::text, birthday, crt_time, ctid
         Filter: (test1.userid = 101)
         Rows Removed by Filter: 99899
Planning Time: 0.100 ms
Execution Time: 15.045 ms
(8 rows)

PG13新特性 - EXPLAIN (WAL) support 查询

查询(不生成wal变更过)

postgres=# explain(analyze,wal,verbose,costs) select * from test1;
                                                     QUERY PLAN                                                    
-------------------------------------------------------------------------------
Seq Scan on public.test1  (cost=0.00..1637.00 rows=100000 width=23) (actual time=0.012..9.671 rows=100000 loops=1)
   Output: userid, name, birthday, crt_time
Planning Time: 0.185 ms
Execution Time: 14.538 ms
(4 rows)

成本计算

查看表的字节数大小

db1=# select pg_relation_size('t1');

pg_relation_size

------------------

8192

(1 row)

为每个要依次读取的块添加成本点。如果知道每个块都包含了8kb，那么就可以计算从表中读取的顺序块的成本值。

block_size = 8192 # block size in bytes

relation_size = 8192

blocks = relation_size / block_size 1

成本计算

找出PostgreSQL为每个块读取分配多少个成本点

postgres=# \c db1
You are now connected to database "db1" as user "postgres".
db1=# SHOW seq_page_cost;
 seq_page_cost
---------------
 1
(1 row)
db1=#

成本计算

PostgreSQL为每个块分配一个成本点。这就需要 1个成本点从表中读取数据。

从磁盘读取值并不是PostgreSQL需要做的。它必须将这些值发送给CPU并应用一个WHERE子句过滤。

db1=# SHOW cpu_tuple_cost;
 cpu_tuple_cost
----------------
 0.01
(1 row)
db1=# SHOW cpu_operator_cost;
 cpu_operator_cost
-------------------
 0.0025
(1 row)

成本计算

用所有的值来计算在explain 语句中得到的值。

number_of_records = 1000000
block_size    = 8192     # block size in bytes
relation_size = 8192
blocks = relation_size / block_size # 1
seq_page_cost   = 1
cpu_tuple_cost  = 0.01
cpu_filter_cost = 0.0025;
cost = blocks * seq_page_cost +
   number_of_records * cpu_tuple_cost +
   number_of_recordsj记录数 * cpu_filter_cost
(1*1) + (2*0.01) + (2*0.0025) = 1.025

btree

应用场景

b-tree适合所有的数据类型，支持排序，支持大于、小于、等于、大于或等于、小于或等于的搜索。

索引与递归查询结合，还能实现快速的稀疏检索。

hash

应用场景

hash索引存储的是被索引字段VALUE的哈希值，只支持等值查询。

hash索引特别适用于字段VALUE非常长（不适合b-tree索引，因为b-tree一个PAGE至少要存储3个ENTRY，所以不支持特别长的VALUE）的场景，例如很长的字符串，并且用户只需要等值搜索，建议使用hash index。

gin

原理

gin是倒排索引，存储被索引字段的VALUE或VALUE的元素，以及行号的list或tree。

（ col_val:(tid_list or tid_tree) ， col_val_elements:(tid_list or tid_tree) ）

应用场景

1、当需要搜索多值类型内的VALUE时，适合多值类型，例如数组、全文检索、TOKEN。（根据不同的类型，支持相交、包含、大于、在左边、在右边等搜索）

2、当用户的数据比较稀疏时，如果要搜索某个VALUE的值，可以适应btree_gin支持普通btree支持的类型。（支持btree的操作符）

3、当用户需要按任意列进行搜索时，gin支持多列展开单独建立索引域，同时支持内部多域索引的bitmapAnd, bitmapOr合并，快速的返回按任意列搜索请求的数据。

gist

应用场景

GiST是一个通用的索引接口，可以使用GiST实现b-tree, r-tree等索引结构。

不同的类型，支持的索引检索也各不一样。例如：

1、几何类型，支持位置搜索（包含、相交、在上下左右等），按距离排序。

2、范围类型，支持位置搜索（包含、相交、在左右等）。

3、IP类型，支持位置搜索（包含、相交、在左右等）。

4、空间类型（PostGIS），支持位置搜索（包含、相交、在上下左右等），按距离排序。

5、标量类型，支持按距离排序。

sp-gist

应用场景

1、几何类型，支持位置搜索（包含、相交、在上下左右等），按距离排序。

2、范围类型，支持位置搜索（包含、相交、在左右等）。

3、IP类型，支持位置搜索（包含、相交、在左右等）。

auto_explain简介

auto_explain的目的是给数据库中执行的SQL语句一个执行时间阈值, 超过阈值的话,

记录下当时这个SQL的执行计划到日志中, 便于未来查看这个SQL执行计划有没有问题。

auto_explain编译安装

root@pgexp1 contrib]# cd auto_explain
[root@pgexp1 auto_explain]# pwd
/opt/soft_bak/postgresql-12.4/contrib/auto_explain
[root@pgexp1 auto_explain]# ll
总用量 172
-rw-r--r--. 1 1107 1107 10938 8月  11 2020 auto_explain.c
-rw-r--r--. 1 root root 94464 6月  24 23:30 auto_explain.o
-rwxr-xr-x. 1 root root 60168 6月  24 23:30 auto_explain.so
-rw-r--r--. 1 1107 1107   405 8月  11 2020 Makefile
[root@pgexp1 auto_explain]# make && make install
make -C ../../src/backend generated-headers
make[1]: 进入目录“/opt/soft_bak/postgresql-12.4/src/backend”
make[2]: 对“generated-header-symlinks”无需做任何事。
make[2]: 离开目录“/opt/soft_bak/postgresql-12.4/src/backend/utils”
make[1]: 离开目录“/opt/soft_bak/postgresql-12.4/src/backend”
/usr/bin/mkdir -p '/opt/pgsql/lib'
/usr/bin/install -c -m 755  auto_explain.so '/opt/pgsql/lib/auto_explain.so'

auto_explain配置 1

加载插件

shared_preload_libraries = 'auto_explain'

auto_explain.log_min_duration 指的是最小语句执行时间，单位是毫秒，设置之后语句计划就会被记录。如果设置为0，表示记录所有的计划。设置为-1（默认情况），表示不记录任何计划。

auto_explain.log_analyze 这个参数的作用是，如果一个执行计划被记录，那么EXPLAIN ANALYZE的结果，而不仅仅是EXPLAIN 的结果会被打印出来。

auto_explain.log_buffers 这个参数控制着当一个执行计划被记录时，缓冲区用量统计信息是否被打印出来。这个参数只有当开启auto_explain.log_analyze后才有作用，默认为off。

auto_explain.log_timing 这个参数控制着当一个执行计划被记录时，是否每个节点的时间信息都被打印出来。只有当开启auto_explain.log_analyze后才有作用，默认为on。

auto_explain.log_triggers 这个参数使得当一个执行计划被记录时，触发器的执行统计信息也会被包含进去。只有开启auto_explain.log_analyze后才有作用，默认为off。

auto_explain配置 2

auto_explain.log_verbose 控制着当执行计划被记录时，是否打印细节信息。默认为off。

auto_explain.log_format 选择要使用的EXPLAIN输出格式。默认为text格式，可选的格式有：text, xml, json 和yaml。

auto_explain.log_nested_statements 使嵌套语句被记录下来。默认为off。

auto_explain.sample_rate 使auto_explain只解释每个会话中的一部分语句。默认为1，表示

解释所有的查询。

配置postgresql.conf文件：

shared_preload_libraries = 'auto_explain'
# auto_explain
#auto_explain.log_min_duration = '1s'  #记录执行超过1s的sql
auto_explain.log_min_duration = 100   #记录执行超过100ms的sql
#auto_explain.log_min_duration = 0      #记录全部执行的sql
auto_explain.log_timing = on
auto_explain.log_verbose = on

点击加载更多

深度解析查询执行计划

热门视频更多

腾讯云 CloudLite 认证-分布式数据库TDSQL产品认证

腾讯云分布式数据库 TBase 产品认证

“国产数据库硬核技术”之TDSQL新敏态引擎技术详解

腾讯云数据库TDSQL精英挑战赛公开课第三期

热门文章更多

常用软件镜像站& Red Hat系统镜像文件下载

RedHat 8.4 安装&VMware Workstation 安装 Redhat8.4

TDSQL分布式事务实现机制

关于docker应用之一二

分布式数据库核心原理 Zookeeper+Mysql

【喜报】恭喜"云贝教育"21名同学顺利通过PG考试