query optimization | The Blog Pros

November 7, 2021

Auto_Explain: How to Log Slow Postgres Query Plans Automatically

Do you want to know why a PostgreSQL query is slow? Then EXPLAIN ANALYZE is a great starting point. But query plans can depend on other server activity, can take a while to run, and can change over time, so if you want to see the actual execution plans of your slowest queries, auto_explain is the tool you need. In this post, we’ll look into what it does, how to configure it, and how to use those logs to speed up your queries.

What Is Auto_Explain?

Auto_explain is a PostgreSQL extension that allows you to log the query plans for queries slower than a (configurable) threshold. This is incredibly useful for debugging slow queries, especially those that are only sometimes problematic. It is one of the contribution modules, so it can be installed and configured easily on regular PostgreSQL.

June 13, 2021

Memoization in Cost-based Optimizers

Query optimization is an expensive process that needs to explore multiple alternative ways to execute the query. The query optimization problem is NP-hard, and the number of possible plans grows exponentially with the query’s complexity. For example, a typical TPC-H query may have up to several thousand possible join orders, 2–3 algorithms per join, a couple of access methods per table, some filter/aggregate pushdown alternatives, etc. Combined, this could quickly explode the search space to millions of alternative plans.

This blog post will discuss memoization — an important technique that allows cost-based optimizers to consider billions of alternative plans in a reasonable time.
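
To make the idea concrete, here is a minimal sketch of memoized join-order search in Python. It is not taken from any particular optimizer: the table cardinalities, the single join selectivity, and the cost model (sum of estimated intermediate result sizes) are all made-up assumptions. The point is only that the best plan for each subset of tables is computed once and then reused by every larger plan that contains it.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical statistics: row counts per table and one shared join selectivity.
CARD = {"A": 1000, "B": 100, "C": 10, "D": 10000}
SELECTIVITY = 0.01


def rows(tables: frozenset) -> float:
    """Estimated number of rows produced by joining `tables` (toy model)."""
    result = 1.0
    for name in tables:
        result *= CARD[name]
    return result * SELECTIVITY ** (len(tables) - 1)


@lru_cache(maxsize=None)  # the memo: one entry per distinct subset of tables
def best_plan(tables: frozenset) -> tuple:
    """Return (cost, plan) for the cheapest join tree over `tables`."""
    if len(tables) == 1:
        (name,) = tables
        return 0.0, name  # scanning a base table is free in this toy model
    best = None
    members = sorted(tables)
    for size in range(1, len(members)):
        for left_names in combinations(members, size):
            left = frozenset(left_names)
            right = tables - left
            left_cost, left_plan = best_plan(left)     # reused if already solved
            right_cost, right_plan = best_plan(right)  # reused if already solved
            cost = left_cost + right_cost + rows(tables)
            if best is None or cost < best[0]:
                best = (cost, (left_plan, right_plan))
    return best


cost, plan = best_plan(frozenset(CARD))
print(plan, cost)
print(best_plan.cache_info())  # misses = subsets actually optimized
```

With four tables, the memo holds only 15 entries (one per non-empty subset), even though the recursion visits each subset many times.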

May 8, 2021

Rule-Based Query Optimization

The goal of the query optimizer is to find the query execution plan that computes the requested result efficiently. In this blog post, we discuss rule-based optimization, a common pattern that modern optimizers use to explore equivalent plans. Then we analyze how rule-based optimization is implemented in several state-of-the-art optimizers: Apache Calcite, Presto, and CockroachDB.

Transformations

A query optimizer must explore the space of equivalent execution plans and pick the optimal one. Intuitively, plan B is equivalent to plan A if it produces the same result for all possible inputs.
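
As a small illustration of what "equivalent" means, the Python sketch below compares two hypothetical plans for the same query: one filters after the join, the other pushes the filter below the join. The row layout, function names, and sample data are assumptions made for this example. Checking a few inputs does not prove equivalence for all inputs; it only shows the property a transformation rule must preserve.

```python
from collections import Counter


def join(left, right):
    """Inner join of (key, value) rows on key, as a nested loop."""
    return [(lk, lv, rv) for lk, lv in left for rk, rv in right if lk == rk]


def plan_a(left, right, threshold):
    """Plan A: join first, then filter on the left-hand value."""
    return [row for row in join(left, right) if row[1] > threshold]


def plan_b(left, right, threshold):
    """Plan B: push the filter below the join, then join the survivors."""
    filtered_left = [row for row in left if row[1] > threshold]
    return join(filtered_left, right)


def same_result(rows_x, rows_y):
    """Compare results as multisets, since SQL results are unordered bags."""
    return Counter(rows_x) == Counter(rows_y)


left = [(1, 5), (2, 50), (2, 7), (3, 90)]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
print(same_result(plan_a(left, right, 10), plan_b(left, right, 10)))
```

Plan B usually does less work because the join sees fewer rows, which is why a filter pushdown rule is worth applying when it is valid.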

May 6, 2021

Assembling a Query Optimizer with Apache Calcite

Introduction

Apache Calcite is a dynamic data management framework with a SQL parser, optimizer, executor, and JDBC driver.

Many examples of Apache Calcite usage demonstrate the end-to-end execution of queries using the JDBC driver, some built-in optimization rules, and the Enumerable executor. Our customers often have their own execution engines and JDBC drivers. So how can you use Apache Calcite for query optimization only, without its JDBC driver and Enumerable executor?

May 1, 2021

Inside Presto Optimizer

Abstract

Presto is an open-source distributed SQL query engine for big data. Presto provides a connector API to interact with different data sources, including RDBMSs, NoSQL products, Hadoop, and stream processing systems. Created by Facebook, Presto has been widely adopted by the open-source world (Presto, Trino) and by commercial companies (e.g., Ahana, Qubole).

Presto comes with a sophisticated query optimizer that applies various rewrites to the query plan. In this blog post series, we investigate the internals of the Presto optimizer. In the first part, we discuss the optimizer interface and the design of the rule-based optimizer.

April 24, 2021

Custom Traits in Apache Calcite

Abstract

Physical properties are an essential part of the optimization process that allows you to explore more alternative plans.

Apache Calcite comes with convention and collation (sort order) properties. Many query engines require custom properties. For example, distributed and heterogeneous engines that we often see in our daily practice need to carefully plan the movement of data between machines and devices, which requires a custom property to describe data location.

December 31, 2020

Oracle SQL Performance Plan Review Automation

Why Do We Need a SQL Performance Review?

- The current code review process is manual and doesn't capture the Explain Plan for all modified queries.
- Currently, lead devs and developers run Explain Plans manually in Toad/SQL Developer.
- We need an automated tool that captures problematic queries from an Explain Plan perspective and reduces manual oversight.
- Performance audits need data points to work from.

Solution

Oracle keeps executed SQL statements in memory and exposes them, indexed by SQL ID, through views such as gv$sqltext.
During development (PL/SQL, Java, OA Framework, XML Publisher), tag all the desired SQL queries with a code comment (Release Name/User Story Number).
Develop a PL/SQL program with the following features:
- Analyze the Execution Plan of queries executed by a concurrent program/Java program/OAF code/forms/reports/BI publisher reports.
- Generate a report with queries which could impact performance. For example, queries with FULL TABLE SCAN, MERGE CARTESIAN JOIN, FULL INDEX SCAN.
- Capability to categorize queries at User Story, Sprint, Release, and Scrum Team levels based on program input parameters.
- Capability to analyze queries executed by a concurrent program/package/Java modules.
- Capture data of relevant queries analyzed in a table for future analysis and dashboards.
- Store the generated Explain Plan in a database table for audit purposes.
In the Oracle E-Business Suite world, this program can be registered as a concurrent program executable and assigned to a request group.
Before a release migration, the SysAdmin can set up a business process that runs this concurrent program, reviews the Explain Plan of the SQL queries for that release, and catches any problematic queries well in advance.
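
The two core steps of this solution, reading the release tag out of the query comment and flagging risky plan steps, can be sketched outside the database as well. The Python below is only an illustration under stated assumptions: the function names and the sample rows are hypothetical, and the plan rows are assumed to be (sql_id, operation, options) tuples as they would come from gv$sql_plan. The actual program described here is written in PL/SQL.

```python
# (operation, options) pairs that the report treats as potentially expensive.
RISKY_STEPS = {
    ("TABLE ACCESS", "FULL"),     # full table scan
    ("MERGE JOIN", "CARTESIAN"),  # merge Cartesian join
    ("INDEX", "FULL SCAN"),       # full index scan
}


def extract_tag(sql_text):
    """Return the text of the first /* ... */ comment, or None if absent."""
    start = sql_text.find("/*")
    end = sql_text.find("*/", start + 2)
    if start == -1 or end == -1:
        return None
    return sql_text[start + 2:end].strip()


def flag_risky_steps(plan_rows):
    """Keep only the plan rows whose (operation, options) pair is risky."""
    return [
        (sql_id, operation, options)
        for sql_id, operation, options in plan_rows
        if (operation, options) in RISKY_STEPS
    ]


# Hypothetical sample data for illustration only.
sample_sql = "SELECT /* REL1/US123 */ order_id FROM orders"
sample_plan = [
    ("abc123", "SELECT STATEMENT", None),
    ("abc123", "TABLE ACCESS", "FULL"),
    ("def456", "INDEX", "UNIQUE SCAN"),
    ("def456", "MERGE JOIN", "CARTESIAN"),
]
print(extract_tag(sample_sql))
print(flag_risky_steps(sample_plan))
```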

Sample Query

SELECT DISTINCT obj.object_name
                        ,program_line#
                        , cpu_time / 1000000 AS cpu_time_in_secs
                        , elapsed_time / 1000000 AS elapsed_time_in_secs
                        ,buffer_gets
                        ,disk_reads
                        ,end_of_fetch_count AS rows_fetched_per_execution
                        ,executions
                        ,optimizer_cost
                        ,vsql.sql_id
                        ,NULL operation
                        ,NULL options
                        ,vsplan.plan_hash_value
                        ,sql_text
                        , ( SUBSTR ( DBMS_LOB.SUBSTR ( sql_fulltext
                                                      ,4000
                                                      ,1
                                                     )
                                    , INSTR ( DBMS_LOB.SUBSTR ( sql_fulltext
                                                               ,4000
                                                               ,1
                                                              ), '/*' ) + 2
                                    , INSTR ( DBMS_LOB.SUBSTR ( sql_fulltext
                                                               ,4000
                                                               ,1
                                                              ), '*/' ) - INSTR ( DBMS_LOB.SUBSTR ( sql_fulltext
                                                                                                   ,4000
                                                                                                   ,1
                                                                                                  ), '/*' ) - 2
                                   )
                          ) release_string
                        ,NULL error_flag
                        ,NULL error_message
                        ,loads
                        ,first_load_time
                        ,user_io_wait_time
                        ,rows_processed
                        ,last_load_time
                        ,vsql.module
                        ,fnd_global.user_id created_by
                        ,SYSDATE creation_date
                        ,fnd_global.user_id last_updated_by
                        ,SYSDATE last_update_date
                        ,'-1' last_update_login
         FROM            gv$sql vsql
                        ,gv$sql_plan vsplan
                        ,all_objects obj
         WHERE           1 = 1
         AND             vsql.sql_id = vsplan.sql_id
         AND             vsql.sql_fulltext NOT LIKE '%sql_text%'
         AND             vsql.program_id = obj.object_id(+)
         AND             vsql.sql_fulltext LIKE lv_pattern
         AND             TO_DATE ( last_load_time, 'YYYY-MM-DD/HH24:MI:SS' ) >= ( SYSDATE - p_hours_from / 24 )
         AND             nvl(obj.object_name, '-') = NVL(p_program,nvl(obj.object_name, '-'))
         AND             nvl(vsql.module, '-') = NVL(p_module,nvl(vsql.module, '-'))
         ORDER BY        last_load_time DESC;

      

     

Use fnd_file.put_line(fnd_file.output, 'String..') to print the SQL plan to the concurrent program's output file.

Sample Output

December 7, 2020

7 Database Optimization Hacks for Web Developers

Optimizing your database comes with great rewards. Higher performance and increased query efficiency are just a few examples of these benefits.

However, the means aren’t always straightforward and may require changing the rules altogether within a developer team. Furthermore, the examples listed here might not work for your database, depending on the system you use. In that case, try to follow the core principle and translate the action into the means your system allows.

May 12, 2020

SQL Plan Management With TiDB: A Review

The SQL execution plan is a critical factor that affects SQL statement performance. The stability of the SQL execution plans heavily influences the entire cluster's performance. If a relational database's optimizer chooses a wrong execution plan for a query, it usually has a negative impact on the system; for example, operations might take longer to respond or the database might get overloaded.

We've done a lot of work on optimizer stability for TiDB. However, SQL execution plans are affected by various factors. The execution plan may encounter unanticipated changes. As a result, the execution time might be too long.

January 15, 2020 / June 17, 2020

Index Advisor Service for Couchbase N1QL (SQL for JSON)

Couchbase N1QL is a SQL-like language for JSON data. To retrieve and manipulate JSON data effectively, we need appropriate indexes. The rules for creating these indexes can be read here. But that involves too much reading, hence we now have an Index Advisor service that accepts a query and gives out an index recommendation that would meet the expectations of the Couchbase query engine — all without downloading the latest Couchbase server.

This service will provide index recommendations to help DBAs, developers, and architects optimize query performance and meet the SLAs.