InterSystems SQL Optimization Guide
Interpreting an SQL Query Plan
This chapter explains the language and terms used in an SQL query access plan generated by ShowPlan
An SQL table is stored as a set of maps
. Each table has a master map
that contains all the data in the table; the table may also have other maps such as index maps and bitmaps. Each map can be pictured as a multidimensional global
, with the data for some fields in one or more subscripts
, and with the remaining fields stored in the node value. The subscripts control what data is being accessed.
Compiling an SQL query produces a set of instructions to access and return the data specified by the query. These instructions are expressed as ObjectScript code in an .INT routine.
The instructions and the sequence in which they are executed are influenced by the data the SQL compiler has about the structure and content of the tables involved in the query. The compiler attempts to use information such as table sizes and available indices to make the set of instructions as efficient as possible.
The query access plan (ShowPlan
) is a human-readable translation of that resulting set of instructions. The author of the query can use this query access plan to see how the data will be accessed. While the SQL compiler tries to make the most efficient use of data as specified by the query, sometimes the author of the query knows more about some aspect of the stored data than is evident to the compiler. In this case, the author can make use of the query plan to modify the original query to provide more information or more guidance to the query compiler.
The plan for a query could access several tables. When accessing a table, the plan may access a single map (index or master map), two maps (an index map followed by the master map), or, in the case of a multi-index plan
, several maps.
In accessing the data via a map, the plan indicates the subscripts used. It also indicates what the actual subscript values will be: a single given value, a set of given values, a range of values, or all values present in the table for that subscript. Which one is chosen depends on the conditions specified in the query. Obviously, accessing a single subscript value or only a few subscript values is faster than accessing all the values at that subscript level.
When the query is run, various conditions specified by the query are tested. Except for certain subscript-limiting conditions as just mentioned, the ShowPlan output does not explicitly indicate the testing of conditions. It is always best to test conditions as early as possible. The optimal place for testing the various conditions can be inferred from the plan details.
Similarly, ShowPlan does not detail the computation of expressions and sub-expressions. Besides simplicity, the main reason for this is that in most database environments, table and index access constitute the more important aspect of processing; the cost of retrieving the table data dominates the overall query cost, as disk access speed is still orders of magnitude slower than CPU processing.
When accessing data from a table, it is often necessary to examine multiple rows iteratively. Such access is indicated by a loop
. The instructions to be executed for each pass are referred to as the body
of the loop. They are visually indicated by being indented. It is common for database access involving multiple tables to require loops within loops. In this case, each loop level is indicated by a further indentation when compared to the previous level.
Some temp-files contain data from processing a single table. In this instance, building the temp-file could be considered pre-processing
for the data in that table. Reading such a temp-file may or may not be followed by accessing the master map of the source table. In other cases, temp-files could contain the results of processing multiple tables. In still other situations, temp-files are used to store grouped aggregate values, to check DISTINCT, etc.
The building of temp-files, as well as other processing, may be delegated to a separate unit of work called a module
. Each module is named. When separate modules are listed, the plan indicates where each module is invoked. When execution of the module finishes, processing resumes at the next statement following the module invocation.
For parallel query processing and for sharding, the plan shows the various queries being sent to be processed in parallel or on the shards. The plan used for each of these queries is also displayed.
Some subqueries (and views) within the given query might also be processed separately. Their plans are specified in separate subquery
sections. The precise place where a subquery section is called from is not indicated in the plan. This is because they are often invoked as part of the processing of conditions or expressions.
For queries that specify OUTER JOIN, the plan may indicate the possible generation of a row of NULLs if no matching rows were found, in order to satisfy the requirements of the outer join semantics.
For UNION, the plan might indicate the combining of the result rows from the various union subqueries in a separate module, where further processing of these result rows may be done.
When analyzing the plan for a given query, application developers might sometimes feel that a different plan would be more efficient. The application developer has available a variety of ways to affect the plan.
In addition, analyzing the plan may indicate that certain changes to the class definition could result in plans that are more efficient, for example:
In some cases (though not always), the use of a temp-file for preprocessing can indicate that adding to the original table an index with the same or similar structure as the temp-file would eliminate the need to build the temp-file. Removing this processing step from the query plan obviously could make the query run faster, but this must be balanced against the amount of work needed to maintain the index when updating the table. For further details on creating an index, refer to the Defining and Building Indices
chapter of this manual.
When the plan shows an index being used, followed by access to the master map, this suggests that adding the master map fields being used in the query to the index node data might result in a faster plan for this query. Again, this must be balanced against the additional update time, as well as the extra time added to the processing of other queries that use this index, since the index would be larger and thus require somewhat more read time.
When the plan shows two tables being joined in a particular order (for example t1 being retrieved first, and then joined to t2 using the join condition t1.a=t2.b), it may be the case that the reverse table order would result in a faster plan. For example, if t2 has additional conditions that would significantly limit the number of qualifying rows. In that case, adding a t1 index on t1.a would enable such a join order to be considered. For further details on creating an index, refer to the Defining and Building Indices
chapter of this manual.
Content Date/Time: 2019-06-25 05:00:55