I attended the SQL intersection conference in November 2014 in Las Vegas, Nevada. I am including my notes from the sessions in this blog, primarily for my own reference later. For anyone that comes across these notes, please take them with a grain of salt – I may very easily have misheard something or have written it down incorrectly.
Session Title: Mastering SQL Server Execution Plan Analysis
Instructor: Paul White
- SQL technically has “bags” not “sets”
- Binding order: from, on, join, where, group by, cube/rollup, having, select, distinct, order by, top
- Can refer to elements defined in previous items
- “Anybody using Hekaton? No, me either. Don’t see any need for it.”
- Query optimization pipeline – multiple steps
- SQL Server not strongly typed; implicit type conversions can have serious consequences for performance
- Constant folding: Evaluate deterministic expressions, such as 1 + 1
- OPTION RECOMPILE vs WITH RECOMPILE
- Option recompile is very powerful tool, often worth paying the price to recompile for each value of a parameter
- Does not work if assigning to variable
- Option recompile enables parameter embedding
- Can provide more aggressive (better) query plans
- Substitutes parameters, does constant folding
- Use with caution!
- TF 3604 – Output to client
- TF 8605
- Outputs tree in message tab
- Particularly useful to view when plan does not optimize as expected
- SQL Server does not treat CTEs as a “fence” – i.e., not evaluated separately; rather, rolled into the plan (treated as internal view)
- Even causes view read counts to increase
- Simplification
- Eliminate unused columns
- Rewrite logical subqueries as inner/outer join/apply
- Full and outer joins to inner joins (if, i.e., WHERE clause does not allow nulls)
- Remove redundancies and empty expressions
- Primarily targets auto-generated SQL
- WITH QUERYRULEOFF XX
- Can combine query with things like check constraints to further restrict/simplify query
- Computed columns (persisted) can be simplified, but only if expression is in the same order!
- Joins that check for existence but FK exists
- See MVP Deep Dives Volume 1 for a method of querying against a generic view (Rob Farley) (or here)
- Unless FK is multicolumn
- Cardinality estimation
- Uses histograms, density, trie trees (short strings)
- Uses derived stats
- Heavily influences initial join order
- When writing expressions, always ask how the CE will come up with an estimate
- New 2014 cardinality estimator
- The compatibility level of the context DB controls which CE is used
- TF 2312 = new, TF 9481 = old
- Lots of square roots (exponential backoff)
- Example – join 2 tables
- t0 = most selective
- t1 = less selective
- old: t0 * t1
- new: t0 * sqrt(t1)
- “If you want an exciting support life, use linked servers and throw in lots of new features”
- Trivial Plan Queries
- Queries without joins only
- Property on root node of query plan: OptimizationLevel = Trivial vs Full
- Subqueries prevent trival plans, so can use SELECT (SELECT col) FROM tbl to force full optimization
- TF 8757 also prevents trivial
- Cost-based optimization
- Cascades framework (SearchSpace) – readily available
- TF 8606 – show tree at different stages
- TF 8612 – Show cardinality estimate (2012+)
- Memo groups
- Logical properties per group are the same (column lists)
- Physical properties vary – different ways explored for doing the same thing
- Sort order, cardinality est., parallelism, execution mode, Halloween protection
- Visualization: tree with single node, then add variations to grow tree on out
- Based on applying rules (395 of them in SQL2014)
- Most rules are not used most of the time
- Small set of rules most of the time
- Works very well in vast majority of cases
- Query hints work by disabling rules in the optimizer
- FORCE ORDER query hint disables many rules – has a big impact on plan
- Other hints imply FORCE ORDER, such as inner LOOP join
- OPTION (QUERYRULEOFF)
- DBCC SHOWONRULES – List of rules currently enabled. Examples:
- JNToNL = logical join to nested loop
- LOJNToNL = left outer logical join to nested loop
- FOJNnoneqToSM = full outer join, non-equality, to sort merge
- DBCC SHOWOFFRULES
- dm_exec_query_transformation_stats – shows how frequently rules are used
- See Paul’s query Q07 – which rules for test query
- TF 8619 (2012+) – also shows applied rules
- TF 8621 (2012+) – more detailed
- Optimization phases
- Transaction processing (Search 0)
- At least 3 tables
- Mostly OLTP queries, small number of rows with nested loops
- Quick plan (Search 1)
- Subset of rules
- If cost exceed threshold for parallelism, run second time with parallel enabled
- Full optimization (Search 2)
- At least 5 tables
- If Search1 resulted in serial, only serial plans searched; same for parallel
- All rules available
- Costing
- Always assumes cold cache
- Doesn’t consider specific hardware, mostly (but does account for CPU cores and memory)
- Many assumptions that may or may not be true
- Costing numbers originally mean an estimate of number of seconds on “Nick’s” machines (dev at MS)
- Now completely meaningless for most purposes
- Not really valid to compare two queries
- Not even valid within a query; i.e., if it says sort of 60% of query, doesn’t make it so
- Useful only as a sign post
- Plan guides don’t necessarily reduce compilation time
- Still runs Search 2
- Guide used to shape search
- TF 8608 – Shows initial memo groups
- TF 8675 – Shows search phase info
- TF 8606 – Shows tree info
- TF 8615 – Shows final memos
- TF 8607 – Output tree from optimizer
- TF 7352 – Copy out tree (similar to plan)
- Query execution engine
- The “true” execution plan is called the MXC (“execution context”)
- Far more detailed that the user execution plan
- Many internal details are hidden
- The building blocks of the execution plan are “iterators”
- Have a generic interface
- fn_PhysLocFormatter(%%physloc%%)
- Returns file, page, slot for row
- TF 8666 – Adds some debugging data to plan (see the F4 properties, InternalDebuggingInfo)
- Interesting query with 9 different seeks:
- LastName in (‘aaa’, ‘bbb’, ‘ccc’) AND FirstName in (‘ddd’, ‘eee’, ‘fff’)
- Plan gives only one index seek, but:
- Properties show 9 different seek predicates
- Physically does 9 seeks
- Limited to 64 seek operations
- Beware that seeks can be hidden in this way
- Filter iterator – predicates than can’t be used in seeks
- May have start-up expression: can control whether or not entire subtree is executed at all
- Residual – hidden filter
- Filter predicate is pushed to the child iterator
- Because it is hidden, is easy to overlook
- “In a lot of ways, this is the major takeaway of the day”
- TF 9130 – Reveal pushed predicates (doesn’t work for join predicates)
- Also: If predicate is too large to fit on the tooltip, SSMS will simply leave it off! Have to look at properties window.
- Nested loops
- Be sure to look at “outer references” property
- SSMS shows estimated execution stats based on one loop; actual exec stats on aggregated loops, so they generally don’t correlate
- If having performance issues with nested loop joins, be sure to check for residual predicates. May be able to rewrite to push predicate further down the tree
- TF 2324 – Disables implications of predicates
- Row-mode parallel plans
- TF 8649 – Force parallel plan
- If a table has a computed column, and the table is referenced in the query (even if the computed column is not referenced), will not get a parallel query (computed column is considered a scalar function)
- Batch mode parallel execution – Rather than operating one row at a time, operates on between 64 and 900 at once
- Queue up batches; and thread can process next batch
- Merge Joins – Check if many-to-many property is true; usually indicates a query error
- See Paul’s blog on residual predicates on hash joins (make sure types match)
- Bitmap – either simple or Bloom filter bitmap (for parallel plans only)
- TOP (n) PERCENT must read the entire set and will include an eager spool
- Sorts have a memory fraction property
- Spool is a place in tempdb to cache rows in case they may be needed later
- Eager spool often indicates a missing index
- Merge join operator can be performing a UNION ALL rather than a standard join!
- SSMS update operator can actually be updating multiple indexes; view properties to see for sure
- Wide vs. Narrow update plans
- Wide: Each index updated by separate operator
- Narrow: Multiple indexes updated by one operator
- Split, Sort, Collapse operators in a row generally indicates that a unique index is being updated
- TF 4138 (documented, supported) – Disable row goal behavior
- TOP adds row goal behavior
- Queries that execute LIKE against a variable can have a Constant Expression that computes at run-time the range begin/end values
- GetRangeWithMismatchedTypes (only in XML) indicates a type mismatch that must be converted to a range (i.e., datetime to date)
- One way to handle catch-all search queries (rather than dynamic SQL or option recompile) is to use a columnstore index
- The “true” execution plan is called the MXC (“execution context”)
- Transaction processing (Search 0)
- Example – join 2 tables