I attended the SQL intersection conference in November 2014 in Las Vegas, Nevada. I am including my notes from the sessions in this blog, primarily for my own reference later. For anyone that comes across these notes, please take them with a grain of salt – I may very easily have misheard something or have written it down incorrectly.
Session Title: Parallel Query Execution, Part 1
Instructor: Paul White
Within a parallel plan, each thread is effectively running a serial plan
- Parallel seeks and scans
- 2005 and earlier use parallel page provider
- 2008+ use range enumerators
- May see ACCESS_METHODS_DATASET_PARENT latch waits
- Only forward scans and seeks are supported
- Backward scans can only happen serially
- Exchange types
- Hash – Distributed based on hash function
- Round Robin – Sent to next available consumer
- Broadcast – Pushed to all consumers
- Demand – Row at a time (rare)
- Range – Mostly index build plans
- Exchanges can be sort order preserving or not
- Root node has property called ThreadStats
- Will show number of threads reserved and number of threads used
- May not use all reserved threads depending on timing of thread allocated in the query (can reuse threads that are already complete)
- exec_context_id is the thread number
- 0 = serial
- Query to identify node id where CXPACKET wait may be happening
- Parallel merge join requires order preservation
- Does not scale well to high DOP
- Parallel hash join
- Generally scales well to high DOP
- Bitmap reduction – can only appear in parallel plans, and can greatly reduce work to be done (i.e., early reduction)
- Parallel nested loops join
- Inner side runs serially
- Optimizer is “biased” against them
- TF 8649 – Force parallel plan
- Various tricks or plan guides to force usage
- Such as specifying row goals
- Express queries as APPLY – As query is pushed toward parallel nested loop queries, can push toward smaller memory grants and more efficient use of resources
- Parallel sorts
- Observer memory fraction percentage
- Tricks to avoid spills
- Cast strings to larger strings (i.e., select cast(str as varchar(100))) to force larger memory grant
- Split sort into smaller sorts (initial select into temp table, then cross apply to temp table)
- Avoid too much parallelism – set the cost threshold for parallelism
- Parallel deadlocks
- Once deadlock is detected, contents of CXPACKET buffers spill to tempdb
- Interesting crafted demo that causes spills only at even DOPs (but not with odd)!