Searching Only Active Jobs

This article is the fourth and final in a series of four posts about a process to search active SQL Agent jobs. The first three articles describe some utility functions, and this fourth post pulls them all together.

  1. Find which job steps are reachable
  2. Determine if a job is scheduled
  3. Determine if a job has executed recently
  4. (This post) Search only active jobs

Searching only active jobs

My motivation for wanting to search only active jobs came several months ago, when I needed to search every job across several dozen instances for various strings that needed to be changed. The updates were rather laborious and had to be made manually, and we have a number of disabled jobs and other situations where making these changes would have been a waste of time, at least in the short-term.

Therefore, I wanted to filter out jobs and/or job steps that don’t typically get executed. (In my particular case, it wasn’t such a big deal if I missed something, because the changes would merely cause the job step to fail, which would in turn send an alert, and I could fix the problem in those few cases. I think this wound up happening twice, but that in turn led me to refine these queries to address those situations.)

So I established the following criteria to eliminate results.

  • The job needs to be enabled. However, it is entirely possible for a disabled job to run. For example, it can be manually started, or it may be started by a different job or an external process, so additional criteria are needed.
  • The job needs to be scheduled, although for the same reasons, this won’t catch everything since jobs can be triggered via other mechanisms.
  • To catch jobs that are either disabled or unscheduled, we also want to check if the job has run “recently.” I figure that if for any reason it has executed in the past week or month or whatever, it is something that I’m interested in looking at.

Considerations

The exact nature of the search can vary quite a bit depending on how the jobs are set up and what we want to find. Usually, I will search the job name or the step name for certain text, or (most commonly of all) I will search the command text, as is done in the sample query below; a variant that covers all three follows it.

Input values: The sample query uses the following variables (at the top of the script) to control the search. These will vary according to your specific search needs.

  • @searchPattern (nvarchar(max)): A pattern indicating the search criteria.
  • @activeCutoffDate (datetime): A value indicating the threshold for how recently a job must have executed to be considered “recent.”

Output: The sample query returns these columns, but you may want to vary the output to suit your needs.

  • job_name (the name of the job as found in msdb.dbo.sysjobs)
  • step_name (the name of the job step as found in msdb.dbo.sysjobsteps)
  • command (the command behind the job step)

Sample search code

declare @searchPattern nvarchar(max) = '%search%';
declare @activeCutoffDate datetime = dateadd(day, -7, getdate());

select sr.job_name, sr.step_name, js.command
from CI_Admin.dbo.fn_AreJobStepsReachable(null) sr
inner join CI_Admin.dbo.fn_IsJobScheduled() sch on sch.job_id = sr.job_id
inner join CI_Admin.dbo.fn_HasJobRunRecently(@activeCutoffDate) rr on rr.job_id = sr.job_id
inner join msdb.dbo.sysjobsteps js on js.job_id = sr.job_id and js.step_id = sr.step_id
where sr.is_reachable = 1
and (sch.is_scheduled = 1 or rr.has_run_recently = 1)
and js.command like @searchPattern;
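
If the search needs to cover more than the command text, the same pattern extends naturally. Here is a variant of the query above (a sketch under the same setup, using the same utility functions) that also matches on the job and step names:

declare @searchPattern nvarchar(max) = '%search%';
declare @activeCutoffDate datetime = dateadd(day, -7, getdate());

select sr.job_name, sr.step_name, js.command
from CI_Admin.dbo.fn_AreJobStepsReachable(null) sr
inner join CI_Admin.dbo.fn_IsJobScheduled() sch on sch.job_id = sr.job_id
inner join CI_Admin.dbo.fn_HasJobRunRecently(@activeCutoffDate) rr on rr.job_id = sr.job_id
inner join msdb.dbo.sysjobsteps js on js.job_id = sr.job_id and js.step_id = sr.step_id
where sr.is_reachable = 1
and (sch.is_scheduled = 1 or rr.has_run_recently = 1)
and (sr.job_name like @searchPattern
	or sr.step_name like @searchPattern
	or js.command like @searchPattern);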

Determine If a Job Has Run Recently

This article is the third in a series of four posts about a process to search active SQL Agent jobs. The first three articles describe some utility functions, and the fourth post will pull them all together.

  1. Find which job steps are reachable
  2. Determine if a job is scheduled
  3. (This post) Determine if a job has executed recently
  4. Search only active jobs

Finding jobs that have executed recently

The function below is an inline table-valued function that will return a value indicating if a job has been executed recently, with “recently” being defined by an input parameter. The logic is considerably simpler than what was needed in the first post, where we determined which job steps are reachable.

Considerations

Parameters: The function takes one argument that is a threshold date for defining “recent.”

  • @cutoff_date (datetime): A value indicating the threshold for how recently a job must have executed to be considered “recent.”

Output: The function returns one row per job. The columns are:

  • job_id (the uniqueidentifier of the job as found in msdb.dbo.sysjobs)
  • job_name (the name of the job as found in msdb.dbo.sysjobs)
  • last_run_date (a datetime indicating the last time that the job executed, or NULL if the last run time cannot be determined; this may mean that the job has never executed or that job history has been purged since the last execution)
  • has_run_recently (a boolean that is true if the job has executed “recently” and false if not)

Example usage:

select job_id, job_name, last_run_date, has_run_recently from dbo.fn_HasJobRunRecently(dateadd(day, -5, getdate()));
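
The inverse is handy as well. For example (a sketch, assuming a 30-day window), we can list enabled jobs that appear dormant:

select r.job_id, r.job_name, r.last_run_date
from dbo.fn_HasJobRunRecently(dateadd(day, -30, getdate())) r
inner join msdb.dbo.sysjobs j on j.job_id = r.job_id
where j.enabled = 1
and r.has_run_recently = 0;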

The Code

if exists (select * from sys.objects o join sys.schemas s on o.schema_id = s.schema_id where s.name = 'dbo' and o.name = 'fn_HasJobRunRecently' and o.type = 'IF')
	drop function dbo.fn_HasJobRunRecently;
go
create function dbo.fn_HasJobRunRecently
(
	@cutoff_date datetime
)
returns table
as
	return
		select j.job_id, j.name job_name, jh.last_run_date,
			cast(case when jh.last_run_date >= @cutoff_date then 1 else 0 end as bit) has_run_recently
		from msdb.dbo.sysjobs j
		left join
		(
			-- sysjobhistory stores run_date and run_time as integers (yyyymmdd and hhmmss);
			-- reassemble them into a datetime (equivalent to msdb.dbo.agent_datetime).
			select jh.job_id,
				cast(
					cast(jh.run_date / 10000 as nvarchar(9)) + N'-' + 
					cast((jh.run_date % 10000) / 100 as nvarchar(9)) + N'-' + 
					cast(jh.run_date % 100 as nvarchar(9)) + N' ' +
					cast(jh.run_time / 10000 as nvarchar(9)) + N':' + 
					cast((jh.run_time % 10000) / 100 as nvarchar(9)) + N':' + 
					cast(jh.run_time % 100 as nvarchar(9))
				as datetime) last_run_date,
				row_number() over (partition by jh.job_id order by jh.run_date desc, jh.run_time desc) rn
			from msdb.dbo.sysjobhistory jh
		) jh on j.job_id = jh.job_id and jh.rn = 1
go

Determine If a Job Is Scheduled

This article is the second in a series of four posts about a process to search active SQL Agent jobs. The first three articles describe some utility functions, and the fourth post will pull them all together.

  1. Find which job steps are reachable
  2. (This post) Determine if a job is scheduled
  3. Determine if a job has executed recently
  4. Search only active jobs

Finding scheduled jobs

The function below is an inline table-valued function that will return a value indicating if a job is scheduled. The logic is considerably simpler than what was needed in the previous post, where we determined which job steps are reachable.

Considerations

Parameters: The function takes no arguments.

Output: The function returns one row per job. The columns are:

  • job_id (the uniqueidentifier of the job as found in msdb.dbo.sysjobs)
  • job_name (the name of the job as found in msdb.dbo.sysjobs)
  • is_scheduled (a boolean that is true if the job is scheduled and false if not)

Example usage:

select job_id, job_name, is_scheduled
from dbo.fn_IsJobScheduled();

Discussion: We are defining a job as scheduled if all of the following are true:

  • The job has one or more schedules associated to it
  • At least one of the schedules is enabled
  • The enabled schedule must have a recurrence type of one-time, daily, weekly, or monthly (including monthly relative); these correspond to the freq_type values 1, 4, 8, 16, and 32 in the code
  • The end date on the schedule must be in the future
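
To see the raw schedule data behind these criteria for a particular job, an ad-hoc query along these lines works (a sketch; substitute your own job name):

select j.name job_name,
	s.name schedule_name,
	s.enabled,
	s.freq_type,
	msdb.dbo.agent_datetime(s.active_end_date, s.active_end_time) end_datetime
from msdb.dbo.sysjobs j
inner join msdb.dbo.sysjobschedules js on js.job_id = j.job_id
inner join msdb.dbo.sysschedules s on s.schedule_id = js.schedule_id
where j.name = N'Name of important job';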

The Code

if exists (select * from sys.objects o join sys.schemas s on o.schema_id = s.schema_id where s.name = 'dbo' and o.name = 'fn_IsJobScheduled' and o.type = 'IF')
	drop function dbo.fn_IsJobScheduled;
go
create function dbo.fn_IsJobScheduled ()
returns table
as
	return
		select j.job_id,
			j.name job_name,
			cast(case when exists
			(
				select *
				from msdb.dbo.sysjobschedules js 
				join msdb.dbo.sysschedules s on s.schedule_id = js.schedule_id
				where js.job_id = j.job_id
				and s.enabled = 1
				and s.freq_type in (1, 4, 8, 16, 32)
				and msdb.dbo.agent_datetime(s.active_end_date, s.active_end_time) > getdate()
			) then 1 else 0 end as bit) is_scheduled
		from msdb.dbo.sysjobs j;
go

Finding Which Job Steps Are Reachable

This article is the first in a series of four posts about a process to search active SQL Agent jobs. The first three articles describe some utility functions, and the fourth post will pull them all together.

  1. (This post) Find which job steps are reachable
  2. Determine if a job is scheduled
  3. Determine if a job has executed recently
  4. Search only active jobs

Finding reachable job steps

There are a number of reasons that we might create a SQL Agent job where at least one job step isn’t reachable. Perhaps most commonly, we want to have some processing that normally doesn’t run (on a scheduled basis) but that occasionally we might kick off manually.

For this article series, I am presuming that when we search active SQL jobs, we want to ignore any job steps that are unreachable. For instance, we might be doing some refactoring and want to find all jobs that reference certain tables or columns, but we don’t really care about jobs or job steps that are never executed.

I have put together a SQL function to help identify these unreachable job steps. Unfortunately, it is a multi-statement table-valued function (I really dislike MSTVFs), but I haven’t yet figured out how to get all of the logic into an inline TVF. (I got the core logic in place, but the issue I haven’t worked past is how to deal with infinite loops within jobs. In the MSTVF version, this gets handled already, but the inline version uses recursion and chokes on jobs that have infinite loops.)

Considerations

Parameters: The function takes a single argument called @job_name. If this value is NULL, the function processes all jobs on the instance; otherwise, it limits itself to the single specified job. It is worth testing whether it is more efficient to run the function once for each job you are interested in, or to run it once for all jobs (a NULL input parameter) and cache the results, for example in a temporary table, as sketched here.
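
A minimal sketch of the caching approach (the temp table name is arbitrary):

select job_id, job_name, step_id, step_name, is_reachable
into #reachable_steps
from dbo.fn_AreJobStepsReachable(null);

-- Later searches reuse the cached rows instead of re-invoking the MSTVF.
select job_id, job_name, step_id, step_name
from #reachable_steps
where is_reachable = 1;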

Performance: This is an MSTVF, and it has the performance issues associated with these types of functions. I find that with a NULL value for the @job_name parameter, the function typically completes in no more than a few seconds on an instance with several hundred jobs, but the run time seems to depend more on job complexity than anything else.

Output: The function returns one row per job step. The columns are:

  • job_id (the uniqueidentifier of the job as found in msdb.dbo.sysjobs)
  • job_name (the name of the job as found in msdb.dbo.sysjobs)
  • step_id (the step number of the job as found in msdb.dbo.sysjobsteps)
  • step_name (the name of the step as found in msdb.dbo.sysjobsteps)
  • is_reachable (a boolean that is true if the step is reachable and false if it is not reachable)

Example usage:

select job_id, job_name, step_id, step_name, is_reachable
from dbo.fn_AreJobStepsReachable(null);

select job_id, job_name, step_id, step_name, is_reachable
from dbo.fn_AreJobStepsReachable('Name of important job');

Discussion: This code works by first identifying all of the job steps in a job and then “visiting” the steps, beginning with the one identified as the start step. At each step, we follow only the “On Success” and “On Failure” actions that are set to “Go to the next step” or “Go to step n.” If a target step has already been visited, we skip it; otherwise, we add it to a queue and then consider the next step in the queue. This continues until we have exhausted all items from the queue. Any job steps that were not visited are considered unreachable.

The Code

if exists (select * from sys.objects o join sys.schemas s on o.schema_id = s.schema_id where s.name = 'dbo' and o.name = 'fn_AreJobStepsReachable' and o.type = 'TF')
	drop function dbo.fn_AreJobStepsReachable;
go
create function dbo.fn_AreJobStepsReachable
(
	@job_name sysname
)
returns @resultTable table
(
	job_id uniqueidentifier,
	job_name nvarchar(128),
	step_id int,
	step_name nvarchar(128),
	is_reachable bit
)
as
begin
	declare @jobInfo table
	(
		job_id uniqueidentifier,
		start_step_id int,
		step_id int,
		on_success_action tinyint,
		on_success_step_id int,
		on_fail_action tinyint,
		on_fail_step_id int,
		is_reachable bit
	);

	declare @queue table
	(
		ID int not null identity(1,1) primary key clustered,
		job_id uniqueidentifier,
		step_id int
	);

	declare @next_queue table
	(
		ID int not null identity(1,1) primary key clustered,
		job_id uniqueidentifier,
		step_id int
	);

	-- Load the step graph: one row per job step, with its on-success and on-failure links.
	if @job_name is null
	begin
		insert @jobInfo (job_id, start_step_id, step_id, on_success_action, on_success_step_id, on_fail_action, on_fail_step_id, is_reachable)
		select
				j.job_id, 
				j.start_step_id, 
				js.step_id, 
				js.on_success_action, 
				js.on_success_step_id, 
				js.on_fail_action, 
				js.on_fail_step_id,
				cast (0 as bit) is_reachable
		from msdb.dbo.sysjobs j
		inner join msdb.dbo.sysjobsteps js on js.job_id = j.job_id
	end
	else
	begin
		insert @jobInfo (job_id, start_step_id, step_id, on_success_action, on_success_step_id, on_fail_action, on_fail_step_id, is_reachable)
		select
				j.job_id, 
				j.start_step_id, 
				js.step_id, 
				js.on_success_action, 
				js.on_success_step_id, 
				js.on_fail_action, 
				js.on_fail_step_id,
				cast (0 as bit) is_reachable
		from msdb.dbo.sysjobs j
		inner join msdb.dbo.sysjobsteps js on js.job_id = j.job_id
		where j.name = @job_name;
	end

	-- Seed the queue with each job's designated start step.
	insert @queue (job_id, step_id)
	select distinct job_id, start_step_id
	from @jobInfo;

	-- Breadth-first traversal: mark every queued step reachable, then enqueue the
	-- steps they can jump to (action 3 = "go to the next step", 4 = "go to step n").
	while exists (select * from @queue)
	begin
		update j
		set is_reachable = 1
		from @jobInfo j
		inner join @queue q on q.job_id = j.job_id and q.step_id = j.step_id;

		with NextStep as
		(
			select j.job_id,
				case j.on_success_action when 3 then q.step_id + 1
					when 4 then j.on_success_step_id end as next_step_id
			from @jobInfo j
			join @queue q on q.job_id = j.job_id and q.step_id = j.step_id
			where j.on_success_action in (3, 4)

			union all

			select j.job_id,
				case j.on_fail_action when 3 then q.step_id + 1
					when 4 then j.on_fail_step_id end as next_step_id
			from @jobInfo j
			inner join @queue q on q.job_id = j.job_id and q.step_id = j.step_id
			where j.on_fail_action in (3, 4)
		)
		insert @next_queue (job_id, step_id)
		select j.job_id, j.step_id
		from @jobInfo j
		inner join NextStep ns on j.job_id = ns.job_id and j.step_id = ns.next_step_id
		where j.is_reachable = 0;

		-- Swap queues: the newly discovered steps become the next generation to visit.
		delete @queue;

		insert @queue (job_id, step_id)
		select job_id, step_id
		from @next_queue;

		delete @next_queue;
	end

	insert @resultTable (job_id, job_name, step_id, step_name, is_reachable)
	select ji.job_id, j.name job_name, ji.step_id, js.step_name, ji.is_reachable
	from @jobInfo ji
	inner join msdb.dbo.sysjobs j on j.job_id = ji.job_id
	inner join msdb.dbo.sysjobsteps js on ji.job_id = js.job_id and ji.step_id = js.step_id;

	return;
end;
go

Don’t Fear the Timeout

Lately I’ve heard a few people over-react (in my opinion) to a particular message that can show up in a SQL Server query plan on rare occasions. As an example, consider this query against AdventureWorks2014:

SELECT p.Name, soh.ShipDate, sp.Name StateName,
	SUM((sod.OrderQty * sod.UnitPrice) *
		(1.0 - sod.UnitPriceDiscount))
		over (partition by p.Name, sp.Name) Price,
	th.MostRecentTransactionDate
FROM Production.Product AS p
INNER JOIN Sales.SalesOrderDetail sod
	on sod.ProductID = p.ProductID
INNER JOIN Sales.SalesOrderHeader soh
	on soh.SalesOrderID = sod.SalesOrderID
INNER JOIN Person.Address addr
	on addr.AddressID = soh.ShipToAddressID
INNER JOIN Person.StateProvince sp
	on sp.StateProvinceID = addr.StateProvinceID
INNER JOIN
(
	SELECT MAX(tha.TransactionDate) MostRecentTransactionDate,
		tha.ProductID 
	FROM Production.TransactionHistoryArchive AS tha 
	GROUP BY tha.ProductID
) th on p.ProductID = th.ProductID
WHERE NOT EXISTS
    (SELECT *
     FROM Production.ProductModel AS pm 
     WHERE p.ProductModelID = pm.ProductModelID
           AND pm.Name LIKE N'%Long%')
OPTION (RECOMPILE);

If we get a query plan on this and look at the properties on the root node, we will see that the “Reason for Early Termination of Statement Optimization” is Time Out. The equivalent in the plan XML is this snippet (abbreviated for conciseness).

<StmtSimple StatementOptmLevel="FULL"
QueryHash="0xCF50B953FE93513A"
QueryPlanHash="0x80E16631F5292DFD"
StatementOptmEarlyAbortReason="TimeOut"
CardinalityEstimationModelVersion="120">

The comment I’ve heard several times is something to the effect of, “Look! SQL Server couldn’t even finish compiling the query! It timed out!”

Here’s the thing. The fact that it timed out optimizing the query is a good thing. As queries become more and more complex, the search space for the optimizer to explore grows exponentially. The optimizer will consider various join orders, which indexes may be useful to resolve the query, which of the army of transformation rules at its disposal are applicable and beneficial, parallelization options, and so forth. The number of possible query plans at SQL Server’s disposal can quickly become literally astronomical.

If the optimizer were to evaluate every single possibility, the compilation time could easily stretch into minutes, hours, even years. Part of the art of query optimization is knowing when to quit — when the time spent evaluating the myriad possibilities overwhelms the time to run the query. When this happens, the optimizer will gracefully time out and output its best option so far.
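
If you are curious how often this happens on a given system, the plan cache records the same early-abort attribute, so you can look for cached statements that compiled with a timeout. Here is a sketch (it requires VIEW SERVER STATE; adjust the TOP as needed):

with xmlnamespaces (default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
select top (20)
	st.text,
	qp.query_plan
from sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_query_plan(cp.plan_handle) qp
cross apply sys.dm_exec_sql_text(cp.plan_handle) st
where qp.query_plan.exist('//StmtSimple[@StatementOptmEarlyAbortReason="TimeOut"]') = 1;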

Of course it’s not always perfect — nothing ever is. But it’s usually pretty good.

Yes, the presence of a timeout can be an indication that the query is too complex and that simplification may be in order. But the timeout itself isn’t such a big deal.

It’s really a good thing.

CodeStock 2016

My Background with CodeStock

It was probably about three years ago that I was attending a SQL Saturday when one of the speakers (I wish I could remember who, so as to give proper credit) asked if anyone attended various community technical conferences. I jotted down the names that he threw out there, and then forgot about it for a while.

A few months passed, and then I ran across my note. I started to look up the conferences, and the one that really stuck out was CodeStock, an annual event held in Knoxville, Tennessee. The mix of topics from the most recent gathering looked to be just about right. The conference had already passed for the year, so once again I put the notion on the back burner but continued to watch for the announcement of the next time around.

Then in late January of 2014, the notice came: the event would be held on July 11-12 of that year. I continued to watch as the call for speakers opened and closed, a keynote was announced, and in due course the session voting began. The list of available sessions sealed the deal for me, and I attended the conference in July. Then again in 2015.

And now I’ve just returned from the 2016 event.

CodeStock 2016

So a quick overview of my CodeStock experience this year is in order.

It began with the keynote address by the excellent Cory House (b | t), who also presented the keynote during my first year. His basic theme was that it is OK to be anywhere on the technology adoption curve (bleeding-edge, mainstream, legacy), so long as it is a deliberate and informed decision. My favorite take-away was a comment regarding providing praise to others. Cory presented it, as I recall, in the context of complimenting a child’s accomplishment. Saying that “you’re so smart” is easy but downplays the actual achievement. Better to say, “You must have worked really hard.” I like that.

I won’t summarize every session I attended, but I’ll just highlight a couple. That’s not to suggest that any of the others weren’t worthwhile, but there were a few that really stood out.

I loved what I came to think of as the “encryption two-for”: back-to-back sessions in the same room starting with Steven Swenson (b | t), who gave a nuts-and-bolts presentation on how to use cryptography in applications, including several concrete implementations.

Then Adam Caudill (b | t) discussed the state of modern cryptography, including what is broken today, what is likely to break soon, and what remains solid. Then he moved into a fascinating explanation of where crypto appears to be headed in the next few years.

These two sessions provided a great “level-up” for me on a topic where I have a fair bit of interest but haven’t kept up with things lately.

I also attended the Women in Technology panel discussion. It was a good conversation, and some quality questions arose, but the standout to me was the “adventure” that the moderator proposed with the leftover time during the session. She wanted everyone to meet someone new and to have a discussion with them. Honestly, this is the kind of thing that frightens me, being the big-time introvert that I am. Fortunately, the two people sitting in the same row as me had already left, so I figured I was off the hook.

Not so. Just as I was getting ready to head on out, the moderator herself came to me and said, “You’re not talking to anyone!” She then began the conversation, and it was actually pretty fun and welcoming. A couple of other women also joined in over the next few minutes, and it ended up being a highlight of the conference for me. So thanks to them for engaging with me.

On the second day of the conference I mostly attended some “soft-skills” sessions, something I have tried of late to incorporate into conferences that I go to. They were all quite valuable, but the one session of the day that really resonated with me was from Cameron Presley (t) on the SOLID principles. I’ve tried to wrap my head around these ideas in the past, but Cameron finally managed to show it in a way that really made sense to me, so thanks to him for that. Now I’ve got a couple of personal projects that need a whole lot of refactoring.

Conclusion

CodeStock is an awesome and growing conference that has become a summer ritual for me over the past few years. It is interesting for me, as someone who mostly lives in the SQL Server world but who dabbles in software development, to get to attend a conference like this and stay at least somewhat current on where the dev world is and where it is going.

One last note. I think I was asked three or four times if I planned to submit to present next year. I think it’s great that there is so much emphasis on growing a speaker base. (I actually did submit a session for CodeStock this year, but in the end was somewhat relieved when it wasn’t selected. May and June ended up being pretty busy for me, and it would have been challenging to create a presentation on top of all that.)

But yes, I have found myself noodling over what I might submit for 2017.

The Imminent Release of SQL Server 2016

When I first started working with SQL Server, version 7.0 was the cutting edge. That version was released in late 1998, and the project I was working on at the time was still in the planning stages. The powers that be decided to move to 7.0 rather than 6.5, and it was only some time later when I had to support some 6.5 instances that I came to appreciate the paradigm shift that the new version represented. I think we can all agree that we all benefit from that brave new world to this day.

Since then, we have seen six new releases of SQL Server (2000, 2005, 2008, 2008R2, 2012 and 2014), and we now find ourselves on the cusp of the release of SQL Server 2016. Over the past couple of releases we have seen an increase in the transparency of the development process with the release of beta versions, and 2016 took it to a whole new level. There have been, if my count is correct, eleven public preview versions released as of this writing, up to release candidate 1. That is absolutely incredible, and I think Microsoft deserves a ton of credit for opening up the process to this degree.

The new and updated feature set in SQL Server 2016 is equally impressive. I have downloaded, installed, and (to varying degrees) used every single one of those preview versions, and yet I feel that I’ve barely skimmed the surface of what I need to eventually learn. I have studied pretty much every new feature in the product, but I just haven’t been able to dig in nearly as deep as I would like. There’s just too much.

So, I guess that all of this is just my way of saying that I’m really quite excited for the imminent release of SQL Server 2016. It is packed with new features, and I am sure that it will be some time until I get the opportunity to use some of these new bits of functionality in real systems (and until they mature enough to be ready for such use). Still, this is an exciting time for me — more so than for the past few releases.

Professional Development Notes

I have been doing a brain dump on Hekaton for the past few weeks, but I need a bit of a break before diving into (and possibly drowning in) the internals.

I have attended a few technology-related professional development courses over the past few years, and I wanted to record some of the better pieces of advice here, primarily for my own reference, because I very much need this as much as anyone, if not more.

Because I’m such an expert on such matters (not!), I have decided to leave these suggestions in bullet form. It may also have something to do with the fact that I really don’t have anything to add.

  • Interview question: What is one of the worst things about working here?
  • Act like a consultant (even if you aren’t)
  • Work on advanced projects – just enough to be uncomfortable
  • Get some exercise / meditate
  • Non-computer-related hobby
  • Make your accomplishments visible (far beyond the team that already knows)
  • Have strong opinions, weakly held (be willing to change)
  • Be a jack-of-all-trades
  • Be willing to step up and work extra hours when the need arises
  • Avoid negativity – in yourself and others
  • Learn new, related technologies
  • Know where you want to go
  • Use objective information (not emotion) to project the lifetime of a technology
  • If your current job doesn’t include what you love to do, expand your job and take on extra responsibilities
  • Once per year take time to assess where you were a year ago and where you want to be in one year
  • Iterative, continual learning is the best approach
  • Recognize that short-term thinking is usually driven by artificially dictated project deadlines
  • On the other hand, too much long-term thinking can cause the project to become mired and never ship
  • Learn when to ignore company politics and just do your job
  • Choose your battles; is this the right hill to die on?
  • Listen before responding; let them give their story
  • Most people need criticism; most people who deliver criticism do it poorly (get over it)
  • Don’t try to do too much; do fewer things and do them well
  • Low performers realize they can do 20% and survive; high performers do 20% more and stay ahead
  • Continually interview, even if you don’t intend to leave (helps evaluate the industry and self-skills)

Hekaton Part 7: Error Handling

One in a series of posts about Microsoft’s In-Memory OLTP Functionality

In the previous post in this series, I discussed three validation errors that are unique to Hekaton and their common causes:

  • Error 41302 – Generated when two concurrent sessions update the same record.
  • Error 41305 – Generated when a concurrent update causes a repeatable read validation failure.
  • Error 41325 – Generated when a concurrent insert causes a serializable failure.

There is also error 41301 that I will address later in this series.

Applications that access databases containing In-Memory tables need to be aware of these errors and be able to respond gracefully when they occur.

There is one long-standing error condition in SQL Server that closely parallels these three errors, and that is the deadlock (error 1205). Deadlocks, however, usually go unhandled in application error handling because they tend to be rare and can largely be avoided by good coding practices (specifically, accessing tables and other resources in a consistent order whenever possible). The Hekaton errors, on the other hand, will be more commonplace in practice, and I consider them to be “normal” events, especially as the throughput on the system becomes heavy.

The appropriate response, in most situations, is to retry the operation when this error is raised. Microsoft has an example of handling errors in stored procedures, but applications also need to address these errors.

Here is a stub of a data access class that demonstrates how this can be done.

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace HekatonErrorTrapping
{
    class Dal : IDisposable
    {
        private SqlConnection _connection;

        public Dal()
        {
            SqlConnectionStringBuilder sb = new SqlConnectionStringBuilder();
            sb.DataSource = @".";
            sb.InitialCatalog = "Hekaton";
            sb.IntegratedSecurity = true;

            _connection = new SqlConnection(sb.ToString());
            _connection.Open();
        }

        public SqlTransaction StartTransaction()
        {
            return _connection.BeginTransaction();
        }

        public void UpdateProductUnitPrice(int productId, decimal newPrice, SqlTransaction transaction)
        {
            string sql = @"update dbo.Product with (snapshot) set UnitPrice = @NewPrice where ProductID = @ProductId;";
            SqlParameter idParameter = new SqlParameter("@ProductId", productId);
            SqlParameter priceParameter = new SqlParameter("@NewPrice", newPrice);

            using (SqlCommand command = new SqlCommand(sql, _connection, transaction))
            {
                command.Parameters.Add(idParameter);
                command.Parameters.Add(priceParameter);

                int retryCount = 5;
                bool isComplete = false;
                while (isComplete == false)
                {
                    try
                    {
                        command.ExecuteNonQuery();
                        isComplete = true;
                    }
                    catch (SqlException ex)
                    {
                        switch (ex.Number)
                        {
                            // 41302: Updated a record that has been updated since the txn started.
                            case 41302:

                            // 41305: Repeatable read validation failure.
                            case 41305:

                            // 41325: Serializable validation failure.
                            case 41325:

                            // 41301: Commit dependency on a transaction that aborted.
                            case 41301:

                            // 1205: Deadlock (not applicable to in-memory tables, but same principles apply).
                            case 1205:
                                retryCount--;
                                if (retryCount <= 0)
                                {
                                    throw;
                                }

                                // Take a very brief pause -- mostly needed in cases of very high write contention.
                                System.Threading.Thread.Sleep(1);
                                break;

                            default:
                                throw;
                        }
                    }
                }
            }
        }

        public void Dispose()
        {
            Dispose(true);
            GC.SuppressFinalize(this);
        }

        private void Dispose(bool disposing)
        {
            if (disposing)
            {
                if (_connection != null)
                {
                    _connection.Dispose();
                    _connection = null;
                }
            }
        }
    }
}

And here is an example of how to call this code.

using (Dal dal = new Dal())
{
    using (SqlTransaction transaction = dal.StartTransaction())
    {
        dal.UpdateProductUnitPrice(productId, newPrice, transaction);
        transaction.Commit();
    }
}

Now if one of these errors is generated during the update, the system will pause for one millisecond (just in case there is a lot of write contention going on; this pause may need to be longer on some systems), and retry the operation up to five times. (Of course, these values should be configurable and not hard-coded.)

There is one HUGE caveat to this, however. When one of these errors is thrown, the entire transaction is rolled back. In this simple example, the transaction is just a single update statement, but if the transaction spans a dozen statements, they all get rolled back.

This means that, at the very least, the application needs to be aware that the full transaction will go away, and it should wrap the transaction appropriately. Better yet, the application will know how to replay the full transaction up to the point of failure and will gracefully retry the operation a given number of times before giving up.
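
The same retry pattern can also live on the T-SQL side in an interpreted wrapper procedure, similar to the Microsoft stored-procedure example mentioned above. Here is a minimal sketch (the procedure name is mine; dbo.Product is the sample table from this series, and the retryable error list mirrors the C# code):

create procedure dbo.usp_UpdateProductUnitPrice
	@ProductId int,
	@NewPrice money
as
begin
	declare @retry int = 5;

	while @retry > 0
	begin
		begin try
			begin transaction;

			update dbo.Product with (snapshot)
			set UnitPrice = @NewPrice
			where ProductID = @ProductId;

			commit;
			return 0; -- success
		end try
		begin catch
			-- A validation failure dooms the whole transaction, so roll it back first.
			if xact_state() <> 0 rollback;

			-- Retry only the errors discussed above (plus deadlocks); rethrow anything else.
			if error_number() not in (41302, 41305, 41325, 41301, 1205)
				throw;

			set @retry -= 1;
			if @retry = 0
				throw;

			-- Brief pause; mostly needed under very high write contention.
			waitfor delay '00:00:00.001';
		end catch
	end
end
go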

Hekaton Part 6: MVCC Assumptions and Validation

One in a series of posts about Microsoft’s In-Memory OLTP Functionality

I originally intended to make part 6 in this series about error handling, but quickly realized that I needed to lay a bit of a foundation on multi-version concurrency control (MVCC), and particularly some of the assumptions made by MVCC and how those assumptions are subsequently confirmed.

Historically, SQL Server has used a pessimistic form of concurrency. My session sees every other connection as the enemy. Every other connection on the system is presumed to want access to my resources, and so SQL employs a locking mechanism to prevent anyone else from stomping over my work.

Now, locking and isolation levels are a complex topic in themselves, but a simple example will suffice for the purposes of this discussion. Suppose I update the Product table for ID 23, and before I commit my transaction, you come along and try to update the same row. Your connection will be blocked from proceeding until I commit or roll back, at which point you will be allowed to make the update. Under most isolation levels you won’t even be allowed to read the row I’ve updated so long as it is uncommitted.

Blocking is a good thing in that it preserves isolation, but a bad thing when everybody is always waiting on someone else.

In SQL Server 2005, Microsoft introduced a quasi-optimistic concurrency model called snapshot that addressed some of these problems. With snapshot enabled, you would be able to read the row that I’ve updated (you will see the values in the row prior to any of my changes). However, you still won’t be able to update it. You will be blocked just as always.

Snapshot isolation is frequently summarized as “readers don’t block writers, and writers don’t block readers.”

Hekaton takes us a huge step forward toward optimistic concurrency. It makes a couple of assumptions that remove that “every other session is the enemy” paradigm. One assumption is that write conflicts are rare, so SQL always proceeds with data modifications under the assumption that no one else is going to change the same row.

A second assumption is that competing transactions are going to be committed and not rolled back.

Now, sometimes these assumptions simply aren’t valid, so SQL Server includes validation checks to make sure that things don’t go awry.

We’ll get into internals in a later part of this series to better explain how SQL does all this magic, but for now suffice it to say that SQL maintains multiple versions of the same row, along with a pair of timestamps indicating when each version is valid. So when we update ProductID 23, SQL will actually create a new row in the table, mark the original row as valid through, say, 1:17 pm, and mark the new row as becoming valid at 1:17 pm. When a user starts a new transaction, SQL notes the time that the transaction starts, and during the course of the transaction it will always return the row version that was valid and committed at that time.

Let’s demonstrate. We’ll open two tabs in Management Studio, each connected to our Hekaton database. In the first tab, we’ll make sure that the price on product 23 is a known value.

update dbo.Product
set UnitPrice = 23.00
where ProductID = 23;

Now in tab 1, we’ll start a transaction and modify the price.

begin transaction;

update dbo.Product with (snapshot)
set UnitPrice = 30.00
where ProductID = 23;

In tab 2, let’s examine the value.

select ProductID, UnitPrice
from dbo.Product
where ProductID = 23;

This confirms that connection 2 sees a UnitPrice of 23.00, which is the committed value of the row at the time the transaction started. Continuing in tab 2, start a new transaction:

begin transaction;

Switch back to tab 1 and commit.

commit;

Re-execute the select in connection 2, and you will still get 23.00. Again, this was the committed value of the row at the time the transaction began. But if we close the transaction and do it again:

commit;

begin transaction;

select ProductID, UnitPrice
from dbo.Product with (snapshot)
where ProductID = 23;

commit;

This time we get a unit price of 30.00, reflecting the committed price when the new transaction begins.

So that is MVCC in a nutshell.

What about the validations? If one of the MVCC assumptions doesn’t hold up, SQL Server will detect this circumstance through one of several validation methods. In some cases, these validations happen as the transaction progresses, and in other cases the validation happens when the transaction commits.

In connection 1, make sure that there are no open transactions, then run the following:

begin transaction;

update dbo.Product with (snapshot)
set UnitPrice = 30.00
where ProductID = 23;

And in connection 2 we will try a simultaneous update.

begin transaction;

update dbo.Product with (snapshot)
set UnitPrice = 35.00
where ProductID = 23;

This immediately fails with an error:

Msg 41302, Level 16, State 110, Line 18
The current transaction attempted to update a record that has been updated since this transaction started. The transaction was aborted.
Msg 3998, Level 16, State 1, Line 16
Uncommittable transaction is detected at the end of the batch. The transaction is rolled back.
The statement has been terminated.

I find the wording on the 41302 error a bit odd because it’s not completely correct, but the upshot is that there were two simultaneous update operations in progress. The assumption that write-write conflicts are rare has failed here, so the error is generated. Note that SQL Server does not block the progress of the transaction, as in previous versions, but it does generate an error.

In addition to the 41302 error, we might also get 41305 and 41325 errors. These relate to violations of the repeatable read and serializable isolation level requirements. Make sure that both connections don’t have open transactions, then run this in connection 1:

begin transaction;

select ProductID, UnitPrice
from dbo.Product with (repeatableread)
where ProductID = 23;

Then back in connection 2 change the price:

begin transaction;

update dbo.Product with (snapshot)
set UnitPrice = 38.00
where ProductID = 23;

commit;

Now try to commit transaction 1:

commit;

At commit time, SQL Server will validate the repeatable read requirement and the following error will be generated.


Msg 41305, Level 16, State 0, Line 25
The current transaction failed to commit due to a repeatable read validation failure.

Finally, we can create the conditions for a 41325 error. Commit any open transactions, then in connection 1 run:

delete dbo.Product where ProductID = 1000;

begin transaction;

select ProductID, UnitPrice
from dbo.Product with (serializable)
where ProductID = 1000;

And in connection 2 run:

insert dbo.Product (ProductID, Description, UnitPrice)
values (1000, 'Product 1000', 1000.00);

Back in connection 1:

commit;


Msg 41325, Level 16, State 0, Line 35
The current transaction failed to commit due to a serializable validation failure.

Whew, that was a lot just to talk about the validations that take place (and there is actually one more validation that happens at commit time, but we’ll address that separately). With that background, we can now talk about error handling in the next post.