Skip to content

Monitoring Availability Groups Part 2 – SQL Agent Alerts

In my last post on monitoring Availability Groups I covered looking at Perfmon Counters now we take a look at getting alerts from SQL Agent. When it comes to checking on your Availability Groups one of the easiest ways to keep track at a high level of the health of the them is to setup SQL Agent alerts.  I’m going to go out on a limb here an assume if you are using SQL Agent alerts you have severity alerts 19 through 25 along with a few extras like error numbers 823 and 825 setup if not check out this post at SQLskills.  I personally have 16 through 18 turned on as well going to email only, no paging.

For Availability Groups we have a few extra error numbers we care about. Error number 1480 tells when a server changes roles, so we can know when a server flips from a secondary to a primary, or from a primary to a secondary. Error number 35264 tells when data movement has suspended on any database. This can occur for many reasons. One I have seen is when you have expanded your mount point on your primary and the data or log file runs out of space on the secondary the data or log file can not expand on the secondary because you forgot to expand the secondary. Error number 35265 tells you when the data movement has resumed on any database.  Error number 41404 let’s you know if your AG is offline which can be bad if you expected an automatic failover.  Error number 41405 let’s you know if an Availability Group can’t automatically failover for any reason.  In the later to cases you will want to look at your SQL Error logs and AlwaysOn Extended Events Health session.

Series Navigation<< Monitoring Availability Groups Part 1 – Perfmon CountersMonitoring Availability Groups Part 3 – Extended Events >>

3 thoughts on “Monitoring Availability Groups Part 2 – SQL Agent Alerts

  1. Pingback: Availability Group Agent Alerts – Curated SQL

  2. Robert says:

    Another good message id to monitor in an availability group is id 19406, which is for replica state change. This event will occur for all the events listed in this article and more. 19406 kind of generic due to a lot of different possible events triggering this error. It requires further investigation to determine the cause.

    Reply
    • Tracy Boggiano says:

      I will have to look at that alert further and amend once I’ve added it tor our systems.

      Reply

Leave a Comment