This entry is part 3 of 3 in the series Monitoring Availability Groups

In last part of the series, we will touch on how to monitor availability groups using extended events.  I prefer this method over using SQL Agent alerts just because I don’t like relying on the individual servers to send me emails.  I like to have our central monitoring server send out alerts based on queries we run against the servers.

Microsoft creates for by default when we create an Availability Group an AlwaysOn_Health extended events session for us.  The code for it listed below:

Now that the extended events session is setup we can use some queries to query information about our AGs and error messages happening on our servers.  First to query when the server failover and becomes your primary server query the event availability_replica_state_change.  The first 10 lines just read in the files for the extended event session so you don’t have to identify the exact filenames.  Then we parse the xml event for get the timestamp, previous state, current state, repliace name, and group name for the event FROM the filenames we collected before.  In the WHERE clause were are looking for when the state has changed to PRIMARY_NORMAL indicating a failover to the server.

There are a number of errors it tracks that are specific to AGs that I talked about in part two of this series, then we see they track errors numbers 823 and 824 and 829 common corruptions numbers.  Which tells me we can customize this to capture any errors that we want to be query-able from another system.  So if we replace the definition with code below we can get all our SQL Agent alerts in one spot via querying the extended event session by adding to the WHERE clause the severity >= (19).  Highlighted below is the change to the extended event session.  You will need to drop the old extended event session in order to create this new one.

To query for error messages above sev 19 and error numbers included for AGs (1480 – Role Change, 35264 – AG Data Movement – Suspended, 35265 – AG Data Movement – Resumed, 41404 – 41404 – AG is offline, and 41405 AG is not ready for automatic failover).  In this query we are parsing out the timestamp, error number, severity, and message.

Series Navigation<< Monitoring Availability Groups Part 2 – SQL Agent Alerts
Tracy Boggiano
Follow me

Tracy Boggiano

Database Superhero at ChannelAdvisor
Tracy has spent over 20 years in IT and has been using SQL Server since 1999 and is currently certified as a MCSE Data Management and Analytics. She has worked on SQL Server 6.5 and up including currently SQL 2017 RC1. She enjoys monitoring, performance tuning, and high availability and disaster recovery technologies. Tracy is currently a co-organizar the for special interest group for Advanced DBA Topics for the TriPASS user group.

She also tinkered with databases in middle school to keep her sports card collection organized.

Tracy has volunteered through the NC Guardian ad Litem program since 2003 advocating for abused and neglected foster children in court.This is her passion outside of SQL Server and favorite job.More information about this program in North Carolina can be found at http://volunteerforgal.org or the national organization CASA at http://www.casaforchildren.org.
Tracy Boggiano
Follow me