Skip to content

Monitoring Availability Groups Part 3 – Extended Events

In last part of the series, we will touch on how to monitor availability groups using extended events.  I prefer this method over using SQL Agent alerts just because I don’t like relying on the individual servers to send me emails.  I like to have our central monitoring server send out alerts based on queries we run against the servers.

Microsoft creates for by default when we create an Availability Group an AlwaysOn_Health extended events session for us.  The code for it listed below:

Now that the extended events session is setup we can use some queries to query information about our AGs and error messages happening on our servers.  First to query when the server failover and becomes your primary server query the event availability_replica_state_change.  The first 10 lines just read in the files for the extended event session so you don’t have to identify the exact filenames.  Then we parse the xml event for get the timestamp, previous state, current state, repliace name, and group name for the event FROM the filenames we collected before.  In the WHERE clause were are looking for when the state has changed to PRIMARY_NORMAL indicating a failover to the server.

There are a number of errors it tracks that are specific to AGs that I talked about in part two of this series, then we see they track errors numbers 823 and 824 and 829 common corruptions numbers.  Which tells me we can customize this to capture any errors that we want to be query-able from another system.  So if we replace the definition with code below we can get all our SQL Agent alerts in one spot via querying the extended event session by adding to the WHERE clause the severity >= (19).  Highlighted below is the change to the extended event session.  You will need to drop the old extended event session in order to create this new one.

To query for error messages above sev 19 and error numbers included for AGs (1480 – Role Change, 35264 – AG Data Movement – Suspended, 35265 – AG Data Movement – Resumed, 41404 – 41404 – AG is offline, and 41405 AG is not ready for automatic failover).  In this query we are parsing out the timestamp, error number, severity, and message.

Series Navigation<< Monitoring Availability Groups Part 2 – SQL Agent Alerts

One thought on “Monitoring Availability Groups Part 3 – Extended Events

  1. Pingback: Availability Group Extended Events – Curated SQL

Leave a Comment