Automatically Fixing High VLF Count on SQL Server 2012+

The Problem

Our databases autogrow because we have thousands of databases, and before we know it we have databases with high VLF counts in their transaction logs.  If you don’t know why that is bad, refer to the Resources section of this article.

Update – November 5, 2017

I have added this code to my GitHub repository to make it easier to download.  Be sure to read the steps at the bottom on how to set up the SQL Agent job, and the details between here and there, to understand what it does.

The Solution

Fix the high VLF count without any manual intervention by the DBA.  This requires a few pieces of the puzzle working together across the various versions of SQL Server.

Step 1

The first part of the process is to capture the output of DBCC LOGINFO, or, if you are ready for 2017, the new DMV sys.dm_db_log_stats, into a table you can read later to know how many VLFs currently exist in your databases. So we are going to create a table called VLFInfo and use the procedure VLF_UpdateInfo to populate that data.  The procedure is called in step one of a SQL Agent job to automate the fixing of high VLF counts during appropriate maintenance windows on your server.  But as you will see in Step 2, my solution tries to account for that by not fixing logs too soon after the file has grown and acquired those lovely extra VLFs.
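A minimal sketch of this capture step might look like the following, assuming a simple VLFInfo shape (the actual table and procedure definitions are in the GitHub repository):

```sql
-- Hypothetical sketch of the VLFInfo table and capture procedure;
-- the real definitions live in the GitHub repository.
CREATE TABLE dbo.VLFInfo (
    DatabaseName sysname  NOT NULL,
    VLFCount     int      NOT NULL,
    CaptureDate  datetime NOT NULL DEFAULT GETDATE()
);
GO

CREATE PROCEDURE dbo.VLF_UpdateInfo
AS
BEGIN
    SET NOCOUNT ON;
    TRUNCATE TABLE dbo.VLFInfo;

    -- On SQL Server 2017+ the DMV exposes the count directly;
    -- on 2012-2016 you would loop over databases with DBCC LOGINFO instead.
    INSERT dbo.VLFInfo (DatabaseName, VLFCount)
    SELECT d.name, ls.total_vlf_count
    FROM sys.databases AS d
    CROSS APPLY sys.dm_db_log_stats(d.database_id) AS ls;
END;
```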

Step 2

Now that we have that information captured, we can execute a stored procedure to fix the high VLF count.  As part of our solution, we have an Extended Event session that tracks file growth and shrink activity.  This allows us to run the job all day long and check the last time the files grew.  In our stored procedure VLF_AutoFix we have a parameter @LookBackTime that defaults to 60 minutes, so if the transaction log grew in the last 60 minutes it will not try to fix the high VLFs.
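The session itself can be sketched like this; the event exists in SQL Server 2012 and up, and the session name comes from the article, but the target and action choices here are assumptions:

```sql
-- Sketch of the file growth/shrink tracking session; the ring_buffer
-- target and actions are illustrative assumptions.
CREATE EVENT SESSION XE_DatabaseSizeChangeEvents ON SERVER
ADD EVENT sqlserver.database_file_size_change
    (ACTION (sqlserver.database_name, sqlserver.sql_text))
ADD TARGET package0.ring_buffer
WITH (STARTUP_STATE = ON);  -- restart the session with the instance
GO
ALTER EVENT SESSION XE_DatabaseSizeChangeEvents ON SERVER STATE = START;
```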

Step 3

Now that we have a way to check whether the size of the log has changed in the time we specified, let’s look at the procedure that we can run to automatically shrink and grow your log file in the recommended file size chunks. First, we are going to create a table to track when we attempt to auto-fix our high VLF counts. We will use this table in our procedure to not attempt to auto-fix the VLFs for another 24 hours if it was unable to fix them the first time due to an active VLF at the end of the transaction log. We don’t want the job to be constantly trying to fix those VLFs.
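The tracking table and the core shrink-and-regrow pattern could look roughly like this; the 8000 MB chunk follows the common pre-2014 guidance of growing the log in pieces no larger than 8 GB, and everything here is a sketch with illustrative names, not the repository code:

```sql
-- Sketch: tracking table plus the shrink-and-regrow pattern for one
-- database (names and sizes are illustrative assumptions).
CREATE TABLE dbo.VLFAutoFix (
    DatabaseName sysname  NOT NULL,
    FixDate      datetime NOT NULL DEFAULT GETDATE(),
    Fixed        bit      NOT NULL
);
GO

DECLARE @TargetSizeMB int = 16000,  -- desired final log size
        @ChunkMB      int = 8000,   -- grow in chunks of at most 8000 MB
        @CurrentMB    int = 0,
        @sql          nvarchar(max);

-- Shrink the log as small as the active VLF allows.
DBCC SHRINKFILE (N'MyDb_log', 1);

-- Grow it back out in chunks so each growth creates a sane VLF count.
WHILE @CurrentMB < @TargetSizeMB
BEGIN
    SET @CurrentMB += CASE WHEN @TargetSizeMB - @CurrentMB < @ChunkMB
                           THEN @TargetSizeMB - @CurrentMB
                           ELSE @ChunkMB END;
    SET @sql = N'ALTER DATABASE MyDb MODIFY FILE (NAME = N''MyDb_log'', '
             + N'SIZE = ' + CAST(@CurrentMB AS nvarchar(10)) + N'MB);';
    EXEC (@sql);
END;
```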

NOTE: The algorithm for how VLFs are created changed in SQL Server 2014, so you most likely will not need this procedure for those environments; it does not use the new formula.

Step 4

In the last part, we have two procedures.  One serves as a wrapper that calls the procedure that actually fixes the high VLF count and logs the attempt to our table above.  Before attempting the fix, it checks the Extended Event session to see if the log file size changed within the @LookBackTime window, and checks the VLFAutoFix table to see if a fix was attempted in the last @HoursSinceLastFix number of hours. If either of those conditions is met, it will not attempt to fix the VLFs and will wait until the next time the job runs to check again.  The second procedure actually runs your backup job and attempts the shrinking and growing of the file back out.
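The wrapper's guard logic amounts to something like the sketch below; the parameter names come from the article, but the body is an assumption and the Extended Event check is elided:

```sql
-- Sketch of the wrapper; the real VLF_AutoFix also shreds the
-- XE_DatabaseSizeChangeEvents target data to find recent growths.
CREATE PROCEDURE dbo.VLF_AutoFix
    @LookBackTime      int = 60,         -- minutes since last log growth
    @HoursSinceLastFix int = 6,
    @LogBackJobName    sysname = NULL    -- log backup job to invoke
AS
BEGIN
    SET NOCOUNT ON;

    -- Guard: a fix was already attempted recently, so back off.
    IF EXISTS (SELECT 1
               FROM dbo.VLFAutoFix
               WHERE FixDate > DATEADD(HOUR, -@HoursSinceLastFix, GETDATE()))
        RETURN;

    -- Guard (elided): skip if the XE session shows a log growth
    -- within the last @LookBackTime minutes.

    EXEC dbo.VLF_Fix;  -- runs the log backup job, then shrinks/regrows

    INSERT dbo.VLFAutoFix (DatabaseName, Fixed)
    VALUES (DB_NAME(), 1);
END;
```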

Bringing It All Together

Well, by now you have seen a whole bunch of code, so it is a good time to give you a summary of how to implement the solution in your environment.  First, create your table VLFInfo.  Then create a job to run the first stored procedure, VLF_UpdateInfo, to populate that table.  Before you set up the next steps to do any automatic fixes, check VLFInfo across all your servers and see where you can test this.

Next, you need to create the Extended Event session XE_DatabaseSizeChangeEvents and let it start collecting data for you.  Then we can create the tracking table VLFAutoFix so we will know when it attempts to fix high VLF counts.  Then create the stored procedures VLF_AutoFix and VLF_Fix.  Then, as part of the job you created, set up step two to execute procedure VLF_AutoFix.

Now create a SQL Agent job.  Step one will call procedure VLF_UpdateInfo.  Step two will call VLF_AutoFix with the @LogBackJobName parameter specifying your log backup job so it can help mark the VLFs at the end of the log file as not active.
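For reference, a two-step hourly job can be scripted against msdb like this; the job, step, and schedule names are assumptions, and the procedures live in wherever you created them:

```sql
-- Sketch: create the hourly two-step Agent job via msdb procedures.
USE msdb;
GO
EXEC dbo.sp_add_job @job_name = N'VLF AutoFix';

EXEC dbo.sp_add_jobstep @job_name = N'VLF AutoFix',
     @step_name = N'Capture VLF info',
     @subsystem = N'TSQL',
     @database_name = N'DBA',            -- assumed DBA utility database
     @command = N'EXEC dbo.VLF_UpdateInfo;',
     @on_success_action = 3;             -- go to the next step

EXEC dbo.sp_add_jobstep @job_name = N'VLF AutoFix',
     @step_name = N'Auto fix high VLF counts',
     @subsystem = N'TSQL',
     @database_name = N'DBA',
     @command = N'EXEC dbo.VLF_AutoFix @LogBackJobName = N''Log Backups'';';

EXEC dbo.sp_add_jobschedule @job_name = N'VLF AutoFix',
     @name = N'Hourly',
     @freq_type = 4,                     -- daily
     @freq_interval = 1,
     @freq_subday_type = 8,              -- repeat every N hours
     @freq_subday_interval = 1;

EXEC dbo.sp_add_jobserver @job_name = N'VLF AutoFix';
```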

Currently, this system runs hourly across 400 servers and thousands of databases with 30K+ transactions per second without causing any issues, because it takes the precautions of not running if the file has grown in the last hour and not attempting a fix within 6 hours of the previous attempt.

Resources

Transaction Log VLFs – too many or too few?

8 Steps to better Transaction Log throughput

Performance impact of lots of VLFs in the transaction log

13 thoughts on “Automatically Fixing High VLF Count on SQL Server 2012+”

  1. Pingback: Automatically Fix Those VLFs – Curated SQL

  2. OZ says:

    Nice article Tracy, I might roll this one out too! Tiny typo in title, replace “Sever” with “Server” ^_^

  3. PS says:

    Tracy,

    Are you sure this works on SQL Server 2008? Particularly the XE session in step 2. I just tried to run Step 2 on a SQL Server 2008 R2 database, and it fails with the message “The event name, “sqlserver.database_file_size_change”, is invalid, or the object could not be found”.

    Perhaps SS 2012 or greater… I haven’t tried this yet on a 2012 instance.

    • Tracy Boggiano says:

      You’re right, I think it is 2012 and up. I forgot which version I actually wrote this on, but I looked up when the extended event was created and confirmed it is in 2012.

  4. Steve says:

    I implemented something similar in my shop, only it warns of VLF issues instead of fixing them. The option is there to auto fix, but it is turned off. We have a lot of log reader sensitive activity – cdc, log shipping, etc. and shrinking a log at the wrong time can be problematic.

    • Tracy Boggiano says:

      We started off that way with warnings but found with the extra logic of not doing if the log grew recently we didn’t have any issues with it. We have AGs that we run it with no issues and replication.

  5. James says:

    For the Agent job, how frequently is it run?

    • Tracy Boggiano says:

      We run ours every hour, as it only fixes a log when it finds one that is over the limit and hasn’t grown in the last hour.

  6. Tom says:

    Tracy,

    Nice job! Could you explain more about why “not to run it if the file has grown in the last hour and to not attempt to fix it within 6 hours of the previous attempt.”?

    Thanks,
    Tom

    • Tracy Boggiano says:

      Yes, we don’t run it if the log has grown in the last hour just in case the activity that caused it to grow is continuing, so we want the activity to settle down. We wait the 6 hours because we don’t want to be constantly trying to run shrink commands every hour while we wait for the active VLF to wrap around to the beginning of the file. It’s just a safety measure to avoid too much maintenance activity occurring on the system.

  7. Tom Roush says:

    Very nicely done – thank you for sharing. The question I have is that we’ve got an external backup provider (Microsoft’s Data Protection Manager), and if we used your script we’d break the recovery chain pretty quickly because we’d have tlog backups going on in two places. Do you know how to kick off a DPM backup from a SQL job? Could be through PowerShell… thoughts?

    • Tracy Boggiano says:

      I think in this scenario, where you have a third-party tool or you’re taking your backups without SQL Agent jobs, the best thing you can do to work around it is to set up a SQL Agent job that issues a WAITFOR command. Set the wait time to the time between backups, assuming your log backups are taken fairly close together, say every 5 to 15 minutes.

      The other option would be to comment out the lines (54-93) that take the backups and reclaim as many VLFs as you can at that time; then I would probably increase @HoursSinceLastFix to a higher number to account for the fact that you can’t maximize the benefits of the log backups. This will give the log file a longer amount of time to wrap around to the beginning of the file.
