Tuesday, September 15, 2009

I've lost my memory.......SQL Server VAS memory fragmentation issue

SQL Server VAS memory fragmentation issue
The SQL Server cluster is having regular backup issues with Red Gate Backup reporting a VDI Error. After extensive research the error is because of a memory fragmentation issue.
Long story short VDI (Microsoft’s Virtual Device Interface) is used by pretty much all 3rd party backup utilities to back up databases. VDI is only allowed to use a very narrow portion of memory known as VAS memory (Virtual Address Space) also known as MemToLeave. The default amount of VAS Space allocated for SQL Server is between 128 MB and 256 MB depending…. I’ve found that number varies from source to source.
There are 4 major contributors to VAS memory issues in regards to SQL Server
1) Extended Stored Procedures with memory leaks
2) Use of linked servers
3) COM integration via Stored Procedures
4) Complex Queries that generate Execution plans of greater than 8K and cause paging
How does this impact us?
Glad you asked. What is happening is that our VAS memory is being fragmented. Why is this fragmentation a problem? VDI requires several blocks of specific sizes of contiguous memory in order to buffer the data and perform its task. In this case backing up, compressing and encrypting data.

Steps to solve:
1) Use the –g startup parameter for SQL Server and up the VAS memory for SQL to 512MB. This will give us a bigger area to play in and hopefully more contiguous memory blocks. This change is recommended by Microsoft for Database Servers that have over 4GB of RAM and 75+ databases. We have 32 GB of RAM and over 50 databases so it is a needed change soon anyway
2) Investigate ways to eliminate the one XP I know we are using. This is an SMTP XP used for alerting. While it has been in use for years and has had no negative impact I have no doubt my friends at Microsoft would ask I remove it… just to rule it out as a problem.
3) Verify that the only place I can find that is using COM in a SP has been commented out. I had to write a handy dandy script to ferret that sucker out, I may share it in the future.
4) Reduce amount of queries that attach to the cluster via Linked Server these come from two places
1) Reports on report server that simply call a SP on the cluster
2) old SP code that references Linked Servers on the cluster its self
5) Code review any Stored Procedures released on the cluster within the past couple of weeks to see if we can simplify any of the execution plans. Why only the last couple? Because this is a change in how the cluster was working. We never had any of these issues before, so what changed? I’m guessing our intern released some nasty code on the sly…

This is a fairly recent issue as the cluster worked for a number of weeks without any issues at all. This has really crept in during the last 2 weeks.
Now I could go into the depth of detail that some grumpy DBA’s and other MVP’s have gone into, but really, that data is out there. This is just my experience with this issue, and because all of that data is so easily available and complex (and I stand to get it wrong in my translation) I’ll just let google do the walking for you!

Labels: , , ,

0 Comments:

Post a Comment

<< Home