Tuesday, September 28, 2010

Proactive performance tuning

So I was bored the other day... at work... yeah... not so much bored, but rather tired of doing the same tasks, so I decided to do something new. I put myself in learning mode and set out to figure out what I could do to help the performance of my Reporting Cluster.

See, it is a subscriber in a transactional replication scheme, and when it is under heavy load it sometimes lags by 30 minutes in committing transactions. 30 minutes is a bit too much for this database (yes, I know, it's reporting; near real time should do).
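If you want to keep an eye on that lag yourself, tracer tokens are the built-in way to measure end-to-end transactional replication latency. A minimal sketch, run from the publication database on the publisher; 'MyPublication' is a stand-in for the real publication name:

--Post a tracer token to measure replication latency (publication name is hypothetical)
DECLARE @TokenId INT
EXEC sys.sp_posttracertoken
    @publication = 'MyPublication',
    @tracer_token_id = @TokenId OUTPUT

--Later, check how long that token took to reach each subscriber
EXEC sys.sp_helptracertokenhistory
    @publication = 'MyPublication',
    @tracer_id = @TokenId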

So I dug around, and instead of trying to fix one report (I kept failing to narrow down the right one), I went for the 'big hammer' approach.

I found some ways of using the dynamic management views to show me which indexes aren't being used. I did this in hopes of dropping some indexes, saving some IO, and making the transactions commit faster.

In messing with that, I found another set of dynamic management views (complete with joins) that shows system-recommended indexes... as well as some stats behind them: how many times they would have been used, the estimated cost saved, along with other things to help rank them.

NOW remember that indexing is as much art as science, and that simply indexing everything is not such a good idea... make the indexes only as wide as they need to be, and try to avoid single-purpose indexes... unless they have HUGE impact.
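For illustration (these table and column names are made up, not from my actual system), a 'wide' index here means one covering index with INCLUDE columns instead of several narrow, single-purpose ones:

--Hypothetical example: one covering index instead of several narrow ones.
--The key columns serve the seeks; the INCLUDEd columns cover the report's
--SELECT list without widening the index key.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_OrderDate
ON dbo.Orders (CustomerId, OrderDate)
INCLUDE (Status, TotalAmount)
GO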

So on one table I got rid of 5 unused indexes (legacy) and added 3. Net loss of 2... and the next day during peak time I had to ask the other developers if they had tweaked anything or deactivated reports... because we never had any spikes in IO or CPU usage (it all looked normal) and we never had a single replication alert... which means we never went above a 4-minute lag in committing transactions.

Awesome.

I proceeded to tweak some more the next day... heck, I spent 8 full hours tweaking... I tweaked Reporting, I tweaked OLTP, I tweaked Dev... I played and tweaked and tweaked and played...

The upside is that we are MUCH faster... the downside is that our database has grown a bit... but that's OK. I have more indexes to delete when I need the space and performance, and I have yet to compress some key tables. That will help our IO even more and gain some more performance...
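When I get around to that compression, it will likely be plain page compression (an Enterprise Edition feature in SQL 2008). A minimal sketch against dbo.card (the same table used in the script below); the estimate proc is worth running first:

--Estimate the savings before committing to a rebuild
EXEC sp_estimate_data_compression_savings 'dbo', 'card', NULL, NULL, 'PAGE'
GO

--Rebuild the table (heap or clustered index) with page compression
ALTER TABLE dbo.card REBUILD WITH (DATA_COMPRESSION = PAGE)
GO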

Lesson: in going off the beaten path I was able to make a huge impact, get some behind-the-scenes hero mojo working, and make things faster and better.

Not that anybody noticed. I even got a ticket to look into some of the symptoms I had fixed 4 days earlier!

As for the code: I've got some other queries, but here are the basics.

--Find missing indexes, ranked by estimated additive cost
SELECT
    sb.name AS [database name],
    id.statement,
    id.equality_columns,
    id.inequality_columns,
    id.included_columns,
    migs.user_seeks * migs.avg_total_user_cost AS AdditiveCost,
    migs.user_seeks,
    migs.user_scans,
    migs.last_user_seek,
    migs.avg_total_user_cost,
    migs.avg_user_impact
FROM sys.dm_db_missing_index_groups AS mig
INNER JOIN sys.dm_db_missing_index_group_stats AS migs
    ON mig.index_group_handle = migs.group_handle
INNER JOIN sys.dm_db_missing_index_details AS id
    ON id.index_handle = mig.index_handle
INNER JOIN sys.sysdatabases AS sb
    ON id.database_id = sb.dbid
WHERE sb.name <> 'msdb'
    AND migs.avg_total_user_cost > 1
    AND migs.user_seeks > 10
ORDER BY AdditiveCost DESC
GO
--Conversely, find unused indexes on a given table
DECLARE @TableName SYSNAME
SET @TableName = 'dbo.card'

SELECT
    DB_NAME(ss.database_id) AS [database name],
    OBJECT_NAME(ss.object_id) AS [object name],
    i.name AS [index name],
    i.index_id AS [index id],
    ss.user_seeks AS [number of seeks],
    ss.user_scans AS [number of scans],
    ss.user_lookups AS [number of bookmark lookups],
    ss.user_updates AS [number of updates]
FROM sys.dm_db_index_usage_stats AS ss
INNER JOIN sys.indexes AS i
    ON i.object_id = ss.object_id
    AND i.index_id = ss.index_id
WHERE ss.database_id = DB_ID()
    AND OBJECTPROPERTY(ss.object_id, 'IsUserTable') = 1
    AND ss.object_id = OBJECT_ID(@TableName)
ORDER BY [object name],
    ss.user_seeks DESC,
    ss.user_scans DESC,
    ss.user_lookups DESC,
    i.index_id
GO


Thursday, September 02, 2010

Nulls, bad data and execution plans

Msg 681, Level 16, State 3, Line 1
Attempting to set a non-NULL-able column's value to NULL.

That is the error I received one morning on a job that had run successfully for many moons. There were no non-nullable columns in the destination table at all, which made things seem very odd. Then I saw a couple of KB articles from Microsoft that seemed to point to one of two hotfixes. One was included in SP1, and we had applied that months ago. The other was in CU7 for SP1... great.

I wasted an entire day trying to debug it and was even tempted to put the SQL 2008 SP2 CTP in production without testing. *GASP* The issue was huge for my company, with big reporting implications and whatnot.

I went down a route that had me avoiding the linked server, which actually worked in small-scale testing. I was able to move between 500,000 and 1,000,000 rows the 'new' way, while I couldn't move even 1 row the old way. I thought I had it licked. In my mind, it was simply a new execution plan causing hash joins on a nullable field rather than whatever it had been doing before that worked, and I had a workaround.

SOOOOO I released the new code and waited for the process to start. I woke up to the SAME error.

Arrrrrgggghhhhhhh (and no it isn't talk like a pirate day)

Then one of the data guys 'fessed up to running some new ETL processes against two of the tables in question... and one of them, guess what, had NULLs in a field used for a join. Excellent. I found the NULLs, 'fixed' them, and re-ran the process.
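For the curious, the NULL hunt itself was nothing fancy. A sketch with stand-in names, since I can't share the real schema:

--Hypothetical table and column names: find rows whose join key is NULL
SELECT *
FROM dbo.SourceTable
WHERE JoinKey IS NULL
GO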

KABLAMO. But this time I got a new error. Turns out the last data folks here had decided to use a char(1) as a tinyint, and somehow this new ETL process had put a '*' into the char(1) field. Once that was fixed, all was well.
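Finding the offending rows was a one-liner too. Again with stand-in names; the real column was a char(1) that should only ever have held a single digit:

--Hypothetical names: find char(1) values that aren't the digits we expect
SELECT *
FROM dbo.SourceTable
WHERE StatusFlag NOT LIKE '[0-9]'
GO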


So what are the lessons here?
1) Error messages aren't always what they seem.
2) While KB articles may not be 100% accurate, they often point to the right place. Follow them.
3) What changed sometimes IS the data. Just because nobody did a code release doesn't mean that things didn't change.
