Thursday, May 19, 2011

Update Statistics Of DB

SQL Server UPDATE STATISTICS
You can add statistics on columns that don't have statistics in order to boost query performance. For example, you can create statistics for columns in a composite index other than the first column (which is the default for AUTO CREATE STATISTICS), and for columns that are not indexed.
The SQL Server Query Optimizer has the ability to use statistics on columns that are not indexed in order to create more optimal execution plans. Doing so can increase performance without having the overhead of indexes. In fact, sometimes it is more efficient to use column statistics instead of an index to optimize query performance. The only way to know this for sure is to make controlled experiments and see what works best for your situation.
Generally, column statistics, if needed by the Query Optimizer, are generated automatically by SQL Server when queries are being optimized by the Query Optimizer. For example, column statistics are automatically created when there are currently no statistics available for the column being used as a search argument in a WHERE clause or a JOIN clause. There are two cases when SQL Server will not automatically create column statistics, even when they fit the criteria described above. These include: when the costs needed to create the column statistics are larger than the costs of the query plan; and when SQL Server is too busy.
Another way to identify non-indexed columns to add statistics to is to use the Index Tuning Wizard. It can identify columns that can benefit from column statistics and add them for you automatically if you wish.
You use the CREATE STATISTICS command to manually create statistics on columns.
*****
One way to tell if the statistics for a column are missing or outdated, or if the accuracy of the current statistics is not ideal (they may not be ideal if the sample size used to build the statistics is not large enough), is to run a graphical execution plan of a query as both an estimated plan and an actual plan. In other words, you will run both an estimated and an actual graphical execution plan using Query Analyzer, and then compare the results.
When you run an estimated graphical execution plan for a query, you get results such as the estimated row count returned. When you run an actual graphical execution plan for a query, you get the actual row count returned. If the statistics for the column(s) used by the query are current and accurate, then the estimated row count and the actual row count should be very similar. If they are wildly different, it can mean one of three things: the statistics are missing, out of date, or not accurate enough.
First, check to see if the statistics are missing. If they are, they can easily be added. Second, if statistics do exist, then update them to see if this resolves the problem. Third, if the first two suggestions don't work, then the statistics may not be accurate enough.
By default, when a table is less than 8 MB, the sample used to create the column statistics is the entire table. So for tables of less than 8 MB, in theory the statistics created can't be any better than they are. So if your table is less than 8 MB, and updating the statistics doesn't resolve the issue of a large difference between the estimated and actual row count, then nothing can. In a case like this, your only option, assuming there is a performance problem, is to use a hint to override what the Query Optimizer is doing wrong.
If the table in question is larger than 8 MB, SQL Server does not examine every row when it automatically creates statistics; instead, it takes a sample of the available rows. This sample may or may not be good enough to produce sufficiently accurate statistics. One way to get better statistics is to manually update the statistics for the table using the UPDATE STATISTICS command. This command has two options that allow you to get a better sampling of the rows in your table.
First, you can use the FULLSCAN option. This will force every row in the table or index to be fully examined and used to build the column statistics. As you might guess, using this option can increase the time it takes to update the statistics, which could hurt performance elsewhere on your server, especially if the table is huge.
As an alternative to the FULLSCAN option, you can use the SAMPLE option. This allows you to control how large a sample of rows is read. For example, you might specify that only 50 percent of the rows are to be sampled. This is less than what a FULLSCAN will read, but more than what a default sample will read, offering a compromise between accuracy and the resources needed to perform the sampling.
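As a minimal sketch of both options (assuming a Northwind-style Orders table purely for illustration):

-- Read every row to build the statistics (most accurate, most expensive)
UPDATE STATISTICS Orders WITH FULLSCAN
GO
-- Read a 50 percent random sample of the rows instead
UPDATE STATISTICS Orders WITH SAMPLE 50 PERCENT
GO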
If you find that the statistics for one or more of your tables are not accurate enough for your needs, you may need to schedule running the UPDATE STATISTICS command at regular intervals using an appropriate sampling level. Only through testing will you know what sampling level is best for your situation.
*****
Statistics are only maintained on the first column of a composite index. Because of this, SQL Server's Query Optimizer might not make as good use of composite indexes as it could. If you suspect that a composite index is not being used as you expected, you will first want to verify this by examining the query's execution plan. If you see that it is not being used, and it should be, consider adding a hint to force the index to be used, then see if using the index really helps performance. If it does, keep the hint. If it does not help performance, drop the hint and the composite index that is not being used, assuming that it is not needed by another query.
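A minimal sketch of forcing an index with a table hint (the table and index names here are hypothetical, for illustration only):

-- Force the hypothetical composite index, then compare the plan and timings with and without the hint
SELECT OrderID, CustomerID, OrderDate
FROM Orders WITH (INDEX (IX_Orders_CustomerID_OrderDate))
WHERE CustomerID = 'ALFKI' AND OrderDate >= '19970101'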
*****
If you want to find out whether an index has had its statistics updated, you can use the DBCC SHOW_STATISTICS command, like this:
DBCC SHOW_STATISTICS (table_name , index_name)
This command will not only tell you when statistics were last updated, but also provide you with information on the kind of statistics that have been collected for the index you are examining.
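For example, using the pubs sample tables referenced later in this article (this assumes an index named au_id_ind exists on the authors table):

DBCC SHOW_STATISTICS (authors, au_id_ind)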
*****
If you have a performance issue with a query, the very first step you should take before analyzing it is to update the statistics on all the tables in your database. This is because many query performance issues are due to outdated statistics, and updating them before troubleshooting query performance may save you a lot of time. If the query is still giving you trouble after updating the statistics, then you can begin your troubleshooting.  
*****
A SQL Server tuning expert at Microsoft (one of their best) suggested to me that, if your SQL Server maintenance window allows for it, you should update statistics for all tables in all databases every night. This is because the auto-update option is less than perfect, and accurate, up-to-date statistics are so important to the overall performance of SQL Server. Of course, if this is not possible because of time constraints, then you will have to perform this step less often, or even not at all, and rely on the auto-update feature. But if you do have this window, you should take advantage of it.
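One way to script such a nightly job (a sketch only, relying on the undocumented but widely used sp_MSforeachdb procedure) is:

-- Run sp_updatestats in every database on the server
EXEC sp_MSforeachdb 'USE [?]; EXEC sp_updatestats'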
*****
If you upgrade to SQL Server 2000 from SQL Server 7.0, you should update all of the statistics in all your databases using either UPDATE STATISTICS or sp_updatestats. This is because statistics are not automatically updated during the conversion process.
If you don't manually update the statistics, then you may find that many of your queries run slowly because of incorrect statistics, and you may incorrectly blame SQL Server 2000 for being slower than SQL Server 7.0, which is just not the case.  
CREATE STATISTICS
Creates a histogram and associated density groups (collections) over the supplied column or set of columns.
Syntax
CREATE STATISTICS statistics_name
ON { table | view } ( column [ ,...n ] )
    [ WITH
        [ [ FULLSCAN
            | SAMPLE number { PERCENT | ROWS } ] [ , ] ]
        [ NORECOMPUTE ]
    ]
Arguments
statistics_name
Is the name of the statistics group to create. Statistics names must conform to the rules for identifiers.
table
Is the name of the table on which to create the named statistics. Table names must conform to the rules for identifiers. table is the table with which the column is associated. Specifying the table owner name is optional. Statistics can be created on tables in another database by specifying a qualified database name.
view
Is the name of the view on which to create the named statistics. A view must have a clustered index before statistics can be created on it. View names must conform to the rules for identifiers. view is the view with which the column is associated. Specifying the view owner name is optional. Statistics can be created on views in another database by specifying a qualified database name.

column
Is the column or set of columns on which to create statistics. Computed columns and columns of the ntext, text, or image data types cannot be specified as statistics columns.
n
Is a placeholder indicating that multiple columns can be specified.
FULLSCAN
Specifies that all rows in table should be read to gather the statistics. Specifying FULLSCAN provides the same behavior as SAMPLE 100 PERCENT. This option cannot be used with the SAMPLE option.
SAMPLE number { PERCENT | ROWS }
Specifies that a percentage, or a specified number of rows, of the data should be read using random sampling to gather the statistics. number can be only an integer: if PERCENT, number should be from 0 through 100; if ROWS, number can be from 0 to the total number of rows in the table.
This option cannot be used with the FULLSCAN option. If no SAMPLE or FULLSCAN option is given, an automatic sample is computed by Microsoft® SQL Server™.
NORECOMPUTE
Specifies that automatic recomputation of the statistics should be disabled. If this option is specified, SQL Server continues to use previously created (old) statistics even as the data changes. The statistics are not automatically updated and maintained by SQL Server, which may produce suboptimal plans.
Remarks
Only the table owner can create statistics on that table. The owner of a table can create a statistics group (collection) at any time, whether or not there is data in the table.
CREATE STATISTICS can be executed on views with clustered index, or indexed views. Statistics on indexed views are used by the optimizer only if the view is directly referenced in the query and the NOEXPAND hint is specified for the view. Otherwise, the statistics are derived from the underlying tables before the indexed view is substituted into the query plan. Such substitution is supported only on Microsoft SQL Server 2000 Enterprise and Developer Editions.
Permissions
CREATE STATISTICS permissions default to members of the sysadmin fixed server role and the db_ddladmin and db_owner fixed database roles and the table owner, and are not transferable.
A. Use CREATE STATISTICS with SAMPLE number PERCENT
This example creates the names statistics group (collection), which calculates random sampling statistics on five percent of the CompanyName and ContactName columns of the Customers table.
CREATE STATISTICS names
   ON Customers (CompanyName, ContactName)
   WITH SAMPLE 5 PERCENT
GO
B. Use CREATE STATISTICS with FULLSCAN and NORECOMPUTE
This example creates the names statistics group (collection), which calculates statistics for all rows in the CompanyName and ContactName columns of the Customers table and disables automatic recomputation of statistics.
CREATE STATISTICS names
   ON Northwind..Customers (CompanyName, ContactName)
   WITH FULLSCAN, NORECOMPUTE
GO
DROP STATISTICS
Drops statistics for multiple collections within the specified tables (in the current database).
Syntax
DROP STATISTICS table.statistics_name | view.statistics_name [ ,...n ]
Arguments
table | view
Is the name of the target table or indexed view for which statistics should be dropped. Table and view names must conform to the rules for identifiers. For more information, see Using Identifiers. Specifying the table or view owner name is optional.
statistics_name
Is the name of the statistics group to drop. Statistics names must conform to the rules for identifiers.
n
Is a placeholder indicating that more than one statistics_name group (collection) can be specified.
Remarks
Be careful when dropping statistics because dropping statistics may affect the plan chosen by the query optimizer.
Examples
DROP STATISTICS authors.anames, titles.tnames
SP for Create Statistics

sp_createstats

Syntax
sp_createstats [ [ @indexonly = ] 'indexonly' ]
    [ , [ @fullscan = ] 'fullscan' ]
    [ , [ @norecompute = ] 'norecompute' ]
Arguments
[@indexonly =] 'indexonly'
Specifies that only the columns participating in an index should be considered for statistics creation. indexonly is char(9), with a default of NO.
[@fullscan =] 'fullscan'
Specifies that the FULLSCAN option is used with the CREATE STATISTICS statement. If fullscan is omitted, Microsoft® SQL Server™ performs a default sample scan. fullscan is char(9), with a default of NO.
[@norecompute =] 'norecompute'
Specifies that automatic recomputation of statistics is disabled for the newly created statistics. norecompute is char(12) with a default of NO.
Return Code Values
0 (success) or 1 (failure)
Result Sets
None
Permissions
Permissions default to members of the sysadmin fixed server role, the db_owner fixed database role, and the owner of the objects.
Examples
This example creates statistics for all eligible columns for all user tables in the current database.
EXEC sp_createstats
This example creates statistics for only the columns participating in an index.
EXEC sp_createstats 'indexonly'

UPDATE STATISTICS

Updates information about the distribution of key values for one or more statistics groups (collections) in the specified table or indexed view.

Syntax
UPDATE STATISTICS table | view
    [
        index
        | ( statistics_name [ ,...n ] )
    ]
    [ WITH
        [
            [ FULLSCAN ]
            | SAMPLE number { PERCENT | ROWS }
            | RESAMPLE
        ]
        [ [ , ] [ ALL | COLUMNS | INDEX ] ]
        [ [ , ] NORECOMPUTE ]
    ]
Arguments
table | view
Is the name of the table or indexed view for which to update statistics. Table or view names must conform to the rules for identifiers. For more information, see Using Identifiers. Because index names are not unique within each database, table or view must be specified. Specifying the database, table, or view owner is optional. Indexed views are supported only on Microsoft® SQL Server™ 2000, Enterprise Edition.
index
Is the index for which statistics are being updated. Index names must conform to the rules for identifiers. If index is not specified, the distribution statistics for all indexes in the specified table or indexed view are updated. To see a list of index names and descriptions, execute sp_helpindex with the table or view name.
statistics_name
Is the name of the statistics group (collection) to update. Statistics names must conform to the rules for identifiers. For more information about creating statistics groups, see CREATE STATISTICS.
n
Is a placeholder indicating that multiple statistics_name groups can be specified.
FULLSCAN
Specifies that all rows in table or view should be read to gather the statistics. FULLSCAN provides the same behavior as SAMPLE 100 PERCENT. FULLSCAN cannot be used with the SAMPLE option.
SAMPLE number { PERCENT | ROWS }
Specifies the percentage of the table or indexed view, or the number of rows to sample when collecting statistics for larger tables or views. Only integers are allowed for number whether it is PERCENT or ROWS. To use the default sampling behavior for larger tables or views, use SAMPLE number with PERCENT or ROWS. Microsoft SQL Server ensures a minimum number of values are sampled to ensure useful statistics. If the PERCENT, ROWS, or number option results in too few rows being sampled, SQL Server automatically corrects the sampling based on the number of existing rows in the table or view.
Note  The default behavior is to perform a sample scan on the target table or indexed view. SQL Server automatically computes the required sample size.
RESAMPLE
Specifies that statistics will be gathered using an inherited sampling ratio for all existing statistics including indexes. If the sampling ratio results in too few rows being sampled, SQL Server automatically corrects the sampling based on the number of existing rows in the table or view.
ALL | COLUMNS | INDEX
Specifies whether the UPDATE STATISTICS statement affects column statistics, index statistics, or all existing statistics. If no option is specified, the UPDATE STATISTICS statement affects all statistics. Only one type (ALL, COLUMNS, or INDEX) can be specified per UPDATE STATISTICS statement.
NORECOMPUTE
Specifies that statistics that become out of date are not automatically recomputed. Statistics become out of date depending on the number of INSERT, UPDATE, and DELETE operations performed on indexed columns. When specified, this option causes SQL Server to disable automatic statistics rebuilding. To restore automatic statistics recomputation, reissue UPDATE STATISTICS without the NORECOMPUTE option or execute sp_autostats.
Important  Disabling automatic statistics recomputation can cause the SQL Server query optimizer to choose a less optimal strategy for queries that involve the specified table.
Remarks
SQL Server keeps statistics about the distribution of the key values in each index and uses these statistics to determine which index(es) to use in query processing. Users can create statistics on nonindexed columns by using the CREATE STATISTICS statement. Query optimization depends on the accuracy of the distribution steps:
  • If there is significant change in the key values in the index, rerun UPDATE STATISTICS on that index.
  • If a large amount of data in an indexed column has been added, changed, or removed (that is, if the distribution of key values has changed), or the table has been truncated using the TRUNCATE TABLE statement and then repopulated, use UPDATE STATISTICS.
To see when the statistics were last updated, use the STATS_DATE function.
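As a sketch against the SQL Server 2000 system tables (using the pubs authors table purely for illustration):

-- List each index and statistics set on the authors table with the date its statistics were last updated
SELECT name AS index_name, STATS_DATE(id, indid) AS stats_last_updated
FROM sysindexes
WHERE id = OBJECT_ID('authors') AND indid BETWEEN 1 AND 254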
Statistics can be created or updated on tables with computed columns only if the conditions are such that an index can be created on these columns. For more information about the requirements and restrictions on creating indexes on computed columns, see CREATE INDEX.
If you disable automatic statistics recomputation, you must manually update the statistical information.
Note  The UPDATE STATISTICS statement reenables automatic statistical updating unless the NORECOMPUTE clause is specified.
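As a reminder of the sp_autostats alternative mentioned above (a sketch; the index name is assumed from the examples below):

-- Re-enable automatic statistics updating for one index on the authors table
EXEC sp_autostats 'authors', 'ON', 'au_id_ind'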
Examples
A. Update all statistics for a single table
This example updates the distribution statistics for all indexes on the authors table.
UPDATE STATISTICS authors
B. Update only the statistics for a single index
This example updates only the distribution information for the au_id_ind index of the authors table.
UPDATE STATISTICS authors au_id_ind
C. Update statistics for specific statistics groups (collections) using 50 percent sampling
This example creates and then updates the statistics group for the au_lname and au_fname columns in the authors table.
CREATE STATISTICS anames 
   ON authors (au_lname, au_fname)
   WITH SAMPLE 50 PERCENT
GO
-- Time passes. The UPDATE STATISTICS statement is then executed.
UPDATE STATISTICS authors(anames) 
   WITH SAMPLE 50 PERCENT
GO
D. Update statistics for a specific statistics group (collection) using FULLSCAN and NORECOMPUTE
This example updates the anames statistics group (collection) in the authors table, forces a full scan of all rows in the authors table, and turns off automatic statistics updating for the statistics group (collection).
UPDATE STATISTICS authors(anames)
   WITH FULLSCAN, NORECOMPUTE
GO
SQL Server 2000 and 2005 keep statistics on tables and indexed views so that the database engine can optimize how queries are run and improve performance. Theoretically, these statistics are updated automatically. However, we have found that manually updating the statistics can lead to dramatically better database performance, which indicates that this isn't always true.
To manually update the statistics for a single table the syntax is:

UPDATE STATISTICS <table name>
Example:
UPDATE STATISTICS VDATA
To update the statistics for an entire database, run the stored procedure sp_updatestats.
Example:
USE SPC
GO
EXEC sp_updatestats
The advantage of running UPDATE STATISTICS on individual tables is that you can limit the operation to only the relevant tables. If the Auto Update Statistics setting is not enabled, it should either be turned on, or a stored procedure should be set up to run once a week to update the statistics.
To verify the current status of the auto update statistics setting, run the query:
sp_dboption '<database name>', 'auto update statistics'
If the feature is not enabled, you can enable it by running the query:
sp_dboption '<database name>', 'auto update statistics', 'on'
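On SQL Server 2005 the same setting can also be checked and changed with DATABASEPROPERTYEX and ALTER DATABASE (a sketch; substitute your database name for the placeholder):

-- Returns 1 when auto update statistics is enabled
SELECT DATABASEPROPERTYEX('<database name>', 'IsAutoUpdateStatistics')
-- Enable the setting
ALTER DATABASE <database name> SET AUTO_UPDATE_STATISTICS ON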




Common Table Expression

Common Table Expression
A common table expression (CTE) can be thought of as a temporary result set that is defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query. Unlike a derived table, a CTE can be self-referencing and can be referenced multiple times in the same query.
A CTE can be used to:
·         Substitute for a view when the general use of a view is not required; that is, you do not have to store the definition in metadata.
·         Enable grouping by a column that is derived from a scalar subselect, or a function that is either not deterministic or has external access.
·         Reference the resulting table multiple times in the same statement.
Using a CTE offers the advantages of improved readability and ease in maintenance of complex queries. The query can be divided into separate, simple, logical building blocks. These simple blocks can then be used to build more complex, interim CTEs until the final result set is generated.
CTEs can be defined in user-defined routines, such as functions, stored procedures, triggers, or views.
Structure of a CTE:
A CTE is made up of an expression name representing the CTE, an optional column list, and a query defining the CTE. After a CTE is defined, it can be referenced like a table or view can in a SELECT, INSERT, UPDATE, or DELETE statement. A CTE can also be used in a CREATE VIEW statement as part of its defining SELECT statement.
The basic syntax structure for a CTE is:
WITH expression_name [ ( column_name [,...n] ) ]
AS
( CTE_query_definition )
The list of column names is optional only if distinct names for all resulting columns are supplied in the query definition.
The statement to run the CTE is:
SELECT <column_list>
FROM expression_name;
Recursion is the process in which a query executes itself; it is used to get results based on the output of a base query. A CTE can be used as a recursive CTE (Common Table Expression).
Here, the result of the CTE is used repeatedly to arrive at the final result set. The following example, which uses the AdventureWorks database, finds the hierarchy of managers and employees.
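A sketch of that recursive CTE (assuming the SQL Server 2005 AdventureWorks HumanResources.Employee table, with its EmployeeID and ManagerID columns):

USE AdventureWorks
GO
WITH EmployeeHierarchy (EmployeeID, ManagerID, EmployeeLevel)
AS
(
   -- Anchor member: the employee(s) at the top of the tree, with no manager
   SELECT EmployeeID, ManagerID, 0 AS EmployeeLevel
   FROM HumanResources.Employee
   WHERE ManagerID IS NULL
   UNION ALL
   -- Recursive member: employees reporting to someone already in the CTE
   SELECT e.EmployeeID, e.ManagerID, eh.EmployeeLevel + 1
   FROM HumanResources.Employee e
   INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
)
SELECT EmployeeID, ManagerID, EmployeeLevel
FROM EmployeeHierarchy
ORDER BY EmployeeLevel, EmployeeID
GO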
Introduction:
The common table expression is one of the new features in SQL Server 2005. In many circumstances it can be used in stored procedures in place of temp tables or table variables. Let's look at CTEs with some example queries.

Background:

Most developers, while writing stored procedures, create temp tables or table variables. They need some table to store temporary results so they can manipulate the data in other tables based on this intermediate result.

Temp tables are stored in tempdb and need to be dropped from the tempdb database afterwards.

A table variable is better than a temp table because the table variable initially lives in memory, up to a certain size limit; only if it grows beyond that limit is it moved to tempdb. However, the scope of a table variable is limited to the batch or procedure in which it is declared. Compared with a table variable, a CTE is better still: it simply exposes a result set, like a normal view.

CTE (Common Table Expression):

The CTE is one of the essential features in SQL Server 2005. It simply exposes the result as a temporary result set that can be accessed like a normal table or view, but only within the scope of the statement that defines it.

The syntax of the CTE is the following.

WITH name (alias names for the columns of the returned result set)
AS
(
-- Write the SQL query here
)
SELECT * FROM name

Here the SELECT statement must come immediately after the CTE. The name is mandatory and the column list argument is optional; it can be used to give aliases to the columns returned by the CTE.

CTE 1: Simple CTE

WITH ProductCTE
AS
(
   SELECT ProductID AS [ID], ProductName AS [Name], CategoryID AS [CID], UnitPrice AS [Price]
   FROM Products
)
SELECT * FROM ProductCTE

Here all the product details (ID, name, category ID, and unit price) are retrieved and stored as a temporary result set in ProductCTE.

This result set can be queried like a table or view.

CTE 2: Simple CTE with alias

WITH ProductCTE (ID, Name, Category, Price)
AS
(
   SELECT ProductID, ProductName, CategoryID, UnitPrice
   FROM Products
)
SELECT * FROM ProductCTE

Here four fields are retrieved from Products, and the alias names are given in the argument list following the CTE result set name.

The aliases can also be given in the normal way, inside the SELECT query itself, as follows.

WITH ProductCTE
AS
(
   SELECT ProductID AS [ID], ProductName AS [Name], CategoryID AS [CID], UnitPrice AS [Price]
   FROM Products
)
SELECT * FROM ProductCTE

CTE 3: CTE joins with normal table

The result set of a CTE can be joined with any table, and relationships can be enforced between the CTE and other tables.

WITH OrderCustomer
AS
(
   SELECT DISTINCT CustomerID FROM Orders
)
SELECT C.CustomerID, C.CompanyName, C.ContactName, C.Address + ', ' + C.City AS [Address]
FROM Customers C
INNER JOIN OrderCustomer OC ON OC.CustomerID = C.CustomerID

Here the customers who have placed orders are put into the CTE result set, which is then joined with the customer details.

CTE 4: Multiple resultsets in the CTE

WITH MyCTE1
AS
(
   SELECT ProductID, SupplierID, CategoryID, UnitPrice, ProductName FROM Products
),
MyCTE2
AS
(
   SELECT DISTINCT ProductID FROM "Order Details"
)
SELECT C1.ProductID, C1.ProductName, C1.SupplierID, C1.CategoryID
FROM MyCTE1 C1
INNER JOIN MyCTE2 C2 ON C1.ProductID = C2.ProductID

Here, there are two result sets that will be filtered based on the join condition.

CTE 5: Union statements in the CTE

WITH PartProdCateSale
AS
(
   SELECT ProductID FROM Products
   WHERE CategoryID = (SELECT CategoryID FROM Categories WHERE CategoryName = 'Condiments')
   UNION ALL
   SELECT ProductID FROM Products
   WHERE CategoryID = (SELECT CategoryID FROM Categories WHERE CategoryName = 'Seafood')
)
SELECT OD.ProductID, SUM(OD.UnitPrice * OD.Quantity) AS [Total Sale]
FROM "Order Details" OD
INNER JOIN PartProdCateSale PPCS ON PPCS.ProductID = OD.ProductID
GROUP BY OD.ProductID

Normally, when we combine many result sets, we create a table and then insert into that table. But here we have combined them with UNION ALL and, instead of a table, a CTE has been used.

CTE 6: CTE with identity column

WITH MyCustomCTE
AS
(
   SELECT CustomerID, ROW_NUMBER() OVER (ORDER BY CustomerID) AS iNo
   FROM Customers
)
SELECT * FROM MyCustomCTE
Advantages
  • Can be referenced multiple times in a query.
  • Can be used to create recursive queries.
  • Can be used in place of a view in scenarios when the metadata definition need not be stored.
  • Improves readability.
  • CTEs help improve readability of the code without compromising performance.
  • They help improve maintainability of the code without compromising performance.
  • They make writing recursive code in T-SQL significantly easier than in previous SQL Server versions.



Fragmentation Checking

How to check Fragmentation on SQL Server 2005
I have been asked this question several times: Is there a tool in SQL Server where I can easily check database fragmentation? The answer, as with many other things, would be "Yes" and "No".
Fragmentation is a "natural" process in the database lifecycle. If the database is frequently updated via INSERT, UPDATE, or DELETE statements, we can expect it to become fragmented over time. If database indexes are fragmented, the SQL Server query optimizer may make not-so-optimal decisions when using an index to resolve a query, affecting overall query performance. SQL Server provides tools to check and fix database fragmentation, but we first need to understand how to use these tools. This is something just a little bit more difficult than simply firing up Windows defrag.exe to check for filesystem fragmentation.
Let's start with some theory so we can better understand what fragmentation is and how it affects database performance. There are two different types of fragmentation in SQL Server: internal and external. Internal fragmentation is the result of index pages taking up more space than needed. It is like having a book where some of the pages are left blank; we do not know which pages are blank until we read the entire book, and the same applies to SQL Server, which has to read all the pages in the index, wasting extra time and server resources on the empty pages. External fragmentation occurs when the pages of the index are not contiguous. Following the book analogy, it is like having a book where the pages are not ordered in a logical way (page 1, then page 2, then page 3, and so on), causing you to go back and forth to piece the information together and make sense of the reading. Heavily used tables that contain fragmented indexes will impact your database performance. If you are still unsure about what external and internal fragmentation mean, refer to this article published on SQL Server Magazine by Kalen Delaney (although written for SQL Server 2000, the definitions are still valid for SQL Server 2005).
In our minds we associate fragmentation with something bad that should be avoided at any cost, but is this always the case? Not with internal fragmentation. If your tables are frequently changed via UPDATE and INSERT operations, having a small amount of free space on the index or data pages (that is, a small amount of internal fragmentation) means new data can be accommodated without adding a new page (a page split). Page splits ultimately lead to external fragmentation, since the newly added page probably won't be adjacent to the original page. Internal fragmentation, therefore, can be desirable at low levels in order to avoid frequent page splits, while external fragmentation should always be avoided. Please understand that by 'low levels' I mean just that: low levels. The amount of free space that can be reserved on an index can be controlled using the fill factor.
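As a sketch (the index and column names here are illustrative only), a fill factor of 90 leaves roughly 10 percent free space on each leaf-level page when the index is created or rebuilt:

-- Leave about 10% free space on each page of this hypothetical index
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
ON Orders (OrderDate)
WITH (FILLFACTOR = 90)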
It is also important to understand that by external fragmentation we do not mean filesystem fragmentation or disk fragmentation.
SQL Server 2005 introduces a new DMV (Dynamic Management View) to check index fragmentation levels: sys.dm_db_index_physical_stats. Although SQL Server 2005 still supports the SQL Server 2000 DBCC SHOWCONTIG command, this feature will be removed in a future version of SQL Server. Here you can check the differences between the two instructions when checking for fragmentation on the HumanResources.Employee table in the sample database AdventureWorks:
·    DBCC SHOWCONTIG example:
USE AdventureWorks;
GO
DBCC SHOWCONTIG ('HumanResources.Employee')
GO
DBCC SHOWCONTIG scanning 'Employee' table...
Table: 'Employee' (869578136); index ID: 1, database ID: 6
TABLE level scan performed.
- Pages Scanned................................: 7
- Extents Scanned..............................: 3
- Extent Switches..............................: 2
- Avg. Pages per Extent........................: 2.3
- Scan Density [Best Count:Actual Count].......: 33.33% [1:3]
- Logical Scan Fragmentation ..................: 14.29%
- Extent Scan Fragmentation ...................: 33.33%
- Avg. Bytes Free per Page.....................: 172.6
- Avg. Page Density (full).....................: 97.87%
·    sys.dm_db_index_physical_stats DMV example:
USE AdventureWorks
GO
SELECT object_id, index_id, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(DB_ID('AdventureWorks'), OBJECT_ID('HumanResources.Employee'), NULL, NULL, NULL);
In this last example I have selected only the relevant information to show from the DMV; you will see that the DMV can provide many more details about the index structure. In case you wanted to show fragmentation details for all the objects in the AdventureWorks database, the command would be as follows:
SELECT *
FROM sys.dm_db_index_physical_stats(DB_ID('AdventureWorks'), NULL, NULL, NULL , NULL);
Please, refer to SQL Server 2005 Books Online for more information on sys.dm_db_index_physical_stats syntax.
How do we know if our database is fragmented? We have to pay attention to the avg_fragmentation_in_percent value. A value between 5% and 30% indicates moderate fragmentation, while any value over 30% indicates high fragmentation (book pages in no particular order).
The avg_page_space_used_in_percent column is another value worth looking at closely. This value represents the amount of space used in the index pages. A value below 75% is usually associated with internal fragmentation (more blank pages in our book than recommended).
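Note that avg_page_space_used_in_percent is only populated when the DMV is run in 'SAMPLED' or 'DETAILED' mode; the default 'LIMITED' mode returns NULL for it. A sketch:

SELECT object_id, index_id, avg_fragmentation_in_percent, avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID('AdventureWorks'),
     OBJECT_ID('HumanResources.Employee'), NULL, NULL, 'SAMPLED');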
In order to reduce fragmentation we will have to reorganize or rebuild the indexes. Choosing between reorganizing and rebuilding depends on the resulting values. For moderate fragmentation, index reorganization will be enough; for heavily fragmented indexes, a rebuild is needed. The following table summarizes when to use each one.
Reference Values (in %)                      Action              SQL statement
avg_fragmentation_in_percent > 5 AND < 30    Reorganize Index    ALTER INDEX REORGANIZE
avg_fragmentation_in_percent > 30            Rebuild Index       ALTER INDEX REBUILD

The REORGANIZE statement is always executed online, while an index REBUILD is executed offline by default; it can optionally be executed while the database is being accessed by users with the ONLINE option, which can be a plus for 24x7 environments. This approach, which is only available in SQL Server 2005, has some limitations; refer to the ALTER INDEX statement in SQL Server Books Online for more details.
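A sketch of both statements (the index name below is the assumed primary key on HumanResources.Employee; substitute your own index and table names):

USE AdventureWorks
GO
-- Moderate fragmentation (between 5% and 30%): reorganize, always an online operation
ALTER INDEX PK_Employee_EmployeeID ON HumanResources.Employee REORGANIZE
GO
-- Heavy fragmentation (over 30%): rebuild; ONLINE = ON keeps the index available during the rebuild
ALTER INDEX PK_Employee_EmployeeID ON HumanResources.Employee REBUILD WITH (ONLINE = ON)
GO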
If you are looking for an easy way to automate these processes, the SQL Server Books Online reference for sys.dm_db_index_physical_stats contains a sample script you can implement within minutes. This script will take care of reorganizing any index where avg_fragmentation_in_percent is below 30% and rebuilding any index where this value is over 30% (you can change these parameters for your specific needs). Add a new SQL Server Execute T-SQL Statement task to your weekly or daily maintenance plan containing this script so you can keep your database fragmentation at an optimum level.