SQL Server UPDATE STATISTICS
You can add statistics on columns that don't have statistics in order to boost query performance. For example, you can create statistics for columns in a composite index other than the first column (which is the default for AUTO CREATE STATISTICS), and for columns that are not indexed.
The SQL Server Query Optimizer has the ability to use statistics on columns that are not indexed in order to create more optimal execution plans. Doing so can increase performance without having the overhead of indexes. In fact, sometimes it is more efficient to use column statistics instead of an index to optimize query performance. The only way to know this for sure is to make controlled experiments and see what works best for your situation.
Generally, column statistics, if needed by the Query Optimizer, are generated automatically by SQL Server when queries are being optimized by the Query Optimizer. For example, column statistics are automatically created when there are currently no statistics available for the column being used as a search argument in a WHERE clause or a JOIN clause. There are two cases when SQL Server will not automatically create column statistics, even when they fit the criteria described above. These include: when the costs needed to create the column statistics are larger than the costs of the query plan; and when SQL Server is too busy.
Another way to identify non-indexed columns to add statistics to is to use the Index Wizard. It can identify columns that can benefit from column statistics, and add them for you automatically if you wish.
You use the CREATE STATISTICS command to manually create statistics on columns.
*****
One way to tell if the statistics for a column are missing or outdated, or if the accuracy of current statistics is not ideal (they may not be ideal if the sample size of the statistics are not high enough), is to run a graphical execution plan of a query as an estimated plan and as a real plan. In other words, you will run both an estimated and an actual graphical execution plan using Query Analyzer, and then compare the results.
When you run an estimated graphical execution plan for a query, you get results, such as the estimated row count returned. When you run an actual graphical execution plan for a query, you get the actual row count returned. If the statistics for the column(s) used by the query are current and accurate, then the estimated row count and the actual row count should be very similar. If they are wildly different, this can mean three things. Either the statistics are missing, out of date, or not accurate enough.
First, check to see if the statistics are missing. If they are, they can easily be added. Second, if statistics do exist, then update them to see if this resolves the problem. Third, if the first two suggestions don't work, then the statistics may not be accurate enough.
By default, when a table is less than 8 MB, the sample used to create the column statistics is the entire table. So for tables of less than 8 MB, in theory the statistics created can't be any better than they are. So if your table is less than 8 MB, and updating the statistics doesn't resolve the issue of a large difference between the estimated and actual row count, then nothing can. In a case like this, your only option, assuming there is a performance problem, is to use a hint to override what the Query Optimizer is doing wrong.
But if the table in question is greater than 8 MB, and when SQL Server automatically creates statistics for a table, it does not examine every row. Instead, it takes a sample of the available rows. This sample may or may not be good enough to produce accurate enough statistics. One way to get better statistics is to manually update the statistics for the table using the UPDATE STATISTICs command. This command has two options that allow you to get a better sampling of the rows in your table.
First, you can use the FULLSCAN option. This will force every row in the table or index to be fully examined and used to build the column statistics. As you might guess, using this option can increase the time it takes to update the statistics, which could hurt performance elsewhere on your server, especially if the table is huge.
As an alternative to the FULLSCAN option, you can use the SAMPLE option. This allows you to control how large a sample of rows to sample. For example, you might specify that only 50 percent of the rows are to be sampled. This is less than what a FULLSCAN will produce, but more than what a default sample will produce, offering a compromise in accuracy and resources needed to perform the sample.
If you find that the statistics for one or more of our tables is not accurate enough for your needs, you may need to schedule running the UPDATE STATISTICS command at regular intervals using an appropriate sampling level. Only through testing will you know what sampling level is best for your situation.
*****
Statistics are only maintained on the first column of a composite index. Because of this, SQL Server's Query Optimizer might not make as good use of composite indexes as it could. If you suspect that a composite index is not being used as you expected it would be, you will want first to verify that the index is not being used by examining the query's execution plan. If you see that it is not being used, and it should be, consider adding a hint to force the index to be used, then see if using the index really helps performance. If it does, keep the hint. If it does not help performance, drop the hint and the composite index that is not being used, assuming that it is not needed by another query.
*****
If you want to find out if an index has had its indexes updated or not, you can use the DBCC SHOW_STATISTICS command, like this:
DBCC SHOW_STATISTICS (table_name , index_name)
This command will not only tell you when statistics were last updated, but also provide you information on the kind of statistics that has been collected for the index you are examining
*****
If you have a performance issue with a query, the very first step you should take before analyzing it is to update the statistics on all the tables in your database. This is because many query performance issues are due to outdated statistics, and updating them before troubleshooting query performance may save you a lot of time. If the query is still giving you trouble after updating the statistics, then you can begin your troubleshooting.
*****
According to a SQL Server tuning expert at Microsoft (one of their best), it was suggested to me that if your SQL Server maintenance window allows for it, that you should update statistics for all tables and for all databases every night. This is because the auto-update option is less than perfect, and accurate and up-to-date statistics is so important to overall performance of SQL Server. Of course, if this is not possible because of time constraints, then you will have to perform this step less often, or even not at all, and rely on the auto-update feature. But if you do have this window, you should take advantage of it.
*****
If you upgrade to SQL Server 2000 from SQL Server 7.0, you should update all of the statistics in all your databases using either UPDATE STATISTICS or sp_updatestats. This is because statistics are not automatically updated during the conversion process.
If you don't manually update the statistics, then you may find that many of the queries are running slowly because of incorrect statistics, and incorrectly blame SQL Server 2000 as being slower than SQL Server 7.0, which is just not the case.
CREATE STATISTICS
Creates a histogram and associated density groups (collections) over the supplied column or set of columns.
Syntax
CREATE STATISTICS statistics_name
ON { table | view } ( column [ ,...n ] )
[ WITH
[ [ FULLSCAN
| SAMPLE number { PERCENT | ROWS } ] [ , ] ]
[ NORECOMPUTE ]
]
ON { table | view } ( column [ ,...n ] )
[ WITH
[ [ FULLSCAN
| SAMPLE number { PERCENT | ROWS } ] [ , ] ]
[ NORECOMPUTE ]
]
Arguments
statistics_name
Is the name of the statistics group to create. Statistics names must conform to the rules for identifiers.
table
Is the name of the table on which to create the named statistics. Table names must conform to the rules for identifiers. table is the table with which the column is associated. Specifying the table owner name is optional. Statistics can be created on tables in another database by specifying a qualified database name.
view
Is the name of the view on which to create the named statistics. A view must have a clustered index before statistics can be created on it. View names must conform to the rules for identifiers. view is the view with which the column is associated. Specifying the view owner name is optional. Statistics can be created on views in another database by specifying a qualified database name.
column
Is the column or set of columns on which to create statistics. Computed columns and columns of the ntext, text, or image data types cannot be specified as statistics columns.
n
Is a placeholder indicating that multiple columns can be specified.
FULLSCAN
Specifies that all rows in table should be read to gather the statistics. Specifying FULLSCAN provides the same behavior as SAMPLE 100 PERCENT. This option cannot be used with the SAMPLE option.
SAMPLE number { PERCENT | ROWS }
Specifies that a percentage, or a specified number of rows, of the data should be read using random sampling to gather the statistics. number can be only an integer: if PERCENT, number should be from 0 through 100; if ROWS, number can be from 0 to the n total rows.
This option cannot be used with the FULLSCAN option. If no SAMPLE or FULLSCAN option is given, an automatic sample is computed by Microsoft® SQL Server™.
NORECOMPUTE
Specifies that automatic recomputation of the statistics should be disabled. If this option is specified, SQL Server continues to use previously created (old) statistics even as the data changes. The statistics are not automatically updated and maintained by SQL Server, which may produce suboptimal plans.
Remarks
Only the table owner can create statistics on that table. The owner of a table can create a statistics group (collection) at any time, whether or not there is data in the table.
CREATE STATISTICS can be executed on views with clustered index, or indexed views. Statistics on indexed views are used by the optimizer only if the view is directly referenced in the query and the NOEXPAND hint is specified for the view. Otherwise, the statistics are derived from the underlying tables before the indexed view is substituted into the query plan. Such substitution is supported only on Microsoft SQL Server 2000 Enterprise and Developer Editions.
Permissions
CREATE STATISTICS permissions default to members of the sysadmin fixed server role and the db_ddladmin and db_owner fixed database roles and the table owner, and are not transferable.
A. Use CREATE STATISTICS with SAMPLE number PERCENT
This example creates the names statistics group (collection), which calculates random sampling statistics on five percent of the CompanyName and ContactName columns of the Customers table.
CREATE STATISTICS names
ON Customers (CompanyName, ContactName)
WITH SAMPLE 5 PERCENT
GO
B. Use CREATE STATISTICS with FULLSCAN and NORECOMPUTE
This example creates the names statistics group (collection), which calculates statistics for all rows in the CompanyName and ContactName columns of the Customers table and disables automatic recomputation of statistics.
CREATE STATISTICS names
ON Northwind..Customers (CompanyName, ContactName)
WITH FULLSCAN, NORECOMPUTE
GO
DROP STATISTICS
Drops statistics for multiple collections within the specified tables (in the current database).
Syntax
DROP STATISTICS table.statistics_name | view.statistics_name [ ,...n ]
Arguments
table | view
Is the name of the target table or indexed view for which statistics should be dropped. Table and view names must conform to the rules for identifiers. For more information, see Using Identifiers. Specifying the table or view owner name is optional.
statistics_name
Is the name of the statistics group to drop. Statistics names must conform to the rules for identifiers.
n
Is a placeholder indicating that more than one statistics_name group (collection) can be specified.
Remarks
Be careful when dropping statistics because dropping statistics may affect the plan chosen by the query optimizer.
Examples
DROP STATISTICS authors.anames, titles.tnames
SP for Create Statistics
sp_createstats
Syntax
sp_createstats [ [ @indexonly = ] 'indexonly' ]
[ , [ @fullscan = ] 'fullscan' ]
[ , [ @norecompute = ] 'norecompute' ]
[ , [ @fullscan = ] 'fullscan' ]
[ , [ @norecompute = ] 'norecompute' ]
Arguments
[@indexonly =] 'indexonly'
Specifies that only the columns participating in an index should be considered for statistics creation. indexonly is char(9), with a default of NO.
[@fullscan =] 'fullscan'
Specifies that the FULLSCAN option is used with the CREATE STATISTICS statement. If fullscan is omitted, Microsoft® SQL Server™ performs a default sample scan. fullscan is char(9), with a default of NO.
[@norecompute =] 'norecompute'
Specifies that automatic recomputation of statistics is disabled for the newly created statistics. norecompute is char(12) with a default of NO.
Return Code Values
0 (success) or 1 (failure)
Result Sets
None
Permissions
Permissions default to members of the sysadmin fixed server role, the db_owner fixed database role, and the owner of the objects.
Examples
This example creates statistics for all eligible columns for all user tables in the current database.
EXEC sp_createstats
This example creates statistics for only the columns participating in an index.
EXEC sp_createstats 'indexonly'
UPDATE STATISTICS
Updates information about the distribution of key values for one or more statistics groups (collections) in the specified table or indexed view.
Syntax
UPDATE STATISTICS table | view
[
index
| ( statistics_name [ ,...n ] ) ]
[ WITH
[
[ FULLSCAN ]
| SAMPLE number { PERCENT | ROWS } ]
| RESAMPLE
]
[ [ , ] [ ALL | COLUMNS | INDEX ]
[ [ , ] NORECOMPUTE ]
]
[
index
| ( statistics_name [ ,...n ] ) ]
[ WITH
[
[ FULLSCAN ]
| SAMPLE number { PERCENT | ROWS } ]
| RESAMPLE
]
[ [ , ] [ ALL | COLUMNS | INDEX ]
[ [ , ] NORECOMPUTE ]
]
Arguments
table | view
Is the name of the table or indexed view for which to update statistics. Table or view names must conform to the rules for identifiers. For more information, see Using Identifiers. Because index names are not unique within each database, table or view must be specified. Specifying the database, table, or view owner is optional. Indexed views are supported only on Microsoft® SQL Server™ 2000, Enterprise Edition.
index
Is the index for which statistics are being updated. Index names must conform to the rules for identifiers. If index is not specified, the distribution statistics for all indexes in the specified table or indexed view are updated. To see a list of index names and descriptions, execute sp_helpindex with the table or view name.
statistics_name
Is the name of the statistics group (collection) to update. Statistics names must conform to the rules for identifiers. For more information about creating statistics groups, see CREATE STATISTICS.
n
Is a placeholder indicating that multiple statistics_name groups can be specified.
FULLSCAN
Specifies that all rows in table or view should be read to gather the statistics. FULLSCAN provides the same behavior as SAMPLE 100 PERCENT. FULLSCAN cannot be used with the SAMPLE option.
SAMPLE number { PERCENT | ROWS }
Specifies the percentage of the table or indexed view, or the number of rows to sample when collecting statistics for larger tables or views. Only integers are allowed for number whether it is PERCENT or ROWS. To use the default sampling behavior for larger tables or views, use SAMPLE number with PERCENT or ROWS. Microsoft SQL Server ensures a minimum number of values are sampled to ensure useful statistics. If the PERCENT, ROWS, or number option results in too few rows being sampled, SQL Server automatically corrects the sampling based on the number of existing rows in the table or view.
Note The default behavior is to perform a sample scan on the target table or indexed view. SQL Server automatically computes the required sample size.
RESAMPLE
Specifies that statistics will be gathered using an inherited sampling ratio for all existing statistics including indexes. If the sampling ratio results in too few rows being sampled, SQL Server automatically corrects the sampling based on the number of existing rows in the table or view.
ALL | COLUMNS | INDEX
Specifies whether the UPDATE STATISTICS statement affects column statistics, index statistics, or all existing statistics. If no option is specified, the UPDATE STATISTICS statement affects all statistics. Only one type (ALL, COLUMNS, or INDEX) can be specified per UPDATE STATISTICS statement.
NORECOMPUTE
Specifies that statistics that become out of date are not automatically recomputed. Statistics become out of date depending on the number of INSERT, UPDATE, and DELETE operations performed on indexed columns. When specified, this option causes SQL Server to disable automatic statistics rebuilding. To restore automatic statistics recomputation, reissue UPDATE STATISTICS without the NORECOMPUTE option or execute sp_autostats.
Important Disabling automatic statistics recomputation can cause the SQL Server query optimizer to choose a less optimal strategy for queries that involve the specified table.
Remarks
SQL Server keeps statistics about the distribution of the key values in each index and uses these statistics to determine which index(es) to use in query processing. Users can create statistics on nonindexed columns by using the CREATE STATISTICS statement. Query optimization depends on the accuracy of the distribution steps:
- If there is significant change in the key values in the index, rerun UPDATE STATISTICS on that index.
- If a large amount of data in an indexed column has been added, changed, or removed (that is, if the distribution of key values has changed), or the table has been truncated using the TRUNCATE TABLE statement and then repopulated, use UPDATE STATISTICS.
To see when the statistics were last updated, use the STATS_DATE function.
Statistics can be created or updated on tables with computed columns only if the conditions are such that an index can be created on these columns. For more information about the requirements and restrictions on creating indexes on computed columns, see CREATE INDEX.
If you disable automatic statistics recomputation, you must manually update the statistical information.
Note The UPDATE STATISTICS statement reenables automatic statistical updating unless the NORECOMPUTE clause is specified.
Examples
A. Update all statistics for a single table
This example updates the distribution statistics for all indexes on the authors table.
UPDATE STATISTICS authors
B. Update only the statistics for a single index
This example updates only the distribution information for the au_id_ind index of the authors table.
UPDATE STATISTICS authors au_id_ind
C. Update statistics for specific statistics groups (collections) using 50 percent sampling
This example creates and then updates the statistics group for the au_lname and au_fname columns in the authors table.
CREATE STATISTICS anames
ON authors (au_lname, au_fname)
WITH SAMPLE 50 PERCENT
GO
-- Time passes. The UPDATE STATISTICS statement is then executed.
UPDATE STATISTICS authors(anames)
WITH SAMPLE 50 PERCENT
GO
D. Update statistics for a specific statistics groups (collections) using FULLSCAN and NORECOMPUTE
This example updates the anames statistics group (collection) in the authors table, forces a full scan of all rows in the authors table, and turns off automatic statistics updating for the statistics group (collection).
UPDATE STATISTICS authors(anames)
WITH FULLSCAN, NORECOMPUTE
GO
SQL Server 2000 and 2005 keep statistics on the tables and indexed views to enable it to optimize how the database engine runs the queries and improve performance. Theoretically these statistics are updated automatically. However, since we have found that manually updating the statistics can lead to dramatically better database performance, indicating that this isn’t always true.
To manually update the statistics for a single table the syntax is:
Update statistics '<table name>'
Example
UPDATE STATISTICS 'VDATA'
UPDATE STATISTICS 'VDATA'
To update the statistics for an entire database run the stored procedure sp_updatestats
Example:
useSPC
sp_updatestats
useSPC
sp_updatestats
The advantage of running the update statics query on individual tables is that you can only perform the operation on the relevant tables. If the Auto Update setting is not it should either be turned on or a stored procedure should setup to run once a week to update the statistics
To verify current status of the auto update statistics setting run the query:
sp_dboption '<database name>', 'UPDATE STATISTICS'
If the feature is not enabled you can enable it by running the query:
sp_dboption '<database name>', 'update statistics', 'on'
sp_dboption '<database name>', 'update statistics', 'on'