Statistics can be generated with the ANALYZE statement or with the package DBMS_STATS.
The statistics generated include the following:
Table statistics
Number of rows
Number of blocks
Number of empty blocks
Average row length
Column statistics
Number of distinct values (NDV) in column
Number of nulls in column
Data distribution (histogram)
Index statistics
Number of leaf blocks
Levels
Clustering factor
Generating Statistics
Because the cost-based approach relies on statistics, you should generate statistics for all tables and clusters and all types of indexes accessed by your SQL statements before using the cost-based approach. If the size and data distribution of your tables change frequently, then you should generate these statistics regularly to ensure the statistics accurately represent the data in the tables.
Oracle generates statistics using the following techniques:
Estimation based on random data sampling
Exact computation
User-defined statistics collection methods
To perform an exact computation, Oracle requires enough space to perform a scan and sort of the table. If there is not enough space in memory, then temporary space may be required. For estimations, Oracle requires enough space to perform a scan and sort of only the rows in the requested sample of the table. For indexes, computation does not take up as much time or space, so it is best to perform a full computation.
Some statistics are always computed exactly, such as the number of data blocks currently containing data in a table or the depth of an index from its root block to its leaf blocks.
Use estimation for tables and clusters rather than computation, unless you need exact values. Because estimation rarely sorts, it is often much faster than computation, especially for large tables.
To estimate statistics, Oracle selects a random sample of data. You can specify the sampling percentage and whether sampling should be based on rows or blocks.
Row sampling reads rows without regard to their physical placement on disk. This provides the most random data for estimates, but it can result in reading more data than necessary. For example, in the worst case a row sample might select one row from each block, requiring a full scan of the table or index.
Block sampling reads a random sample of blocks and uses all of the rows in those blocks for estimates. This reduces the amount of I/O activity for a given sample size, but it can reduce the randomness of the sample if rows are not randomly distributed on disk. Block sampling is not available for index statistics.
When you generate statistics for a table, column, or index, if the data dictionary already contains statistics for the object, then Oracle updates the existing statistics. Oracle also invalidates any currently parsed SQL statements that access the object.
The next time such a statement executes, the optimizer automatically chooses a new execution plan based on the new statistics. Distributed statements issued on remote databases that access the analyzed objects use the new statistics the next time Oracle parses them.
When you associate a statistics type with a column or domain index, Oracle calls the statistics collection method in the statistics type if you analyze the column or domain index.