Oversampling creates a data set that contains a specific ratio of a selected data item. For example, it can be used to guarantee that you have an equal number of males and females in your data, even if the two are far from evenly represented in the source data. For this method, you specify the data item you want to balance, the ratio that item should have in the resulting data set, and the maximum number of rows the resulting set can contain. Rows that do not contain the specified data item are randomly selected to fill the data set to the size you specify, if there are enough rows to do so. The result set is placed in a new worksheet.

This method is typically used when the data item of interest occurs very rarely in the source data. Increasing the proportion of such a state can often improve mining results. Because the balanced set no longer reflects the original distribution, testing should be performed on a data set that has not been balanced using this method.
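The selection described above can be sketched in a few lines of Python. This is an illustration only, not the tool's actual implementation: the function name `oversample`, its parameters, and the choice to sample without replacement are all assumptions made for the example. Rows are represented as plain dictionaries rather than worksheet rows.

```python
import random


def oversample(rows, column, target_value, target_ratio, max_rows, seed=None):
    """Return a random sample in which `target_value` makes up roughly
    `target_ratio` of the rows, using at most `max_rows` rows.

    Hypothetical helper: samples without replacement, so the result may be
    smaller than `max_rows` when the source has too few matching rows.
    """
    if not 0 < target_ratio < 1:
        raise ValueError("target_ratio must be between 0 and 1 (exclusive)")

    rng = random.Random(seed)

    # Split the source into rows with the item of interest and all others.
    target = [r for r in rows if r[column] == target_value]
    other = [r for r in rows if r[column] != target_value]

    # Take as many target rows as the ratio allows, limited by availability.
    n_target = min(len(target), round(max_rows * target_ratio))

    # Fill with other rows up to the count that preserves the ratio,
    # without exceeding the row limit or the rows actually available.
    wanted_other = round(n_target * (1 - target_ratio) / target_ratio)
    n_other = min(len(other), max_rows - n_target, wanted_other)

    # If the other rows ran short, shrink the target count to keep the ratio.
    n_target = min(n_target, round(n_other * target_ratio / (1 - target_ratio)))

    sample = rng.sample(target, n_target) + rng.sample(other, n_other)
    rng.shuffle(sample)
    return sample


# Example source data: the item of interest ("F") is only 10% of the rows.
source = [{"id": i, "gender": "F" if i < 10 else "M"} for i in range(100)]
balanced = oversample(source, "gender", "F", target_ratio=0.5, max_rows=40, seed=1)
print(len(balanced), sum(r["gender"] == "F" for r in balanced))
```

In this sketch the rare item limits the result: only 10 matching rows exist, so a 50/50 split cannot reach the 40-row maximum. A model trained on `balanced` would still be tested against rows drawn from the unbalanced source.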