Outlier analysis is a critical element of statistical data processing. Outliers are observations that deviate markedly from the general trend of the data, which makes them especially relevant in regression studies. Identifying them helps to assess whether the data as a whole are accurate, and it can also reveal particular cases that do not distort the overall picture but complement it. In other words, studying outliers has practical value for a study.
There are several approaches to detecting outliers. Statistical methods find them either with the quartile-based rule, which flags values lying more than one and a half times the interquartile range below the first quartile or above the third quartile, or by assuming that the data follow a normal distribution and flagging values that are improbable under it. Distance-based methods examine how close a point is to the other points: if the number of neighboring points within a given radius of a point is less than a threshold value, the point is treated as an anomaly.
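The quartile rule and the radius-based neighbor count can be illustrated with a minimal sketch. It is only one possible reading of these two methods, not a reference implementation. The function names, the sample data, and the parameters `k`, `radius`, and `min_neighbors` are assumptions chosen for illustration, and the data are one-dimensional so that distance is simply an absolute difference.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def distance_outliers(values, radius, min_neighbors):
    """Return values with fewer than min_neighbors other points within radius."""
    flagged = []
    for i, v in enumerate(values):
        # Count the other points that fall inside the radius around v.
        neighbors = sum(
            1 for j, w in enumerate(values) if j != i and abs(v - w) <= radius
        )
        if neighbors < min_neighbors:
            flagged.append(v)
    return flagged

# Hypothetical sample: nine values close together and one far away.
data = [10, 12, 11, 13, 12, 11, 10, 14, 13, 45]

print(iqr_outliers(data))                                   # → [45]
print(distance_outliers(data, radius=3, min_neighbors=2))   # → [45]
```

Both rules agree on this sample, but they need not in general: the quartile rule depends only on the spread of the middle half of the data, whereas the distance rule depends on the chosen radius and threshold.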
Outliers in low-density regions are identified with density-based methods: if the density around a point is substantially lower than the density around its neighbors, the point is considered an outlier. Finally, deviation-based methods treat outliers as objects that deviate from the characteristic description of the general trend. In other words, when such an object is removed, the variance of the whole sample tends to decrease.
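The density-based and deviation-based ideas can be sketched in the same spirit. This is a deliberately simplified illustration. The density here is taken as the inverse of the mean distance to the `k` nearest neighbors, which is an assumption and not the definition used by any particular published algorithm. The function names, the sample data, and the parameters `k` and `ratio` are hypothetical.

```python
import statistics

def nearest_neighbors(values, i, k):
    """Indices of the k points closest to values[i], excluding i itself."""
    others = [j for j in range(len(values)) if j != i]
    return sorted(others, key=lambda j: abs(values[i] - values[j]))[:k]

def local_density(values, i, k):
    """Inverse of the mean distance from values[i] to its k nearest neighbors."""
    dists = [abs(values[i] - values[j]) for j in nearest_neighbors(values, i, k)]
    return 1.0 / (statistics.fmean(dists) + 1e-9)  # epsilon avoids division by zero

def density_outliers(values, k=3, ratio=0.2):
    """Flag points whose density is below ratio times their neighbors' mean density."""
    densities = [local_density(values, i, k) for i in range(len(values))]
    flagged = []
    for i, v in enumerate(values):
        neighbor_mean = statistics.fmean(
            [densities[j] for j in nearest_neighbors(values, i, k)]
        )
        if densities[i] < ratio * neighbor_mean:
            flagged.append(v)
    return flagged

def variance_reductions(values):
    """For each point, the drop in sample variance when that point is removed."""
    base = statistics.variance(values)
    return [
        base - statistics.variance(values[:i] + values[i + 1:])
        for i in range(len(values))
    ]

# Hypothetical sample: nine values close together and one far away.
data = [10, 12, 11, 13, 12, 11, 10, 14, 13, 45]

print(density_outliers(data))  # → [45]

# Deviation-based view: the point whose removal reduces the variance the most.
reductions = variance_reductions(data)
strongest = max(range(len(data)), key=lambda i: reductions[i])
print(data[strongest])  # → 45
```

In this sample, removing the value 45 is the only removal that lowers the variance; removing any of the other points raises it slightly, which matches the deviation-based description of an outlier.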