Squaring makes each term positive so that values above the mean do not cancel values below the mean. Squaring adds more weighting to the larger differences, and in many cases this extra weighting is appropriate since points further from the mean may be more significant.
The mathematics are relatively manageable when using this measure in subsequent statistical calculations. Because the differences are squared, the units of variance are not the same as the units of the data. Therefore, the standard deviation is reported as the square root of the variance, and its units then correspond to those of the data set. The calculation and notation of the variance and standard deviation depend on whether we are considering the entire population or a sample set.
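As a rough illustration of the population versus sample distinction, here is a minimal Python sketch. The function names, the `sample` flag, and the data values are made up for this example; dividing by n for a population and by n - 1 for a sample is the usual convention.

```python
import math

def variance(xs, sample=False):
    """Mean squared deviation from the mean.

    sample=False divides by n (whole population);
    sample=True divides by n - 1 (sample estimate).
    """
    n = len(xs)
    mean = sum(xs) / n
    sum_sq = sum((x - mean) ** 2 for x in xs)
    return sum_sq / (n - 1 if sample else n)

def std_dev(xs, sample=False):
    """Square root of the variance, so it is in the units of the data."""
    return math.sqrt(variance(xs, sample))

# Hypothetical measurements in cm.
data_cm = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

pop_var = variance(data_cm)                 # units: cm^2
pop_sd = std_dev(data_cm)                   # units: cm
samp_var = variance(data_cm, sample=True)   # slightly larger than pop_var

print(pop_var, pop_sd, samp_var)
```

The variance comes out in squared units (cm² here), while the standard deviation is back in cm, which is why the latter is the one usually reported.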
They focus on ease of mathematical calculations (which is nice but by no means fundamental) or on properties of the Gaussian (Normal) distribution and OLS. Gauss started with least squares and variance and from those derived the Normal distribution: there's the circularity. A truly fundamental reason that has not been invoked in any answer yet is the unique role played by the variance in the Central Limit Theorem. Another is the importance in decision theory of minimizing quadratic loss.
The benefits of squaring include: Squaring always gives a non-negative value, so positive and negative deviations do not cancel in the sum. Squaring emphasizes larger differences, a feature that turns out to be both good and bad (think of the effect outliers have).
I wasn't implying anything about absolute values in that statement. The size in each dimension is the difference from the mean for that sample.
This makes analytical optimization more difficult. Consider the one-dimensional case: you can express the minimizer of the squared error by the mean, which takes O(n) operations and has a closed form. You can express the value of the absolute error minimizer by the median, but there's no closed-form solution that tells you what the median value is; it requires a sort to find, which is something like O(n log n).
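The one-dimensional claim can be checked numerically. The sketch below is only an illustration: the data set and the brute-force grid of candidate centres are assumptions chosen for the example, not an efficient way to find either minimizer.

```python
import statistics

def sum_squared_error(xs, c):
    """Total squared distance of the data from a candidate centre c."""
    return sum((x - c) ** 2 for x in xs)

def sum_absolute_error(xs, c):
    """Total absolute distance of the data from a candidate centre c."""
    return sum(abs(x - c) for x in xs)

# Hypothetical data with one large outlier.
data = [1, 2, 3, 4, 100]

mean = statistics.mean(data)
median = statistics.median(data)

# Brute-force search over candidate centres 0.0, 0.1, ..., 100.0.
candidates = [c / 10 for c in range(0, 1001)]
best_sq = min(candidates, key=lambda c: sum_squared_error(data, c))
best_abs = min(candidates, key=lambda c: sum_absolute_error(data, c))

print("mean:", mean, "squared-error minimizer:", best_sq)
print("median:", median, "absolute-error minimizer:", best_abs)
```

On this data the squared-error minimizer lands on the mean and the absolute-error minimizer lands on the median, which also shows how far the single outlier pulls the mean.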
Least squares solutions tend to be a simple plug-and-chug type operation; absolute value solutions usually require more work to find. Median does not require sorting. It's essentially a Pythagorean equation. The sd is not always the best statistic. No, it is already always positive. This is what I mean. After all you want the best model.
However, in the end it appears only to rephrase the question without actually answering it: namely, why should we use the Euclidean (L2) distance? However, what pushed them over the top, I believe, was Galton's regression theory (at which you hint) and the ability of ANOVA to decompose sums of squares, which amounts to a restatement of the Pythagorean Theorem, a relationship enjoyed only by the L2 norm. Thus the SD became a natural omnibus measure of spread advocated in Fisher's "Statistical Methods for Research Workers" and here we are, 85 years later.
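The sum-of-squares decomposition mentioned above can be seen on a toy example. This is a sketch under assumptions: the three groups are invented data, and the "absolute" analogue is included only to show that the same split fails for this particular data when squares are replaced by absolute values.

```python
import math

def sum_sq(values, center):
    """Sum of squared deviations of values from center."""
    return sum((v - center) ** 2 for v in values)

# Hypothetical grouped data (three groups of unequal size).
groups = [[1.0, 2.0, 3.0], [5.0, 7.0], [8.0, 9.0, 10.0, 13.0]]
all_values = [v for g in groups for v in g]
grand_mean = sum(all_values) / len(all_values)
group_means = [sum(g) / len(g) for g in groups]

# Squared deviations split exactly into within-group and between-group parts.
ss_total = sum_sq(all_values, grand_mean)
ss_within = sum(sum_sq(g, m) for g, m in zip(groups, group_means))
ss_between = sum(len(g) * (m - grand_mean) ** 2
                 for g, m in zip(groups, group_means))

# The same bookkeeping with absolute deviations does not add up here.
abs_total = sum(abs(v - grand_mean) for v in all_values)
abs_within = sum(abs(v - m) for g, m in zip(groups, group_means) for v in g)
abs_between = sum(len(g) * abs(m - grand_mean)
                  for g, m in zip(groups, group_means))

print(ss_total, ss_within + ss_between)
print(abs_total, abs_within + abs_between)
```

The first pair of numbers agrees up to floating-point rounding; the second pair does not.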
One of them (related to Student) is its independence of the mean in the normal case, which of course is a restatement of orthogonality, which gets us right back to L2 and the inner product. In 1-D it's hard to understand why squaring the difference is seen as better. But in multiple dimensions (or even just 2) one can easily see that Euclidean distance (squaring) is preferable to Manhattan distance (sum of absolute values of differences). Also, where can I read more about this?
The squared formulation also naturally falls out of parameters of the Normal Distribution.
Why-is-it-so-cool-to-square-numbers-in-terms-of-finding-the-standard-deviation The takeaway message is that using the square root of the variance leads to easier maths.
Computers do all the hard work anyway.
Can you please point to a reference? But I'll see if I can find something on that.
Additionally, penalisation of the coefficients, such as L2, will resolve the uniqueness problem, and the stability problem to a degree as well. The variance is half the mean square over all the pairwise differences between values, just as the Gini mean difference is based on the absolute values of all the pairwise differences.
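The pairwise-difference identity for the variance is easy to verify numerically. In this sketch the data are made up, and the choice of which pairs to average over is spelled out as an assumption: averaging over all ordered pairs (including each value paired with itself) gives the population variance, while averaging over pairs of distinct positions gives the sample variance.

```python
import math
import statistics
from itertools import permutations, product

# Hypothetical data.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# All n * n ordered pairs, including each value paired with itself.
all_pairs = list(product(data, repeat=2))
half_msd_all = sum((a - b) ** 2 for a, b in all_pairs) / (2 * len(all_pairs))

# Only the n * (n - 1) ordered pairs of distinct positions.
distinct_pairs = list(permutations(data, 2))
half_msd_distinct = (sum((a - b) ** 2 for a, b in distinct_pairs)
                     / (2 * len(distinct_pairs)))

# Mean absolute pairwise difference over distinct positions
# (a common definition of the Gini mean difference).
gini_mean_diff = sum(abs(a - b) for a, b in distinct_pairs) / len(distinct_pairs)

print(half_msd_all, statistics.pvariance(data))
print(half_msd_distinct, statistics.variance(data))
print(gini_mean_diff)
```

Note that no mean is computed anywhere in the pairwise versions, yet they reproduce the variances from the standard library.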
Standard deviation is the right way to measure dispersion if you assume a normal distribution. And a lot of distributions and real data are approximately normal. I'll think about some better word. References: Gorard, S. Revisiting a year-old debate: the advantages of the mean deviation, British Journal of Educational Studies, 53, 4, pp. Gorard, S. To me this could mean two things: (1) the width of a sampling distribution, or (2) the accuracy of a given estimate. For point 1 there is no particular reason to use the standard deviation as a measure of spread, except for when you have a normal sampling distribution.
Calculating distance: What's the distance from point 0 to point 5? How about the distance from point (0, 0) to point (3, 4)? Regardless of the distribution, the mean absolute deviation is less than or equal to the standard deviation. MAD understates the dispersion of a data set with extreme values, relative to standard deviation.
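Both points can be tried out in a few lines of Python. This is a sketch with invented data sets; `mean_abs_deviation` is a helper written for this example, and the standard deviation used is the population one (dividing by n), for which the inequality holds.

```python
import math
import statistics

def mean_abs_deviation(xs):
    """Average absolute distance of the values from their mean."""
    m = statistics.fmean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

# Distances from the questions above.
d_1d = abs(5 - 0)                          # along a line
d_2d = math.dist((0, 0), (3, 4))           # Euclidean: sqrt(3^2 + 4^2)
d_manhattan = abs(3 - 0) + abs(4 - 0)      # sum of absolute differences

# Hypothetical data sets: two-point, evenly spread, and one with an outlier.
datasets = {
    "two_point": [0.0, 2.0],
    "even": [1.0, 2.0, 3.0, 4.0, 5.0],
    "outlier": [1.0, 2.0, 3.0, 4.0, 100.0],
}

# Ratio MAD / SD for each data set; it should never exceed 1.
ratios = {
    name: mean_abs_deviation(xs) / statistics.pstdev(xs)
    for name, xs in datasets.items()
}

print(d_1d, d_2d, d_manhattan)
print(ratios)
```

The ratio reaches 1 only for the symmetric two-point data set, where every value is equally far from the mean.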
Mean Absolute Deviation is more robust to outliers. Geometrically speaking, if the measurements are not orthogonal to each other i. The statistical mean minimizes the variance (MSE). Variances are additive for independent variables.
The same does not hold for the mean absolute deviation (this was already mentioned by eric-l-michelsen). For mean absolute deviations, there is no approximation rule for error propagation (propagation of uncertainty). Central moments are shape descriptors, and with an increasing number of moments you can describe a distribution with increasing accuracy.
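The additivity of variances for independent variables, and its failure for the mean absolute deviation, can be checked exactly on a small discrete example. This is a sketch under assumptions: the two value lists are made up, each value is taken as equally likely, and independence is modelled by treating every (x, y) combination as equally likely.

```python
import math
import statistics
from itertools import product

def mean_abs_deviation(xs):
    """Average absolute distance of the values from their mean."""
    m = statistics.fmean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

# Hypothetical independent variables X and Y, uniform over these values.
xs = [0.0, 2.0]
ys = [1.0, 2.0, 6.0]

# Under independence every (x, y) pair is equally likely,
# so the distribution of X + Y is uniform over all pairwise sums.
sums = [x + y for x, y in product(xs, ys)]

var_x = statistics.pvariance(xs)
var_y = statistics.pvariance(ys)
var_sum = statistics.pvariance(sums)

mad_x = mean_abs_deviation(xs)
mad_y = mean_abs_deviation(ys)
mad_sum = mean_abs_deviation(sums)

print(var_sum, var_x + var_y)   # equal up to rounding
print(mad_sum, mad_x + mad_y)   # not equal for this example
```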
Securities that are close to their means are seen as less risky, as they are more likely to continue behaving as such. Securities with large trading ranges that tend to spike or change direction are riskier.
In investing, risk in itself is not a bad thing, as the riskier the security, the greater the potential for a payout. The standard deviation and variance are two different mathematical concepts that are closely related. The variance is needed to calculate the standard deviation. These numbers help traders and investors determine the volatility of an investment and therefore allow them to make educated trading decisions.
Key Takeaways: Standard deviation looks at how spread out a group of numbers is from the mean, by looking at the square root of the variance. The variance measures the average degree to which each point differs from the mean (the average of all data points).