How would you explain the difference between correlation and covariance? - Cross Validated
An explanation of Variance, Covariance and Correlation in rigorous yet clear terms. Variance, covariance, and correlation are all used in statistics to measure and communicate how variables behave: variance describes the spread of a single variable, while covariance and correlation describe the relationship between two variables. This relationship is very important in both probability and statistics. As these terms suggest, covariance and correlation measure a certain kind of dependence, namely linear dependence.
Squaring before calculating Expectation and after calculating Expectation yield very different results! The difference between these results is the Variance.
What is really interesting is that the only time these answers are the same is when the Sampler outputs the same value every time, which of course intuitively corresponds to the idea of there being no Variance.
The greater the actual variation in the values coming from the Random Variable, the greater the difference between the two values used to calculate Variance will be. At this point we have a very strong, and very general, sense of how we can measure Variance that doesn't rely on any assumptions our intuition may have about the behavior of the Random Variable.
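A minimal sketch of this idea in Python, assuming a Random Variable with a finite set of equally likely outcomes. The helper names `expectation` and `variance` and the example samplers are illustrative, not taken from the original post. Variance is computed as the difference between "square, then take the Expectation" and "take the Expectation, then square".

```python
from fractions import Fraction

def expectation(values):
    """Mean of equally likely outcomes, kept exact with Fraction."""
    values = list(values)
    return sum(Fraction(v) for v in values) / len(values)

def variance(values):
    """E[X^2] - E[X]^2 for equally likely outcomes."""
    values = list(values)
    mean_of_squares = expectation(v * v for v in values)  # square first, then average
    square_of_mean = expectation(values) ** 2             # average first, then square
    return mean_of_squares - square_of_mean

die = [1, 2, 3, 4, 5, 6]       # a fair six-sided die
constant = [4, 4, 4, 4, 4, 4]  # a sampler that always outputs 4

print(variance(die))       # 35/12
print(variance(constant))  # 0
```

The constant sampler is the only kind of input for which the two quantities agree, so its Variance is exactly zero.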
Covariance and correlation
Covariance - measuring the Variance between two variables
Mathematically, squaring something and multiplying something by itself are the same. Because of this we can rewrite our Variance equation as:
Var(X) = E[XX] - E[X]E[X]
But now we can ask the question "What if one of the Xs were another Random Variable?" Replacing one X with Y gives Cov(X, Y) = E[XY] - E[X]E[Y]. If Variance is a measure of how a Random Variable varies with itself, then Covariance is the measure of how one variable varies with another.
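As a hedged illustration, replacing one X with a second variable in E[XX] - E[X]E[X] can be sketched as follows, again assuming paired, equally likely outcomes. The function names and the example data are assumptions made for this sketch.

```python
from fractions import Fraction

def expectation(values):
    """Mean of equally likely outcomes, kept exact with Fraction."""
    values = list(values)
    return sum(Fraction(v) for v in values) / len(values)

def covariance(xs, ys):
    """E[XY] - E[X]E[Y] over paired, equally likely outcomes."""
    mean_of_products = expectation(x * y for x, y in zip(xs, ys))
    return mean_of_products - expectation(xs) * expectation(ys)

x = [1, 2, 3, 4, 5, 6]
y_up = [2, 4, 6, 8, 10, 12]  # rises when x rises
y_down = [6, 5, 4, 3, 2, 1]  # falls when x rises

print(covariance(x, x))       # 35/12, the Variance of x
print(covariance(x, y_up))    # 35/6
print(covariance(x, y_down))  # -35/12
```

Passing the same variable twice recovers the Variance, and the sign of the result shows whether the two variables move together or in opposite directions.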
Correlation - normalizing the Covariance
Covariance is a great tool for describing the variance between two Random Variables. But this new measure we have come up with is only really useful when talking about these variables in isolation, because its size depends on the scale of each variable.
Correlation between different Random Variables produced by the same event sequence
The only real difference between the 3 Random Variables is a constant multiplied against their output, but we get very different Covariance for each pair. The problem is that we are no longer accounting for the Variance of each individual Random Variable.
The way we can solve this is to add a normalizing term that takes this into account. Putting everything we've found together we arrive at the definition of Correlation:
Corr(X, Y) = Cov(X, Y) / sqrt(Var(X) Var(Y))
Why does this normalizing term keep the result bounded? The short answer is the Cauchy-Schwarz inequality. Exploring the relationship between Correlation and the Cauchy-Schwarz inequality deserves its own post to really develop the intuition.
For now it is only important to realize that dividing Covariance by the square root of the product of the Variances of both Random Variables will always leave us with values ranging from -1 to 1.
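The effect of the normalizing term can be sketched in Python. The three scaled variables below (`a`, `b`, `c`) and the `noisy` series are made-up data chosen for illustration, and the outcomes are assumed to be equally likely. Rescaling changes the Covariance by orders of magnitude, while the Correlation stays within -1 to 1.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """E[XY] - E[X]E[Y] over paired, equally likely outcomes."""
    return mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)

def correlation(xs, ys):
    """Covariance divided by the square root of the product of the variances."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

base = [1, 2, 3, 4, 5, 6]
a = [1 * v for v in base]
b = [10 * v for v in base]    # same events, output multiplied by 10
c = [-100 * v for v in base]  # same events, output multiplied by -100
noisy = [2, 1, 4, 3, 6, 5]    # related to base, but not a pure rescaling

for name, (xs, ys) in {"a,b": (a, b), "a,c": (a, c), "b,c": (b, c)}.items():
    print(name, round(covariance(xs, ys), 2), round(correlation(xs, ys), 6))

print(round(correlation(base, noisy), 4))
```

The covariances differ by factors of 10 and 100, yet every pair of rescaled variables has a correlation of magnitude 1 (up to floating-point rounding), and the noisy pair lands strictly between -1 and 1.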
Hence when an observation is further from the mean, this operation will give it a larger value in absolute terms. As gung points out in the comments, this is frequently called the cross product (perhaps a useful example to bring back up if one were introducing basic matrix algebra for statistics). Take note of what happens when multiplying: if two observations are both a large distance above the mean, the resulting product will have an even larger positive value (the same is true if both observations are a large distance below the mean, as multiplying two negatives equals a positive).
Also note that if one observation is high above the mean and the other is well below the mean, the resulting value will be large in absolute terms and negative (as a positive times a negative equals a negative number).
Finally note that when a value is very near the mean for either observation, multiplying the two values will result in a small number.
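The sign and size behaviour described above can be checked with a small table of cross products. This is a sketch on made-up series `x` and `y`, not the data from the original answer. The final average divides by n, which is an assumption here; a sample covariance would divide by n - 1 instead.

```python
from fractions import Fraction

x = [1, 3, 5, 7, 9]
y = [2, 9, 6, 5, 13]

mean_x = Fraction(sum(x), len(x))  # 5
mean_y = Fraction(sum(y), len(y))  # 7

rows = []
for xi, yi in zip(x, y):
    dx = xi - mean_x  # deviation of x from its mean
    dy = yi - mean_y  # deviation of y from its mean
    rows.append((xi, yi, dx, dy, dx * dy))

print(" x   y   dx   dy  dx*dy")
for xi, yi, dx, dy, cross in rows:
    print(f"{xi:2d} {yi:3d} {str(dx):>4} {str(dy):>4} {str(cross):>6}")

cross_products = [row[4] for row in rows]
total = sum(cross_products)
print(total)           # 36
print(total / len(x))  # 36/5
```

The first and last rows (both series far from their means on the same side) contribute 20 and 24, the rows with opposite signs contribute -4 each, and the row where x sits exactly on its mean contributes 0.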
Again we can just present this operation in a table. We can see all the separate elements of what a covariance is, and how it is calculated, come into play. Now, the covariance in and of itself does not tell us much (it can, but it is needless at this point to go into any interesting examples without resorting to magical, undefined references for the audience). In a good case scenario, you won't really need to sell why we should care what the covariance is; in other circumstances, you may just have to hope your audience is captive and will take your word for it.
But, continuing on to develop the difference between what the covariance is and what the correlation is, we can just refer back to the formula for correlation. If you need to define the variance itself, you could just say that the variance is the same thing as the covariance of a series with itself (that is, Cov(x, x) = Var(x)).
And all the same concepts that you introduced with the covariance apply. Maybe note here as well that a series cannot have a negative variance, which should logically follow from the math previously presented: every term is a deviation multiplied by itself, so no term can be negative.
So we are dividing the covariance we just calculated by the square root of the product of the variances of each series, which is the same as the product of their standard deviations.
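A hedged sketch of that division for two short series follows. The data are made up, and the n - 1 divisor (the usual sample convention) is an assumption, since the text does not say which divisor it uses. The divisor cancels in the final ratio either way.

```python
import math

x = [1, 3, 5, 7, 9]
y = [2, 9, 6, 5, 13]

def deviations(series):
    """Each value minus the mean of its series."""
    m = sum(series) / len(series)
    return [v - m for v in series]

def sum_of_products(ds, es):
    """Sum of element-wise products of two lists of deviations."""
    return sum(d * e for d, e in zip(ds, es))

dx, dy = deviations(x), deviations(y)
n = len(x)

cov_xy = sum_of_products(dx, dy) / (n - 1)  # 36 / 4 = 9.0
var_x = sum_of_products(dx, dx) / (n - 1)   # 40 / 4 = 10.0, a sum of squares, so never negative
var_y = sum_of_products(dy, dy) / (n - 1)   # 70 / 4 = 17.5

# Correlation: covariance over the square root of the product of the variances.
r = cov_xy / math.sqrt(var_x * var_y)
print(round(r, 4))  # 0.6803
```

Here the variances are just the covariance calculation applied to a series paired with itself, and the resulting coefficient no longer depends on the units of either series.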
So again, I'm a hypocrite and resort to some "take my word for it", but at this point we can introduce all the reasons why we use the correlation coefficient. One can then relate these math lessons back to the heuristics that have been given in the other answers, such as Peter Flom's response to one of the other questions.