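To get an idea of the shape of the data, and in particular how spread out it is, the mean, median and mode are not enough: they only describe where the center of the data lies.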

Instead, we use measures of dispersion. The most common ones are:

- Range
- Standard Deviation
- Variance

## Range

The range is simply the difference between the largest and the smallest value. If we were measuring the heights of a sample of men, the data might be:

`6.5, 6.0, 5.2, 5.0, 5.5, 5.6, 6.2, 5.4`

In the sample above, the range is the maximum value of 6.5 minus the minimum value of 5.0, which gives 1.5.

To quickly get the min. and max. values from a dataset in Python:

```
import numpy as np
heights = [6.5, 6.0, 5.2, 5.0, 5.5, 5.6, 6.2, 5.4]
print(np.min(heights))  # smallest value in the sample
print(np.max(heights))  # largest value in the sample
```
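As a small sketch, the range itself can be computed from those two values. The variable names below are only illustrative; `np.ptp` ("peak to peak") is NumPy's one-call equivalent of max minus min.

```python
import numpy as np

heights = np.array([6.5, 6.0, 5.2, 5.0, 5.5, 5.6, 6.2, 5.4])

# Range computed directly as max minus min
data_range = np.max(heights) - np.min(heights)

# np.ptp returns the same quantity in a single call
data_range_ptp = np.ptp(heights)

print(data_range, data_range_ptp)
```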

## Standard Deviation

The standard deviation is the square root of the variance (see below). For a sample, that means the square root of the sample variance.

The sample standard deviation is calculated by taking the **square root** of the **sum** of (x – the **sample mean**)^2, divided by n (the **sample size**) – 1:

`s = sqrt( sum((x - sample_mean)^2) / (n - 1) )`
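To make the formula concrete, here is a minimal step-by-step sketch in plain Python, using the heights from the Range section. The variable names are illustrative, and the cross-check uses `statistics.stdev` from the standard library, which also divides by n - 1.

```python
import math
import statistics

heights = [6.5, 6.0, 5.2, 5.0, 5.5, 5.6, 6.2, 5.4]

n = len(heights)                # sample size
sample_mean = sum(heights) / n  # mean of the sample

# Sum of squared deviations from the sample mean
sum_of_squares = sum((x - sample_mean) ** 2 for x in heights)

# Divide by n - 1 (not n), then take the square root
sample_std = math.sqrt(sum_of_squares / (n - 1))

# Cross-check: the standard library's stdev is also the sample version
print(sample_std, statistics.stdev(heights))
```

The two printed values should agree up to floating-point rounding.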

In **Python**, using **NumPy**, we can calculate the **standard deviation** with `np.std`.

(**Note**: by default NumPy does not divide by n-1 but by N, so it returns the **population** standard deviation, not the sample one):

`np.std([6.5, 6.0, 5.2, 5.0, 5.5, 5.6, 6.2, 5.4]) # Population Std. Deviation`

To get the sample standard deviation (meaning divide by (n-1) instead of N), pass the optional `ddof=1` parameter in the call:

`np.std([6.5, 6.0, 5.2, 5.0, 5.5, 5.6, 6.2, 5.4], ddof=1) # Sample Std. Deviation`
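A short sketch comparing the two calls side by side may help. With the same data, the two results differ only by a factor of sqrt(n / (n - 1)), so the sample value is always slightly larger. The variable names are illustrative.

```python
import numpy as np

heights = np.array([6.5, 6.0, 5.2, 5.0, 5.5, 5.6, 6.2, 5.4])
n = heights.size

population_std = np.std(heights)      # default ddof=0: divides by N
sample_std = np.std(heights, ddof=1)  # ddof=1: divides by n - 1

print(population_std, sample_std)

# The sample value equals the population value scaled by sqrt(n / (n - 1))
print(np.isclose(sample_std, population_std * np.sqrt(n / (n - 1))))
```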

**Note**: If you use R instead of Python, the built-in `sd` function divides by n-1 (sample), NOT by N (population). This can be checked as follows:

```
# Adapted from:
# https://stats.stackexchange.com/questions/25956/what-formula-is-used-for-standard-deviation-in-r
x <- c(6.5, 6.0, 5.2, 5.0, 5.5, 5.6, 6.2, 5.4)
n <- length(x)

# sd in R (divides by n - 1)
sd1 <- sd(x)

# self-written sample sd
sd2 <- sqrt(sum((x - mean(x))^2) / (n - 1))

# comparison: both values should match
c(sd1, sd2)
```

## Variance

To calculate the variance we first need the mean, because it is used in the calculation. Variance is the average of the squared distances from the mean: subtract the mean from each point, square the result, and average those squares.

Var(X) = E[(X - *μ*)^2]

Remember that *μ* is the population mean.

Using Python to calculate the variance can be done with the NumPy library like so:

```
import numpy as np
np.var([6.5, 6.0, 5.2, 5.0, 5.5, 5.6, 6.2, 5.4]) # Population Variance
np.var([6.5, 6.0, 5.2, 5.0, 5.5, 5.6, 6.2, 5.4], ddof=1) # Sample Variance
```
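As a quick sanity check on how variance and standard deviation relate, the sketch below confirms that the square root of each variance matches the corresponding standard deviation. The variable names are illustrative.

```python
import numpy as np

heights = np.array([6.5, 6.0, 5.2, 5.0, 5.5, 5.6, 6.2, 5.4])

population_var = np.var(heights)      # divides by N
sample_var = np.var(heights, ddof=1)  # divides by n - 1

# The standard deviation is the square root of the matching variance
print(np.isclose(np.sqrt(population_var), np.std(heights)))
print(np.isclose(np.sqrt(sample_var), np.std(heights, ddof=1)))
```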

## Sample vs Population Variance

Sample variance is the sum of (x – sample mean)^2, divided by n (sample size) – 1.

Population variance is the sum of (x – population mean)^2, divided by N (population size).
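The two definitions can be sketched by hand as follows. For illustration only, the same eight heights are treated first as a sample and then as if they were the whole population (an assumption made just to reuse the data), so the mean is the same in both cases and only the divisor changes.

```python
import numpy as np

heights = np.array([6.5, 6.0, 5.2, 5.0, 5.5, 5.6, 6.2, 5.4])
n = heights.size

# Squared distance of every point from the mean
squared_deviations = (heights - heights.mean()) ** 2

sample_variance = squared_deviations.sum() / (n - 1)  # divide by n - 1
population_variance = squared_deviations.sum() / n    # divide by N

print(sample_variance, population_variance)

# Both should agree with NumPy's built-in results
print(np.isclose(sample_variance, np.var(heights, ddof=1)))
print(np.isclose(population_variance, np.var(heights)))
```

Dividing by n - 1 always gives the larger of the two values; the gap shrinks as the number of data points grows.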
