Kurtosis using R

Dhruv Saksena
Sep 1, 2023

To compute descriptive statistics for a dataset, we will use the psych package in R.

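# pacman::p_load() installs any missing packages and then loads them
# (none of these are needed for describe(); psych is loaded separately below)
pacman::p_load(pacman, dplyr, GGally, ggplot2, ggthemes, ggvis, httr,
  lubridate, plotly, rio, rmarkdown, shiny, stringr, tidyr)

# iris ships with base R in the datasets package
library(datasets)

# Preview the first six rows of iris
head(iris)

# Load psych, which provides describe(); p_load() works unqualified because pacman is now attached
p_load(psych)

# Summary statistics for a single column
describe(iris$Sepal.Length)

# Summary statistics for every column of the data frame
describe(iris)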

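When we run the script above, describe() prints a table to the console with one row per variable.

This gives more detailed statistics than a plain mean or median: for each variable it reports vars (column index), n, mean, sd, median, trimmed (trimmed mean), mad (median absolute deviation), min, max, range, skew, kurtosis and se (standard error). The skew and kurtosis columns are explained below.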

Skewness describes the shape of a dataset's distribution. Specifically, it quantifies the degree of asymmetry of the data's probability distribution around its mean.

Skewness can be defined in several ways. One simple version, Pearson's second (median) skewness coefficient, uses only the mean, median and standard deviation of the dataset (psych's describe() reports a moment-based skewness instead):

$$\text{Skewness} = \frac{3\,(\text{mean} - \text{median})}{\text{standard deviation}}$$

In this formula:

  • Mean: The arithmetic average of the dataset.
  • Median: The middle value of the dataset when it’s sorted. Half the data points are above the median, and half are below.
  • Standard Deviation: A measure of the dispersion or spread of the dataset.
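To see the formula in action, we can compute it directly in base R. This is a minimal sketch: pearson_skew() is a hypothetical helper name (it is not part of psych or base R), and it uses sd(), which divides by n − 1.

```r
# Hypothetical helper: Pearson's second (median) skewness coefficient,
# 3 * (mean - median) / standard deviation
pearson_skew <- function(x) {
  x <- x[!is.na(x)]                   # drop missing values
  3 * (mean(x) - median(x)) / sd(x)   # sd() uses the n - 1 denominator
}

# Apply it to the four numeric columns of iris (column 5 is the Species factor)
sapply(iris[, 1:4], pearson_skew)
```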

By comparing the mean, median, and standard deviation, skewness provides insight into the shape of the dataset’s distribution:

  • If the mean is greater than the median and the tail of the distribution is stretched to the right (longer right tail), the data is positively skewed.
  • If the mean is less than the median and the tail of the distribution is stretched to the left (longer left tail), the data is negatively skewed.
  • If the mean and median are approximately equal and the distribution is symmetric, the skewness is close to zero, indicating no significant skew.
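The three cases above can be checked on small made-up vectors. In this sketch, skew_direction() is a hypothetical helper, and the tolerance of 0.1 used to call a value "approximately symmetric" is an arbitrary choice for illustration, not a standard threshold.

```r
# Hypothetical helper: label the direction of skew using Pearson's coefficient.
# The default tolerance of 0.1 is an illustrative choice, not a standard cutoff.
skew_direction <- function(x, tol = 0.1) {
  sk <- 3 * (mean(x) - median(x)) / sd(x)
  if (sk > tol) {
    "positive (longer right tail)"
  } else if (sk < -tol) {
    "negative (longer left tail)"
  } else {
    "approximately symmetric"
  }
}

right_tailed <- c(1, 2, 2, 3, 3, 4, 15)   # one large value pulls the mean above the median
left_tailed  <- c(-15, 4, 5, 5, 6, 6, 7)  # one small value pulls the mean below the median
symmetric    <- c(1, 2, 3, 4, 5, 6, 7)    # mean equals median

sapply(list(right = right_tailed, left = left_tailed, symmetric = symmetric),
       skew_direction)
```

Because the coefficient is divided by the standard deviation, the same tolerance behaves the same way regardless of the units of the data.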

Skewness doesn’t provide information about the specific type of distribution (e.g., normal, exponential, etc.), but rather about the departure from symmetry. It’s a useful tool for understanding the general shape of data and for identifying potential issues such as outliers or non-normality.
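The median-based formula is easy to read, but it is not what describe() reports. The psych documentation describes its default (type = 3) skewness as the moment-based b1 = m3 / s³, where m3 is the mean of the cubed deviations from the mean and s is the sample standard deviation. The sketch below implements that definition in base R under the hypothetical name moment_skew() so the two measures can be compared side by side; treat the correspondence with psych as an assumption to check against psych::skew() on your own installation.

```r
# Hypothetical helper: moment-based sample skewness, b1 = m3 / s^3,
# where m3 = mean((x - mean(x))^3) and s = sd(x) (n - 1 denominator).
# Assumed to mirror the type = 3 definition documented for psych::skew().
moment_skew <- function(x) {
  x <- x[!is.na(x)]
  m3 <- mean((x - mean(x))^3)   # third central moment (divides by n)
  m3 / sd(x)^3
}

# Moment-based and median-based skewness need not agree in magnitude
x <- iris$Sepal.Length
c(moment = moment_skew(x),
  pearson = 3 * (mean(x) - median(x)) / sd(x))
```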

Kurtosis is a statistical measure that describes the “tailedness” or the degree of heaviness of the tails in the probability distribution of a dataset. It provides information about the shape of the distribution’s tails and the presence of outliers.

In other words, kurtosis measures the relative amount of data in the tails compared to the rest of the distribution. A high kurtosis value indicates that the dataset has heavy tails with more data points in the tails than a normal distribution, while a low kurtosis value indicates lighter tails.

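There are different ways to define and calculate kurtosis. One common formula for sample kurtosis is shown below; strictly speaking it gives the excess kurtosis, because subtracting 3 makes a normal distribution score 0: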

$$\text{Sample Kurtosis} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^4}{s^4} - 3$$

Where:

  • Σ denotes the sum over all n data points
  • xi is each individual data point
  • x̄ is the mean of the dataset
  • n is the number of data points
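  • s is the standard deviation of the dataset. Using the sample standard deviation (denominator n − 1, as R's sd() computes it) gives the variant psych reports by default; using the population standard deviation (denominator n) gives a slightly different value for small samples.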
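The kurtosis formula is straightforward to write in base R. In this sketch sample_kurtosis() is a hypothetical helper, and s is taken to be the sample standard deviation returned by sd(). With that choice the result should match the type = 3 definition that, according to the psych documentation, describe() uses by default, but that is worth confirming against psych::kurtosi() on your own data.

```r
# Hypothetical helper implementing the formula above:
# (sum((x - mean)^4) / n) / s^4 - 3, with s = sd(x) (n - 1 denominator).
sample_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  m4 <- sum((x - mean(x))^4) / n   # fourth central moment (divides by n)
  m4 / sd(x)^4 - 3                 # subtract 3 so a normal distribution scores about 0
}

# Excess kurtosis of each numeric iris column
sapply(iris[, 1:4], sample_kurtosis)
```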

Interpretation of kurtosis:

  • If Sample Kurtosis > 0: The distribution has heavier tails than a normal distribution (leptokurtic). This means it has more data in the tails, which can indicate the presence of outliers.
  • If Sample Kurtosis < 0: The distribution has lighter tails than a normal distribution (platykurtic). The tails are less extreme.
  • If Sample Kurtosis ≈ 0: The distribution has tails similar to those of a normal distribution (mesokurtic). This does not mean the distribution is necessarily normal; it only suggests that its tail heaviness is similar to that of a normal distribution.
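Simulated data make the three cases concrete. The sketch below draws large samples from a t distribution with 5 degrees of freedom (heavy tails, population excess kurtosis 6), a standard normal distribution (0) and a uniform distribution (−1.2), and applies the same formula through a hypothetical excess_kurtosis() helper. The seed and sample size are arbitrary choices.

```r
set.seed(42)   # arbitrary seed for reproducibility
n <- 1e5

# Hypothetical helper, same formula as above: (sum((x - mean)^4) / n) / s^4 - 3
excess_kurtosis <- function(x) sum((x - mean(x))^4) / length(x) / sd(x)^4 - 3

samples <- list(
  t_5     = rt(n, df = 5),   # leptokurtic: population excess kurtosis is 6
  normal  = rnorm(n),        # mesokurtic: population excess kurtosis is 0
  uniform = runif(n)         # platykurtic: population excess kurtosis is -1.2
)

round(sapply(samples, excess_kurtosis), 2)
```

Expect the normal and uniform estimates to land close to 0 and −1.2. The t estimate will be clearly positive but can sit well away from 6, because sample kurtosis converges slowly for heavy-tailed data.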

It’s important to note that kurtosis, like skewness, provides information about the shape of the distribution, but it doesn’t describe the specific type of distribution (e.g., normal, exponential, etc.). Additionally, interpretation of kurtosis can be context-dependent and may vary based on the domain and the nature of the data being analyzed.
