Excel Unplugged

ggplot2 basics: learn ggplot2 in 15 minutes!

ggplot2 overview

ggplot2 is one of the most popular R packages for graphics, enabling you to create print-quality, publication-ready data visualizations. If you’re not familiar with R, start with R basics: learn R in 15 minutes!

It is based on the grammar of graphics, which means that each part of the code stands for a component or a layer of the plot. We add components together using the + sign. Consequently, the basic structure of any plot looks like this:

ggplot(data = , aes(x =, y = )) + geom_name()
  1. ggplot(): creates a plot object and maps columns of the data frame to the x and y axes
  2. geom_name(): draws the data in the desired geometry, for example points, a histogram or a boxplot. Some examples are:

geom_histogram() for histograms

geom_bar() or geom_col() for bar charts

geom_boxplot() for boxplots

geom_point() for points

geom_line() or geom_path() for lines

geom_smooth() for trend lines
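As a minimal sketch of the layering idea above, the snippet below builds a plot step by step. It assumes R's built-in iris data frame (the same one used in the rest of this article); the variable names base, with_points and with_trend are only illustrative.

```r
library(ggplot2)

# Step 1: ggplot() alone sets up the data and the axis mappings; no data is drawn yet
base <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width))

# Step 2: each + adds one more layer on top of the previous ones
with_points <- base + geom_point()
with_trend  <- with_points + geom_smooth()

# Printing a plot object draws it
print(with_trend)
```

Because every step returns an ordinary R object, you can store a partial plot and keep adding to it later.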

Let’s look at some examples. You can download the content of this article in the R file here and follow along.

Point plot with ggplot2

In all examples, we’ll use R’s famous built-in iris dataset, which contains measurements of iris flowers. It gives four features (sepal length, sepal width, petal length and petal width) for 50 flowers from each of three species of iris (setosa, versicolor, virginica). All measurements are given in centimeters.

Here is a preview of the iris dataset (the first 12 of its 150 rows).

> head(iris, 12)
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
11          5.4         3.7          1.5         0.2  setosa
12          4.8         3.4          1.6         0.2  setosa

Now let us draw a basic point plot (also known as a scatter plot). We pass the iris data frame as data and map the Sepal.Length column to the x axis and Sepal.Width to the y axis.

Adding geom_point() draws the points on the plot.

> library(ggplot2)
> ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()
Point plot created with the ggplot2 library in R

We often want to distinguish groups according to a field in our data. For example, let us map the Species column to the color of the points with colour = Species.

> # Add colors to groups
> ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) + geom_point()
Point plot with color distinctions created with the ggplot2 library in R
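A common point of confusion is the difference between mapping a color to a column and setting one fixed color. The sketch below contrasts the two; "steelblue" is just an arbitrary color name chosen for illustration, and p_mapped and p_fixed are illustrative variable names.

```r
library(ggplot2)

# Mapping: the colour depends on the data, so it goes inside aes() and produces a legend
p_mapped <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
  geom_point()

# Setting: one fixed colour for every point, so it goes outside aes() and adds no legend
p_fixed <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(colour = "steelblue")

print(p_mapped)
print(p_fixed)
```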

 

Histogram in ggplot2

A histogram groups data points into ranges (bins) and plots how many observations fall within each range. It is one of the most widely used chart types in statistics and data analysis.

> ggplot(iris, aes(x = Sepal.Length)) + geom_histogram()
Histogram created with ggplot2
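When no bin size is given, geom_histogram() falls back to 30 bins and prints a message suggesting that you pick a better value. Here is a minimal sketch of choosing the bins yourself; the values 0.25 and 15 are only illustrative, not recommendations.

```r
library(ggplot2)

# binwidth is given in the units of the x variable (centimeters here)
p_binwidth <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(binwidth = 0.25)

# Alternatively, fix the number of bins instead of their width
p_bins <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(bins = 15)

print(p_binwidth)
print(p_bins)
```

Narrower bins show more detail but also more noise, so it is worth trying a few values.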

If we want to color the bars by group, we map the Species column to the fill of the bars with fill = Species.

> # Add colors to groups
> ggplot(iris, aes(x = Sepal.Length, fill = Species)) + geom_histogram()
Histogram with color categorization

Boxplot

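As one of the most important statistical representations of data, the boxplot illustrates the distribution of data based on five descriptive statistics: minimum, first quartile, median, third quartile, and maximum. Note that geom_boxplot() extends the whiskers at most 1.5 times the interquartile range from the box and draws any values beyond that as individual outlier points.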

> ggplot(iris, aes(x = Species, y = Sepal.Length)) + geom_boxplot()
Boxplot created with ggplot2
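To see the numbers behind the boxes, the five statistics can be computed directly in base R. This is only a sketch for checking the plot by eye, not the code geom_boxplot() runs internally; five_num is an illustrative variable name.

```r
# Minimum, quartiles and maximum of Sepal.Length for each species
five_num <- sapply(
  split(iris$Sepal.Length, iris$Species),
  quantile,
  probs = c(0, 0.25, 0.5, 0.75, 1)
)

# One column per species, one row per statistic
print(five_num)
```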




Trend Line

If our data varies widely, it’s hard to see the general direction the data is taking. A trend line helps by visualizing the overall trend of the data. We get one by adding geom_smooth().

> ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point() + geom_smooth()
Trend line on a point plot created with ggplot2
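By default, geom_smooth() chooses the smoothing method for you (a LOESS curve for a dataset of this size) and draws a confidence band around it. If you would rather have a straight-line trend, a sketch like the following should work; method, formula and se are standard geom_smooth() arguments, and p_linear is an illustrative name.

```r
library(ggplot2)

p_linear <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  # method = "lm" fits a straight line; se = FALSE hides the confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

print(p_linear)
```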

In the same vein as with other plots, we can add color to trend lines. To draw a separate trend line for each group, we map the Species column to colour inside geom_smooth() with aes(colour = Species).

> # Add trendlines based on groups in Species column
> ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point() + geom_smooth(aes(colour = Species))
Multiple trend lines based on groups in the Species column

Labels

Labels are critical to every plot. In ggplot2, we add them with the labs() function.

> ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_smooth(aes(colour = Species)) +
  labs(
    title = "Sepal measurements of Iris flower",
    subtitle = "The use of multiple measurements in taxonomic problems, 1936",
    x = "Sepal length",
    y = "Sepal width",
    color = "Variant")

Labels added to a point plot with trend lines in ggplot2

Themes

Add a theme to change the look of the plot. Some of the most popular themes are theme_classic(), theme_bw(), theme_minimal() and theme_gray().

First, let’s look at theme_classic().

> ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + 
  geom_point() + 
  geom_smooth(aes(colour = Species)) +
  labs(
    title = "Sepal measurements of Iris flower",
    subtitle = "The use of multiple measurements in taxonomic problems, 1936",
    x = "Sepal length",
    y = "Sepal width",
    color = "Variant") +
  theme_classic()
Different themes in ggplot2

Next, here is theme_bw().

> ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + 
  geom_point() + 
  geom_smooth(aes(colour = Species)) +
  labs(
    title = "Sepal measurements of Iris flower",
    subtitle = "The use of multiple measurements in taxonomic problems, 1936",
    x = "Sepal length",
    y = "Sepal width",
    color = "Variant") +
  theme_bw()

Then, here is theme_minimal().

> ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + 
  geom_point() + 
  geom_smooth(aes(colour = Species)) +
  labs(
    title = "Sepal measurements of Iris flower",
    subtitle = "The use of multiple measurements in taxonomic problems, 1936",
    x = "Sepal length",
    y = "Sepal width",
    color = "Variant") +
  theme_minimal()

Finally, here is theme_gray(), which is also the theme ggplot2 uses when none is specified.

> ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + 
  geom_point() + 
  geom_smooth(aes(colour = Species)) +
  labs(
    title = "Sepal measurements of Iris flower",
    subtitle = "The use of multiple measurements in taxonomic problems, 1936",
    x = "Sepal length",
    y = "Sepal width",
    color = "Variant") +
  theme_gray()

In short, themes are great as they allow us to have a uniform look throughout our reports.
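Since only the last line changes between the theme examples above, one way to avoid repeating the plot code is to store the plot in a variable and add a theme to it. The sketch below also uses theme_set() to make one theme the default for all later plots, which is one way to get that uniform look. The names base_plot and old_theme are illustrative.

```r
library(ggplot2)

# Build the plot once and store it
base_plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_smooth(aes(colour = Species)) +
  labs(
    title = "Sepal measurements of Iris flower",
    x = "Sepal length",
    y = "Sepal width",
    colour = "Variant"
  )

# Try different themes without repeating the plot code
print(base_plot + theme_classic())
print(base_plot + theme_bw())

# Make one theme the default for every plot drawn afterwards
old_theme <- theme_set(theme_minimal())
print(base_plot)

# Restore the previous default when done
theme_set(old_theme)
```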

Legend

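We will often want to change the position of the legend. We do this with theme(legend.position = ...), which should be added after a complete theme such as theme_minimal() so that the theme does not override it. Here is an example that moves the legend to the bottom.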

> ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + 
  geom_point() + 
  geom_smooth(aes(colour = Species)) +
  labs(
    title = "Sepal measurements of Iris flower",
    subtitle = "The use of multiple measurements in taxonomic problems, 1936",
    x = "Sepal length",
    y = "Sepal width",
    color = "Variant") +
  theme_minimal() +
  theme(legend.position = "bottom")

Facets or Small Multiples in ggplot2

ggplot2 can neatly split our plot into smaller panels, called facets or small multiples. The functions we use are facet_wrap() and facet_grid().

facet_wrap() shows a separate panel for each level of a single variable.

facet_grid() arranges panels in rows and columns, one panel for each combination of two variables.

> ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + 
  geom_point() + 
  geom_smooth(aes(colour = Species)) +
  labs(
    title = "Sepal measurements of Iris flower",
    subtitle = "The use of multiple measurements in taxonomic problems, 1936",
    x = "Sepal length",
    y = "Sepal width",
    color = "Variant") +
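  facet_wrap(facets = ~Species)

With facet_grid(), the formula Species ~ . places one panel per species in rows and nothing (the dot) in columns.

> ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +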
  geom_point() + 
  geom_smooth(aes(colour = Species)) +
  labs(
    title = "Sepal measurements of Iris flower",
    subtitle = "The use of multiple measurements in taxonomic problems, 1936",
    x = "Sepal length",
    y = "Sepal width",
    color = "Variant") +
  facet_grid(Species ~ .)
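The iris data has only one categorical column, so the example above uses facet_grid() with a single variable. To sketch the two-variable case, the snippet below adds a hypothetical PetalSize column that exists only for this illustration and is not part of the original dataset.

```r
library(ggplot2)

# PetalSize is a made-up grouping column, added only for this illustration
iris_sized <- iris
iris_sized$PetalSize <- ifelse(
  iris_sized$Petal.Length > median(iris_sized$Petal.Length),
  "long petals",
  "short petals"
)

# Rows are split by PetalSize, columns by Species
p_grid <- ggplot(iris_sized, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  facet_grid(PetalSize ~ Species)

print(p_grid)
```

Panels for combinations that contain no data are still drawn, just empty, which keeps the grid regular.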

Saving images

Plots can be saved as an image or PDF directly from the RStudio plot viewer by selecting Export > Save as Image or Save as PDF.

Alternatively, you can use the ggsave() function with the filename and path arguments. By default, it saves the last plot that was displayed. For example:

> ggsave(filename = "my_ggplot.png", path = "C:/temp")

If the path argument is omitted, the file is saved to the current working directory.
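For scripts, it can be safer to pass the plot and the output size explicitly. Here is a minimal sketch; the temporary directory, the file name and the 6 by 4 inch size are illustrative choices, while plot, width, height and dpi are standard ggsave() arguments.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

# Save into R's temporary directory so the example runs on any machine
out_file <- file.path(tempdir(), "my_ggplot.png")

# Passing plot = explicitly avoids relying on the last plot displayed;
# width and height are in inches by default
ggsave(filename = out_file, plot = p, width = 6, height = 4, dpi = 300)

# Confirm that the file was written
file.exists(out_file)
```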

Learn more about ggplot2

This was a quick overview of ggplot2 basics. To learn more, stand by for our upcoming R course, where we’ll go in depth and cover all the relevant topics of the R language.
