While boxplots have become the de facto standard for plotting the distribution of data this is a vast oversimplification and may not show everything needed to evaluate the variation of data. This is particularly important for datasets which do not form a Gaussian “Normal” distribution that most researchers have become accustomed to.
While density plots are helpful in this regard, they can be less aesthetically pleasing than boxplots and harder to interpret for those familiar with boxplots. Often the only ways to compare multiple data types with density use slices of the data with faceting the plotting panes or overlaying density curves with colours and a legend. This approach is jarring for new users and leads to cluttered plots difficult to present to a wider audience.
##Violin Plots
Therefore violin plots are a powerful tool to assist researchers to visualise data, particularly in the quality checking and exploratory parts of an analysis. Violin plots have many benefits:
As shown below for the iris
dataset, violin plots show distribution information that the boxplot is unable to.
library("vioplot")
data(iris)
boxplot(iris$Sepal.Length[iris$Species=="setosa"], iris$Sepal.Length[iris$Species=="versicolor"], iris$Sepal.Length[iris$Species=="virginica"], names=c("setosa", "versicolor", "virginica"))
vioplot(iris$Sepal.Length[iris$Species=="setosa"], iris$Sepal.Length[iris$Species=="versicolor"], iris$Sepal.Length[iris$Species=="virginica"], names=c("setosa", "versicolor", "virginica"))
##Violin y-axis
###Logarithmic scale
However the existing violin plot packages (such as ) do not support log-scale of the y-axis. This has been amended with the ylog
argument.
vioplot(iris$Sepal.Length[iris$Species=="setosa"], iris$Sepal.Length[iris$Species=="versicolor"], iris$Sepal.Length[iris$Species=="virginica"], names=c("setosa", "versicolor", "virginica"), main="Sepal Length", ylog = T, ylim=c(log(1), log(10)))
## Warning in plot.window(xlim, ylim, log = log, asp = asp, bty = bty, cex = cex, :
## nonfinite axis limits [GScale(-inf,0.362216,2, .); log=1]
## Warning in plot.window(xlim, ylim, log = log, asp = asp, bty = bty, cex = cex, :
## nonfinite axis limits [GScale(-inf,0.362216,2, .); log=1]
This can also be invoked with the log="y"
argument compatible with boxplot
:
vioplot(iris$Sepal.Length[iris$Species=="setosa"], iris$Sepal.Length[iris$Species=="versicolor"], iris$Sepal.Length[iris$Species=="virginica"], names=c("setosa", "versicolor", "virginica"), main="Sepal Length", log = T, ylim=c(log(1), log(10)))
vioplot(iris$Sepal.Length[iris$Species=="setosa"], iris$Sepal.Length[iris$Species=="versicolor"], iris$Sepal.Length[iris$Species=="virginica"], names=c("setosa", "versicolor", "virginica"), main="Sepal Length", log = "y", ylim=c(log(1), log(10)))
###custom y-axes
The y-axes can also be removed with yaxt="n"
to enable customised y-axes:
vioplot(iris$Sepal.Length[iris$Species=="setosa"], iris$Sepal.Length[iris$Species=="versicolor"], iris$Sepal.Length[iris$Species=="virginica"], names=c("setosa", "versicolor", "virginica"), main="Sepal Length", yaxt="n")
vioplot(iris$Sepal.Length[iris$Species=="setosa"], iris$Sepal.Length[iris$Species=="versicolor"], iris$Sepal.Length[iris$Species=="virginica"], names=c("setosa", "versicolor", "virginica"), main="Sepal Length", ylog = T, yaxt="n", ylim=c(log(1), log(10)))
## Warning in plot.window(xlim, ylim, log = log, asp = asp, bty = bty, cex = cex, :
## nonfinite axis limits [GScale(-inf,0.362216,2, .); log=1]
## Warning in plot.window(xlim, ylim, log = log, asp = asp, bty = bty, cex = cex, :
## nonfinite axis limits [GScale(-inf,0.362216,2, .); log=1]
Thus custom axes can be added to violin plots. As shown on a linear scale:
vioplot(iris$Sepal.Length[iris$Species=="setosa"], iris$Sepal.Length[iris$Species=="versicolor"], iris$Sepal.Length[iris$Species=="virginica"], names=c("setosa", "versicolor", "virginica"), main="Sepal Length", yaxt="n")
axis(2, at=1:10, labels=1:10)
As well as for on a log scale:
vioplot(iris$Sepal.Length[iris$Species=="setosa"], iris$Sepal.Length[iris$Species=="versicolor"], iris$Sepal.Length[iris$Species=="virginica"], names=c("setosa", "versicolor", "virginica"), main="Sepal Length", yaxt="n", log="y", ylim=c(log(4), log(9)))
axis(2, at=log(1:10), labels=1:10)