Using nVennR to generate and explore n-dimensional, quasi-proportional Venn diagrams

nVennR provides an R interface to the nVenn algorithm. This vignette intends to illustrate three basic uses of nVennR:

Create diagrams

There are two ways to create a Venn diagram object (nVennObj), which will be referenced as high-level (by providing intersecting lists) and low-level (from scratch).

High-level

The most common use for a package like nVennR is to depict the relationships between several intersecting lists. The main function for this task is plotVenn. The input is a list of vectors (or lists) describing each set. The name of each inner vector will be used for labeling. If inner vectors are not named, labels can be provided as a vector with sNames. Empty names will be filled with GroupN. Examples:

library(nVennR)
exampledf
#>    Employee SAS Python R
#> 1      A001   Y      Y Y
#> 2      A002   N      Y Y
#> 3      A003   Y      Y N
#> 4      A004   Y      Y Y
#> 5      A005   Y      N N
#> 6      A006   Y      N Y
#> 7      A007   N      N N
#> 8      A008   Y      N N
#> 9      A009   N      N Y
#> 10     A010   N      N Y
#> 11     A011   Y      Y Y
#> 12     A012   Y      Y Y
#> 13     A013   Y      N Y
#> 14     A014   Y      N Y
#> 15     A015   N      N Y
#> 16     A016   N      N Y
#> 17     A017   N      Y N
#> 18     A018   N      Y N
sas <- subset(exampledf, SAS == "Y")$Employee
python <- subset(exampledf, Python == "Y")$Employee
rr <- subset(exampledf, R == "Y")$Employee
myV <- plotVenn(list(SAS=sas, PYTHON=python, R=rr), nCycles = 2000)

The number of sets is arbitrary. For more than five sets, the default 7000 simulation cycles may not be enough. You can set a different number of cycles with nCycles, or you can run the simulation repeatedly by providing the returned nVennObj to plotVenn. Repeated execution is encouraged, as long simulations are resource-intensive. Also, the nVenn algorithm lowers the speed of the simulation if the topology of the diagram fails. Running it a second time as shown below can recover resets the speed of the simulations, and sometimes makes it significantly faster.

myV2 <- plotVenn(list(SAS=sas, PYTHON=python, R=rr, c("A006", "A008", "A011", "Unk"), c("A011", "Unk", "A101", "A006", "A000"), c("A101", "A006", "A008")))
myV2 <- plotVenn(nVennObj = myV2)

Low-level

Users can also build an nVennObj from scratch. Most of the time, this will not be useful, but it might have some theoretical applications. For instance, let us get a five-set Venn diagram (in Venn diagrams, all the regions are shown). With the high-level procedure, we would need five sets with all the possible intersections. Instead, we can use createVennObj:

myV3 <- createVennObj(nSets = 5, sSizes = c(rep(1, 32)))
myV3 <- plotVenn(nVennObj = myV3, nCycles = 5000)
myV3 <- plotVenn(nVennObj = myV3, nCycles = 5000)

The sSizes vector contains the values of each region of the diagram. From the help,

To understand the order of the regions, one can think of a region as a binary number. Each bit tells whether the region belongs (1) or not (0) to a given set. For instance, with 4 sets we have 4 bits. The number 7 with 4 bits is 0111, which describes a region belonging to sets 2, 3, and 4 and not to set 1. To pass the values of the regions, those values are sorted according to the number describing the region. Thus, with four sets, the first element corresponds to region 0 (0000), the second to region 1 (0001), the third to region 2 (0010), … The last corresponds to region 15 (1111), which belongs to all the sets.

You can also read more than you want to know about this procedure here and here.

After creating myV3, we can manipulate the size of each region separately with the setVennRegion function:

myV3 <- setVennRegion(myV3, region = c("Group1", "Group3", "Group4"), value = 4) # region equivalent to c(1, 0, 1, 1, 0)
myV3 <- setVennRegion(myV3, region = c(0, 1, 0, 0, 1), value = 8) # region equivalent to c("Group2", "Group5")
myV3 <- plotVenn(nVennObj = myV3, nCycles = 3000)

Manipulate figures

Once an nVennObj has been created, you can generate the corresponding figure with plotVenn, as shown in previous examples. If you are happy with the layout of the diagram, you do not need to run the simulation again to tweak the figure. This is better achieved with showSVG. At its present form, with showSVG you can change the colors and opacity of the sets, the width of the set borders and the font sizes of the labels. Opacity and border width are easy to understand:

showSVG(nVennObj = myV3, opacity = 0.1, borderWidth = 3)

Set colors are provided as a vector, with one color per set. Colors are expressed in an SVG CSS compatible form. Please, be aware that these expressions are not evaluated. If you provide wrong colors, the SVG rendering will fail.

showSVG(nVennObj = myV3, setColors = c('#d7100b', 'teal', 'yellow', 'black', '#2b55b7'))

There are two types of labels: size labels (large numbers) and region labels (smaller numbers in parentheses). They can be hidden by setting showNumbers and labelRegions to false, respectively. The font size for size labels is double the font size of region labels. In this version on nVennR, this ratio is fixed, but the sizes can be manipulated with fontScale. The number provided multiplies the default font size of the labels. Numbers larger than 2 are discouraged, as labels can overlap in the figure.

showSVG(nVennObj = myV3, opacity = 0.1, labelRegions = F, fontScale = 3) # Avoid overlaps by hiding region labels

You can set the default graphic device to export the figure generated by showSVG. If you set this device to a bitmap (e. g., with png()) you should set the width and height to get a good resolution. If you have experience with vector formats, you can edit the figure to suit your needs. Setting systemShow to true will attempt to open the SVG figure in your default editor (e. g., InkScape). You can also provide a file name with the outFile option.

Finally, showSVG is called from plotVenn at the end of each simulation. That call receives any extra parameter sent to plotVenn, and therefore you can directly manipulate the appearance of the result:

myV4 <- plotVenn(list(a=c(1, 2, 3), b=c(3, 4, 5), c=c(3, 6, 1)), nCycles = 2000, setColors=c('red', 'green', 'blue'), labelRegions=F, fontScale=2, opacity=0.2, borderWidth=2)

Explore diagram

Finally, one can explore the diagram using a couple of functions. Obviously, these functions cannot be used with objects generated through low-level functions.

getVennRegion

This function lists the elements belonging to a given region. The region is expressed as in setVennRegion:

getVennRegion(myV, c("R", "SAS"))
#> [1] "A006" "A013" "A014"
getVennRegion(myV, c(1, 1, 1))
#> [1] "A001" "A004" "A011" "A012"

listVennRegions

This function returns a list of all the regions in the diagram. In turn, each region contains a list of the elements in it. If you want to see a complete list, including empty regions, set na.rm to FALSE.


listVennRegions(myV4)
#> $`0, 0, 1 (c)`
#> [1] 6
#> 
#> $`0, 1, 0 (b)`
#> [1] 4 5
#> 
#> $`1, 0, 0 (a)`
#> [1] 2
#> 
#> $`1, 0, 1 (a, c)`
#> [1] 1
#> 
#> $`1, 1, 1 (a, b, c)`
#> [1] 3
listVennRegions(myV4, na.rm = F)
#> $`0, 0, 0 ()`
#> [1] NA
#> 
#> $`0, 0, 1 (c)`
#> [1] 6
#> 
#> $`0, 1, 0 (b)`
#> [1] 4 5
#> 
#> $`0, 1, 1 (b, c)`
#> [1] NA
#> 
#> $`1, 0, 0 (a)`
#> [1] 2
#> 
#> $`1, 0, 1 (a, c)`
#> [1] 1
#> 
#> $`1, 1, 0 (a, b)`
#> [1] NA
#> 
#> $`1, 1, 1 (a, b, c)`
#> [1] 3