This vignette assumes an understanding of IP addresses and networks. Please consult vignette("ipaddress-classes", "ipaddress") for a very basic introduction.
Data visualization of the IP address space is challenging because there are so many unique addresses (approximately 4.3 billion for IPv4 and \(3.4 \times 10^{38}\) for IPv6). Owing to the hierarchical nature of the address space, we must plot the addresses on a discrete scale (not a continuous scale). It’s simply not possible to display (or interpret) such a large number of discrete levels simultaneously.
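There are a few actions we can take to improve the situation: visualize only a subnetwork of the address space, make each discrete level represent a network rather than a single address, and map the one-dimensional address space onto two dimensions using a space-filling curve.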
These are handled by the canvas_network, pixel_prefix and curve arguments of coord_ip(), respectively. This vignette describes these actions in more detail.
As an example, consider the 32-bit representation of the IPv4 address 192.168.0.124. If we wanted to visualize this single address within the full context of the IPv4 address space, we’d need to simultaneously display \(2^{32}\) discrete levels (roughly 4.3 billion).
To reduce the visualized information, we could show only a subnetwork of the full address space. In our example, we could display only the 192.0.0.0/8 network. This would keep only the addresses whose leading 8 bits match the specified network, thereby reducing the number of discrete levels to \(2^{24}\) (roughly 16.8 million).
Alternatively, we could make each discrete level represent a network of addresses. To do this, we’d need to use a summary function to reduce the network data to a single value. In our example, we could make each discrete level represent a network with a prefix length of 24. This would effectively neglect the trailing 8 bits of the 32-bit address, thereby further reducing the number of discrete levels to \(2^{16}\) (65,536).
These two techniques become even more important in the IPv6 address space, which uses 128-bit addresses.
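The arithmetic behind these two reductions can be sketched in base R. The helper name n_levels() is hypothetical (it is not part of ggip or ipaddress), and it takes plain prefix lengths in bits rather than network objects.

```r
# Hypothetical helper: number of discrete levels when plotting a canvas
# network, where each level (pixel) is a network of the given prefix length.
# Both arguments are prefix lengths measured in bits.
n_levels <- function(canvas_prefix_length, pixel_prefix_length) {
  stopifnot(pixel_prefix_length >= canvas_prefix_length)
  2^(pixel_prefix_length - canvas_prefix_length)
}

n_levels(0, 32)   # every IPv4 address: 2^32
n_levels(8, 32)   # every address inside a /8 canvas: 2^24
n_levels(8, 24)   # /24 networks inside a /8 canvas: 2^16
n_levels(0, 128)  # every IPv6 address: 2^128
```

Only the difference between the two prefix lengths matters, which is why the same two techniques carry over unchanged to 128-bit IPv6 addresses.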
Note: To prevent accidentally plotting an unreasonably large number of discrete levels, ggip limits the number of plotted bits to 24. This means the coord_ip() arguments must satisfy:
pixel_prefix - prefix_length(canvas_network) <= 24
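As a rough illustration of this constraint, the check below can be run before building a plot. The function can_plot() is a hypothetical helper written for this sketch, not a ggip function, and it assumes the canvas prefix length has already been extracted as a plain number (for example with prefix_length()).

```r
# Hypothetical pre-flight check for the 24-bit limit described above.
# canvas_prefix_length: prefix length of the canvas network, in bits
# pixel_prefix:         prefix length represented by each pixel, in bits
can_plot <- function(canvas_prefix_length, pixel_prefix, max_bits = 24) {
  plotted_bits <- pixel_prefix - canvas_prefix_length
  plotted_bits >= 0 && plotted_bits <= max_bits
}

can_plot(0, 24)     # TRUE:  /24 networks across all of IPv4 (24 bits)
can_plot(0, 32)     # FALSE: individual addresses across all of IPv4 (32 bits)
can_plot(8, 32)     # TRUE:  individual addresses inside a /8 canvas (24 bits)
can_plot(104, 128)  # TRUE:  individual IPv6 addresses inside a /104 canvas
```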
Inspired by an xkcd comic originally published in December 2006, we use a space-filling curve to map IP data (one-dimensional) to Cartesian coordinates (two-dimensional). This means each discrete level is represented by a pixel. Two curves are commonly chosen for this task: the Hilbert curve and the Morton curve (also known as the Z curve). Compared to other space-filling curves, these are advantageous because they preserve locality (i.e. subnetworks remain close together).
The curve order represents how nested the curve is and therefore determines how many data points can be visualized. Conversely, choosing the number of plotted bits (see above) determines the order of the curve. Since space-filling curves are fractal, increasing the curve order effectively improves the image resolution (plotted networks remain in the same overall location).
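The link between plotted bits and curve order can be made concrete with a little arithmetic. A two-dimensional curve of order n covers a grid of \(2^n \times 2^n\) pixels, so it can hold \(2^{2n}\) discrete levels. The sketch below uses hypothetical helper names and assumes the number of plotted bits is even, so that the pixels fill a square grid.

```r
# Hypothetical helpers relating plotted bits, curve order and grid size.
curve_order_from_bits <- function(plotted_bits) {
  stopifnot(plotted_bits %% 2 == 0)  # assumption: square grid
  plotted_bits / 2
}

grid_side <- function(order) 2^order  # pixels along each side of the plot

curve_order_from_bits(4)    # order 2
grid_side(2)                # 4 x 4 pixels
curve_order_from_bits(24)   # order 12, the largest case under the 24-bit limit
grid_side(12)               # 4096 x 4096 pixels
```

Raising the order by one doubles the resolution along each axis, which is why plotted networks keep their overall location while gaining detail.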
IP data is most commonly displayed on a Hilbert curve because it has optimal locality preservation.
This curve starts in the top-left corner and ends in the top-right corner.
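To make the mapping tangible, here is a minimal base R sketch of the widely used iterative algorithm for converting a position along a Hilbert curve into grid coordinates. It is an illustration only: the function name is hypothetical, ggip's internal implementation and exact orientation may differ, and y is taken to count rows downward from the top of the plot.

```r
# Convert position d (0-based) along a Hilbert curve of the given order
# into 0-based grid coordinates. y counts rows downward from the top.
hilbert_d2xy <- function(order, d) {
  n <- as.integer(2^order)  # grid side length
  rem <- as.integer(d)      # bits of d not yet consumed
  x <- 0L
  y <- 0L
  s <- 1L                   # side length of the current sub-square
  while (s < n) {
    rx <- bitwAnd(1L, rem %/% 2L)
    ry <- bitwAnd(1L, bitwXor(rem, rx))
    if (ry == 0L) {         # rotate or flip the quadrant built so far
      if (rx == 1L) {
        x <- s - 1L - x
        y <- s - 1L - y
      }
      tmp <- x
      x <- y
      y <- tmp
    }
    x <- x + s * rx
    y <- y + s * ry
    rem <- rem %/% 4L
    s <- s * 2L
  }
  c(x = x, y = y)
}

# Coordinates of all 16 pixels on a 2nd order curve, one row per pixel
t(sapply(0:15, function(d) hilbert_d2xy(2, d)))
```

With these conventions, position 0 maps to (0, 0) and position 15 maps to (3, 0), so the 2nd order curve begins in the top-left corner and finishes in the top-right corner. Consecutive positions always land on neighbouring pixels, which is the locality property described above.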
The Morton curve technically offers slightly poorer locality preservation than the Hilbert curve. However, the discontinuous jumps in the curve actually correspond to crossing IP network boundaries. In this sense, the Morton curve is a more natural representation of the IP network structure. For example, the start and end addresses of a network are always located diagonally across from each other.
This curve starts in the top-left corner and ends in the bottom-right corner.
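The Morton mapping is simpler to sketch, because it only de-interleaves the bits of the position. The base R function below is a hypothetical illustration rather than ggip's implementation. It assumes the even-numbered bits of the position form the x coordinate, the odd-numbered bits form the y coordinate, and y counts rows downward from the top.

```r
# Convert position d (0-based) along a Morton (Z) curve of the given order
# into 0-based grid coordinates by de-interleaving the bits of d.
morton_d2xy <- function(order, d) {
  x <- 0
  y <- 0
  for (i in seq_len(order) - 1) {
    x <- x + ((d %/% 2^(2 * i)) %% 2) * 2^i      # even bits of d go to x
    y <- y + ((d %/% 2^(2 * i + 1)) %% 2) * 2^i  # odd bits of d go to y
  }
  c(x = x, y = y)
}

morton_d2xy(2, 0)   # (0, 0): top-left corner
morton_d2xy(2, 15)  # (3, 3): bottom-right corner

# The block of four pixels at positions 4 to 7 starts and ends on a diagonal
morton_d2xy(2, 4)   # (2, 0)
morton_d2xy(2, 7)   # (3, 1)
```

Any aligned block of \(4^k\) consecutive positions fills a square, with its first pixel in the top-left corner of that square and its last pixel in the bottom-right corner, which illustrates the diagonal property mentioned above.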
Finally, let’s consider a specific example.
coord_ip(
  canvas_network = ip_network("0.0.0.0/0"),
  pixel_prefix = 4,
  curve = "hilbert"
)
This coordinate system will use a 2nd order Hilbert curve to visualize the entire IPv4 address space, where each vertex represents a /4 network.
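The numbers behind this example can be checked with a few lines of base R. This is a sketch with illustrative variable names, and the network labels are built by hand rather than with the ipaddress package.

```r
pixel_prefix <- 4          # each pixel is a /4 network
canvas_prefix_length <- 0  # canvas is 0.0.0.0/0

plotted_bits <- pixel_prefix - canvas_prefix_length  # 4 bits
hilbert_order <- plotted_bits / 2                    # 2nd order curve
n_pixels <- 2^plotted_bits                           # 16 vertices on a 4 x 4 grid
addresses_per_pixel <- 2^(32 - pixel_prefix)         # 2^28 addresses per /4 network

# The /4 networks in address order: the first octet advances in steps of 16
networks <- paste0((seq_len(n_pixels) - 1) * 2^(8 - pixel_prefix),
                   ".0.0.0/", pixel_prefix)
networks[c(1, 2, 16)]
# "0.0.0.0/4" "16.0.0.0/4" "240.0.0.0/4"
```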