1. Data

James Hollway

Obtaining data

While {migraph} includes a number of datasets (see here), and there are several packages in R that already include a range of social network data, there often comes a time when it is necessary to import and analyse data from other sources. Fortunately, {migraph} has a range of tools you can employ to import your data and manipulate it.

Finding data

There are a great number of network datasets and data resources. The list we keep here is necessarily partial, but we are happy to update it whenever additional datasets are suggested. See for example:

See also:

Please let us know if you identify any further repositories of social or political networks and we would be happy to add them here.

Import data

{migraph} includes several functions that help read from (import) and write to (export) network data in a growing number of formats.

One format most users have long been familiar with is Excel. In Excel, users typically collect network data as edgelists, nodelists, or both. Edgelists are typically the main object to be imported, and we can import them from an Excel file or a .csv file.[1]

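library(migraph)
# import an edgelist from an Excel file
g1 <- read_edgelist("~/Downloads/mynetworkdata.xlsx")
# import an edgelist from a semi-colon separated .csv file
g1 <- read_edgelist("~/Downloads/mynetworkdata.csv", sv = "semi-colon")
# with no file name given, a file chooser opens instead
g1 <- read_edgelist()
n1 <- read_nodelist()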

If you do not specify a particular file name, a helpful popup will open that assists you with locating and importing a file from your operating system. Importing a nodelist of nodal attributes operates very similarly.
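
The sv argument in the second call above matters because a file read with the wrong separator will usually collapse into a single column. The following is only a base R sketch of that effect, not {migraph}'s own import code: read.csv() and read.csv2() stand in for the two separator settings, and the file contents are invented for illustration.

```r
# Write a tiny semi-colon separated file to a temporary location.
# The file contents are invented for illustration.
path <- tempfile(fileext = ".csv")
writeLines(c("from;to", "A;B", "B;C"), path)

# Read as comma-separated: everything lands in a single column.
as_comma <- read.csv(path, stringsAsFactors = FALSE)
print(ncol(as_comma))

# Read as semi-colon separated: two columns, as intended.
as_semicolon <- read.csv2(path, stringsAsFactors = FALSE)
print(names(as_semicolon))
```

If an imported edgelist unexpectedly has only one column, the separator is a good first thing to check.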

In some cases, users will have to collect data themselves, or will wish to manipulate the data in Excel before importing it, but may be uncertain about the expected format of an edgelist. Here it may be useful to try exporting one of the built-in datasets in {migraph} to see how complete network data looks. If that seems too complex, calling write_edgelist() without any arguments will export a test file with a barebones structure that you can overwrite with your own data.
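
As a rough illustration of what such a file might contain, the following base R sketch writes a minimal two-column edgelist to a temporary .csv file and reads it back. The column names from and to mirror the as_edgelist() output shown later in this vignette. The node names, the ties, and the file location are invented, and base R's write.csv() and read.csv() are used here in place of {migraph}'s own export and import functions.

```r
# Toy edgelist: one row per tie, sender in `from`, receiver in `to`.
# Names and ties are invented purely for illustration.
edges <- data.frame(
  from = c("Ann", "Ann", "Bob"),
  to   = c("Bob", "Cat", "Cat"),
  stringsAsFactors = FALSE
)

# Write to a temporary comma-separated file (hypothetical location).
path <- tempfile(fileext = ".csv")
write.csv(edges, path, row.names = FALSE)

# Read it back with base R to check the round trip.
edges_in <- read.csv(path, stringsAsFactors = FALSE)
print(edges_in)
```

Any additional columns in such a file would typically hold tie attributes, such as a weight or a sign.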

There are other functions here too that help import from or export to common external network data formats. Here are some examples:

# for importing or exporting .net or .paj files
read_pajek()
write_pajek()
# for importing or exporting .##h files
# (.##d files are automatically imported alongside)
read_ucinet()
write_ucinet()

Converting between formats

By default, read_edgelist() and read_nodelist() import data as data frame or tibble class objects, while read_pajek() and read_ucinet() import data as tidygraph class objects.

These can already be useful, as {migraph} functions recognise and work with most main classes of network/graph objects in R: edgelists, matrices, igraph, tidygraph, and network objects.

However, it is sometimes necessary to convert a given object from one class to another. Here we can use any of a collection of coercion functions, all prefixed by as_, to move from any of the object classes that {migraph} recognises to any other.

Let’s use one of the built-in datasets in {migraph} to demonstrate this. Davis, Gardner and Gardner’s (1941) ison_southern_women dataset is a classic two-mode network, so let’s start with that. {migraph} stores this dataset as an ‘igraph’ object, though other included datasets are in ‘tidygraph’ or sometimes ‘network’ formats.

library(migraph)
ison_southern_women # this is in igraph format
#> IGRAPH f8d9f5f UN-B 32 93 -- 
#> + attr: type (v/l), name (v/c)
#> + edges from f8d9f5f (vertex names):
#>  [1] EVELYN   --E1 EVELYN   --E2 EVELYN   --E3 EVELYN   --E4 EVELYN   --E5
#>  [6] EVELYN   --E6 EVELYN   --E8 EVELYN   --E9 LAURA    --E1 LAURA    --E2
#> [11] LAURA    --E3 LAURA    --E5 LAURA    --E6 LAURA    --E7 LAURA    --E8
#> [16] THERESA  --E2 THERESA  --E3 THERESA  --E4 THERESA  --E5 THERESA  --E6
#> [21] THERESA  --E7 THERESA  --E8 THERESA  --E9 BRENDA   --E1 BRENDA   --E3
#> [26] BRENDA   --E4 BRENDA   --E5 BRENDA   --E6 BRENDA   --E7 BRENDA   --E8
#> [31] CHARLOTTE--E3 CHARLOTTE--E4 CHARLOTTE--E5 CHARLOTTE--E7 FRANCES  --E3
#> [36] FRANCES  --E5 FRANCES  --E6 FRANCES  --E8 ELEANOR  --E5 ELEANOR  --E6
#> + ... omitted several edges
as_tidygraph(ison_southern_women) # now let's make it a tidygraph tbl_graph object
#> # A tbl_graph: 32 nodes and 93 edges
#> #
#> # A bipartite simple graph with 1 component
#> #
#> # Node Data: 32 × 2 (active)
#>   type  name     
#>   <lgl> <chr>    
#> 1 FALSE EVELYN   
#> 2 FALSE LAURA    
#> 3 FALSE THERESA  
#> 4 FALSE BRENDA   
#> 5 FALSE CHARLOTTE
#> 6 FALSE FRANCES  
#> # … with 26 more rows
#> #
#> # Edge Data: 93 × 2
#>    from    to
#>   <int> <int>
#> 1     1    19
#> 2     1    20
#> 3     1    21
#> # … with 90 more rows
as_network(ison_southern_women) # a network object
#>  Network attributes:
#>   vertices = 32 
#>   directed = FALSE 
#>   hyper = FALSE 
#>   loops = FALSE 
#>   multiple = FALSE 
#>   bipartite = 18 
#>   total edges= 93 
#>     missing edges= 0 
#>     non-missing edges= 93 
#> 
#>  Vertex attribute names: 
#>     vertex.names 
#> 
#> No edge attributes
as_matrix(ison_southern_women) # a matrix object
#>           E1 E2 E3 E4 E5 E6 E7 E8 E9 E10 E11 E12 E13 E14
#> EVELYN     1  1  1  1  1  1  0  1  1   0   0   0   0   0
#> LAURA      1  1  1  0  1  1  1  1  0   0   0   0   0   0
#> THERESA    0  1  1  1  1  1  1  1  1   0   0   0   0   0
#> BRENDA     1  0  1  1  1  1  1  1  0   0   0   0   0   0
#> CHARLOTTE  0  0  1  1  1  0  1  0  0   0   0   0   0   0
#> FRANCES    0  0  1  0  1  1  0  1  0   0   0   0   0   0
#> ELEANOR    0  0  0  0  1  1  1  1  0   0   0   0   0   0
#> PEARL      0  0  0  0  0  1  0  1  1   0   0   0   0   0
#> RUTH       0  0  0  0  1  0  1  1  1   0   0   0   0   0
#> VERNE      0  0  0  0  0  0  1  1  1   0   0   1   0   0
#> MYRA       0  0  0  0  0  0  0  1  1   1   0   1   0   0
#> KATHERINE  0  0  0  0  0  0  0  1  1   1   0   1   1   1
#> SYLVIA     0  0  0  0  0  0  1  1  1   1   0   1   1   1
#> NORA       0  0  0  0  0  1  1  0  1   1   1   1   1   1
#> HELEN      0  0  0  0  0  0  1  1  0   1   1   1   1   1
#> DOROTHY    0  0  0  0  0  0  0  1  1   1   0   1   0   0
#> OLIVIA     0  0  0  0  0  0  0  0  1   0   1   0   0   0
#> FLORA      0  0  0  0  0  0  0  0  1   0   1   0   0   0
# this is an incidence matrix since it is a two-mode network
# if it were a one-mode network, the function would return an adjacency matrix
as_edgelist(ison_southern_women) # an edgelist data frame/tibble
#> # A tibble: 93 × 2
#>    from   to   
#>    <chr>  <chr>
#>  1 EVELYN E1   
#>  2 EVELYN E2   
#>  3 EVELYN E3   
#>  4 EVELYN E4   
#>  5 EVELYN E5   
#>  6 EVELYN E6   
#>  7 EVELYN E8   
#>  8 EVELYN E9   
#>  9 LAURA  E1   
#> 10 LAURA  E2   
#> # … with 83 more rows
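
To make the relationship between the edgelist and matrix representations more concrete, here is a minimal base R sketch that converts a toy two-mode edgelist into an incidence matrix and back again. It is not how the as_ functions are implemented, only an illustration of the idea; the three ties are copied from the output above.

```r
# Toy two-mode edgelist (three ties copied from the output above).
edges <- data.frame(
  from = c("EVELYN", "EVELYN", "LAURA"),
  to   = c("E1", "E2", "E1"),
  stringsAsFactors = FALSE
)

# Edgelist -> incidence matrix: rows are first-mode nodes,
# columns are second-mode nodes, cells count ties.
inc <- unclass(table(edges$from, edges$to))
print(inc)

# Incidence matrix -> edgelist: keep the non-zero cells.
idx <- which(inc > 0, arr.ind = TRUE)
back <- data.frame(
  from = rownames(inc)[idx[, "row"]],
  to   = colnames(inc)[idx[, "col"]],
  stringsAsFactors = FALSE
)
back <- back[order(back$from, back$to), ]
print(back)
```

Note that an edgelist cannot represent isolates, so a round trip through an edgelist may lose nodes that a matrix or graph object would retain.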

Working with network data

Transforming network data

Generally, {migraph} attempts to retain as much information as possible when converting objects between different classes. The presumption is that users should explicitly decide to reduce or simplify their data. {migraph} includes a number of functions for transforming (or removing) certain properties of network objects. For example:

Then there are a few more special functions included here too:

to_unnamed(ison_marvel_relationships)
#> # A tbl_graph: 53 nodes and 558 edges
#> #
#> # An undirected multigraph with 4 components
#> #
#> # Node Data: 53 × 9 (active)
#>   Gender Appearances Attractive  Rich Intellect Omnilingual PowerOrigin
#>   <chr>        <int>      <int> <int>     <int>       <int> <chr>      
#> 1 Male           427          0     0         1           1 Radiation  
#> 2 Male           589          1     0         1           0 Human      
#> 3 Male          1207          0     0         1           1 Mutant     
#> 4 Male          7609          1     0         1           0 Mutant     
#> 5 Male          2189          1     1         1           0 Human      
#> 6 Female        2907          1     0         1           0 Human      
#> # … with 47 more rows, and 2 more variables: UnarmedCombat <int>,
#> #   ArmedCombat <int>
#> #
#> # Edge Data: 558 × 3
#>    from    to  sign
#>   <int> <int> <dbl>
#> 1     1     4    -1
#> 2     1    11    -1
#> 3     1    12    -1
#> # … with 555 more rows
to_named(ison_algebra)
#> # A tbl_graph: 16 nodes and 144 edges
#> #
#> # A directed simple graph with 1 component
#> #
#> # Node Data: 16 × 1 (active)
#>   name    
#>   <chr>   
#> 1 Jermaine
#> 2 Willie  
#> 3 Erma    
#> 4 Rita    
#> 5 Michele 
#> 6 Stacy   
#> # … with 10 more rows
#> #
#> # Edge Data: 144 × 5
#>    from    to friends social tasks
#>   <int> <int>   <dbl>  <dbl> <dbl>
#> 1     1     5       0   1.2    0.3
#> 2     1     8       0   0.15   0  
#> 3     1     9       0   2.85   0.3
#> # … with 141 more rows
to_undirected(ison_algebra)
#> # A tbl_graph: 16 nodes and 76 edges
#> #
#> # An undirected simple graph with 1 component
#> #
#> # Node Data: 16 × 1 (active)
#>   name    
#>   <chr>   
#> 1 Melinda 
#> 2 Abby    
#> 3 Darryl  
#> 4 Veronica
#> 5 Rylan   
#> 6 Lindsey 
#> # … with 10 more rows
#> #
#> # Edge Data: 76 × 5
#>    from    to friends social tasks
#>   <int> <int>   <dbl>  <dbl> <dbl>
#> 1     1     2       1   0      0  
#> 2     2     3       0   0.15   0  
#> 3     1     5       0   1.2    0.3
#> # … with 73 more rows
to_unsigned(ison_marvel_relationships, keep = "positive")
#> # A tbl_graph: 53 nodes and 277 edges
#> #
#> # An undirected simple graph with 6 components
#> #
#> # Node Data: 53 × 10 (active)
#>   name  Gender Appearances Attractive  Rich Intellect Omnilingual PowerOrigin
#>   <chr> <chr>        <int>      <int> <int>     <int>       <int> <chr>      
#> 1 Abom… Male           427          0     0         1           1 Radiation  
#> 2 Ant-… Male           589          1     0         1           0 Human      
#> 3 Apoc… Male          1207          0     0         1           1 Mutant     
#> 4 Beast Male          7609          1     0         1           0 Mutant     
#> 5 Blac… Male          2189          1     1         1           0 Human      
#> 6 Blac… Female        2907          1     0         1           0 Human      
#> # … with 47 more rows, and 2 more variables: UnarmedCombat <int>,
#> #   ArmedCombat <int>
#> #
#> # Edge Data: 277 × 2
#>    from    to
#>   <int> <int>
#> 1     2    25
#> 2     2    29
#> 3     2    44
#> # … with 274 more rows
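
What a transformation such as dropping signs or direction amounts to can be sketched on a plain edgelist in base R. This is only an illustration under simplifying assumptions (a toy edgelist with invented values, and no attempt to carry over other tie attributes); how the to_ functions handle such details internally is not shown here.

```r
# Toy directed, signed edgelist (invented values).
edges <- data.frame(
  from = c(1, 2, 1, 3),
  to   = c(2, 1, 3, 4),
  sign = c(1, 1, -1, 1)
)

# "Unsigned, keeping positive": drop negative ties and the sign column.
positive <- edges[edges$sign > 0, c("from", "to")]

# "Undirected": order each pair so that (2, 1) and (1, 2) coincide,
# then drop the duplicates that result.
undirected <- data.frame(
  from = pmin(positive$from, positive$to),
  to   = pmax(positive$from, positive$to)
)
undirected <- unique(undirected)
print(undirected)
```

Both steps discard information, which is why such simplifications are left as explicit choices for the user.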

Note that there are also functions for converting or ‘projecting’ two-mode networks into one-mode networks.

to_mode1(ison_southern_women)
#> IGRAPH 1960c78 UNW- 18 139 -- 
#> + attr: name (v/c), weight (e/n)
#> + edges from 1960c78 (vertex names):
#>  [1] EVELYN --LAURA     EVELYN --BRENDA    EVELYN --THERESA   EVELYN --CHARLOTTE
#>  [5] EVELYN --FRANCES   EVELYN --ELEANOR   EVELYN --RUTH      EVELYN --PEARL    
#>  [9] EVELYN --NORA      EVELYN --VERNE     EVELYN --MYRA      EVELYN --KATHERINE
#> [13] EVELYN --SYLVIA    EVELYN --HELEN     EVELYN --DOROTHY   EVELYN --OLIVIA   
#> [17] EVELYN --FLORA     LAURA  --BRENDA    LAURA  --THERESA   LAURA  --CHARLOTTE
#> [21] LAURA  --FRANCES   LAURA  --ELEANOR   LAURA  --RUTH      LAURA  --PEARL    
#> [25] LAURA  --NORA      LAURA  --VERNE     LAURA  --SYLVIA    LAURA  --HELEN    
#> [29] LAURA  --MYRA      LAURA  --KATHERINE LAURA  --DOROTHY   THERESA--BRENDA   
#> + ... omitted several edges
to_mode2(ison_southern_women)
#> IGRAPH 517e95e UNW- 14 66 -- 
#> + attr: name (v/c), weight (e/n)
#> + edges from 517e95e (vertex names):
#>  [1] E1 --E2  E1 --E3  E1 --E4  E1 --E5  E1 --E6  E1 --E8  E1 --E9  E1 --E7 
#>  [9] E2 --E3  E2 --E4  E2 --E5  E2 --E6  E2 --E8  E2 --E9  E2 --E7  E3 --E4 
#> [17] E3 --E5  E3 --E6  E3 --E8  E3 --E9  E3 --E7  E4 --E5  E4 --E6  E4 --E8 
#> [25] E4 --E9  E4 --E7  E5 --E6  E5 --E8  E5 --E9  E5 --E7  E6 --E8  E6 --E9 
#> [33] E6 --E7  E6 --E10 E6 --E11 E6 --E12 E6 --E13 E6 --E14 E7 --E8  E7 --E9 
#> [41] E7 --E12 E7 --E10 E7 --E13 E7 --E14 E7 --E11 E8 --E9  E8 --E12 E8 --E10
#> [49] E8 --E13 E8 --E14 E8 --E11 E9 --E12 E9 --E10 E9 --E13 E9 --E14 E9 --E11
#> [57] E10--E12 E10--E13 E10--E14 E10--E11 E11--E12 E11--E13 E11--E14 E12--E13
#> + ... omitted several edges
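
The idea behind a projection can be sketched with matrix multiplication in base R. The weight attribute in the output above suggests that ties count shared affiliations, but the exact weighting that to_mode1() and to_mode2() apply is an assumption here; the toy incidence matrix below is invented.

```r
# Toy incidence matrix: 3 people (rows) by 2 events (columns); invented.
inc <- matrix(
  c(1, 1,
    1, 0,
    0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("E1", "E2"))
)

# Project onto rows: entry [i, j] counts the events i and j share.
mode1 <- inc %*% t(inc)
diag(mode1) <- 0

# Project onto columns: entry [i, j] counts the attendees two events share.
mode2 <- t(inc) %*% inc
diag(mode2) <- 0

print(mode1)
print(mode2)
```

The diagonals are set to zero because a node's overlap with itself is not usually treated as a tie.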

Adding data

If you import one or more edgelists and nodelists, it can be useful to bind these together in an igraph, tidygraph, or network class object.

Adding nodal attributes to a given network is relatively straightforward. One can bind a single new attribute to the nodes with add_node_attribute() or copy a set of attributes from one network/graph to another with copy_node_attributes(). But often the easiest way to do this is to take a network/graph, make sure it is first coerced into a tidygraph object, and then add any additional nodal attributes (including measures from {migraph}) as follows:

as_tidygraph(mpn_elite_mex) %>% 
  mutate(order = 1:35,
         color = "red",
         degree = node_degree(mpn_elite_mex))
#> # A tbl_graph: 35 nodes and 117 edges
#> #
#> # An undirected simple graph with 1 component
#> #
#> # Node Data: 35 × 11 (active)
#>   name  full_name entry_year military in_mpn PlaceOfBirth state region order
#>   <chr> <chr>          <dbl>    <dbl>  <dbl> <chr>        <chr>  <dbl> <int>
#> 1 Trev… Trevino,…       1910        1      0 Guerrero     Coah…      1     1
#> 2 Made… Madero, …       1911        0      0 Parras de l… Coah…      1     2
#> 3 Carr… Carranza…       1913        1      0 Cuatro Cien… Coah…      1     3
#> 4 Agui… Aguilar,…       1918        1      0 Cordoba      Vera…      3     4
#> 5 Obre… Obregon,…       1920        1      0 Siquisiva, … Sono…      1     5
#> 6 Call… Calles, …       1924        1      0 Guaymas      Sono…      1     6
#> # … with 29 more rows, and 2 more variables: color <chr>, degree <node_msr>
#> #
#> # Edge Data: 117 × 2
#>    from    to
#>   <int> <int>
#> 1     2     3
#> 2     2     5
#> 3     2     6
#> # … with 114 more rows
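
The same three kinds of attribute (a running index, a constant, and a computed measure) can be sketched on a plain nodelist in base R. The data below are invented, and the degree computed here is a raw count of tie endpoints; node_degree() in {migraph} may scale or class its result differently.

```r
# Toy nodelist and undirected edgelist (invented).
nodes <- data.frame(name = c("A", "B", "C", "D"), stringsAsFactors = FALSE)
edges <- data.frame(from = c("A", "A", "B"), to = c("B", "C", "C"),
                    stringsAsFactors = FALSE)

# A running index, written so it adapts to the number of nodes.
nodes$order <- seq_len(nrow(nodes))

# A constant attribute is recycled across all rows.
nodes$color <- "red"

# Degree: how often each node appears at either end of a tie.
counts <- table(factor(c(edges$from, edges$to), levels = nodes$name))
nodes$degree <- as.integer(counts)
print(nodes)
```

Using seq_len(nrow(nodes)) rather than a hard-coded range such as 1:35 keeps the code valid if the number of nodes changes.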

Adding edge attributes or new edges is not quite so straightforward, in part because you first need to decide which of the two you want to do. If you just want to add a new tie attribute to an existing set of ties, without adding any new edges, then add_tie_attributes() operates similarly to add_node_attribute() above. But if the result should be a multiplex network and the ties in the different component networks only partially overlap, then you will need to use join_ties():

generate_random(10, .3) %>% 
  join_ties(generate_random(10, .3), "next")
#> # A tbl_graph: 10 nodes and 24 edges
#> #
#> # An undirected simple graph with 1 component
#> #
#> # Node Data: 10 × 0 (active)
#> # … with 4 more rows
#> #
#> # Edge Data: 24 × 4
#>    from    to  orig `next`
#>   <int> <int> <dbl>  <dbl>
#> 1     1     4     0      1
#> 2     1     7     1      0
#> 3     1     8     0      1
#> # … with 21 more rows
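
The output above resembles a full outer join of two edgelists, with one indicator column per network. Assuming that reading is right, the logic can be sketched in base R with merge(). The edgelists are invented, and the second indicator is called nxt rather than next to sidestep R's reserved word.

```r
# Two toy undirected edgelists over the same nodes (invented).
first  <- data.frame(from = c(1, 1, 2), to = c(2, 3, 3))
second <- data.frame(from = c(1, 2),    to = c(3, 4))

# Flag membership in each network before merging.
first$orig <- 1
second$nxt <- 1

# Full outer join on the tie endpoints keeps every tie from either network.
joined <- merge(first, second, by = c("from", "to"), all = TRUE)

# A tie absent from one network gets 0 rather than NA for that network.
joined[is.na(joined)] <- 0
joined <- joined[order(joined$from, joined$to), ]
print(joined)
```

This sketch assumes each tie is already written with the smaller node id first; for undirected data where that is not guaranteed, the endpoints would need to be put in a consistent order before merging.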

Retrieving data

Lastly, sometimes we want to extract certain information from a network or graph object. Here too {migraph} has you covered.

node_names(mpn_elite_mex) # gets the names of the nodes
#>  [1] "Trevino"            "Madero"             "Carranza"          
#>  [4] "Aguilar"            "Obregon"            "Calles"            
#>  [7] "Aleman Gonzalez"    "Portes Gil"         "L. Cardenas"       
#> [10] "Avila Camacho"      "I. Beteta"          "Jara"              
#> [13] "R. Beteta"          "Aleman Valdes"      "Sanchez Taboada"   
#> [16] "Serra Rojas"        "Ruiz Galindo"       "Bustamante"        
#> [19] "Loyo"               "Carvajal"           "Ruiz Cortines"     
#> [22] "Carrillo Flores"    "Ortiz Mena"         "Gonzalez Blanco"   
#> [25] "Salinas Lozano"     "Lopez Mateos"       "Margain"           
#> [28] "Diaz Ordaz"         "M.R. Beteta"        "Echeverria Alvarez"
#> [31] "Lopez Portillo"     "C. Cardenas"        "De la Madrid"      
#> [34] "Salinas de Gortari" "Aleman Velasco"
node_attribute(ison_marvel_relationships, "Gender") # gets any named nodal attribute
#>  [1] "Male"   "Male"   "Male"   "Male"   "Male"   "Female" "Male"   "Male"  
#>  [9] "Male"   "Male"   "Male"   "Male"   "Male"   "Male"   "Male"   "Male"  
#> [17] "Male"   "Female" "Male"   "Male"   "Male"   "Male"   "Male"   "Male"  
#> [25] "Female" "Male"   "Male"   "Female" "Female" "Male"   "Male"   "Female"
#> [33] "Female" "Male"   "Male"   "Male"   "Female" "Male"   "Male"   "Male"  
#> [41] "Male"   "Male"   "Male"   "Female" "Male"   "Male"   "Female" "Male"  
#> [49] "Male"   "Male"   "Male"   "Male"   "Male"
tie_attribute(ison_marvel_relationships, "sign") # gets any named edge attribute
#>   [1] -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1  1  1  1 -1 -1 -1  1 -1 -1 -1 -1 -1 -1
#>  [26] -1 -1 -1 -1 -1 -1 -1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
#>  [51]  1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1  1  1  1  1  1  1  1  1  1  1  1  1
#>  [76]  1  1  1  1  1  1  1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1  1  1  1  1  1  1
#> [101]  1  1  1  1  1 -1 -1 -1 -1 -1 -1  1  1  1  1  1  1  1  1 -1 -1 -1 -1  1  1
#> [126]  1  1  1  1  1  1  1  1  1  1  1  1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1
#> [151]  1  1  1  1  1  1  1  1  1  1 -1 -1 -1 -1 -1 -1  1  1  1  1  1  1  1  1  1
#> [176]  1  1  1  1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1  1  1  1  1  1  1  1
#> [201]  1 -1 -1 -1 -1 -1 -1  1  1  1  1  1  1  1  1  1 -1 -1 -1 -1 -1 -1 -1 -1  1
#> [226]  1  1  1  1  1  1  1  1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
#> [251] -1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1  1  1  1  1  1  1  1  1  1  1 -1 -1 -1
#> [276] -1  1  1 -1  1  1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1  1 -1 -1
#> [301]  1 -1 -1 -1 -1 -1 -1 -1  1  1  1  1  1  1 -1  1  1  1  1  1  1  1  1  1 -1
#> [326] -1 -1 -1 -1 -1 -1 -1  1  1  1  1  1  1  1  1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1
#> [351] -1 -1 -1 -1 -1 -1 -1  1  1  1  1  1  1  1  1  1  1  1  1 -1 -1 -1 -1 -1 -1
#> [376] -1 -1 -1  1  1  1  1  1  1  1  1  1  1  1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1
#> [401]  1  1  1  1  1  1  1  1  1 -1 -1 -1 -1 -1 -1 -1  1  1 -1  1  1 -1 -1 -1 -1
#> [426] -1 -1 -1 -1 -1 -1 -1 -1  1  1 -1 -1  1  1  1  1  1  1 -1 -1 -1  1 -1 -1 -1
#> [451] -1 -1 -1  1  1 -1 -1 -1 -1 -1  1  1  1  1  1  1  1 -1 -1 -1 -1 -1 -1 -1 -1
#> [476] -1 -1 -1  1  1  1  1  1 -1 -1 -1 -1 -1  1  1  1  1  1 -1 -1  1  1  1  1  1
#> [501]  1  1 -1  1  1  1  1 -1  1  1  1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1  1  1  1
#> [526] -1 -1 -1 -1  1  1  1  1 -1 -1  1  1  1  1  1 -1 -1 -1  1  1  1 -1 -1 -1 -1
#> [551]  1  1 -1 -1  1 -1  1 -1
tie_weights(mpn_elite_mex) # gets the weights of the ties
#>   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#>  [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#>  [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [112] 1 1 1 1 1 1

We can describe the network using similar functions. How many nodes are in the network, or how many ties?

graph_nodes(mpn_elite_mex)
#> [1] 35
graph_ties(mpn_elite_mex)
#> [1] 117
graph_dims(mpn_elite_mex)
#> [1] 35
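
For data that are still in edgelist form, comparable counts can be obtained in base R. This is a minimal sketch on an invented edgelist, not a description of how the graph_ functions work.

```r
# Toy one-mode edgelist with integer node ids (invented).
edges <- data.frame(from = c(1, 1, 2, 4), to = c(2, 3, 3, 5))

# Number of ties: one row per tie.
n_ties <- nrow(edges)

# Number of nodes that appear in at least one tie.
n_nodes <- length(unique(c(edges$from, edges$to)))

print(c(nodes = n_nodes, ties = n_ties))
```

Because isolates never appear in an edgelist, a node count obtained this way can be lower than the count from a full network object.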

  1. Note that if you import from a .csv file, please specify whether the separation value should be commas (sv = "comma") or semi-colons (sv = "semi-colon").