Reading and Writing JSON

Parsing from JSON

There are already a couple of packages to parse JSON (e.g. rjson or jsonlite). But for me they all had some drawbacks when working with JSON columns from a database:

jsontools relies on jsonlite for parsing from JSON and conversion to JSON. For parsing it has the following functions:

As an example assume we got the following JSON vector

library(jsontools)

customer_infos <- json2(c(
  '{"name": "Peter", "age": 19, "premium": true}',
  '{"name": "Daniel", "age": 41}',
  NA,
  '{"name": "Pablo", "age": 27, "premium": false}'
))

customer_infos
#> {"name": "Peter", "age": 19, "premium": true}
#> {"name": "Daniel", "age": 41}
#> NA
#> {"name": "Pablo", "age": 27, "premium": false}

By default the NA produces an error

parse_json_vector(customer_infos)
#> Error: input is NA.
#> To use a default value use the argument `.na`.

but with the .na argument we can specify which value should be used instead:

parse_json_vector(customer_infos, .na = NULL) %>% str()
#> List of 4
#>  $ :List of 3
#>   ..$ name   : chr "Peter"
#>   ..$ age    : int 19
#>   ..$ premium: logi TRUE
#>  $ :List of 2
#>   ..$ name: chr "Daniel"
#>   ..$ age : int 41
#>  $ : NULL
#>  $ :List of 3
#>   ..$ name   : chr "Pablo"
#>   ..$ age    : int 27
#>   ..$ premium: logi FALSE

Note that the default for simplifyDataFrame and simplifyMatrix has been changed to FALSE. While they can be quite nice for interactive parsing of JSON they do not allow you to control how the result should look like. For a more controlled conversion of the resulting list to a data frame have a look at the tibblify package.

Conversion to JSON

For conversion there are the following functions:

Let’s look at our customer informations from above. Assume we have converted it into a tibble:

customer_infos_df <- tibble::tibble(
  name = c("Peter", "Daniel", NA, "Pablo"),
  age = c(19L, 41L, NA, 27L),
  premium = c(TRUE, NA, NA, FALSE)
)

and filled the missing fields:

customer_infos_df[3, ] <- tibble::tibble(
  name = "Michael",
  age = 38,
  premium = FALSE
)
customer_infos_df$premium[2] <- TRUE

customer_infos_df
#> # A tibble: 4 x 3
#>   name      age premium
#>   <chr>   <int> <lgl>  
#> 1 Peter      19 TRUE   
#> 2 Daniel     41 TRUE   
#> 3 Michael    38 FALSE  
#> 4 Pablo      27 FALSE

We can now easily convert it back into a JSON vector

format_json_rowwise(customer_infos_df)
#> {"name":"Peter","age":19,"premium":true}
#> {"name":"Daniel","age":41,"premium":true}
#> {"name":"Michael","age":38,"premium":false}
#> {"name":"Pablo","age":27,"premium":false}

Compare this to format_json() which turns the data frame into an array of objects:

format_json(customer_infos_df)
#> [{"name":"Peter","age":19,"premium":true},{"name":"Daniel","age":41,"premium":true},{"name":"Michael","age":38,"premium":false},{"name":"Pablo","age":27,"premium":false}]

To format every element of a list to JSON instead of the whole list at once use format_json_list():

customer_infos_list <- parse_json_vector(customer_infos, .na = NULL)
str(customer_infos_list)
#> List of 4
#>  $ :List of 3
#>   ..$ name   : chr "Peter"
#>   ..$ age    : int 19
#>   ..$ premium: logi TRUE
#>  $ :List of 2
#>   ..$ name: chr "Daniel"
#>   ..$ age : int 41
#>  $ : NULL
#>  $ :List of 3
#>   ..$ name   : chr "Pablo"
#>   ..$ age    : int 27
#>   ..$ premium: logi FALSE

format_json_list(customer_infos_list)
#> {"name":["Peter"],"age":[19],"premium":[true]}
#> {"name":["Daniel"],"age":[41]}
#> {}
#> {"name":["Pablo"],"age":[27],"premium":[false]}

Notice that each value is wrapped in a JSON array? If you do not want this you have three possibilities:

While auto_unbox = TRUE is quite handy you should only use it in non-interactive scripts as otherwise the output JSON might change the format. For example we a value customers in a list. Usually, customers is a vector of length > 1.

x <- list(customers = 1:2)
format_json(x, auto_unbox = TRUE)
#> {"customers":[1,2]}

But in some rare cases there is only a single customer

x <- list(customers = 1)
format_json(x, auto_unbox = TRUE)
#> {"customers":1}

and instead of an array we get a scalar integer at the key “customers”.