Introduction to PxWebApiData

Øyvind Langsrud, Jan Bruusgaard, Solveig Bjørkholt and Susie Jentoft

2021-10-11

Preface

An introduction to the R package PxWebApiData is given below. Three calls to the main function, ApiData, are demonstrated. The first two calls read data sets, and the last call captures metadata. In practice, however, one may want to look at the metadata first. Then three more examples and some background are given.

Note that the text below was written before the package gained functions that return a single data set (ApiData1, ApiData2 and ApiData12).

Specification by variable indices and variable IDs

The dataset below has three variables: Region, ContentsCode and Tid. Each variable can be used as an input parameter. Here, two of the parameters are specified by value IDs (codes) and one parameter is specified by indices. Negative values denote indices counted from the end, so -1 is the last value and -2 the second to last. Thus, Tid = c(1, 2, -2, -1) gives the first two and the last two years in the data.

A list of two data frames is returned: the label version and the ID version.

ApiData("http://data.ssb.no/api/v0/en/table/04861", 
        Region = c("1103", "0301"), ContentsCode = "Bosatte", Tid = c(1, 2, -2, -1))
$`04861: Area and population of urban settlements, by region, contents and year`
             region            contents year  value
1 Oslo municipality Number of residents 2000 504348
2 Oslo municipality Number of residents 2002 508134
3 Oslo municipality Number of residents 2019 677139
4 Oslo municipality Number of residents 2020 689560
5         Stavanger Number of residents 2000 106804
6         Stavanger Number of residents 2002 108271
7         Stavanger Number of residents 2019 132771
8         Stavanger Number of residents 2020 137663

$dataset
  Region ContentsCode  Tid  value
1   0301      Bosatte 2000 504348
2   0301      Bosatte 2002 508134
3   0301      Bosatte 2019 677139
4   0301      Bosatte 2020 689560
5   1103      Bosatte 2000 106804
6   1103      Bosatte 2002 108271
7   1103      Bosatte 2019 132771
8   1103      Bosatte 2020 137663
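To make the index rule concrete, the sketch below resolves numeric indices against the Tid codes listed under “Obtaining meta data”. The helper resolve_indices is hypothetical and not part of PxWebApiData; it only mimics the behaviour described above, where -1 denotes the last value and -2 the second to last.

```r
# Hypothetical helper (not part of PxWebApiData): map numeric indices onto
# the available value codes, counting negative indices from the end.
resolve_indices <- function(values, idx) {
  n <- length(values)
  # -1 maps to position n, -2 to position n - 1, and so on
  pos <- ifelse(idx < 0, n + 1 + idx, idx)
  values[pos]
}

# Tid codes for table 04861 as listed in the metadata output
tid <- c("2000", "2002", "2003", "2004", "2005", "2006", "2007", "2008",
         "2009", "2011", "2012", "2013", "2014", "2015", "2016", "2017",
         "2018", "2019", "2020")

resolve_indices(tid, c(1, 2, -2, -1))
## [1] "2000" "2002" "2019" "2020"
```

The result matches the years in the output above. Note that the indices refer to positions among the available values, not to calendar years, which matters here because 2001 and 2010 are missing from the table.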

Specification by TRUE, FALSE and imaginary values (e.g. 3i)

All possible values are obtained by TRUE, which corresponds to “all” in the API query. A variable is eliminated by FALSE. An imaginary value corresponds to “top” in the API query; for example, Tid = 3i gives the last three years.

x <- ApiData("http://data.ssb.no/api/v0/en/table/04861", 
        Region = FALSE, ContentsCode = TRUE, Tid = 3i)

Either the label version or the ID version can then be selected:

x[[1]]
                         contents year      value
1 Area of urban settlements (km²) 2018    2205.07
2 Area of urban settlements (km²) 2019    2206.45
3 Area of urban settlements (km²) 2020    2218.08
4             Number of residents 2018 4327937.00
5             Number of residents 2019 4368614.00
6             Number of residents 2020 4416981.00
x[[2]]
  ContentsCode  Tid      value
1        Areal 2018    2205.07
2        Areal 2019    2206.45
3        Areal 2020    2218.08
4      Bosatte 2018 4327937.00
5      Bosatte 2019 4368614.00
6      Bosatte 2020 4416981.00
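The specification rules in this section can be summarised in a small sketch. The function describe_spec is hypothetical and not part of PxWebApiData; it only reports which kind of PxWebApi selection a specification corresponds to according to the text above, and does not build the actual query.

```r
# Hypothetical helper (not part of PxWebApiData): report which kind of
# selection an ApiData-style specification stands for.
describe_spec <- function(spec) {
  if (isTRUE(spec)) {
    "all"                        # every value ("all" filter with "*")
  } else if (isFALSE(spec)) {
    "eliminated"                 # variable left out of the query
  } else if (is.complex(spec)) {
    paste0("top ", Im(spec))     # imaginary part = number of values
  } else if (is.numeric(spec)) {
    "indices"                    # positions; negative counts from the end
  } else {
    "item"                       # explicit value codes
  }
}

# The specification used in the example above; this returns
# c(Region = "eliminated", ContentsCode = "all", Tid = "top 3")
sapply(list(Region = FALSE, ContentsCode = TRUE, Tid = 3i), describe_spec)
```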

Obtaining meta data

Metadata about the data set can be obtained with returnMetaFrames = TRUE.

ApiData("http://data.ssb.no/api/v0/en/table/04861",  returnMetaFrames = TRUE)
$Region
   values     valueTexts
1    3001         Halden
2    3002           Moss
3    3003      Sarpsborg
4    3004    Fredrikstad
5    3005        Drammen
6    3006      Kongsberg
7    3007      Ringerike
8    3011         Hvaler
9    3012        Aremark
10   3013         Marker
11   3014  Indre Østfold
12   3015       Skiptvet
13   3016      Rakkestad
14   3017           Råde
15   3018  Våler (Viken)
16   3019         Vestby
17   3020   Nordre Follo
18   3021             Ås
19   3022          Frogn
20   3023       Nesodden
21   3024          Bærum
22   3025          Asker
23   3026 Aurskog-Høland
24   3027       Rælingen
25   3028        Enebakk
 [ reached 'max' / getOption("max.print") -- omitted 796 rows ]

$ContentsCode
   values                      valueTexts
1   Areal Area of urban settlements (km²)
2 Bosatte             Number of residents

$Tid
   values valueTexts
1    2000       2000
2    2002       2002
3    2003       2003
4    2004       2004
5    2005       2005
6    2006       2006
7    2007       2007
8    2008       2008
9    2009       2009
10   2011       2011
11   2012       2012
12   2013       2013
13   2014       2014
14   2015       2015
15   2016       2016
16   2017       2017
17   2018       2018
18   2019       2019
19   2020       2020

attr(,"text")
      Region ContentsCode          Tid 
    "region"   "contents"       "year" 
attr(,"elimination")
      Region ContentsCode          Tid 
        TRUE        FALSE        FALSE 
attr(,"time")
      Region ContentsCode          Tid 
       FALSE        FALSE         TRUE 
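The attributes printed at the end of the metadata output can be queried directly. The sketch below rebuilds only those three attributes as a mock object (an assumption for illustration; the object returned by returnMetaFrames = TRUE also contains one data frame per variable) and lists the variables that may be eliminated with FALSE, as well as the time variable.

```r
# Mock object reproducing only the three attributes printed above
meta <- structure(
  list(),
  text        = c(Region = "region", ContentsCode = "contents", Tid = "year"),
  elimination = c(Region = TRUE, ContentsCode = FALSE, Tid = FALSE),
  time        = c(Region = FALSE, ContentsCode = FALSE, Tid = TRUE)
)

# Variables that may be eliminated, i.e. given as FALSE in ApiData
names(which(attr(meta, "elimination")))
## [1] "Region"

# The time variable of the table
names(which(attr(meta, "time")))
## [1] "Tid"
```

With a real metadata object, attr() is applied to the result of the ApiData call in the same way.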

Aggregations using filter agg:

PxWebApi has two more filters for groupings, agg: and vs:. These filters appear in the code under “API Query for this table” after you have made a table in PxWeb.

agg: is used for readymade aggregation groupings. The example below uses aggregation into age groups and aggregated time series for the new Norwegian municipality structure from 2020.

ApiData("http://data.ssb.no/api/v0/no/table/07459", 
        Region = list("agg:KommSummer", c("K-3001", "K-3002")), 
        Tid = 3i,
        Alder = list("agg:TodeltGrupperingB", c("H17", "H18")),
        Kjonn = TRUE)
$`07459: Befolkning, etter region, kjønn, alder, statistikkvariabel og år`
  region   kjønn             alder statistikkvariabel   år value
1 Halden    Menn           0-17 år           Personer 2019  3209
2 Halden    Menn           0-17 år           Personer 2020  3197
3 Halden    Menn           0-17 år           Personer 2021  3148
4 Halden    Menn 18 år eller eldre           Personer 2019 12509
5 Halden    Menn 18 år eller eldre           Personer 2020 12609
6 Halden    Menn 18 år eller eldre           Personer 2021 12674
7 Halden Kvinner           0-17 år           Personer 2019  3005
8 Halden Kvinner           0-17 år           Personer 2020  3023
 [ reached 'max' / getOption("max.print") -- omitted 16 rows ]

$dataset
  Region Kjonn Alder ContentsCode  Tid value
1 K-3001     1   H17    Personer1 2019  3209
2 K-3001     1   H17    Personer1 2020  3197
3 K-3001     1   H17    Personer1 2021  3148
4 K-3001     1   H18    Personer1 2019 12509
5 K-3001     1   H18    Personer1 2020 12609
6 K-3001     1   H18    Personer1 2021 12674
7 K-3001     2   H17    Personer1 2019  3005
8 K-3001     2   H17    Personer1 2020  3023
 [ reached 'max' / getOption("max.print") -- omitted 16 rows ]

PxWebApi has two limitations here:

  1. The name of the filter and the IDs are not shown in metadata, only in the code “API Query for this table”.
  2. It is only possible to give single elements as input. The filter “all” ("*") does not work with agg: and vs:.

The other filter, vs:, specifies grouping value sets, which are subsets of the value pool. Since only single elements can be given as input, it is easier to query the value pool directly. That means vs: is redundant.

In this example, Region is the value pool and Fylker (counties) is the value set. These two specifications return the same result:

  Region = list("vs:Fylker", c("01", "02"))
  Region = list(c("01", "02"))

Return the API query as JSON

In PxWebApi, the query itself is formulated in JSON. With returnApiQuery = TRUE, ApiData returns this JSON query instead of the data, which can be useful for debugging.

ApiData("http://data.ssb.no/api/v0/en/table/04861",  returnApiQuery = TRUE)
{
  "query": [
    {
      "code": "Region",
      "selection": {
        "filter": "item",
        "values": ["3001", "2399", "9999"]
      }
    },
    {
      "code": "ContentsCode",
      "selection": {
        "filter": "item",
        "values": ["Areal", "Bosatte"]
      }
    },
    {
      "code": "Tid",
      "selection": {
        "filter": "item",
        "values": ["2000", "2019", "2020"]
      }
    }
  ],
  "response": {
    "format": "json-stat"
  }
} 
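The JSON above uses the item filter for every variable. As a rough illustration of that structure, the hypothetical helper item_query below (not part of PxWebApiData) assembles a query of the same shape in base R for explicit value codes. It assumes the codes contain no quotes or backslashes, and the package's own query generation may differ in details, so compare with the output of returnApiQuery = TRUE.

```r
# Hypothetical helper (not part of PxWebApiData): build a PxWebApi query in
# which every variable uses the "item" filter with explicit value codes.
item_query <- function(selection, format = "json-stat") {
  parts <- vapply(names(selection), function(code) {
    # Quote each code; assumes codes contain no quotes or backslashes
    vals <- paste0('"', selection[[code]], '"', collapse = ", ")
    sprintf('{"code": "%s", "selection": {"filter": "item", "values": [%s]}}',
            code, vals)
  }, character(1))
  sprintf('{"query": [%s], "response": {"format": "%s"}}',
          paste(parts, collapse = ", "), format)
}

# Codes from the first example in this introduction
q <- item_query(list(Region = c("0301", "1103"),
                     ContentsCode = "Bosatte",
                     Tid = c("2000", "2002", "2019", "2020")))
cat(q)
```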

Readymade datasets

Statistics Norway also provides an API with readymade datasets, available by HTTP GET. To use it, set getDataByGET = TRUE. Changing the URL to lang=no gives the label version in Norwegian.

This dataset is from the Economic trends forecasts.

x <- ApiData("https://data.ssb.no/api/v0/dataset/934516.json?lang=en", getDataByGET = TRUE)
x[[1]]
   year                                         contents value
1  2021                           Gross domestic product   3.0
2  2021                              GDP Mainland Norway   3.6
3  2021                                 Employed persons   0.7
4  2021                        Unemployment rate (level)   4.7
5  2021                      Wages per standard man-year   3.1
6  2021                       Consumer price index (CPI)   3.3
7  2021                                          CPI-ATE   1.9
8  2021                                   Housing prices   9.7
9  2021                        Money market rate (level)   0.5
10 2021 Import-weighted NOK exchange rate (44 countries)  -5.0
11 2022                           Gross domestic product   4.1
12 2022                              GDP Mainland Norway   3.8
13 2022                                 Employed persons   1.4
14 2022                        Unemployment rate (level)   4.4
15 2022                      Wages per standard man-year   3.1
16 2022                       Consumer price index (CPI)   1.9
 [ reached 'max' / getOption("max.print") -- omitted 24 rows ]

Practical example

We would like to extract the number of female R&D personnel in the services sector of the Norwegian business sector for the years 2017 and 2018.

  1. Locate the table at https://www.ssb.no that contains information on R&D personnel. Having found the relevant table, table 07964, we create the link https://data.ssb.no/api/v0/no/table/07964/

  2. Load the package.

library(PxWebApiData)
  3. Check which variables exist in the data.
variables <- ApiData("https://data.ssb.no/api/v0/no/table/07964/", 
                     returnMetaFrames = TRUE)

names(variables)
## [1] "NACE2007"     "ContentsCode" "Tid"
  4. Check which values each variable contains.
values <- ApiData("https://data.ssb.no/api/v0/no/table/07964/", 
                  returnMetaData = TRUE)

values[[1]]$values
##  [1] "A-N"       "A03"       "B05-B09"   "B06_B09.1" "C"         "C10-C11"  
##  [7] "C13"       "C14-C15"   "C16"       "C17"       "C18"       "C19-C20"  
## [13] "C21"       "C22"       "C23"       "C24"       "C25"       "C26"      
## [19] "C26.3"     "C26.5"     "C27"       "C28"       "C29"       "C30"      
## [25] "C30.1"     "C31"       "C32"       "C32.5"     "C33"       "D35"      
## [31] "E36-E39"   "F41-F43"   "G-N"       "G46"       "H49-H53"   "J58"      
## [37] "J58.2"     "J59-J60"   "J61"       "J62"       "J63"       "K64-K66"  
## [43] "M70"       "M71"       "M72"       "M74.9"     "N82.9"
values[[2]]$values
## [1] "EnhetTot"           "EnheterFoU"         "FoUpersonale"      
## [4] "KvinneligFoUpers"   "FoUPersonaleUoHutd" "FoUPersonaleDoktor"
## [7] "FoUArsverk"         "FoUArsverkPers"     "FoUArsverkUtd"
values[[3]]$values
##  [1] "2007" "2008" "2009" "2010" "2011" "2012" "2013" "2014" "2015" "2016"
## [11] "2017" "2018" "2019"
  5. Specify these variables in the query to select the values we want.
data <- ApiData("https://data.ssb.no/api/v0/en/table/07964/",
                Tid = c("2017", "2018"), # Select the years 2017 and 2018
                NACE2007 = "G-N", # Select the services sector
                ContentsCode = c("KvinneligFoUpers")) # Select female R&D personnel

data <- data[[1]] # Extract the first list element, which contains full variable names.

head(data)
##   industry (SIC2007)             contents year value
## 1     Services total Female R&D personnel 2017  4408
## 2     Services total Female R&D personnel 2018  4528
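Finding the right codes among the metadata values can be done with ordinary string matching. The sketch below works on vectors copied from the values[[2]]$values and values[[3]]$values output above, so it runs offline; in a live session one would apply grep directly to values[[2]]$values instead.

```r
# ContentsCode values for table 07964, copied from the metadata output above
contents_codes <- c("EnhetTot", "EnheterFoU", "FoUpersonale",
                    "KvinneligFoUpers", "FoUPersonaleUoHutd",
                    "FoUPersonaleDoktor", "FoUArsverk", "FoUArsverkPers",
                    "FoUArsverkUtd")
# Tid values 2007-2019, as listed in the metadata output above
years <- as.character(2007:2019)

# "Kvinnelig" is Norwegian for "female"
grep("Kvinne", contents_codes, value = TRUE)
## [1] "KvinneligFoUpers"

# Check that the requested years are available before querying
all(c("2017", "2018") %in% years)
## [1] TRUE
```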

Background

PxWeb and its API, PxWebApi, are used as the output database (Statbank) by many statistical agencies in the Nordic countries and elsewhere, e.g. Statistics Norway, Statistics Finland and Statistics Sweden. See the list of installations: https://www.scb.se/en/services/statistical-programs-for-px-files/px-web/pxweb-examples/