This package provides cleaned and formatted data for entity resolution (record linkage or de-duplication) from the CD data set, which contains a total of 9763 records.
The data set comes with a corresponding “gold” standard data set: each record carries a label that can be treated as a unique identifier, so records that share the same id are a match.
# Install the development version from GitHub
devtools::install_github("resteorts/cd")
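
To illustrate how gold labels of this kind identify matches, here is a minimal sketch in base R. It does not use the package's actual objects: the vector `gold_id` and its values are made up for illustration, and the names of the data objects shipped with the package are not assumed here.

```r
# Toy stand-in for the gold labels. These ids are hypothetical and are
# NOT taken from the CD data set.
gold_id <- c(101, 102, 101, 103, 102, 101)

# Enumerate every unordered pair of record indices.
pairs <- t(combn(seq_along(gold_id), 2))
colnames(pairs) <- c("record_a", "record_b")

# A pair is a true match when both records carry the same gold id.
is_match <- gold_id[pairs[, "record_a"]] == gold_id[pairs[, "record_b"]]
matches <- pairs[is_match, , drop = FALSE]

# Matching record pairs and the number of distinct entities.
print(matches)
n_entities <- length(unique(gold_id))
print(n_entities)
```

Enumerating all pairs is only practical for small examples. With 9763 records there are roughly 47.7 million pairs, so in practice it is likely better to group records by id (for example with `split(seq_along(gold_id), gold_id)`) than to compare every pair.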