library(cbioportalR)
library(dplyr)
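We will outline the main data retrieval workflow and functions using a case study based on two public data sets: a nonmuscle invasive bladder cancer (NMIBC) study of 105 samples (Pietzak et al. Eur Urol 2017) and a prostate cancer study of 18 tumor/normal pairs (Granlund et al. Cell Metab 2020).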
Before accessing data you will need to connect to a cBioPortal database and set your base URL for the R session. In this example we will use data from the public cBioPortal database instance (https://www.cbioportal.org). You do not need a token to access this public website. If you are using a private instance of cBioPortal (like MSK’s institutional database), you will need to acquire a token and save it to your .Renviron file.
Note: If you are an MSK researcher working on IMPACT, you should connect to MSK’s cBioPortal instance to get the most up-to-date IMPACT data, and you must follow MSK-IMPACT publication guidelines when using the data.
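Before connecting to a private instance, it can help to confirm that R can actually see a saved token. The sketch below is a minimal check in base R; the environment variable name CBIOPORTAL_TOKEN is an assumption here, so confirm the exact name in the package documentation for your version.

```r
# Assumption: the token was saved in .Renviron under the name CBIOPORTAL_TOKEN.
# Sys.getenv() returns an empty string when the variable is not set.
token <- Sys.getenv("CBIOPORTAL_TOKEN")

if (nzchar(token)) {
  message("A token is set for this R session.")
} else {
  message("No token found. This is fine for the public instance.")
}
```

Changes to .Renviron only take effect after R is restarted.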
To set the database URL for your current R session, use the set_cbioportal_db() function. To set it to the public instance you can either provide the full URL to the function, or just "public" as a shortcut. This function will both check your connection to the database and set the URL (www.cbioportal.org/api) as your base URL for all future API calls during your session.
set_cbioportal_db("public")
#> ✔ You are successfully connected!
#> ✔ base_url for this R session is now set to "www.cbioportal.org/api"
You can use test_cbioportal_db() at any time throughout your session to check your connection. This can be helpful when troubleshooting issues with your API calls.
test_cbioportal_db()
#> ✔ You are successfully connected!
Now that we are successfully connected, we may want to view all studies available for our chosen database to find the correct study_id corresponding to the data we want to pull. All studies have a unique identifier in the database. You can view all studies available in your database with the following:
all_studies <- available_studies()
all_studies
#> # A tibble: 348 × 13
#> studyId name description publicStudy groups status importDate allSampleCount readPermission cancerTypeId
#> <chr> <chr> <chr> <lgl> <chr> <int> <chr> <int> <lgl> <chr>
#> 1 acc_tcga Adre… "TCGA Adre… TRUE "PUBL… 0 2022-03-0… 92 TRUE acc
#> 2 blca_plasmacyt… Blad… "Whole exo… TRUE "" 0 2022-03-0… 34 TRUE blca
#> 3 bcc_unige_2016 Basa… "Whole-exo… TRUE "PUBL… 0 2022-03-0… 293 TRUE bcc
#> 4 all_stjude_2015 Acut… "Comprehen… TRUE "PUBL… 0 2022-03-0… 93 TRUE bll
#> 5 ampca_bcm_2016 Ampu… "Exome seq… TRUE "PUBL… 0 2022-03-0… 160 TRUE ampca
#> 6 blca_dfarber_m… Blad… "Whole exo… TRUE "PUBL… 0 2022-03-0… 50 TRUE blca
#> 7 blca_mskcc_sol… Blad… "Comprehen… TRUE "PUBL… 0 2022-03-0… 97 TRUE blca
#> 8 blca_bgi Blad… "Whole-exo… TRUE "PUBL… 0 2022-03-0… 99 TRUE blca
#> 9 blca_mskcc_sol… Blad… "Genomic P… TRUE "PUBL… 0 2022-03-0… 109 TRUE blca
#> 10 all_stjude_2013 Hypo… "Whole gen… TRUE "" 0 2022-03-0… 44 TRUE myeloid
#> # … with 338 more rows, and 3 more variables: referenceGenome <chr>, pmid <chr>, citation <chr>
By inspecting this data frame, we see that the unique study_id for the NMIBC data set is "blca_nmibc_2017" and the unique study_id for the prostate cancer data set is "prad_msk_2019". To get more information on our studies we can do the following:
Note: the transpose function t() is just used here to better view results.
all_studies %>%
  filter(studyId %in% c("blca_nmibc_2017", "prad_msk_2019"))
#> # A tibble: 2 × 13
#> studyId name description publicStudy groups status importDate allSampleCount readPermission cancerTypeId
#> <chr> <chr> <chr> <lgl> <chr> <int> <chr> <int> <lgl> <chr>
#> 1 blca_nmibc_2017 Nonmu… IMPACT seq… TRUE PUBLIC 0 2022-03-0… 105 TRUE blca
#> 2 prad_msk_2019 Prost… MSK-IMPACT… TRUE PUBLIC 0 2022-03-0… 18 TRUE prostate
#> # … with 3 more variables: referenceGenome <chr>, pmid <chr>, citation <chr>
More in-depth information about a study can be found with get_study_info():
get_study_info("blca_nmibc_2017") %>%
t()
#> [,1]
#> name "Nonmuscle Invasive Bladder Cancer (MSK Eur Urol 2017)"
#> description "IMPACT sequencing of 105 High Risk Nonmuscle Invasive Bladder Cancer samples."
#> publicStudy "TRUE"
#> pmid "28583311"
#> citation "Pietzak et al. Eur Urol 2017"
#> groups "PUBLIC"
#> status "0"
#> importDate "2022-03-04 17:49:56"
#> allSampleCount "105"
#> sequencedSampleCount "105"
#> cnaSampleCount "105"
#> mrnaRnaSeqSampleCount "0"
#> mrnaRnaSeqV2SampleCount "0"
#> mrnaMicroarraySampleCount "0"
#> miRnaSampleCount "0"
#> methylationHm27SampleCount "0"
#> rppaSampleCount "0"
#> massSpectrometrySampleCount "0"
#> completeSampleCount "0"
#> readPermission "TRUE"
#> studyId "blca_nmibc_2017"
#> cancerTypeId "blca"
#> cancerType.name "Bladder Urothelial Carcinoma"
#> cancerType.dedicatedColor "Yellow"
#> cancerType.shortName "BLCA"
#> cancerType.parent "bladder"
#> cancerType.cancerTypeId "blca"
#> referenceGenome "hg19"
get_study_info("prad_msk_2019") %>%
t()
#> [,1]
#> name "Prostate Cancer (MSK, Cell Metab 2020)"
#> description "MSK-IMPACT Sequencing of 18 prostate cancer tumor/normal pairs."
#> publicStudy "TRUE"
#> pmid "31564440"
#> citation "Granlund et al. Cell Metab 2020"
#> groups "PUBLIC"
#> status "0"
#> importDate "2022-03-08 18:48:08"
#> allSampleCount "18"
#> sequencedSampleCount "18"
#> cnaSampleCount "18"
#> mrnaRnaSeqSampleCount "0"
#> mrnaRnaSeqV2SampleCount "0"
#> mrnaMicroarraySampleCount "0"
#> miRnaSampleCount "0"
#> methylationHm27SampleCount "0"
#> rppaSampleCount "0"
#> massSpectrometrySampleCount "0"
#> completeSampleCount "0"
#> readPermission "TRUE"
#> studyId "prad_msk_2019"
#> cancerTypeId "prostate"
#> cancerType.name "Prostate"
#> cancerType.dedicatedColor "Cyan"
#> cancerType.shortName "PROSTATE"
#> cancerType.parent "tissue"
#> cancerType.cancerTypeId "prostate"
#> referenceGenome "hg19"
Lastly, it is important to know what genomic data is available for our studies. Not all studies in your database will have data available on all types of genomic information. For example, it is common for studies not to provide data on fusions.
We can check available genomic data with available_profiles().
available_profiles(study_id = "blca_nmibc_2017")
#> # A tibble: 3 × 8
#> molecularAlterationType datatype name description showProfileInAn… patientLevel molecularProfil… studyId
#> <chr> <chr> <chr> <chr> <lgl> <lgl> <chr> <chr>
#> 1 COPY_NUMBER_ALTERATION DISCRETE Putative copy… Copy Numbe… TRUE FALSE blca_nmibc_2017… blca_n…
#> 2 MUTATION_EXTENDED MAF Mutations Mutation d… TRUE FALSE blca_nmibc_2017… blca_n…
#> 3 STRUCTURAL_VARIANT FUSION Fusions Fusions. TRUE FALSE blca_nmibc_2017… blca_n…
available_profiles(study_id = "prad_msk_2019")
#> # A tibble: 3 × 8
#> molecularAlterationType datatype name description showProfileInAn… patientLevel molecularProfil… studyId
#> <chr> <chr> <chr> <chr> <lgl> <lgl> <chr> <chr>
#> 1 COPY_NUMBER_ALTERATION DISCRETE Putative copy… Putative c… TRUE FALSE prad_msk_2019_c… prad_m…
#> 2 MUTATION_EXTENDED MAF Mutations IMPACT468 … TRUE FALSE prad_msk_2019_m… prad_m…
#> 3 STRUCTURAL_VARIANT FUSION Fusions Fusion dat… TRUE FALSE prad_msk_2019_f… prad_m…
Luckily, in this example our studies have mutation, copy number alteration and fusion data available. Each of these data types has a unique molecular profile ID. The molecular profile ID usually takes the form <study_id>_mutations, <study_id>_fusion, or <study_id>_cna.
available_profiles(study_id = "blca_nmibc_2017") %>%
pull(molecularProfileId)
#> [1] "blca_nmibc_2017_cna" "blca_nmibc_2017_mutations" "blca_nmibc_2017_fusion"
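Because the naming pattern above is only the usual one, it can be worth checking it against what available_profiles() actually returned before relying on it. A minimal base R sketch, using the three IDs printed above (the variable names are illustrative):

```r
study_id <- "blca_nmibc_2017"

# IDs we would expect if the study follows the usual naming pattern
expected_ids <- paste0(study_id, c("_mutations", "_cna", "_fusion"))

# IDs copied from the available_profiles() output shown above
observed_ids <- c("blca_nmibc_2017_cna",
                  "blca_nmibc_2017_mutations",
                  "blca_nmibc_2017_fusion")

# Any expected ID that the study does not provide
missing_ids <- setdiff(expected_ids, observed_ids)
missing_ids
#> character(0)
```

An empty result means every expected profile is present. A non-empty result lists the profiles to look up by their actual ID.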
Now that we have inspected our studies and confirmed which genomic data are available, we will pull the data into our R environment. We will show two ways to do this: by study ID (get_genetics_by_study()) and by sample ID (get_genetics_by_sample()).
Pulling by study will give us genomic data for all genes/panels included in the study. These functions can only pull data for one study ID at a time and will return all genomic data available for that study. Pulling by study ID can be efficient, and is a good way to ensure you have all genomic information available in cBioPortal for a particular study.
If you are working across multiple studies, or only need a subset of samples from one or multiple studies, you may choose to pull by sample IDs instead of study ID. When you pull by sample IDs you can pull specific samples across multiple studies, but you must also specify the studies they belong to. You may also pass a specific list of genes for which to return information. If you don’t specify a list of genes, the function will default to returning all available gene data for each sample.
To pull by study ID, we can pull each data type individually.
mut_blca <- get_mutations_by_study(study_id = "blca_nmibc_2017")
#> ℹ Returning all data for the "blca_nmibc_2017_mutations" molecular profile in the "blca_nmibc_2017" study
cna_blca <- get_cna_by_study(study_id = "blca_nmibc_2017")
#> ℹ Returning all data for the "blca_nmibc_2017_cna" molecular profile in the "blca_nmibc_2017" study
fus_blca <- get_fusions_by_study(study_id = "blca_nmibc_2017")
#> ℹ Returning all data for the "blca_nmibc_2017_fusion" molecular profile in the "blca_nmibc_2017" study
mut_prad <- get_mutations_by_study(study_id = "prad_msk_2019")
#> ℹ Returning all data for the "prad_msk_2019_mutations" molecular profile in the "prad_msk_2019" study
cna_prad <- get_cna_by_study(study_id = "prad_msk_2019")
#> ℹ Returning all data for the "prad_msk_2019_cna" molecular profile in the "prad_msk_2019" study
fus_prad <- get_fusions_by_study(study_id = "prad_msk_2019")
#> ℹ Returning all data for the "prad_msk_2019_fusion" molecular profile in the "prad_msk_2019" study
Or we can pull all genomic data at the same time with get_genetics_by_study():
all_genomic_blca <- get_genetics_by_study("blca_nmibc_2017")
#> ℹ Returning all data for the "blca_nmibc_2017_mutations" molecular profile in the "blca_nmibc_2017" study
#> ℹ Returning all data for the "blca_nmibc_2017_cna" molecular profile in the "blca_nmibc_2017" study
#> ℹ Returning all data for the "blca_nmibc_2017_fusion" molecular profile in the "blca_nmibc_2017" study
all_genomic_prad <- get_genetics_by_study("prad_msk_2019")
#> ℹ Returning all data for the "prad_msk_2019_mutations" molecular profile in the "prad_msk_2019" study
#> ℹ Returning all data for the "prad_msk_2019_cna" molecular profile in the "prad_msk_2019" study
#> ℹ Returning all data for the "prad_msk_2019_fusion" molecular profile in the "prad_msk_2019" study
all_equal(mut_blca, all_genomic_blca$mutation)
#> [1] TRUE
all_equal(cna_blca, all_genomic_blca$cna)
#> [1] TRUE
all_equal(fus_blca, all_genomic_blca$fusion)
#> [1] TRUE
Finally, we can join the two studies together:
mut_study <- bind_rows(mut_blca, mut_prad)
cna_study <- bind_rows(cna_blca, cna_prad)
fus_study <- bind_rows(fus_blca, fus_prad)
When we pull by sample IDs, we can pull specific samples across multiple studies. In the above example, we can pull from both studies at the same time for a select set of samples using the sample_study_pairs argument in get_genetics_by_sample().
Let’s pull data for the first 10 samples in each study. We first need to construct the data frame to pass to the function:
Note: you can also run available_patients() to only pull patient IDs.
s1 <- available_samples("blca_nmibc_2017") %>%
  select(sampleId, patientId, studyId) %>%
  head(10)
s2 <- available_samples("prad_msk_2019") %>%
  select(sampleId, patientId, studyId) %>%
  head(10)
df_pairs <- bind_rows(s1, s2) %>%
  select(-patientId)
We need to rename the columns as per the function’s documentation.
df_pairs <- df_pairs %>%
  rename("sample_id" = sampleId,
         "study_id" = studyId)
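Since the pairs data frame must carry specific column names, a quick check before querying can catch a forgotten rename early. The helper below is hypothetical (it is not part of cbioportalR) and only assumes the two column names used in the rename step above:

```r
# Hypothetical helper, not a cbioportalR function: stops with a clear
# message if the expected columns are absent.
check_pairs <- function(df, required = c("sample_id", "study_id")) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  invisible(df)
}

# A one-row example that still uses the API's camelCase names
example_pairs <- data.frame(sampleId = "P-0001453-T01-IM3",
                            studyId = "blca_nmibc_2017")

# Capture the error message instead of halting the script
result <- tryCatch(check_pairs(example_pairs),
                   error = function(e) conditionMessage(e))
result
#> [1] "Missing required column(s): sample_id, study_id"

# After renaming, the check passes silently
names(example_pairs) <- c("sample_id", "study_id")
check_pairs(example_pairs)
```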
Now we pass this to get_genetics_by_sample():
all_genomic <- get_genetics_by_sample(sample_study_pairs = df_pairs)
#> Joining, by = "study_id"
#> The following parameters were used in query:
#> Study ID: "blca_nmibc_2017" and "prad_msk_2019"
#> Molecular Profile ID: blca_nmibc_2017_mutations and prad_msk_2019_mutations
#> Genes: "All available genes"
#> Joining, by = "study_id"
#> The following parameters were used in query:
#> Study ID: "blca_nmibc_2017" and "prad_msk_2019"
#> Molecular Profile ID: blca_nmibc_2017_cna and prad_msk_2019_cna
#> Genes: "All available genes"
#> Joining, by = "study_id"
#> The following parameters were used in query:
#> Study ID: "blca_nmibc_2017" and "prad_msk_2019"
#> Molecular Profile ID: blca_nmibc_2017_fusion and prad_msk_2019_fusion
#> Genes: "All available genes"
mut_sample <- all_genomic$mutation
As with querying by study ID, you can also pull data individually by genomic data type:
mut_only <- get_mutations_by_sample(sample_study_pairs = df_pairs)
#> Joining, by = "study_id"
#> The following parameters were used in query:
#> Study ID: "blca_nmibc_2017" and "prad_msk_2019"
#> Molecular Profile ID: blca_nmibc_2017_mutations and prad_msk_2019_mutations
#> Genes: "All available genes"
identical(mut_only, mut_sample)
#> [1] TRUE
Let’s compare these results with the ones we got from pulling by study:
# filter to our subset used in sample query
mut_study_subset <- mut_study %>%
  filter(sampleId %in% df_pairs$sample_id)
# arrange to compare
mut_study_subset <- mut_study_subset %>%
  arrange(desc(sampleId)) %>%
  arrange(desc(entrezGeneId))
mut_sample <- mut_sample %>%
  arrange(desc(sampleId)) %>%
  arrange(desc(entrezGeneId)) %>%
  # reorder so columns in same order
  select(names(mut_study_subset))
all.equal(mut_study_subset, mut_sample)
#> [1] TRUE
Both results are equal.
We can also limit our results to a specific set of genes by passing a vector of Entrez Gene IDs or Hugo Symbols to the genes argument, or to a specified panel by passing a panel ID to the panel argument (see available_gene_panels() for supported panels). This can be useful if, for example, we want to pull all IMPACT gene results for two studies but one of the two uses a much larger panel. In that case, we can limit our query to just the genes for which we want results:
by_hugo <- get_mutations_by_sample(sample_study_pairs = df_pairs, genes = "TP53")
#> Joining, by = "study_id"
#> The following parameters were used in query:
#> Study ID: "blca_nmibc_2017" and "prad_msk_2019"
#> Molecular Profile ID: blca_nmibc_2017_mutations and prad_msk_2019_mutations
#> Genes: "TP53"
by_gene_id <- get_mutations_by_sample(sample_study_pairs = df_pairs, genes = 7157)
#> Joining, by = "study_id"
#> The following parameters were used in query:
#> Study ID: "blca_nmibc_2017" and "prad_msk_2019"
#> Molecular Profile ID: blca_nmibc_2017_mutations and prad_msk_2019_mutations
#> Genes: 7157
identical(by_hugo, by_gene_id)
#> [1] TRUE
get_mutations_by_sample(
sample_study_pairs = df_pairs,
panel = "IMPACT468") %>%
head()
#> Joining, by = "study_id"
#> The following parameters were used in query:
#> Study ID: "blca_nmibc_2017" and "prad_msk_2019"
#> Molecular Profile ID: blca_nmibc_2017_mutations and prad_msk_2019_mutations
#> Genes: "IMPACT468"
#> # A tibble: 6 × 33
#> hugoGeneSymbol entrezGeneId uniqueSampleKey uniquePatientKey molecularProfil… sampleId patientId studyId center
#> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 TERT 7015 UC0wMDAxNDUzLVQwM… UC0wMDAxNDUzOmJ… blca_nmibc_2017… P-00014… P-0001453 blca_n… MSKCC
#> 2 SMAD4 4089 UC0wMDAxNDUzLVQwM… UC0wMDAxNDUzOmJ… blca_nmibc_2017… P-00014… P-0001453 blca_n… MSKCC
#> 3 ERBB4 2066 UC0wMDAxNDUzLVQwM… UC0wMDAxNDUzOmJ… blca_nmibc_2017… P-00014… P-0001453 blca_n… MSKCC
#> 4 CUL3 8452 UC0wMDAxNDUzLVQwM… UC0wMDAxNDUzOmJ… blca_nmibc_2017… P-00014… P-0001453 blca_n… MSKCC
#> 5 PBRM1 55193 UC0wMDAxNDUzLVQwM… UC0wMDAxNDUzOmJ… blca_nmibc_2017… P-00014… P-0001453 blca_n… MSKCC
#> 6 APC 324 UC0wMDAxNDUzLVQwM… UC0wMDAxNDUzOmJ… blca_nmibc_2017… P-00014… P-0001453 blca_n… MSKCC
#> # … with 24 more variables: mutationStatus <chr>, validationStatus <chr>, tumorAltCount <int>, tumorRefCount <int>,
#> # normalAltCount <int>, normalRefCount <int>, startPosition <int>, endPosition <int>, referenceAllele <chr>,
#> # proteinChange <chr>, mutationType <chr>, functionalImpactScore <chr>, fisValue <dbl>, linkXvar <chr>,
#> # linkPdb <chr>, linkMsa <chr>, ncbiBuild <chr>, variantType <chr>, chr <chr>, variantAllele <chr>,
#> # refseqMrnaId <chr>, proteinPosStart <int>, proteinPosEnd <int>, keyword <chr>
You can also pull clinical data by study ID, sample ID, or patient ID. Pulling by sample ID will pull all sample-level characteristics (e.g. sample site, tumor stage at sampling time and other variables collected at time of sampling that may be available). Pulling by patient ID will pull all patient-level characteristics (e.g. age, sex, etc.). Pulling by study ID will pull all sample and patient-level characteristics at once.
You can explore what clinical data is available for a study using:
attr_blca <- available_clinical_attributes("blca_nmibc_2017")
attr_prad <- available_clinical_attributes("prad_msk_2019")
attr_prad
#> # A tibble: 13 × 7
#> displayName description datatype patientAttribute priority clinicalAttribu… studyId
#> <chr> <chr> <chr> <lgl> <chr> <chr> <chr>
#> 1 Cancer Type Cancer Type STRING FALSE 1 CANCER_TYPE prad_m…
#> 2 Cancer Type Detailed Cancer Type Detailed STRING FALSE 1 CANCER_TYPE_DET… prad_m…
#> 3 Fraction Genome Altered Fraction Genome Altered NUMBER FALSE 20 FRACTION_GENOME… prad_m…
#> 4 Gene Panel Gene Panel. STRING FALSE 1 GENE_PANEL prad_m…
#> 5 Mutation Count Mutation Count NUMBER FALSE 30 MUTATION_COUNT prad_m…
#> 6 Oncotree Code Oncotree Code STRING FALSE 1 ONCOTREE_CODE prad_m…
#> 7 Sample Class The sample classificat… STRING FALSE 1 SAMPLE_CLASS prad_m…
#> 8 Number of Samples Per Patient Number of Samples Per … STRING TRUE 1 SAMPLE_COUNT prad_m…
#> 9 Sample Type The type of sample (i.… STRING FALSE 1 SAMPLE_TYPE prad_m…
#> 10 Sex Sex STRING TRUE 1 SEX prad_m…
#> 11 Somatic Status Somatic Status STRING FALSE 1 SOMATIC_STATUS prad_m…
#> 12 Specimen Preservation Type The method used for pr… STRING FALSE 1 SPECIMEN_PRESER… prad_m…
#> 13 TMB (nonsynonymous) TMB (nonsynonymous) NUMBER FALSE 1 TMB_NONSYNONYMO… prad_m…
Only a subset of these attributes is available for both studies:
in_both <- intersect(attr_blca$clinicalAttributeId, attr_prad$clinicalAttributeId)
The below pulls data at the sample level:
clinical_blca <- get_clinical_by_sample(sample_id = s1$sampleId,
  study_id = "blca_nmibc_2017",
  clinical_attribute = in_both)
clinical_prad <- get_clinical_by_sample(sample_id = s2$sampleId,
  study_id = "prad_msk_2019",
  clinical_attribute = in_both)
all_clinical <- bind_rows(clinical_blca, clinical_prad)
all_clinical %>%
  select(-contains("unique")) %>%
  head()
#> # A tibble: 6 × 5
#> sampleId patientId studyId clinicalAttributeId value
#> <chr> <chr> <chr> <chr> <chr>
#> 1 P-0001453-T01-IM3 P-0001453 blca_nmibc_2017 CANCER_TYPE Bladder Cancer
#> 2 P-0001453-T01-IM3 P-0001453 blca_nmibc_2017 CANCER_TYPE_DETAILED Bladder Urothelial Carcinoma
#> 3 P-0001453-T01-IM3 P-0001453 blca_nmibc_2017 FRACTION_GENOME_ALTERED 0.4448
#> 4 P-0001453-T01-IM3 P-0001453 blca_nmibc_2017 MUTATION_COUNT 11
#> 5 P-0001453-T01-IM3 P-0001453 blca_nmibc_2017 ONCOTREE_CODE BLCA
#> 6 P-0001453-T01-IM3 P-0001453 blca_nmibc_2017 SOMATIC_STATUS Matched
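The clinical data come back in long format, with one row per sample and attribute, and every value stored as character. For analysis it is often easier to have one row per sample. The sketch below shows one way to reshape it; it assumes the tidyr package is installed (it is not loaded elsewhere in this walkthrough) and uses three rows copied from the output above in place of a live query.

```r
library(dplyr)
library(tidyr)  # assumption: tidyr is available; only pivot_wider() is used

# Three rows copied from the sample-level output above
clinical_long <- tibble(
  sampleId = "P-0001453-T01-IM3",
  clinicalAttributeId = c("CANCER_TYPE", "FRACTION_GENOME_ALTERED", "MUTATION_COUNT"),
  value = c("Bladder Cancer", "0.4448", "11")
)

# One row per sample, one column per attribute, numeric columns converted
clinical_wide <- clinical_long %>%
  pivot_wider(names_from = clinicalAttributeId, values_from = value) %>%
  mutate(across(c(FRACTION_GENOME_ALTERED, MUTATION_COUNT), as.numeric))

clinical_wide
```

Which columns to convert depends on the datatype reported by available_clinical_attributes() for each attribute.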
The below pulls data at the patient level:
p1 <- available_patients("blca_nmibc_2017")
clinical_blca <- get_clinical_by_patient(patient_id = s1$patientId,
  study_id = "blca_nmibc_2017",
  clinical_attribute = in_both)
# patient-level pull for the prostate study as well
clinical_prad <- get_clinical_by_patient(patient_id = s2$patientId,
  study_id = "prad_msk_2019",
  clinical_attribute = in_both)
all_clinical <- bind_rows(clinical_blca, clinical_prad)
all_clinical %>%
  select(-contains("unique")) %>%
  head()
#> # A tibble: 6 × 4
#> patientId studyId clinicalAttributeId value
#> <chr> <chr> <chr> <chr>
#> 1 P-0001453 blca_nmibc_2017 SAMPLE_COUNT 1
#> 2 P-0001453 blca_nmibc_2017 SEX Male
#> 3 P-0002166 blca_nmibc_2017 SAMPLE_COUNT 1
#> 4 P-0002166 blca_nmibc_2017 SEX Male
#> 5 P-0003238 blca_nmibc_2017 SAMPLE_COUNT 1
#> 6 P-0003238 blca_nmibc_2017 SEX Male
As with the genomic data pull functions, you can also pull clinical data by sample ID - study ID pairs or patient ID - study ID pairs:
df_pairs <- bind_rows(s1, s2) %>%
  select(-sampleId)
df_pairs <- df_pairs %>%
  select(patientId, studyId)
Now we pass this to get_clinical_by_patient():
all_patient_clinical <- get_clinical_by_patient(patient_study_pairs = df_pairs,
  clinical_attribute = in_both)
all_patient_clinical %>%
  select(-contains("unique"))
#> # A tibble: 34 × 4
#> patientId studyId clinicalAttributeId value
#> <chr> <chr> <chr> <chr>
#> 1 P-0001453 blca_nmibc_2017 SAMPLE_COUNT 1
#> 2 P-0001453 blca_nmibc_2017 SEX Male
#> 3 P-0002166 blca_nmibc_2017 SAMPLE_COUNT 1
#> 4 P-0002166 blca_nmibc_2017 SEX Male
#> 5 P-0003238 blca_nmibc_2017 SAMPLE_COUNT 1
#> 6 P-0003238 blca_nmibc_2017 SEX Male
#> 7 P-0003257 blca_nmibc_2017 SAMPLE_COUNT 1
#> 8 P-0003257 blca_nmibc_2017 SEX Female
#> 9 P-0003261 blca_nmibc_2017 SAMPLE_COUNT 1
#> 10 P-0003261 blca_nmibc_2017 SEX Male
#> # … with 24 more rows