Getting Started with baseballr

Saiem Gilani
@saiemgilani

2022-04-21

Welcome folks,

I’m Saiem Gilani, one of the authors of baseballr, and I hope to give the community a high-quality resource for accessing men’s baseball data for statistical analysis, baseball research, and more. I am excited to show you some of what you can do with this edition of the package.

Installing R and RStudio

  1. Head to https://cran.r-project.org
  2. Select the appropriate link for your operating system (Windows, macOS, or Linux) and install R.
  3. Head to RStudio.com
  4. Follow the associated download and installation instructions for RStudio.
  5. Start looking over the RStudio IDE Cheatsheet. An IDE is an integrated development environment.
  6. For Windows users: I recommend you install Rtools. This is not an R package! It is “a collection of resources for building packages for R under Microsoft Windows, or for building R itself”. Go to https://cran.r-project.org/bin/windows/Rtools/ and follow the directions for installation.

Install baseballr

# Install the current GitHub version of baseballr with the pacman package:
if (!requireNamespace('pacman', quietly = TRUE)){
  install.packages('pacman')
}
pacman::p_load_current_gh("billpetti/baseballr")
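After installing, it can help to confirm which version of baseballr you have, because the function names used in this guide assume v1.0.0 or later. The following is a minimal sketch, not part of the package: `meets_min_version()` is a hypothetical helper written for illustration, and it relies only on base R.

```r
# meets_min_version() is a hypothetical helper, not part of baseballr.
# It compares two version strings using base R's package_version().
meets_min_version <- function(installed, required = "1.0.0") {
  package_version(installed) >= package_version(required)
}

if (requireNamespace("baseballr", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("baseballr"))
  message("baseballr ", installed, " is installed.")
  if (!meets_min_version(installed)) {
    message("Consider updating: the prefixed function names need v1.0.0 or later.")
  }
} else {
  message("baseballr is not installed; run the installation code above.")
}
```

The check is wrapped in `requireNamespace()` so it runs without error whether or not the package is installed yet.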

The Data

Generally speaking, this package provides access to eight men’s baseball data sources, listed below with their function-name prefixes.

Function names indicate the data source

As of baseballr v1.0.0, function names follow a convention in which a data source indicator appears at the start of the function name:

  • Functions that use the baseballr-data repository contain load_ or update_ in the function name and should be treated as loading functions for play-by-play data, team box scores, and player box scores.

  • Functions that use the MLB Stats API start with mlb_ by convention and should be treated as get functions. As of baseballr version 1.2.0, the package exports ~88 functions covering the MLB Stats API.

  • Functions that use one of Baseball Savant’s Statcast APIs start with statcast_ by convention and should be treated as get functions. These functions allow live access to Statcast data for MLB games in progress. As of baseballr version 1.2.0, the package exports ~5 Statcast-related functions.

  • Functions that use the Chadwick Bureau’s Public Register of Baseball Players start with chadwick_, playerid_, or playername_ by convention and should be treated as get functions. As of baseballr version 1.2.0, the package exports 3 functions that draw on the register.

  • Functions that use Baseball Reference’s website start with bref_ by convention and should be treated as get functions. As of baseballr version 1.2.0, the package exports ~4 functions covering Baseball Reference.

  • Functions that use FanGraphs’ baseball website start with fg_ by convention and should be treated as get functions. As of baseballr version 1.2.0, the package exports ~11 functions covering FanGraphs.com.

  • Functions that use Retrosheet’s baseball data start with retrosheet_ by convention and should be treated as get functions. As of baseballr version 1.2.0, the package exports 1 function for Retrosheet data.

  • Functions that use the NCAA website start with ncaa_ by convention and should be treated as get functions. As of baseballr version 1.2.0, the package exports ~8 functions covering the NCAA Stats portal.
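The prefixes listed above can be turned into a small lookup that reports the data source for any function name. This is a rough sketch rather than anything shipped with the package: `source_prefixes` and `classify_source()` are hypothetical names introduced here, the source labels are shorthand of my own, and the sample function names are only illustrative strings that may not match the exports of your installed version. For simplicity the sketch matches every prefix, including load_ and update_, at the start of the name.

```r
# Prefix-to-source lookup built from the naming convention described above.
# The labels on the right are informal shorthand, not official names.
source_prefixes <- c(
  load_       = "baseballr-data repository",
  update_     = "baseballr-data repository",
  mlb_        = "MLB Stats API",
  statcast_   = "Baseball Savant Statcast",
  chadwick_   = "Chadwick Bureau register",
  playerid_   = "Chadwick Bureau register",
  playername_ = "Chadwick Bureau register",
  bref_       = "Baseball Reference",
  fg_         = "FanGraphs",
  retrosheet_ = "Retrosheet",
  ncaa_       = "NCAA"
)

# classify_source() is a hypothetical helper, not part of baseballr.
# It returns the data source for each name, or NA when no prefix matches.
classify_source <- function(fn_names) {
  vapply(fn_names, function(fn) {
    hit <- names(source_prefixes)[startsWith(fn, names(source_prefixes))]
    if (length(hit) == 0) NA_character_ else unname(source_prefixes[hit[1]])
  }, character(1))
}

# Illustrative names only; they may differ from your installed version.
print(classify_source(c("mlb_schedule", "statcast_search", "fg_batter_leaders")))

# If baseballr is installed, tally its exported functions by data source.
if (requireNamespace("baseballr", quietly = TRUE)) {
  print(table(classify_source(getNamespaceExports("baseballr")), useNA = "ifany"))
}
```

Exports that fall outside the convention show up under NA in the tally, which is a quick way to spot helpers that are not tied to a single data source.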

Follow the SportsDataverse on Twitter and star this repo

Our Authors