Custom Simulations

Tan Ho

2022-08-14

ffsimulator also provides the component subfunctions (prefixed with ffs_) in case you want to customize and run certain components individually. This vignette will discuss various options and paths you might use in your own analysis, and discuss various customizations for simulating the 2021 Scott Fish Bowl.

You can also use the ffs_copy_template() function to quickly get started with your own custom simulation!

library(ffsimulator)
library(ffscrapr)
library(dplyr)
library(ggplot2)

Overview of ff_simulate()

Loosely speaking, the main ff_simulate function has the following components:

so we’ll reuse this structure in this vignette!

Importing Data

By default, ff_simulate imports the following data based on the ff_connect conn object that is passed in:

scoring_history <- ffscrapr::ff_scoringhistory(conn, seasons = 2012:2020)

This retrieves week-level fantasy scoring for the specified years, and is built from nflfastR weekly data combined with the platform specified league rules.

latest_rankings <- ffs_latest_rankings(type = "draft") # also "week", for inseason sims

This retrieves the latest FantasyPros positional rankings available from the DynastyProcess data repository. If you want to customize the rankings used in your simulation, you can construct and replace this latest_rankings dataframe with one of a similar structure and column naming - the important ones are (positional) “ecr”, “sd”, “bye”, and “fantasypros_id”.

rosters <- ffs_rosters(conn)

This retrieves rosters and attaches a fantasypros_id to them. You could run hypothetical scenarios such as trades by editing this rosters dataframe by hand and then running the simulation!

lineup_constraints <- ffs_starter_positions(conn)

This retrieves lineup constraints from your fantasy platform. You can edit these to test out hypothetical starting lineup settings and minimum requirements!

league_info <- ffscrapr::ff_league(conn)

This brings in league information that is primarily used for plot names.

Generating Projections

ff_simulate runs two functions to generate “projections” - the first one builds the population of weekly scores to resample from, and the second one runs the bootstrap resampling for n_seasons x n_weeks.

adp_outcomes <- ffs_adp_outcomes(
    scoring_history = scoring_history,
    gp_model = "simple", # or "none"
    pos_filter = c("QB","RB","WR","TE","K")
  )

This builds out the population of weekly outcomes for each positional adp rank, using the above-mentioned scoring history as well as fp_rankings_history (2012-2020 historical positional rankings) and fp_injury_table(an injury model).

projected_scores <- ffs_generate_projections(
    adp_outcomes = adp_outcomes,
    latest_rankings = latest_rankings,
    n_seasons = 100, # number of seasons
    weeks = 1:14, # specifies which weeks to generate projections for
    rosters = rosters # optional, reduces the sample to just rostered players
  )

This uses the adp_outcomes table, latest rankings, some parameters (number of seasons, specific weeks), and rosters, generating a dataframe of length n_seasons x n_weeks x nrow(latest_rankings) and automatically blanking out NFL bye weeks.

Calculating Roster Scores

This is a simple process conceptually, but probably the most computationally expensive part of the simulation: first, inner join the projected_scores for each player onto the rosters, then run a linear programming optimizer to determine the optimal lineup and calculate the week’s score.

roster_scores <- ffs_score_rosters(
    projected_scores = projected_scores,
    rosters = rosters
  )

This function performs an inner join of these two tables and calculates position rank for each player (based on the scores for each week).

optimal_scores <- ffs_optimise_lineups(
    roster_scores = roster_scores,
    lineup_constraints = lineup_constraints,
    lineup_efficiency_mean = 0.775,
    lineup_efficiency_sd = 0.05,
    best_ball = FALSE, # or TRUE
    pos_filter = c("QB","RB","WR","TE","K")
  )

This function runs the lineup optimisation and applies a small lineup efficiency model.

Lineup efficiency refers to the ratio of “actual lineup score” to “optimal lineup score”. Lineup efficiency is generated as a random number that is normally distributed around 0.775 (77.5%) and has a standard deviation of 0.05. This gives the usual lineup efficiency range to be somewhere between 0.67 and 0.87, which is (in my experience) the typical range of lineup efficiency. You can adjust the lineup efficiency model for yourself, or perhaps apply your own modelling afterwards. best_ball forces lineup efficiency to be 100% of the optimal score.

There are options to use parallel processing - in my experience, 100 seasons of a 12 team league is too small to see any benefit from parallel. I’d recommend it for running larger simulations, i.e. 12 x 1000, or 100 x 1920 (like SFB)!

Building Schedules

In order to calculate head to head wins, you need a schedule! Enter ffs_build_schedules():

  schedules <- ffs_build_schedules(
    n_seasons = n_seasons,
    n_weeks = n_weeks,
    seed = NULL,
    franchises = ffs_franchises(conn)
  )

This efficiently builds a randomized head to head schedule for a given number of seasons, teams, and weeks.

It starts with the circle method for round robin scheduling, grows or shrinks the schedule to match the required number of weeks, and then shuffles both the order that teams are assigned in and the order that weeks are generated. This doesn’t “guarantee” unique schedules, but there are n_teams! x n_weeks! permutations of the schedule so it’s very very likely that the schedules are unique (3x10^18 possible schedules for a 12 team league playing 13 weeks).

Aggregating Results

Now that we have a schedule, we can aggregate by week, and then by season, and then by simulation:

summary_week <- ffs_summarise_week(optimal_scores, schedules)
summary_season <- ffs_summarise_season(summary_week)
summary_simulation <- ffs_summarise_simulation(summary_season)

Each summary function feeds into the next summary function!

SFB Simulation

Okay! So now that we’ve done that, let’s have a look at how I’d customize these functions to simulate SFB11 - a 1,920 team contest spread over 20 league IDs. (This code typically takes about 30 minutes to run in parallel and eats up about 40GB of memory).

options(ffscrapr.cache = "filesystem")
library(ffsimulator)
library(ffscrapr)
library(tidyverse)
library(tictoc) # for timing!

set.seed(613)

Package listing. I set ffscrapr to cache to my hard drive, and set a seed for reproducibility.

Import Data

conn <- mfl_connect(2021, 47747) # a random SFB league to grab league info from

league_info <- ffscrapr::ff_league(conn)

scoring_history <- ffscrapr::ff_scoringhistory(conn, 2012:2020)

adp_outcomes <- ffs_adp_outcomes(scoring_history = scoring_history, gp_model = "simple",pos_filter = c("QB","RB","WR","TE","K"))

latest_rankings <- ffs_latest_rankings()
lineup_constraints <- ffs_starter_positions(conn)

We can use one league ID here to grab most of the historical scoring data and rules/lineups etc for the entire SFB contest (rather than running it once for each of the twenty league IDs).

conn2 <- mfl_connect(2021)

leagues <- mfl_getendpoint(conn2, "leagueSearch", SEARCH = "#SFB11") %>%
  pluck("content","leagues","league") %>%
  tibble() %>%
  unnest_wider(1) %>%
  filter(str_detect(name,"Mock|Copy|Satellite|Template",negate = TRUE))


get_rosters <- function(league_id){
  mfl_connect(2021, league_id) %>%
    ffs_rosters()
}
get_franchises <- function(league_id){
  mfl_connect(2021, league_id) %>%
    ff_franchises()
}

rosters_raw <- leagues %>%
  select(-homeURL) %>%
  mutate(
    rosters = map(id, get_rosters),
    franchises = map(id, get_franchises)
  )

franchises <- rosters_raw %>%
  select(league_id = id, franchises) %>%
  unnest(franchises) %>%
  select(league_id, franchise_id, division_name)

rosters <- rosters_raw %>%
  select(rosters) %>%
  unnest(rosters) %>%
  left_join(franchises,by = c("league_id","franchise_id"))

Because SFB is spread over multiple league IDs, we need to get a list of IDs from the leagueSearch endpoint, map over them with the get_rosters and get_franchises helper functions we just defined, and attach the division name.

n_seasons <- 100
n_weeks <- 13
projected_scores <- ffs_generate_projections(adp_outcomes = adp_outcomes,
                                             latest_rankings = latest_rankings,
                                             n_seasons = n_seasons,
                                             weeks = 1:14,
                                             rosters = rosters)

tictoc::tic(glue::glue("ffs_score_rosters {Sys.time()}"))
roster_scores <- ffs_score_rosters(projected_scores, rosters)
tictoc::toc()

tictoc::tic("ffs_optimize_lineups {Sys.time()}")
optimal_scores <- ffs_optimize_lineups(
  roster_scores = roster_scores,
  lineup_constraints = lineup_constraints,
  pos_filter = c("QB","RB","WR","TE","K"),
  best_ball = FALSE)
tictoc::toc()

These are pretty straight forward, I use tictoc here to time the most expensive parts of the computation so that I know how long it takes - on my machine, this takes between 20-25 minutes to compute.

schedules <- ffs_build_schedules(franchises = franchises,
                                 n_seasons = n_seasons,
                                 n_weeks = n_weeks)

summary_week <- ffs_summarise_week(optimal_scores, schedules)
summary_season <- ffs_summarise_season(summary_week)
summary_simulation <- ffs_summarise_simulation(summary_season)

By comparison, these are very fast to compute (a minute or two total).