This post explains how to optimize a Daily Fantasy Sports (DFS) lineup using R and associated R packages. Specifically, it shows how to optimize a lineup for a 9-player MLB “classic” FanDuel contest. At the time of this writing (April 12, 2020), FanDuel is only offering Full Roster Sim contests because no live MLB games are being played. Here is a link to the specific FanDuel contest I used when writing this guide. Use my referral code and we each get $10: https://fndl.co/t0u657t

Background: In DFS, contestants create a lineup of real professional or college sports players, called an entry. Each statistic a player accumulates during a game corresponds to a fantasy point value. For that day’s games, the sum of the fantasy points scored by the players in an entry’s lineup is that entry’s total score. Contestants compete against each other to have the highest-scoring lineup. Lineups are subject to constraints, such as allowing only a certain number of players at each position. In addition, each player has a predetermined salary, and the total salary of each entry must not exceed a salary cap. One other thing to note: the DFS contest you enter will typically provide a CSV list of the players available for your lineup. This list contains player names, positions, salaries, and other information, and it is the starting point for the optimization.

When I say “optimize” a DFS lineup, I mean scoring the most possible fantasy points while staying at or under the salary cap and satisfying the other constraints specific to the contest. The big unknown is how many fantasy points each player will score in their upcoming game. This post does not cover how to predict a player’s point total. Instead, we assume that each eligible player will score exactly their average fantasy points per game (the FPPG column in the player list). If you are familiar with R, you can import your own predictions and replace the FPPG column with them, so you can have your own 100% proprietary optimizer.
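
As a rough illustration of swapping in your own predictions, here is a minimal sketch. The data frames are toy stand-ins, and `my_proj` and its `Projection` column are hypothetical names, not something FanDuel provides. Players without a projection simply keep their FPPG average.

```r
# Toy stand-in for the FanDuel player list (values are illustrative only)
fd_raw <- data.frame(
  Nickname = c("Player A", "Player B", "Player C"),
  FPPG     = c(12.5, 9.0, 30.1),
  Salary   = c(3500, 2800, 9000),
  stringsAsFactors = FALSE
)

# Hypothetical projections; in practice these would come from your own model
my_proj <- data.frame(
  Nickname   = c("Player A", "Player C"),
  Projection = c(14.2, 27.8),
  stringsAsFactors = FALSE
)

# Look up each player's projection by name; unmatched players get NA
idx  <- match(fd_raw$Nickname, my_proj$Nickname)
proj <- my_proj$Projection[idx]

# Overwrite FPPG where a projection exists, keep the average otherwise
fd_raw$FPPG <- ifelse(is.na(proj), fd_raw$FPPG, proj)
fd_raw
```

Matching on names assumes they are spelled identically in both tables; a player ID column, if your list has one, would be a safer key.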

At a high level, here is what we will do: import the players CSV list, make a couple of quick changes to it, then add the columns needed to optimize the lineup for maximum fantasy points while adhering to all the constraints. To do the actual optimization, we use the lp function from an R package called lpSolve. We make a few additions to our players table, then give the lp function the specifics of our constraints, and it returns our ideal lineup.

The first step is importing the player list and making some easy changes to it. In this example, I remove non-probable starting pitchers because I don’t want to pick a pitcher who is not playing that day.

#Import the Player list
fd_raw <- read.csv('csv/FanDuel-MLB-2020-04-13-44972-players-list.csv', stringsAsFactors = FALSE)
#Remove Non Probable Pitchers
fd_raw <- fd_raw[!(fd_raw$Probable.Pitcher == '' & fd_raw$Position == 'P'),]
#Change variables to appropriate types
fd_raw$Position <- as.factor(fd_raw$Position)
#Focus only on the columns we need
fd_raw <- fd_raw[,c('Position', 'Nickname','FPPG','Salary')]

At this point the data looks like this:

|Position |Nickname         |     FPPG| Salary|
|:--------|:----------------|--------:|------:|
|P        |Justin Verlander | 47.26471|  11800|
|P        |Max Scherzer     | 41.79310|  11400|
|P        |Jacob deGrom     | 41.51515|  11000|
|P        |Lucas Giolito    | 40.17241|   9800|
|P        |Blake Snell      | 29.78261|   9000|
|P        |Corey Kluber     | 24.57143|   8700|

That contains everything we need to optimize our lineup: each player’s name, position, FPPG, and salary. Position and salary will be constraints, and FPPG is what we are trying to maximize. Now we will set up a data frame so the constraints can be used by the lp function. The lp function doesn’t understand the positional constraints of a fantasy lineup, so we need to convert each position (and the other constraints) to a binary indicator. To do that, we add a column to our existing data frame for each constraint. The first column we add is used for the total number of players constraint. Every player counts toward that total, so we add a column called “X” and set it to “1” for every player. Later, we will require that the sum of this column over the players in our lineup is 9, the number of players needed for this specific contest. Next, we add an indicator column for each position slot in our contest’s lineup. Each column is named after the slot and holds a “1” if the player in that row is eligible for that slot and a “0” if not. Note that first basemen and catchers share a single slot, which I call 1BC. Here is the code. (The first thing I do is assign the FanDuel raw data to another data frame called time_for_lp.)

time_for_lp <- fd_raw
time_for_lp <- cbind(time_for_lp, X=1) #to be used for maximum number of players constraint
time_for_lp <- cbind(time_for_lp,'P' = ifelse(time_for_lp$Position=='P',1,0))
time_for_lp <- cbind(time_for_lp,'OF' = ifelse(time_for_lp$Position=='OF',1,0))
time_for_lp <- cbind(time_for_lp,'SS' = ifelse(time_for_lp$Position=='SS',1,0))
time_for_lp <- cbind(time_for_lp,'1BC' = ifelse(time_for_lp$Position=='1B' | time_for_lp$Position=='C',1,0))
time_for_lp <- cbind(time_for_lp,'2B' = ifelse(time_for_lp$Position=='2B',1,0))
time_for_lp <- cbind(time_for_lp,'3B' = ifelse(time_for_lp$Position=='3B',1,0))

Here is how various rows of the data look now:

|Position |Nickname         |      FPPG| Salary|  X|  P| OF| SS| 1BC| 2B| 3B|
|:--------|:----------------|---------:|------:|--:|--:|--:|--:|---:|--:|--:|
|P        |Justin Verlander | 47.264706|  11800|  1|  1|  0|  0|   0|  0|  0|
|SS       |Trea Turner      | 13.374590|   3900|  1|  0|  0|  1|   0|  0|  0|
|OF       |Hunter Dozier    | 11.071942|   3200|  1|  0|  1|  0|   0|  0|  0|
|P        |Matt Shoemaker   | 34.600000|   6900|  1|  1|  0|  0|   0|  0|  0|
|2B       |Scott Kingery    |  9.962699|   2300|  1|  0|  0|  0|   0|  1|  0|
|C        |Omar Narvaez     |  8.621970|   2100|  1|  0|  0|  0|   1|  0|  0|

Now we use those binary indicators to create a constraint matrix for the lp function. This matrix has a column for every player and a row for every constraint: the player count, each position’s eligibility indicator, and salary. FPPG is not part of this matrix; it is supplied separately as the objective.

constraint_matrix <- matrix(c(time_for_lp$'X', 
                              time_for_lp$'P', 
                              time_for_lp$'OF', 
                              time_for_lp$'SS', 
                              time_for_lp$'1BC', 
                              time_for_lp$'2B', 
                              time_for_lp$'3B', 
                              time_for_lp$'Salary') ,
                            nrow=8, byrow = TRUE)
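
To make the layout easier to picture, here is a small sketch with three made-up players and only four of the constraints. It uses rbind with named rows instead of matrix(), which is just an alternative for readability; `toy` and `toy_matrix` are hypothetical names.

```r
# Three made-up players with a subset of the indicator columns
toy <- data.frame(
  Nickname = c("Pitcher A", "Outfielder B", "Shortstop C"),
  X        = c(1, 1, 1),
  P        = c(1, 0, 0),
  OF       = c(0, 1, 0),
  Salary   = c(9000, 3200, 2500),
  stringsAsFactors = FALSE
)

# rbind stacks one row per constraint and uses the argument names as row names
toy_matrix <- rbind(
  players = toy$X,
  P       = toy$P,
  OF      = toy$OF,
  salary  = toy$Salary
)
colnames(toy_matrix) <- toy$Nickname

toy_matrix
dim(toy_matrix)  # 4 constraints (rows) by 3 players (columns)
```

The row and column names are optional, but they make it much easier to spot a constraint that ended up in the wrong row.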

There are 8 constraints in total for our lineup: the number of total players, the number of P, the number of OF, the number of SS, the number of 1BC, the number of 2B, the number of 3B, and the total salary. Next, let’s set up the direction of each constraint and the right-hand side of each comparison.

constraint_direction <- c('==',
                          '==',
                          '>=',
                          '>=',
                          '>=',
                          '>=',
                          '>=',
                          '<=')
constraint_righthandside <- c(9,
                              1,
                              3,
                              1,
                              1,
                              1,
                              1,
                              35000)

These vectors have one slot for each constraint, in the same order as the rows of the constraint matrix. The first slot is the number of players constraint, the second slot is the number of pitchers constraint, the third slot is the number of OF constraint, and so on. In other words, the total number of players must equal 9. The number of pitchers must equal 1. The number of OF must be greater than or equal to 3. The other position constraints also use “>=” rather than “==” to account for the Util position in the lineup, which can be filled by any non-pitcher. In the 8th slot, salary must be less than or equal to 35,000. The last thing we do is set a variable for our objective: to have the most possible FPPG.

dfs_obj <- time_for_lp$FPPG
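
For readers who like to see the problem written out, the pieces defined so far can be summarized as a binary integer program. The symbols are my own notation, not anything from lpSolve: $x_i$ is 1 if player $i$ is in the lineup and 0 otherwise, $p_i$ is that player’s FPPG, $s_i$ is their salary, and $a_{k,i}$ is the 0/1 indicator for position column $k$.

```latex
\begin{aligned}
\max_{x} \quad & \sum_{i} p_i x_i \\
\text{subject to} \quad
  & \sum_{i} x_i = 9, \\
  & \sum_{i} a_{\mathrm{P},i}\, x_i = 1, \\
  & \sum_{i} a_{\mathrm{OF},i}\, x_i \ge 3, \\
  & \sum_{i} a_{k,i}\, x_i \ge 1 \quad \text{for } k \in \{\mathrm{SS}, \mathrm{1BC}, \mathrm{2B}, \mathrm{3B}\}, \\
  & \sum_{i} s_i x_i \le 35000, \\
  & x_i \in \{0, 1\} \quad \text{for every player } i.
\end{aligned}
```

The position minimums add up to 8 players, so the ninth spot required by the first constraint is the Util slot.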

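Now we are ready to call the lp function, after loading the lpSolve package:

#Load the package that provides lp (install it first with install.packages('lpSolve'))
library(lpSolve)
#Maximize FPPG subject to the constraints; all.bin = TRUE makes every player a 0/1 decision
sol <- lp(direction = "max", dfs_obj,
          constraint_matrix, constraint_direction, constraint_righthandside,
          all.bin = TRUE)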
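
Before reading off the lineup, it can help to confirm that the solver actually found a solution. Here is a minimal sketch on a made-up four-player problem (pick exactly 2 players with total salary at most 9); the numbers and the names `toy_points`, `toy_salary`, and `toy_sol` are invented for illustration.

```r
library(lpSolve)

# Made-up projections and salaries for four players
toy_points <- c(10, 6, 5, 4)
toy_salary <- c(8, 5, 4, 3)

toy_sol <- lp(direction    = "max",
              objective.in = toy_points,
              const.mat    = rbind(rep(1, 4), toy_salary),  # row 1: player count, row 2: salary
              const.dir    = c("==", "<="),
              const.rhs    = c(2, 9),
              all.bin      = TRUE)

toy_sol$status    # 0 means success; 2 means no feasible solution
toy_sol$objval    # total projected points of the chosen players
toy_sol$solution  # 0/1 vector marking the chosen players
```

In this toy case the only pair that fits under the cap with the highest total is players 2 and 3, worth 11 points. The same three fields exist on the `sol` object from the real call.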

You can read more about how the lp function works in the documentation. For now, just know that the call above returns the optimal DFS lineup. To see what this lineup looks like, there are just a couple more steps. We take the optimal solution indicators and apply them to our original data frame to print the players’ names.

inds <- which(sol$solution == 1)
solution<-time_for_lp[inds, ]
solution

The solution looks like this:

|Position |Nickname         |      FPPG| Salary|  X|  P| OF| SS| 1BC| 2B| 3B|
|:--------|:----------------|---------:|------:|--:|--:|--:|--:|---:|--:|--:|
|P        |Justin Verlander | 47.264706|  11800|  1|  1|  0|  0|   0|  0|  0|
|3B       |Alex Bregman     | 13.976923|   3500|  1|  0|  0|  0|   0|  0|  1|
|OF       |George Springer  | 14.321311|   3400|  1|  0|  1|  0|   0|  0|  0|
|OF       |Ronald Acuna Jr. | 14.159615|   3300|  1|  0|  1|  0|   0|  0|  0|
|OF       |Yordan Alvarez   | 14.202299|   3100|  1|  0|  1|  0|   0|  0|  0|
|2B       |DJ LeMahieu      | 12.522759|   3100|  1|  0|  0|  0|   0|  1|  0|
|C        |Mitch Garver     | 12.736559|   2800|  1|  0|  0|  0|   1|  0|  0|
|C        |Tom Murphy       |  9.623684|   2000|  1|  0|  0|  0|   1|  0|  0|
|SS       |J.P. Crawford    |  9.113978|   2000|  1|  0|  0|  1|   0|  0|  0|
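
As a sanity check, the totals for this lineup can be recomputed by hand. The sketch below retypes the rows from the table above into a data frame called `lineup` (a name I made up) so it runs on its own; with the real objects you could use `solution` directly.

```r
# The nine players from the solution table above
lineup <- data.frame(
  Position = c("P", "3B", "OF", "OF", "OF", "2B", "C", "C", "SS"),
  Nickname = c("Justin Verlander", "Alex Bregman", "George Springer",
               "Ronald Acuna Jr.", "Yordan Alvarez", "DJ LeMahieu",
               "Mitch Garver", "Tom Murphy", "J.P. Crawford"),
  FPPG     = c(47.264706, 13.976923, 14.321311, 14.159615, 14.202299,
               12.522759, 12.736559, 9.623684, 9.113978),
  Salary   = c(11800, 3500, 3400, 3300, 3100, 3100, 2800, 2000, 2000),
  stringsAsFactors = FALSE
)
salary_cap <- 35000  # cap used in this post

total_salary    <- sum(lineup$Salary)
total_points    <- sum(lineup$FPPG)
position_counts <- table(lineup$Position)

nrow(lineup)                # should be 9
total_salary <= salary_cap  # should be TRUE
total_salary
total_points
position_counts
```

This lineup spends the full 35,000 and projects to roughly 147.9 fantasy points.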

If the FPPG column were the actual number of points each player would score in their upcoming game, this would be the best possible lineup. This lineup includes two catchers (both eligible at 1BC): one fills the 1BC position and the other fills the Util position.

These websites helped me a lot in figuring this out:

Optimizing Fanduel in R - troyhernandez.com/2016/01/06/optimizing-fanduel-in-r/

lpsolve—Daily-Fantasy-Sports-Optimization - github.com/MattBrown88/lpsolve—Daily-Fantasy-Sports-Optimization

DFS-Optimizers - https://github.com/Firkz/DFS-Optimizers