I have an R data frame and need to perform a random binomial draw for each row. The n = argument of the draw should be taken from a value in a column of that row. Further, this operation should sit inside a case_when() that depends on a condition in the data.
Note: rowwise() from the tidyverse is much too slow here. The data frame is large, and the operation is performed at each timestep of a simulation model. Is there a way to do this quickly and efficiently?
Example:
    library(tidyverse)

    df = data.frame(condition = c("A","B","A","B","C"),
                    number = c(1000,1000,1000,1000,1))
    prob1 = 0.000517143
    prob2 = 0.000213472

    set.seed(1)
    df = df %>%
      mutate(output = case_when(
        condition == "A" ~ sum(rbinom(n = number, size = 1, prob = prob1)),
        condition == "B" ~ sum(rbinom(n = number, size = 1, prob = prob2)),
        TRUE ~ 0))
    print(df)
    #>   condition number output
    #> 1         A   1000      0
    #> 2         B   1000      0
    #> 3         A   1000      0
    #> 4         B   1000      0
    #> 5         C      1      0
Here, it looks like the random binomial draws are being reused and returning all zeros.
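One possible reading of the all-zero result, sketched here on the assumption that the documented behaviour of rbinom() is what applies: when n is a vector of length greater than 1, rbinom() takes length(n) as the number of draws, not the values in n. Without rowwise(), number is the whole column, so each call would produce only 5 Bernoulli draws, and sum() would collapse them into a single value shared by every matching row. The snippet uses prob = 0.5, an illustrative value that is not part of the example, purely so the draws are easy to see.

```r
# Sketch: how rbinom() treats a vector passed as n.
set.seed(1)
number <- c(1000, 1000, 1000, 1000, 1)  # same values as the number column

# prob = 0.5 is an illustrative assumption, not one of the question's probabilities
draws <- rbinom(n = number, size = 1, prob = 0.5)

# length(number) draws are returned, not 1000 draws per element
print(length(draws))
```

With the example's probabilities of roughly 0.0005 and 0.0002, a sum over 5 such draws would be zero almost every time, which would be consistent with the output above.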
As a check, here it is sampled repeatedly. In expectation, sum(df$output) should be about 1.46 per draw (2 × 1000 × prob1 + 2 × 1000 × prob2), so at least some of the totals should be nonzero.
    for(i in 1:10){
      df = df %>%
        mutate(output = case_when(
          condition == "A" ~ sum(rbinom(n = number, size = 1, prob = prob1)),
          condition == "B" ~ sum(rbinom(n = number, size = 1, prob = prob2)),
          TRUE ~ 0))
      print(sum(df$output))
    }
    #> [1] 0
    #> [1] 0
    #> [1] 0
    #> [1] 0
    #> [1] 0
    #> [1] 0
    #> [1] 0
    #> [1] 0
    #> [1] 0
    #> [1] 0
Unsure of the way forward.
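One candidate direction, offered as a sketch rather than a confirmed fix: the sum of number independent Bernoulli(p) draws has the same distribution as a single Binomial(number, p) draw, and rbinom() is vectorized over size and prob. Under that assumption, case_when() can select a per-row probability and one rbinom() call can cover the whole column without rowwise(). The helper column p is a name introduced here for illustration only.

```r
library(dplyr)

df <- data.frame(condition = c("A", "B", "A", "B", "C"),
                 number = c(1000, 1000, 1000, 1000, 1))
prob1 <- 0.000517143
prob2 <- 0.000213472

set.seed(1)
df <- df %>%
  mutate(
    # p is a hypothetical helper column: the success probability for each row,
    # with 0 for conditions that should always yield 0
    p = case_when(condition == "A" ~ prob1,
                  condition == "B" ~ prob2,
                  TRUE ~ 0),
    # one Binomial(number, p) draw per row; n() is the number of rows
    output = rbinom(n = n(), size = number, prob = p)
  )

print(df)
```

Because prob = 0 always returns 0, rows matching neither condition keep the same value the TRUE ~ 0 branch produced. Note that output comes back as an integer column here, whereas the original case_when() mixed it with a double 0.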