This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. The [R-markdown source code](`r paste0("https://raw.githubusercontent.com/hoehleatsu/hoehleatsu.github.io/master/_source/",current_input())`) of this blog is available under a [GNU General Public License (GPL v3)](https://www.gnu.org/licenses/gpl-3.0.html) license from GitHub. ## Introduction As a pandemic replacement for the cross-division-chance-encounter in the coffee kitchen, it has become common to use a virtual social mixer format, where the $n$ participants are randomly divided into groups of at least $m$ people, e.g., for a video meeting or Zoom breakout rooms. Software exists to institutionalize such [Random Lunches](https://de.wikipedia.org/wiki/Random_Lunch) for, e.g., [Slack](https://lunchroulette.co/). In this post we shall be interested in the properties of such group assignment algorithms. A practical experience is that all too often, the algorithm assigns me to a group where at least one of the participants was already in my group the last time. Before accusing the programmers of writing a poor grouping algorithm: can we determine mathematically what the probability for such a "reunion" is? How can we modify the algorithm to avoid this? #### Survey of the field Arranging a fixed set of $n=k\times m$ individuals into $k$ groups of size $m$ in subsequent rounds ensuring that no overlap occurs between the groups for as many rounds as possible is also known as the [social golfer problem](http://www.mathpuzzle.com/MAA/54-Golf%20Tournaments/mathgames_08_14_07.html) or round robin scheduling in combinatorics - and can be solved using [constraint programming](https://www.metalevel.at/mst.pdf). Compared to the social golfer problem, we have the additional challenge that the group of people between rounds may be subject to change. Furthermore, $m$ might not be a divisor of $n$. Hence, somewhat closer to our problem is the so called [maximally diverse grouping problem](http://grafo.etsii.urjc.es/index.php/category/maximally-diverse-grouping/), where the grouping is to be done such that the requirements on group size are met and a diversity score over the elements in each group is maximized. For our setup, the score could, e.g., be time since the last meet between individuals. Integer programming can be used to solve the NP-complete problem. However, in our application we will either ignore the diversity score altogether (i.e. set it to 1 always) or use a 0/1 diversity score (met last time or not). Furthermore, we are interested in random assignment algorithms selecting uniformly among the valid configurations. As a consequence, we choose a more probabilistic approach and instead use a custom group assignment algorithm as described in the next section. ### The Grouping ```{r make_participants, eval=FALSE} n <- 47 people <- tibble(id = str_c("id", sprintf("%0.2d",seq_len(n)))) ``` ```{r make_participants_block, results="hide"} set.seed(1234) <

*At 15:00 We will use the breakout room Zoom function that randomly divides the participants into smaller groups (5 persons). At 15:15 and 15:30 we will randomize to form new groups. The natural question that arises is: what is the probability that you and some other colleague spend the entire 45 minutes together?*

```{r compute_xmas_prob, cache=TRUE, message=FALSE, results="hide", warning=FALSE} # Simulation function for the above described Zoom Xmas party setup xmas_puzzle <- function(people) { # Draw the 3 breakout room assignments. We assume same ppl in all 3 sessions groups_r1 <- sample_groups(people, m=5) groups_r2 <- sample_groups(people, m=5) groups_r3 <- sample_groups(people, m=5) me <- people %>% slice(1) %>% pull(id) #Helper function to determine ids of those in the same group as me. my_group <- function(groups, me) { map(groups, function(g) if (me %in% g) return(g) else return(NULL)) %>% unlist() } idx1 <- my_group(groups=groups_r1, me=me) idx2 <- my_group(groups=groups_r2, me=me) idx3 <- my_group(groups=groups_r3, me=me) always_meet <- idx1 %>% intersect(idx2) %>% intersect(idx3) %>% setdiff(me) return(length(always_meet) > 0) } # Define participants xmas_people <- tibble(id = 1:30) # Parallelize samples - see e.g. https://www.jottr.org/2018/06/23/future.apply_1.0.0/ library(future.apply) plan(multiprocess, workers = 4) ## Parallelize using four cores p_always_meet <- future_replicate(1e4, xmas_puzzle(xmas_people), future.seed = TRUE) %>% mean() p_always_meet ``` Through simulations (details in the [code](`r paste0("https://raw.githubusercontent.com/hoehleatsu/hoehleatsu.github.io/master/_source/",current_input())`)) built on top of the `sample_groups` function we determine that the requested probability when $n=`r nrow(xmas_people)`$ participate in the X-mas party is `r scales::percent(p_always_meet, accuracy=0.1)`. [^1]: As an example consider the case $n=7$ and $m=4$, where we according to our specification would form 1 group with 7 members. [^2]: We will assume that the $l$ individuals you met last week all participate again this week. [^3]: Thanks to [u/antichain](https://www.reddit.com/r/math/comments/mjy7nx/probability_to_meet_someone_again_when_assigning/gtd33xi?utm_source=share&utm_medium=web2x&context=3) for the question and to [u/assiraN](https://www.reddit.com/r/math/comments/mjy7nx/probability_to_meet_someone_again_when_assigning/gtd4c8t?utm_source=share&utm_medium=web2x&context=3) for the nice answer with examples, which I've integrated into the post. ## Literature