There are two main obstacles to modeling this in any realistic way: players are not equally skillful, and the set of players changes from one year to the next. If we ignore realism and suppose that players are equally good and that the same players compete each year, then it is easy to see heuristically that the number of coincidences (meaning the same 2 players meet in different years) should have an approximately Poisson(2) distribution, and simulations confirm this approximation (first row of the table below).
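The Poisson(2) heuristic can be spelled out in a few lines. This is a sketch under assumptions that are not stated above: n = 2^k equally skilled players, a uniformly random draw each year, and independence between the two years. The symbols n, k and N are introduced here for illustration only.

```latex
% Assumptions (illustrative): n = 2^k equally skilled players, uniformly random
% draw each year, the two years independent. N = number of coincidences.
\begin{align*}
\#\{\text{matches per year}\} &= n - 1, \\
P(\text{a given pair meets in a given year}) &= \frac{n-1}{\binom{n}{2}} = \frac{2}{n}, \\
E[N] &= (n-1)\cdot\frac{2}{n} \;=\; 2 - \frac{2}{n} \;\approx\; 2 .
\end{align*}
```

By symmetry each of the n(n-1)/2 pairs is equally likely to be one of the n - 1 pairs that meet, and each pair that met in year one meets again with probability 2/n. For n = 128, the size of a Grand Slam singles draw, the mean is about 1.98. Since N counts many rare and only weakly dependent events, a Poisson approximation is plausible.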
Realistic modeling of skill variability is problematic. I use the model from section 1.3 of this paper. The model has a free "variability" parameter most easily interpreted via
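variability = P(top seed beats second seed in a match), averaged over tournaments.
So variability = 0.5 is the "equal skill" extreme. The table shows (from simulations) the distribution of the number of coincidences: the columns give the probabilities of 0, 1, 2, ... coincidences, and the final column gives the mean.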
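| number of coincidences | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8+ | mean |
|---|---|---|---|---|---|---|---|---|---|---|
| variability = 0.5 | 0.14 | 0.27 | 0.27 | 0.20 | 0.07 | 0.04 | 0.01 | 0.0 | 0.0 | 1.95 |
| variability = 0.6 | 0.09 | 0.22 | 0.26 | 0.22 | 0.12 | 0.06 | 0.02 | 0.01 | 0.0 | 2.37 |
| variability = 0.65 | 0.05 | 0.14 | 0.23 | 0.23 | 0.17 | 0.10 | 0.05 | 0.02 | 0.01 | 2.99 |
| variability = 0.7 | 0.02 | 0.06 | 0.17 | 0.22 | 0.21 | 0.15 | 0.08 | 0.04 | 0.05 | 3.87 |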
As intuition suggests, when variability increases, the better players are more likely to get through the early rounds and so are more likely to meet each other in later rounds, which increases the expected number of coincidences.
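The effect of skill variability can be explored with a small simulation. The sketch below does not implement the model from the cited paper. It swaps in a simpler stand-in: each player gets a Normal(0, sigma) strength, and match outcomes follow a logistic (Bradley-Terry style) rule. The draw is uniformly random rather than seeded, and the names `play_tournament`, `mean_coincidences` and `sigma` are hypothetical. Here `sigma` plays the role of the variability parameter but is on a different scale.

```python
import math
import random


def play_tournament(strengths, rng):
    """Play one single-elimination tournament with a uniformly random draw.

    Assumes len(strengths) is a power of 2. Returns the set of pairs
    (frozensets of player indices) that met during the tournament.
    """
    alive = list(range(len(strengths)))
    rng.shuffle(alive)  # random bracket: adjacent entries meet in round 1
    met = set()
    while len(alive) > 1:
        winners = []
        for a, b in zip(alive[0::2], alive[1::2]):
            met.add(frozenset((a, b)))
            # Logistic win probability: an illustrative assumption,
            # not the model of the cited paper.
            p_a = 1.0 / (1.0 + math.exp(strengths[b] - strengths[a]))
            winners.append(a if rng.random() < p_a else b)
        alive = winners  # adjacent winners meet in the next round
    return met


def mean_coincidences(n_players, sigma, n_trials, seed=0):
    """Estimate the mean number of pairs that meet in both of two years."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trials):
        # Same players and same strengths in both years; fresh draw each year.
        strengths = [rng.gauss(0.0, sigma) for _ in range(n_players)]
        year1 = play_tournament(strengths, rng)
        year2 = play_tournament(strengths, rng)
        total += len(year1 & year2)
    return total / n_trials


if __name__ == "__main__":
    for sigma in (0.0, 1.0, 2.0):
        print(sigma, mean_coincidences(128, sigma, 1000))
```

With `sigma = 0.0` every match is a fair coin flip, so the estimate should land near 2. Larger values of `sigma` should push the mean up, in the same direction as the table, though the numbers are not comparable to the table because the skill model and the draw differ.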
Regarding different players in different years, if a proportion p play both years, and if this is independent of skill, then a pair can only meet again when both of its players return, so we expect the mean number of coincidences to be multiplied by p^2. But in reality it will surely be the better players who tend to remain for the next year, so the true reduction is presumably less severe.
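To get a feel for how much turnover the coincidence can tolerate, here is a minimal sketch that combines the p^2 scaling with a Poisson approximation. The Poisson approximation for the skill-dependent rows is an assumption made here (the text only claims it for the equal-skill case), and the function names are hypothetical. The means are taken from the last column of the table.

```python
import math


def prob_at_least_one(mean_same_players, p):
    """P(at least one coincidence), assuming a Poisson count whose mean is
    the same-players mean multiplied by p**2 (retention independent of skill)."""
    return 1.0 - math.exp(-mean_same_players * p * p)


def break_even_p(mean_same_players):
    """Retention proportion p at which P(at least one coincidence) = 1/2.

    Only meaningful when mean_same_players >= ln 2, so that p <= 1.
    """
    return math.sqrt(math.log(2.0) / mean_same_players)


if __name__ == "__main__":
    # Means from the last column of the table (variability 0.5, 0.6, 0.65, 0.7).
    for mean in (1.95, 2.37, 2.99, 3.87):
        print(mean, round(break_even_p(mean), 3))
```

For the mean of 2.99 the break-even retention is roughly 0.48: under these assumptions, at least one coincidence is more likely than not as long as about half of the players return. Since the better players are the ones who tend to return, the real threshold is presumably lower still.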
The bottom line is that, as usual, coincidences are more likely than you think. The variability = 0.65 model seems a rough fit to Grand Slam tennis tournaments, so even allowing for player turnover, this coincidence is more likely than not.
Anyone willing to find some data?