Interest in this topic was apparently rekindled by a press release from the University of Adelaide, whose key assertion was that the likelihood that two people share the exact same face is in excess of [actually meaning "less than"] one in a trillion. This was based on the 2015 scientific paper "Are human faces unique? A metric approach to finding single individuals without duplicates in large samples" by Teghan Lucas and Maciej Henneberg. That paper examined a database of around 4,000 faces and made 8 measurements (e.g. ear length) of each individual, recorded to the nearest millimeter. By examining whether there were any two individuals with exactly the same values for 4 or 5 of those measurements, and then extrapolating via some kind of mathematical model, the following conclusion (edited for clarity) was stated:
(*) The probability of finding a duplicate of a given face (that is, these 8 measurements are identical) is less than 1/(Earth population), implying that this method of facial identification is as reliable as that of DNA (because it is unlikely that there exists any identical pair).
A subsequent critique, "Reply to Lucas & Henneberg: Are human faces unique?", by statisticians Ronald Meester et al. argues (again edited for clarity):
A variety of related reasons show that this central claim (*) is unsubstantiated:
(1) the absence of any mathematical model for facial profiles;
(2) problems in the determination of the match probability;
(3) the extrapolation;
(4) unsubstantiated claims in the popular press release.
I believe that most academic statisticians would agree with this critique, and therefore that (*) does not represent any kind of consensus scientific opinion.
The conceptual difficulty with any serious analysis is to specify what exactly it means to be a doppelganger, that is, to say two faces "look very similar". If one judges by physical measurements, then any very precise measurement of two people would be different, even for identical twins (who are implicitly not counted as doppelgangers). So one needs to quantify "how similar", and that part of the Lucas & Henneberg paper is reasonable (they measured to the nearest millimeter). But judging by human perception might give quite different results.
Model. There is some unknown (and very large) number N of possible faces, meaning that any individual face is "very similar" to one of these possible faces, and that two individuals are deemed doppelgangers if their faces are "very similar" to the same one of the possible faces. Then suppose that each face in the world population (size M, approximately 8 billion) is like a random pick from the N possible faces.
Analysis.
The key mathematical point is that there are three quite different events,
with different probabilities, that one can consider within this model.
(E1) There is at least one doppelganger pair, somewhere in the world.
(E2) A typical person ("you") has at least one doppelganger.
(E3) Every person in the world has at least one doppelganger.
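To see how the three events differ, here is a minimal simulation sketch of the model, assuming faces are drawn uniformly and independently from the N possible faces. The function name simulate_events and the small values of M and N are illustrative choices only, far below the real-world scale.

```python
import random
from collections import Counter


def simulate_events(M, N, rng):
    """Draw M faces uniformly from N possible faces and check the three events."""
    faces = [rng.randrange(N) for _ in range(M)]
    counts = Counter(faces)
    # A person has a doppelganger if at least one other person drew the same face.
    people_with_doppelganger = sum(c for c in counts.values() if c >= 2)
    return {
        "E1": people_with_doppelganger > 0,    # some pair exists
        "E2_fraction": people_with_doppelganger / M,  # estimates P(E2)
        "E3": people_with_doppelganger == M,   # nobody has a unique face
    }


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print(simulate_events(M=1000, N=100_000, rng=rng))
```

The fraction of people with a doppelganger in one simulated population serves as an estimate of the probability of (E2) for a typical person.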
The probabilities of these events depend on M (which we know) and on N (which we don't know).
We know how to calculate such probabilities (in terms of M and N)
because this mathematical structure arises often.
In fact:
(E1) is an instance of the birthday problem;
(E2) is an instance of the binomial distribution;
(E3) is a variant of the coupon collector's problem (CCP).
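For (E1) and (E2) the calculation can be sketched in a few lines. This is an illustration under the uniform random-pick model, not a formula taken from the papers discussed: prob_e1 uses the standard birthday-problem approximation exp(-M(M-1)/(2N)) for the chance of no matching pair, and prob_e2 uses the exact binomial expression. Both function names are hypothetical.

```python
import math


def prob_e1(M, N):
    """Approximate P(at least one doppelganger pair among M people)."""
    # Birthday-problem approximation: P(no pair) is about exp(-M(M-1)/(2N)),
    # which is accurate when N is large relative to M.
    return -math.expm1(-M * (M - 1) / (2 * N))


def prob_e2(M, N):
    """P(a given person matches at least one of the other M-1 people); needs N > 1."""
    # Each other person matches independently with probability 1/N,
    # so P(no match) = (1 - 1/N)^(M-1). log1p/expm1 keep precision for huge N.
    return -math.expm1((M - 1) * math.log1p(-1 / N))


if __name__ == "__main__":
    M = 8e9  # world population, as in the text
    for N in (1e9, 1e10, 1e11, 1e20):
        print(f"N={N:.0e}  P(E1)~{prob_e1(M, N):.3f}  P(E2)={prob_e2(M, N):.3f}")
```

As a sanity check, prob_e1(23, 365) is close to 1/2, the classical birthday-problem answer.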
It is important to note that N depends on how we judge "very similar" and that we would need some kind of
real world data to make a numerical estimate of N.
However we can make a start without data.
For each of these events there is some critical value of N, in the following sense:
if the true value of N is much less than the critical value, then the event is very likely,
whereas if the true value of N is much larger than the critical value, then the event is very unlikely.
This is qualitatively obvious, in that the more possible faces there are, the less likely doppelgangers will be. Then within our model, we can calculate these critical values.
For (E1) the critical value of N is 5 x 10^{19} = 50 billion billion.
To me this seems implausibly large, so I personally am very happy to believe that some doppelgangers exist.
For (E2) the critical value of N is 11 billion.
For (E3) the critical value of N is 400 million.
To me, it is hard to guess whether N is larger or smaller than the values for (E2) or (E3) above.
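The critical values for (E1) and (E2) can be reproduced roughly as follows. The sketch assumes that "critical" means the value of N at which the event has probability 1/2; the text does not state this definition, but it gives numbers of the same size as those quoted. The variable names are illustrative.

```python
import math

M = 8e9  # world population, as in the text

# (E1): solve exp(-M(M-1)/(2N)) = 1/2 for N (birthday-problem approximation).
critical_e1 = M * (M - 1) / (2 * math.log(2))

# (E2): solve exp(-M/N) = 1/2 for N (large-N form of the binomial formula).
critical_e2 = M / math.log(2)

print(f"critical N for (E1): {critical_e1:.2e}")
print(f"critical N for (E2): {critical_e2:.2e}")
```

Under this assumption the results come out near 4.6 x 10^19 and 11.5 billion, in line with the rounded figures above.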
The value for (E3) comes from the coupon collector's problem. To quote and edit from a standard account:
The CCP asks for the number of trials M
needed to collect all N different coupons, when each coupon is equally likely to be obtained in each trial.
The solution to the coupon collector's problem is
M = N * log(N) approximately.
In our setting we use this solution “backwards”.
We take
N = number of possible faces
M = world population.
Because we know M is about 8 billion, we can solve M = N * log(N) for N, which gives N of about 400 million.
Again, note this is all “approximate”.
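Using the solution "backwards" means solving M = N * log(N) numerically for N. A minimal sketch by bisection, assuming log is the natural logarithm (the function name solve_n_log_n is hypothetical):

```python
import math


def solve_n_log_n(M, tol=1e-9):
    """Find N > 1 with N * ln(N) = M by bisection (assumes M >= e)."""
    lo, hi = 1.0, float(M)  # at N=1 the product is 0; at N=M it is at least M
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2
        if mid * math.log(mid) < M:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    N = solve_n_log_n(8e9)
    print(f"N is about {N:.3e}")
```

With M = 8 billion this gives N close to 400 million, matching the figure for (E3).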