Logic of the Mark Recapture Method
Imagine that there is a jar of marbles and you have been asked to guess how many it contains. You are not allowed to count them all, but you are permitted to sample them. A sample is a fraction of the entire group that, ideally, represents the group. As it turns out, there is a well-established method you can use to estimate the number of marbles. It is called Mark-Recapture. Here’s how it works:
First, you remove a small subset of your marble population and mark them in some way that will not disappear. This is often called the first sample. Let’s say you marked 20 marbles.
You then return the marked marbles to the jar and thoroughly mix them with the remaining marbles.
Next, you randomly remove a set number of marbles and note whether each one is marked or unmarked. Let’s say you remove 10 marbles and 2 of these are marked.
The proportion of marked marbles in this sample (2 of 10) gives you a good clue as to how many marbles there are in the jar.
The logic is the same as in any situation dealing with proportions. If 20% of the marbles in a jar are marked, then on average 2 out of every 10 marbles taken out of the jar should be marked. This is exactly what our hypothetical sample shows.
Recall a proportion can be calculated the following way:
$$\frac{\text{\# of marked marbles in jar}}{\text{total \# of marbles}} = \text{proportion of marked marbles}$$
You have gathered two of the three elements of this equation. You know how many marbles you marked (20 marbles). You also have an estimate of the proportion of marked marbles in the jar from your sample (20%, or 0.20 = 2/10).
Using basic algebra, you can calculate the last variable (x), the total # of marbles:
$$\frac{20 \text{ marbles}}{x} = 0.20 \qquad \text{(multiply both sides by } x\text{)}$$
$$20 \text{ marbles} = 0.20\,x \qquad \text{(divide both sides by 0.20 to get } x \text{ by itself)}$$
$$x = \frac{20 \text{ marbles}}{0.20} = 100 \text{ marbles}$$
Thus you have estimated that there are 100 marbles in the population. To make this estimate, you have made a critical assumption: that the proportion of marked marbles in your sample of 10 is representative of the entire jar. If the jar is thoroughly mixed after the marked marbles are added and you select your ten marbles at random, this should be true on average (but see the discussion below on variance).
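The claim that the sample proportion matches the jar only on average can be checked with a short simulation. This is a minimal Python sketch, not part of the lab procedure: it assumes a hypothetical jar of exactly 100 marbles with 20 marked, and the helper name `draw_sample`, the seed, and the 10,000 repetitions are illustrative choices.

```python
import random

def draw_sample(jar, sample_size, rng):
    """Draw marbles at random without replacement; return the fraction that are marked."""
    sample = rng.sample(jar, sample_size)
    return sum(sample) / sample_size  # True counts as 1, False as 0

# Hypothetical jar: True = marked marble, False = unmarked marble.
jar = [True] * 20 + [False] * 80

rng = random.Random(1)  # fixed seed so the run is repeatable
proportions = [draw_sample(jar, 10, rng) for _ in range(10_000)]

# Range of single-sample proportions, then the average over all samples.
print(min(proportions), max(proportions))
print(round(sum(proportions) / len(proportions), 2))
```

A single draw of 10 can easily contain 0, 1, 3, or more marked marbles, yet the long-run average proportion should land close to 0.20.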
Scientists have created a standard formula for utilizing this method in real situations. The formula looks like this:
$$\frac{M}{N} = \frac{R}{n}$$
Where,
M is the number of individuals captured, marked and then released back into the population (first sample).
N is the total population size.
R is the number of marked individuals captured during the second sampling period.
n is the total number of all individuals captured during the second sampling period.
Thus, in words, the above equation states that the proportion of marked individuals in the total population is equal to the proportion of marked individuals in the second sample.
For this relationship to be true, several assumptions must also hold.
Be sure that you fully understand these assumptions and how violating each one can affect the resulting estimate (overestimation or underestimation). Many other scenarios, such as immigration and emigration, can also lead to violations.
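Solving M/N = R/n for N gives N = M·n/R, a form often called the Lincoln-Petersen estimator. The Python sketch below simply evaluates that rearrangement; the function and parameter names are hypothetical, and it is the uncorrected formula, not the bias-corrected version used in this lab.

```python
def lincoln_petersen(marked_first, caught_second, marked_second):
    """Estimate total population size N from M/N = R/n, i.e. N = M * n / R.

    marked_first  -- M, individuals marked and released in the first sample
    caught_second -- n, total individuals caught in the second sample
    marked_second -- R, marked individuals among those n
    """
    if marked_second == 0:
        # With no marked recaptures the ratio R/n is zero and N cannot be solved for.
        raise ValueError("R = 0: no marked recaptures, so N cannot be estimated")
    return marked_first * caught_second / marked_second

# Marble example: M = 20 marked, n = 10 drawn, R = 2 of them marked
print(lincoln_petersen(20, 10, 2))  # 100.0
```

This reproduces the 100-marble answer from the algebra above.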
One Final Note: For statistical reasons beyond the scope of this class, the Mark Recapture formula given above is biased; that is, it tends to overestimate the population. Don’t worry about why. To correct this, a small change is made to the formula. The equation below is the unbiased version used by ecologists and the one we shall use in this lab. Notice that the variables have been algebraically rearranged to isolate the estimate of the population size (Ni).
The Mark Recapture Formula
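To see the overestimation of the uncorrected formula in action, here is a hedged Python sketch that repeats N = M·n/R many times on a hypothetical population of 100 with 20 marked and a second sample of 10. It does not implement the lab's corrected formula. Runs with R = 0 are skipped because the uncorrected formula cannot be evaluated there; that skipping rule is an assumption of this sketch.

```python
import random

TRUE_N, M, SAMPLE_SIZE = 100, 20, 10  # hypothetical population and sampling plan
population = [True] * M + [False] * (TRUE_N - M)  # True = marked individual

rng = random.Random(42)  # fixed seed so the run is repeatable
estimates = []
for _ in range(20_000):
    recaptured = sum(rng.sample(population, SAMPLE_SIZE))  # R in the second sample
    if recaptured > 0:  # uncorrected N = M*n/R is undefined when R = 0
        estimates.append(M * SAMPLE_SIZE / recaptured)

mean_estimate = sum(estimates) / len(estimates)
print(f"true N = {TRUE_N}, mean uncorrected estimate = {mean_estimate:.1f}")
```

The mean of the estimates should come out noticeably above the true value of 100, even though every individual sample was drawn fairly.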
A single sample estimate is just that: an estimate. If you toss a penny in the air, you know that 50% of the time you should get heads. However, you probably will not get exactly 5 heads in 10 tosses. Instead, you may get 4 of 10 or 7 of 10. This variation from the expected result is due to random chance. The same situation exists in real populations even when all assumptions are true: just by random chance, you may find more or fewer marked individuals than you should. That is why each group will likely have a different estimate of the population size.
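The penny example can be made exact with the binomial formula. This Python sketch (the helper name `prob_heads` is hypothetical) computes the chance of exactly k heads in 10 fair tosses.

```python
from math import comb

def prob_heads(k, tosses, p=0.5):
    """Binomial probability of exactly k heads in `tosses` tosses with P(heads) = p."""
    return comb(tosses, k) * p**k * (1 - p) ** (tosses - k)

print(prob_heads(5, 10))      # 0.24609375  (exactly 5 heads)
print(1 - prob_heads(5, 10))  # 0.75390625  (anything other than exactly 5)
print(prob_heads(7, 10))      # 0.1171875   (exactly 7 heads)
```

So even with a perfectly fair penny, about three times out of four you will not see exactly 5 heads in 10 tosses.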
We can use everyone’s estimates to come up with a class average (Ng) that will hopefully be a closer estimate of the true population:
$$N_g = \frac{\sum_{i=1}^{n} N_i}{n}$$
(This is just like calculating any other average.)
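The summation symbol, ∑, simply means to add up all the individual groups’ estimates of the population size. You then divide this sum by the total number of groups (n). Note that here n counts groups; it is not the second-sample size n used in the Mark Recapture formula.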
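As a sketch in Python, the class average is an ordinary mean; the five group estimates below are made-up numbers used only for illustration, and `class_average` is a hypothetical helper name.

```python
def class_average(group_estimates):
    """Ng = (N1 + N2 + ... + Nn) / n, where n is the number of groups."""
    return sum(group_estimates) / len(group_estimates)

# Hypothetical estimates from five lab groups (illustration only)
estimates = [90, 100, 110, 120, 80]
print(class_average(estimates))  # 100.0
```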
It is also a good idea to estimate how variable the class estimate is, that is, how much variation exists between the group estimates. Knowing this gives us a better picture of how good our estimate is. The calculation of this variability is a bit more complicated. In words, the variance is the sum of the squared differences between the class mean and each group’s estimate, divided by the total number of groups. This is best calculated by constructing a table as follows:
| Group | Group estimate | Difference from class mean | Squared difference |
|---|---|---|---|
| 1 | N1 | Ng – N1 | (Ng – N1)² |
| 2 | N2 | Ng – N2 | (Ng – N2)² |
| 3 | N3 | Ng – N3 | (Ng – N3)² |
| …n | …Nn | …Ng – Nn | …(Ng – Nn)² |
| | Ng = (N1 + N2 + N3 + … + Nn)/n | | SSD = sum of the squared differences above |
| | | | Variance = SSD / n |
To calculate the variance, add up the squared differences for all groups (last column). We call this the summed squared difference, or SSD. Take the SSD and divide it by the total number of groups (n). The result is the variance, often symbolized as s². The square root of this number is called the standard deviation: $s = \sqrt{s^2}$. Another measure is the standard error: $s_{\bar{x}} = s/\sqrt{n}$. The standard error is useful because it can be used to calculate one more measure, the 95% confidence interval: $N_g \pm 1.96\,s_{\bar{x}}$.
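The table-based calculation can be sketched in Python as follows. The group estimates are made-up illustration values, `variance_table` is a hypothetical helper, and the variance divides SSD by the number of groups (n), following the convention in this handout.

```python
def variance_table(group_estimates):
    """Build the rows of the variance table; return them with Ng, SSD and the variance.

    Divides SSD by n (the number of groups), as this handout does.
    """
    n = len(group_estimates)
    ng = sum(group_estimates) / n  # class average Ng
    rows = []
    for group, ni in enumerate(group_estimates, start=1):
        diff = ng - ni                             # Ng - Ni
        rows.append((group, ni, diff, diff ** 2))  # (Ng - Ni)^2 goes in the last column
    ssd = sum(row[3] for row in rows)              # summed squared difference
    return rows, ng, ssd, ssd / n

rows, ng, ssd, variance = variance_table([90, 100, 110, 120, 80])
print(f"{'Group':>5} {'Ni':>8} {'Ng-Ni':>8} {'sq':>8}")
for group, ni, diff, sq in rows:
    print(f"{group:>5} {ni:>8} {diff:>8.1f} {sq:>8.1f}")
print(f"Ng = {ng}, SSD = {ssd}, variance = {variance}")
```

For these illustration values the last line reads `Ng = 100.0, SSD = 1000.0, variance = 200.0`.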
So if we estimate the population and its 95% CI to be 150 ± 20, then the actual population size likely lies between 130 and 170 individuals; at least, we are 95% confident that it lies in this range. The narrower the range, the more precise our estimate.
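Continuing with made-up illustration values, this Python sketch goes from the variance to the standard deviation, the standard error (taken here as s/√n, with n the number of groups), and the 95% confidence interval Ng ± 1.96·sx. The helper name `summarize` is hypothetical.

```python
import math

def summarize(group_estimates, z=1.96):
    """Return Ng, standard deviation s, standard error sx and the 95% CI bounds."""
    n = len(group_estimates)
    ng = sum(group_estimates) / n
    variance = sum((ng - ni) ** 2 for ni in group_estimates) / n  # SSD / n, as in the handout
    s = math.sqrt(variance)    # standard deviation
    se = s / math.sqrt(n)      # standard error, sx
    return ng, s, se, (ng - z * se, ng + z * se)

ng, s, se, (low, high) = summarize([90, 100, 110, 120, 80])
print(f"Ng = {ng:.1f}, s = {s:.2f}, sx = {se:.2f}, 95% CI = {low:.1f} to {high:.1f}")
# Ng = 100.0, s = 14.14, sx = 6.32, 95% CI = 87.6 to 112.4
```

The interval is read the same way as the 150 ± 20 example: here the true population would likely lie between about 87.6 and 112.4.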