
Disease Frequency: Basic 


Noel S. Weiss

and Thomas D. Koepsell


PRINTED FROM OXFORD MEDICINE ONLINE ( © Oxford University Press, 2016. All Rights Reserved. Under the terms of the licence agreement, an individual user may print out a PDF of a single chapter of a title in Oxford Medicine Online for personal use (for details see Privacy Policy and Legal Notice).


In your otherwise beautiful poem, there is a verse which reads:

  • Every moment dies a man

  • Every moment one is born

It must be manifest that, were this true, the population of the world would be at a standstill. In truth the rate of birth is slightly in excess of that of death. I would suggest that in the next edition of your poem, you have it read:

  • Every moment dies a man

  • Every moment 1–1/16 is born

Strictly speaking this is not correct. The actual figure is a decimal so long that I cannot get it on the line, but I believe 1–1/16 will be sufficiently accurate for poetry. I am, etc.

Charles Babbage, inventor of the first programmable computer, in a letter to poet Alfred, Lord Tennyson

Quantitative measures of disease frequency in a population are some of the most basic tools of epidemiology, and any epidemiologist needs to be skilled in their use. One aspect of skill is being able to choose the right tool for a job. This chapter provides an overview of the most commonly used measures of disease frequency, including what kind of question each measure answers, what kind of information it requires as input, and examples of its use. Finer details are omitted for now in an attempt to convey the “big picture.”

Another aspect of skill is knowing your tools well. Chapter 4 discusses in more depth several of the measures introduced in this chapter, including some of their statistical properties and useful relationships among them.


In broad terms, most measures of disease frequency answer one of two kinds of questions. First, how common is a given disease as of a certain time? For example:

  • Suppose that a newly published study shows that surgical bypass of atherosclerotic lesions in the carotid arteries is effective for preventing stroke. The medical director of a health insurance plan wants to know: How many of our enrollees currently have carotid atherosclerosis that would make them candidates for this surgery?

  • Suppose that an international medical aid organization seeks to reduce chronic parasitic infection among children in villages in a less-developed country. They ask: What proportion of children are currently infected? In which villages is the proportion especially high?

These kinds of questions are answered by measures of disease prevalence, which quantifies the frequency of disease as of a certain time. Prevalence is a static measure: time is “frozen.” Anyone who qualifies as being in the diseased state at the specified time is counted as a case. In the two examples above, the time point of interest is the present, but prevalence can also be applied to time on any of several time scales, including calendar time, age, or time after some salient event.

Second, how frequently do new cases of disease arise in a population as time passes? For example:

  • Suppose that a health insurance plan has decided to fund a self-care program, designed to train people with newly diagnosed diabetes about how to monitor their glucose level and adjust their diet and insulin dosage. The plan’s medical director asks: How many new cases of diabetes would be expected over the next year?

  • Suppose that a few cases of connective-tissue disease have been reported in women with silicone breast implants. Affected women and their doctors may ask: Is connective-tissue disease any more likely to develop over time among women with such implants than among women without them?

These kinds of questions are best answered by measures of disease incidence, which concerns how often disease events occur over a period of time. Incidence is a dynamic measure, always involving the passage of time. The disease event of interest may be an instantaneous occurrence, such as death, or it may be the onset of a more persistent disease state, such as onset of diabetes. Sometimes only the frequency of fatal cases is of interest (or is the only information available). If so, measures of mortality—really a subtype of incidence that considers only fatal cases—are used.

In terms of the state/transition disease models considered in Chapter 2, prevalence concerns how a population of interest is distributed among the compartments (e.g., Figure 2.4) at some point in time: in particular, the number of people or the proportion of the population in the diseased state at that time. Incidence concerns the rate of flow along the arrow from the susceptible compartment to the diseased compartment. Each time someone changes from the susceptible state to the diseased state, a new disease event or incident case occurs.

Regardless of whether the need calls for prevalence or incidence, sometimes just counting the number of disease cases is sufficient. For example, the health plan medical director mentioned above simply needed to know the number of existing cases of carotid atherosclerosis, or the number of new cases of diabetes to expect over some time period.

But in many other situations, comparisons of disease frequency need to be made between populations of different sizes and/or that are observed over different periods of time. Case counts alone would not account for these differences. Instead, more valid comparisons require measures that relate the number of cases to the size of the population at risk and (for incidence) the amount of time over which they are observed. Most such measures take the form of a fraction: the numerator is the number of cases, and the denominator is the “base” for that number of cases. For prevalence, the denominator is simply the size of the population. For incidence, the denominator quantifies in some way the amount of at-risk experience that generated those cases. Depending on the type of incidence measure, the denominator can be either the number of people initially at risk, or the total amount of person-time at risk experienced by population members during the time period of interest.

Sometimes information needed for the desired denominator—such as the size of the true population at risk—is unobtainable. Instead, a proxy denominator may be available that is better than no denominator at all, particularly if it is safe to assume that the proxy denominator will be approximately proportional to the true one.

All of the measures described below can be applied either to a full population or to each of several subpopulations within it—e.g., to each of several age groups, or to males and females separately. Doing so permits comparisons of disease frequency among those subpopulations, even if they have different sizes. Whenever a subpopulation-specific measure is computed, both its numerator and its denominator (if there is one) are restricted to members of that subpopulation. In other words, each subpopulation is treated as a mini-population in its own right.


The count of prevalent cases of a disease is the number of people who are in the diseased state at a specified time. Prevalence is a proportion, obtained by dividing the count of prevalent cases by the population size at that time:

$$\text{Prevalence} = \frac{\text{Number of prevalent cases}}{\text{Size of population}}$$

Prevalence can be visualized in terms of line diagrams. On an appropriate time scale, the point in time at which prevalence is assessed determines the horizontal position of a vertical line that cuts across the time lines for all population members. In Figure 3.1, two of five people are in the diseased state at that time, so the prevalence would be 2/5 = 40%.

Example 3-1. Concerned about infections acquired in health care settings, the Veterans Health Administration commissioned a study of the prevalence of nursing home–acquired infections in its 133 nursing homes (Tsan et al., 2008). On November 9, 2005, personnel at all such VA facilities reviewed the medical records of all 11,475 residents on that day to determine each resident’s infection status, according to a standardized case definition. A total of 591 residents qualified as having at least one nursing home–acquired infection on that day, for an overall prevalence of 591/11,475 = 5.2%. The three most common types of infection were symptomatic urinary tract infection (181 cases, 1.6%), asymptomatic bacteriuria (79 cases, 0.7%), and pneumonia (60 cases, 0.5%). All of these prevalence estimates pertained to a single calendar day: November 9, 2005—in effect, a point in calendar time.
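The arithmetic in Example 3-1 can be reproduced with a short sketch. The function name `prevalence` and the dictionary layout are illustrative assumptions, not part of the original study; the counts are those reported in the example.

```python
def prevalence(prevalent_cases: int, population_size: int) -> float:
    """Proportion of the population in the diseased state at the specified time."""
    if population_size <= 0:
        raise ValueError("population size must be positive")
    if not 0 <= prevalent_cases <= population_size:
        raise ValueError("case count must lie between 0 and the population size")
    return prevalent_cases / population_size


# Counts reported in Example 3-1 (Tsan et al., 2008), all as of November 9, 2005.
residents = 11_475
cases_by_type = {
    "any nursing home-acquired infection": 591,
    "symptomatic urinary tract infection": 181,
    "asymptomatic bacteriuria": 79,
    "pneumonia": 60,
}

for infection, cases in cases_by_type.items():
    # Same denominator throughout: all residents present on the survey day.
    print(f"{infection}: {prevalence(cases, residents):.1%}")
```

Every type-specific prevalence uses the same denominator, because all 11,475 residents were assessed for each type of infection on that day.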

Prevalence involves “stopping the clock” and assessing disease frequency at a point in time. However, the point in time need not necessarily be a point in calendar time. It can refer instead to a point on another relevant time scale, as illustrated in the following example:

Example 3-2. In 1944, the cities of Newburgh and Kingston, New York, took part in a study of the effects of water fluoridation for prevention of tooth decay in children (Ast and Schlesinger, 1956). Initially, the water in both cities had low fluoride concentration. In 1945, Newburgh began adding fluoride to its water to increase the fluoride concentration tenfold, while Kingston left its water supply unchanged. At baseline, the frequency of dental caries among children in both cities appeared to be similar. To assess the effect of water fluoridation, a dental health survey was conducted among all schoolchildren in certain grades in both cities during the 1954–1955 school year. One measure of dental decay in children aged 6–9 years was whether at least one of a child’s deciduous cuspids or first or second deciduous molars was missing or had clinical or X-ray evidence of caries. Of the 216 first-graders examined in Kingston, 192 had decay by this definition, compared with 116 of the 184 first-graders examined in Newburgh.

Overall, there were 192 prevalent cases of dental decay among first-graders in Kingston at the time of the study and 116 in Newburgh. These counts themselves could be useful to local health officials for estimating the number of dental personnel and other resources needed to provide restorative dental care for children in each city. However, a fair comparison of the frequency of dental decay in the two cities would need to account for differences in the number of children examined. Prevalence serves this purpose. The prevalence of dental decay was 192/216 = 89% in Kingston and 116/184 = 63% in Newburgh.

Figure 3.2 diagrams the data collection process in, say, Newburgh for the dental-decay example. A total of 184 first-graders were examined, each corresponding to a row in the figure, arranged here in chronological order by examination date. The dental-decay status of each child was known only at his or her survey examination, shown as a small “window” through which we glimpse a tiny portion of the child’s dental-disease time line. As in Chapter 2, that line is thick if the child was a case at the time, and thin if not. Given the brevity of the examination in relation to the pace at which dental decay develops, in effect each child’s disease status was assessed at a point in time. The rest of his or her time line was unobserved, as implied by dots to the left and right of the window.

Figure 3.2 Diagram of data collection for dental survey of Newburgh, NY, first-graders.


The figure shows that these examinations were not all done simultaneously—which would have required 184 examination teams—or even on the same day. Instead, they were distributed over several months during the school year as the examiners worked their way through different schools and grades. The point in time to which the prevalence refers is thus not a point in calendar time. Nor is it a point on the age time scale: the first-graders were examined at various ages, albeit within a fairly narrow range. Rather, the point in time is the time of examination for each child. The key feature is the fact that each child’s disease status was observed only as of one point in that child’s lifetime, not monitored over a period of time. The calendar time period and age range over which the examinations were done are relevant as descriptors, along with place and other population-defining characteristics, to put the prevalence estimate in its proper context. Prevalence can be compared across time periods or age groups, just as can other disease frequency measures.

The format in which prevalence is expressed can be chosen for convenience to avoid an awkward number of leading or trailing zeros, or for ease of comparison with other published estimates. For example, the prevalence of dental decay among Newburgh first-graders could be expressed as 63%, as 0.63, as 630 per 1000, as 6,300 per 10,000, etc.
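As a minimal sketch, the same prevalence can be re-expressed on several bases. The helper name `express_per` is an assumption made for illustration; the counts are those of Example 3-2. The proportion is rounded to two decimal places before rescaling, so that the rescaled figures do not suggest more precision than the percentage does.

```python
def express_per(proportion: float, base: int) -> str:
    """Format a proportion as 'x per base', e.g. '630 per 1,000'."""
    return f"{proportion * base:,.0f} per {base:,}"


# First-graders with dental decay / first-graders examined (Example 3-2).
surveys = {"Kingston": (192, 216), "Newburgh": (116, 184)}

for city, (cases, examined) in surveys.items():
    p = round(cases / examined, 2)  # two-decimal precision, as in the text
    print(city, f"{p:.2f}", f"{p:.0%}", express_per(p, 1_000), express_per(p, 10_000))
```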


Incidence concerns how frequently people who are at risk for disease become disease cases during a defined period of observation. Incidence is based on disease events, each representing a transition from being at risk to being diseased.

An incident case occurs when an individual changes from being susceptible to being diseased, by the study’s case definition. The count of incident cases is the number of such events that occur in a defined population during a specified time period. Recurrent disease events in the same person may or may not qualify as incident cases, depending on the study’s purpose and case definition.

A simple count of incident cases can sometimes be sufficient to quantify the extent of a problem or to guide health planning. For example, knowing the number of lower-extremity amputations per year in a certain health plan could be used to project the number of limb prostheses likely to be needed. Comparing the counts of incident cases across different diseases can also reflect patterns of relative incidence if the diseases in question share essentially the same population at risk.

Example 3-3. In 2010, 1,307,893 new cases of genital Chlamydia trachomatis infection were reported to the United States Centers for Disease Control and Prevention, compared with 309,341 new cases of gonorrhea (Centers for Disease Control and Prevention, 2011). Assuming similar completeness of reporting for both diseases, these counts by themselves would support a conclusion that the incidence of genital C. trachomatis infection was about 4.2 times as high as the incidence of gonorrhea. This is because the sizes of the populations at risk for each disease would be the same (or nearly so, after subtracting prevalent cases).

Counts may also be adequate for comparing incidence among populations that can safely be assumed to be of similar size.

Example 3-4. Over a two-year period, Gruska et al. (2005) identified all episodes of out-of-hospital cardiac arrest in greater Vienna, Austria, through the Municipal Ambulance Service, which handles almost all calls for emergency medical assistance in the city. The 1,498 arrest episodes were distributed among the days of the week as shown in Figure 3.3. Significantly more cardiac arrests occurred on a Monday than on any other day of the week.

Figure 3.3 Occurrence of out-of-hospital cardiac arrests by day of the week: Vienna, Austria, 1995–1996.


(Based on data from Gruska et al., 2005)

Although no attempt was made to quantify the size of the Viennese population at risk that generated these cases, it is probably safe to assume that the size of population at risk was approximately constant among days of the week. Thus, even without a denominator, it is very likely that the day-to-day variation in case counts alone reflects true variation in incidence, highest on Mondays.

Very often, however, one needs to compare the frequency of new disease occurrence between populations of different sizes, or over time periods of different durations. Then a simple count of incident cases is inadequate; a denominator is needed. Two main approaches to quantifying incidence are used in such situations: cumulative incidence and incidence rate. The approach used is driven chiefly by whether the defined population is closed or open.

Cumulative Incidence

Cumulative incidence is the proportion of initially susceptible individuals in a closed population who become incident cases during a specified time period.

$$\text{Cumulative incidence} = \frac{\text{Number of incident cases}}{\text{Number of persons initially at risk}}$$

Cumulative incidence is also sometimes called incidence proportion or attack rate. It is the simplest measure of incidence that accounts explicitly for the size of the population at risk.

Example 3-5. A jumbo jet full of tourists bound from Tokyo to Copenhagen stopped at Anchorage, Alaska, for refueling and reprovisioning. Upon reaching cruising altitude again, the crew served breakfast. Somewhere over the polar ice cap, an illness characterized by cramps, vomiting, and diarrhea swept through the plane, and by the time they reached Copenhagen, 196/344 = 57% of passengers had become ill. Epidemiologists who investigated the outbreak used interview data and food service records to calculate the cumulative incidence of illness among those who did and those who did not eat various food items. Eating ham proved to be strongly associated with becoming ill. Among those who ate ham that had been prepared by a particular cook, 86% got sick, compared with none of those who ate ham prepared by a different cook. Microbiological tests found heavy staphylococcal contamination of the suspected ham, which was eventually found to have resulted from improper food handling by one of the cooks (Eisenberg et al., 1975).

Figure 2.10 in Chapter 2 illustrates how cumulative incidence relates to line diagrams. It shows the occurrence of gastrointestinal illness in a closed population of picnic attendees. Cumulative incidence can be calculated at any of various time points after the picnic (time 0). Initially, ten people were at risk, which determines the denominator. The one-, two-, three-, and four-day cumulative incidences would be 0/10, 2/10, 3/10, and 3/10, respectively.
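The picnic calculation can be sketched as follows. Figure 2.10 is not reproduced here, so the individual onset times below are hypothetical values chosen only to be consistent with the cumulative incidences quoted above; the function name is likewise an assumption.

```python
from typing import Optional, Sequence


def cumulative_incidence(onset_times: Sequence[Optional[float]], t: float) -> float:
    """Proportion of an initially at-risk closed population with onset by time t.

    onset_times has one entry per person initially at risk: the time of that
    person's first onset, or None if no onset occurred during follow-up.
    """
    n_initially_at_risk = len(onset_times)
    if n_initially_at_risk == 0:
        raise ValueError("at least one person must be at risk")
    cases = sum(1 for onset in onset_times if onset is not None and onset <= t)
    return cases / n_initially_at_risk


# Hypothetical onset times in days after the picnic (time 0) for ten attendees.
onsets = [1.5, 1.8, 2.6] + [None] * 7

for day in (1, 2, 3, 4):
    print(f"{day}-day cumulative incidence: {cumulative_incidence(onsets, day):.1f}")
```

Because each person contributes at most one onset time, no one can be counted more than once in the numerator, and the denominator stays fixed at the number initially at risk.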

Two key features of cumulative incidence are related to the fact that it is a proportion. First, cumulative incidence can be measured directly only in a closed population: the population cannot gain or lose members during the period of follow-up, except for losses that occur after the person has already become a case. Otherwise, the case count in the numerator would not correspond to the defined population at risk in the denominator. For example, if a new member were to join the population partway through follow-up and then become a case, he or she would be added to the numerator, even though he or she had not been counted as a member of the denominator population at risk. (Chapter 4 describes how cumulative incidence can sometimes be estimated indirectly, under certain assumptions, when some of the original population members are lost to follow-up during the study period.)

Second, each person who becomes a case is counted only once in the numerator, even if he/she has more than one disease episode during the period of interest. Without this rule, at the extreme, the numerator could conceivably exceed the denominator, yielding a ratio that would clearly not be interpretable as a proportion.

Ideally, the time period to which cumulative incidence refers should be constant for all members of the study population. For example, the proportion of patients undergoing a surgical procedure who develop deep venous thrombosis during the two weeks after surgery could be termed the “two-week cumulative incidence” of that complication. Nonetheless, cumulative incidence is sometimes calculated when the time period is not strictly constant among individuals. For example, the cumulative incidence of death before discharge among hospitalized patients is sometimes used as a measure of disease severity or outcome. Because of differences in length of hospital stay, the amount of time at risk for death actually varies somewhat among patients.

Incidence Rate

The need for another kind of incidence measure is motivated by the following hypothetical example.

Example 3-6. Suppose that a sawmill recently installed a fast new saw that greatly improves the rate at which logs can be turned into finished lumber. Unfortunately, from time to time, the blade catches on a flaw in a log and expels wood back toward the operator, sometimes causing injury.

During the first 20 days of machine use, three workers operate the machine. Worker #1 runs it for 7 days until injured on Day 7. Worker #2 runs it for the next 7 days until injured on Day 14. Worker #3 runs it without injury through Day 20. Figure 3.4 shows this information graphically.

Figure 3.4 Two injuries among three workers.


In response to workers’ concern, the sly sawmill owner makes no changes to the machine but alters employee assignments. For the next 20 workdays, a different worker operates the new saw each day. Injuries occur on days 7 and 14, as shown in Figure 3.5. But at the end of this period, the sawmill owner claims a big reduction in the incidence of injury, from 2/3 = 67% to 2/20 = 10%.

Figure 3.5 Two injuries among 20 workers.


The sawmill owner’s comparison is clearly not a fair one, because the amount of time at risk for injury per worker is much less during the second period than during the first. Another measure of incidence—the incidence rate—can account for this difference.

The incidence rate is the count of incident cases divided by the aggregate amount of at-risk experience from which they arose. Its denominator is usually measured in units of person-time at risk.

$$\text{Incidence rate} = \frac{\text{Number of incident cases}}{\text{Amount of at-risk experience}}$$

Recurrent disease events in the same person may or may not be counted in the numerator, depending on the study’s purpose and case definition, as discussed in Chapter 2.

The incidence rate also goes by several other names, including incidence density, person-time incidence rate, or sometimes simply incidence. The idea behind it is straightforward. Other things being equal, the number of new cases of disease in a population should be proportional to (1) the size of the population at risk and (2) the amount of time over which susceptible individuals are observed for occurrence of new cases. The denominator of the incidence rate combines these two factors. The number of cases in the numerator is a unitless count, while the denominator is expressed in time units. Hence the incidence rate has units of $(\text{time unit})^{-1}$, such as $\text{years}^{-1}$ or $\text{days}^{-1}$.
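Applied to the sawmill in Example 3-6, the contrast between the two measures can be sketched as below. The function names are assumptions, and worker #3's six days at risk (Days 15 through 20) are inferred from the description rather than stated explicitly.

```python
def cumulative_incidence(cases: int, persons_initially_at_risk: int) -> float:
    """Proportion of persons initially at risk who became cases."""
    return cases / persons_initially_at_risk


def incidence_rate(cases: int, person_time: float) -> float:
    """Cases per unit of person-time at risk."""
    return cases / person_time


# Period 1: three workers; days each spent operating the saw (assumed 7, 7, 6).
period1_days = [7, 7, 6]
# Period 2: twenty workers, one day each.
period2_days = [1] * 20
injuries = 2  # two injuries in each period

for label, days in (("Period 1", period1_days), ("Period 2", period2_days)):
    ci = cumulative_incidence(injuries, len(days))
    rate = incidence_rate(injuries, sum(days))
    print(f"{label}: cumulative incidence {ci:.0%}, rate {rate:.2f} per person-day")
```

Both periods involve 20 person-days at the saw, so the incidence rate is the same in each even though the cumulative incidences differ sharply.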

Example 3-7. Gardner et al. (1999) studied on-the-job back sprains and strains among 31,076 material handlers employed by a large retail merchandise chain. Payroll data for the 21-month study period were linked with job injury claims, which provided data on the timing of each injury, body part injured, and mechanism of injury. A total of 767 qualifying back injuries occurred during 54,845,247 work hours, yielding an incidence rate of 1.40 back injuries per 100,000 worker-hours. Higher incidence was found among males and among employees whose work was more physically demanding.

The work force in this example was an open population. Thousands of workers joined or left the company during the study period. Only on-the-job back injuries were of interest, so each worker’s at-risk experience consisted of many discontinuous time periods at work, separated by periods away from work. These features of the situation made an incidence-rate approach to measuring disease frequency attractive and a good match to the available data.

Incidence rates can be used in a wide range of epidemiologic research situations. They can be applied to both closed and open populations, with or without detailed information on time at risk for each individual population member, and for both recurrent and non-recurrent disease events, including circumstances in which cumulative incidence may be impossible to apply.

Estimating the Incidence Rate with Detailed Data on Individual Times at Risk

In some research situations, detailed information can be obtained on the amount of time at risk for each individual population member and the timing of each disease event. In the back-injury example above, payroll records provided each worker’s time on the job right down to the hour, and injury claims provided the number and timing of back injuries. The numerator was the total number of qualifying back injuries, and the denominator was the sum of hours worked across all workers.

As noted earlier, recurrent disease events in the same person may or may not qualify for inclusion, depending on the study’s purpose. Whether recurrent cases count or not clearly can affect the numerator. Less obviously, it can also affect the denominator. Consider the line diagram on the left side of Figure 3.6. In this example, horizontal lines indicate the timespan over which each of five population members is observed, and black dots indicate when cases occur. The table at right in the figure shows the contribution of each person to the numerator and denominator of the incidence rate, depending on whether recurrent cases in the same individual do or do not qualify for inclusion.

Figure 3.6 Calculation of number of cases and person-time, depending on whether recurrent cases do or do not count.


Contributions to the numerator differ only for person #5, whose second disease event would count under one case definition but not the other. But also note that if only initial cases count, then any person-time after the occurrence of a person’s first disease event is not added to the denominator. This is because any recurrent disease event during that time would not be counted as a case, and therefore that person-time is not time at risk for a qualifying disease event. This rule affects the person-time contribution of any person who has an initial disease event. If the disease is rare, such that only a very small proportion of individuals have disease events, the effect of the difference on total person-time may be small. But for more common diseases, it can be too large to ignore.
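The effect of the case definition on both numerator and denominator can be sketched as follows. The five observation records are hypothetical (the data behind Figure 3.6 are not reproduced here), and the sketch assumes that events are instantaneous, so that when recurrences count a person remains at risk throughout the observation period.

```python
from typing import List, Tuple

# (start of observation, end of observation, event times); all values hypothetical.
Person = Tuple[float, float, List[float]]


def rate_components(people: List[Person], count_recurrences: bool) -> Tuple[int, float]:
    """Return (number of cases, person-time at risk) under the chosen case definition."""
    cases = 0
    person_time = 0.0
    for start, end, events in people:
        observed = sorted(e for e in events if start <= e <= end)
        if count_recurrences or not observed:
            # Every event counts, or no event occurred: the whole span is time at risk.
            cases += len(observed)
            person_time += end - start
        else:
            # Only the first event counts; time after it is no longer time at risk.
            cases += 1
            person_time += observed[0] - start
    return cases, person_time


people = [
    (0, 10, []),      # person 1: no events
    (0, 6, [6]),      # person 2: event at end of observation
    (2, 10, []),      # person 3: no events
    (0, 8, [3]),      # person 4: one event, observation continues afterward
    (0, 10, [4, 8]),  # person 5: two events
]

for recurrences in (True, False):
    n, pt = rate_components(people, recurrences)
    print(f"count recurrences={recurrences}: {n} cases / {pt:g} person-time = {n / pt:.3f}")
```

In this made-up example the numerator changes only for person 5, but the denominator shrinks for persons 4 and 5, since both are observed after a first event.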

Estimating the Incidence Rate Without Detailed Data on Individual Times at Risk

Often detailed information about each population member’s time at risk is unknown and not feasibly obtainable. This problem often arises, for example, when the defined population of interest consists of residents of a geographic area over some time period. The number of incident cases may be readily available, but the larger challenge is to estimate the total amount of person-time at risk from which those cases arose.

Two approaches to estimating total person-time at risk can be explained and justified algebraically. Again, a line-diagram example helps in visualizing how they work. Consider a simple example in which detailed information about each person’s time at risk actually is available. Figure 3.7 shows a small open population of eight people who are at risk over different time intervals during a certain ten-day period. To simplify the arithmetic, we require that on any given day, each person is at risk for the entire day or for none of it, so the number of days at risk for a person can take on only integer values. (In principle, time could be divided into arbitrarily small intervals to sidestep this restriction without changing the method.)

Figure 3.7 Line diagram showing two approaches to calculating total person-time.


Total person-time at risk—here, 32 person-days—can be calculated either of two ways. First, each individual’s contribution of time at risk can be determined, then these values can be summed across all individuals. This method corresponds to the column of numbers at the right. Second, the number of individuals at risk on each day can be determined, then these values can be summed across all days. This method corresponds to the row of numbers at the bottom.

More formally, let $x_{ij} = 1$ if person i is at risk on day j, or 0 otherwise. Let N be the total number of individuals who contribute any person-time (here, N = 8); let D be the duration of the study time period (here, D = 10 days); and let T be total person-time at risk (here, T = 32 person-days). Then:

$$T = \sum_{i=1}^{N} \left( \sum_{j=1}^{D} x_{ij} \right) = \sum_{j=1}^{D} \left( \sum_{i=1}^{N} x_{ij} \right)$$

which are the two approaches just described for calculating T.
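The equivalence of the two summations can be checked with a small sketch. The at-risk intervals below are hypothetical; they were chosen only to match the totals quoted in the text (eight people, ten days, 32 person-days) and are not the actual layout of Figure 3.7.

```python
# (first day at risk, last day at risk), inclusive, for each of eight people.
intervals = [(1, 4), (1, 6), (2, 3), (3, 7), (4, 5), (5, 10), (7, 9), (7, 10)]
D = 10  # duration of the study period in days

# x[i][j] = 1 if person i is at risk on day j + 1, else 0.
x = [[1 if first <= day <= last else 0 for day in range(1, D + 1)]
     for first, last in intervals]

days_per_person = [sum(row) for row in x]           # sum along each person's row
people_per_day = [sum(column) for column in zip(*x)]  # sum down each day's column

T_by_person = sum(days_per_person)
T_by_day = sum(people_per_day)

print("days at risk per person:", days_per_person)
print("people at risk per day: ", people_per_day)
print("total person-days:", T_by_person, T_by_day)
```

Both totals come from the same grid of zeros and ones, so they must agree; only the order of summation differs.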

Now define:

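$$\bar{d} = \text{average time at risk per person} = T/N$$

$$\bar{n} = \text{average number of people at risk per day} = T/D$$

In the example, $\bar{d} = 32/8 = 4$ days at risk per person, and $\bar{n} = 32/10 = 3.2$ persons at risk per day. These definitions can be rearranged to give two expressions for T:

$$T = N \times \bar{d} \tag{3.1}$$

$$T = D \times \bar{n} \tag{3.2}$$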


Equations (3.1) and (3.2) lead to two strategies for estimating T without detailed information on each person:

  1. N may be known exactly and $\bar{d}$ approximately. For example, a published paper may report that 120 people were monitored for deep vein thrombosis for an average of 2.8 years, and that 6 cases were detected. Even without knowing exactly how long each study participant was followed, the incidence rate of deep vein thrombosis can be estimated as $\frac{6}{120 \times 2.8 \text{ years}} = 0.018 \text{ years}^{-1}$, or 1.8 cases per 100 person-years.

  2. D may be known exactly and $\bar{n}$ approximately. For example, suppose that 12 new cases of West Nile virus infection are identified during a given year in a certain state, and that the state’s population at mid-year is known from census data to be about 2.4 million people. Technically, 2.4 million may not be the exact average of the state’s population during that year, but it is probably a pretty good approximation. The incidence rate of West Nile virus infection can then be estimated as $\frac{12}{2{,}400{,}000 \times 1 \text{ year}}$, or 5 cases per million person-years.
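Both strategies reduce to dividing the case count by an estimated total person-time. The sketch below reproduces the two worked examples; the function names are assumptions chosen for illustration.

```python
def rate_from_mean_followup(cases: int, n_people: int, mean_time_at_risk: float) -> float:
    """Strategy 1: person-time estimated as N times average time at risk per person."""
    return cases / (n_people * mean_time_at_risk)


def rate_from_mean_population(cases: int, mean_population: float, duration: float) -> float:
    """Strategy 2: person-time estimated as D times average number of people at risk."""
    return cases / (mean_population * duration)


# Deep vein thrombosis: 6 cases among 120 people followed 2.8 years on average.
dvt_rate = rate_from_mean_followup(6, 120, 2.8)

# West Nile virus: 12 cases in one year; mid-year population about 2.4 million.
wnv_rate = rate_from_mean_population(12, 2_400_000, 1.0)

print(f"DVT: {dvt_rate * 100:.1f} cases per 100 person-years")
print(f"West Nile virus: {wnv_rate * 1_000_000:.1f} cases per million person-years")
```

The multiplier (100 or 1,000,000) is only a choice of display scale; the underlying rate in cases per person-year is unchanged.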

Sometimes more than a single mid-period estimate of the size of the population at risk may be available, permitting a more refined estimate of total person-time based on method 2. For example, suppose that estimates of the size of the population at risk are available for the beginning and end of a period, but not at any intermediate time points, as shown graphically in Figure 3.8. Total person-time corresponds to the area of the shaded trapezoid, which would be $\frac{1000 + 800}{2} \times 2$ person-years. The first factor can be seen to be the average population at risk during the time period, assuming a linear decline over time.

Figure 3.8 Estimating person-time when only start-of-period and end-of-period estimates of population at risk are available.


This method of approximation can be extended to make use of multiple population-size estimates over time, possibly at irregular time intervals. In Figure 3.9, total person-time at risk would be approximated as the summed area of the five trapezoids:

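$$T = \left[ \frac{N_0 + N_1}{2} \times (t_1 - t_0) \right] + \left[ \frac{N_1 + N_2}{2} \times (t_2 - t_1) \right] + \cdots + \left[ \frac{N_4 + N_5}{2} \times (t_5 - t_4) \right]$$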

Figure 3.9 Estimating person-time from estimates of population at risk at multiple time points.
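The trapezoid approximation, including the two-point case of Figure 3.8, can be sketched as a single function. The function name and the six-point series are assumptions for illustration; only the 1000 and 800 population sizes over two years come from the text.

```python
from typing import Sequence


def person_time_trapezoid(times: Sequence[float], sizes: Sequence[float]) -> float:
    """Approximate total person-time as the summed area of adjacent trapezoids.

    times: time points, strictly increasing, possibly irregularly spaced.
    sizes: estimated size of the population at risk at each time point.
    """
    if len(times) != len(sizes) or len(times) < 2:
        raise ValueError("need at least two (time, size) pairs of equal length")
    total = 0.0
    for k in range(1, len(times)):
        width = times[k] - times[k - 1]
        if width <= 0:
            raise ValueError("times must be strictly increasing")
        # Average population over the interval times the interval length.
        total += (sizes[k - 1] + sizes[k]) / 2 * width
    return total


# Two-point case described in the text: 1000 at the start, 800 two years later.
print(person_time_trapezoid([0, 2], [1000, 800]), "person-years")

# Hypothetical series with six estimates at irregular intervals (five trapezoids).
times = [0, 1, 2.5, 3, 4, 6]
sizes = [1000, 960, 900, 910, 850, 800]
print(person_time_trapezoid(times, sizes), "person-years")
```

Each trapezoid assumes a linear change in population size between adjacent estimates, so closer spacing of the estimates weakens that assumption's influence on the total.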


For some diseases, the prevalence of disease may be high enough, or a separate not-at-risk state common enough, that the discrepancy between total population size and size of the true population at risk is too large to ignore. Corrections may then need to be based on the estimated prevalence of disease or the estimated proportion of the population that is not at risk. For example, the estimated incidence of dementia in the elderly has been found to increase considerably when prevalent cases of dementia are subtracted from the denominator (Rocca et al., 1998). For uterine cancer, higher and almost certainly more accurate incidence estimates have been obtained when the estimated number of women who no longer have a uterus (due to prior hysterectomy) were subtracted from the denominator (Marrett, 1980).
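A minimal sketch of such a correction follows. All numbers are hypothetical and do not come from the studies cited; the function name is likewise an assumption. The sketch treats the proportion not at risk as constant over the period.

```python
def corrected_rate(cases: int, population: float, duration_years: float,
                   proportion_not_at_risk: float) -> float:
    """Incidence rate after removing not-at-risk people from the denominator."""
    if not 0 <= proportion_not_at_risk < 1:
        raise ValueError("proportion not at risk must be in [0, 1)")
    at_risk = population * (1 - proportion_not_at_risk)
    return cases / (at_risk * duration_years)


# Hypothetical: 50 new cases in one year among 200,000 women,
# of whom an estimated 25% are not at risk (e.g., prior hysterectomy).
crude = corrected_rate(50, 200_000, 1.0, 0.0)
corrected = corrected_rate(50, 200_000, 1.0, 0.25)

print(f"crude:     {crude * 100_000:.1f} per 100,000 person-years")
print(f"corrected: {corrected * 100_000:.1f} per 100,000 person-years")
```

The corrected rate is always at least as high as the crude one, because the numerator is unchanged while the denominator shrinks.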

Denominators Other Than Person-Time

In some areas of epidemiologic research, including the study of injuries, denominators other than person-time are often used to quantify the amount of at-risk experience from which a set of incident cases arose. For example, the incidence of motor-vehicle collision injuries can be expressed as injuries per 100,000 person-years, as injuries per 100,000 licensed-driver-years, or as injuries per million vehicle-miles traveled. The extent to which older adults are a high-risk group for motor-vehicle collision injuries has been shown to depend on which denominator is used (Massie et al., 1995). At the time of the study, a smaller percentage of older adults than of younger adults had a valid driver’s license. Moreover, population surveys showed that even those older adults who did have a driver’s license drove fewer miles per year than did younger drivers. Hence the increase in incidence by age was more marked when the denominator was vehicle-miles traveled than when it was person-years or licensed-driver-years.

Comparison of Cumulative Incidence and Incidence Rate

The differences between cumulative incidence and the incidence rate are both conceptual and statistical. These distinctions were appreciated by early epidemiologists and health statisticians (Vandenbroucke, 1985). Table 3.1 summarizes and contrasts several properties of these two measures of incidence. Despite the differences, the generic term incidence is widely applied to both cumulative incidence and incidence rate throughout the medical literature. The specific kind of incidence being discussed must often be inferred from the context.

Table 3.1. Comparison of Cumulative Incidence and Incidence Rate

| | Cumulative incidence | Incidence rate |
|---|---|---|
| | | (Time unit)⁻¹ |
| Directly calculable by: | Observing a closed population over time | Observing a closed or open population over time with detailed data on individual times at risk |
| Indirectly calculable by: | Survival-analysis methods in presence of censoringᵃ | Estimating total person-time as (average size of population at risk) × (duration of observation period) |
| Individual-level counterpart | Risk (probability) | Hazard rateᵃ |

ᵃ Discussed in Chapter 4

Chapter 4 describes how confidence limits for incidence rates can be obtained; how cumulative incidence and the incidence rate are related mathematically; how, under certain assumptions, one can be computed from the other; and how incidence rates in a population have a counterpart (the hazard rate) at the individual level.

Variants of Incidence

Incidence can actually be regarded as a family of disease-frequency measures. Some members of this family traditionally go by names of their own, but on closer examination they can be seen to be just special types of incidence.


Mortality is the incidence of fatal cases of a disease in the population at risk for dying of the disease. The denominator includes both prevalent cases of the disease as well as persons who are at risk for developing the disease. Subtypes are cumulative mortality and mortality rate. Death rate and mortality density are essentially synonyms for the mortality rate.

Example 3-8. Over the 5-year period from 2002–2006, some 258,205 deaths due to traumatic brain injury were recorded in the United States (Faul et al., 2010). Essentially the entire United States population was considered to be at non-zero risk for dying of traumatic brain injury, although the level of risk clearly varied greatly from person to person. Hence the denominator for the mortality rate was (estimated average size of the United States population during 2002–2006) × (length of observation period) = 293,235,577 × 5 years. The mortality rate for traumatic brain injury during that period was thus 258,205/(293,235,577 × 5 person-years) = 17.6 deaths per 100,000 person-years.
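
The arithmetic in Example 3-8 can be checked with a short Python sketch. The helper name `rate_per_100k` is an assumption introduced here for illustration; the counts are those quoted in the example.

```python
def rate_per_100k(events, person_time):
    """Incidence-type rate expressed per 100,000 units of person-time."""
    return events / person_time * 100_000


deaths = 258_205              # traumatic brain injury deaths, 2002-2006
avg_population = 293_235_577  # estimated average U.S. population
years = 5                     # length of observation period

person_years = avg_population * years
tbi_mortality = rate_per_100k(deaths, person_years)
print(f"{tbi_mortality:.1f} deaths per 100,000 person-years")  # 17.6
```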

Case Fatality

Case fatality refers to the incidence of death from a disease among persons who develop the disease. Case fatality reflects the prognosis of the disease among cases, while mortality reflects the burden of deaths from the disease in the population as a whole.

In principle, cumulative fatality and fatality rate could be defined as special types of cumulative incidence and incidence rate, respectively, with appropriate restrictions on who counts toward the numerator and denominator. In practice, these terms are rarely used, although the underlying theory would still apply. Instead, case fatality is most commonly used:

$$\text{Case fatality} = \frac{\text{Number of fatal cases}}{\text{Total number of cases}}$$

Case fatality can be viewed as the cumulative incidence of death due to the disease among those who develop it. A fixed time period after disease onset may or may not be explicitly specified and must often be inferred from the context. As a variant of cumulative incidence, case fatality is most readily applied for diseases of relatively short duration, in which there are few losses to follow-up or deaths from other causes.

Example 3-9. The National Highway Traffic Safety Administration (2010) reported that 4,092 pedestrians were killed in the United States during 2009 after being struck by a motor vehicle. The agency estimated that 59,000 pedestrians were injured in pedestrian/motor-vehicle collisions during that year. Based on these data, the case fatality of pedestrian/motor-vehicle collision injury in 2009 was 4,092/59,000 = 6.9%.
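
As a quick check of Example 3-9, case fatality is simply a proportion of cases. The function name `case_fatality` below is an assumption for illustration, and the calculation follows the example in using the reported injury estimate as the total number of cases.

```python
def case_fatality(fatal_cases, total_cases):
    """Proportion of cases that are fatal (a cumulative-incidence-type measure)."""
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")
    return fatal_cases / total_cases


pedestrian_cf = case_fatality(4_092, 59_000)
print(f"{pedestrian_cf:.1%}")  # 6.9%
```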

Proxy Measures of Incidence

Sometimes good denominator data for the desired measure of incidence cannot feasibly be obtained. Yet case counts alone are likely to be inadequate for comparing incidence between populations that differ in size or other key characteristics. Under those circumstances, a proxy denominator may be better than none at all.

Proportional Mortality

The proportional mortality for a disease is:

$$\text{Proportional mortality} = \frac{\text{Deaths from the disease}}{\text{Deaths from all causes}}$$

As its name implies, it is simply the proportion of all deaths that are due to a particular cause in a specified population and time period of interest. This proportion can provide useful descriptive information in its own right: for example, the statement that heart disease accounted for 25% of all deaths among Americans in 2007 refers to proportional mortality (National Center for Health Statistics, 2011).

For purposes of comparing disease frequency between populations, the main advantage of proportional mortality is that its denominator—total number of deaths—can usually be ascertained from the same source that furnishes its numerator. The count of all deaths serves as a proxy for person-time at risk under the assumption that, other things being equal, one would expect total deaths to vary in proportion to population size and duration of the monitoring period.

A potential limitation of comparing proportional mortality between populations or subpopulations can be illustrated by an example:

Example 3-10. Berkel and de Waard (1983) studied mortality among Seventh-Day Adventists (SDA) in the Netherlands over a ten-year period. Members of that church are prohibited from using tobacco or alcoholic beverages and are encouraged to follow a vegetarian diet. Hence the investigators expected a reduced death rate among SDA from cancer (particularly lung cancer, which is strongly related to smoking) and heart disease.

The results are summarized in Table 3.2. Among deaths in SDA for which a cause of death could be ascertained, 2.5% were due to lung cancer and 47.1% to cardiovascular disease. Those percentages were found to be quite similar to the corresponding proportional-mortality percentages for the Netherlands population of similar age and gender, suggesting little or no health benefit for SDA.

Table 3.2. Proportional Mortality and Mortality Rate Analyses of Deaths Among Dutch Seventh-Day Adventists (SDA)

Column headings: Cause of death; Proportional mortalityᵃ; Number of deaths in SDA; Expected from Netherlands mortality rates

Rows: Lung cancer; Other cancer; Cardiovascular disease; Other known cause; All causes

ᵃ As percent of deaths with known cause

(Based on data from Berkel and de Waard [1983])

But in this instance, the investigators also had detailed year-by-year data on the size of the SDA population, from which they could determine person-time at risk contributed by SDA during the study period by age and gender. They obtained the age- and gender-specific mortality rates for the Netherlands as a whole from published sources. By applying these published mortality rates to the SDA person-time data, they estimated how many deaths would have been expected among SDA if this group had experienced the mortality rates observed among all Netherlands residents of similar age and gender. The rightmost two columns of Table 3.2 show these results, which lead to quite a different conclusion. The observed numbers of lung cancer and cardiovascular disease deaths in SDA were in fact sharply lower than the numbers of such deaths expected based on mortality rates for the general population. But deaths from other causes were also substantially lower than expected among SDA. Hence the proportions of SDA deaths from lung cancer and heart disease differed very little from those in the Netherlands as a whole. Results from the proportional-mortality analysis alone would have been misleading in this instance because total number of deaths was a poor proxy for person-time at risk.
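
The pitfall can be made concrete with a small Python sketch. All counts below are hypothetical, invented purely for illustration; they are not the Berkel and de Waard data. The sketch assumes a group whose deaths from every cause are half the number expected from general-population rates.

```python
# Hypothetical counts, invented for illustration only (NOT the study data).
observed = {"lung cancer": 20, "cardiovascular": 200, "other": 180}
# Deaths expected if general-population rates applied to the group's person-time.
expected = {"lung cancer": 40, "cardiovascular": 400, "other": 360}


def proportions(counts):
    """Proportional mortality: each cause's share of all deaths."""
    total = sum(counts.values())
    return {cause: n / total for cause, n in counts.items()}


pm_observed = proportions(observed)
pm_expected = proportions(expected)
# Observed/expected ratio: a rate-based comparison that uses person-time.
obs_exp_ratio = {cause: observed[cause] / expected[cause] for cause in observed}

for cause in observed:
    print(
        f"{cause}: proportional mortality {pm_observed[cause]:.1%} "
        f"vs {pm_expected[cause]:.1%}; observed/expected = {obs_exp_ratio[cause]:.2f}"
    )
```

In this made-up case every cause shows an observed/expected ratio of 0.50, yet the proportional-mortality percentages are identical in the two groups, so a comparison of proportions alone would show no difference at all.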

Proportional Incidence

Proportional incidence is based on the same basic idea, but ignoring whether cases are fatal or not. For example, hospital admissions for diabetes may be expressed as a proportion of all hospital admissions if no good data are available on the size of the true population at risk for hospitalization. Similarly, incident cases of colon cancer may be expressed as a proportion of all incident cancer cases. Note that the denominator of proportional incidence may be less obvious than for proportional mortality. But in general, the proportional incidence of disease X, say, can be defined as:

$$\text{Proportional incidence of } X = \frac{\text{Incident cases of } X}{\text{Incident cases in a larger disease category that contains } X}$$

The same potential pitfall applies. The comparisons mentioned above could be misleading if the overall hospitalization rate or the overall cancer incidence rate were to differ between populations being compared. Contrasts based on proxy measures must therefore be cautiously interpreted.

Fetal Death Ratio

In perinatal epidemiology, the frequency of fetal death in a certain population over a specified time period is often expressed as:

$$\text{Fetal death ratio} = \frac{\text{Number of fetal deaths}}{\text{Number of live births}}$$

The denominator for a cumulative-incidence measure of fetal death would be the total number of pregnancies. But some pregnant women may undergo spontaneous or elective abortions that can be difficult to ascertain and to count. Hence the number of live births is used as a proxy for the total number of pregnancies.

In contrast to proportional mortality, the fetal death ratio and other analogues that do not include the numerator as part of the denominator are not proportions and thus require different data analysis methods.

Period Prevalence

Earlier, prevalence was described as reflecting the frequency of the diseased state at a specified point in time. Hence the term point prevalence is sometimes used to emphasize this feature (Porta, 2008). In contrast, period prevalence is really a hybrid of prevalence and cumulative incidence. Like cumulative incidence, it refers to a period of time rather than a point in time. However, cases counted in its numerator include both (1) prevalent cases that already exist at the beginning of the observation period, and (2) new cases that occur during the period. Referring to Figure 3.10, persons #1, #3, #4, and #5 would all count as cases. The denominator includes both (1) persons at risk when the period starts, as well as (2) extant cases when the period starts. For Figure 3.10, the period prevalence would thus be 4/5 = 0.8.

Figure 3.10 Diagram of period prevalence.


Period prevalence is essentially uninterpretable except in a closed population, for the same reasons that apply to cumulative incidence. For a closed population, if P = point prevalence when the observation period starts and CI = cumulative incidence among individuals at risk at that time, then period prevalence can be seen to be:

$$\text{Period prevalence} = P + (1 - P) \times CI \tag{3.3}$$

For Figure 3.10, this would be 1/5 + (1 − 1/5) × 3/4 = 0.8.
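
The relationship between period prevalence, point prevalence, and cumulative incidence can be verified with exact fractions. This is a minimal sketch; the function name `period_prevalence` is an assumption, and the inputs are the values given for Figure 3.10 (1 prevalent case among 5 persons, then 3 new cases among the 4 at risk).

```python
from fractions import Fraction


def period_prevalence(point_prev, cum_incidence):
    """Period prevalence in a closed population: P + (1 - P) * CI."""
    return point_prev + (1 - point_prev) * cum_incidence


p = Fraction(1, 5)   # point prevalence at the start of the period
ci = Fraction(3, 4)  # cumulative incidence among those initially at risk

pp = period_prevalence(p, ci)
print(pp, float(pp))  # 4/5 0.8

# Direct count for comparison: 4 cases among 5 persons.
assert pp == Fraction(4, 5)
```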

The main limitation of period prevalence is that point prevalence and cumulative incidence convey very different kinds of information about disease frequency. The distinction is lost when they are combined in this way, which limits the usefulness of period prevalence as a summary measure. When possible, point prevalence and cumulative incidence are generally better kept separate as two more interpretable components.

Nonetheless, sometimes this separation cannot be made from the data available, and “period prevalence” may be an accurate descriptive term for a frequency measure that can be calculated. For example, the United States Centers for Disease Control (1998) reported that 25.3 per 1,000 United States women who delivered a liveborn infant during 1993–1995 had diabetes during the pregnancy, according to data on the baby’s birth certificate. Some of these mothers had diabetes before becoming pregnant, while others developed diabetes during pregnancy. In any event, all reportedly had diabetes sometime during the period of pregnancy, so 25.3/1,000 is probably best regarded as a period prevalence. A subsequent revision of the birth certificate in most states allowed distinguishing between pre-existing diabetes and diabetes with onset during pregnancy (gestational diabetes) (Osterman et al., 2009).


Figure 3.11 shows a classification scheme for most of the measures covered in this chapter and some of the key distinctions among them.

Figure 3.11 Overview of measures of disease frequency.



Ast DB, Schlesinger ER. The conclusion of a ten-year study of water fluoridation. Am J Public Health 1956; 46:265–271.

Babbage C. Letter to Alfred, Lord Tennyson. Quoted in: Newman JR (ed.). The world of mathematics. Volume 3, p. 1487. New York: Simon and Schuster, 1956.

Berkel J, de Waard F. Mortality pattern and life expectancy of Seventh-Day Adventists in the Netherlands. Int J Epidemiol 1983; 12:455–459.

Centers for Disease Control and Prevention. Diabetes during pregnancy—United States, 1993–1995. MMWR 1998; 47:408–414.

Centers for Disease Control and Prevention. Sexually Transmitted Disease Surveillance 2010. Atlanta, GA: U.S. Department of Health and Human Services, 2011.

Eisenberg MS, Gaarslev K, Brown W, Horwitz M, Hill D. Staphylococcal food poisoning aboard a commercial aircraft. Lancet 1975; 2:595–599.

Faul M, Xu L, Wald MM, Coronado VG. Traumatic brain injury in the United States: emergency department visits, hospitalizations and deaths 2002–2006. Atlanta, GA: Centers for Disease Control and Prevention, National Center for Injury Prevention and Control, 2010.

Gardner LI, Landsittel DP, Nelson NA. Risk factors for back injury in 31,076 retail merchandise store workers. Am J Epidemiol 1999; 150:825–833.

Gruska M, Gaul GB, Winkler M, Levnaic S, Reiter C, Voracek M, et al. Increased occurrence of out-of-hospital cardiac arrest on Mondays in a community-based study. Chronobiol Int 2005; 22:107–120.

Guskiewicz KM, McCrea M, Marshall SW, Cantu RC, Randolph C, Barr W, et al. Cumulative effects associated with recurrent concussion in collegiate football players: the NCAA Concussion Study. JAMA 2003; 290:2549–2555.

Landon MB, Spong CY, Thom E, Hauth JC, Bloom SL, Varner MW, et al. Risk of uterine rupture with a trial of labor in women with multiple and single prior cesarean delivery. Obstet Gynecol 2006; 108:12–20.

Lavados PM, Sacks C, Prina L, Escobar A, Tossi C, Araya F, et al. Incidence, 30-day case-fatality rate, and prognosis of stroke in Iquique, Chile: a 2-year community-based prospective study (PISCIS project). Lancet 2005; 365:2206–2215.

Marrett LD. Estimates of the true population at risk of uterine disease and an application to incidence data for cancer of the uterine corpus in Connecticut. Am J Epidemiol 1980; 111:373–378.

Massie DL, Campbell KL, Williams AF. Traffic accident involvement rates by driver age and gender. Accid Anal Prev 1995; 27:73–87.

National Center for Health Statistics. Health, United States, 2010 with special feature on death and dying. Hyattsville, MD: National Center for Health Statistics, 2011.

National Highway Traffic Safety Administration. Traffic Safety Facts: 2009 data. Washington, D.C.: U.S. Department of Transportation, 2010.

Osterman MJK, Martin JA, Menacker F. Expanded health data from the new birth certificate, 2006. National Vital Statistics Reports, Vol. 58, No. 5. Hyattsville, MD: National Center for Health Statistics, 2009.

Porta M (ed.) A dictionary of epidemiology (5th edition). New York: Oxford, 2008.

Rocca WA, Cha RH, Waring SC, Kokmen E. Incidence of dementia and Alzheimer’s disease. A reanalysis of data from Rochester, Minnesota, 1975–1984. Am J Epidemiol 1998; 148:51–62.

Sloan JH, Kellermann AL, Reay DT, Ferris JA, Koepsell T, Rivara FP, et al. Handgun regulations, crime, assaults, and homicide. A tale of two cities. N Engl J Med 1988; 319:1256–1262.

Tsan L, Davis C, Langberg R, Hojlo C, Pierce J, Miller M, et al. Prevalence of nursing home-associated infections in the Department of Veterans Affairs nursing home care units. Am J Infect Control 2008; 36:173–179.

Vandenbroucke JP. On the rediscovery of a distinction. Am J Epidemiol 1985; 121:627–628.

Van Landingham MJ. Murder rates in New Orleans, LA, 2004–2006. Am J Public Health 2007; 97:1614–1616.


                  1. A study of time trends in the frequency of HIV infection is being planned in a small African country. Due to the high cost of testing a truly random sample of the population, the study will involve testing sentinel groups that can be accessed at low cost. Inmates in federal prisons have been chosen as a captive population.

                    Over the last few years, the incarcerated population has averaged 1,200 persons. The study team plans to obtain a serum sample from each incoming prisoner over a 2-year period. It is estimated that 800 new prisoners will be incarcerated during the study period. Suppose that 40 of the serum samples test positive for HIV, and that all 800 incoming prisoners are different individuals (i.e., none are repeat offenders). Which of the following will the investigators be able to calculate from the study data? For each that can be calculated, do so using the hypothetical results, and name the defined population to which it applies.

                    (a) Incidence rate of HIV infection

                    (b) Cumulative incidence of HIV infection

                    (c) Prevalence of HIV infection

                  2. A large study by Landon et al. (2006) sought to determine the risk of uterine rupture during labor among women attempting to deliver a baby vaginally, in relation to the number of previous deliveries each had had by Cesarean section (C-section). Among 16,915 women with a single prior C-section, 115 experienced uterine rupture during the attempt to deliver vaginally. By comparison, 9 uterine ruptures occurred among 975 women who had had two or more C-sections before trying to deliver vaginally.

                    (a) What kind of disease-frequency measure would be appropriate here to express the frequency of uterine rupture in each group of mothers?

                    (b) Was the risk of uterine rupture higher or lower among women who had had multiple prior C-sections, compared to women with one prior C-section?

                  3. If a hen and a half lay an egg and a half in a day and a half, how many eggs would one hen lay in three days?

                  4. Iquique, Chile, is a coastal city of about 200,000 people with a predominantly Hispanic-Mestizo population. An epidemiologic study (Lavados et al., 2005) sought to estimate the incidence of first strokes in Iquique for comparison with rates observed in other settings.

                    During a 2-year period from July 1, 2000, to June 30, 2002, all patients who were admitted with their first-ever stroke at any hospital in the city were identified and counted.

                    (a) The breakdown of these first-ever stroke cases by age was:

                      Do these results indicate that the risk of experiencing a first stroke in that setting was greatest among persons 55–64 years of age? Explain briefly.

                    (b) A published report from the project states:

                      In 2000, the population of [Iquique] was 181,984 according to the projections of the 1992 national census, and in 2002, it was 214,526 according to the 2002 national census. … We calculated incidence rates using the sum of the two populations as the denominator—i.e., the projected population in Iquique in 2000, according to the 1992 national census projections, plus the actual population of 2002, according to the 2002 national census.

                      The two population estimates were summed for each age group, yielding the following totals:

                      We will ignore the slight discrepancy between this total and the sum 181,984 + 214,526 = 396,510 from the previous paragraph. Note, however, that either number is roughly twice the population size of Iquique.

                      The investigators described these totals as the “number at risk” in each age group. They then calculated the incidence of first stroke in each age group by dividing the number of cases in that age group (from the first table) by the corresponding “number at risk” (from the second table).

                      In principle, a better approach to quantifying incidence in this open population would be to calculate an incidence rate for each age group as (no. of cases)/(person-years at risk). Assume for present purposes that essentially the entire population was at risk for a first stroke. From the data given, can you calculate the appropriate age-specific incidence rates? If so, do so; if not, why not?

                    (c) The investigators reported that the overall incidence of first-ever strokes was 292/396,712 = 73.6 per 100,000. In the same study, the investigators also identified 68 persons with recurrent strokes and reported that the incidence of recurrent stroke was 68/396,712 = 17.1 per 100,000.

                      i. Does this imply that, in this setting, once a person had a first stroke, the risk of having another one was actually lower than the risk of having the initial one? Why or why not?

                      ii. Suppose that a community survey in Iquique showed that 5% of its population had a past history of stroke. How could this information be used to obtain better estimates of the overall incidence of first strokes and of recurrent strokes? (Hint: what is the population at risk for each type of event?)

                  5. During 2004, there were 264 recorded murders in New Orleans, Louisiana, for a homicide rate of 57.1 per 100,000 person-years, about four times the national rate. The number of murders in New Orleans fell to 210 during 2005. However, Hurricane Katrina struck on August 29, 2005, and the city was virtually depopulated for several weeks before some former residents began to return.

                    The best available estimates of the city’s population during 2005 come from Census Bureau estimates before Katrina hit and from the New Orleans Emergency Operations Center and the Louisiana Recovery Authority afterward. An approximate summary is:

                    (a) From January 1–August 28, the city’s population size was fairly stable at an estimated 462,269 persons (the 2004 mid-year population estimate).

                    (b) From August 29–September 30, New Orleans had been almost completely evacuated.

                    (c) During October, people returned steadily, and by the end of the month, about 71,000 residents had come back.

                    (d) During November and December, the influx of returnees was more gradual. By the end of 2005, the estimated population of the city was 91,000.

                    Estimate the homicide rate for New Orleans during 2005, and compare it to the homicide rate for the previous year.

                  6. Vancouver, British Columbia and Seattle are geographically near each other and are quite similar with regard to population size and several measures of socioeconomic status. For the period 1980–1986, the following data were obtained from the respective police departments concerning homicides, according to weapon used.

                    A newspaper reporter is sitting beside you when these data are shown at a press conference. He voices his conclusion that a Seattle resident may be more likely than a Vancouver resident to be shot to death by someone else, but that Seattleites can at least take comfort in knowing that they are less likely to be stabbed to death or killed by other weapons than are Vancouver residents. Do you agree? Why or why not?

                  7. CDC estimates that about 300,000 sports-related concussions occur each year in the United States. A study of concussion among collegiate football players involved surveillance for concussion among 4,251 player-seasons (Guskiewicz et al., 2003). (A single player monitored for a full football season contributed one player-season.) Overall, 184 players experienced at least one concussion, and 12 players experienced a repeat concussion within the same season.

                    Assume for present purposes that all of the study data were collected during a single playing season. From these results, can you determine whether the incidence of repeat concussion is greater than the incidence of a first concussion? Explain your answer briefly.

                  8. Atrial fibrillation (AF) is a heart rhythm abnormality that can be either chronic or “paroxysmal” (occurring in repeated episodes). AF increases the risk of stroke, but the excess risk can be significantly reduced by taking anticoagulants.

                    To estimate the prevalence of AF among older adults in a certain region of England, 4,843 persons were sampled at random from a list of all persons aged 65 years or older who were registered with a National Health Service primary care physician. Of the 3,678 who participated and had an electrocardiogram, 207 were found to have AF.

                    To check for participation bias, medical records were also reviewed for a sample of participants and for a sample of nonparticipants. A diagnosis of AF was found somewhere in the medical record for 139/1,413 in the participant sample and for 40/382 in nonparticipants.

                    (a) Based on these results, what is your best estimate of the prevalence of AF among older adults in the region?

                    (b) Do the results from medical records review for a subsample of participants and nonparticipants suggest that persons with AF were any more or less likely to be surveyed?

                    (c) Why do you think the percentage of patients with AF in the medical records substudy was so much higher than the percentage found to have AF in the survey?

                  Table 3.3.

                  Age group | No. of patients

                  Table 3.4.

                  Age group | 2000 + 2002 population

                  Table 3.5. Percentage of Homicides Committed Using Each Weapon Type

                  Type of weapon

                  1. Only (c) can be calculated, and it is 40/800 = 5%. It is the prevalence of HIV infection among incoming prisoners at the time of entry into prison.

                    The length of the study period and the size of the total prison population are not really relevant here. The prevalence figure given above tells us nothing about HIV infection among inmates who were already in prison, since none were tested. Although the tests are to be performed over a 2-year period, each incoming inmate being tested is observed only once, not over a period of time after being determined to be at risk, as would be required for any measure of incidence. Hence the appropriate time scale is not calendar time, but time since prison entry, and we observe all study subjects at time 0 on this scale.

                  2.

                    (a) Cumulative incidence would be a natural choice. The issue is how frequently women experience a change in state from having an intact uterus to having a ruptured uterus during the time period of labor. The study population was a closed one: no women would join the study in mid-labor, and few if any would be lost to follow-up for such a short-term outcome. The duration of labor may differ among women, but what matters most clinically is whether the uterus ruptures during childbirth, not the rupture rate per minute or the timing of rupture. Cumulative incidence has the nice feature of being readily interpretable as an estimate of the probability of rupture for a randomly chosen woman from the group on which cumulative incidence was calculated.

                    (b) The cumulative incidence of uterine rupture was higher among those with two or more previous C-sections: 9/975 = 0.0092 = 9.2 per 1,000 women at risk, versus 115/16,915 = 0.0068 = 6.8 per 1,000 women at risk. Incidentally, however, the observed difference could easily have occurred by chance in the absence of any true difference: p = 0.49 by chi-square test (with continuity correction).

                  3. This familiar riddle is actually an incidence rate problem. The number of eggs laid should be proportional to the number of hens and to the amount of time spent waiting for eggs. The “incidence rate” of egg-laying is 1.5 eggs/(1.5 hens × 1.5 days) = 2/3 eggs/hen-day. One hen on the job for three days amounts to 3 hen-days, so we would expect 3 × 2/3 = 2 eggs.

                  4.

                    (a) No, these are just case counts (“numerator data”) and do not take into account the size of the population at risk in each age group. The small number of cases among adults age 85+ years, for example, might simply result from there being relatively few people that old in Iquique.

                    (b) Although the investigators called the 2000 + 2002 population totals “numbers at risk” and treated them as numbers of people, in this case these numbers can more properly be interpreted as person-years at risk. To see this, consider the following diagram for a single age group:

                      The investigators calculated their denominator as A + B. The desired denominator, in person-years, would be the area of the shaded trapezoid, which is $\frac{A + B}{2} \times T$. But in this case, T = 2 years, the duration of the study. Hence $\frac{A + B}{2} \times 2 = A + B$, so the two approaches yield the same numerical values, albeit with different units and interpretations.

                      We now have the necessary numerator and denominator data and can divide them to obtain the desired age-specific incidence rates (Table 3.6).

                      Note that the incidence of first strokes actually increased steadily with age, in contrast to the percentages in the first table, which were based simply on case counts.
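The equivalence used in part (b) between the sum of the two population counts and the trapezoid's area over a two-year study can be confirmed numerically. This is a sketch only: the function name and the population values below are hypothetical, chosen for illustration, and are not the Iquique data.

```python
def person_years_trapezoid(pop_start, pop_end, years):
    """Person-years at risk when population size changes linearly over the period.

    This is the area of a trapezoid: average population x duration.
    """
    return (pop_start + pop_end) / 2 * years


# Hypothetical population sizes for one age group (NOT from the study).
pop_2000 = 10_000
pop_2002 = 12_000

py_two_years = person_years_trapezoid(pop_2000, pop_2002, years=2)
print(py_two_years, pop_2000 + pop_2002)

# With any other study duration, the simple sum would no longer equal person-years.
py_three_years = person_years_trapezoid(pop_2000, pop_2002, years=3)
print(py_three_years)
```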

                    (c)

                      i. No. The problem is that the total population of Iquique was used as the denominator for both incidence rates. This treated everyone in the city as being at risk both for a first stroke and for a recurrent stroke. In fact, only persons who had no history of a previous stroke would be at risk for a first stroke, and only those who previously had at least one stroke would be at risk for a recurrent stroke.

                        The discrepancy between the total population and the true population at risk may not have been very great for first strokes—a large majority of Iquique residents probably had never had a stroke and thus were indeed at risk for a first one. But the discrepancy would have been much larger for recurrent strokes. This problem makes the reported incidence of recurrent stroke largely uninterpretable in terms of risk.

                      ii. The population at risk for a first stroke is people who have never had a stroke. The survey indicates that 95% of the city’s population falls into that category. A corrected estimate of the incidence of first strokes would then be 292/(0.95 × 396,712) = 77.5 cases per 100,000 person-years.

                        The population at risk for a recurrent stroke is people who have previously had a stroke. The survey indicates that 5% of the city’s population falls into that category. A corrected estimate of the incidence of recurrent strokes would then be 68/(0.05 × 396,712) = 342.8 per 100,000 person-years. Although the survey results are hypothetical, these calculations show that it is quite possible, given the data presented, for the risk of having another stroke to be sharply elevated among those who have already had at least one.
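The corrected rates in part (c) follow from splitting the total person-years between the two at-risk groups. The sketch below reproduces that arithmetic; the names are hypothetical, and the 95%/5% split is the hypothetical survey result stated in the answer.

```python
TOTAL_PERSON_YEARS = 396_712    # denominator used by the investigators
FIRST_STROKES = 292
RECURRENT_STROKES = 68
PROPORTION_NEVER_STROKE = 0.95  # hypothetical survey result from the exercise


def rate_per_100k(cases, person_years):
    """Incidence rate expressed per 100,000 person-years."""
    return cases / person_years * 100_000


first_rate = rate_per_100k(
    FIRST_STROKES, PROPORTION_NEVER_STROKE * TOTAL_PERSON_YEARS
)
recurrent_rate = rate_per_100k(
    RECURRENT_STROKES, (1 - PROPORTION_NEVER_STROKE) * TOTAL_PERSON_YEARS
)

print(f"first stroke:     {first_rate:.1f} per 100,000 person-years")
print(f"recurrent stroke: {recurrent_rate:.1f} per 100,000 person-years")
```

Changing `PROPORTION_NEVER_STROKE` shows how sensitive the recurrent-stroke rate is to the size of its much smaller denominator.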

                  5. Figure 3.13 shows the changing size of the New Orleans population during 2005:

                    The total number of person-years at risk corresponds graphically to the combined areas of the rectangle, triangle, and trapezoid in the figure. This quantity can be calculated as shown in Table 3.7.

                    The homicide rate for 2005 was thus 210/320,290 = 65.6 homicides per 100,000 person-years—about 15% higher than the rate in 2004, despite the smaller number of murders in 2005.

                    This problem was based on a paper by Van Landingham (2007), which used a slightly different method to estimate person-years at risk during the last three months of 2005 but reached a similar conclusion.
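The person-years calculation for a population that changes in stages can be sketched as a sum of duration × average population over the segments. The function names are hypothetical, and the two-segment example uses made-up numbers, not the Table 3.7 values; only the totals of 210 homicides and 320,290 person-years come from the answer above.

```python
def total_person_years(segments):
    """Sum person-years over (duration_in_years, average_population) segments."""
    return sum(duration * avg_pop for duration, avg_pop in segments)


def rate_per_100k(events, person_years):
    """Incidence rate expressed per 100,000 person-years."""
    return events / person_years * 100_000


# Made-up illustration (NOT the New Orleans data): half a year at an average
# of 1,000 people, then half a year at an average of 500 people.
example_py = total_person_years([(0.5, 1_000), (0.5, 500)])
print(example_py)

# Totals quoted in the answer for 2005.
homicide_rate_2005 = rate_per_100k(210, 320_290)
print(f"{homicide_rate_2005:.1f} homicides per 100,000 person-years")
```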

                  6. The table concerns only proportional-mortality data on the distribution of homicides by weapon type. It does not show whether the actual incidence (mortality) of homicides, overall or of any type, is higher in one city than in the other.

                    Here are the actual homicide incidence rates from the two cities during those years (Table 3.8):

                    The overall incidence of homicide was higher in Seattle, and the difference in rates for firearms accounted for most of the excess. The incidence of homicide carried out with knives was slightly higher in Vancouver, but the incidence of murder involving other weapons was actually higher in Seattle than in Vancouver (Sloan et al., 1988).

                  7. We wish to compare two incidence rates, one for first concussions and the other for repeat concussions during the same season. Identifying the two numerators is straightforward: there were 184 players with a first concussion, and 12 with a repeat concussion.

                    Obtaining denominators for these rates is more challenging. The only players at risk for a first concussion are those who have had no concussions so far during the season. The only players at risk for a repeat concussion are those who have already had a concussion during the season. Those are the only two possibilities, so all player-time must accrue toward one denominator or the other. In other words, the denominators of the two rates of interest must add up to 4,251 player-seasons. Say that the number of player-seasons at risk for repeat concussion is x; then the number of player-seasons at risk for first concussion must be 4,251 − x.

                    Calculating x would be easy in principle if we had detailed data about how much playing time each athlete with a first concussion accrued during the rest of that season. However, even without such detailed information, we can set bounds on x. As noted above, the only athletes who contributed any player-time toward x were the 184 with a first concussion. At one extreme, if all 184 athletes had their first concussion at the very beginning of the season, and all continued to play for the rest of the season, they would accrue a maximum of 184 player-seasons at risk for a repeat concussion. At the other extreme, if all 184 athletes had their first concussion very near the end of the season, they would accrue very little time at risk for a repeat concussion. The lower bound for x is 0.

                    Now suppose that x = 184 player-seasons, its maximum possible value. Then the incidence rate of first concussions would be 184/(4,251 − 184) = 4.5 first concussions per 100 player-seasons at risk. The incidence rate of repeat concussions would be 12/184 = 6.5 repeat concussions per 100 player-seasons at risk. At any smaller value of x, the first rate would decrease while the second would increase. Thus we can safely conclude that the incidence of repeat concussions must have exceeded the incidence of first concussions.
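The bounding argument can be checked by evaluating both rates across the possible values of x. This is a sketch under the assumptions stated in the answer (4,251 total player-seasons, 184 first and 12 repeat concussions); the function name is hypothetical, and x is stepped in whole player-seasons purely for illustration.

```python
TOTAL_PLAYER_SEASONS = 4_251
FIRST_CONCUSSIONS = 184
REPEAT_CONCUSSIONS = 12


def concussion_rates(x):
    """Return (first, repeat) rates per 100 player-seasons.

    x = player-seasons at risk for a repeat concussion, with 0 < x <= 184.
    """
    first = FIRST_CONCUSSIONS / (TOTAL_PLAYER_SEASONS - x) * 100
    repeat = REPEAT_CONCUSSIONS / x * 100
    return first, repeat


# Worst case for the comparison: x at its maximum possible value.
first_at_max, repeat_at_max = concussion_rates(184)
print(f"x = 184: first = {first_at_max:.1f}, repeat = {repeat_at_max:.1f}")

# For every smaller x, the first rate falls and the repeat rate rises.
repeat_always_higher = all(
    concussion_rates(x)[1] > concussion_rates(x)[0] for x in range(1, 185)
)
print(repeat_always_higher)
```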

                  8.

                    (a) Prevalence = 207/3,678 = 0.056

                    (b) AF was found for 139/1,413 = 9.8% of participants and 40/382 = 10.5% of nonparticipants, suggesting little participation bias.

                    (c) The kind of prevalence measured in the community survey was point prevalence as of the time the electrocardiogram was taken for each participant. The kind of prevalence measured in the medical record review is better considered period prevalence. It referred not to the proportion of patients who had AF at a particular point in time, but to the proportion who had AF at any time during the period in which they received care from the clinic whose medical records were reviewed.

                  Table 3.6.

                  Age (years) | No. of cases | Person-years at risk | Incidence rate^a

                  ^a Cases per 100,000 person-years

                  Table 3.7.

                  | Duration (years) (A) | Average population (B) | Person-years (A × B)
                  Jan 1 – Aug 28 | | |
                  Aug 29 – Sep 30 | | |
                  Oct 1 – Oct 31 | | |
                  Nov 1 – Dec 31 | | |

                  Table 3.8.

                  Type of weapon | Homicide rate^a
                  All types |

                  ^a Homicides per 100,000 person-years

                  (Based on data from Sloan et al. (1988))