I started working at a medical laboratory when I was 14 as an apprentice. That’s how you start a lot of professions in Mexico. For two years, I showed up on my days off and practiced drawing blood, spinning it down, and preparing the serum or plasma for tests. I also learned how to do Gram stains and read urine analysis strips. It was great fun, and it came in handy when I applied to the medical technology program in El Paso.
A lot of the clients in that little lab came straight from their healthcare provider to have a test done. The most common one was fasting blood sugar, so a lot of our blood drawing was done early in the day. Because of the equipment we used, it would take most of the rest of the morning to get the tests done. (Remember, this was early-1990s Mexico. Having a glucometer was a luxury.) Complicated tests would get shipped off to a larger lab across the border, and they could cost the person an insane amount of money once you adjust for wages in Mexico.
One of those tests that we shipped off was the HIV antibody test. There were no at-home kits to test for HIV at that time. If the healthcare provider thought the person was at risk for HIV, the person would come in, get their blood drawn, and wait about a week to get the results. We would give them the results in a sealed envelope with a disclaimer that they should give the unopened results to their provider, since the provider would be the most qualified person to interpret them. Certainly, none of us in the lab wanted to be the one to tell people that they had tested positive for what, at the time, was a sure death sentence. And none of us wanted to deal with a false positive, where the person tested positive when they were really not infected. Or even a false negative, where the person tested negative when they were really infected.
Why the false results? Because screening tests are not 100% accurate. By their very nature, screening tests are subject to error. In theory, a “gold standard” test is one that is 100% accurate all the time. In reality, even those tests are subject to error, but the error doesn’t appear to depend on the characteristics of the population being tested. With screening tests, the number of false results does depend on the population being tested.
Before we dive deeper into screening tests, we need to talk about prevalence. You may remember that prevalence is the total number of people with the disease or condition divided by the total population. For the rest of this lesson, we’ll look at cystic fibrosis as the disease we’re working on. Cystic fibrosis (CF) is a genetic disease that affects about 30,000 children in the United States. Based on that number, we find that the prevalence of CF in the country is about 0.0098%, or roughly 1 in 10,000 people. That is a very small prevalence.
While there are many screening tests for CF, we will use our own theoretical test in this lesson. Our test is 99% sensitive and 99% specific. Sensitivity is the probability that the test will be positive if the person really does have CF. Specificity is the probability that the test will be negative if the person really doesn’t have CF. With those high percentages, you would think that the number of false positives and false negatives would be small, right?
There are also two other things we need to discuss about screening tests. A test’s positive predictive value (PPV) is something that interests healthcare providers a lot. This is the probability that a person with a positive test really does have the condition. A test with a low PPV is of little to no use to a provider because the probability that the positive result in their hand reflects real disease is low. On the other hand, a test’s negative predictive value (NPV) is the probability that a person with a negative test really doesn’t have the condition. This is also of interest to providers. A high NPV means that the negative test you got very likely means that you are negative. A low NPV puts negative tests in doubt.
As it turns out, there is a relationship between a test’s sensitivity and specificity and the PPV and NPV… And prevalence. Prevalence, you see, really does affect a test’s PPV in a big way. How? Let me show you with math.
First, let’s set up our 2-by-2 table for analyzing the screening test we’re using:
In the table, sensitivity is the number in the “true positive” cell divided by the number in the “all positives” cell (everyone who really has the disease). Remember, sensitivity is the probability that the test will be positive if the person really is positive. Likewise, specificity is the number in the “true negative” cell divided by the number in the “all negatives” cell (everyone who really doesn’t have the disease). Positive predictive value is what you get when you divide the “true positive” cell by the “all positive tests” cell. Remember, PPV is the proportion of all positive tests that correctly identified a positive person. Likewise, NPV is the number in the “true negative” cell divided by the number in the “all negative tests” cell, because NPV is the proportion of all negative tests that correctly identified a negative person.
Take a few moments to familiarize yourself with that table since we’ll be using it for the rest of this example.
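If it helps to see those four definitions as arithmetic, here is a minimal sketch in Python. The function name `screening_metrics` and the cell counts in the example (100 people with the disease, 1,000 without) are made up for illustration; they are not the numbers from this lesson.

```python
def screening_metrics(tp, fp, fn, tn):
    """Compute the four measures from the cells of a 2-by-2 table.

    tp: true positives   (have the disease, test positive)
    fp: false positives  (don't have the disease, test positive)
    fn: false negatives  (have the disease, test negative)
    tn: true negatives   (don't have the disease, test negative)
    """
    return {
        "sensitivity": tp / (tp + fn),  # true positives / all positives
        "specificity": tn / (tn + fp),  # true negatives / all negatives
        "ppv": tp / (tp + fp),          # true positives / all positive tests
        "npv": tn / (tn + fn),          # true negatives / all negative tests
    }


# Illustrative counts only: 100 people with the disease, 1,000 without.
metrics = screening_metrics(tp=99, fp=10, fn=1, tn=990)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Notice that sensitivity and specificity are computed down the "truth" columns, while PPV and NPV are computed across the "test result" rows. That difference is why the predictive values can move even when sensitivity and specificity stay put.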
Remember that the theoretical test that we’re using is 99% sensitive and 99% specific. So any number in the “true positive” cell will equal 99% of what is the “all positives” cell. Also, the number in the “true negative” cell will equal 99% of what is in the “all negatives” cell. Now, let’s say we screen the entire population of the United States for CF with our test. If we screen the whole of the United States for CF, these would be our results:
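You can rebuild that kind of table yourself from just four inputs. This is a rough sketch in Python, not the spreadsheet behind the original table. The function name `two_by_two` is made up, and the US population of 306 million is an assumption backed out from 30,000 cases at a prevalence of about 0.0098%, so expect small differences from the original figures.

```python
def two_by_two(population, cases, sensitivity=0.99, specificity=0.99):
    """Fill in the expected 2-by-2 table for a screened population."""
    healthy = population - cases
    tp = sensitivity * cases      # people with the disease who test positive
    fn = cases - tp               # people with the disease the test misses
    tn = specificity * healthy    # healthy people who test negative
    fp = healthy - tn             # healthy people who test positive
    return {
        "tp": tp, "fn": fn, "fp": fp, "tn": tn,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Assumed population: about 306 million (not stated in the text).
table = two_by_two(population=306_000_000, cases=30_000)
print(f"true positives:  {table['tp']:,.0f}")
print(f"false positives: {table['fp']:,.0f}")
print(f"false negatives: {table['fn']:,.0f}")
print(f"true negatives:  {table['tn']:,.0f}")
print(f"PPV: {table['ppv']:.2%}")
print(f"NPV: {table['npv']:.4%}")
```

Under that assumed population, the roughly 29,700 true positives are swamped by about 3 million false positives, which is what drags the PPV below 1%.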
If you were a regular person in the US, would you like to be tested for CF? If you got a positive test, would you believe it if that positive had less than a 1% chance of being a real positive? What if you got a negative test? Would you believe it since it has an almost 100% chance of being real? (You see 100% in the table above because of rounding.) What are the implications for public health and for private medical practice? Discuss in the comment section, if you want, and include what you’d do with those millions of false positives.
Now, let me give you another example. In this example, we’re only screening children in the US. According to the US Census Bureau, there are a little over 74 million children under the age of 18 in the US. We plug in those numbers and get this:
A little better, but not much. Notice that our prevalence is now 0.04% (30,000 divided by 74,181,467), as opposed to 0.01% in the previous example. That increase in prevalence only brought up the PPV to 3.85% from 0.96%. Would you trust a positive test now? How about a negative test? What do you do with those hundreds of thousands who test positive?
Okay, so you’re not interested in all children being tested. You read that studies have shown that it’s children under the age of 5 who are likely to show the first signs and symptoms of CF (in our example) and that White children seem to be the ones getting CF more so than other children. You go to the Census Bureau and decide to test the 14.6 million children who are under 5 and White. (We keep the 30,000 the same for this example.) Now your prevalence is 0.2%, and your test’s performance looks like this:
Are you getting the gist of it? We are increasing prevalence ever-so-slightly by being more judicious about who we test, and our test is performing better with regard to PPV, although the NPV is coming down slightly as well. If you’re a physician, you trust this PPV a little more than you’d trust a less than 1 in 100 chance of it being a true positive.
Now, let’s hike up the prevalence and say that CF is all over the place. Let’s say that 10% of all children under 5 have it. What do you think will happen to the test’s performance? Here, have a look:
Look at that! Your chance of a positive test being a true positive is better than 91%. The chance of a negative test being a true negative is still better than 99%. You still have a lot of false positives, though. What do you do with them? (In case you haven’t noticed, I keep asking that, and I’ll tell you in a moment what you do with them.)
As you can see, the prevalence of a disease or condition (or a gene you’re screening for) has a big influence on the PPV of a test. This matters a lot to you if you’re a healthcare provider wanting to know whether you should treat your patient or not. And this is precisely why it is recommended that healthcare providers be the ones to order or recommend screening tests for you. Their experience with similar signs and symptoms to yours helps them decide if you fall into the category of people whose characteristics increase the PPV of a screening test and, if the test is positive/negative, what to do next.
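To watch prevalence push the predictive values around, the four scenarios above can be run through the same arithmetic. This is a minimal sketch in Python using Bayes' rule. The function names `ppv` and `npv` are made up for illustration, and the prevalences come straight from the numbers quoted in the text (0.0098% for the whole country is used as given).

```python
def ppv(prevalence, sensitivity=0.99, specificity=0.99):
    """Probability that a positive test is a true positive."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(prevalence, sensitivity=0.99, specificity=0.99):
    """Probability that a negative test is a true negative."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


scenarios = {
    "everyone in the US": 0.000098,
    "all children under 18": 30_000 / 74_181_467,
    "White children under 5": 30_000 / 14_600_000,
    "hypothetical 10% prevalence": 0.10,
}

for label, prevalence in scenarios.items():
    print(f"{label}: prevalence {prevalence:.4%}, "
          f"PPV {ppv(prevalence):.2%}, NPV {npv(prevalence):.4%}")
```

The test itself never changes in this loop. It is 99% sensitive and 99% specific in every row, and only the population being tested is different.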
In the case of many screening tests for serious conditions, such as HIV infection, providers will usually order a confirmatory test (if the lab doesn’t do one already as part of a panel of tests). These confirmatory tests have better sensitivity and specificity than the screening tests. Also, because only the people who screened positive get re-tested, the prevalence in the group being re-tested is much higher than in the general population, which boosts the PPV of the confirmatory test even more.
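Here is a rough sketch in Python of why re-testing the positives works so well. It leans on two assumptions that are not from this lesson: the confirmatory test's 99.9% sensitivity and specificity are hypothetical numbers, and the two tests are assumed to make their errors independently of each other, which real tests may not do. The function name `ppv_given_prevalence` is also made up.

```python
def ppv_given_prevalence(prevalence, sensitivity, specificity):
    """Probability that a positive test is a true positive."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Step 1: screen the general population with the 99%/99% test.
general_prevalence = 0.000098  # about 0.0098%, as in the text
screen_ppv = ppv_given_prevalence(general_prevalence, 0.99, 0.99)

# Step 2: re-test only the people who screened positive. In that group,
# the prevalence is the PPV of the screening test.
# Hypothetical confirmatory test: 99.9% sensitive, 99.9% specific.
confirm_ppv = ppv_given_prevalence(screen_ppv, 0.999, 0.999)

print(f"PPV after screening alone:   {screen_ppv:.2%}")
print(f"PPV after confirmatory test: {confirm_ppv:.2%}")
```

Under those assumptions, a positive that survives both tests goes from a less than 1-in-100 chance of being real to something a provider can act on.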
You may have heard recently that a genetic testing company was ordered by the Food and Drug Administration to halt sales of its testing kit, with the agency citing concerns about the interpretation of positive results. There wasn’t much of a concern over the test’s performance. Like our theoretical screening test, it has good sensitivity, specificity, and accuracy. The problem is the predictive values you get when everyone and their mother (sometimes literally) is tested for genetic conditions of very low prevalence in the general population, especially without first consulting a trained medical professional to see if screening is indicated.
For example, if you have a family history of breast cancer, where many of the females (and some of the males) in your family developed it, you may want to get screened for the BRCA gene because you are in a population where the gene may be in high prevalence. If no one in your family had breast cancer, and you got a positive result, based on what I’ve explained above, would you trust it? (Indeed, there are stories coming out about women demanding mastectomies [surgery to remove the breast tissue] based on a positive result alone, not on family history and a positive result.)
Of course, there are times when we do screen everyone for things that are in low prevalence. Most newborns in the United States get screened for phenylketonuria even though it has a very low prevalence. Why? Because missing it can hurt the child severely within the first few years of life. A negative test is of no consequence, but a positive one leads to confirmation. That, and the probability of false negatives is very low.
The take-home message is that screening tests need to be applied judiciously to people whose signs and symptoms, family history, or some other “hint” lead their healthcare provider to believe that they belong to a population whose prevalence of the disease or condition is high enough to overcome the limitations of screening tests. For example, you wouldn’t test a 60-year-old woman with belly pain for pregnancy. You would test an 18-year-old woman. From a public health perspective, we need to be able to deal with the limitations of these tests. If we screen everyone, everywhere for something, are we ready to deal with the consequences of false positives? Or is it better to recommend the screening for people in high-risk groups based on our previous knowledge? Likewise, if we have no previous knowledge, how widely should we test?
In short, don’t go testing willy-nilly.
If you want to play with these numbers, here is an Excel spreadsheet I created.