Pay-For-Performance Report

Report cards based on outcome measures

Outcome measures are expensive to produce and intrude on patient privacy because they must be risk-adjusted in order to be accurate. Adjusting for differences in patient health nearly always requires access to patient medical records (as opposed to "administrative data," typically defined as data found in claim forms collected by insurers and/or discharge reports prepared by hospitals). The New York State Department of Health report card on heart surgeons is a good example. It uses mortality rates following coronary artery bypass grafts (an outcome measure) as the measure of quality, and it does expensive risk adjustment using patient medical records. Examples of "patient risk factors" that the New York State Department of Health takes into account include several measures of left ventricular function (ejection fraction, heart attack within the last seven days, and congestive heart failure) and the presence of several comorbidities (including diabetes and obesity). (8)

Risk adjustment of outcome measures that will be used to punish or reward physicians (either with adjustments to reimbursement or with more or less market share) is necessary primarily to protect patients, and secondarily to ensure fair treatment of doctors. If risk adjustment is insufficiently accurate, or if it is even perceived by physicians to be insufficiently accurate, physicians will be under pressure to avoid sicker patients. The following statement by Fowles et al. was made about capitation, but it applies as well to PFP because PFP, like capitation, exposes physicians to risk of lost income if their patients are sicker than average:

Unless capitated payment rates can be adjusted for enrollee health status, physicians or physician groups with patient populations that are sicker than average will be at a financial disadvantage. Sick patients, too, will be in jeopardy. Without adequate risk adjusters, risk-bearing organizations have a financial incentive to select healthier individuals or jettison sicker ones. (9)

Despite the consensus that risk adjustment of outcome measures is required, few of the report cards now touted by the insurance industry and large employer groups are adjusted for anything other than age and sex. According to Hofer et al., "The profiling approach most commonly used by payers and administrators is to calculate simple age- and sex-adjusted measures that are averaged by physician to generate a physician profile." (10) In view of the scarcity of research on the effect of implementing PFP without accurate report cards, it is fair to say the managed-care industry is once again implementing a managed-care method without first testing it to ensure that it is safe, effective, and worth its cost.

The few reliable studies that have been done confirm what theory and common sense predict: providers whose income depends on their performance as measured by inaccurate report cards are under great pressure to cherry-pick. Shen found that providers of substance abuse treatment avoided sicker patients following Maine's implementation of "performance-based contracting," a system that threatened low-scoring providers with loss of their contracts and promised high-performing providers more funding. The scores that Maine used to assess performance were not adjusted for risk. (11) But even report cards with sophisticated risk adjustment may cause some providers to avoid the sickest patients. The New York bypass report card, as sophisticated as it is, may not be accurate enough to take the pressure off surgeons to cherry-pick. Some surgeons in New York, at any rate, think it is not. As a result, there is some evidence that quality of care for the sickest patients has declined, as surgeons who do not trust the risk-adjustment methodology find ways to avoid sick patients who might drive up their mortality rates. (12)

Some experts have suggested that outcomes cannot be measured accurately enough to eliminate the incentive to reject sicker patients. Hofer et al. examined the accuracy of an outcome measure commonly used in physician report cards today: hemoglobin A1C (HbA1c) levels in patients with type 2 diabetes. After adjusting the HbA1c levels for differences in patient health and socioeconomic status (adjustments far more sophisticated and expensive than those used by the average insurer today), the investigators found that physicians would still be better off getting rid of their one to three sickest patients (out of an average diabetic base of about 21 patients per doctor). (13) Doing so would improve their score "dramatically," said the authors. "This advantage from gaming could not be prevented by even detailed case-mix adjustment," they concluded. (14)
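The arithmetic behind this kind of gaming can be illustrated with a minimal Python sketch. The panel below is entirely hypothetical (21 invented HbA1c values, in percent, chosen only to match the panel size quoted above), and the function name is an illustration rather than anything taken from Hofer et al. The sketch shows only how dropping the highest values moves a simple panel average; it does not reproduce the authors' case-mix adjustment.

```python
from statistics import mean


def score_after_dropping(values, k):
    """Return the mean of `values` after removing the k highest entries."""
    kept = sorted(values)[: len(values) - k]
    return mean(kept)


# Hypothetical HbA1c values (percent) for a panel of 21 diabetic patients.
# These numbers are invented for illustration; they are not study data.
panel = [
    6.4, 6.6, 6.8, 6.9, 7.0, 7.1, 7.2,
    7.3, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9,
    8.0, 8.2, 8.5, 8.9, 10.5, 11.8, 12.6,
]

if __name__ == "__main__":
    # Compare the panel average with 0, 1, 2, and 3 of the highest values removed.
    for k in range(4):
        print(f"dropped {k}: mean HbA1c = {score_after_dropping(panel, k):.2f}")
```

With so few patients per physician, a handful of high values carries a large share of the average, which is why removing them shifts the score so noticeably.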

Hofer et al. blamed the inaccuracy of the HbA1c outcome measure on two factors: the relatively small effect that differences in physician practice styles have on HbA1c levels, and the small number of diabetic patients seen by individual physicians. The authors determined that only 3 percent of the variation in physicians' average HbA1c levels could be attributed to physician behavior; the rest was caused by factors outside of physician control, including patient behavior and chance. They stated that "at least 100 patients [per doctor] would be needed to reach 80 percent reliability (often considered the minimum for making decisions about individuals)." They took note of the possibility that patient pools and differences among physician practice styles might be larger for other diseases, then minimized that possibility with this observation: "However, diabetes is one of the most common diseases in the United States. Apart from hypertension, it is difficult to imagine that there would be enough cases per primary care physician to construct disease-specific profiles for almost any other chronic condition." (15)
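The relationship between panel size and reliability can be sketched with a short calculation. The sketch below assumes the standard Spearman-Brown relationship, in which the reliability of a physician's average over n patients is n·ρ / (1 + (n − 1)·ρ), where ρ is the share of variance attributable to the physician. The report does not say which formula Hofer et al. used, so this is an assumption, as are the function names; the 3 percent figure and the 80 percent target come from the text above.

```python
import math


def reliability(n_patients, physician_share):
    """Spearman-Brown reliability of a mean over n_patients (assumed formula)."""
    return n_patients * physician_share / (1 + (n_patients - 1) * physician_share)


def patients_needed(target, physician_share):
    """Smallest panel size whose reliability reaches `target` under the same formula."""
    exact = target * (1 - physician_share) / (physician_share * (1 - target))
    return math.ceil(exact)


if __name__ == "__main__":
    share = 0.03  # share of variance attributed to physicians, per the text
    print(f"reliability with 21 patients: {reliability(21, share):.2f}")
    print(f"patients needed for 0.80:     {patients_needed(0.80, share)}")
```

Under these assumptions, a panel of 21 patients yields a reliability of roughly 0.39, and about 130 patients are needed to reach 0.80, which is consistent with the authors' statement that "at least 100 patients" would be required. The exact number depends on the assumed formula and on how the 3 percent figure was rounded.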

Of course, whether PFP should proceed depends on more than the accuracy of report cards. Two other questions must also be asked: What does it cost to produce accurate report cards based on outcome measures and to carry out the other tasks necessary to implement a PFP program? And does that cost outweigh the benefits achieved by the PFP program? The Forum should make sure it has sound, evidence-based answers to these questions before recommending PFP. What little evidence we have on the cost question is not encouraging. For example, one of the oldest report card projects, the Cleveland Health Quality Choice program, was terminated four years ago because the Cleveland Clinic concluded the cost was not worth it. After spending $2 million a year for a decade, the Clinic withdrew its nine hospitals from the project on the grounds that the report cards' effect on quality was too insubstantial to warrant $2 million a year. (16) Report cards on physician performance in hospitals are probably less expensive to risk adjust than report cards on outpatient care because records of inpatient medical care are more centralized. Collecting the data necessary to risk adjust outcomes for outpatient services will be more expensive because patient medical records are scattered over many more sites.
