Measuring high-quality depression care
ACP's Performance Measurement Committee looked at eight quality measures for major depressive disorder (MDD) and voted to support one on suicide risk assessment.
Performance measures have emerged as a way to evaluate care quality, but do they work, and are they worth the added administrative burden?
ACP's Performance Measurement Committee encourages high-quality, evidence-based performance measures to guide improvement of care and evaluates performance measures developed by other organizations. It recently took a close look at eight existing measures for major depressive disorder (MDD) and voted to support only one, on suicide risk assessment. The committee's work was published as a position paper in the April Annals of Internal Medicine.
It's not that the unsupported measures, on screening, follow-up, and treatment of MDD, were recommending substandard or inappropriate care, explained Caroline L. Goldzweig, MD, MSHS, FACP, current Vice Chair of the Performance Measurement Committee and a position paper coauthor. It's that they were difficult to implement effectively and weren't always validated at every level of clinical practice.
“Quality measurement is really a tricky thing, in some ways,” she said. “Everyone wants to be able to say that they're high quality, and people want to be able to demonstrate that they're high quality, but developing measures [that can do that successfully] is actually really, really hard.”
I.M. Matters recently talked to Dr. Goldzweig, who is the chief medical officer of Cedars-Sinai Medical Care Foundation in Los Angeles, to learn more about performance measurement in general and the College's work specifically.
Q: Why did the Performance Measurement Committee choose to study depression measures?
A: We are very interested in looking at measures that affect our membership, and we look at measures also where the College has guidelines. … The College did put out a guideline around depression, and depression is a pretty important concept, a key clinical concept, for anyone doing internal medicine.
Q: The committee reviewed eight depression measures. How did members decide which ones to evaluate?
A: In our process, we search key websites that are known for publishing performance measures and that have done some vetting or evaluation of them, for instance, the National Quality Forum, or CMS, or the National Committee for Quality Assurance. We then go through a pretty rigorous process of evaluating the measures against multiple different domains and grading them using a consensus methodology that was developed years ago at RAND. The depression measures we reviewed were from the CMS Merit-based Incentive Payment System (MIPS) and from the National Quality Forum.
Q: The committee supported only one measure, on suicide risk assessment. What went into that decision, and what made that measure different from the others?
A: We supported that measure, which called for suicide risk assessment during all adult patient visits with a new diagnosis of MDD or a new diagnosis of recurrent MDD, because it had been tested at all of the different levels of attribution. One of the things that often happens with these performance measures is they're being measured, let's say, at the health plan level, but then they might get measured at the group level, or they might get measured at the individual physician level, but the measure itself has not necessarily been validated at all of those levels. When a measure hasn't been validated at the level where it's being applied, we usually downgrade it on what's called attribution quality.
But the measure on suicide risk assessment had actually been reviewed at both the individual and group practice levels, and it was based on strong evidence. We felt that, from a clinical standpoint, it made a lot of sense. It had very reliable results from the testing that was done, and then it also had a very clear way that as a physician, you could demonstrate that you'd actually done it, so it checked all the boxes for a high-quality performance measure.
Sometimes, you'll have a performance measure that is based on sound evidence, and maybe it was even tested at the level of attribution, but it may be really, really hard to prove performance. You end up, as a medical group or as an individual physician, having to jump through lots of hoops to be able to prove that you actually did the right thing.
One example of this among the measures we looked at was screening for depression and following up on positive results. That's one where it's often very hard to demonstrate that you did the right thing, because mental health care can be very fragmented. For instance, you might, in a primary care clinic, screen a patient and find that they've got depression, and so you give them instructions to make an appointment with a mental health provider who sits outside of your system. If you're submitting the data on this, there's no way for you to demonstrate that the mental health visit actually happened. In a fragmented health care system, some of these measures just operationally don't play out too well.
Groups developing quality measures have to make sure that they're only including the right patients and excluding patients for whom the measure shouldn't apply. There are timeframes that you have to look at. How do you identify who's actually eligible? Is it through codes or another mechanism? All of this makes the feasibility aspect of these measures really, really difficult. How do you operationalize that in a busy practice?
Q: Should physicians choose not to report on measures that are unsupported?
A: That's the conundrum. First of all, if anyone's reporting on the depression measures the committee did not support and they're doing well, it's probably because they put in place processes to conform to the measure specifications, and they should continue doing that. We're not necessarily trying to say to individual medical groups, or individual physicians and practices, “Oh, don't report on this,” just because the measure may be flawed, if that works for you and your clinic and the way that you are delivering care.
What we're trying to do at the American College of Physicians with the Performance Measurement Committee is to point out to the developers, to the quality entities that oversee quality measurement, that some of these measures are flawed or have room for improvement. If you're going to put a measure out there, and you're going to prioritize it for a program and for patients, we should make sure that that measure is really a solid one and that it's feasible to implement for physicians in practice, because in the end, what we're trying to balance is an ability to assess that the right things are getting done with the tremendous administrative burdens that physicians are dealing with every single day. … A lot of our endorsement is really aimed at those organizations, and our goal and our desire is to work with them to improve these measures, particularly where there's really strong evidence that this is the right thing to do.
Q: Are the types of issues you mentioned regarding unsupported measures fairly common in performance measurement?
A: They are. When you think about the delivery of health care and practicing medicine, it's a complicated algorithm, and you can't always reduce it down to something that's easy to describe or measure. It's often much more nuanced and not always easy to pick up from our documentation, particularly because most of our clinical notes are not codified, or entered into a computer as a data element that can easily be identified and analyzed. To demonstrate that you have met a standard, you are often asked to code in a particular way, which is a hard thing to remember to do.
Over time, one would hope that with artificial intelligence technology and natural language processing this would become much easier. But right now, we're still living in a world where we rely a lot on capturing actions electronically, in ways that aren't necessarily consistent with how people normally practice medicine or how they document, and then you have the issue of multiple different electronic systems that don't always talk to each other.
Given that example of the primary care physician who's screening the patient for depression and then maybe referring them to another entity for mental health care, that doctor's EHR might not get any message from the mental health EHR that says, “Oh, yeah, the patient got an appointment and is being scheduled to be seen for depression.” That just doesn't happen, right? We don't really have that integration or interoperability across all of these different systems.
Q: Are there any other takeaways for physicians?
A: We don't want to throw out all performance measures, because every performance measure has some problem or another. We just want to make sure that there's recognition of the pitfalls in performance measurement, and in particular that when organizations endorse and use certain performance measures, ideally these measures would meet most of the established criteria for a high-quality performance measure.