From co-pilots to clinical judgment: the next phase of AI in digital behavioral health

Over the past two years, much of the conversation around AI in digital behavioral health has focused on operational co-pilots. We’ve seen real progress in areas like documentation, administrative workflows, and intake. These tools make a significant difference for providers by reducing burden and improving efficiency. For many doctors, after-hours work that used to stretch late into the night is finally starting to ease.

But in behavioral health care, some of the most important decisions don’t live in documentation or workflows. They live in judgment: how doctors interpret information, apply knowledge, and make decisions in situations that are often ambiguous. Two experienced doctors can review the same case and reasonably reach different conclusions. Sometimes that is appropriate, but often the variation reflects unclear criteria, inconsistent application, or differences in training and experience.

What is starting to emerge is a shift from thinking of AI simply as an operational tool that helps complete tasks toward thinking of it as part of a system that helps make more consistent decisions over time. The goal is not to replace doctors but to have doctors and AI agents work together to make clinical reasoning more explicit and consistent, so that every patient receives the benefit of their doctor’s best thinking.

Moving from tools to systems

Most people still think of AI as a one-to-one relationship between a doctor and an AI assistant. But in practice, decisions, especially around admission and clinical appropriateness, are shaped by multiple inputs, differing interpretations of the criteria, and sometimes outright disagreement. A more useful way to think about AI in this context is as a layer within a broader decision-making system. That system might include a doctor making an initial judgment; an AI layer that structures and tests that reasoning against standardized clinical criteria and historical patterns; a clear escalation path when the two diverge; and human oversight that remains responsible for the final decision.

The clinical setting as a real-world example

Determining clinical appropriateness is a good example of how this can work in practice. In many digital behavioral health environments, clinical evaluators conduct intake assessments and independently determine whether a patient is appropriate for care. A more sophisticated approach adds AI-backed layers that generate structured outputs from the information collected: a recommendation, a confidence level, the clearly defined criteria applied, and prompts to clarify missing or ambiguous details.
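
To make "structured output" concrete, here is a minimal Python sketch of what an AI layer might hand back to the evaluator. The class names, fields, criterion IDs, and the 0.0 to 1.0 confidence scale are illustrative assumptions, not a description of any particular product. The one behavior it demonstrates is turning criteria that the intake left unaddressed into clarifying prompts.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Recommendation(Enum):
    APPROPRIATE = "appropriate"
    NOT_APPROPRIATE = "not_appropriate"
    NEEDS_MORE_INFO = "needs_more_info"


@dataclass
class CriterionResult:
    criterion_id: str   # hypothetical ID into the organization's own criteria set
    met: bool | None    # None means the intake data did not address this criterion
    rationale: str


@dataclass
class AppropriatenessAssessment:
    recommendation: Recommendation
    confidence: float   # assumed 0.0-1.0 scale; how it is calibrated is left open
    criteria: list[CriterionResult]
    clarifying_prompts: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")
        # Every criterion the intake left unaddressed becomes a question for the evaluator.
        for c in self.criteria:
            if c.met is None:
                self.clarifying_prompts.append(
                    f"Intake does not address '{c.criterion_id}'; please clarify."
                )


# Example using hypothetical criterion names.
assessment = AppropriatenessAssessment(
    recommendation=Recommendation.APPROPRIATE,
    confidence=0.82,
    criteria=[
        CriterionResult("primary_anxiety_or_ocd", True, "Reported symptoms fit the program's focus."),
        CriterionResult("acute_safety_risk", None, "Not covered in the intake form."),
    ],
)
print(assessment.clarifying_prompts)
```

Validating at construction time means a malformed output (say, a confidence of 1.5) fails loudly before it ever reaches the evaluator.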

In most cases, the evaluator and the AI agree. This alone is helpful because it reinforces consistency and gives the team a shared point of reference. But the most useful cases are those in which there is disagreement.

Disagreement improves the system

When the evaluator and the AI disagree, the case can trigger a structured escalation: it passes to a second AI layer, a supervising agent that provides another structured perspective, and to a human supervisor who makes the final decision. The result is a layered decision that draws on the evaluator’s original judgment, the initial AI output, the supervising agent’s perspective, and the supervisor’s own review.
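
The escalation flow can be sketched as a small function, assuming decisions are simple labels and that the supervising agent and the human supervisor are supplied as callables (hypothetical stand-ins for a model call and a review queue). The property it encodes is that a human always owns the recorded decision, and the second AI perspective is requested only when the evaluator and the first AI disagree.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LayeredDecision:
    evaluator: str                 # evaluator's original determination
    ai: str                        # initial AI recommendation
    supervising_ai: Optional[str]  # second AI perspective; only produced on disagreement
    final: str                     # the decision that is actually recorded
    decided_by: str                # "evaluator" or "supervisor": always a human
    escalated: bool


def decide(
    evaluator: str,
    ai: str,
    supervising_agent: Callable[[str, str], str],
    human_supervisor: Callable[[str, str, str], str],
) -> LayeredDecision:
    if evaluator == ai:
        # Agreement: the evaluator's determination stands, reinforced by the AI.
        return LayeredDecision(evaluator, ai, None, evaluator, "evaluator", False)
    # Disagreement: ask the supervising agent for a second structured perspective...
    second_opinion = supervising_agent(evaluator, ai)
    # ...then the human supervisor reviews all three inputs and makes the final call.
    final = human_supervisor(evaluator, ai, second_opinion)
    return LayeredDecision(evaluator, ai, second_opinion, final, "supervisor", True)


# Stand-in callables for illustration only.
agree = decide(
    "appropriate",
    "appropriate",
    supervising_agent=lambda ev, ai: "unused",
    human_supervisor=lambda ev, ai, second: "unused",
)
split = decide(
    "appropriate",
    "not_appropriate",
    supervising_agent=lambda ev, ai: "appropriate",
    human_supervisor=lambda ev, ai, second: ev,  # here the supervisor sides with the evaluator
)
print(agree)
print(split)
```

Keeping every input on the `LayeredDecision` record, rather than only the final label, is what makes later analysis of disagreements possible.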

Over time, these disagreements reveal patterns in how criteria are applied, how the AI interprets cases, and where the underlying rules need refinement. Each disagreement becomes an opportunity to calibrate doctors, refine the AI, and clarify the rules themselves.
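
One way to surface these patterns is to aggregate reviewed cases periodically. The sketch below assumes each case records the evaluator’s label, the AI’s label, the final decision, and a hypothetical `disputed_criterion` field naming the criterion the supervisor identified as the crux of the disagreement. Cases where the supervisor sided with the AI hint at clinician calibration topics; cases where the supervisor sided with the evaluator hint at where the AI needs refinement. The sample data is made up.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReviewedCase:
    evaluator: str
    ai: str
    final: str
    disputed_criterion: Optional[str]  # hypothetical: criterion the supervisor cited as the crux


def disagreement_report(cases: list[ReviewedCase]) -> dict:
    disagreements = [c for c in cases if c.evaluator != c.ai]
    return {
        "agreement_rate": 1 - len(disagreements) / len(cases) if cases else float("nan"),
        # Criteria that keep surfacing in disputes are candidates for clearer rules.
        "top_disputed_criteria": Counter(
            c.disputed_criterion for c in disagreements if c.disputed_criterion
        ).most_common(3),
        # Supervisor sided with the AI: a possible clinician calibration topic.
        "final_matched_ai": sum(c.final == c.ai for c in disagreements),
        # Supervisor sided with the evaluator: a possible AI refinement topic.
        "final_matched_evaluator": sum(c.final == c.evaluator for c in disagreements),
    }


# Made-up cases for illustration.
cases = [
    ReviewedCase("appropriate", "appropriate", "appropriate", None),
    ReviewedCase("appropriate", "not_appropriate", "not_appropriate", "acute_safety_risk"),
    ReviewedCase("not_appropriate", "appropriate", "not_appropriate", "acute_safety_risk"),
    ReviewedCase("appropriate", "not_appropriate", "appropriate", "level_of_care"),
]
report = disagreement_report(cases)
print(report)
```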

Be deliberate about orchestration and where humans sit

The human-in-the-loop approach is fundamental to responsible patient care in digital behavioral health, and the term itself has become a standard way of signaling safety. The key question is where the human sits in the loop, because not every part of a workflow requires the same level of human involvement. For example, data organization and question generation can be strongly supported by AI, since these capabilities are increasingly grounded in domain-specific clinical patterns and structured data. But for decisions that affect access to care or treatment planning, especially in ambiguous or higher-risk situations, humans remain the final decision makers, while actively partnering with AI to challenge assumptions, expose blind spots, and sharpen their reasoning in real time.

The design choices behind this orchestration are as important as the technology itself. Clinical and technical leadership should work closely together to define in advance which decisions belong to AI, which belong to humans, and which require both working in sequence. This is particularly important when the system learns from real-world clinical input: the goal is not just to use the data, but to continually refine how it is applied to real-world decisions. These choices should also be revisited periodically as the system matures, because a robust orchestration model evolves as it learns.
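
Defining ownership in advance can be as simple as a reviewed, version-controlled table. The decision types and assignments below are hypothetical; the sketch only illustrates the three categories described above (AI, human, or both in sequence) plus two defensive defaults that are our own assumptions: unmapped decisions fall to a human, and any AI-owned task flagged as high-risk gets a human sign-off.

```python
from __future__ import annotations

from enum import Enum


class Owner(Enum):
    AI = "ai"                        # AI acts on its own; humans review periodically
    HUMAN = "human"                  # a human decides with no AI step
    AI_THEN_HUMAN = "ai_then_human"  # AI structures and challenges, then a human decides


# Hypothetical decision types and assignments, for illustration only.
ORCHESTRATION = {
    "organize_intake_data": Owner.AI,
    "generate_clarifying_questions": Owner.AI,
    "appropriateness_determination": Owner.AI_THEN_HUMAN,
    "treatment_plan": Owner.AI_THEN_HUMAN,
    "crisis_response": Owner.HUMAN,
}


def owner_for(decision_type: str, high_risk: bool = False) -> Owner:
    # Anything not explicitly mapped defaults to a human, never silently to AI.
    owner = ORCHESTRATION.get(decision_type, Owner.HUMAN)
    if high_risk and owner is Owner.AI:
        # A high-risk flag adds a human sign-off to otherwise AI-owned work.
        return Owner.AI_THEN_HUMAN
    return owner


print(owner_for("generate_clarifying_questions"))
print(owner_for("generate_clarifying_questions", high_risk=True))
print(owner_for("some_new_decision_type"))
```

Because the whole policy lives in one mapping, each periodic review becomes a small change that clinical and technical leads can read and approve together.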

Why governance is important

None of this works without strong guardrails. That means keeping patient data in secure, compliant environments, maintaining clear human accountability, monitoring performance over time, and creating feedback loops that allow the system to improve safely. Done well, this builds trust over time.
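
For the "monitoring performance over time" piece, one common option (our choice here; the article does not name a metric) is to track agreement between evaluators and the AI in each period using Cohen’s kappa, which corrects raw agreement for the agreement expected by chance. The 0.6 alert threshold, the period labels, and the data are assumptions for illustration only.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: share of cases where both raters gave the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_e == 1.0:
        return float("nan")  # both raters used one identical label throughout: undefined
    return (p_o - p_e) / (1 - p_e)


def flag_periods(periods, threshold=0.6):
    """periods maps a period label to (evaluator_labels, ai_labels)."""
    flagged = []
    for period, (evaluator_labels, ai_labels) in periods.items():
        kappa = cohens_kappa(evaluator_labels, ai_labels)
        if kappa < threshold:  # NaN compares False, so undefined periods are not flagged
            flagged.append((period, round(kappa, 2)))
    return flagged


A, N = "appropriate", "not_appropriate"
# Made-up labels for two review periods.
periods = {
    "period_1": ([A, A, N, N], [A, A, N, N]),
    "period_2": ([A, A, N, N], [A, N, N, A]),
}
print(flag_periods(periods))
```

A drop in kappa is a prompt for human review, not a verdict on its own: it could reflect a change in the patient population, a criteria update, or drift in either the evaluators or the AI.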

What this unlocks

When a system like this works well, the effects show up across multiple dimensions. Clinically, decisions become more consistent, the rationale becomes clearer, and edge cases are handled more rigorously. Operationally, escalations become fewer, more focused, and richer in detail, and higher agreement between evaluators and the AI means clinicians can spend more time on the decisions that actually require them. And over time, the system keeps learning from disagreements and the patterns they reveal.

The next phase of AI in healthcare is about building systems in which clinicians, supervisors, and AI can each contribute, challenge one another, and improve the way decisions are made. The organizations that get this right will be the ones that are most intentional about where AI fits, where humans lead, and how the two refine each other over time.

Photo: Irina_Strelnikova, Getty Images

Parker Phillips is dedicated to leveraging technology and artificial intelligence to expand access to high-quality mental health care for young people. As CTO, he developed the company’s technology vision and led the adoption of AI to drive its innovative insurance-based virtual care model for anxiety and OCD. The platform is designed to enhance therapeutic impact and drive operational scale, demonstrating that a value-based model can deliver superior care and strong economics. Leveraging experience building teams and technology at Commure and Palantir, Phillips is focused on creating systems that address the critical need for accessible, evidence-based mental health treatment.

Dr. Kathryn (“Kat”) Boger is a certified child and adolescent psychologist dedicated to helping youth with anxiety and OCD through innovative, evidence-based care. She co-founded the McLean Anxiety Mastery Program (MAMP) at McLean Hospital, a nationally recognized intensive treatment program, and served as an assistant professor of psychology at Harvard Medical School. Dr. Boger has published peer-reviewed research, given national talks (including a TEDx talk), and trained hospitals, schools, and communities. In 2024, she was named one of the top 50 frontline heroes of digital health. She also co-founded Brisk Health to expand access to timely and effective care for youth and young adults.

This post appears via the MedCity Influencers program. Anyone can post their perspective on business and healthcare innovation to MedCity News through MedCity Influencers.