HEAL @ CHI 2025
Human-centered Evaluation and
Auditing of Language Models

Yokohama, Japan | April 26, 2025

Submission Deadline: February 24, 2025 (AoE) (extended from February 17, 2025)

Submission Site

Overview

HEAL is back for CHI 2025! This workshop aims to address the current "evaluation crisis" in LLM research and practice by bringing together HCI and AI researchers and practitioners to rethink LLM evaluation and auditing from a human-centered perspective. Recent advancements in large language models (LLMs) have significantly impacted numerous real-world applications, and will impact many more. However, these models also pose significant risks to individuals and society. To mitigate these issues and guide future model development, responsible evaluation and auditing of LLMs are essential.

The CHI 2025 Workshop on Human-centered Evaluation and Auditing of Language Models (HEAL@CHI'25) will explore topics around understanding stakeholders' needs and goals in evaluating and auditing LLMs, establishing human-centered evaluation and auditing methods, developing tools and resources to support these methods, building community, and fostering collaboration.

Special Theme - Mind the Context: For this year's HEAL, we introduce the theme of "mind the context" to encourage attendees to engage with specific contexts in LLM evaluation and auditing. This theme spans various topics: the usage contexts of LLMs (e.g., evaluating the capabilities and limitations of LLM applications in mental wellness care, or translation in high-stakes scenarios), the context of the evaluation/auditing itself (e.g., who is using LLM evaluation tools, and how should we design these tools with that context in mind?), and more. We purposefully leave "context" open for interpretation so as to encourage diversity in how participants conceptualize and operationalize this key concept in LLM evaluation and auditing.

Keynote Speakers

Morning Keynote Speaker

Dr. Su Lin Blodgett

Dr. Su Lin Blodgett is a senior researcher in the Fairness, Accountability, Transparency, and Ethics in AI (FATE) group at Microsoft Research Montréal. She is broadly interested in examining the social and ethical implications of natural language processing technologies. She develops approaches for anticipating, measuring, and mitigating harms arising from language technologies, focusing on the complexities of language and language technologies in their social contexts and on supporting NLP practitioners in their ethical work. She has also worked on using NLP approaches to examine language variation and change (computational sociolinguistics), for example, developing models to identify language variation on social media.

Afternoon Keynote Speaker

Dr. Gagan Bansal

Gagan Bansal is a researcher at Microsoft Research, where he is part of the AI Frontiers group and co-leads research on AutoGen, a framework for building multi-agent AI systems. His work lies at the intersection of Artificial Intelligence and Human-Computer Interaction, with a focus on making AI systems more capable, interactive, and useful to people. Before joining Microsoft Research in 2022, Gagan completed his Ph.D. in Computer Science at the University of Washington, advised by Dan Weld. At UW, he was part of the Lab for Human-AI Interaction, where he studied how AI systems can complement human decision-making. At Microsoft, Gagan has been a driving force behind several open-source agentic projects, including:

  • AutoGen, a widely adopted framework for multi-agent applications
  • AutoGen Studio, a low-code interface for creating agentic workflows
  • Magentic-One, a state-of-the-art multi-agent team for solving complex tasks
  • MarkItDown, a tool for converting large sets of files to Markdown for LLMs

Agenda

The primary goal of this one-day workshop is to bring together HCI and AI researchers from academia, industry, and non-profits to share their ongoing efforts around evaluating and auditing language models.

All times displayed in the program are in local time (Yokohama, Japan).

Key Information

Submission deadline: February 24, 2025 (AoE), extended from February 17, 2025

Notification of acceptance: March 27, 2025 (AoE), extended from March 17, 2025

Workshop date: April 26, 2025

Workshop location: Yokohama, Japan (Hybrid)

Contact: heal.workshop@gmail.com

Call for Participation

We welcome participants who work on topics related to supporting human-centered evaluation and auditing of language models. Interested participants will be asked to contribute a short paper to the workshop. Topics of interest include, but are not limited to:

Special Theme: Mind the Context. We invite authors to engage with specific contexts in LLM evaluation and auditing. This theme could involve various topics: the usage contexts of LLMs (e.g., evaluating the capabilities and limitations of LLM applications in mental wellness care, or translation in high-stakes scenarios), the context of the evaluation/auditing itself (e.g., who is using LLM evaluation tools, and how should we design these tools with that context in mind?), and more. The term "context" is left open for interpretation so as to encourage diversity in how this key concept is conceptualized and operationalized by workshop participants. Papers under this theme will be given a dedicated lightning talk session, as well as a special spotlight during the workshop's poster session.

Submission Format: 2–6 pages, ACM double-column, excluding references.

Submission Types: Position papers, full or in-progress empirical studies, literature reviews, system demos, method descriptions, or encores of previously published work. Submissions are non-archival.

Review Process: Double-blind. Papers will be selected based on the quality of the submission and the diversity of perspectives, to allow for a meaningful exchange of knowledge among a broad range of stakeholders.

Templates: [Word] [LaTeX] [Overleaf]


→ Submission Site

Organizers