[DL] CFP MultiClinAI (SMM4H-HeaRD/ACL2026): Multilingual Clinical Entity Annotation Projection and Extraction Shared task
Martin Krallinger
krallinger.martin at gmail.com
Mon Feb 16 08:48:12 CET 2026
Call for Participation: MultiClinAI Shared Task (SMM4H-HeaRD at ACL
2026)
Multilingual Clinical Entity Annotation Projection and Extraction
https://temu.bsc.es/MultiClinAI/
New multilingual resources and annotation guidelines are now available.
MultiClinAI is the first shared task focused on (1) the automatic creation
of comparable multilingual corpora and (2) the automatic detection of key
clinical concepts (diseases, symptoms, and procedures) in seven languages:
Spanish, English, Italian, Dutch, Romanian, and Czech. MultiClinAI will be
held as part of the #SMM4H-HeaRD Workshop at the ACL 2026 conference
(online).
Key information:
- Web: https://temu.bsc.es/MultiClinAI/
- Data: https://doi.org/10.5281/zenodo.18508039
- Annotation guidelines: https://zenodo.org/records/13151040
- Registration: https://temu.bsc.es/MultiClinAI/registration/
Motivation
There has been considerable progress in clinical language technology
solutions, resulting in a variety of highly relevant practical applications
to cope with the growing amount of biomedical and healthcare-related data
and unstructured content sources. In particular, the automatic extraction
of key clinical entities such as diseases, symptoms, and procedures is
extremely valuable for most clinical data analytics and predictive modeling
use cases. Nevertheless, very few high-quality annotated corpora, datasets,
and annotation guidelines are available for training or robustly evaluating
advanced NLP- or LLM-based clinical entity recognition systems, and existing
efforts are typically limited to a single language.
Thus, there is a clear need for more efficient strategies that not only
generate annotated datasets in multiple languages but also ensure that the
datasets follow the same underlying annotation criteria, so that both the
labeled datasets and the entity extraction systems built on them are
comparable across languages. Multilinguality is clinically
important because healthcare systems document patient information in local
languages. However, most clinical NLP tools are developed primarily for
English, limiting their applicability in non-English-speaking contexts.
Developing multilingual models helps reduce linguistic bias and improves
the global applicability of clinical language technologies. Such models
enable more equitable AI deployment across different regions and healthcare
systems.
Multilingual clinical NLP has numerous important use cases, including:
- In international clinical trials, it can be used to extract structured
data from trial sites across different countries and to ensure consistent
outcome definitions across languages.
- For cohort identification, it enables the identification of eligible
patients from unstructured electronic health records (EHRs) and the
extraction of phenotypes for observational studies.
- In disease surveillance, multilingual systems can help detect rare
diseases or emerging health trends and identify post-marketing drug safety
signals.
- For epidemic monitoring, they support the early detection of infectious
disease patterns and the analysis of multilingual emergency department
notes.
- In cardiovascular and chronic disease monitoring, such systems can track
symptom progression across large multilingual datasets and study treatment
adherence patterns.
- They also contribute to data standardization by converting free text into
structured, interoperable datasets and enabling the secondary use of EHR
data.
- Finally, multilingual knowledge graphs can link extracted entities across
languages and support federated learning across institutions.
In this context, the MultiClinAI (Multilingual Clinical Entity Annotation
Projection and Extraction) shared task addresses the creation and
evaluation of comparable multilingual clinical resources across seven
languages, focusing on three key entity types: diseases, symptoms, and
procedures.
- MultiClinNER subtask: multilingual clinical named entity recognition
across expert-annotated gold-standard datasets.
- MultiClinCorpus subtask: automatic generation of comparable multilingual
clinical corpora through annotation projection techniques.
This setup will enable a robust benchmarking scenario for multilingual
clinical NLP approaches.
Schedule
- MultiClinAI Shared Task – training set release (February 6, 2026)
- MultiClinNER test set release (March 18, 2026)
- MultiClinCorpus test set release (March 25, 2026)
- MultiClinNER test set prediction submissions (March 27, 2026)
- MultiClinCorpus test set prediction submissions (April 9, 2026)
- Result / evaluation returned to teams (April 14, 2026)
- Participant proceedings due (April 24, 2026)
- Notification of acceptance (May 15, 2026)
- Camera-ready papers due (May 25, 2026)
- ACL Proceedings due (hard deadline) (June 1, 2026)
- Workshop (online) (July 2–3, 2026)
Publications and SMM4H-HeaRD workshop at ACL 2026
Teams participating in MultiClinAI will be invited to contribute a system
description paper for the ACL 2026 Working Notes proceedings and to give a
short presentation of their approach at the ACL 2026 workshop (online).
Main Organizers
- Salvador Lima-López, Barcelona Supercomputing Center (BSC), Spain.
- Fernando Gallego-Donoso, Barcelona Supercomputing Center (BSC), Spain.
- Jan Rodríguez-Miret, Barcelona Supercomputing Center (BSC), Spain.
- Judith Rosell, Barcelona Supercomputing Center (BSC), Spain.
- Martin Krallinger, Barcelona Supercomputing Center (BSC), Spain.