Pathfinder MineClever: process mapping and prospects for implementation of a clinical and epidemiological data mining tool.

 

The Host Study

MineClever (Clinical and Epidemiological Information Miner for Emerging Respiratory Viral Diseases) is an artificial intelligence tool developed in collaboration between researchers from Fiocruz Pernambuco, the Scientific Computing Programme (PROCC) of Fiocruz Rio de Janeiro, Johns Hopkins Hospital and the Fiocruz team at The Global Health Network Latin America and the Caribbean (TGHN LAC).

Developed between June 2024 and September 2025, the project seeks to mitigate a critical barrier identified during the COVID-19 pandemic by the NeuroCOVID study (2020-2023): the lack of access to structured clinical data in electronic health records, which failed to meet not only care needs but also the scientific community's need for evidence to guide rapid decision-making for pandemic control. The MineClever tool was therefore designed to extract and analyse primary clinical and laboratory data on acute viral respiratory syndromes from electronic health records of Brazil's Unified Health System (SUS). The project was funded by the Fiocruz Innovation Programme - Public Health Emergencies (No. 1/2024 3rd Call).

How the Tool was Built

The project began by obtaining 315 electronic health records of people hospitalised with COVID-19 between April and June 2020 at two reference units in Pernambuco. Researchers manually examined 104 health records to establish standardised procedures and to identify the terminological variability in clinical descriptions. This resulted in the documentation of 287 different words, acronyms or abbreviations: 60% referring to respiratory manifestations (with "dyspnoea" and "mechanical ventilation" showing the greatest variability), 20% to neurological symptoms and 12% to general symptoms.
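To illustrate how a lexicon of term variants can be used to recognise a clinical concept in free text, the following is a minimal sketch in Python. It is not the MineClever implementation: the `VARIANTS` dictionary, the function names and the English surface forms are all hypothetical and are not taken from the 287 terms documented by the team.

```python
import re
import unicodedata

# Hypothetical variant lexicon, for illustration only. These surface forms are
# assumptions and do NOT come from the terms documented in the host study.
VARIANTS = {
    "dyspnoea": ["dyspnoea", "dyspnea", "shortness of breath", "sob"],
    "mechanical ventilation": ["mechanical ventilation", "mv", "imv", "intubated"],
}


def normalise(text: str) -> str:
    """Lower-case the text, strip accents and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", no_accents.lower()).strip()


def build_patterns(variants):
    """Compile one whole-word regular expression per canonical concept."""
    patterns = {}
    for concept, forms in variants.items():
        # Longest forms first, so multi-word variants are tried before abbreviations.
        alternatives = sorted(
            (re.escape(normalise(form)) for form in forms), key=len, reverse=True
        )
        patterns[concept] = re.compile(r"\b(?:" + "|".join(alternatives) + r")\b")
    return patterns


PATTERNS = build_patterns(VARIANTS)


def find_concepts(note: str) -> set:
    """Return the canonical concepts whose variants appear in a clinical note."""
    clean = normalise(note)
    return {concept for concept, pattern in PATTERNS.items() if pattern.search(clean)}
```

For example, `find_concepts("Pt with SOB, on  IMV since day 2")` returns both concepts. The sketch performs dictionary matching only: it does not handle negation ("no dyspnoea") or misspellings, which a production pipeline would need to address.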

Based on this terminological structure, a Python programme was developed around a Natural Language Processing (NLP) model with two specialised pipelines (clinical and laboratory), using Optical Character Recognition (OCR) to extract text from health records in PDF format. MineClever V1.0 was tested on 21 health records and showed an overall human-machine concordance of 68% (Cohen's Kappa 0.452); it is currently in the phase of identifying inconsistencies and making improvements.
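Cohen's Kappa corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e the proportion expected from each rater's label frequencies. The sketch below shows this calculation in Python for two raters (for instance, a human reviewer and the tool). The function names and the toy labels are hypothetical and are not the study's data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Return (observed agreement, Cohen's kappa) for two equal-length label lists."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters give the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of the raters' label shares.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return p_o, (p_o - p_e) / (1 - p_e)


def implied_chance_agreement(p_o, kappa):
    """Rearrange the kappa formula to recover p_e from p_o and kappa."""
    return (p_o - kappa) / (1 - kappa)


# Toy labels (hypothetical): P = finding present, A = finding absent.
human = list("PPPPAAAAAA")
machine = list("PPPAPPAAAA")
p_o, kappa = cohen_kappa(human, machine)
print(f"observed agreement = {p_o:.2f}, kappa = {kappa:.2f}")
```

If the reported 68% concordance is taken as the observed agreement behind the reported Kappa of 0.452 (an assumption, since the source does not state how the two figures relate), `implied_chance_agreement(0.68, 0.452)` gives a chance agreement of roughly 0.42.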

The programme received substantial theoretical and practical contributions from professors of the Postgraduate Programme in Health Technology at the Pontifical Catholic University of Paraná (PUC PR), resulting in MineClever V1.1, which is implemented on open source platforms to ensure security and confidentiality.

Figure: MineClever Mental Map.

Pathfinder

This project was supported by the Wellcome Trust (Grant 226688/Z/22/Z).

The development of MineClever has highlighted not only the potential of artificial intelligence for data mining in the healthcare sector, but also the complexity involved in creating, validating, and potentially implementing digital tools. In this context, the MineClever study uses the Pathfinder methodology to document the tool’s development process, including the challenges, barriers, and solutions encountered along the way. By systematising this experience, Pathfinder aims to contribute to methodological reproducibility and support future applications of the tool in other medical conditions and contexts.

General objective

To map and analyse the development process of the MineClever tool, as well as to investigate opportunities to improve its methodological framework, usability, and applicability for healthcare professionals, in addition to exploring perspectives for its implementation within Brazil’s Unified Health System (SUS).

Specific objectives

  • Retrospective Mapping and Documentation
    To retrospectively map the processes undertaken in the host study and document the development of the tool, including its barriers and solutions, to contribute to the reproducibility of the method in other health conditions.
  • Support for Prospective Development
    To support the team’s prospective work plan in refining the tool, including validation processes and the development of its front-end interface.
  • Development of Support and Guidance Materials
    To understand the usability of the tool with a view to developing support and guidance materials for the use of MineClever.
  • Feasibility of Implementation within the SUS
    To explore opportunities and challenges relating to the implementation of the tool within the SUS, taking into account its ethical, legal, and market-potential aspects.

Hybrid Learning Sessions – Building the Pathfinder MineClever: from concept to practice, January to February 2026

The sessions were held as part of the activities of the Pathfinder group within the Fiocruz team at TGHN LAC, with the aim of developing conceptual understanding and beginning to apply the Pathfinder methodology to the main MineClever study, through a structured process of collaborative learning.