
WG-CE Workshop
Eva Weicken
04:53
Welcome to the first workshop of the Working Group on Clinical Evaluation!
Eva Weicken
05:22
This is the link to the updated agenda:
Eva Weicken
05:29
https://docs.google.com/document/d/1KLME9WQTyMsMr4e__n1ZyYZumHXO0wVtBg41CIBcotQ/edit
Irv Loh
05:47
Good morning, afternoon, or evening, all!
Matthew Fenech
12:22
Would you be able to repost the agenda, please? Many thanks!
Eva Weicken
14:33
https://docs.google.com/document/d/1KLME9WQTyMsMr4e__n1ZyYZumHXO0wVtBg41CIBcotQ/edit
Matthew Fenech
14:49
Thank you!
Irv Loh
15:23
Irv Loh MD
Michał Kurtys
15:51
Michal Kurtys, MD, Data Scientist and Programmer @ Infermedica
Jesse Ehrenfeld (AMA)
16:19
Jesse Ehrenfeld MD, Anesthesiologist / Informatician & Board of the American Medical Association
Audrey Menezes
16:22
Audrey Menezes - GP, Clinical Lead at Your.MD
Anastacia Simonchik
16:30
Anastacia Simonchik, Product Manager at Visiba Group AB
Agata Piekut
16:37
Hi, I'm Agata - I run digital skills programs and research on patient engagement (Poland/NGO sector)
Kassandra Karpathakis
16:54
Hello! I'm Kassandra Karpathakis, Head of AI Policy in the UK NHS AI Lab. Background in product for health and evaluation of health tech. Interested in how to teach and support tech and design teams to evaluate health technology.
LUÍS LAPÃO
16:56
Luís Lapão. Professor of Health Information Systems, Universidade Nova de Lisboa. Coordinator CPLP Telemedicine Network.
Arnav Mahajan
16:57
Hi there! My name is Arnav Mahajan, I am a 4th year medical student studying in Ireland. I have an interest in the intersection of global surgery and digital health
Kyle Thomson
17:32
Kyle Thomson - Senior Attorney for Digital Health at the American Medical Association
jane elizabeth carolan
17:33
Hi all, I'm a Health Economist at University College London, based at a medical software company in Oxford. I have a particular interest in better understanding how comparative efficacy will be evaluated, as this pertains to my work. I also have an interest in AI/ML Software as a Medical Device (SaMD) - both fixed and continuous learning algorithms. Thank you for the opportunity to be here.
Daniel Fürstenau
17:42
Hi. I am Daniel Fürstenau. I am an Assistant Professor at the Copenhagen Business School Department of Digitalization, and also an Affiliated Professor at the Einstein Center Digital Future in Berlin and the FU Berlin. I am currently running work with my colleagues from Charité and the Einstein Center, as well as an exploratory project on AI in nursing care. I am looking for potential collaborations and thank Johannes Starlinger for making this group known to me.
Robert Gaudin
17:47
Hi everyone, I am Robert Gaudin, Medical Doctor and Dentist, founder of startups in medical imaging, and a resident at the Charité OMFS Department.
Nijuscha Gruhn
17:49
Nijuscha Gruhn @DermaDigital, currently planning/outlining a clinical trial for our NLP&MD app
Pradeep Balachandran
17:56
Hello everybody, I am Pradeep Balachandran, working as a technical consultant in the domain of digital health. I am a member of the DAISAM WG within the ITU AI4H FG. Looking forward to being a part of WG-CE and contributing to its objectives.
Irv Loh
17:57
Irv: cardiologist, CMO/cofounder of Infermedica, clinical research, practice 2 days a week, work with Shubs on AI4H TG on symptom assessment, chair tech & innovation committee for Calif Chapter of American College of Cardiology
Jana Fehr
18:00
Jana Fehr, PhD candidate at ML&Digital Health group at HPI (Potsdam). Also member of FGAI4H DAISAM group. Evaluated CAD4TB, an algorithm for Tuberculosis detection and triaging. Happy to learn more about clinical validation.
Stephen Gilbert
18:15
Stephen Gilbert, Director of Clinical Evaluation, Ada, Berlin.
Matthew Fenech
18:15
Hi everyone, I’m Matt Fenech, Medical Safety Lead at Ada Health. My background is in hospital medicine, but over the last 4 years I’ve been involved in policy development/regulatory issues around the use of AI in healthcare. I am also trained in Computational Cognitive Science.
Martin Cansdale
18:51
Martin Cansdale, Principal Data Scientist at Your.MD, and involved in the Symptom Assessment Topic Group.
Xiao Liu
20:13
Hi everyone - Xiao Liu, ophthalmologist in UK, interested in all parts of clinical evaluation, reporting of evidence, safety and post market surveillance. I co-led SPIRIT-AI and CONSORT-AI (to be discussed today)
Mathan Karuppiah
20:15
Hi, this is Dr. Mathan, MPH Scholar at ICMR India. Co-founder of a digital health startup. Interested in the role of AI in real-world healthcare problems.
Lauren
20:18
Lauren Willgeroth (Berlin, Germany) - I’m in Medical Device Consulting, mostly for start-ups. Looking to understand clinical context and regulatory implications of ML products
Jum'atil Fajar
21:21
Jum'atil Fajar, Head of Medical Care Department in District General Hospital in Indonesia. Learning about AI and COVID-19.
LUÍS LAPÃO
22:51
AI: the first step is to promote the quality of health data. Otherwise we will keep working with very small samples.
Steffen Vogler
26:09
Hi everyone, I’m Steffen Vogler, a neurobiologist and computer vision expert working at Bayer as a Senior Data Scientist in “Decision Science for Radiology”. Also participating in AI4H “Assessment Platform” with Marc.
Xiao Liu
34:50
SPIRIT-AI and CONSORT-AI initiative website: www.clinical-trials.ai
SPIRIT-AI paper: https://doi.org/10.1038/s41591-020-1037-7
CONSORT-AI paper: https://doi.org/10.1038/s41591-020-1034-x
STARD-AI in development (for studies of diagnostic accuracy): www.nature.com/articles/s41591-020-0941-1
TRIPOD-AI in development (for AI prediction and prognostic models): https://doi.org/10.1016/S0140-6736(19)30037-6
Xiao Liu
01:10:00
apologies, I have to drop off. Really looking forward to continuing discussions with this group! thanks
Shubhanan Upadhyay
01:10:14
Thank you!
Naomi Lee
01:10:17
Thank you Xiao
Eva Weicken
01:11:49
Thank you so much for joining, Xiao
Shubhanan Upadhyay
01:17:43
Eva - are you able to share the presentation from your screen?
Shubhanan Upadhyay
01:17:47
I seem to have a block.
Monique Kuglitsch
01:23:19
It’s wonderful to see so many people interested in clinical evaluation for AI for health! We would love to have you share your experience and expertise with the ITU/WHO Focus Group on AI for Health. To become a member, please follow the instructions outlined in our onboarding document: https://itu.int/en/ITU-T/focusgroups/ai4h/Documents/ITU_WHO_AI4H_Onboarding.pdf
Monique Kuglitsch
01:26:00
(**Note, even if you are already involved with a working group—e.g., on regulatory considerations—you still need to follow these steps to become a Focus Group member and to receive news/updates, access documents, or join meetings)
Naomi Lee
01:27:24
Thanks Monique!
Monique Kuglitsch
01:31:55
🤓
Naomi Lee
01:35:38
Please feel free to use the chat for questions to our excellent panel!
Oommen John
01:39:05
RCTs typically look at primary and secondary endpoints from a very narrow clinical-outcomes perspective; wouldn't clinical evaluation of AI using the traditional approach limit the scope of its potential? What are the gold standards against which the outcomes of AI-based approaches will be measured?
Daniel Fürstenau
01:39:22
+1
Daniel Fürstenau
01:40:00
Rethink the "clinical"
Luis Oala
01:41:29
+1
Jesse Ehrenfeld (AMA)
01:42:19
Don't forget the patient too!
Nijuscha Gruhn
01:42:22
I wondered what would be an adequate placebo for AI systems. Would developers need to develop a second system on "random data"?
Stephen Gilbert
01:42:41
Question for the panel: I propose that the current published and in-progress guidelines, as described by Dr. Xiaoxuan Liu, are enough for now. They need time to bed in, but new guidelines on top are not needed. Would this group not be best contributing to the international 'elephant in the room' problem: Clinical Evaluation (not only studies) in light of the algorithm/software change problem, as identified by Thomas Wiegand in his introduction? We should research, contribute to, and lobby for a good solution to this problem, so that the field can advance in a positive, clinical-evidence-supported way.
Luis Oala
01:42:58
What about the potential and limits of, e.g., in-silico clinical studies, or the use of statistical guarantees of ML models for risk analysis?
Naomi Lee
01:43:06
Would you want a placebo control or a 'standard of care' control?
Irv Loh
01:43:30
The only way to know your AI is usable is to study it retrospectively against the clinical "gold standards", i.e., known and validated diagnoses. Look at how predictively accurate your AI was in making that assessment. This has been a pivotal discussion in our AI for Symptom Assessment Topic Group, as Shubs will attest! We don't need to make this process more complicated than necessary, since the technology itself is complex enough.
Matthew Fenech
01:45:07
+1 to Naomi's point: need to prove superiority (or at least non-inferiority with some other benefit) over the current standard of care
Naomi Lee
01:45:17
Are there use cases where randomised evidence is appropriate and others where observational data is enough? Or perhaps others where retrospective studies of accuracy are ok? How would we decide what those are?
Stephanie Kuku
01:46:39
Retrospective data can never be enough - the performance has to be tested prospectively on real world data in the intended clinical pathway to truly test performance and reveal unintended consequences
LUÍS LAPÃO
01:47:00
Gold-standard comparisons are fundamental. But the use of AI in the clinical setting is mostly an organizational issue. Will it imply a change in practice? A change in the relationship with your patients?
Irv Loh
01:49:05
The keyword is "clinical". There are too many unknowns that cannot be quantified or even identified. Artificial models thus do not reflect reality. So we need to train AI systems on retrospectively validated data, but test them prospectively. This is a global strategy for almost all scenarios, not just bespoke applications.
Arnav Mahajan
01:49:21
Hi Stephanie! I am from Thailand and have had conversations with some physicians in the country who had interacted with the Google AI tool from a distance. A few of them suggested that a process of reverse innovation must occur, in which all digital tools/AI are developed in low-resource settings first, and only after defining success in these settings are they scaled up to wider settings. What are your thoughts on this idea? I know Johannes talked a little about the "Who" and mentioned that a data set created in India may not be applicable to scenarios in Germany. But the reverse is true as well: a data set created in a high-resource setting will not be very applicable to potentially rural settings in India. Is there a need to prioritize which countries/resource settings we collect our data in?
Matthew Fenech
01:50:50
It would also be good to get the panel’s perspective on how we deal with the ‘low evidence’ situation, such as that currently being experienced with the COVID-19 pandemic (especially at the beginning of the pandemic earlier this year). There’s been a lot written about how AI has ‘disappointed’ in terms of having a meaningful impact on the management of the pandemic (and there’s a lot to unpack there). But two questions emerge in my view: 1) how do we do better next time? 2) how do we gain trust for interventions which necessarily need to be released rapidly and perhaps with low clinical evidence? (My argument has been for extremely robust/rapid/responsive PMS, but I’d be keen to hear other views)
Irv Loh
01:52:53
RCTs are tools to assess interventions, e.g., drugs or devices (though they cannot be blinded). AI is not an intervention but a process, so it is more difficult to assess via RCTs. AI is adjunctive to clinical work, not a competitor or alternative.
Matthew Fenech
01:59:19
This working group should also remain mindful of the greatest incentive for doing anything there has ever been: money (am I being slightly too cynical? :) ) What are the requirements set by the DiGA in Germany? How did the CMS determine reimbursability of Viz.ai’s tool? We should ensure we learn from the experience of decisions taken already.
Stephanie Kuku
02:02:44
No Matthew - I am just as cynical as you. Hence the crux of clinical evaluation for AI SaMD should be DO NO HARM and CAN WE AFFORD THIS BASED ON OUR HEALTHCARE PRIORITIES?
LUÍS LAPÃO
02:03:16
The issue of the validity of AI is similar to the use of any Electronic Health Record. In most places, EHR usage is kind of wild!
Shubhanan Upadhyay
02:03:40
An overarching question is this: prospective clinical studies are high-trust but slow. Sometimes, by the time a study is published, that version is already out of date. On the other side, internal benchmarking by AI developers occurs quickly but is not trusted. The aim of the Focus Group is to have a trusted, standardised benchmarking framework on independent, representative data that can give contextual metrics and answers to decision makers.
Stephanie Kuku
02:03:43
The Viz.ai tool was based on time savings from a prospective, well-designed study in the intended clinical pathway - n < 200
Shubhanan Upadhyay
02:04:07
The question is: is there a place in the clinical evaluation spectrum for this?
Robert Gaudin
02:04:32
Is there actually a "gold standard" for training an algorithm if there is no possibility to get ground-truth data? Triple or multiple annotations?
Naomi Lee
02:04:47
Yes Shubs. What exists between internal benchmarking and an RCT!
LUÍS LAPÃO
02:06:39
I am currently working on the translation of ISO/TS 13131 on telemedicine. Most of these issues are already there.
Luis Oala
02:07:26
@Robert: I think the thriving discourse on missing ground truths in AI evaluation brings to light an interesting issue in healthcare overall, also non-AI: low inter-annotator agreement
Thomas Wiegand
02:08:24
Dear all, great meeting! I have to attend another meeting. Best, Thomas
Luis Oala
02:08:37
e.g. histopathology, diabetic retinopathy
Naomi Lee
02:08:38
Thanks Thomas!
Eva Weicken
02:08:58
Thanks, Thomas for joining!
Shubhanan Upadhyay
02:09:43
And slow!
Naomi Lee
02:10:01
And in selected populations!
Kassandra Karpathakis
02:10:15
RCT use needs to be tempered with the evidence requirements nationally and locally
Oommen John
02:10:48
Also, gold-standard evidence out there might not be free of intrinsic biases... a lot of evidence has been generated by "interested parties"... this is why we see oscillating evidence.
Oommen John
02:14:05
Fluids in sepsis, tight versus loose glycemic control and its impact on long-term outcomes... how does one benchmark an algorithm against these shifting goalposts?
Matthew Fenech
02:15:45
I’m not sure we can be so prescriptive that AI is ‘only’ a process. Besides Stephanie’s examples, there are examples of therapies provided by AI, e.g. chatbots in mental health. Couching AI in terms of healthcare interventions leads naturally to the required discussions we need to have with patients/users about risk-benefit balances. There’s an opportunity here to incorporate this work into the wider efforts to empower patients/users.
Oommen John
02:17:58
Or affective computing that helps identify emotions to triage those at risk for depression (or mental health disorders): how does that benchmark against the PHQ9/12?
Jesse Ehrenfeld (AMA)
02:18:55
Remember, most AI currently deployed in healthcare is focused on improving healthcare operations. Interesting that we're now seeing products like Viz.ai that are blending clinical interventions with healthcare operations to show a benefit.
Shubhanan Upadhyay
02:18:57
Basically we want to continue this conversation in the actual deliverable so please contribute!
Luis Oala
02:23:28
Can you put the spreadsheet link in the chat? (:
Luis Oala
02:23:38
I tried clicking my Zoom screen frantically
Luis Oala
02:23:41
:P
Johannes Starlinger
02:25:54
+1 Luis
Luis Oala
02:29:02
A question to all, and the co-chairs in particular: would it be possible to have a joint subgroup where members of this new WG work together with members of WG-DAISAM to evaluate actual use cases? We would love to have this type of input!
Ferath Kherif
02:35:06
Good idea Luis !
Eva Weicken
02:35:15
That’s a great point, Luis!
Shubhanan Upadhyay
02:36:32
Go for it, Matt!
Naomi Lee
02:37:01
Let me know here if you would like to comment.
Arnav Mahajan
02:37:07
Considerations for LMICs?
Naomi Lee
02:37:14
Or just shout out!
Eva Weicken
02:37:27
This is the link for assignment to the subgroups: https://docs.google.com/spreadsheets/d/1GQiJNG1Dg8QKUjQaRGhuekdopKL3o7jhSXdIllE7NmM/edit#gid=0
Eva Weicken
02:39:13
Here is the link to DEL07.4 (ITU registration required): https://extranet.itu.int/sites/itu-t/focusgroups/ai4h/wg/_layouts/15/WopiFrame.aspx?sourcedoc=%7BDC981460-DD88-4C71-8AD7-1A000D5A9DEB%7D&file=DEL07_4_meeting_J.docx&action=default&CT=1602607095869&OR=DocLibClassicUI
Naomi Lee
02:39:13
Thanks Eva!
Johannes Starlinger
02:40:04
Make sure to check out the FDA's "Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) - Discussion Paper and Request for Feedback" at https://www.fda.gov/media/122535/download
Shubhanan Upadhyay
02:40:14
Arnav - the global perspective and LMICs are absolutely key; this is in our list of considerations. Making sure that the independent test datasets are representative of different nuances and contexts means we need input from all settings, especially LMICs. There are even context changes within a country, or even within a city.
Johannes Starlinger
02:40:25
... especially regarding Post Deployment Clinical Evaluation
Agata Piekut
02:42:59
@Eva, what's the deadline to fill in the spreadsheet? My specialization could fit both subgroups, and I need a moment to consider where I can help most.
Arnav Mahajan
02:43:50
@Shubhanan, that’s great to hear thanks! You’re completely right on the fact that there are tons of context changes even within a city!
Eva Weicken
02:44:30
Sure, that’s great. There is no particular deadline
Agata Piekut
02:44:41
Thank you
Eva Weicken
02:44:58
Thank you for contributing
Shubhanan Upadhyay
02:45:40
++ to Stephen and Naomi
Shubhanan Upadhyay
02:46:14
@Arnav - it will be great to have you contribute to make sure we follow through with this
Eva Weicken
02:50:55
https://docs.google.com/spreadsheets/d/1GQiJNG1Dg8QKUjQaRGhuekdopKL3o7jhSXdIllE7NmM/edit#gid=0
Eva Weicken
02:51:05
The link for assignment
Shubhanan Upadhyay
02:51:17
Thank you everyone for your engagement and input
Ferath Kherif
02:51:30
thanks
Jesse Ehrenfeld (AMA)
02:51:33
Terrific session. Nice to "meet" everyone! :-)
Audrey Menezes
02:51:35
Thank you all - this has been wonderful
Naomi Lee
02:51:38
Thank you all!
Arnav Mahajan
02:51:39
Thank you!
Johannes Starlinger
02:51:41
Thanks to everyone!
Agata Piekut
02:51:46
Thank you!
Stephanie Kuku
02:51:48
Thanks all
LUÍS LAPÃO
02:51:49
Thanks. Great meeting!
Kassandra Karpathakis
02:51:52
Thank you for the invitation
Oommen John
02:52:06
thank you!
Shan Xu
02:52:23
Congratulations on the progress and the wonderful discussion, thank you all.
ALIXANDRO WERNECK
02:52:41
Thanks for all!
Luis Oala
02:52:46
thanks shubs, naomi, eva!
Ines Sousa
02:52:49
Thanks
Eva Weicken
02:52:49
Thank you all
Anastacia Simonchik
02:52:50
Guys, please send out the links and the recording! Thank you for the great meeting! Bye :)