I am amused that people are slow to realise that large language models (ChatGPT etc) do not understand what they are saying, or that they make things up — that is, they hallucinate. Performance on “surface layer” testing does not equate to competence. Anybody who has taught medical students knows that humans are quite capable of exhibiting the same behaviour. It was one of the values of the old fashioned viva. You could demonstrate the large gulf between understanding (sense)on the one hand, and rote — and fluent rote at that — simulation on the other (garbage).
The medical educationalists, obsessed as they are with statistical reliability, never realised that the viva’s main function was for the benefit of teachers rather than learners. It is called feedback.
The medical student as ChatGPT