Extracting attention values

Julien Heitmann
Alban Glangetas
Jonathan Doenz
Juliane Dervaux
Deeksha M. Shama
Daniel Hinjos Garcia
Mohamed Rida Benissa
Aymeric Cantais
Alexandre Perez
Daniel Müller
Tatjana Chavdarova
Isabelle Ruchonnet-Metrailler
Johan N. Siebert
Laurence Lacroix
Martin Jaggi
Alain Gervaix
Mary-Anne Hartley

DeepBreath is interpretable by design. At the level of the CNN model, we can plot the segment-level predictions $\{p(x_i)\}_{i=1}^{T}$ and the attention values $\{g(x_i)\}_{i=1}^{T}$. These values identify the parts of the recording that contribute most to the prediction along the time dimension. By comparing them to segment-level annotations made by medical doctors (identifying inspiration and expiration), we can visualize how the model interprets disease over the breath cycle, which allows clinicians to interrogate the model’s alignment with physiology.

Every CNN audio classifier passes over a recording and computes segment-level outputs, then aggregates those intermediate outputs into a single clip-level prediction. The duration captured by a single segment is determined by the size of the receptive field of the CNN architecture. The receptive field of the final convolutional layer has a width of 78, which corresponds to a duration of 1296 ms. For every segment $x_i$, the CNN model computes an attention value $g(x_i)$ and a prediction $p(x_i)$. The attention value $g(x_i)$ determines how much weight the segment prediction $p(x_i)$ receives in the overall clip-level output $p(x)$. Plotting $\{g(x_i)\}_{i=1}^{T}$ allows us to identify the parts of the recording (and hence of the respiration) that contribute strongly to the clip-level prediction. To interpret these singled-out parts, we used annotations of breath sounds, which were provided for the recordings from Geneva. With those annotations we can evaluate whether the way medical experts label breath sounds resembles the way respiration is perceived by a model that was trained for diagnosis prediction without any knowledge of respiration phases or sounds.
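The aggregation described above can be illustrated with a minimal Python sketch. It assumes that the clip-level output is an attention-weighted sum, $p(x) = \sum_{i=1}^{T} g(x_i)\,p(x_i)$, and that the attention values are obtained by a softmax over raw per-segment scores so that they sum to one. Both the normalization and all names and numbers below (`softmax`, `clip_prediction`, `top_segments`, the example scores) are illustrative assumptions, not the DeepBreath implementation.

```python
import math


def softmax(scores):
    """Normalize raw attention scores so they are positive and sum to 1.

    Assumption: the source does not state how g(x_i) is normalized.
    """
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def clip_prediction(attention, segment_preds):
    """Attention-weighted sum of the segment-level predictions."""
    if len(attention) != len(segment_preds):
        raise ValueError("attention and segment_preds must have equal length")
    return sum(g * p for g, p in zip(attention, segment_preds))


def top_segments(attention, k=1):
    """Indices of the k most attended segments, highest attention first."""
    order = sorted(range(len(attention)), key=lambda i: attention[i], reverse=True)
    return order[:k]


# Hypothetical values for a recording with T = 4 segments.
raw_scores = [0.1, 2.0, 0.3, -1.0]    # unnormalized attention scores
segment_preds = [0.2, 0.9, 0.4, 0.1]  # segment-level predictions p(x_i)

g = softmax(raw_scores)
p_clip = clip_prediction(g, segment_preds)
most_attended = top_segments(g, k=1)
```

In this toy example the second segment has the largest attention value, so its prediction dominates the clip-level output. The indices returned by `top_segments` are the ones that would be compared against the annotated inspiration and expiration intervals.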
