
There is a lot of buzz around artificial intelligence (AI) and medical imaging, and growing confidence in AI-driven diagnosis from radiology images. Some people even declare that radiologists will not be needed within the next 5 years.

Is this possible?

I am a trained vascular surgeon and, in the last 5 years, I turned into an artificial intelligence scientist. So I have a “schism” inside me that has split my brain in two.

The doctor inside me screams, “How can we use machines to replace people that need 10 years of specialised training to perform the very important and responsible task of diagnosis that affects a person’s life?”

But the AI scientist part of my brain whispers to me, “This demands a calm approach, driven by data and evidence. Data talks, everything else walks… Carefully look at AI results and then come back to the conversation.”

So let’s take a look into the data and then make a decision.

One approach is to build a deep neural network that will diagnose, and to dissect the neural network’s brain to understand how it analyzes data. For the sake of this experiment I will use publicly available data from last year’s Kaggle competition on estimating the end systolic and end diastolic volume of the left ventricle of the heart. These volumes are important parameters because they indicate how well the heart pumps blood and, as a result, whether there is heart failure. Measuring the end systolic and end diastolic volume takes a radiologist a few minutes, but it is repetitive and can lead to cognitive fatigue. Imagine doing the same measurement 20-30 times a day as a cardiovascular radiologist. If you wish to learn more about the technical implementation of our convnet for this task, please go to the end of this article.
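As a small illustration of why these two volumes matter, they are commonly combined into the ejection fraction, the share of the end diastolic volume that the ventricle ejects with each beat. The sketch below is only an illustration: the function name and the sample volumes are made up for this example and do not come from the dataset.

```python
def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Return the ejection fraction in percent from EDV and ESV (both in ml)."""
    if edv_ml <= 0:
        raise ValueError("EDV must be positive")
    if not 0 <= esv_ml <= edv_ml:
        raise ValueError("ESV must be between 0 and EDV")
    stroke_volume = edv_ml - esv_ml  # blood ejected per beat
    return 100.0 * stroke_volume / edv_ml


# Illustrative values only, not taken from the Kaggle data.
print(round(ejection_fraction(edv_ml=120.0, esv_ml=50.0), 1))
```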

This is what the MRI scan looks like to a radiologist:

So what does the deep neural network see? I mean how can it take the medical scan images and output a number which is close to the end systolic volume by just looking into the images of the scan?

A trained (for many years) radiologist would look into the contours of the heart in different slices and then make a calculation of the volume for the slice and again for the next slice and so on, to add up the calculated volumes for the final number. Sounds arduous? Well, it is.
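The slice-by-slice procedure described above can be sketched in a few lines. This is a simplified version that assumes the contour area of each slice has already been measured and that the slices are evenly spaced; the function name and the sample areas are hypothetical.

```python
def volume_from_slices(areas_mm2, slice_spacing_mm):
    """Approximate a ventricle volume (ml) by stacking slices.

    Each slice contributes area * spacing, and the contributions are added up.
    """
    if slice_spacing_mm <= 0:
        raise ValueError("slice spacing must be positive")
    volume_mm3 = sum(area * slice_spacing_mm for area in areas_mm2)
    return volume_mm3 / 1000.0  # 1 ml equals 1000 cubic millimetres


# Hypothetical contour areas for six short-axis slices, 10 mm apart.
areas = [400.0, 900.0, 1200.0, 1100.0, 700.0, 300.0]
print(volume_from_slices(areas, slice_spacing_mm=10.0))
```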

But what does the neural network “see” to do that?

In other words, if the neural network could speak to us, what would it tell us about its “methodology” to calculate the end systolic volume of the heart?

The problem with neural networks is that there is no direct way to explain how and why they work. This is the reason behind a lot of the fear about their potential to do harm. Interpreting them is very difficult, although there are many efforts to explain them.

In our effort to make the network more interpretable, we looked into the activations of the neural network’s different layers, which is a way to indirectly ask “what are they paying attention to” when they go through an image. (Nerd spoiler alert: a deep convolutional network is a multilayered stack of simulated neurons, where convolution and pooling operations produce the inputs to each neuron’s activation function, whose output is passed on to the next layer of neurons.)

These activations are like filters that have a preference for specific features of the image and propagate their preferences down the road for the next “neuronal” layer. By looking into these activations we might have an indirect explanation of what the “AI radiologist” sees.
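To make the idea of a filter with a “preference” concrete, here is a minimal sketch in plain Python. It is not the network from this article: the toy image and the hand-written edge kernel are assumptions chosen for illustration, whereas a real convnet learns its kernels from data. The activation map is large only where the image matches what the filter prefers.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products.

    This is cross-correlation, which is what deep learning libraries
    usually call convolution.
    """
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        out.append(row)
    return out


def relu(feature_map):
    """Keep positive responses and set negative ones to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Toy 5x6 image: dark on the left, bright on the right.
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]

# Hand-written kernel that prefers dark-to-bright vertical edges.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

activation = relu(conv2d_valid(image, kernel))
for row in activation:
    print(row)
```

The printed map is non-zero only in the columns that straddle the edge, which is the sense in which a filter “pays attention” to one feature and ignores the rest.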

So let’s dissect the AI radiologist’s brain layer by layer and see what specifically in the data is stimulating it.

The layers of a deep neural network are designed to “mimic” the columnar architecture of a mammal’s visual cortex, at least in an abstract way.

Visualization of cortical columns of a mouse brain (picture from this paper). By neuroscience nomenclature, we define the top layers as the cortex layers (upper layers in the image above) and the bottom layers as the deep cortical layers in the brain. The same nomenclature persisted in deep neural network design, where the layers that are the bottom ones in the code are actually called the “top layers” of the deep neural network.

Let’s take a look into the activations of some of the bottom layers of our AI radiologist.

5th layer from the bottom

11th layer

Another activation from the 11th layer

These bottom layers’ activations might actually be focusing on the heart structure directly.

To compare it with how a mammalian brain works, let’s take a look at a seminal work from the Allen Institute. They recorded the visual cortex activity of a mouse brain and created a representation of the response to different visual stimuli. They specifically mention that:

“Taking cues from biology, the most powerful computational algorithms for this task are layered artificial neural networks. These networks use a feedforward structure composed of “simple” and “complex” feature summation and pooling units. Examples include HMAX and deep convolutional neural networks.”

They also mention that the models we have right now do not seem to represent the actual neuronal patterns they have been observing in mice.

So what did they find during this approach?

(A) Comparison of the receptive field of a mouse LGN cell using the mean firing rate. At one of 16 × 8 spatial pixels, (A, Upper) a black or (A, Lower) white square was flashed onto an otherwise gray screen. Column 1 shows the spatial receptive field recorded during the test period: a cell with distinct (Upper) off and (Lower) on subfields. Column 2 shows spatial receptive field recorded during the training period from which the models are constructed. The explained variance between test and training sets is R² = 0.51. In column 3, the holdout set responses of a single LNP model trained barely reconstruct the visual stimulus (R² = 0.02), whereas a cascade model with multiple LNP channels in column 4 has a vastly better performance with R² = 0.50.

In plain English, they recorded how neurons in a mouse brain respond to visual stimuli and used those responses to map each neuron’s receptive field, producing the visualizations shown above.

The bottom activations of our deep neural network look similar, to a frightening degree, to the receptive fields of a mouse brain from the Allen Institute’s paper. Any direct correlation would be computationally wrong, but they do look similar.

Let’s now go into higher activation layers of our AI radiologist’s brain…

Layer 18

This image looks like it activates mostly on the outline of the ventricle of the heart along with other structures around it.

This layer seems to be focusing on the aortic arch

This layer seems to be focusing on the pulmonary tissue.

So what might be really happening in our deep neural network is that the layers work as filters that activate on specific parts of the medical scan. When we request that the neural network correlates these activations of the layers with a specific output e.g. the systolic volume of the heart, then, through all the iterations, it tries to correlate which layers seem to be more important for the task at hand and it gives them a higher weight on the overall process.
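A heavily simplified sketch of this weighting idea follows. It assumes the activations have already been reduced to one number per filter and fits only a final linear readout by gradient descent. The real network also learns the filters themselves through backpropagation, so this covers just the last step. The data and function name are invented for illustration: the first feature determines the target and the second is unrelated to it.

```python
def fit_linear_readout(features, targets, lr=0.01, steps=1000):
    """Fit one weight per feature by gradient descent on mean squared error."""
    n = len(features)
    weights = [0.5] * len(features[0])  # arbitrary starting point
    for _ in range(steps):
        grads = [0.0] * len(weights)
        for x, y in zip(features, targets):
            error = sum(w * xi for w, xi in zip(weights, x)) - y
            for k, xi in enumerate(x):
                grads[k] += 2.0 * error * xi / n
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights


# Invented data: the target is twice the first feature,
# and the second feature carries no information about it.
features = [[1.0, 1.0], [2.0, -1.0], [3.0, -1.0], [4.0, 1.0]]
targets = [2.0, 4.0, 6.0, 8.0]

weights = fit_linear_readout(features, targets)
print([round(w, 3) for w in weights])
```

After training, the weight on the informative feature ends up close to 2 and the weight on the irrelevant one shrinks towards 0, which is the “higher weight for what matters” behaviour described above.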

This is one of the reasons that if we provide the region of interest (e.g. in this case focus on the heart) then the deep neural network will create better results. It will be able to create a better representation of the heart and then correlate this to the output instead of having to encode more structures that are maybe irrelevant to the task.
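Providing a region of interest can be as simple as cropping a fixed-size window around the heart before the image reaches the network. The sketch below assumes the centre of the heart is already known, for example from a manual click or a separate localisation step that is not covered here. The function name and the toy image are hypothetical.

```python
def crop_roi(image, center_row, center_col, size):
    """Return a size x size window around a centre, clamped to the image."""
    height, width = len(image), len(image[0])
    if size > height or size > width:
        raise ValueError("crop size is larger than the image")
    half = size // 2
    # Clamp the window so it never runs past the image border.
    top = min(max(center_row - half, 0), height - size)
    left = min(max(center_col - half, 0), width - size)
    return [row[left:left + size] for row in image[top:top + size]]


# Toy 6x6 "scan" where each pixel encodes its own position as row*10 + col.
scan = [[r * 10 + c for c in range(6)] for r in range(6)]
print(crop_roi(scan, center_row=3, center_col=3, size=2))
```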

This is how it looks through the eyes of an “AI radiologist”. It builds a representation of the medical scan and correlates these representations with the output, in an effort to find the ones that might be better at predicting the needed end result. This is very exciting!

On the other hand, we will still need a person to interpret the results of a medical scan and take responsibility for a life. Medicine is not only about interpretation. The “take responsibility for your decision” factor plays a huge role. Legally, relying solely on an AI algorithm for crucial, life-dependent diagnostic outputs is very difficult. If the output is wrong, who is responsible for it: the radiologist, the hospital or the company that built the AI algorithm?

Nevertheless and based on the above data,

I believe that we live in the most exciting time in history. We can build algorithms that look like our inner brain workings and apply them to very difficult problems with very good results. Anyone can use a GPU on their home desktop and recreate the above results.

We still have a long way ahead of us to find the algorithms that exactly mimic the neurons of our brain. On the other hand we can also argue that

…we have built flying machines that do not actually flap their wings but mimic the general principles of flying. We might do the same on our path to build artificial intelligence brains.

One thing is for sure: An exciting future is ahead of us!

A gif picture of all the activations from the neuronal layers from our “AI radiologist”


We downloaded the data from the Kaggle competition for ESV/EDV prediction from MRIs of the heart. There are 8 GB of MRI scans of the heart, each paired one-to-one with an end systolic and an end diastolic volume. There are many types of scanned slices, but we used only the SAX (short-axis) ones, which are the ones radiologists use to measure the volumes of the ventricles of the heart (around 135,000 images in total).

We then built and trained a convolutional neural network with 21 layers leading to a single numeric output, essentially a convnet regression. We used augmentations of the original dataset (slight rotations and shifts) to avoid overfitting and used dropout on the dense layer of the network. We trained for 50 epochs on iterations of the augmentations of the original dataset and achieved a mean absolute error on the validation set (20% of the data) of 14 ml for the end systolic volume and 33 ml for the end diastolic volume. The validation loss stopped improving at some point while the training loss kept improving, so we used early stopping to avoid overfitting.
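For readers unfamiliar with the two ideas in the last sentences, here is a minimal sketch of the mean absolute error metric and of one common early stopping rule. The patience value and all the numbers are assumptions for illustration. They are not our actual stopping criterion or our training curve.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predictions and ground truth."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch whose weights would be kept.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_epoch


# Made-up volumes in ml.
print(mean_absolute_error([60.0, 100.0], [50.0, 120.0]))

# Made-up validation curve: improves, then drifts upwards.
val_losses = [40.0, 30.0, 22.0, 18.0, 17.5, 17.9, 18.4, 19.0, 16.0]
print(early_stopping_epoch(val_losses, patience=3))
```

With this rule the last value in the list is never reached, because training has already stopped three epochs after the best validation loss.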

If you want our pre-trained network and the weights to do transfer learning on your own MRI dataset, please contact us at