DICOM to NumPy with Python

Chest CT from Wikipedia (Public Domain)

If you are interested in creating your own medical imaging dataset for a machine learning project, this post is for you. You may also be interested in this post if you work with publicly available medical imaging datasets and would like some further insight into how these datasets are created.

In this post, you will learn:

  • The basics of DICOM, the file format in which medical images are stored in health systems;

Tutorial with Code

Source: Wikipedia (Creative Commons license).

This tutorial is based on my repository, pytorch-computer-vision, which contains PyTorch code for training and evaluating custom neural networks on custom data. By the end of this tutorial, you should be able to:

  • Design custom 2D and 3D convolutional neural networks in PyTorch;

Step 1: Download the code.

The repository with all the code is https://github.com/rachellea/pytorch-computer-vision

If you want to follow along with this tutorial and/or use the code, you…

Tutorial with Code

This is a photo of Habitat 67 from Wikimedia Commons (license: CC)

At the end of this tutorial you should be able to:

  • Load randomly initialized or pre-trained CNNs with PyTorch torchvision.models (ResNet, VGG, etc.)

Predefined Convolutional Neural Network Models in PyTorch

There are many pre-defined CNN models provided in PyTorch, including:

Python, Git, Anaconda, Code, and NO Jupyter Notebooks

Image by Author. Sub-images are all by the author or permitted for reuse (see post for individual links).

This post describes best practices for organizing machine learning projects that I have found to be highly effective during my PhD in machine learning.


Python is a great language for machine learning. Python includes a bunch of libraries that are super useful for ML:

  • numpy: n-dimensional arrays and numerical computing. Useful for data processing.

Clinical Recommendations and Machine Learning Applications

In this image the CT slices are from Radiology Assistant. The ROC curves and upper model figure are from Mei et al. The middle model figure is from Li et al.

This post delves into the use of CT scans in the COVID-19 pandemic, including current guidelines from medical experts (as of August 2020) and examples of recent research papers that use machine learning to make predictions from CT scans of COVID-19 patients.

Disclaimer: Nothing in this post is medical advice.

Diagnosis of COVID-19 with RT-PCR

The gold standard for diagnosis of COVID-19 is reverse transcription polymerase chain reaction (RT-PCR), which is a laboratory test that detects genetic material (RNA) from SARS-CoV-2, the virus that causes COVID-19:

Clinical Goal, Data Representation, Task, Model

Image by Author

This post provides an overview of chest CT scan machine learning organized by clinical goal, data representation, task, and model.

A chest CT scan is a grayscale 3-dimensional medical image that depicts the chest, including the heart and lungs. CT scans are used for the diagnosis and monitoring of many different conditions including cancer, fractures, and infections.

Clinical Goal

The clinical goal refers to the medical abnormality that is the focus of the study. The following figure illustrates some example abnormalities, shown as 2D axial slices through the CT volume:

Tasks, How CNNs Work, Learning, AUROC

Convolutional neural networks (CNNs) are the most popular machine learning models for image and video analysis.

Example Tasks

Here are some example tasks that can be performed with a CNN:

  • Binary Classification: given an input image from a medical scan, determine if the patient has a lung nodule (1) or not (0)

How CNNs Work

In a CNN, a convolutional filter slides across an image to produce a feature map (which is labeled “convolved feature” in…
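
The sliding-filter idea can be sketched in plain Python. This is a minimal illustration, not code from the post: it assumes stride 1 and no padding, and the function name `convolve2d_valid` and the toy image and kernel values are hypothetical. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the filter element-wise with the image patch under it, then sum.
            total = 0
            for a in range(kh):
                for b in range(kw):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 5x5 binary image and 3x3 filter, chosen only for illustration.
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = convolve2d_valid(image, kernel)
for row in feature_map:
    print(row)
```

Each entry of the 3x3 feature map is the sum of the image pixels that line up with the 1s in the filter at that position. In a trained CNN the filter values are learned rather than fixed by hand.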

Simulations and Visualizations

This post offers the clearest explanation on the web for how the popular metrics AUC (AUROC) and average precision can be used to understand how a classifier performs on balanced data, with the next post focusing on imbalanced data. This post includes numerous simulations and AUROC/average precision plots for classifiers with different properties. All code to replicate the plots and simulations is provided on GitHub.

First, here is a brief intro to AUROC and average precision:

AUROC: Area Under the Receiver Operating Characteristic

The AUROC indicates whether your model can correctly rank examples. The AUROC is the probability that a randomly selected positive example has a higher…
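
The ranking interpretation can be checked directly on a small example. The sketch below assumes the standard pairwise reading of AUROC (a positive example is compared against a negative example by predicted score, with ties counted as half); the function name `auroc_by_ranking` and the labels and scores are hypothetical, and this is not the code from the post's GitHub repository.

```python
from itertools import product


def auroc_by_ranking(labels, scores):
    """Fraction of (positive, negative) pairs where the positive has the higher score.

    Ties count as half a win, which matches the usual AUROC convention.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three positives, three negatives.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

auroc = auroc_by_ranking(labels, scores)
print(round(auroc, 3))  # 8 of the 9 positive/negative pairs are ranked correctly
```

The pairwise loop is quadratic in the number of examples, so it is only meant to make the definition concrete. For real datasets a library routine based on sorting would be the usual choice.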

Visual Explanations from Deep Networks

Modified from Figures 1 and 20 of the Grad-CAM paper

Grad-CAM is a popular technique for visualizing where a convolutional neural network model is looking. Grad-CAM is class-specific, meaning it can produce a separate visualization for every class present in the image:
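
To make "class-specific" concrete, here is a minimal sketch of the core Grad-CAM computation as described in the Grad-CAM paper: each feature map is weighted by the spatial average of the class score's gradient with respect to that map, the weighted maps are summed, and a ReLU keeps only the positive evidence. The arrays below are hypothetical toy values. In practice the activations and gradients would come from a trained CNN, which this sketch does not include.

```python
def grad_cam(activations, gradients):
    """Combine feature maps into one heatmap for a single class.

    activations: list of K feature maps, each an H x W list of lists.
    gradients:   gradients of the class score w.r.t. each feature map (same shape).
    """
    height, width = len(activations[0]), len(activations[0][0])
    heatmap = [[0.0] * width for _ in range(height)]
    for a_k, g_k in zip(activations, gradients):
        # Importance weight: global average of the gradient over the spatial positions.
        alpha_k = sum(sum(row) for row in g_k) / (height * width)
        for i in range(height):
            for j in range(width):
                heatmap[i][j] += alpha_k * a_k[i][j]
    # ReLU: keep only locations that support the class.
    return [[max(0.0, v) for v in row] for row in heatmap]


# Hypothetical 2x2 feature maps: map 0 fires top-left, map 1 fires bottom-right.
activations = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 1.0]],
]
# Hypothetical gradients: class A is supported by map 0, class B by map 1.
grads_class_a = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
grads_class_b = [[[-1.0, -1.0], [-1.0, -1.0]], [[1.0, 1.0], [1.0, 1.0]]]

heatmap_a = grad_cam(activations, grads_class_a)
heatmap_b = grad_cam(activations, grads_class_b)
print(heatmap_a)
print(heatmap_b)
```

The same activations give different heatmaps for the two classes because only the gradients change, which is what makes the visualization class-specific.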

Cells, DNA, RNA, and Protein

Image Source: Wikipedia, credit to William Crochot.

Your genome is approximately 750 megabytes of information (3 x 10⁹ letters x 1 byte/4 letters). That’s about half the size of an operating system, except it codes for an entire human body, and the entire code fits into a volume a hundred times smaller than a grain of rice. Your brain, which develops as specified by your genome, is an incredible supercomputer that also happens to require less power than a dim light bulb, making it literally tens of thousands of times more energy-efficient than man-made supercomputers.
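
The 750 megabyte figure follows from simple arithmetic, sketched below. The calculation assumes each letter is one of four bases and so needs 2 bits (which is where "1 byte/4 letters" comes from), and it uses decimal megabytes (10⁶ bytes). The variable names are illustrative.

```python
BASES_IN_GENOME = 3 * 10**9  # approximate number of letters, as stated in the text
BITS_PER_BASE = 2            # 4 possible letters (A, C, G, T) fit in 2 bits
BITS_PER_BYTE = 8

total_bytes = BASES_IN_GENOME * BITS_PER_BASE // BITS_PER_BYTE
megabytes = total_bytes / 10**6  # decimal megabytes

print(megabytes)  # 750.0
```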

So how does the genome work?

Genomics, transcriptomics, and proteomics are data-driven fields…

Rachel Lea Ballantyne Draelos

I am a Duke computer science PhD candidate and medical student (MD/PhD). My PhD is focused on machine learning methods development for medical data.
