The goal of the BCA is to return lines that model the edges of selected vertebrae. Like the TCA, its input is a CT scan taken before surgery begins. The algorithm also requires some user input: the orthopedic surgeon must visually inspect the CT scan and indicate which bones he/she would like to track. This feature was requested by Dr. Pallotta, since surgeons often target different objectives and no two surgeries are the same.
During development, our team took two approaches to create these line models. First, we attempted to classify the bones using a machine learning method. This approach had the potential to be very robust; however, due to the limited time on the project, our team was unable to complete this segment, and it would be a strong candidate for improvement in the next iteration of this project. Our second approach, although simpler, was reliably able to model the desired bones as line segments. Detailed descriptions of both methods are given below.
Machine Learning BCA
Machine learning is a powerful tool for biomedical image analysis tasks such as segmentation and classification. In particular, convolutional neural networks have been tested for bone classification on various datasets; such models include U-Net [2-3], 3D U-Net [4], and V-Net [5]. We adopted the 3D V-Net model to segment the bone positions because V-Net uses down-convolutions with a configurable stride and kernel size instead of a fixed down-sampling step achieved through max-pooling, so this stage also contains learnable parameters. The network architecture is shown in Figure 13.
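To illustrate this design choice, the sketch below shows a V-Net-style down-sampling stage implemented as a strided 3D convolution with learnable weights. PyTorch is used here purely for illustration, and the channel counts and layer sizes are arbitrary rather than taken from our actual network:

```python
import torch
import torch.nn as nn

class DownConv3d(nn.Module):
    """V-Net-style down-sampling stage: a strided 3D convolution with learnable
    weights replaces the fixed max-pooling used in many U-Net variants."""
    def __init__(self, in_channels, out_channels, kernel_size=2, stride=2):
        super().__init__()
        self.down = nn.Conv3d(in_channels, out_channels,
                              kernel_size=kernel_size, stride=stride)
        self.act = nn.PReLU(out_channels)

    def forward(self, x):
        return self.act(self.down(x))

# Example: halve the spatial resolution of a 1-channel CT volume patch
x = torch.randn(1, 1, 64, 64, 64)
y = DownConv3d(1, 16)(x)   # -> shape (1, 16, 32, 32, 32)
```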
The input and the output of the model are 3D .nii.gz files, which require a type conversion from the original CT scan. This is done with the dicom2nifti package in convert2nii.py. We chose the VerSe dataset as our training data, which includes 143 training pairs and 256 testing pairs. We encountered many difficulties and bugs during training, but we resolved most of them and obtained an acceptable test result, shown in Figure 16. Training the model with 300 epochs and 143 pairs of training data takes roughly two days. Conceptually, the pipeline takes the original CT scan and crops the image to remove as much background as possible while keeping all of the bone, as shown under the pre-processing step in Figure 14. We then tested the image and obtained the prediction mask shown in red under the Binary Segmentation image.
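The conversion itself can be done with the dicom2nifti package; a minimal sketch is shown below, with placeholder paths (the exact call used in convert2nii.py may differ):

```python
import dicom2nifti

# Convert a DICOM series (one folder of .dcm slices per CT scan) into a
# compressed NIfTI file of the type expected by the model.
dicom2nifti.dicom_series_to_nifti(
    "path/to/ct_dicom_series",        # folder containing the .dcm slices
    "path/to/output/ct_scan.nii.gz",  # compressed NIfTI output
    reorient_nifti=True,              # reorient to a standard orientation
)
```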
We also tried other networks, such as 3D U-Net, as shown in Figure 15. U-Net is the most common convolutional neural network in biomedical image segmentation, and 3D U-Net keeps the advantages of U-Net while performing 3D operations that preserve volumetric information. In our 3D U-Net model, all images were padded to a fixed size of 256x256x256; this size is larger than the original images, so all of the information from the input images is retained. The padded images were then fed into the network to train the model. Unfortunately, unresolved type errors prevented us from taking this model further.
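As an illustration, one way to bring a smaller volume up to the fixed 256x256x256 input size is to zero-pad it. The sketch below assumes the volume fits within the target size along every axis and is not necessarily the exact preprocessing used in our pipeline:

```python
import numpy as np

def pad_to_shape(volume, target=(256, 256, 256)):
    """Zero-pad a CT volume so every axis reaches the fixed network input size.
    Assumes the volume is no larger than the target along any axis."""
    pad = [(0, t - s) for s, t in zip(volume.shape, target)]
    return np.pad(volume, pad, mode="constant", constant_values=0)

volume = np.zeros((181, 201, 160), dtype=np.float32)  # example CT volume
print(pad_to_shape(volume).shape)                      # (256, 256, 256)
```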
Although the 3D V-Net pipeline was able to train the model and produce prediction results, the model was poorly trained: the Dice loss stayed around 1 throughout training, and since 1 is the maximum value of this loss, the model was effectively not learning. We nevertheless tested the model with our input and obtained the images shown in Figure 16. The prediction mask is not ideal, because the bone regions are hollow. Fortunately, the boundary of the mask is clear enough to find the desired angle. We also overlaid the prediction mask in red onto the CT scan and found that the prediction was quite accurate along the boundary.
Figure 16: Prediction Results Visualization: input image / prediction result / combined input in gray, prediction result in red.
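For reference, the Dice loss measures the overlap between the predicted mask and the ground-truth mask; a value near 1 means the prediction shares almost no overlap with the target, which is why a loss stuck at 1 indicates the model is not learning. Below is a minimal NumPy sketch of the soft Dice loss (the training code may use a framework-specific variant):

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss on binary/probability masks: 0 for perfect overlap,
    approaching 1 when the prediction and target share no overlap."""
    pred = np.asarray(pred, dtype=np.float64).ravel()
    target = np.asarray(target, dtype=np.float64).ravel()
    intersection = np.sum(pred * target)
    return 1.0 - (2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps)

# A mask with no overlap against the target gives a loss close to 1
print(dice_loss(np.array([1, 1, 0, 0]), np.array([0, 0, 1, 1])))  # ~1.0
```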
Therefore, we take the prediction mask as input and perform the angle detection with the steps below:
1. Change the coordinates to match the prediction image to the CT scan.
2. Across all frames, find the frame containing the largest predicted bone; this is our frame of interest.
3. Find the two largest connected components, which are the two trackers. Define the top one as tracker1 and the bottom one as tracker2 for reference.
4. Perform the same operations on the neighboring frames (center frame - 2 through center frame + 2), five frames in total.
5. Remove the trackers from all five frames and average the frames; call the resulting image the bone image.
6. Detect the outline of the bone in the bone image using the findContours function.
7. Find the contour region whose vertical (y-axis) position is close to the center of mass of the tracker.
8. Find the two lines using hough_line_peaks and calculate the angle between them (a sketch of this step follows the list). Figure 17 shows an example with the two detected lines in thick white, which form an angle of 27 degrees.
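The last step relies on the Hough transform. A minimal sketch of the line detection and angle calculation is given below, assuming scikit-image's hough_line / hough_line_peaks and a binary bone-outline image as input; the parameters used in the actual script may differ:

```python
import numpy as np
from skimage.transform import hough_line, hough_line_peaks

def bone_edge_angle(edge_image):
    """Estimate the angle between the two strongest straight lines
    in a binary bone-outline image."""
    # Hough transform over a full range of line orientations
    tested_angles = np.linspace(-np.pi / 2, np.pi / 2, 360, endpoint=False)
    hspace, theta, dists = hough_line(edge_image, theta=tested_angles)

    # Keep the two strongest peaks, i.e. the two most prominent edges
    _, angles, _ = hough_line_peaks(hspace, theta, dists, num_peaks=2)

    # Fold the difference so the result is the acute angle between the lines
    diff = np.degrees(abs(angles[0] - angles[1]))
    return min(diff, 180.0 - diff)
```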
The current method finds the frame and the bone automatically based on the tracker position, but it leaves plenty of room for improvement. First, a better-trained model would produce more accurate predicted segmentations; this could be achieved by varying the training parameters, adopting a different model, or enlarging the training dataset. Second, a user interface could be added to let users pick any bone they want, since currently the bone is selected automatically from the tracker position. Third, the method could be adapted to different types of trackers.
Functional BCA:
The functional BCA has only a few simple steps to create a model of the bone’s edges. These are listed below in chronological order.
- The BCA portion of the Python script is run.
- The stack of slices from the CT scan is compressed into three 2D projections by summing the slices along each axis (a sketch of this step appears after the full list of steps). The three resulting 2D projections are displayed in Figure 18 below.
Figure 18: Three 2D Projections
- Next, the leftmost of these projections is displayed in a window for the user to view, and the pop-up offers three different planes for the surgeon to select from. Depending on the corrective action required, the doctor selects the option best suited to his/her surgery.
- After the plane has been selected, the user is prompted to create a line within the image by placing two points on the bone. Each time a point is selected, a text entry is required to confirm the placement. Once two points are plotted, the line is saved for later use. The previous two steps are shown graphically in Figure 19 below.
Figure 19: BCA User Input
- The above step is repeated once more, since the surgical objective is to measure the angle between two bone edges. After this repetition, two lines will have been successfully saved, which completes the BCA portion of the project.
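Below is a minimal sketch of the projection and point-selection steps, assuming the CT scan has already been loaded into a NumPy array and using matplotlib's ginput for the mouse clicks; the actual script also asks for a text confirmation before each point is accepted:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical CT volume loaded as a 3D NumPy array (slice, row, column)
ct_volume = np.random.rand(200, 512, 512)

# Compress the stack of slices into three 2D projections, one per axis
projections = [ct_volume.sum(axis=axis) for axis in range(3)]

# Display the chosen projection and let the user click two points
# that define one bone-edge line; repeating this yields the second line
plt.imshow(projections[0], cmap="gray")
points = plt.ginput(2, timeout=0)   # two (x, y) clicks define one line
plt.close()
print(points)
```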
References:
[1] Milletari, Fausto, Nassir Navab, and Seyed-Ahmad Ahmadi. “V-Net: Fully convolutional neural networks for volumetric medical image segmentation.” 2016 Fourth International Conference on 3D Vision (3DV). IEEE, 2016.
[2] Shim, Jae-Hyuk, et al. “Evaluation of U-Net models in automated cervical spine and cranial bone segmentation using X-ray images for traumatic atlanto-occipital dislocation diagnosis.” Scientific Reports 12.1 (2022): 21438.
[3] Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. “U-net: Convolutional networks for biomedical image segmentation.” Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18. Springer International Publishing, 2015.
[4] Çiçek, Özgün, et al. “3D U-Net: learning dense volumetric segmentation from sparse annotation.” Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016: 19th International Conference, Athens, Greece, October 17-21, 2016, Proceedings, Part II 19. Springer International Publishing, 2016.
[5] Milletari, Fausto, Nassir Navab, and Seyed-Ahmad Ahmadi. “V-Net: Fully convolutional neural networks for volumetric medical image segmentation.” 2016 Fourth International Conference on 3D Vision (3DV). IEEE, 2016.
[6] Luiserrador. “luiserrador/ML_3D_Unet: Tensorflow Based Framework for 3D-Unet with Knowledge Distillation.” GitHub, github.com/luiserrador/ML_3D_Unet. Accessed 19 Dec. 2023.