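YOLOv9 for Fracture Detection in Pediatric Wrist Trauma X-ray Images

Chun-Tse Chien, Rui-Yang Ju, Kuang-Yi Chou, Jen-Shiun Chiang🖂

Department of Electrical and Computer Engineering, Tamkang University, New Taipei City, 251301, Taiwan
Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei City, 106335, Taiwan
National Taipei University of Nursing and Health Sciences, Taipei City, 112303, Taiwan

🖂 Email: jsken.chiang@gmail.com

ORCID iDs: 0009-0008-7549-4021, 0000-0003-2240-1377, 0000-0001-7536-8967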

Abstract

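The release of YOLOv9, the latest version of the YOLO series, has led to its use in a wide range of scenarios. This paper is the first to apply the YOLOv9 algorithm to fracture detection as part of computer-aided diagnosis (CAD), to help radiologists and surgeons interpret X-ray images. Specifically, we train the model on the GRAZPEDWRI-DX dataset and extend the training set with data augmentation to improve model performance. Experimental results show that, compared with the current state-of-the-art (SOTA) model, YOLOv9 improves mAP50-95 from 42.16% to 43.73%, a relative improvement of 3.7%. The implementation code is publicly available at https://github.com/RuiyangJu/YOLOv9-Fracture-Detection.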

1 Introduction

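Computer-aided diagnosis (CAD) helps experts such as radiologists and surgeons interpret medical images, including magnetic resonance imaging (MRI), computed tomography (CT), and X-ray images. Deep learning techniques applied to medical images [1,2,3,4] have achieved increasingly satisfactory results, making this a popular research topic, especially for fracture detection [5,6,7].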

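The You Only Look Once (YOLO) series [8, 9, 10, 11, 12, 13, 14, 15, 16] comprises the mainstream neural networks for real-time object detection and is widely used for fracture detection [17,18,19]. Wrist fractures are especially common in children, and the GRAZPEDWRI-DX dataset [20] provides 20,327 X-ray images of pediatric wrist trauma for fracture detection. Ju and Cai [21] were the first to use the YOLOv8 [16] model for fracture detection on this dataset. Because attention mechanisms [22,23,24,25] are highly effective at improving neural network performance, Chien et al. [38] achieved state-of-the-art (SOTA) performance by incorporating different attention mechanisms into the YOLOv8 model.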

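Since YOLOv9 [26] has achieved remarkable performance on the MS COCO 2017 [27] benchmark, this paper is the first to train YOLOv9 models on the GRAZPEDWRI-DX dataset, achieving SOTA performance, as shown in Fig. 1.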

The main contributions of this paper are as follows:

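1. This paper is the first to apply YOLOv9 to fracture detection, demonstrating that the model performs well not only in real-time object detection across real-world scenarios but also in medical image recognition.
2. This paper uses YOLOv9 to address information loss in fracture detection on X-ray images, aiming to retain more information while training on low-feature X-ray images and thereby improve model performance.
3. The YOLOv9 models trained on the GRAZPEDWRI-DX dataset significantly improve mAP50-95, reaching SOTA performance.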

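Figure 1: Comparison of fracture detection models on the GRAZPEDWRI-DX dataset. In terms of accuracy, our models outperform all previous models, achieving state-of-the-art performance.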

2 Related Work

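Object detectors generally use either one-stage or two-stage algorithms. Compared with two-stage detectors, YOLO models offer a better balance between accuracy and inference speed, which makes them suitable for deployment on mobile computing platforms for medical image recognition. Son et al. [28] used YOLOv4 [9] and U-Net [29] as a diagnostic aid that helps dentists identify mandibular fractures without cone-beam computed tomography (CBCT). Jeon et al. [30] employed YOLOv4 [9] to help surgeons diagnose bone trauma images by detecting fractures and mapping them onto 3D reconstructions, clearly highlighting the fracture regions with a red mask overlaid on the 3D bone images. Hržić et al. [18] applied the YOLOv4 [9] model to fracture detection on the GRAZPEDWRI-DX dataset [20], demonstrating for the first time that YOLO models can help radiologists predict pediatric wrist injuries from X-ray images more accurately. Ahmed et al. [31] applied the one-stage YOLOv5 [12], YOLOv6 [13], YOLOv7 [15], and YOLOv8 [16] models to wrist abnormality detection, demonstrating their potential to improve diagnostic accuracy on pediatric wrist X-ray images. Warin et al. [32] used the YOLOv5 [12] model to detect mandibular fractures in panoramic radiographs, showing that it can identify mandibular fractures at an expert level. Gaikwad et al. [33] applied the YOLOv5 [12] model to detect major and minor fractures of the C1 to C7 vertebrae, achieving an accuracy of 89%. Zou and Arshad [34] studied various fracture morphologies throughout the body, including angle, normal, line, and messed-up-angle fractures; by integrating an attention mechanism [35] into the YOLOv7 [15] model, they achieved strong performance on the FracAtlas [36] dataset. Samothai et al. [17] showed that, owing to its decoupled detection head, anchor-free design, and augmentation strategies, YOLOX [10] converges faster and detects fracture regions more accurately than YOLOR [11]; they also showed that YOLOX can localize fractures even in low-feature X-ray images. Moon et al. [37] proposed a computer-aided facial bone fracture diagnosis (CA-FBFD) system based on the YOLOX model, which effectively reduces physicians' workload when diagnosing facial fractures on facial CT scans. Although applying YOLO models to medical image recognition is an active research topic, to date no one has used YOLOv9 [26] for fracture detection.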

Table 1: Quantitative comparison with other state-of-the-art fracture detection models on the GRAZPEDWRI-DX dataset when the input image size is 640.

| Model | Params (M) | FLOPs (G) | mAP50 (%) | mAP50-95 (%) | Speed† (ms) |
|---|---|---|---|---|---|
| YOLOv8 [16] | 43.61 | 164.9 | 62.44 | 40.32 | 3.6 |
| YOLOv8+SA [22] | 43.64 | 165.4 | 63.99 | 41.49 | 3.9 |
| YOLOv8+ECA [23] | 43.64 | 165.5 | 62.64 | 40.21 | 3.6 |
| YOLOv8+GAM [24] | 49.29 | 183.5 | 63.32 | 40.74 | 8.7 |
| YOLOv8+ResGAM [38] | 49.29 | 183.5 | 63.97 | 41.18 | 9.4 |
| YOLOv8+ResCBAM [25] | 53.87 | 196.2 | 62.95 | 40.10 | 4.1 |
| YOLOv9-C (Ours) | 51.02 | 239.0 | 65.31 | 42.66 | 5.2 |
| YOLOv9-E (Ours) | 69.42 | 244.9 | 65.46 | 43.32 | 6.4 |

Note: All YOLOv8 models and variants listed in the table use the large model size. † Speed is the total time for preprocessing, inference, and post-processing.

3 Method

3.1 YOLOv9

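Neural networks often suffer from information loss: as the input passes through successive layers of feature extraction and spatial transformation, part of the original information is discarded. This problem is especially pronounced for X-ray images, whose sparse features make fracture detection particularly difficult. Models trained on such low-feature images tend to perform poorly, so mitigating information loss can substantially improve prediction accuracy. To address this, we adopt YOLOv9, which uses Programmable Gradient Information (PGI) and the Generalized Efficient Layer Aggregation Network (GELAN) to extract key features more effectively.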
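To make the information-loss argument concrete, the toy NumPy example below (an illustration we constructed, not an analysis taken from [26]) passes a one-dimensional intensity profile through three stride-2 average-pooling stages. A single bright pixel serves as a hypothetical stand-in for a hairline fracture line; the function names and the 64-sample profile are assumptions for illustration only.

```python
import numpy as np


def avg_pool_1d(x: np.ndarray) -> np.ndarray:
    """Stride-2 average pooling over a 1-D signal of even length."""
    return x.reshape(-1, 2).mean(axis=1)


def pool_stages(profile: np.ndarray, stages: int):
    """Apply `stages` pooling steps; return the peak after each step and the final signal."""
    peaks = []
    x = profile
    for _ in range(stages):
        x = avg_pool_1d(x)
        peaks.append(float(x.max()))
    return peaks, x


# Hypothetical intensity profile: one bright pixel stands in for a hairline fracture
profile = np.zeros(64)
profile[10] = 1.0

peaks, coarse = pool_stages(profile, stages=3)
print(peaks)  # [0.5, 0.25, 0.125]: each stage halves the peak contrast

# Nearest-neighbour upsampling back to 64 samples cannot undo the loss
recovered = np.repeat(coarse, 8)
print(np.count_nonzero(recovered), recovered.max())  # 8 0.125
```

The faint signal ends up smeared over eight pixels at one eighth of its original intensity, so neither its contrast nor its exact position can be recovered. Real detectors use learned convolutions rather than fixed averaging, so this only sketches why thin, low-contrast structures are at risk.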

3.1.1 Programmable Gradient Information

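Programmable Gradient Information (PGI) is an auxiliary supervision framework that manages how gradient information propagates across semantic levels in order to improve detection capability. PGI consists of three main components: a main branch, an auxiliary reversible branch, and multi-level auxiliary information. During inference, only the main branch is used, so the auxiliary components add no inference cost. As the network grows deeper, an information bottleneck can arise, causing the loss function to produce uninformative gradients. In this case, the auxiliary reversible branch uses reversible functions to preserve information integrity and mitigate information loss in the main branch. In addition, the multi-level auxiliary information addresses the error accumulation problem of deep supervision by introducing auxiliary information at different levels, which improves the model's learning ability. Notably, [26] highlights PGI's effectiveness at preserving information, especially in scenarios with limited features. This provides a theoretical basis for the strong performance of YOLOv9 in fracture detection.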
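The training-only auxiliary supervision described above can be sketched as follows. This is a minimal NumPy illustration under our own assumptions, not the PGI implementation of [26]: a shallow layer is shared by a deep main branch and an auxiliary head, the training loss adds a down-weighted auxiliary term (the weight 0.25 is an assumed value), and at inference only the main branch runs. It deliberately omits the reversible functions and multi-level auxiliary information; the class and function names are hypothetical.

```python
import numpy as np


class ToyAuxSupervisedNet:
    """Toy network: shared shallow layer, deep main branch, training-only auxiliary head."""

    def __init__(self, d_in: int, d_hidden: int, d_out: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w_shallow = rng.normal(size=(d_hidden, d_in))   # shared by both branches
        self.w_deep = rng.normal(size=(d_hidden, d_hidden))  # main branch only
        self.w_main = rng.normal(size=(d_out, d_hidden))     # main prediction head
        self.w_aux = rng.normal(size=(d_out, d_hidden))      # auxiliary head, training only

    def forward(self, x: np.ndarray, training: bool):
        shallow = np.maximum(self.w_shallow @ x, 0.0)
        deep = np.maximum(self.w_deep @ shallow, 0.0)  # longer, lossier path
        main_out = self.w_main @ deep
        if not training:
            return main_out  # inference: the auxiliary head is skipped entirely
        # The auxiliary head reads the shallow features, so its loss sends a
        # gradient to w_shallow that does not have to pass through w_deep.
        aux_out = self.w_aux @ shallow
        return main_out, aux_out


def mse(pred: np.ndarray, target: np.ndarray) -> float:
    return float(np.mean((pred - target) ** 2))


def combined_loss(main_out, aux_out, target, aux_weight: float = 0.25) -> float:
    """Main loss plus a down-weighted auxiliary loss (aux_weight is an assumed value)."""
    return mse(main_out, target) + aux_weight * mse(aux_out, target)


net = ToyAuxSupervisedNet(d_in=8, d_hidden=16, d_out=4)
x = np.ones(8)
target = np.zeros(4)
main_out, aux_out = net.forward(x, training=True)
loss = combined_loss(main_out, aux_out, target)
pred = net.forward(x, training=False)  # same main output, no auxiliary computation
```

Because the auxiliary head is dropped at inference, the deployed model's cost is that of the main branch alone, which matches the property of PGI stated above.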

3.1.2 Generalized Efficient Layer Aggregation Network

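To make information integration and propagation during training more efficient, YOLOv9 introduces a new lightweight network architecture, the Generalized Efficient Layer Aggregation Network (GELAN). GELAN combines CSPNet [39] and ELAN [40] to aggregate network information effectively, reduce information loss during propagation, and strengthen information exchange between layers. Owing to its relatively low parameter count and computational complexity, this architecture is particularly suitable for fracture detection in environments with limited computing resources.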
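The aggregation pattern can be sketched at the shape level in NumPy, assuming our reading of GELAN: a 1×1 convolution whose output is split CSP-style into two halves, a chain of computational blocks applied to the most recent output, concatenation of every intermediate output, and a final 1×1 transition layer. The channel sizes, the ReLU "computational block", the freshly drawn random weights, and all function names are hypothetical; this is not the YOLOv9 module itself.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1x1(x: np.ndarray, c_out: int) -> np.ndarray:
    """Toy pointwise convolution on a (C, H, W) array with freshly drawn random weights."""
    w = rng.normal(scale=1.0 / np.sqrt(x.shape[0]), size=(c_out, x.shape[0]))
    return np.einsum("oc,chw->ohw", w, x)


def compute_block(x: np.ndarray, c_out: int) -> np.ndarray:
    """Hypothetical computational block: pointwise conv followed by ReLU.

    GELAN allows other block types in this slot; this is only a placeholder.
    """
    return np.maximum(conv1x1(x, c_out), 0.0)


def gelan_like_block(x: np.ndarray, c_hidden: int, c_block: int, c_out: int, n_blocks: int = 2):
    y = conv1x1(x, c_hidden)
    parts = np.split(y, 2, axis=0)  # CSP-style split into two channel halves
    for _ in range(n_blocks):
        # ELAN-style chain: each block consumes the most recent output
        parts.append(compute_block(parts[-1], c_block))
    aggregated = np.concatenate(parts, axis=0)  # keep every intermediate output
    return conv1x1(aggregated, c_out), aggregated.shape[0]  # transition layer fuses them


x = rng.normal(size=(64, 20, 20))
out, aggregated_channels = gelan_like_block(x, c_hidden=64, c_block=32, c_out=64)
print(out.shape, aggregated_channels)  # (64, 20, 20) 128
```

Concatenating every intermediate output (32 + 32 + 32 + 32 = 128 channels here) before the transition layer is what lets shallow and deep features reach the output side by side instead of only through the last block.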

3.2 Data Processing and Augmentation

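Figure 2 shows the flowchart of the experiments conducted in this study. Because the publishers of the GRAZPEDWRI-DX [20] dataset do not provide predefined training, validation, and test sets, we randomly assign 70% of the images to the training set, 20% to the validation set, and 10% to the test set. In addition, because low-feature X-ray images have limited brightness diversity, a model trained only on these images may not generalize well to X-ray images acquired in other settings. To make the model more robust, we use data augmentation to expand the training set. Specifically, we adjust the contrast and brightness of the X-ray images with the addWeighted function from the OpenCV library.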
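The two data-processing steps can be sketched in Python. The 70/20/10 ratios come from the text; the seed, file names, and contrast/brightness values are hypothetical, since the exact parameters passed to OpenCV are not given. To keep the example light, `adjust_contrast_brightness` reimplements in NumPy the saturating linear transform that `cv2.addWeighted(img, contrast, np.zeros_like(img), 0, brightness)` computes for a uint8 image (addWeighted returns `src1*alpha + src2*beta + gamma`); rounding of exact .5 ties may differ slightly from OpenCV.

```python
import random

import numpy as np


def split_dataset(items, train_frac=0.7, val_frac=0.2, seed=0):
    """Shuffle items and split into train/val/test; the test set takes the remainder."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


def adjust_contrast_brightness(img: np.ndarray, contrast: float, brightness: float) -> np.ndarray:
    """Return saturate(contrast * img + brightness) as uint8.

    Intended to mirror cv2.addWeighted(img, contrast, np.zeros_like(img), 0, brightness).
    """
    out = img.astype(np.float64) * contrast + brightness
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


# Hypothetical file names standing in for the 20,327 GRAZPEDWRI-DX images
names = [f"img_{i:05d}.png" for i in range(20327)]
train_set, val_set, test_set = split_dataset(names)
print(len(train_set), len(val_set), len(test_set))  # 14228 4065 2034

# Hypothetical 1x3 "X-ray" and augmentation parameters
xray = np.array([[0, 100, 200]], dtype=np.uint8)
print(adjust_contrast_brightness(xray, contrast=1.2, brightness=30))  # [[ 30 150 255]]
```

Because this split is made at the image level, images from the same patient can land in different subsets; splitting by patient ID is a common alternative when leakage between training and test data is a concern.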

Table 2: Quantitative comparison with other state-of-the-art fracture detection models on the GRAZPEDWRI-DX dataset when the input image size is 1024.

| Model | Params (M) | FLOPs (G) | mAP50 (%) | mAP50-95 (%) | Speed† (ms) |
|---|---|---|---|---|---|
| YOLOv8 [16] | 43.61 | 164.9 | 63.63 | 40.41 | 7.7 |
| YOLOv8+SA [22] | 43.64 | 165.4 | 64.25 | 41.64 | 8.0 |
| YOLOv8+ECA [23] | 43.64 | 165.5 | 64.26 | 41.94 | 7.7 |
| YOLOv8+GAM [24] | 49.29 | 183.5 | 64.26 | 41.00 | 12.7 |
| YOLOv8+ResGAM [38] | 49.29 | 183.5 | 64.98 | 41.75 | 18.1 |
| YOLOv8+ResCBAM [25] | 53.87 | 196.2 | 65.78 | 42.16 | 8.7 |
| YOLOv9-C (Ours) | 51.02 | 239.0 | 65.57 | 43.70 | 12.7 |
| YOLOv9-E (Ours) | 69.42 | 244.9 | 65.62 | 43.73 | 16.1 |

Note: All YOLOv8 models and variants listed in the table use the large model size. † Speed is the total time for preprocessing, inference, and post-processing.
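The percentage gains quoted in the abstract and in the results section are relative improvements in mAP50-95, not percentage-point differences. The short check below reproduces them from the table values; the function name is ours.

```python
def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# mAP50-95 (%) of YOLOv9-E and of the best YOLOv8 variant at each input size
gains = {
    "640 (YOLOv9-E vs. YOLOv8+SA)": relative_gain(43.32, 41.49),
    "1024 (YOLOv9-E vs. YOLOv8+ResCBAM)": relative_gain(43.73, 42.16),
}
for setting, gain in gains.items():
    print(f"{setting}: +{gain:.1f}%")
# 640 (YOLOv9-E vs. YOLOv8+SA): +4.4%
# 1024 (YOLOv9-E vs. YOLOv8+ResCBAM): +3.7%
```

The corresponding absolute gains are 1.83 and 1.57 percentage points.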

4 Experiments

4.1 Dataset

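GRAZPEDWRI-DX [20] is a public dataset provided by the Medical University of Graz that contains 20,327 X-ray images of pediatric wrist trauma. The images were collected by a team of pediatric radiologists at the University Hospital Graz between 2008 and 2018. The dataset covers 6,091 patients and 10,643 studies, with 74,459 image tags and 67,771 labeled objects.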

4.2 Experimental Setup

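All experiments were run on a single NVIDIA GeForce RTX 3090 GPU using Python and the PyTorch framework. Before training, we initialized the models with YOLOv9 weights pretrained on the MS COCO 2017 [27] dataset. We trained the models with the SGD [41] optimizer, using a weight decay of 5e-4 and a momentum of 0.937. Following [21], we set the initial learning rate to 1e-2 and the number of epochs to 100. Because of the memory limit of a single GPU (24 GB), we used a batch size of 16.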
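For reference, the update performed by SGD with momentum and L2 weight decay can be written out in plain Python with the hyperparameters above. The sketch follows the update rule documented for `torch.optim.SGD` with default dampening and without Nesterov momentum; the actual YOLOv9 training code may additionally use learning-rate warmup and scheduling, Nesterov momentum, or per-parameter-group weight decay, none of which is shown. The class name is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SGDMomentum:
    """SGD with momentum and L2 weight decay (torch.optim.SGD rule, dampening=0, nesterov=False).

    Defaults are the hyperparameters reported in the experimental setup.
    """

    lr: float = 1e-2
    momentum: float = 0.937
    weight_decay: float = 5e-4
    velocity: list[float] = field(default_factory=list)

    def step(self, params: list[float], grads: list[float]) -> list[float]:
        if not self.velocity:
            self.velocity = [0.0] * len(params)
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            g = g + self.weight_decay * p                            # weight decay added to the gradient
            self.velocity[i] = self.momentum * self.velocity[i] + g  # momentum buffer
            updated.append(p - self.lr * self.velocity[i])           # parameter update
        return updated


opt = SGDMomentum()
params = [1.0]
for _ in range(2):
    params = opt.step(params, grads=[0.5])  # constant toy gradient
print(params)
```

With a momentum of 0.937 the buffer accumulates past gradients strongly, so the second step above is almost twice as large as the first even though the gradient did not change.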

4.3 Experimental Results

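To evaluate YOLOv9 and other SOTA models in realistic diagnostic scenarios, this study compares model size (number of parameters and floating-point operations, FLOPs), accuracy (F1 score, mean average precision at an IoU threshold of 50% (mAP50), and mean average precision averaged over IoU thresholds from 50% to 95% (mAP50-95)), and inference time. It is generally accepted that larger input images improve prediction accuracy but require more computing resources. We therefore ran two experiments for different scenarios, with input image sizes of 640 and 1024; the results are shown in Tables 1 and 2. With an input size of 640, both YOLOv9-C (compact) and YOLOv9-E (extended) achieve markedly higher mAP while keeping inference speed reasonable. Specifically, YOLOv9-E reaches an mAP50-95 of 43.32%, a relative improvement of 4.4% over the 41.49% of the previous SOTA model, YOLOv8+SA. With an input size of 1024, YOLOv9-E reaches an mAP50-95 of 43.73%, again achieving SOTA performance. However, because of its longer inference time, this configuration is better suited to devices with more computing resources.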
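The accuracy metrics above rest on intersection over union (IoU). The sketch below, with hypothetical box coordinates, shows the IoU computation, the F1 score as the harmonic mean of precision and recall, and the ten IoU thresholds (0.50 to 0.95 in steps of 0.05, the COCO convention) over which mAP50-95 averages AP; mAP50 uses only the 0.50 threshold. Computing AP itself (ranking predictions by confidence and integrating the precision-recall curve) is omitted, and the function names are ours.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Ten IoU thresholds 0.50, 0.55, ..., 0.95: mAP50-95 averages AP over all of them,
# while mAP50 uses only the first.
IOU_THRESHOLDS = [round(0.50 + 0.05 * k, 2) for k in range(10)]

prediction = (0, 0, 100, 100)   # hypothetical predicted fracture box
ground_truth = (0, 0, 100, 68)  # hypothetical annotated box
overlap = iou(prediction, ground_truth)
hits = [t for t in IOU_THRESHOLDS if overlap >= t]
print(overlap, hits)  # 0.68 [0.5, 0.55, 0.6, 0.65]
```

A detection with IoU 0.68 counts as a true positive at only four of the ten thresholds, which is why mAP50-95 rewards tight localization more than mAP50 does.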

5 Conclusion

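YOLO models can serve as CAD tools that assist radiologists and surgeons in interpreting X-ray images. However, because X-ray images carry sparse features, the predictions of previous models have often been unsatisfactory. This paper is the first to apply YOLOv9 to fracture detection, using the newly proposed PGI and GELAN to address information loss during model training. Experimental results show that YOLOv9 achieves SOTA performance on the GRAZPEDWRI-DX dataset, demonstrating the effectiveness of this approach.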

This work was supported by the National Science and Technology Council of Taiwan under grant NSTC 112-2221-E-032-037-MY2.

References

[1] Chung, S.W., et al.: Automated detection and classification of the proximal humerus fracture by using deep learning algorithm. Acta Orthopaedica 89(4), 468–473 (2018)
[2] Choi, J.W., et al.: Using a dual-input convolutional neural network for automated detection of pediatric supracondylar fracture on conventional radiography. Investigative Radiology 55(2), 101–110 (2020)
[3] Tanzi, L., et al.: Hierarchical fracture classification of proximal femur X-ray images using a multistage deep learning approach. European Journal of Radiology 133, 109373 (2020)
[4] Adams, S.J., et al.: Artificial intelligence solutions for analysis of X-ray images. Canadian Association of Radiologists Journal 72(1), 60–72 (2021)
[5] Gan, K., et al.: Artificial intelligence detection of distal radius fractures: a comparison between the convolutional neural network and professional assessments. Acta Orthopaedica 90(4), 394–400 (2019)
[6] Yahalomi, E., Chernofsky, M., Werman, M.: Detection of distal radius fractures trained by a small set of X-ray images and Faster R-CNN. In: Intelligent Computing: Proceedings of the 2019 Computing Conference, Volume 1, pp. 971–981. Springer (2019)
[7] Blüthgen, C., et al.: Detection and localization of distal radius fractures: Deep learning system versus radiologists. European Journal of Radiology 126, 108925 (2020)
[8] Redmon, J., et al.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
[9] Bochkovskiy, A., Wang, C.Y., Liao, H.Y.M.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
[10] Ge, Z., et al.: YOLOX: Exceeding YOLO series in 2021. arXiv preprint arXiv:2107.08430 (2021)
[11] Wang, C.Y., Yeh, I.H., Liao, H.Y.M.: You only learn one representation: Unified network for multiple tasks. arXiv preprint arXiv:2105.04206 (2021)
[12] Jocher, G.: Ultralytics YOLOv5. GitHub. https://github.com/ultralytics/yolov5 (2022)
[13] Li, C., et al.: YOLOv6: A single-stage object detection framework for industrial applications. arXiv preprint arXiv:2209.02976 (2022)
[14] Ju, R.Y., et al.: Resolution enhancement processing on low quality images using swin transformer based on interval dense connection strategy. Multimedia Tools and Applications, pp. 1–17 (2023)
[15] Wang, C.Y., Bochkovskiy, A., Liao, H.Y.M.: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7464–7475 (2023)
[16] Jocher, G.: Ultralytics YOLOv8. GitHub. https://github.com/ultralytics/ultralytics (2023)
[17] Samothai, P., et al.: The evaluation of bone fracture detection of YOLO series. In: 2022 37th International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC), pp. 1054–1057. IEEE (2022)
[18] Hržić, F., et al.: Fracture recognition in paediatric wrist radiographs: An object detection approach. Mathematics 10(16), 2939 (2022)
[19] Su, Z., et al.: Skeletal fracture detection with deep learning: A comprehensive review. Diagnostics 13(20), 3245 (2023)
[20] Nagy, E., et al.: A pediatric wrist trauma X-ray dataset (GRAZPEDWRI-DX) for machine learning. Scientific Data 9(1), 222 (2022)
[21] Ju, R.Y., Cai, W.: Fracture detection in pediatric wrist trauma X-ray images using YOLOv8 algorithm. arXiv preprint arXiv:2304.05071 (2023)
[22] Zhang, Q.L., Yang, Y.B.: SA-Net: Shuffle attention for deep convolutional neural networks. In: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2235–2239. IEEE (2021)
[23] Wang, Q., et al.: ECA-Net: Efficient channel attention for deep convolutional neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11534–11542 (2020)
[24] Liu, Y., Shao, Z., Hoffmann, N.: Global attention mechanism: Retain information to enhance channel-spatial interactions. arXiv preprint arXiv:2112.05561 (2021)
[25] Woo, S., et al.: CBAM: Convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19 (2018)
[26] Wang, C.Y., Yeh, I.H., Liao, H.Y.M.: YOLOv9: Learning what you want to learn using programmable gradient information. arXiv preprint arXiv:2402.13616 (2024)
[27] Lin, T.Y., et al.: Microsoft COCO: Common objects in context. In: Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pp. 740–755. Springer (2014)
[28] Son, D.M., et al.: Combined deep learning techniques for mandibular fracture diagnosis assistance. Life 12(11), 1711 (2022)
[29] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, pp. 234–241. Springer (2015)
[30] Jeon, Y.D., et al.: Deep learning model based on you only look once algorithm for detection and visualization of fracture areas in three-dimensional skeletal images. Diagnostics 14(1), 11 (2023)
[31] Ahmed, A., et al.: Enhancing wrist abnormality detection with YOLO: Analysis of state-of-the-art single-stage detection models. Biomedical Signal Processing and Control 93, 106144 (2024)
[32] Warin, K., et al.: Assessment of deep convolutional neural network models for mandibular fracture detection in panoramic radiographs. International Journal of Oral and Maxillofacial Surgery 51(11), 1488–1494 (2022)
[33] Gaikwad, D., et al.: Identification of cervical spine fracture using deep learning. Australian Journal of Multi-Disciplinary Engineering, pp. 1–9 (2024)
[34] Zou, J., Arshad, M.R.: Detection of whole body bone fractures based on improved YOLOv7. Biomedical Signal Processing and Control 91, 105995 (2024)
[35] Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)
[36] Abedeen, I., et al.: FracAtlas: A dataset for fracture classification, localization and segmentation of musculoskeletal radiographs. Scientific Data 10(1), 521 (2023)
[37] Moon, G., et al.: Computer aided facial bone fracture diagnosis (CA-FBFD) system based on object detection model. IEEE Access 10, 79061–79070 (2022)
[38] Chien, C.T., et al.: YOLOv8-AM: YOLOv8 with attention mechanisms for pediatric wrist fracture detection. arXiv preprint arXiv:2402.09329 (2024)
[39] Wang, C.Y., et al.: CSPNet: A new backbone that can enhance learning capability of CNN. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 390–391 (2020)
[40] Wang, C.Y., Liao, H.Y.M., Yeh, I.H.: Designing network design strategies through gradient path analysis. arXiv preprint arXiv:2211.04800 (2022)
[41] Ruder, S.: An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747 (2016)