Teeth-SEG: An Efficient Instance Segmentation Framework for Orthodontic Treatment based on Multi-Scale Aggregation and Anthropic Prior Knowledge

Bo Zou1, Shaofeng Wang2, Hao Liu1, Gaoyue Sun5, Yajie Wang1,4, FeiFei Zuo4, Chengbin Quan1, Youjian Zhao1,3,
1Tsinghua University, 2Capital Medical University, 3Zhongguancun Laboratory, 4LargeV Inc., 5Imperial College London

Abstract

Teeth localization, segmentation, and labeling in 2D images have great potential in modern dentistry to enhance dental diagnostics, treatment planning, and population-based studies on oral health. However, general instance segmentation frameworks fall short due to 1) the subtle shape differences between some teeth (e.g., the maxillary first and second premolars), 2) the variation in tooth position and shape across subjects, and 3) the presence of abnormalities in the dentition (e.g., caries and edentulism).

To address these problems, we propose a ViT-based framework named TeethSEG, which consists of stacked Multi-Scale Aggregation (MSA) blocks and an Anthropic Prior Knowledge (APK) layer. Specifically, to compose the two modules, we design 1) a unique permutation-based upscaler that ensures high efficiency while establishing clear segmentation boundaries, and 2) multi-head self/cross-gating layers that emphasize particular semantics while maintaining the divergence between token embeddings.
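The permutation-based upscaler can be illustrated with a pixel-shuffle-style channel-to-space rearrangement, which is one common parameter-free way to upscale a token grid. The sketch below is an assumption about the general mechanism, not the paper's exact implementation; the function name and shapes are hypothetical.

```python
import numpy as np

def permutation_upscale(tokens, grid_hw, r):
    """Upscale a token grid by factor r via a channel-to-space permutation
    (pixel-shuffle style). tokens: (H*W, C*r*r) -> (H*r * W*r, C).
    Hypothetical sketch; not the paper's exact upscaler."""
    H, W = grid_hw
    N, Cr2 = tokens.shape
    assert N == H * W and Cr2 % (r * r) == 0
    C = Cr2 // (r * r)
    x = tokens.reshape(H, W, r, r, C)   # split channels into an r x r spatial block
    x = x.transpose(0, 2, 1, 3, 4)      # (H, r, W, r, C): interleave blocks spatially
    return x.reshape(H * r * W * r, C)  # flatten back to a token sequence

# Toy run: a 2x2 grid of 8-dim tokens upscaled 2x to a 4x4 grid of 2-dim tokens.
toks = np.arange(2 * 2 * 8, dtype=np.float32).reshape(4, 8)
up = permutation_upscale(toks, (2, 2), 2)
print(up.shape)  # (16, 2)
```

Because the operation is a pure permutation of existing values, it adds no parameters and preserves all channel information, which is consistent with the efficiency claim above.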

Besides, we collect IO150K, the first open-source intraoral image dataset, which comprises over 150K intraoral photos, all annotated by orthodontists using a human-machine hybrid algorithm.

Experiments on IO150K demonstrate that our TeethSEG outperforms the state-of-the-art segmentation models on dental image segmentation.

The largest 2D intraoral scan dataset: IO150K

We create the first open-source 2D intraoral scan dataset, IO150K, which consists of:

(1) Challenge80K, 80K rendered images generated from 1,800 3D scans sourced from the 3D Teeth Scan Segmentation and Labeling Challenge 2023.
(2) Plaster70K, 70K images of 940 oral plaster models made before, during, and after orthodontic treatment.
(3) RGB0.8K, 0.8K RGB standard intraoral photos taken before orthodontic treatment.


This dataset has the following key properties:

(1) Large: We have collected over 150K images, enough to properly train transformers, which are typically more data-hungry than CNN models.
(2) Diverse: We cover a wide range of dental malformations (e.g., crowded dentition and edentulism) to ensure the ability to generalize to clinical applications.
(3) Professional: The data is annotated by multiple professional orthodontists using a human-machine hybrid algorithm, ensuring accurate tooth position recognition in complex instances. Please see Appendix A for dataset statistics.


Illustration of human-machine hybrid annotation process

Labeling method

Data Samples


Data statistics


Pipeline of TeethSEG

We utilize a pretrained encoder to project an intraoral image into a sequence of visual tokens, and a set of trainable class tokens to predict segmentation masks. The multi-scale aggregation (MSA) blocks efficiently aggregate the visual information into class tokens, and the anthropic prior knowledge (APK) layer imposes human judgment into the mask prediction.
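The aggregation step above can be sketched as a single cross-attention update in which trainable class tokens query the visual tokens, followed by a similarity-based mask prediction. This is a minimal numpy illustration under assumed shapes (196 patch tokens, 32 tooth classes, 64-dim embeddings); it is not the paper's exact MSA block, and all names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_class_tokens(class_tokens, visual_tokens):
    """One cross-attention step: class tokens query visual tokens,
    pulling image evidence into each per-tooth class embedding."""
    d = class_tokens.shape[-1]
    attn = softmax(class_tokens @ visual_tokens.T / np.sqrt(d), axis=-1)
    return class_tokens + attn @ visual_tokens  # residual update

def predict_masks(class_tokens, visual_tokens):
    """Mask logits as class-token / visual-token similarity:
    each of the N visual tokens gets a score per tooth class."""
    return class_tokens @ visual_tokens.T  # (num_classes, num_patches)

rng = np.random.default_rng(0)
vis = rng.standard_normal((196, 64))  # e.g., 14x14 patch tokens from the encoder
cls = rng.standard_normal((32, 64))   # one trainable token per tooth class
cls = aggregate_class_tokens(cls, vis)
masks = predict_masks(cls, vis)
print(masks.shape)  # (32, 196)
```

Each row of the resulting logit matrix can be reshaped back to the patch grid and upsampled to form the per-tooth segmentation mask.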



Qualitative Results


Comparisons on the IO150K o.o.d. test

Compared with previous segmentation methods, TeethSEG better handles complex situations such as missing teeth and irregular tooth arrangements.

Results on o.o.d. test

Performance under severe dental abnormalities

Results on dental abnormalities

Comparisons on the IO150K RGB test

2D images can efficiently reveal dental crowding, dentition spacing, missing teeth, narrow dental arches, and midline deviation. Considering the cost of 3D scans and CBCT, the radiation dose CBCT delivers to patients, and the fact that neither is a routine examination, we advocate using 2D images for early orthodontic warning and 3D data for treatment planning. TeethSEG adapts quickly to 2D intraoral images by fine-tuning on a small amount of annotated data.

Results on RGB test

Quantitative Results


Comparisons on the IO150K i.i.d. test

Results on i.i.d. test

Comparisons on the IO150K o.o.d. test

Results on o.o.d. test

Comparisons on the IO150K RGB test

Results on RGB test

BibTeX

@inproceedings{anonymous2024teethseg,
  title={Teeth-{SEG}: An Efficient Instance Segmentation Framework for Orthodontic Treatment based on Anthropic Prior Knowledge},
  author={Anonymous},
  booktitle={Conference on Computer Vision and Pattern Recognition 2024},
  year={2024},
  url={https://openreview.net/forum?id=P6tNXNycrT}
}