Segment Anything: The Development of Foundation Vision Models

Journal: Advances in Computer and Autonomous Intelligence Research DOI: 10.12238/acair.v3i1.11913

李昊喆, 王志豪, 黄利萍

Software College, Northeastern University

Abstract

This paper surveys the development of foundation vision models, focusing on the Segment Anything Model (SAM), a prompt-driven, general-purpose image segmentation model proposed by Meta AI. Through large-scale pre-training, SAM remains computationally efficient while processing high-resolution images and exhibits strong zero-shot generalization. Its core components are an image encoder, a prompt encoder, and a mask decoder, which together enable efficient feature extraction, flexible prompt handling, and high-quality segmentation mask generation. Application cases in medical imaging and remote sensing demonstrate SAM's strong performance on segmentation tasks with clear boundaries and simple structures, while also revealing the challenges it faces in complex scenes. Future research directions include domain adaptation and model optimization, multimodal fusion and cross-domain application, and more efficient prompt-engineering techniques, all of which will help extend SAM's capabilities and range of applicability.
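To make the three-component pipeline described above concrete, the following is a minimal toy sketch of prompt-driven segmentation in plain NumPy. All function names here (`encode_image`, `encode_prompt`, `decode_mask`) are hypothetical stand-ins for illustration only; they do not reproduce SAM's actual ViT encoder, positional prompt embeddings, or transformer mask decoder, only the flow of data between the three stages.

```python
import numpy as np

def encode_image(image, dim=8):
    # Toy stand-in for SAM's image encoder (a ViT in the real model):
    # lifts an HxW intensity image to a per-pixel feature map of depth `dim`.
    rng = np.random.default_rng(0)
    proj = rng.standard_normal((1, dim))
    return image[..., None] * proj  # shape (H, W, dim)

def encode_prompt(features, point):
    # Toy stand-in for the prompt encoder: a point-click prompt is
    # represented here simply by the image feature at that location.
    y, x = point
    return features[y, x]  # shape (dim,)

def decode_mask(features, prompt_embedding, threshold=0.9):
    # Toy stand-in for the mask decoder: cosine similarity between the
    # prompt embedding and every pixel feature, thresholded into a mask.
    feat = features / (np.linalg.norm(features, axis=-1, keepdims=True) + 1e-8)
    query = prompt_embedding / (np.linalg.norm(prompt_embedding) + 1e-8)
    return (feat @ query) > threshold

# A two-region toy "image": a bright square on a dark background.
image = np.zeros((16, 16))
image[4:12, 4:12] = 1.0

features = encode_image(image)
prompt = encode_prompt(features, point=(8, 8))  # click inside the square
mask = decode_mask(features, prompt)
print(mask[8, 8], mask[0, 0])  # True False: the clicked region is selected
```

The division of labor mirrors SAM's design rationale: the expensive image encoding runs once per image, while the cheap prompt encoding and mask decoding can run interactively for each new prompt.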

Keywords

Segment Anything Model; prompt-driven segmentation; multimodal fusion


Copyright © 2025 李昊喆, 王志豪, 黄利萍

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License