Figure 3 from Merlin: A Vision Language Foundation Model for 3D Computed Tomography | Semantic Scholar (2024)

  • Corpus ID: 270371846
@inproceedings{Blankemeier2024MerlinAV,
  title={Merlin: A Vision Language Foundation Model for 3D Computed Tomography},
  author={Louis Blankemeier and Joseph Paul Cohen and Ashwin Kumar and Dave Van Veen and Syed Jamal Safdar Gardezi and Magdalini Paschali and Zhihong Chen and Jean-Benoit Delbrouck and Eduardo Pontes Reis and Cesar Augusto Madid Truyts and Christian Bluethgen and Malte E. K. Jensen and Sophie Ostmeier and Maya Varma and Jeya Maria Jose Valanarasu and Zhongnan Fang and Zepeng Huo and Zaid Nabulsi and Diego Ardila and Wei-Hung Weng and Edson Amaro Junior and Neera Ahuja and Jason Alan Fries and Nigam H. Shah and Andrew Johnston and Robert D. Boutin and Andrew Wentland and Curtis P. Langlotz and Jason Hom and Sergios Gatidis and Akshay S. Chaudhari},
  year={2024},
  url={https://api.semanticscholar.org/CorpusID:270371846}
}
  • Louis Blankemeier, Joseph Paul Cohen, Akshay S. Chaudhari
  • Published 10 June 2024
  • Computer Science, Medicine

This work introduces Merlin, a 3D VLM trained on paired CT scans, EHR diagnosis codes, and radiology reports, and derives data scaling laws to empirically assess the training data needed for the requisite downstream task performance.
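The summary above describes Merlin's training signal (paired CT volumes, EHR diagnosis codes, and radiology reports) only at a high level. As a hedged illustration, and not the authors' actual implementation, the sketch below shows one plausible way to combine an image-report contrastive objective with multi-label supervision from diagnosis codes; every name in it (image_emb, code_logits, the 0.07 temperature, and so on) is an assumption.

```python
import torch
import torch.nn.functional as F

def dual_objective_loss(image_emb, report_emb, code_logits, code_targets, temperature=0.07):
    """Hypothetical sketch of a Merlin-like dual objective (not the authors' code).

    image_emb:    (B, D) embeddings of 3D CT volumes
    report_emb:   (B, D) embeddings of the paired radiology reports
    code_logits:  (B, C) predictions for C EHR diagnosis codes from the image branch
    code_targets: (B, C) multi-hot labels built from the patient's diagnosis codes
    """
    # CLIP-style symmetric contrastive loss between CT volumes and reports.
    image_emb = F.normalize(image_emb, dim=-1)
    report_emb = F.normalize(report_emb, dim=-1)
    logits = image_emb @ report_emb.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    contrastive = (F.cross_entropy(logits, targets) +
                   F.cross_entropy(logits.t(), targets)) / 2

    # Multi-label supervision from structured EHR diagnosis codes.
    code_loss = F.binary_cross_entropy_with_logits(code_logits, code_targets.float())

    return contrastive + code_loss
```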

Figures and Tables from this paper

  • The paper includes 8 figures and 11 tables (Figures 1–8, Tables 1–11).

89 References

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
    João Carreira, Andrew Zisserman

    Computer Science

    2017 IEEE Conference on Computer Vision and…

  • 2017

I3D models considerably improve upon the state of the art in action classification, reaching 80.2% on HMDB-51 and 97.9% on UCF-101 after pre-training on Kinetics, and a new Two-Stream Inflated 3D ConvNet based on 2D ConvNet inflation is introduced (a minimal sketch of the inflation idea follows this entry).

  • 6,787
  • Highly Influential
  • [PDF]
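The "2D ConvNet inflation" mentioned above bootstraps 3D filters from pretrained 2D ones. The snippet below is a minimal sketch of that idea under its usual reading (tile each 2D kernel along the new depth axis and rescale by the depth); the shapes and names are assumptions, not the I3D reference code.

```python
import torch

def inflate_conv2d_weight(weight_2d: torch.Tensor, depth: int) -> torch.Tensor:
    """Inflate a pretrained 2D conv kernel (out, in, kH, kW) into a 3D kernel
    (out, in, depth, kH, kW) by tiling along the new axis and rescaling by 1/depth,
    so a temporally constant input yields the same response as the original 2D filter.
    Sketch of the I3D bootstrapping idea, not the reference implementation."""
    weight_3d = weight_2d.unsqueeze(2).repeat(1, 1, depth, 1, 1)
    return weight_3d / depth

# Example: inflate a hypothetical 7x7 ImageNet-pretrained kernel to 7x7x7.
w2d = torch.randn(64, 3, 7, 7)
w3d = inflate_conv2d_weight(w2d, depth=7)
assert w3d.shape == (64, 3, 7, 7, 7)
```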
Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data
    J. Denny, L. Bastarache, D. Roden

    Medicine, Biology

    Nature Biotechnology

  • 2013

The first large-scale application of the phenome-wide association study (PheWAS) paradigm within electronic medical records (EMRs) is reported, an unbiased approach to replication and discovery that interrogates relationships between targeted genotypes and multiple phenotypes in EMR-based cohorts.

  • 834
  • Highly Influential
  • PDF
BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs
    Sheng Zhang, Yanbo Xu, Hoifung Poon

    Computer Science, Medicine

  • 2023

PMC-15M is a novel dataset that is two orders of magnitude larger than existing biomedical multimodal datasets such as MIMIC-CXR and spans a diverse range of biomedical image types; it is used to pretrain BiomedCLIP, a multimodal foundation model with domain-specific adaptations tailored to biomedical vision-language processing.

  • 40
  • Highly Influential
  • [PDF]
A Vertebral Segmentation Dataset with Fracture Grading
    M. Löffler, A. Sekuboyina, J. Kirschke

    Medicine, Computer Science

    Radiology: Artificial Intelligence

  • 2020

  • 117
  • Highly Influential
  • [PDF]
Screening and diagnosis of cardiovascular disease using artificial intelligence-enabled cardiac magnetic resonance imaging
    Yan-Ran Joyce Wang, Kai Yang, Shihua Zhao

    Medicine, Computer Science

    Nature medicine

  • 2024

A two-stage paradigm consisting of noninvasive cine-based CVD screening followed by cine and late gadolinium enhancement-based diagnosis is proposed, which holds the potential to substantially advance the efficiency and scalability of CMR interpretation, thereby improving CVD screening and diagnosis.

  • 1
  • PDF
What matters when building vision-language models?
    Hugo Laurençon, Léo Tronchon, Matthieu Cord, Victor Sanh

    Computer Science

    ArXiv

  • 2024

This work conducts extensive experiments around pre-trained models, architecture choice, data, and training methods, and develops Idefics2, an efficient foundational VLM of 8 billion parameters that achieves state-of-the-art performance within its size category across various multimodal benchmarks.

Vision–language foundation model for echocardiogram interpretation
    M. Christensen, Milos Vukadinovic, Neal Yuan, David Ouyang

    Medicine, Computer Science

    Nature medicine

  • 2024

EchoCLIP is a vision–language foundation model for echocardiography that learns the relationship between cardiac ultrasound images and the interpretations of expert cardiologists across a wide range of patients and indications for imaging.

  • 4
  • PDF
Automated abdominal CT contrast phase detection using an interpretable and open-source artificial intelligence algorithm.
    E. P. Reis, Louis Blankemeier, Akshay S. Chaudhari

    Medicine, Computer Science

    European radiology

  • 2024

An open-source and interpretable AI algorithm detects contrast phases in abdominal CT scans with high accuracy and F1 scores in internal and external validation, confirming its generalization capability.

  • 1
A foundation model utilizing chest CT volumes and radiology reports for supervised-level zero-shot detection of abnormalities
    Ibrahim Ethem Hamamci, Sezgin Er, Bjoern H. Menze

    Medicine, Computer Science

    ArXiv

  • 2024

This study introduces CT-RATE, the first 3D medical imaging dataset that pairs images with textual reports, and develops CT-CLIP, a CT-focused contrastive language-image pre-training framework that outperforms state-of-the-art, fully supervised methods in multi-abnormality detection across all key metrics, thus eliminating the need for manual annotation.

...

...

    Figure 3: Phenotype classification. (a) Average AUROC performance for the top 20 phenotype groups listed in order of prevalence (black line). (b) Data scaling law experiments that measure how average AUROC (top) and…

    Published in 2024

    Merlin: A Vision Language Foundation Model for 3D Computed Tomography

    Louis Blankemeier, Joseph Paul Cohen, Akshay S. Chaudhari
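The figure caption above refers to data scaling law experiments relating average AUROC to training-set size. This page does not give the functional form used in the paper; as a hedged illustration only, the sketch below fits a generic saturating power law with scipy's curve_fit, using made-up placeholder numbers rather than the paper's results.

```python
import numpy as np
from scipy.optimize import curve_fit

def saturating_power_law(n, a, b, c):
    """AUROC(n) = a - b * n**(-c): approaches a ceiling `a` as training size n grows.
    One common parameterization for scaling-law fits; not necessarily the paper's."""
    return a - b * np.power(n, -c)

# Placeholder (hypothetical) measurements: training-set sizes and average AUROC.
n_train = np.array([1_000, 5_000, 10_000, 25_000], dtype=float)
auroc = np.array([0.70, 0.76, 0.79, 0.81])

params, _ = curve_fit(saturating_power_law, n_train, auroc, p0=[0.85, 1.0, 0.3], maxfev=10_000)
a, b, c = params
print(f"fitted ceiling ~{a:.3f}; "
      f"predicted AUROC at 100k scans ~{saturating_power_law(100_000, a, b, c):.3f}")
```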
