Figure 3 from Merlin: A Vision Language Foundation Model for 3D Computed Tomography | Semantic Scholar (2024)

  • Corpus ID: 270371846
@inproceedings{Blankemeier2024MerlinAV,
  title={Merlin: A Vision Language Foundation Model for 3D Computed Tomography},
  author={Louis Blankemeier and Joseph Paul Cohen and Ashwin Kumar and Dave Van Veen and Syed Jamal Safdar Gardezi and Magdalini Paschali and Zhihong Chen and Jean-Benoit Delbrouck and Eduardo Pontes Reis and Cesar Augusto Madid Truyts and Christian Bluethgen and Malte E. K. Jensen and Sophie Ostmeier and Maya Varma and Jeya Maria Jose Valanarasu and Zhongnan Fang and Zepeng Huo and Zaid Nabulsi and Diego Ardila and Wei-Hung Weng and Edson Amaro Junior and Neera Ahuja and Jason Alan Fries and Nigam H. Shah and Andrew Johnston and Robert D. Boutin and Andrew Wentland and Curtis P. Langlotz and Jason Hom and Sergios Gatidis and Akshay S. Chaudhari},
  year={2024},
  url={https://api.semanticscholar.org/CorpusID:270371846}
}
  • Louis Blankemeier, Joseph Paul Cohen, Akshay S. Chaudhari
  • Published 10 June 2024
  • Computer Science, Medicine

This work introduces Merlin, a 3D VLM trained on paired CT scans, EHR diagnosis codes, and radiology reports, and derives data scaling laws to empirically assess the training data needed for the requisite downstream task performance.
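The summary above describes Merlin's training signal (paired CT volumes, EHR diagnosis codes, and radiology reports) only at a high level. As a hedged illustration, and not the authors' actual implementation, the sketch below shows one plausible way to combine an image-report contrastive objective with multi-label supervision from diagnosis codes; every name in it (image_emb, code_logits, the 0.07 temperature, and so on) is an assumption.

```python
import torch
import torch.nn.functional as F

def dual_objective_loss(image_emb, report_emb, code_logits, code_targets, temperature=0.07):
    """Hypothetical sketch of a Merlin-like dual objective (not the authors' code).

    image_emb:    (B, D) embeddings of 3D CT volumes
    report_emb:   (B, D) embeddings of the paired radiology reports
    code_logits:  (B, C) predictions for C EHR diagnosis codes from the image branch
    code_targets: (B, C) multi-hot labels built from the patient's diagnosis codes
    """
    # CLIP-style symmetric contrastive loss between CT volumes and reports.
    image_emb = F.normalize(image_emb, dim=-1)
    report_emb = F.normalize(report_emb, dim=-1)
    logits = image_emb @ report_emb.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    contrastive = (F.cross_entropy(logits, targets) +
                   F.cross_entropy(logits.t(), targets)) / 2

    # Multi-label supervision from structured EHR diagnosis codes.
    code_loss = F.binary_cross_entropy_with_logits(code_logits, code_targets.float())

    return contrastive + code_loss
```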

Figures and Tables from this paper

  • The paper includes 8 figures and 11 tables (Figures 1–8, Tables 1–11).

89 References

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
    João Carreira, Andrew Zisserman

    Computer Science

    2017 IEEE Conference on Computer Vision and…

  • 2017

I3D models considerably improve upon the state of the art in action classification, reaching 80.2% on HMDB-51 and 97.9% on UCF-101 after pre-training on Kinetics, and a new Two-Stream Inflated 3D ConvNet based on 2D ConvNet inflation is introduced (a minimal sketch of the inflation idea follows this entry).

  • 6,787
  • Highly Influential
  • [PDF]
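The "2D ConvNet inflation" mentioned above bootstraps 3D filters from pretrained 2D ones. The snippet below is a minimal sketch of that idea under its usual reading (tile each 2D kernel along the new depth axis and rescale by the depth); the shapes and names are assumptions, not the I3D reference code.

```python
import torch

def inflate_conv2d_weight(weight_2d: torch.Tensor, depth: int) -> torch.Tensor:
    """Inflate a pretrained 2D conv kernel (out, in, kH, kW) into a 3D kernel
    (out, in, depth, kH, kW) by tiling along the new axis and rescaling by 1/depth,
    so a temporally constant input yields the same response as the original 2D filter.
    Sketch of the I3D bootstrapping idea, not the reference implementation."""
    weight_3d = weight_2d.unsqueeze(2).repeat(1, 1, depth, 1, 1)
    return weight_3d / depth

# Example: inflate a hypothetical 7x7 ImageNet-pretrained kernel to 7x7x7.
w2d = torch.randn(64, 3, 7, 7)
w3d = inflate_conv2d_weight(w2d, depth=7)
assert w3d.shape == (64, 3, 7, 7, 7)
```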
Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data
    J. Denny, L. Bastarache, D. Roden

    Medicine, Biology

    Nature Biotechnology

  • 2013

The first large-scale application of the phenome-wide association study (PheWAS) paradigm within electronic medical records (EMRs) is reported, an unbiased approach to replication and discovery that interrogates relationships between targeted genotypes and multiple phenotypes in EMR-based cohorts.

  • 834
  • Highly Influential
  • PDF
BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs
    Sheng Zhang, Yanbo Xu, Hoifung Poon

    Computer Science, Medicine

  • 2023

PMC-15M is a novel dataset that is two orders of magnitude larger than existing biomedical multimodal datasets such as MIMIC-CXR and spans a diverse range of biomedical image types; it is used to pretrain BiomedCLIP, a multimodal foundation model with domain-specific adaptations tailored to biomedical vision-language processing.

  • 40
  • Highly Influential
  • [PDF]
A Vertebral Segmentation Dataset with Fracture Grading
    M. Löffler, A. Sekuboyina, J. Kirschke

    Medicine, Computer Science

    Radiology: Artificial Intelligence

  • 2020

  • 117
  • Highly Influential
  • [PDF]
Screening and diagnosis of cardiovascular disease using artificial intelligence-enabled cardiac magnetic resonance imaging
    Yan-Ran Joyce Wang, Kai Yang, Shihua Zhao

    Medicine, Computer Science

    Nature medicine

  • 2024

A two-stage paradigm consisting of noninvasive cine-based CVD screening followed by cine and late gadolinium enhancement-based diagnosis is proposed, which holds the potential to substantially advance the efficiency and scalability of CMR interpretation, thereby improving CVD screening and diagnosis.

  • 1
  • PDF
What matters when building vision-language models?
    Hugo Laurençon, Léo Tronchon, Matthieu Cord, Victor Sanh

    Computer Science

    ArXiv

  • 2024

This work conducts extensive experiments around pre-trained models, architecture choice, data, and training methods, and develops Idefics2, an efficient foundational VLM of 8 billion parameters that achieves state-of-the-art performance within its size category across various multimodal benchmarks.

Vision–language foundation model for echocardiogram interpretation
    M. Christensen, Milos Vukadinovic, Neal Yuan, David Ouyang

    Medicine, Computer Science

    Nature medicine

  • 2024

EchoCLIP is a vision–language foundation model for echocardiography that learns the relationship between cardiac ultrasound images and the interpretations of expert cardiologists across a wide range of patients and indications for imaging.

  • 4
  • PDF
Automated abdominal CT contrast phase detection using an interpretable and open-source artificial intelligence algorithm.
    E. P. Reis, Louis Blankemeier, Akshay S. Chaudhari

    Medicine, Computer Science

    European radiology

  • 2024

An open-source and interpretable AI algorithm detects contrast phases in abdominal CT scans with high accuracy and F1 scores in internal and external validation, confirming its generalization capability.

  • 1
A foundation model utilizing chest CT volumes and radiology reports for supervised-level zero-shot detection of abnormalities
    Ibrahim Ethem Hamamci, Sezgin Er, Bjoern H. Menze

    Medicine, Computer Science

    ArXiv

  • 2024

This study introduces CT-RATE, the first 3D medical imaging dataset that pairs images with textual reports, and develops CT-CLIP, a CT-focused contrastive language-image pre-training framework that outperforms state-of-the-art, fully supervised methods in multi-abnormality detection across all key metrics, thus eliminating the need for manual annotation.

...

...

    Figure 3: Phenotype classification. (a) Average AUROC performance for the top 20 phenotype groups listed in order of prevalence (black line). (b) Data scaling law experiments that measure how average AUROC (top) and…

    Published in 2024

    Merlin: A Vision Language Foundation Model for 3D Computed Tomography

    Louis Blankemeier, Joseph Paul Cohen, Akshay S. Chaudhari
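The figure caption above refers to data scaling law experiments relating average AUROC to training-set size. This page does not give the functional form used in the paper; as a hedged illustration only, the sketch below fits a generic saturating power law with scipy's curve_fit, using made-up placeholder numbers rather than the paper's results.

```python
import numpy as np
from scipy.optimize import curve_fit

def saturating_power_law(n, a, b, c):
    """AUROC(n) = a - b * n**(-c): approaches a ceiling `a` as training size n grows.
    One common parameterization for scaling-law fits; not necessarily the paper's."""
    return a - b * np.power(n, -c)

# Placeholder (hypothetical) measurements: training-set sizes and average AUROC.
n_train = np.array([1_000, 5_000, 10_000, 25_000], dtype=float)
auroc = np.array([0.70, 0.76, 0.79, 0.81])

params, _ = curve_fit(saturating_power_law, n_train, auroc, p0=[0.85, 1.0, 0.3], maxfev=10_000)
a, b, c = params
print(f"fitted ceiling ~{a:.3f}; "
      f"predicted AUROC at 100k scans ~{saturating_power_law(100_000, a, b, c):.3f}")
```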
