- Corpus ID: 270371846
@inproceedings{Blankemeier2024MerlinAV,
  title={Merlin: A Vision Language Foundation Model for 3D Computed Tomography},
  author={Louis Blankemeier and Joseph Paul Cohen and Ashwin Kumar and Dave Van Veen and Syed Jamal Safdar Gardezi and Magdalini Paschali and Zhihong Chen and Jean-Benoit Delbrouck and Eduardo Pontes Reis and Cesar Augusto Madid Truyts and Christian Bluethgen and Malte E. K. Jensen and Sophie Ostmeier and Maya Varma and Jeya Maria Jose Valanarasu and Zhongnan Fang and Zepeng Huo and Zaid Nabulsi and Diego Ardila and Wei-Hung Weng and Edson Amaro Junior and Neera Ahuja and Jason Alan Fries and Nigam H. Shah and Andrew Johnston and Robert D. Boutin and Andrew Wentland and Curtis P. Langlotz and Jason Hom and Sergios Gatidis and Akshay S. Chaudhari},
  year={2024},
  url={https://api.semanticscholar.org/CorpusID:270371846}
}
- Louis Blankemeier, Joseph Paul Cohen, Akshay S. Chaudhari
- Published 10 June 2024
- Computer Science, Medicine
This work introduces Merlin, a 3D vision-language model (VLM) trained on paired CT scans, EHR diagnosis codes, and radiology reports, and derives data scaling laws to empirically assess how much training data is needed to reach a required level of downstream task performance.
Figure 3: Phenotype classification. (a) Average AUROC performance for the top 20 phenotype groups listed in order of prevalence (black line). (b) Data scaling law experiments that measure how average AUROC (top) and…