Using World Models to Analyze Ultrasound Data
Category: Research Poster
Author(s): Benjamin Haddad
Presenter(s): Benjamin Haddad
Mentor(s): Nathaniel Blanchard
Computer vision, as a subfield of machine learning, has taken the spotlight in both the current literature and everyday applications. Exploring new facets of the field has led to major advances that bridge the gap between theory and practice. Innovations in world models, notably Meta's V-JEPA (and V-JEPA 2), allow computer vision practitioners to build solutions to diverse problems through publicly released models and pretrained weights. Many vision researchers aim to leverage these high-capacity world models to create pipelines that perform well in specific settings. The aim of this work was to implement an architecture that can, with high probability, predict compression and decompression of an ultrasound probe on human subjects. Automated detection of ultrasound probe compression has clinical implications for healthcare professionals globally. The architecture, built in PyTorch, uses V-JEPA 2 with the 'large' set of pretrained weights as an encoder and processor, followed by a two-layer neural network as a classification head that labels ultrasound image data as compression or non-compression. The dataset contains 22 ultrasound videos with accompanying manually labeled CSV files recording the times of compressions versus non-compressions. The set is divided into a 90/10 train-test split and then further segmented into compression and non-compression clips based on the labeled CSV files. The model is trained for 10 epochs using the Adam optimizer and cross-entropy loss. For reproducibility, torch is set with a manual seed of 42.
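The training setup described above can be sketched in PyTorch. This is a minimal illustration, not the poster's actual implementation: the V-JEPA 2 encoder is replaced here by a stand-in tensor of pooled clip embeddings, and the embedding width (1024), hidden width (256), batch size, and learning rate are assumed values not stated in the abstract. Only the seed (42), the two-layer head, the 10-epoch cycle, Adam, and cross-entropy loss come from the text.

```python
import torch
import torch.nn as nn

# Fixed seed for reproducibility, as in the abstract.
torch.manual_seed(42)


class CompressionHead(nn.Module):
    """Two-layer classification head over pooled encoder features.

    embed_dim and hidden_dim are illustrative choices; the abstract
    only specifies that the head has two layers and two classes
    (compression vs. non-compression).
    """

    def __init__(self, embed_dim=1024, hidden_dim=256, num_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, x):
        return self.net(x)


# Stand-in for frozen V-JEPA 2 clip embeddings: in the real pipeline
# these would come from the pretrained 'large' encoder.
features = torch.randn(8, 1024)
labels = torch.randint(0, 2, (8,))  # 0 = non-compression, 1 = compression

head = CompressionHead()
optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# 10-epoch training cycle, per the abstract.
for epoch in range(10):
    optimizer.zero_grad()
    logits = head(features)
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
```

In practice the encoder output would be pooled over the clip's spatiotemporal tokens before being passed to the head, and the 90/10 split plus CSV-driven clip segmentation would feed separate train and test loaders.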