About Me
Education:
- Ph.D., Physiology, Weill Medical College of Cornell University (2010)
- M.Eng., Computer Science, Cornell University (2005)
- B.S., Computer Science, Columbia University (2004)
Recent News:
- 10-10-2024: MedImageInsight, our open-source medical imaging embedding model, has been announced and featured in the press.
- 07-02-2024: Fully Authentic Visual Question Answering Dataset from Online Communities has been accepted to ECCV 2024
- 06-04-2024: U.S. Patent Office issues US 12001942B2, Biological Neuron to Electronic Computer Interface
- 10-07-2023: Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond has been accepted to Findings of EMNLP 2023
- 09-07-2023: Expert agreement on the presence and spatial localization of melanocytic features in dermoscopy published in Journal of Investigative Dermatology
- 07-27-2023: "A reinforcement learning model for AI-based decision support in skin cancer" has been published in Nature Medicine
- 05-02-2023: "UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding" has been accepted to Findings of ACL
- 03-27-2023: "What is Next in Multimodal Foundation Models" workshop accepted to ICCV 2023
- 03-07-2023: Project Florence is now in Public Preview
- 02-27-2023: Streaming Video Model has been accepted to CVPR 2023
- 02-25-2023: The Eighth ISIC Skin Image Analysis Workshop will be hosted at MICCAI 2023.
- 11-21-2022: i-Code has been accepted to AAAI 2023
- 10-23-2022: The L2ID @ ECCV 2022 Workshop talks are now all available on YouTube
- 08-11-2022: Accepted role of Area Chair at ICLR 2023
- 07-03-2022: DaViT and MS-CLIP have both been accepted to ECCV 2022
- 06-11-2022: Our Florence Foundation Model was featured in an article in The Economist
- 05-24-2022: Our work on the Florence computer vision foundation model was featured in Microsoft CEO Satya Nadella's keynote at Microsoft Build 2022
- 05-20-2022: CvT models are now available on HuggingFace Transformers
- 05-13-2022: Our CvT paper has been ranked 5th most influential ICCV paper by PaperDigest
- 04-26-2022: Invited speaker at the University at Buffalo Institute for Artificial Intelligence
- 04-22-2022: Nominated as a Highlighted Area Chair of ICLR 2022
- 04-20-2022: "Validation of AI prediction models" published in Lancet Digital Health 2022.
- 04-11-2022: On organizing committees for two ECCV 2022 workshops: L2ID & ISIC
- 03-30-2022: Accepted role of Area Chair at NeurIPS 2022
- 03-02-2022: RegionCLIP has been accepted to CVPR 2022 [ Paper | Code ]
- 01-03-2022: Accepted role of Area Chair at ICPR 2022
- 12-01-2021: CLEAR Derm Consensus Guidelines published in JAMA Dermatology
- 08-06-2021: Assigned to Senior Program Committee of AAAI 2022
- 08-02-2021: Assigned as Associate Editor of IEEE Transactions on Multimedia
- 07-22-2021: Convolutional Vision Transformer (CvT) accepted to ICCV 2021 [ Paper | Code ]
- 06-20-2021: Co-organizer and session chair for the L2ID Workshop @ CVPR 2021
- 06-19-2021: On steering committee of the Skin Image Analysis Workshop @ CVPR 2021
- 06-15-2021: Accepted role of Area Chair at ICLR 2022
- 09-21-2020: Joined Microsoft Azure Cognitive Services!
- 08-26-2020: A Broader Study of Cross-Domain Few-Shot Learning presented at ECCV 2020
- 06-22-2020: Dermatology Human-AI collaboration study published in Nature Medicine
- 06-19-2020: Organizer of Learning with Limited Labels Workshop @ CVPR 2020
- 06-15-2020: Organizer of Skin Image Analysis Workshop @ CVPR 2020
- 10-31-2019: Speaking at the Transforming Dermatology in the Digital Era continuing medical education session at Memorial Sloan-Kettering
- 07-15-2019: Dermatology Human-AI comparison study published in Lancet Oncology
Research
Machine Learning
Computer Vision
ECCV 2024
Fully Authentic Visual Question Answering Dataset from Online Communities
Visual Question Answering (VQA) entails answering questions about images. We introduce a VQA dataset whose contents all originate from an authentic use case: sourced from online question-answering community forums, we call it VQAonline. We characterize this dataset and how it relates to eight mainstream VQA datasets. Observing that answers in our dataset tend to be much longer (i.e., a mean of 173 words) and so incompatible with standard VQA evaluation metrics, we instead evaluate six state-of-the-art VQA models on VQAonline using popular metrics for longer-text evaluation and report where they struggle most. Finally, we analyze which evaluation metrics align best with human judgments.
[ Paper ]
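For readers unfamiliar with longer-text metrics, here is a minimal sketch of scoring a free-form answer with ROUGE-L via the rouge-score package; the metric choice and the toy strings are illustrative stand-ins, not the paper's exact evaluation protocol.

```python
# Sketch: scoring a long-form VQA answer with a longer-text metric (ROUGE-L)
# instead of exact-match VQA accuracy. Strings below are invented examples.
from rouge_score import rouge_scorer

reference = "The plant is likely underwatered; curled, browning leaf edges usually indicate a lack of moisture."
prediction = "The leaves are browning because the plant needs more water."

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
score = scorer.score(reference, prediction)["rougeL"]
print(f"ROUGE-L F1: {score.fmeasure:.3f}")
```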
EMNLP Findings 2023
Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond
Vision-language (VL) understanding tasks evaluate models' comprehension of complex visual scenes through multiple-choice questions. However, we have identified two dataset biases that models can exploit as shortcuts to resolve various VL tasks correctly without proper understanding. The first type of dataset bias is Unbalanced Matching bias, where the correct answer overlaps the question and image more than the incorrect answers. The second type of dataset bias is Distractor Similarity bias, where incorrect answers are overly dissimilar to the correct answer but significantly similar to other incorrect answers within the same sample. To address these dataset biases, we first propose Adversarial Data Synthesis (ADS) to generate synthetic training and debiased evaluation data. We then introduce Intra-sample Counterfactual Training (ICT) to assist models in utilizing the synthesized training data, particularly the counterfactual data, via focusing on intra-sample differentiation. Extensive experiments demonstrate the effectiveness of ADS and ICT in consistently improving model performance across different benchmarks, even in domain-shifted scenarios.
[ Paper ]
CVPR 2023
Streaming Video Model
Video understanding tasks have traditionally been modeled by two separate architectures, each specially tailored for a distinct task. Sequence-based video tasks, such as action recognition, use a video backbone to directly extract spatiotemporal features, while frame-based video tasks, such as multiple object tracking (MOT), rely on a single fixed image backbone to extract spatial features. In contrast, we propose to unify video understanding tasks into one novel streaming video architecture, referred to as Streaming Vision Transformer (S-ViT). S-ViT first produces frame-level features with a memory-enabled temporally-aware spatial encoder to serve the frame-based video tasks. Then the frame features are input into a task-related temporal decoder to obtain spatiotemporal features for sequence-based tasks. The efficiency and efficacy of S-ViT are demonstrated by state-of-the-art accuracy in the sequence-based action recognition task and a competitive advantage over conventional architectures in the frame-based MOT task. We believe that the concept of the streaming video model and the implementation of S-ViT are solid steps towards a unified deep learning architecture for video understanding.
[ Paper ]
AAAI 2023
i-Code: An Integrative and Composable Multimodal Learning Framework
Human intelligence is multimodal; we integrate visual, linguistic, and acoustic signals to maintain a holistic worldview. Most current pretraining methods, however, are limited to one or two modalities. We present i-Code, a self-supervised pretraining framework where users may flexibly combine the modalities of vision, speech, and language into unified and general-purpose vector representations. In this framework, data from each modality are first given to pretrained single-modality encoders. The encoder outputs are then integrated with a multimodal fusion network, which uses novel attention mechanisms and other architectural innovations to effectively combine information from the different modalities. The entire system is pretrained end-to-end with new objectives including masked modality unit modeling and cross-modality contrastive learning. Unlike previous research using only video for pretraining, the i-Code framework can dynamically process single, dual, and triple-modality data during training and inference, flexibly projecting different combinations of modalities into a single representation space. Experimental results demonstrate how i-Code can outperform state-of-the-art techniques on five video understanding tasks and the GLUE NLP benchmark, improving by as much as 11% and demonstrating the power of integrative multimodal pretraining.
[ Paper ]
ECCV 2022
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training
We investigate a variety of Modality-Shared Contrastive Language-Image Pre-training (MS-CLIP) frameworks. More specifically, we question how many parameters of a transformer model can be shared across modalities during contrastive pre-training, and rigorously examine architectural design choices that position the proportion of parameters shared along a spectrum. In studied conditions, we observe that a mostly unified encoder for vision and language signals outperforms all other variations that separate more parameters. Additionally, we find that light-weight modality-specific parallel modules further improve performance. Experimental results show that the proposed MS-CLIP approach outperforms vanilla CLIP by up to 13% relative in zero-shot ImageNet classification (pre-trained on YFCC-100M), while simultaneously supporting a reduction in the parameters of the entire model.
[ Paper ]
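As a rough illustration of cross-modal parameter sharing, the sketch below routes image patches and text tokens through a single shared transformer body with modality-specific embedders. All names and sizes are hypothetical simplifications; the actual MS-CLIP design adds light-weight modality-specific parallel modules and trains with a contrastive objective.

```python
# Minimal sketch of a modality-shared encoder, assuming 16x16x3 image patches
# flattened to 768-dim vectors and integer text tokens. Not the MS-CLIP code.
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    def __init__(self, dim=512, vocab=49408):
        super().__init__()
        self.patch_embed = nn.Linear(768, dim)       # image-specific embedder
        self.token_embed = nn.Embedding(vocab, dim)  # text-specific embedder
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.shared = nn.TransformerEncoder(layer, num_layers=6)  # shared body

    def encode_image(self, patches):   # patches: (B, N, 768)
        return self.shared(self.patch_embed(patches)).mean(dim=1)

    def encode_text(self, tokens):     # tokens: (B, T) int64
        return self.shared(self.token_embed(tokens)).mean(dim=1)

model = SharedEncoder()
img = model.encode_image(torch.randn(2, 196, 768))
txt = model.encode_text(torch.randint(0, 49408, (2, 32)))
print(img.shape, txt.shape)            # both (2, 512), from one shared body
```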
ECCV 2022
DaViT: Dual Attention Vision Transformers
In this work, we introduce Dual Attention Vision Transformers (DaViT), a simple yet effective vision transformer architecture that is able to capture global context while maintaining computational efficiency. We propose approaching the problem from an orthogonal angle: exploiting self-attention mechanisms with both “spatial tokens” and “channel tokens”. With spatial tokens, the spatial dimension defines the token scope, and the channel dimension defines the token feature dimension. With channel tokens, we have the inverse: the channel dimension defines the token scope, and the spatial dimension defines the token feature dimension. We further group tokens along the sequence direction for both spatial and channel tokens to maintain the linear complexity of the entire model.
[ Paper ]
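To make the spatial/channel duality concrete, the sketch below contrasts the two token views with plain single-head attention; it deliberately omits DaViT's learned Q/K/V projections, multi-head structure, and the grouping along the sequence direction that yields linear complexity.

```python
# Sketch: the same attention primitive applied to spatial tokens vs. channel
# tokens, illustrating the transpose at the heart of dual attention.
import torch

def spatial_attention(x):
    # x: (batch, tokens, channels); spatial positions are the tokens.
    scale = x.shape[-1] ** 0.5
    attn = torch.softmax(x @ x.transpose(-2, -1) / scale, dim=-1)
    return attn @ x

def channel_attention(x):
    # Transpose so channels become the tokens and spatial positions the features.
    xt = x.transpose(-2, -1)                    # (batch, channels, tokens)
    scale = xt.shape[-1] ** 0.5
    attn = torch.softmax(xt @ xt.transpose(-2, -1) / scale, dim=-1)
    return (attn @ xt).transpose(-2, -1)        # back to (batch, tokens, channels)

x = torch.randn(2, 196, 96)                     # 14x14 patch grid, 96 channels
print(spatial_attention(x).shape, channel_attention(x).shape)
```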
arXiv 2021
Florence: A New Foundation Model for Computer Vision
Automated visual understanding of our diverse and open world demands computer vision models to generalize well with minimal customization for specific tasks, similar to human vision. Computer vision foundation models, which are trained on diverse, large-scale datasets and can be adapted to a wide range of downstream tasks, are critical for this mission to solve real-world computer vision applications. While existing vision foundation models such as CLIP (Radford et al., 2021), ALIGN (Jia et al., 2021), and Wu Dao 2.0 (Wud) focus mainly on mapping images and textual representations to a cross-modal shared representation, we introduce a new computer vision foundation model, Florence, to expand the representations from coarse (scene) to fine (object), from static (images) to dynamic (videos), and from RGB to multiple modalities (caption, depth).
[ Paper ]
CVPR 2022
RegionCLIP: Region-based Language-Image Pretraining.
We propose a new method called RegionCLIP that significantly extends CLIP to learn region-level visual representations, thus enabling fine-grained alignment between image regions and textual concepts. Further, the learned region representations support zero-shot inference for object detection, showing promising results on both COCO and LVIS datasets.
[ Paper | Code ]
ICCV 2021
CvT: Introducing Convolutions to Vision Transformers.
We present in this paper a new architecture, named Convolutional vision Transformer (CvT), that improves Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs.
[ Paper | Code ]
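One way to picture "introducing convolutions" is a convolutional projection that computes attention inputs over the 2-D token grid rather than with a plain linear map. The sketch below is a simplified single-head variant under that interpretation, not the full CvT block (which also uses convolutional token embedding between stages).

```python
# Sketch: a depthwise-conv projection over the token grid feeding attention,
# in the spirit of CvT's convolutional projection. Sizes are arbitrary.
import torch
import torch.nn as nn

class ConvProjection(nn.Module):
    def __init__(self, dim, h, w):
        super().__init__()
        self.h, self.w = h, w
        self.conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)

    def forward(self, x):                       # x: (B, h*w, dim)
        b, n, c = x.shape
        grid = x.transpose(1, 2).reshape(b, c, self.h, self.w)
        return self.conv(grid).flatten(2).transpose(1, 2)   # (B, h*w, dim)

proj = ConvProjection(dim=96, h=14, w=14)
x = torch.randn(2, 196, 96)
q = proj(x)                                     # conv-projected queries
attn = torch.softmax(q @ q.transpose(-2, -1) / 96 ** 0.5, dim=-1)
print((attn @ x).shape)                         # torch.Size([2, 196, 96])
```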
ECCV 2020
A Broader Study of Cross-Domain Few-Shot Learning.
In this paper, we propose the Broader Study of Cross-Domain Few-Shot Learning (BSCD-FSL) benchmark, consisting of image data from a diverse assortment of image acquisition methods. This includes natural images, such as crop disease images, but additionally those that present with an increasing dissimilarity to natural images, such as satellite images, dermatology images, and x-rays.
[ Paper | Code ]
ACM MM 2014
Modeling Attributes from Category-Attribute Proportions
In this paper, we propose to model attributes from category-attribute proportions. The proposed framework can model attributes without attribute labels on the images.
[ Paper ]
Explainability
ICML HILL 2019
Teaching AI to Explain its Decisions Using Embeddings and Multi-Task Learning
This framework augments training data to include explanations elicited from domain users, in addition to features and labels. This approach ensures that explanations for predictions are tailored to the complexity expectations and domain knowledge of the consumer.
[ Paper ]
AAAI AIES 2019
TED: Teaching AI to explain its decisions
This work introduces a simple, practical framework, called Teaching Explanations for Decisions (TED), that provides meaningful explanations that match the mental model of the consumer. We illustrate the generality and effectiveness of this approach with two different examples, resulting in highly accurate explanations with no loss of prediction performance.
[ Paper ]
MICCAI IMIMIC 2018
Collaborative Human-AI (CHAI): Evidence-based interpretable melanoma classification in dermoscopic images.
In this work, an approach for evidence-based classification is presented. A feature embedding is learned with CNNs, triplet-loss, and global average pooling, and used to classify via kNN search. Evidence is provided as both the discovered neighbors, as well as localized image regions most relevant to measuring distance between query and neighbors.
[ Paper ]
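A minimal sketch of the embed-then-retrieve idea follows: a CNN produces a global-average-pooled embedding, a kNN search classifies, and the retrieved neighbors double as evidence. A pretrained backbone stands in for the paper's triplet-loss-trained embedding, and random tensors stand in for dermoscopy images.

```python
# Sketch: kNN classification over CNN embeddings, with neighbor indices
# returned as evidence. Data is random; the backbone choice is an assumption.
import torch
import torchvision.models as models
from sklearn.neighbors import KNeighborsClassifier

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()     # expose global-average-pooled features
backbone.eval()

@torch.no_grad()
def embed(images):                    # images: (N, 3, 224, 224)
    return backbone(images).numpy()

ref_images = torch.randn(50, 3, 224, 224)            # stand-in reference set
ref_labels = torch.randint(0, 2, (50,)).numpy()      # 0 = benign, 1 = melanoma
knn = KNeighborsClassifier(n_neighbors=5).fit(embed(ref_images), ref_labels)

query = torch.randn(1, 3, 224, 224)
pred = knn.predict(embed(query))
_, evidence_idx = knn.kneighbors(embed(query))       # neighbors as evidence
print(pred, evidence_idx)
```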
Dermatology
Journal of Investigative Dermatology
Expert agreement on the presence and spatial localization of melanocytic features in dermoscopy
Dermoscopy aids in melanoma detection; however, agreement on dermoscopic features, including those of high clinical relevance, remains poor. Herein we attempted to evaluate agreement among experts on "exemplar images" not only for the presence of melanocytic-specific features but also for spatial localization.
[ Paper ]
Nature Medicine 2023
A reinforcement learning model for AI-based decision support in skin cancer
We investigated whether human preferences hold the potential to improve diagnostic artificial intelligence (AI)-based decision support using skin cancer diagnosis as a use case. We utilized nonuniform rewards and penalties based on expert-generated tables, balancing the benefits and harms of various diagnostic errors, which were applied using reinforcement learning. Compared with supervised learning, the reinforcement learning model improved the sensitivity for melanoma from 61.4% to 79.5% and for basal cell carcinoma from 79.4% to 87.1%. AI overconfidence was also reduced while simultaneously maintaining accuracy. Reinforcement learning increased the rate of correct diagnoses made by dermatologists by 12.0% and improved the rate of optimal management decisions from 57.4% to 65.3%. We further demonstrated that the reward-adjusted reinforcement learning model and a threshold-based model outperformed naïve supervised learning in various clinical scenarios. Our findings suggest the potential for incorporating human preferences into image-based diagnostic algorithms.
[ Paper ]
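To illustrate how nonuniform rewards and penalties can shape training, the sketch below replaces plain cross-entropy with a differentiable expected-reward objective under a small reward table; the table values are hypothetical placeholders, not the expert-derived tables used in the paper, and the paper's full reinforcement learning setup is not reproduced here.

```python
# Sketch: maximizing expected reward under a reward/penalty table R[true, pred].
# A missed melanoma is penalized far more than a false alarm (invented values).
import torch

R = torch.tensor([[ 1.0, -1.0],    # true benign:   correct +1, false alarm -1
                  [-8.0,  4.0]])   # true melanoma: a miss costs -8, a hit +4

def expected_reward_loss(logits, targets):
    probs = torch.softmax(logits, dim=-1)          # (B, C) class probabilities
    return -(probs * R[targets]).sum(dim=-1).mean()

logits = torch.randn(4, 2, requires_grad=True)
targets = torch.tensor([0, 1, 1, 0])
loss = expected_reward_loss(logits, targets)
loss.backward()                                    # differentiable, trainable
print(loss.item())
```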
Lancet Digital Health 2022
Validation of artificial intelligence prediction models for skin cancer diagnosis using dermoscopy images
Previous studies of artificial intelligence (AI) applied to dermatology have shown AI to have higher diagnostic classification accuracy than expert dermatologists; however, these studies did not adequately assess clinically realistic scenarios, such as how AI systems behave when presented with images of disease categories that are not included in the training dataset or images drawn from statistical distributions with significant shifts from training distributions. We aimed to simulate these real-world scenarios and evaluate the effects of image source institution, diagnoses outside of the training set, and other image artifacts on classification accuracy, with the goal of informing clinicians and regulatory agencies about safety and real-world accuracy.
[ Paper ]
JAMA Dermatology 2021
CLEAR Derm Consensus Guidelines
In this consensus statement, key recommendations for developers and reviewers of imaging-based AI reports in dermatology were formulated and grouped into the topics of (1) data, (2) technique, (3) technical assessment, and (4) application. Guidelines are proposed to address current challenges in dermatology image-based AI that hinder clinical translation, including lack of image standardization, concerns about potential sources of bias, and factors that cause performance degradation.
[ Paper ]
Nature Scientific Data 2021
A patient-centric dataset of images and metadata for identifying melanomas using clinical context.
Prior skin image datasets have not addressed patient-level information obtained from multiple skin lesions from the same patient. Though artificial intelligence classification algorithms have achieved expert-level performance in controlled studies examining single images, in practice dermatologists base their judgment holistically from multiple lesions on the same patient. The 2020 SIIM-ISIC Melanoma Classification challenge dataset described herein was constructed to address this discrepancy between prior challenges and clinical practice, providing for each image in the dataset an identifier allowing lesions from the same patient to be mapped to one another.
[ Paper ]
Nature Medicine 2020
Human–computer collaboration for skin cancer recognition.
Here we build on recent achievements in the accuracy of image-based AI for skin cancer diagnosis to address the effects of varied representations of AI-based support across different levels of clinical expertise and multiple clinical workflows.
[ Paper ]
MICCAI 2020
Fairness of Classifiers Across Skin Tones in Dermatology
In this paper, we present an approach to estimate skin tone in skin disease benchmark datasets and investigate whether model performance is dependent on this measure.
[ Paper | arXiv ]
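For context, skin tone in such benchmarks is often estimated via the individual typology angle (ITA), computed in CIELAB space; the sketch below shows that computation on a toy patch, omitting the non-lesion masking and ITA-to-category thresholds a full pipeline would need.

```python
# Sketch: per-pixel ITA (individual typology angle) from an RGB patch,
# ITA = arctan((L* - 50) / b*) in degrees. Input is a random stand-in image.
import numpy as np
from skimage.color import rgb2lab

def ita_degrees(rgb_image):
    """rgb_image: float array in [0, 1] with shape (H, W, 3)."""
    lab = rgb2lab(rgb_image)
    L, b = lab[..., 0], lab[..., 2]
    return np.degrees(np.arctan2(L - 50.0, b))

patch = np.random.rand(64, 64, 3)      # stand-in for a non-lesion skin crop
print(f"median ITA: {np.median(ita_degrees(patch)):.1f} degrees")
```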
Lancet Oncology 2019
Comparison of the accuracy of human readers versus machine-learning algorithms for pigmented skin lesion classification: an open, web-based, international, diagnostic study.
We provide a state-of-the-art comparison of the most advanced machine-learning algorithms with a large number of human readers, including the most experienced human experts.
[ Paper ]
IEEE JBHI 2019
Dermoscopy Image Analysis: Overview and Future Directions.
In this paper, we present a brief overview of this exciting subfield of medical image analysis, primarily focusing on three aspects of it, namely, segmentation, feature extraction, and classification. We then provide future directions for researchers.
[ Paper ]
Seminars in Cutaneous Medicine and Surgery 2019
The role of public challenges and data sets towards algorithm development, trust, and use in clinical practice.
In this article, we summarize recent advancements in machine learning, with a focused perspective on the role of public challenges and data sets on the progression of these technologies in skin imaging. In addition, we highlight the remaining hurdles toward effective implementation of technologies to the clinical workflow and discuss how public challenges and data sets can catalyze the development of solutions.
[ Paper ]
(Please email for a free copy)
Journal of the American Academy of Dermatology 2018
Results of the 2016 International Skin Imaging Collaboration International Symposium on Biomedical Imaging challenge: Comparison of the accuracy of computer algorithms to dermatologists for the diagnosis of melanoma from dermoscopic images.
We sought to compare melanoma diagnostic accuracy of computer algorithms to dermatologists using dermoscopic images.
[ Paper ]
EMBC 2018
Segmentation of both Diseased and Healthy Skin from Clinical Photographs in a Primary Care Setting
This work presents the first segmentation study of both diseased and healthy skin in standard camera photographs from a clinical environment. Challenges arise from varied lighting conditions, skin types, backgrounds, and pathological states.
[ Paper ]
ISBI 2018
Skin lesion analysis toward melanoma detection: A challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC).
This article describes the design, implementation, and results of the latest installment of the dermoscopic image analysis benchmark challenge. The goal is to support research and development of algorithms for automated diagnosis of melanoma, the most lethal skin cancer.
[ Paper ]
IBM JRD 2016
Deep learning ensembles for melanoma recognition in dermoscopy images.
We propose a system that combines recent developments in deep learning with established machine learning approaches, creating ensembles of methods that are capable of segmenting skin lesions, as well as analyzing the detected area and surrounding tissue for melanoma detection.
[ Paper ]
MICCAI MLMI 2015
Deep learning, sparse coding, and SVM for melanoma recognition in dermoscopy images.
This work presents an approach for melanoma recognition in dermoscopy images that combines deep learning, sparse coding, and support vector machine (SVM) learning algorithms. One of the beneficial aspects of the proposed approach is that unsupervised learning within the domain, and feature transfer from the domain of natural photographs, eliminates the need of annotated data in the target task to learn good features.
[ Paper ]
Electroencephalography (EEG)
ICLR 2016
Learning Representations from EEG with Deep Recurrent Convolutional Neural Networks
One of the challenges in modeling cognitive events from electroencephalogram (EEG) data is finding representations that are invariant to inter- and intra-subject differences, as well as to the inherent noise associated with EEG data collection. Herein, we propose a novel approach for learning such representations from multichannel EEG time-series, and demonstrate its advantages in the context of a mental load classification task.
[ Paper ]
Magnetic Resonance Imaging (MRI)
Journal of Cardiovascular Magnetic Resonance 2019
Machine learning derived segmentation of phase velocity encoded cardiovascular magnetic resonance for fully automated aortic flow quantification
Fully automated machine learning PC-CMR segmentation performs robustly for aortic flow quantification, yielding rapid segmentation, small differences with manual segmentation, and identification of differential forward/left ventricular volumetric stroke volume in the context of concomitant mitral regurgitation. Findings support the use of machine learning for analysis of large-scale CMR datasets.
[ Paper ]
JACC: Cardiovascular Imaging 2016
Echocardiographic Algorithm for Post–Myocardial Infarction LV Thrombus: A Gatekeeper for Thrombus Evaluation by Delayed Enhancement CMR
The goal of this study was to determine the prevalence of post–myocardial infarction (MI) left ventricular (LV) thrombus in the current era and to develop an effective algorithm (predicated on echocardiography [echo]) to discern patients warranting further testing for thrombus via delayed enhancement (DE) cardiac magnetic resonance (CMR).
[ Paper ]
Circulation: Cardiovascular Imaging 2012
Improved Left Ventricular Mass Quantification With Partial Voxel Interpolation: In Vivo and Necropsy Validation of a Novel Cardiac MRI Segmentation Algorithm.
This study tested LVM segmentation among clinical patients and laboratory animals undergoing CMR. In patients, echocardiography (echo) was performed within 1 day of CMR and used as a clinical comparator for LVM. In laboratory animals, euthanasia was performed after CMR and segmentation results were compared with ex vivo LV weight. The aim was to examine the impact of partial voxel segmentation on CMR quantification of LVM.
[ Paper ]
ICIP 2012
Cardiac Anatomy as a Biometric.
In this study, we propose a novel biometric signature for human identification based on anatomically unique structures of the left ventricle of the heart. An algorithm is developed that analyzes the 3 primary anatomical structures of the left ventricle: the endocardium, myocardium, and papillary muscles.
[ Paper ]
Journal of Cardiovascular Magnetic Resonance 2010
Impact of diastolic dysfunction severity on global left ventricular volumetric filling: assessment by automated segmentation of routine cine cardiovascular magnetic resonance
To examine relationships between severity of echocardiography (echo)-evidenced diastolic dysfunction (DD) and volumetric filling by automated processing of routine cine cardiovascular magnetic resonance (CMR).
[ Paper ]
NMR in Biomedicine 2010
A radial self-calibrated (RASCAL) generalized autocalibrating partially parallel acquisition (GRAPPA) method using weight interpolation.
A generalized autocalibrating partially parallel acquisition (GRAPPA) method for radial k-space sampling is presented that calculates GRAPPA weights without synthesized or acquired calibration data.
[ Paper ]
Magnetic Resonance in Medicine 2010
Respiratory and Cardiac Self-Gated Free-Breathing Cardiac CINE Imaging With Multiecho 3D Hybrid Radial SSFP Acquisition.
A respiratory and cardiac self-gated free-breathing three-dimensional cine steady-state free precession imaging method using multiecho hybrid radial sampling is presented.
[ Paper ]
IEEE TBME 2010
Automatic Left Ventricle Segmentation Using Iterative Thresholding and an Active Contour Model With Adaptation on Short-Axis Cardiac MRI
An automatic left ventricle (LV) segmentation algorithm is presented for quantification of cardiac output and myocardial mass in clinical practice.
[ Paper ]
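For flavor, the sketch below implements an ISODATA-style iterative threshold, the kind of step that can seed such a segmentation by separating blood-pool from myocardium intensities; the active contour refinement and all clinical specifics are omitted, and the data is synthetic.

```python
# Sketch: iterative (ISODATA-style) thresholding; the threshold converges to
# the midpoint of the two class means. Intensities below are synthetic.
import numpy as np

def iterative_threshold(values, tol=0.5):
    t = values.mean()
    while True:
        low, high = values[values <= t], values[values > t]
        t_new = 0.5 * (low.mean() + high.mean())
        if abs(t_new - t) < tol:
            return t_new
        t = t_new

rng = np.random.default_rng(0)
intensities = np.concatenate([rng.normal(40, 10, 5000),    # myocardium-like
                              rng.normal(160, 20, 2000)])  # blood-pool-like
print(f"threshold: {iterative_threshold(intensities):.1f}")
```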
Circulation: Cardiovascular Imaging 2009
Automated Segmentation of Routine Clinical Cardiac Magnetic Resonance Imaging for Assessment of Left Ventricular Diastolic Dysfunction
Automated CMR segmentation can provide LV filling profiles that may offer insight into diastolic dysfunction. Patients with diastolic dysfunction have prolonged diastolic filling intervals, which are associated with echo-evidenced diastolic dysfunction independent of clinical and imaging variables.
[ Paper ]
Radiology 2008
Left Ventricle: Automated Segmentation by Using Myocardial Effusion Threshold Reduction and Intravoxel Computation at MR Imaging
The purpose of the study was to develop and validate an algorithm for automated segmentation of the left ventricular (LV) cavity that accounts for papillary and/or trabecular muscles and partial voxels in cine magnetic resonance (MR) images, an algorithm called LV Myocardial Effusion Threshold Reduction with Intravoxel Computation (LV-METRIC).
[ Paper ]
Other Healthcare Research
MICCAI 2014
Automated medical image modality recognition by fusion of visual and text information.
In this work, we present a framework for medical image modality recognition based on a fusion of both visual and text classification methods. Experiments are performed on the public ImageCLEF 2013 medical image modality dataset, which provides figure images and associated full-text articles from PubMed as components of the benchmark.
[ Paper ]
IBM JRD 2015
A generalized framework for medical image classification and recognition
In this work, we study the performance of a two-stage ensemble visual machine learning framework for classification of medical images. In the first stage, models are built for subsets of features and data, and in the second stage, models are combined. We demonstrate the performance of this framework in four contexts: 1) the public ImageCLEF (Cross Language Evaluation Forum) 2013 medical modality recognition benchmark, 2) echocardiography view and mode recognition, 3) dermatology disease recognition across two datasets, and 4) a broad medical image dataset, merged from multiple data sources into a collection of 158 categories covering both general and specific medical concepts, including modalities, body regions, views, and disease states. In the first context, the presented system achieves state-of-the-art performance of 82.2% multiclass accuracy. In the second context, the system attains 90.48% multiclass accuracy. In the third, state-of-the-art performance of 90% specificity and 90% sensitivity is obtained on a small standardized dataset of 200 images using a leave-one-out strategy. For a larger dataset of 2,761 images, 95% specificity and 98% sensitivity is obtained on a 20% held-out test set. Finally, in the fourth context, the system achieves sensitivity and specificity of 94.7% and 98.4%, respectively, demonstrating the ability to generalize over domains.
[ Paper ]
Fully Authentic Visual Question Answering Dataset from Online Communities Visual Question Answering (VQA) entails answering questions about authentic use case. Sourced from online question answering community forums, we call it VQAonline. We characterize this dataset and how it relates to eight mainstream VQA datasets. Observing that answers in our dataset tend to be much longer (i.e., a mean of 173 words) and so incompatible with standard VQA evaluation metrics, we instead utilize popular metrics for longer text evaluation for evaluating six state-of-the-art VQA models on VQAonline and report where they struggle most. Finally, we analyze which evaluation metrics align best with human judgments. [ Paper ] |
|
Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond Vision-language (VL) understanding tasks evaluate models' comprehension of complex visual scenes through multiple-choice questions. However, we have identified two dataset biases that models can exploit as shortcuts to resolve various VL tasks correctly without proper understanding. The first type of dataset bias is Unbalanced Matching bias, where the correct answer overlaps the question and image more than the incorrect answers. The second type of dataset bias is Distractor Similarity bias, where incorrect answers are overly dissimilar to the correct answer but significantly similar to other incorrect answers within the same sample. To address these dataset biases, we first propose Adversarial Data Synthesis (ADS) to generate synthetic training and debiased evaluation data. We then introduce Intra-sample Counterfactual Training (ICT) to assist models in utilizing the synthesized training data, particularly the counterfactual data, via focusing on intra-sample differentiation. Extensive experiments demonstrate the effectiveness of ADS and ICT in consistently improving model performance across different benchmarks, even in domain-shifted scenarios. [ Paper ] |
|
Streaming Video Model Video understanding tasks have traditionally been modeled by two separate architectures, specially tailored for two distinct tasks. Sequence-based video tasks, such as action recognition, use a video backbone to directly extract spatiotemporal features, while frame-based video tasks, such as multiple object tracking (MOT), rely on single fixed-image backbone to extract spatial features. In contrast, we propose to unify video understanding tasks into one novel streaming video architecture, referred to as Streaming Vision Transformer (S-ViT). S-ViT first produces frame-level features with a memory-enabled temporally-aware spatial encoder to serve the frame-based video tasks. Then the frame features are input into a task-related temporal decoder to obtain spatiotemporal features for sequence-based tasks. The efficiency and efficacy of S-ViT is demonstrated by the state-of-the-art accuracy in the sequence-based action recognition task and the competitive advantage over conventional architecture in the frame-based MOT task. We believe that the concept of streaming video model and the implementation of S-ViT are solid steps towards a unified deep learning architecture for video understanding. [ Paper ] |
|
i-Code: An Integrative and Composable Multimodal Learning Framework Human intelligence is multimodal; we integrate visual, linguistic, and acoustic signals to maintain a holistic worldview. Most current pretraining methods, however, are limited to one or two modalities. We present i-Code, a self-supervised pretraining framework where users may flexibly combine the modalities of vision, speech, and language into unified and general-purpose vector representations. In this framework, data from each modality are first given to pretrained single-modality encoders. The encoder outputs are then integrated with a multimodal fusion network, which uses novel attention mechanisms and other architectural innovations to effectively combine information from the different modalities. The entire system is pretrained end-to-end with new objectives including masked modality unit modeling and cross-modality contrastive learning. Unlike previous research using only video for pretraining, the i-Code framework can dynamically process single, dual, and triple-modality data during training and inference, flexibly projecting different combinations of modalities into a single representation space. Experimental results demonstrate how i-Code can outperform state-of-the-art techniques on five video understanding tasks and the GLUE NLP benchmark, improving by as much as 11% and demonstrating the power of integrative multimodal pretraining. [ Paper ] |
|
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training We investigate a variety of Modality-Shared Contrastive Language-Image Pre-training (MS-CLIP) frameworks. More specifically, we question how many parameters of a transformer model can be shared across modalities during contrastive pre-training, and rigorously examine architectural design choices that position the proportion of parameters shared along a spectrum. In studied conditions, we observe that a mostly unified encoder for vision and language signals outperforms all other variations that separate more parameters. Additionally, we find that light-weight modality-specific parallel modules further improve performance. Experimental results show that the proposed MS-CLIP approach outperforms vanilla CLIP by up to 13% relative in zero-shot ImageNet classification (pre-trained on YFCC-100M), while simultaneously supporting a reduction of parameters. of the entire model. [ Paper ] |
|
DaViT: Dual Attention Vision Transformers In this work, we introduce Dual Attention Vision Transformers (DaViT), a simple yet effective vision transformer architecture that is able to capture global context while maintaining computational efficiency. We propose approaching the problem from an orthogonal angle: exploiting self-attention mechanisms with both “spatial tokens” and “channel tokens”. With spatial tokens, the spatial dimension defines the token scope, and the channel dimension defines the token feature dimension. With channel tokens, we have the inverse: the channel dimension defines the token scope, and the spatial dimension defines the token feature dimension. We further group tokens along the sequence direction for both spatial and channel tokens to maintain the linear complexity of the entire model. [ Paper ] |
|
Florence: A New Foundation Model for Computer Vision Automated visual understanding of our diverse and open world demands computer vision models to generalize well with minimal customization for specific tasks, similar to human vision. Computer vision foundation models, which are trained on diverse, large-scale dataset and can be adapted to a wide range of downstream tasks, are critical for this mission to solve real-world computer vision applications. While existing vision foundation models such as CLIP (Radford et al., 2021), ALIGN (Jia et al., 2021), and Wu Dao 2.0 (Wud) focus mainly on mapping images and textual representations to a cross-modal shared representation, we introduce a new computer vision foundation model, Florence, to expand the representations from coarse (scene) to fine (object), from static (images) to dynamic (videos), and from RGB to multiple modalities (caption, depth) [ Paper ] |
|
RegionCLIP: Region-based Language-Image Pretraining. We propose a new method called RegionCLIP that significantly extends CLIP to learn region-level visual representations, thus enabling fine-grained alignment between image regions and textual concepts. Further, the learned region representations support zero-shot inference for object detection, showing promising results on both COCO and LVIS datasets. [ Paper | Code ] |
|
CvT: Introducing Convolutions to Vision Transformers. We present in this paper a new architecture, named Convolutional vision Transformer (CvT), that improves Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs. [ Paper | Code ] |
|
A Broader Study of Cross-Domain Few-Shot Learning. In this paper, we propose the Broader Study of Cross-Domain Few-Shot Learning (BSCD-FSL) benchmark, consisting of image data from a diverse assortment of image acquisition methods. This includes natural images, such as crop disease images, but additionally those that present with an increasing dissimilarity to natural images, such as satellite images, dermatology images, and x-rays. [ Paper | Code ] |
|
Modeling Attributes from Category-Attribute Proportions In this paper, we propose to model attributes from category-attribute proportions. The proposed framework can model attributes without attribute labels on the images. [ Paper ] |
Teaching AI to Explain its Decisions Using Embeddings and Multi-Task Learning This framework augments training data to include explanations elicited from domain users, in addition to features and labels. This approach ensures that explanations for predictions are tailored to the complexity expectations and domain knowledge of the consumer [ Paper ] |
|
TED: Teaching AI to explain its decisions This work introduces a simple, practical framework, called Teaching Explanations for Decisions (TED), that provides meaningful explanations that match the mental model of the consumer. We illustrate the generality and effectiveness of this approach with two different examples, resulting in highly accurate explanations with no loss of prediction performance. [ Paper ] |
|
Collaborative Human-AI (CHAI): Evidence-based interpretable melanoma classification in dermoscopic images. In this work, an approach for evidence-based classification is presented. A feature embedding is learned with CNNs, triplet-loss, and global average pooling, and used to classify via kNN search. Evidence is provided as both the discovered neighbors, as well as localized image regions most relevant to measuring distance between query and neighbors. [ Paper ] |
Dermatology
Journal of Investigative Dermatology
Expert agreement on the presence and spatial localization of melanocytic features in dermoscopy
Dermoscopy aids in melanoma detection; however, agreement on dermoscopic features, including those of high clinical relevance, remains poor. Herein we attempted to evaluate agreement among experts on "exemplar images" not only for the presence of melanocytic-specific features but also for spatial localization.
[ Paper ]
Nature Medicine 2023
A reinforcement learning model for AI-based decision support in skin cancer
We investigated whether human preferences hold the potential to improve diagnostic artificial intelligence (AI)-based decision support using skin cancer diagnosis as a use case. We utilized nonuniform rewards and penalties based on expert-generated tables, balancing the benefits and harms of various diagnostic errors, which were applied using reinforcement learning. Compared with supervised learning, the reinforcement learning model improved the sensitivity for melanoma from 61.4% to 79.5% and for basal cell carcinoma from 79.4% to 87.1%. AI overconfidence was also reduced while simultaneously maintaining accuracy. Reinforcement learning increased the rate of correct diagnoses made by dermatologists by 12.0% and improved the rate of optimal management decisions from 57.4% to 65.3%. We further demonstrated that the reward-adjusted reinforcement learning model and a threshold-based model outperformed naïve supervised learning in various clinical scenarios. Our findings suggest the potential for incorporating human preferences into image-based diagnostic algorithms.
[ Paper ]
Lancet Digital Health 2022
Validation of artificial intelligence prediction models for skin cancer diagnosis using dermoscopy images
Previous studies of artificial intelligence (AI) applied to dermatology have shown AI to have higher diagnostic classification accuracy than expert dermatologists; however, these studies did not adequately assess clinically realistic scenarios, such as how AI systems behave when presented with images of disease categories that are not included in the training dataset or images drawn from statistical distributions with significant shifts from training distributions. We aimed to simulate these real-world scenarios and evaluate the effects of image source institution, diagnoses outside of the training set, and other image artifacts on classification accuracy, with the goal of informing clinicians and regulatory agencies about safety and real-world accuracy.
[ Paper ]
JAMA Dermatology 2021
CLEAR Derm Consensus Guidelines
In this consensus statement, key recommendations for developers and reviewers of imaging-based AI reports in dermatology were formulated and grouped into the topics of (1) data, (2) technique, (3) technical assessment, and (4) application. Guidelines are proposed to address current challenges in dermatology image-based AI that hinder clinical translation, including lack of image standardization, concerns about potential sources of bias, and factors that cause performance degradation.
[ Paper ]
Nature Scientific Data 2021
A patient-centric dataset of images and metadata for identifying melanomas using clinical context.
Prior skin image datasets have not addressed patient-level information obtained from multiple skin lesions from the same patient. Though artificial intelligence classification algorithms have achieved expert-level performance in controlled studies examining single images, in practice dermatologists base their judgment holistically from multiple lesions on the same patient. The 2020 SIIM-ISIC Melanoma Classification challenge dataset described herein was constructed to address this discrepancy between prior challenges and clinical practice, providing for each image in the dataset an identifier allowing lesions from the same patient to be mapped to one another.
[ Paper ]
Nature Medicine 2020
Human–computer collaboration for skin cancer recognition.
Here we build on recent achievements in the accuracy of image-based AI for skin cancer diagnosis to address the effects of varied representations of AI-based support across different levels of clinical expertise and multiple clinical workflows.
[ Paper ]
MICCAI 2020
Fairness of Classifiers Across Skin Tones in Dermatology
In this paper, we present an approach to estimate skin tone in skin disease benchmark datasets and investigate whether model performance is dependent on this measure.
[ Paper | arXiv ]
Lancet Oncology 2019
Comparison of the accuracy of human readers versus machine-learning algorithms for pigmented skin lesion classification: an open, web-based, international, diagnostic study.
We provide a state-of-the-art comparison of the most
advanced machine-learning algorithms with a large number of
human readers, including the most experienced human experts
[ Paper ]
IEEE JBHI 2019
Dermoscopy Image Analysis: Overview and
Future Directions.
In this paper, we present a brief overview of
this exciting subfield of medical image analysis, primarily
focusing on three aspects of it, namely, segmentation, feature extraction, and classification. We then provide future
directions for researchers.
[ Paper ]
Seminars in Cutaneous Medicine and Surgery 2019
The role of public challenges and data sets towards algorithm development, trust, and use in clinical practice.
In this article, we summarize recent advancements in machine learning, with a focused perspective on the role of public challenges and data sets on the progression of these technologies in skin imaging. In addition, we highlight the remaining hurdles toward effective implementation of technologies to the clinical workflow and discuss how public challenges and data sets can catalyze the development of solutions.
[ Paper ]
(Please email for a free copy)
Journal of the American Academy of Dermatology 2018
Results of the 2016 International Skin Imaging Collaboration International Symposium on Biomedical Imaging challenge: Comparison of the accuracy of computer algorithms to dermatologists for the diagnosis of melanoma from dermoscopic images.
We sought to compare melanoma diagnostic accuracy of computer algorithms to dermatologists using dermoscopic images.
[ Paper ]
EMBC 2018
Segmentation of both Diseased and Healthy Skin
from Clinical Photographs in a Primary Care Setting
This work presents the first segmentation study of
both diseased and healthy skin in standard camera photographs
from a clinical environment. Challenges arise from varied lighting conditions, skin types, backgrounds, and pathological states
[ Paper ]
ISBI 2018
Skin lesion analysis toward melanoma detection: A challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC).
This article describes the design, implementation, and results of the latest installment of the dermoscopic image analysis benchmark challenge. The goal is to support research and development of algorithms for automated diagnosis of melanoma, the most lethal skin cancer.
[ Paper ]
IBM JRD 2016
Deep learning ensembles for melanoma recognition in dermoscopy images.
We propose a system that combines recent
developments in deep learning with established machine learning approaches, creating
ensembles of methods that are capable of segmenting skin lesions, as well as analyzing the
detected area and surrounding tissue for melanoma detection.
[ Paper ]
MICCAI MLMI 2015
Deep learning, sparse coding, and SVM for melanoma recognition in dermoscopy images.
This work presents an approach for melanoma recognition in dermoscopy images that combines deep learning, sparse coding, and support vector machine (SVM) learning algorithms. One of the beneficial aspects of the proposed approach is that unsupervised learning within the domain, and feature transfer from the domain of natural photographs, eliminates the need of annotated data in the target task to learn good features.
[ Paper ]
Electroencephalography (EEG)
ICLR 2016
Learning Representations from EEG with Deep Recurrent Convolutional Neural Networks
One of the challenges in modeling cognitive events from electroencephalogram
(EEG) data is finding representations that are invariant to inter- and intra-subject
differences, as well as to inherent noise associated with EEG data collection.
Herein, we propose a novel approach for learning such representations from multichannel EEG time-series, and demonstrate its advantages in the context of mental
load classification task.
[ Paper ]
Magnetic Resonance Imaging (MRI)
Journal of Cardiovascular Magnetic Resonance 2019
Machine learning derived segmentation of phase velocity encoded cardiovascular magnetic resonance for fully automated aortic flow quantification
Fully automated machine learning PC-CMR segmentation performs robustly for aortic flow quantification - yielding rapid segmentation, small differences with manual segmentation, and identification of differential forward/left ventricular volumetric stroke volume in context of concomitant mitral regurgitation. Findings support use of machine learning for analysis of large scale CMR datasets.
[ Paper ]
JACC: Cardiovascular Imaging 2016
Echocardiographic Algorithm for Post–Myocardial Infarction LV Thrombus: A Gatekeeper for Thrombus Evaluation by Delayed Enhancement CMR
The goal of this study was to determine the prevalence of post–myocardial infarction (MI) left ventricular (LV) thrombus in the current era and to develop an effective algorithm (predicated on echocardiography [echo]) to discern patients warranting further testing for thrombus via delayed enhancement (DE) cardiac magnetic resonance (CMR).
[ Paper ]
Circulation: Cardiovascular Imaging 2012
Improved Left Ventricular Mass Quantification With Partial Voxel Interpolation In Vivo and Necropsy Validation of a Novel Cardiac MRI
Segmentation Algorithm.
This study tested LVM segmentation among clinical patients and laboratory animals undergoing CMR. In patients,
echocardiography (echo) was performed within 1 day of
CMR and used as a clinical comparator for LVM. In
laboratory animals, euthanasia was performed after CMR and
segmentation results were compared with ex vivo LV weight.
The aim was to examine the impact of partial voxel segmentation on CMR quantification of LVM
[ Paper ]
ICIP 2012
Cardiac Anatomy as a Biometric.
In this study, we propose a novel biometric signature for human identification based on anatomically unique structures of
the left ventricle of the heart. An algorithm is developed that
analyzes the 3 primary anatomical structures of the left ventricle: the endocardium, myocardium, and papillary muscle
[ Paper ]
Journal of Cardiovascular Magnetic Resonance 2010
Impact of diastolic dysfunction severity on global left ventricular volumetric filling-assessment by automated segmentation of routine cine cardiovascular magnetic resonance
To examine relationships between severity of echocardiography (echo) -evidenced diastolic dysfunction (DD) and volumetric filling by automated processing of routine cine cardiovascular magnetic resonance (CMR).
[ Paper ]
NMR in Biomedicine 2010
A radial self-calibrated (RASCAL) generalized autocalibrating partially parallel acquisition (GRAPPA) method using weight interpolation.
A generalized autocalibrating partially parallel acquisition (GRAPPA) method for radial k-space sampling is presented that calculates GRAPPA weights without synthesized or acquired calibration data.
[ Paper ]
Magnetic Resonance in Medicine 2010
Respiratory and Cardiac Self-Gated Free-Breathing
Cardiac CINE Imaging With Multiecho 3D Hybrid Radial
SSFP Acquisition.
A respiratory and cardiac self-gated free-breathing three-dimensional cine steady-state free precession imaging
method using multiecho hybrid radial sampling is presented.
[ Paper ]
IEEE TBME 2010
Automatic Left Ventricle Segmentation Using
Iterative Thresholding and an Active Contour Model
With Adaptation on Short-Axis Cardiac MRI
An automatic left ventricle (LV) segmentation algorithm is presented for quantification of cardiac output and myocardial mass in clinical practice.
[ Paper ]
Circulation: Cardiovascular Imaging 2009
Automated Segmentation of Routine Clinical Cardiac Magnetic Resonance Imaging for Assessment of Left Ventricular Diastolic Dysfunction
Automated CMR segmentation can provide LV filling profiles that may offer insight into diastolic dysfunction. Patients with diastolic dysfunction have prolonged diastolic filling intervals, which are associated with echo-evidenced diastolic dysfunction independent of clinical and imaging variables.
[ Paper ]
Radiology 2008
Left Ventricle: Automated
Segmentation by Using Myocardial
Effusion Threshold Reduction and
Intravoxel Computation at MR Imaging
The purpose of the study was to develop and
validate an algorithm for automated segmentation of the
left ventricular (LV) cavity that accounts for papillary
and/or trabecular muscles and partial voxels in cine magnetic resonance (MR) images, an algorithm called LV Myocardial Effusion Threshold Reduction with Intravoxel Computation (LV-METRIC).
[ Paper ]
Other Healthcare Research
MICCAI 2014
Automated medical image modality recognition by fusion of visual and text information.
In this work, we present a framework for medical image modality recognition based on a fusion of both visual and text classification methods. Experiments are performed on the public ImageCLEF 2013 medical image modality dataset, which provides figure images and associated fulltext articles from PubMed as components of the benchmark.
[ Paper ]
IBM JRD 2015
A generalized framework for medical image classification and recognition
In this work, we study the performance of a two-stage ensemble visual machine learning framework for classification of medical images. In the first stage, models are built for subsets of features and data, and in the second stage, models are combined. We demonstrate the performance of this framework in four contexts: 1) The public ImageCLEF (Cross Language Evaluation Forum) 2013 medical modality recognition benchmark, 2) echocardiography view and mode recognition, 3) dermatology disease recognition across two datasets, and 4) a broad medical image dataset, merged from multiple data sources into a collection of 158 categories covering both general and specific medical conceptsVincluding modalities, body regions, views, and disease states. In the first context, the presented system achieves state-of-art performance of 82.2% multiclass accuracy. In the second context, the system attains 90.48% multiclass accuracy. In the third, state-of-art performance of 90% specificity and 90% sensitivity is obtained on a small standardized dataset of 200 images using a leave-one-out strategy. For a larger dataset of 2,761 images, 95% specificity and 98% sensitivity is obtained on a 20% held-out test set. Finally, in the fourth context, the system achieves sensitivity and specificity of 94.7% and 98.4%, respectively, demonstrating the ability to generalize over domains.
[ Paper ]
Expert agreement on the presence and spatial localization of melanocytic features in dermoscopy Dermoscopy aids in melanoma detection; however, agreement on dermoscopic features, including those of high clinical relevance, remains poor. Herein we attempted to evaluate agreement among experts on "exemplar images" not only for the presence of melanocytic-specific features but also for spatial localization. [ Paper ] |
|
A reinforcement learning model for AI-based decision support in skin cancer We investigated whether human preferences hold the potential to improve diagnostic artificial intelligence (AI)-based decision support using skin cancer diagnosis as a use case. We utilized nonuniform rewards and penalties based on expert-generated tables, balancing the benefits and harms of various diagnostic errors, which were applied using reinforcement learning. Compared with supervised learning, the reinforcement learning model improved the sensitivity for melanoma from 61.4% to 79.5% and for basal cell carcinoma from 79.4% to 87.1%. AI overconfidence was also reduced while simultaneously maintaining accuracy. Reinforcement learning increased the rate of correct diagnoses made by dermatologists by 12.0% and improved the rate of optimal management decisions from 57.4% to 65.3%. We further demonstrated that the reward-adjusted reinforcement learning model and a threshold-based model outperformed naïve supervised learning in various clinical scenarios. Our findings suggest the potential for incorporating human preferences into image-based diagnostic algorithms. [ Paper ] |
|
Validation of artificial intelligence prediction models for skin cancer diagnosis using dermoscopy images Previous studies of artificial intelligence (AI) applied to dermatology have shown AI to have higher diagnostic classification accuracy than expert dermatologists; however, these studies did not adequately assess clinically realistic scenarios, such as how AI systems behave when presented with images of disease categories that are not included in the training dataset or images drawn from statistical distributions with significant shifts from training distributions. We aimed to simulate these real-world scenarios and evaluate the effects of image source institution, diagnoses outside of the training set, and other image artifacts on classification accuracy, with the goal of informing clinicians and regulatory agencies about safety and real-world accuracy. [ Paper ] |
|
CLEAR Derm Consensus Guidelines In this consensus statement, key recommendations for developers and reviewers of imaging-based AI reports in dermatology were formulated and grouped into the topics of (1) data, (2) technique, (3) technical assessment, and (4) application. Guidelines are proposed to address current challenges in dermatology image-based AI that hinder clinical translation, including lack of image standardization, concerns about potential sources of bias, and factors that cause performance degradation. [ Paper ]
A patient-centric dataset of images and metadata for identifying melanomas using clinical context. Prior skin image datasets have not addressed patient-level information obtained from multiple skin lesions from the same patient. Though artificial intelligence classification algorithms have achieved expert-level performance in controlled studies examining single images, in practice dermatologists base their judgment holistically from multiple lesions on the same patient. The 2020 SIIM-ISIC Melanoma Classification challenge dataset described herein was constructed to address this discrepancy between prior challenges and clinical practice, providing for each image in the dataset an identifier allowing lesions from the same patient to be mapped to one another. [ Paper ]
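As a concrete, purely illustrative example of what the patient identifier enables, the snippet below computes a simple patient-level context feature, how much each lesion deviates from the same patient's other lesions; the file name and column names (`patient_id`, `mean_hue`) are hypothetical, and real challenge pipelines were free to use the identifier however they chose.

```python
import pandas as pd

# Hypothetical metadata file with one row per lesion image.
df = pd.read_csv("train_metadata.csv")

# "Ugly duckling"-style context: deviation of each lesion's summary
# feature from the mean over all lesions of the same patient.
patient_mean = df.groupby("patient_id")["mean_hue"].transform("mean")
df["hue_deviation"] = (df["mean_hue"] - patient_mean).abs()
```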
Human–computer collaboration for skin cancer recognition. Here we build on recent achievements in the accuracy of image-based AI for skin cancer diagnosis to address the effects of varied representations of AI-based support across different levels of clinical expertise and multiple clinical workflows. [ Paper ]
Fairness of Classifiers Across Skin Tones in Dermatology In this paper, we present an approach to estimate skin tone in skin disease benchmark datasets and investigate whether model performance is dependent on this measure. [ Paper | arXiv ]
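For readers unfamiliar with how skin tone can be estimated from an image, one widely used measure is the individual typology angle (ITA), computed in CIELAB color space over presumed-healthy skin pixels. The sketch below shows the arithmetic under assumed inputs; it is not the paper's exact pipeline (in practice the lesion must be masked out first).

```python
import numpy as np
from skimage import io, color

img = io.imread("skin_image.jpg")          # hypothetical input image
lab = color.rgb2lab(img)                   # CIELAB conversion
L, b = lab[..., 0], lab[..., 2]

# ITA (degrees) = arctan((L* - 50) / b*) * 180 / pi, per pixel.
ita = np.degrees(np.arctan2(L - 50.0, b))

# Summarize over (ideally lesion-masked) skin pixels; higher ITA
# corresponds to lighter apparent skin tone.
skin_tone = np.median(ita)
```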
Comparison of the accuracy of human readers versus machine-learning algorithms for pigmented skin lesion classification: an open, web-based, international, diagnostic study. We provide a state-of-the-art comparison of the most advanced machine-learning algorithms with a large number of human readers, including the most experienced human experts. [ Paper ]
Dermoscopy Image Analysis: Overview and Future Directions. In this paper, we present a brief overview of this exciting subfield of medical image analysis, primarily focusing on three aspects: segmentation, feature extraction, and classification. We then provide future directions for researchers. [ Paper ]
The role of public challenges and data sets towards algorithm development, trust, and use in clinical practice. In this article, we summarize recent advancements in machine learning, with a focused perspective on the role of public challenges and data sets in the progression of these technologies in skin imaging. In addition, we highlight the remaining hurdles toward effective implementation of technologies into the clinical workflow and discuss how public challenges and data sets can catalyze the development of solutions. [ Paper ] (Please email for a free copy)
Results of the 2016 International Skin Imaging Collaboration International Symposium on Biomedical Imaging challenge: Comparison of the accuracy of computer algorithms to dermatologists for the diagnosis of melanoma from dermoscopic images. We sought to compare melanoma diagnostic accuracy of computer algorithms to dermatologists using dermoscopic images. [ Paper ]
Segmentation of both Diseased and Healthy Skin from Clinical Photographs in a Primary Care Setting This work presents the first segmentation study of both diseased and healthy skin in standard camera photographs from a clinical environment. Challenges arise from varied lighting conditions, skin types, backgrounds, and pathological states. [ Paper ]
Skin lesion analysis toward melanoma detection: A challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC). This article describes the design, implementation, and results of the latest installment of the dermoscopic image analysis benchmark challenge. The goal is to support research and development of algorithms for automated diagnosis of melanoma, the most lethal skin cancer. [ Paper ]
Deep learning ensembles for melanoma recognition in dermoscopy images. We propose a system that combines recent developments in deep learning with established machine learning approaches, creating ensembles of methods that are capable of segmenting skin lesions, as well as analyzing the detected area and surrounding tissue for melanoma detection. [ Paper ]
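The combination step in such ensembles is often simple late fusion of model outputs. The generic sketch below (the model objects are hypothetical stand-ins, not the paper's networks) averages per-class probabilities across independently trained models:

```python
import numpy as np

def ensemble_predict(models, x):
    """Average per-class probabilities across models (late fusion)."""
    probs = np.mean([m.predict_proba(x) for m in models], axis=0)
    return probs.argmax(axis=-1), probs
```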
Deep learning, sparse coding, and SVM for melanoma recognition in dermoscopy images. This work presents an approach for melanoma recognition in dermoscopy images that combines deep learning, sparse coding, and support vector machine (SVM) learning algorithms. One beneficial aspect of the proposed approach is that unsupervised learning within the domain and feature transfer from the domain of natural photographs eliminate the need for annotated data in the target task to learn good features. [ Paper ]
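The feature-transfer idea in this abstract, reusing features learned on natural photographs so the medical task needs no in-domain feature learning, can be sketched as follows. The backbone choice, input shapes, and variable names are assumptions, not the paper's configuration (which predates modern pretrained backbones):

```python
import torch
import torchvision.models as tvm
from sklearn.svm import LinearSVC

# Backbone pretrained on natural photographs; its penultimate-layer
# activations serve as off-the-shelf features for the medical task.
backbone = tvm.resnet18(weights=tvm.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

@torch.no_grad()
def extract_features(batch):          # batch: (N, 3, 224, 224) float tensor
    return backbone(batch).numpy()

# train_images / train_labels are hypothetical, suitably preprocessed arrays.
clf = LinearSVC().fit(extract_features(train_images), train_labels)
```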
Learning Representations from EEG with Deep Recurrent Convolutional Neural Networks One of the challenges in modeling cognitive events from electroencephalogram (EEG) data is finding representations that are invariant to inter- and intra-subject differences, as well as to the inherent noise associated with EEG data collection. Herein, we propose a novel approach for learning such representations from multichannel EEG time series, and demonstrate its advantages in the context of a mental load classification task. [ Paper ]
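A compact sketch of the recurrent-convolutional pattern this abstract describes: a small CNN encodes each time step of the multichannel recording (in the paper, spectral topography frames), and a recurrent layer aggregates across steps. Layer sizes and the three-channel frame format here are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class EEGRecurrentConvNet(nn.Module):
    def __init__(self, n_classes=4):
        super().__init__()
        # Per-frame encoder over (channels, H, W) topographic images.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),        # -> (N*T, 64)
        )
        self.rnn = nn.LSTM(64, 128, batch_first=True)     # across frames
        self.head = nn.Linear(128, n_classes)

    def forward(self, x):             # x: (N, T, 3, H, W) frame sequence
        n, t = x.shape[:2]
        z = self.cnn(x.flatten(0, 1)).view(n, t, -1)
        out, _ = self.rnn(z)
        return self.head(out[:, -1])                      # last time step
```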
Magnetic Resonance Imaging (MRI)
Journal of Cardiovascular Magnetic Resonance 2019
Machine learning derived segmentation of phase velocity encoded cardiovascular magnetic resonance for fully automated aortic flow quantification
Fully automated machine learning PC-CMR segmentation performs robustly for aortic flow quantification, yielding rapid segmentation, small differences from manual segmentation, and identification of differential forward/left ventricular volumetric stroke volume in the context of concomitant mitral regurgitation. Findings support the use of machine learning for analysis of large-scale CMR datasets.
[ Paper ]
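The quantity being automated here reduces to simple arithmetic once a segmentation exists: per-frame flow is the velocity summed over the aortic region times pixel area, and stroke volume integrates flow over the cardiac cycle. A worked sketch with hypothetical inputs:

```python
import numpy as np

def stroke_volume_ml(velocity_cm_s, aorta_mask, pixel_area_cm2, dt_s):
    """velocity_cm_s: (T, H, W) phase-velocity frames; aorta_mask: (T, H, W)
    boolean segmentation. Returns forward volume per beat in mL (cm^3)."""
    flow = np.array([v[m].sum() * pixel_area_cm2         # cm^3/s per frame
                     for v, m in zip(velocity_cm_s, aorta_mask)])
    return float(flow.sum() * dt_s)
```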
JACC: Cardiovascular Imaging 2016
Echocardiographic Algorithm for Post–Myocardial Infarction LV Thrombus: A Gatekeeper for Thrombus Evaluation by Delayed Enhancement CMR
The goal of this study was to determine the prevalence of post–myocardial infarction (MI) left ventricular (LV) thrombus in the current era and to develop an effective algorithm (predicated on echocardiography [echo]) to discern patients warranting further testing for thrombus via delayed enhancement (DE) cardiac magnetic resonance (CMR).
[ Paper ]
Circulation: Cardiovascular Imaging 2012
Improved Left Ventricular Mass Quantification With Partial Voxel Interpolation: In Vivo and Necropsy Validation of a Novel Cardiac MRI Segmentation Algorithm.
This study tested LVM segmentation among clinical patients and laboratory animals undergoing CMR. In patients, echocardiography (echo) was performed within 1 day of CMR and used as a clinical comparator for LVM. In laboratory animals, euthanasia was performed after CMR and segmentation results were compared with ex vivo LV weight. The aim was to examine the impact of partial voxel segmentation on CMR quantification of LVM.
[ Paper ]
ICIP 2012
Cardiac Anatomy as a Biometric.
In this study, we propose a novel biometric signature for human identification based on anatomically unique structures of the left ventricle of the heart. An algorithm is developed that analyzes the 3 primary anatomical structures of the left ventricle: the endocardium, myocardium, and papillary muscles.
[ Paper ]
Journal of Cardiovascular Magnetic Resonance 2010
Impact of diastolic dysfunction severity on global left ventricular volumetric filling: assessment by automated segmentation of routine cine cardiovascular magnetic resonance
To examine relationships between severity of echocardiography (echo)-evidenced diastolic dysfunction (DD) and volumetric filling by automated processing of routine cine cardiovascular magnetic resonance (CMR).
[ Paper ]
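For context, the volumetric-filling analysis rests on the LV volume-time curve produced by automated segmentation; filling rates follow from its derivative. A minimal sketch with hypothetical inputs:

```python
import numpy as np

def filling_metrics(lv_volumes_ml, dt_s):
    """lv_volumes_ml: per-frame LV cavity volumes over one cardiac cycle."""
    dvdt = np.gradient(np.asarray(lv_volumes_ml, dtype=float), dt_s)  # mL/s
    return {
        "peak_filling_rate_ml_s": float(dvdt.max()),    # diastolic filling
        "peak_ejection_rate_ml_s": float(-dvdt.min()),  # systolic ejection
    }
```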
NMR in Biomedicine 2010
A radial self-calibrated (RASCAL) generalized autocalibrating partially parallel acquisition (GRAPPA) method using weight interpolation.
A generalized autocalibrating partially parallel acquisition (GRAPPA) method for radial k-space sampling is presented that calculates GRAPPA weights without synthesized or acquired calibration data.
[ Paper ]
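For orientation, the heart of any GRAPPA method is a linear weight fit: weights mapping acquired neighboring k-space samples (across coils) to a missing sample are solved by least squares. The toy below shows only that step on a Cartesian grid; RASCAL's contribution, per the abstract, is obtaining such weights for radial sampling without synthesized or acquired calibration data, via weight interpolation, which is not shown.

```python
import numpy as np

def fit_grappa_weights(acs):
    """Toy Cartesian GRAPPA weight fit. acs: (coils, lines, readout),
    complex calibration data. Returns weights mapping the two neighboring
    acquired lines (all coils) to the line between them (all coils)."""
    c = acs.shape[0]
    src = np.concatenate([acs[:, :-2], acs[:, 2:]], axis=0)  # (2c, L-2, R)
    A = src.reshape(2 * c, -1).T                             # samples x 2c
    B = acs[:, 1:-1].reshape(c, -1).T                        # samples x c
    weights, *_ = np.linalg.lstsq(A, B, rcond=None)          # (2c, c)
    return weights
```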
Magnetic Resonance in Medicine 2010
Respiratory and Cardiac Self-Gated Free-Breathing Cardiac CINE Imaging With Multiecho 3D Hybrid Radial SSFP Acquisition.
A respiratory and cardiac self-gated free-breathing three-dimensional cine steady-state free precession imaging method using multiecho hybrid radial sampling is presented.
[ Paper ]
IEEE TBME 2010
Automatic Left Ventricle Segmentation Using Iterative Thresholding and an Active Contour Model With Adaptation on Short-Axis Cardiac MRI
An automatic left ventricle (LV) segmentation algorithm is presented for quantification of cardiac output and myocardial mass in clinical practice.
[ Paper ]
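The first stage named in the title, iterative thresholding, is straightforward to sketch: alternate between thresholding the image and re-estimating the threshold from the two resulting intensity populations until it stabilizes (the classic isodata scheme). This is an illustrative stand-in, not the published implementation, and the active contour refinement stage is omitted.

```python
import numpy as np

def iterative_threshold(img, tol=0.5, max_iter=100):
    """Isodata-style iterative thresholding of a grayscale image."""
    t = float(img.mean())
    for _ in range(max_iter):
        fg, bg = img[img >= t], img[img < t]
        t_new = 0.5 * (fg.mean() + bg.mean())
        if abs(t_new - t) < tol:
            break
        t = t_new
    return img >= t          # candidate blood-pool mask
```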
Circulation: Cardiovascular Imaging 2009
Automated Segmentation of Routine Clinical Cardiac Magnetic Resonance Imaging for Assessment of Left Ventricular Diastolic Dysfunction
Automated CMR segmentation can provide LV filling profiles that may offer insight into diastolic dysfunction. Patients with diastolic dysfunction have prolonged diastolic filling intervals, which are associated with echo-evidenced diastolic dysfunction independent of clinical and imaging variables.
[ Paper ]
Radiology 2008
Left Ventricle: Automated Segmentation by Using Myocardial Effusion Threshold Reduction and Intravoxel Computation at MR Imaging
The purpose of the study was to develop and validate an algorithm for automated segmentation of the left ventricular (LV) cavity that accounts for papillary and/or trabecular muscles and partial voxels in cine magnetic resonance (MR) images, an algorithm called LV Myocardial Effusion Threshold Reduction with Intravoxel Computation (LV-METRIC).
[ Paper ]
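The intravoxel-computation idea is easy to illustrate: rather than counting each border voxel as entirely blood or entirely myocardium, assign it a fractional blood content by linear interpolation between the two tissue intensity levels. The sketch below is a hedged illustration of that principle, not the LV-METRIC algorithm itself; the intensity estimation and masking are simplified assumptions.

```python
import numpy as np

def fractional_blood_volume_ml(img, blood_mask, myo_mask, voxel_ml):
    """Linear partial-voxel interpolation between tissue intensity levels."""
    i_blood = img[blood_mask].mean()            # bright blood pool
    i_myo = img[myo_mask].mean()                # darker myocardium
    frac = (img - i_myo) / (i_blood - i_myo)    # 0 = myocardium, 1 = blood
    frac = np.clip(frac, 0.0, 1.0)
    return float(frac[blood_mask | myo_mask].sum() * voxel_ml)
```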
Other Healthcare Research
MICCAI 2014
Automated medical image modality recognition by fusion of visual and text information.
In this work, we present a framework for medical image modality recognition based on a fusion of both visual and text classification methods. Experiments are performed on the public ImageCLEF 2013 medical image modality dataset, which provides figure images and associated full-text articles from PubMed as components of the benchmark.
[ Paper ]
IBM JRD 2015
A generalized framework for medical image classification and recognition
In this work, we study the performance of a two-stage ensemble visual machine learning framework for classification of medical images. In the first stage, models are built for subsets of features and data, and in the second stage, models are combined. We demonstrate the performance of this framework in four contexts: 1) the public ImageCLEF (Cross Language Evaluation Forum) 2013 medical modality recognition benchmark, 2) echocardiography view and mode recognition, 3) dermatology disease recognition across two datasets, and 4) a broad medical image dataset, merged from multiple data sources into a collection of 158 categories covering both general and specific medical concepts, including modalities, body regions, views, and disease states. In the first context, the presented system achieves state-of-the-art performance of 82.2% multiclass accuracy. In the second context, the system attains 90.48% multiclass accuracy. In the third, state-of-the-art performance of 90% specificity and 90% sensitivity is obtained on a small standardized dataset of 200 images using a leave-one-out strategy. For a larger dataset of 2,761 images, 95% specificity and 98% sensitivity are obtained on a 20% held-out test set. Finally, in the fourth context, the system achieves sensitivity and specificity of 94.7% and 98.4%, respectively, demonstrating the ability to generalize over domains.
[ Paper ]
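A hedged sketch of the two-stage pattern this abstract describes: stage one trains base models on different feature subsets, stage two learns a combiner over their outputs. The scikit-learn components below are generic stand-ins, not the paper's models, and a real pipeline would fit the combiner on held-out stage-one predictions to avoid overfitting.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

def fit_two_stage(feature_sets, y):
    """feature_sets: list of (n_samples, d_i) arrays, one per feature type."""
    # Stage 1: one base classifier per feature subset.
    stage1 = [SVC(probability=True).fit(X, y) for X in feature_sets]
    # Stage 2: combiner over concatenated base-model probabilities.
    stacked = np.hstack([m.predict_proba(X)
                         for m, X in zip(stage1, feature_sets)])
    stage2 = LogisticRegression(max_iter=1000).fit(stacked, y)
    return stage1, stage2
```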
Professional Activities
Area Chair: ICLR 2018, 2019, 2022 (Highlighted), 2023; ICPR 2022; NeurIPS 2022
Associate Editor:
Senior Program Committee: AAAI 2022
Challenge Co-Founder / Co-Organizer:
Workshop Co-Founder / Co-Organizer:
Microsoft Community:
IBM Research Community:
Awards
- IBM Outstanding Research Accomplishment Award (2019)
- IBM Eminence and Excellence Award (2018)
- IBM Outstanding Technical Achievement Award (2018)
- IBM Research Image Award (2016)
- IBM Invention Achievement Awards (2013, 2014, 2016, 2017, 2018)
- IBM Research Division Award (2013)
- ImageCLEF Medical Image Recognition 1st Place Team (2013)
- IBM Eminence and Excellence Award (2012)
- Cornell University Bits on our Mind (BOOM) Best in Category: Biological Science (2006)
Teaching
- Columbia University: Guest Lecturer in Computer Vision (2018)
- NYU: Guest Lecturer in Computer Vision (2016)
- Stevens Institute of Technology: Adjunct Professor in Artificial Intelligence (2014-2016)
Patents
Issued
- Automatic identification of food substance (US10528793)
- Method and system for categorizing heart disease states (US20150317789)
- Static Image Segmentation (US9311716 B2)
- Image Segmentation Techniques (US9299145 B2)
- Techniques for spatial semantic attribute matching for location identification (US9251434 B2)
- Techniques for ground-level photo geolocation using digital elevation (US9165217B2)
- Unique Cardiovascular Measurements for Human Identification (US9031288B2)
- Social media event detection and content-based retrieval (US9002069B2)
- Method for segmenting objects in images (US8369590B2)
- Viewpoint recognition in computer tomography images (US9652846)
- Determination of unique items based on generating descriptive vectors of users (US10664894)
- Surgical skin lesion removal (US10568695)
- Surface reflectance reduction in images using non-specular portion replacement (US10255674)
- Biological Neuron to Electronic Computer Interface (US12001942B2)
- Training Transfer-Focused Models for Deep Learning (US11853877B2)
- Drug Delivery Device Having Controlled Delivery and Confirmation (US11052023B2)
Pending
- 3D-Printable Telemedicine Device (US2022/0406454A1)
- Estimating the Number of Attendees in a Meeting (US Patent App. 15/295,409)
- System and method for comparing training data with test data (US Patent App. 14/982,036)
- Identifying transfer models for machine learning tasks (US Patent App. 15/982,622)
- Generating and augmenting transfer learning datasets with pseudo-labeled images (US Patent App. 16/125,153)
- Pill collection visual recognition for automatic compliance to prescriptions (US Patent App. 15/483,126)
- Category Oversampling for Imbalanced Machine Learning (US Patent App. 14/500,023)
Featured Press Coverage
- CNBC: Microsoft announces new AI tools to help ease workload for doctors and nurses
- Forbes: Microsoft Announces Numerous New AI Tools Dedicated To Healthcare
- The Economist: Huge “foundation models” are turbo-charging AI progress
- MedGadget: Using Watson to Diagnose Skin Cancer: Interview with IBM Computer Vision Scientist, Noel Codella.
- CNN: IBM uses a smartphone to help diagnose skin cancer.
- Mashable: IBM’s smart skin cancer detection tech is as accurate as expert dermatologists.
- ZDNet: IBM’s computer vision zeros in on identifying skin cancer.
- Medium: Visual recognition could help detect skin cancer.
- VentureBeat: Skin cancer meets its worst nightmare: IBM.
- SoundCloud: Using Cognitive Computing to Visually Analyze Skin Cancer. Audio Interview.
- IBM Research Blog: Identifying skin cancer with computer vision
- IBM Press Release: IBM Research Scientists Investigate Use of Cognitive Computing-Based Visual Analytics for Skin Cancer Image Analysis.
Social Media / Contact Information