About Me

I am a Principal Researcher at Microsoft. My expertise spans topics across computer vision and healthcare, with publications in CVPR, ICCV, ECCV, ICLR, MICCAI, Nature Medicine, Lancet Oncology, and other prestigious venues, some of which have been covered by major international news media organizations (CNN, Medgadget, etc.). Within computer vision, my interests include few-shot learning, transformer architectures, vision+language tasks, fairness, explainability, and object detection. Within healthcare, my expertise includes dermatology, electroencephalography, cardiac MRI, and human physiology. In dermatology, I am a co-founder of the International Skin Imaging Collaboration (ISIC) challenges on skin cancer classification, which have received over 114,000 total submissions from over 4,000 competitors to continuous live challenges (hosted at ISBI 2016-2017, MICCAI 2018-2020). My Ph.D. research was focused on cardiac MRI in the department of Human Physiology at the Weill Medical College of Cornell University.

In addition to technical expertise in building state-of-the-art machine learning systems and identifying new fields of study, I have fundamental expertise in application domain risks and costs, especially in healthcare, where I have led and collaborated on numerous clinical validation studies and assessments of machine learning error characteristics, fairness, and biases. In these works, I have developed state-of-the-art evaluation protocols that have exposed previously unidentified errors when compared with established metrics.


  • Ph.D., Physiology, Weill Medical College of Cornell University (2010)
  • M.Eng., Computer Science, Cornell University (2005)
  • B.S., Computer Science, Columbia University (2004)

Recent News:


Machine Learning:
[ Computer Vision | Explainability ]

[ Dermatology | EEG | MRI | Other ]

Google Scholar Profile

Computer Vision

ECCV 2022
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training

We investigate a variety of Modality-Shared Contrastive Language-Image Pre-training (MS-CLIP) frameworks. More specifically, we question how many parameters of a transformer model can be shared across modalities during contrastive pre-training, and rigorously examine architectural design choices that position the proportion of parameters shared along a spectrum. In studied conditions, we observe that a mostly unified encoder for vision and language signals outperforms all other variations that separate more parameters. Additionally, we find that light-weight modality-specific parallel modules further improve performance. Experimental results show that the proposed MS-CLIP approach outperforms vanilla CLIP by up to 13% relative in zero-shot ImageNet classification (pre-trained on YFCC-100M), while simultaneously supporting a reduction of parameters.

[ Paper ]
ECCV 2022
DaViT: Dual Attention Vision Transformers

In this work, we introduce Dual Attention Vision Transformers (DaViT), a simple yet effective vision transformer architecture that is able to capture global context while maintaining computational efficiency. We propose approaching the problem from an orthogonal angle: exploiting self-attention mechanisms with both “spatial tokens” and “channel tokens”. With spatial tokens, the spatial dimension defines the token scope, and the channel dimension defines the token feature dimension. With channel tokens, we have the inverse: the channel dimension defines the token scope, and the spatial dimension defines the token feature dimension. We further group tokens along the sequence direction for both spatial and channel tokens to maintain the linear complexity of the entire model.

[ Paper ]
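
The two token views described in the DaViT summary above can be illustrated with a small sketch. It is not the DaViT implementation: it assumes identity query/key/value projections, a single attention head, and no token grouping, and the toy `features` array is made up for illustration. It only shows that channel tokens are the transpose of spatial tokens.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Dot-product self-attention with identity Q/K/V (an assumption)."""
    dim = len(tokens[0])
    out = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]
        )
    return out


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


# Hypothetical feature map: 4 spatial positions (rows) x 2 channels (columns).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]

# Spatial tokens: one token per position; the feature dimension is the channels.
spatial_out = self_attention(features)

# Channel tokens: one token per channel; the feature dimension is the positions.
channel_out = transpose(self_attention(transpose(features)))
```

Both outputs keep the 4 x 2 shape of the input, so the two attention types can be alternated in a stack.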
arXiv 2021
Florence: A New Foundation Model for Computer Vision

Automated visual understanding of our diverse and open world demands computer vision models to generalize well with minimal customization for specific tasks, similar to human vision. Computer vision foundation models, which are trained on diverse, large-scale datasets and can be adapted to a wide range of downstream tasks, are critical for this mission to solve real-world computer vision applications. While existing vision foundation models such as CLIP (Radford et al., 2021), ALIGN (Jia et al., 2021), and Wu Dao 2.0 (Wud) focus mainly on mapping images and textual representations to a cross-modal shared representation, we introduce a new computer vision foundation model, Florence, to expand the representations from coarse (scene) to fine (object), from static (images) to dynamic (videos), and from RGB to multiple modalities (caption, depth).

[ Paper ]
CVPR 2022
RegionCLIP: Region-based Language-Image Pretraining.

We propose a new method called RegionCLIP that significantly extends CLIP to learn region-level visual representations, thus enabling fine-grained alignment between image regions and textual concepts. Further, the learned region representations support zero-shot inference for object detection, showing promising results on both COCO and LVIS datasets.

[ Paper | Code ]
ICCV 2021
CvT: Introducing Convolutions to Vision Transformers.

We present in this paper a new architecture, named Convolutional vision Transformer (CvT), that improves Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs.

[ Paper | Code ]
ECCV 2020
A Broader Study of Cross-Domain Few-Shot Learning.

In this paper, we propose the Broader Study of Cross-Domain Few-Shot Learning (BSCD-FSL) benchmark, consisting of image data from a diverse assortment of image acquisition methods. This includes natural images, such as crop disease images, but additionally those that present with an increasing dissimilarity to natural images, such as satellite images, dermatology images, and x-rays.

[ Paper | Code ]
ACM MM 2014
Modeling Attributes from Category-Attribute Proportions

In this paper, we propose to model attributes from category-attribute proportions. The proposed framework can model attributes without attribute labels on the images.

[ Paper ]


Explainability

Teaching AI to Explain its Decisions Using Embeddings and Multi-Task Learning

This framework augments training data to include explanations elicited from domain users, in addition to features and labels. This approach ensures that explanations for predictions are tailored to the complexity expectations and domain knowledge of the consumer.

[ Paper ]
TED: Teaching AI to explain its decisions

This work introduces a simple, practical framework, called Teaching Explanations for Decisions (TED), that provides meaningful explanations that match the mental model of the consumer. We illustrate the generality and effectiveness of this approach with two different examples, resulting in highly accurate explanations with no loss of prediction performance.

[ Paper ]
Collaborative Human-AI (CHAI): Evidence-based interpretable melanoma classification in dermoscopic images.

In this work, an approach for evidence-based classification is presented. A feature embedding is learned with CNNs, triplet-loss, and global average pooling, and used to classify via kNN search. Evidence is provided as both the discovered neighbors and the localized image regions most relevant to measuring distance between query and neighbors.

[ Paper ]
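
The kNN step of the evidence-based classification summarized above can be sketched as follows. This is a minimal illustration, not the published system: the two-dimensional embeddings, image ids, and labels are hypothetical, Euclidean distance and majority voting are assumptions, and the learned CNN embedding and region localization are omitted.

```python
import math
from collections import Counter


def knn_classify(query, gallery, k=3):
    """Return (predicted_label, neighbors).

    The k nearest gallery items are returned alongside the label so they
    can be shown to the user as evidence for the prediction.
    """
    ranked = sorted(gallery, key=lambda item: math.dist(query, item["embedding"]))
    neighbors = ranked[:k]
    votes = Counter(item["label"] for item in neighbors)
    label = votes.most_common(1)[0][0]
    return label, neighbors


# Hypothetical gallery of already-embedded training images.
gallery = [
    {"id": "img_a", "embedding": [0.1, 0.2], "label": "benign"},
    {"id": "img_b", "embedding": [0.2, 0.1], "label": "benign"},
    {"id": "img_c", "embedding": [0.9, 0.8], "label": "melanoma"},
    {"id": "img_d", "embedding": [0.8, 0.9], "label": "melanoma"},
    {"id": "img_e", "embedding": [0.7, 0.7], "label": "melanoma"},
]

label, evidence = knn_classify([0.85, 0.85], gallery, k=3)
```

Returning the neighbors rather than only the label is the design point: the retrieved images are the explanation.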


Dermatology

Lancet Digital Health 2022
Validation of artificial intelligence prediction models for skin cancer diagnosis using dermoscopy images

Previous studies of artificial intelligence (AI) applied to dermatology have shown AI to have higher diagnostic classification accuracy than expert dermatologists; however, these studies did not adequately assess clinically realistic scenarios, such as how AI systems behave when presented with images of disease categories that are not included in the training dataset or images drawn from statistical distributions with significant shifts from training distributions. We aimed to simulate these real-world scenarios and evaluate the effects of image source institution, diagnoses outside of the training set, and other image artifacts on classification accuracy, with the goal of informing clinicians and regulatory agencies about safety and real-world accuracy.

[ Paper ]
JAMA Dermatology 2021
CLEAR Derm Consensus Guidelines

In this consensus statement, key recommendations for developers and reviewers of imaging-based AI reports in dermatology were formulated and grouped into the topics of (1) data, (2) technique, (3) technical assessment, and (4) application. Guidelines are proposed to address current challenges in dermatology image-based AI that hinder clinical translation, including lack of image standardization, concerns about potential sources of bias, and factors that cause performance degradation.

[ Paper ]
Nature Scientific Data 2021
A patient-centric dataset of images and metadata for identifying melanomas using clinical context.

Prior skin image datasets have not addressed patient-level information obtained from multiple skin lesions from the same patient. Though artificial intelligence classification algorithms have achieved expert-level performance in controlled studies examining single images, in practice dermatologists base their judgment holistically from multiple lesions on the same patient. The 2020 SIIM-ISIC Melanoma Classification challenge dataset described herein was constructed to address this discrepancy between prior challenges and clinical practice, providing for each image in the dataset an identifier allowing lesions from the same patient to be mapped to one another.

[ Paper ]
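
One practical use of the per-image patient identifier described above is to group lesions by patient and to split data at the patient level. The sketch below is only an illustration under assumed field names (`image_id`, `patient_id`) and made-up records; it does not reflect the dataset's actual file format.

```python
from collections import defaultdict


def group_by_patient(records):
    """Map each patient identifier to the list of that patient's image ids."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["patient_id"]].append(rec["image_id"])
    return dict(grouped)


def patient_level_split(records, holdout_patients):
    """Split records so all images of one patient land on the same side."""
    train = [r for r in records if r["patient_id"] not in holdout_patients]
    held_out = [r for r in records if r["patient_id"] in holdout_patients]
    return train, held_out


# Hypothetical records; field names and values are assumptions.
records = [
    {"image_id": "img_1", "patient_id": "P1"},
    {"image_id": "img_2", "patient_id": "P1"},
    {"image_id": "img_3", "patient_id": "P2"},
    {"image_id": "img_4", "patient_id": "P3"},
]

by_patient = group_by_patient(records)
train, held_out = patient_level_split(records, holdout_patients={"P1"})
```

Splitting by patient rather than by image avoids placing lesions from the same person in both the training and evaluation sets.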
Nature Medicine 2020
Human–computer collaboration for skin cancer recognition.

Here we build on recent achievements in the accuracy of image-based AI for skin cancer diagnosis to address the effects of varied representations of AI-based support across different levels of clinical expertise and multiple clinical workflows.

[ Paper ]
Fairness of Classifiers Across Skin Tones in Dermatology

In this paper, we present an approach to estimate skin tone in skin disease benchmark datasets and investigate whether model performance is dependent on this measure.

[ Paper | arXiv ]
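
Checking whether performance depends on estimated skin tone, as described above, amounts to a stratified evaluation. The sketch below shows one simple form, per-group accuracy and the gap between groups. The group names and the toy samples are hypothetical, and the paper's skin tone estimation method and metrics are not reproduced here.

```python
from collections import defaultdict


def accuracy_by_group(samples):
    """Return {group: accuracy} for samples of (group, label, prediction)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, label, prediction in samples:
        total[group] += 1
        if label == prediction:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Hypothetical toy data: (estimated skin-tone bin, true label, predicted label).
samples = [
    ("lighter", 1, 1), ("lighter", 0, 0), ("lighter", 1, 1), ("lighter", 0, 1),
    ("darker", 1, 1), ("darker", 0, 1), ("darker", 1, 0), ("darker", 0, 0),
]

per_group = accuracy_by_group(samples)
gap = max(per_group.values()) - min(per_group.values())
```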
Lancet Oncology 2019
Comparison of the accuracy of human readers versus machine-learning algorithms for pigmented skin lesion classification: an open, web-based, international, diagnostic study.

We provide a state-of-the-art comparison of the most advanced machine-learning algorithms with a large number of human readers, including the most experienced human experts.

[ Paper ]
Dermoscopy Image Analysis: Overview and Future Directions.

In this paper, we present a brief overview of this exciting subfield of medical image analysis, primarily focusing on three aspects of it, namely, segmentation, feature extraction, and classification. We then provide future directions for researchers.

[ Paper ]
Seminars in Cutaneous Medicine and Surgery 2019
The role of public challenges and data sets towards algorithm development, trust, and use in clinical practice.

In this article, we summarize recent advancements in machine learning, with a focused perspective on the role of public challenges and data sets on the progression of these technologies in skin imaging. In addition, we highlight the remaining hurdles toward effective implementation of technologies to the clinical workflow and discuss how public challenges and data sets can catalyze the development of solutions.

[ Paper ]
(Please email for a free copy)
Journal of the American Academy of Dermatology 2018
Results of the 2016 International Skin Imaging Collaboration International Symposium on Biomedical Imaging challenge: Comparison of the accuracy of computer algorithms to dermatologists for the diagnosis of melanoma from dermoscopic images.

We sought to compare melanoma diagnostic accuracy of computer algorithms to dermatologists using dermoscopic images.

[ Paper ]
EMBC 2018
Segmentation of both Diseased and Healthy Skin from Clinical Photographs in a Primary Care Setting

This work presents the first segmentation study of both diseased and healthy skin in standard camera photographs from a clinical environment. Challenges arise from varied lighting conditions, skin types, backgrounds, and pathological states.

[ Paper ]
ISBI 2018
Skin lesion analysis toward melanoma detection: A challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC).

This article describes the design, implementation, and results of the latest installment of the dermoscopic image analysis benchmark challenge. The goal is to support research and development of algorithms for automated diagnosis of melanoma, the most lethal skin cancer.

[ Paper ]
IBM JRD 2016
Deep learning ensembles for melanoma recognition in dermoscopy images.

We propose a system that combines recent developments in deep learning with established machine learning approaches, creating ensembles of methods that are capable of segmenting skin lesions, as well as analyzing the detected area and surrounding tissue for melanoma detection.

[ Paper ]
Deep learning, sparse coding, and SVM for melanoma recognition in dermoscopy images.

This work presents an approach for melanoma recognition in dermoscopy images that combines deep learning, sparse coding, and support vector machine (SVM) learning algorithms. One of the beneficial aspects of the proposed approach is that unsupervised learning within the domain, together with feature transfer from the domain of natural photographs, eliminates the need for annotated data in the target task to learn good features.

[ Paper ]

Electroencephalography (EEG)

ICLR 2016
Learning Representations from EEG with Deep Recurrent Convolutional Neural Networks

One of the challenges in modeling cognitive events from electroencephalogram (EEG) data is finding representations that are invariant to inter- and intra-subject differences, as well as to inherent noise associated with EEG data collection. Herein, we propose a novel approach for learning such representations from multichannel EEG time-series, and demonstrate its advantages in the context of a mental load classification task.

[ Paper ]

Magnetic Resonance Imaging (MRI)

Journal of Cardiovascular Magnetic Resonance 2019
Machine learning derived segmentation of phase velocity encoded cardiovascular magnetic resonance for fully automated aortic flow quantification

Fully automated machine learning PC-CMR segmentation performs robustly for aortic flow quantification, yielding rapid segmentation, small differences with manual segmentation, and identification of differential forward/left ventricular volumetric stroke volume in the context of concomitant mitral regurgitation. Findings support use of machine learning for analysis of large-scale CMR datasets.

[ Paper ]
JACC: Cardiovascular Imaging 2016
Echocardiographic Algorithm for Post–Myocardial Infarction LV Thrombus: A Gatekeeper for Thrombus Evaluation by Delayed Enhancement CMR

The goal of this study was to determine the prevalence of post–myocardial infarction (MI) left ventricular (LV) thrombus in the current era and to develop an effective algorithm (predicated on echocardiography [echo]) to discern patients warranting further testing for thrombus via delayed enhancement (DE) cardiac magnetic resonance (CMR).

[ Paper ]
Circulation: Cardiovascular Imaging 2012
Improved Left Ventricular Mass Quantification With Partial Voxel Interpolation: In Vivo and Necropsy Validation of a Novel Cardiac MRI Segmentation Algorithm.

This study tested LVM segmentation among clinical patients and laboratory animals undergoing CMR. In patients, echocardiography (echo) was performed within 1 day of CMR and used as a clinical comparator for LVM. In laboratory animals, euthanasia was performed after CMR and segmentation results were compared with ex vivo LV weight. The aim was to examine the impact of partial voxel segmentation on CMR quantification of LVM.

[ Paper ]
ICIP 2012
Cardiac Anatomy as a Biometric.

In this study, we propose a novel biometric signature for human identification based on anatomically unique structures of the left ventricle of the heart. An algorithm is developed that analyzes the 3 primary anatomical structures of the left ventricle: the endocardium, myocardium, and papillary muscle.

[ Paper ]
Journal of Cardiovascular Magnetic Resonance 2010
Impact of diastolic dysfunction severity on global left ventricular volumetric filling-assessment by automated segmentation of routine cine cardiovascular magnetic resonance

The aim was to examine relationships between the severity of echocardiography (echo)-evidenced diastolic dysfunction (DD) and volumetric filling by automated processing of routine cine cardiovascular magnetic resonance (CMR).

[ Paper ]
NMR in Biomedicine 2010
A radial self-calibrated (RASCAL) generalized autocalibrating partially parallel acquisition (GRAPPA) method using weight interpolation.

A generalized autocalibrating partially parallel acquisition (GRAPPA) method for radial k-space sampling is presented that calculates GRAPPA weights without synthesized or acquired calibration data.

[ Paper ]
Magnetic Resonance in Medicine 2010
Respiratory and Cardiac Self-Gated Free-Breathing Cardiac CINE Imaging With Multiecho 3D Hybrid Radial SSFP Acquisition.

A respiratory and cardiac self-gated free-breathing three-dimensional cine steady-state free precession imaging method using multiecho hybrid radial sampling is presented.

[ Paper ]
Automatic Left Ventricle Segmentation Using Iterative Thresholding and an Active Contour Model With Adaptation on Short-Axis Cardiac MRI

An automatic left ventricle (LV) segmentation algorithm is presented for quantification of cardiac output and myocardial mass in clinical practice.

[ Paper ]
Circulation: Cardiovascular Imaging 2009
Automated Segmentation of Routine Clinical Cardiac Magnetic Resonance Imaging for Assessment of Left Ventricular Diastolic Dysfunction

Automated CMR segmentation can provide LV filling profiles that may offer insight into diastolic dysfunction. Patients with diastolic dysfunction have prolonged diastolic filling intervals, which are associated with echo-evidenced diastolic dysfunction independent of clinical and imaging variables.

[ Paper ]
Radiology 2008
Left Ventricle: Automated Segmentation by Using Myocardial Effusion Threshold Reduction and Intravoxel Computation at MR Imaging

The purpose of the study was to develop and validate an algorithm for automated segmentation of the left ventricular (LV) cavity that accounts for papillary and/or trabecular muscles and partial voxels in cine magnetic resonance (MR) images, an algorithm called LV Myocardial Effusion Threshold Reduction with Intravoxel Computation (LV-METRIC).

[ Paper ]
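
Accounting for partial voxels, as mentioned in the summary above, can be illustrated with a linear mixing model: a voxel's blood fraction is interpolated between a mean myocardium signal and a mean blood signal. The sketch below assumes that model and uses made-up intensities and a made-up voxel volume. It is not the published LV-METRIC implementation, which also estimates the thresholds and handles region growing.

```python
def blood_fraction(signal, myo_mean, blood_mean):
    """Estimate the blood fraction of one voxel by linear interpolation
    between the mean myocardium and mean blood signal intensities,
    clamped to the range [0, 1]."""
    fraction = (signal - myo_mean) / (blood_mean - myo_mean)
    return min(1.0, max(0.0, fraction))


def cavity_volume(signals, myo_mean, blood_mean, voxel_volume_mm3):
    """Sum fractional blood content over voxels and scale by voxel volume."""
    total_fraction = sum(blood_fraction(s, myo_mean, blood_mean) for s in signals)
    return total_fraction * voxel_volume_mm3


# Hypothetical voxel intensities: pure myocardium, pure blood, a half-filled
# voxel, a voxel darker than myocardium, and a voxel brighter than blood.
voxels = [100.0, 200.0, 150.0, 50.0, 250.0]

volume = cavity_volume(voxels, myo_mean=100.0, blood_mean=200.0, voxel_volume_mm3=2.0)
```

Compared with a hard threshold, the fractional weighting lets voxels on the cavity border contribute in proportion to their estimated blood content.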

Other Healthcare Research

Automated medical image modality recognition by fusion of visual and text information.

In this work, we present a framework for medical image modality recognition based on a fusion of both visual and text classification methods. Experiments are performed on the public ImageCLEF 2013 medical image modality dataset, which provides figure images and associated fulltext articles from PubMed as components of the benchmark.

[ Paper ]

Professional Activities

Area Chair: ICLR 2018, 2019, 2022 (Highlighted); ICPR 2022; NeurIPS 2022
Associate Editor:
Senior Program Committee: AAAI 2022
Challenge Co-Founder / Co-Organizer:
  • International Skin Imaging Collaboration: ISBI 2016, 2017; MICCAI 2018, 2019, 2020
  • Cross-Domain Few-Shot Learning Benchmark: CVPR 2020, 2021
Workshop Co-Founder / Co-Organizer:
Microsoft Community:
  • Microsoft Startups Mentor in Machine Learning (2020-Present)
IBM Research Community:
  • Computer Vision & Multimedia Professional Interest Community (PIC) Chair (2017-2020)
  • Global Technology Outlook (GTO) Advocate for Department of Cognitive Computing (2017)
  • Culture Club & IBM 5K Co-Organizer (2014-2020)


Awards

  • IBM Outstanding Research Accomplishment Award (2019)
  • IBM Eminence and Excellence Award (2018)
  • IBM Outstanding Technical Achievement Award (2018)
  • IBM Research Image Award (2016)
  • IBM Invention Achievement Awards (2013, 2014, 2016, 2017, 2018)
  • IBM Research Division Award (2013)
  • ImageCLEF Medical Image Recognition 1st Place Team (2013)
  • IBM Eminence and Excellence Award (2012)
  • Cornell University Bits on our Mind (BOOM) Best in Category: Biological Science (2006)


Teaching

  • Columbia University: Guest Lecturer in Computer Vision (2018)
  • NYU: Guest Lecturer in Computer Vision (2016)
  • Stevens Institute of Technology: Adjunct Professor in Artificial Intelligence (2014-2016)



Patents

  • Automatic identification of food substance (US10528793)
  • Method and system for categorizing heart disease states (US20150317789)
  • Static Image Segmentation (US9311716 B2)
  • Image Segmentation Techniques (US9299145 B2)
  • Techniques for spatial semantic attribute matching for location identification (US9251434 B2)
  • Techniques for ground-level photo geolocation using digital elevation (US9165217B2)
  • Unique Cardiovascular Measurements for Human Identification (US9031288B2)
  • Social media event detection and content-based retrieval (US9002069B2)
  • Method for segmenting objects in images (US8369590B2)
  • Viewpoint recognition in computer tomography images (US9652846)
  • Determination of unique items based on generating descriptive vectors of users (US10664894)
  • Surgical skin lesion removal (US10568695)
  • Surface reflectance reduction in images using non-specular portion replacement (US10255674)


Patent Applications

  • Biological neuron to electronic computer interface (US Patent App. 15/851,949)
  • Estimating the Number of Attendees in a Meeting (US Patent App. 15/295,409)
  • System and method for comparing training data with test data (US Patent App. 14/982,036)
  • Identifying transfer models for machine learning tasks (US Patent App. 15/982,622)
  • Training transfer-focused models for deep learning (US Patent App. 16/373,149)
  • Generating and augmenting transfer learning datasets with pseudo-labeled images (US Patent App. 16/125,153)
  • Drug delivery device having controlled delivery and confirmation (US Patent App. 15/848,169)
  • Pill collection visual recognition for automatic compliance to prescriptions (US Patent App. 15/483,126)
  • Category Oversampling for Imbalanced Machine Learning (US Patent App. 14/500,023)

Social Media / Contact Information