CRAFT: Curriculum Rank Adversarial Fine-Tuning for Robust Vision Language Models

Shivang Chopra, Chengyue Huang, Brisa Maneechotesuwan, Zsolt Kira.

December 2024

Abstract

Existing Vision-Language Models (VLMs) have demonstrated remarkable zero-shot performance across various visual domains and tasks. However, recent studies have shown that fine-tuning VLMs on downstream tasks results in a loss of generalization and decreased robustness against distribution shifts. To address this issue, we propose Curriculum Rank Adversarial Fine-Tuning (CRAFT), a unified low-rank fine-tuning framework designed to enhance both out-of-distribution (OOD) and adversarial robustness by integrating adaptive adversarial weight perturbations into a curriculum-driven Low-Rank Adaptation (LoRA) framework. CRAFT is grounded in three key insights: (1) constrained parameter updates preserve OOD generalization, (2) promoting a flat weight-loss landscape enhances OOD robustness, and (3) adversarial training with adaptive perturbation budgets mitigates catastrophic forgetting. By progressively increasing the rank of weight updates and perturbations over the course of training, CRAFT balances task-specific adaptation with robustness, yielding flatter minima and enhanced OOD robustness. Through comprehensive empirical experiments, we demonstrate that CRAFT preserves VLMs’ zero-shot abilities while adapting to specific tasks, outperforming state-of-the-art adversarial and robust fine-tuning approaches under both natural and adversarial distribution shifts. When fine-tuned on the DomainNet and ImageNet datasets, CRAFT achieves state-of-the-art in-distribution (ID) performance while improving average OOD performance by 12% and 10%, respectively, over the vanilla fine-tuning baseline.
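To make the idea of "progressively increasing the rank of weight updates" concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the schedule, so the linear `rank_schedule`, the helper `lora_delta`, the rank bounds, and the matrix sizes are all hypothetical. The adversarial weight perturbation step is deliberately left out because the abstract gives no detail on how it is computed.

```python
import numpy as np

def rank_schedule(step, total_steps, r_min=1, r_max=8):
    """Hypothetical linear curriculum: active rank grows from r_min to r_max.

    Assumption for illustration only; the actual CRAFT schedule may differ.
    """
    frac = min(max(step / max(total_steps - 1, 1), 0.0), 1.0)
    return r_min + int(round(frac * (r_max - r_min)))

def lora_delta(A, B, rank):
    """Low-rank update B @ A restricted to the first `rank` components."""
    return B[:, :rank] @ A[:rank, :]

rng = np.random.default_rng(0)
d_out, d_in, r_max = 16, 12, 8
W0 = rng.normal(size=(d_out, d_in))   # stand-in for a frozen pretrained weight
# Random factors so the effect of the rank is visible; real LoRA setups
# typically initialize one factor to zero so training starts from W0.
A = rng.normal(size=(r_max, d_in))
B = rng.normal(size=(d_out, r_max))

for step in (0, 50, 99):
    r = rank_schedule(step, total_steps=100, r_max=r_max)
    delta = lora_delta(A, B, r)
    W = W0 + delta                    # effective weight at this stage
    print(step, r, np.linalg.matrix_rank(delta))
```

Early in training the update is confined to a very low-rank subspace, which keeps the fine-tuned weights close to the pretrained ones; later steps unlock more components for task-specific adaptation.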

Type

Preprint

Publication

In Submission

Shivang Chopra

CS Ph.D. Student