Research · arXiv preprint

NeuroLens

A toolkit for probing the adversarial robustness of multimodal models, featuring a novel cross-modal attack with transfer analysis. Released as an arXiv preprint.

The problem

Multimodal models like CLIP are increasingly used in production, but their adversarial robustness is poorly understood. How do perturbations in one modality affect the other?

Architecture

Attack Pipeline:
  Image → FGSM/PGD perturbation → CLIP encoder → embedding space
  Text  → Token manipulation → CLIP encoder → embedding space
  
Transfer Analysis:
  Cross-modal: image perturbations → text retrieval
  Cross-model: CLIP attacks → other vision-language models
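The attack pipeline above can be sketched as a PGD loop that pushes an image's embedding away from its clean embedding inside an L-inf ball. This is a hedged illustration, not the toolkit's actual code: the toy linear encoder stands in for CLIP's visual tower, and the hyperparameters (`eps`, `alpha`, `steps`) are illustrative defaults.

```python
import torch
import torch.nn.functional as F

def pgd_attack(encoder, image, eps=8 / 255, alpha=2 / 255, steps=10):
    """Untargeted PGD in embedding space: maximize the cosine distance
    between the adversarial embedding and the clean embedding, keeping
    the perturbation within an L-inf ball of radius eps."""
    clean_emb = F.normalize(encoder(image), dim=-1).detach()
    # random start inside the eps-ball (standard PGD; also avoids the
    # zero gradient of cosine similarity at the clean point)
    adv = (image + torch.empty_like(image).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        adv.requires_grad_(True)
        emb = F.normalize(encoder(adv), dim=-1)
        loss = -F.cosine_similarity(emb, clean_emb, dim=-1).mean()
        grad = torch.autograd.grad(loss, adv)[0]
        # ascend on the loss, then project back into the eps-ball
        adv = adv.detach() + alpha * grad.sign()
        adv = (image + (adv - image).clamp(-eps, eps)).clamp(0, 1)
    return adv.detach()

# Toy stand-in for CLIP's visual tower (assumption: any differentiable encoder).
torch.manual_seed(0)
toy_encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 4 * 4, 8))
images = torch.rand(2, 3, 4, 4)
adv = pgd_attack(toy_encoder, images)
cos = F.cosine_similarity(
    F.normalize(toy_encoder(adv), dim=-1),
    F.normalize(toy_encoder(images), dim=-1),
    dim=-1,
).mean().item()
print(f"embedding cosine after attack: {cos:.3f}")
```

In the real pipeline, the text branch would analogously perturb token inputs before the text encoder; text attacks are discrete and need substitution-based search rather than sign gradients.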

Key decisions

Decision         Choice             Why
Attack methods   FGSM + PGD         Standard gradient-based attacks, extended to the multimodal setting.
Target model     CLIP               The most widely deployed multimodal model; high-impact target.
Tracking         Weights & Biases   Experiment tracking across hundreds of attack configurations.
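The tracking decision implies a large sweep. A minimal sketch of how such a grid might be enumerated and logged — the parameter names and values here are assumptions, not the toolkit's actual sweep space:

```python
import itertools

# Hypothetical attack grid; the real sweep space is an assumption.
attacks = ["fgsm", "pgd"]
eps_values = [1 / 255, 2 / 255, 4 / 255, 8 / 255]
step_counts = [1, 10, 40]
modalities = ["image", "text", "both"]

configs = [
    {"attack": a, "eps": e, "steps": s, "modality": m}
    for a, e, s, m in itertools.product(attacks, eps_values, step_counts, modalities)
]
print(len(configs))  # 2 * 4 * 3 * 3 = 72 configurations

# Each config would become one tracked run, e.g.:
#   run = wandb.init(project="neurolens", config=cfg)
#   run.log({"attack_success": ..., "transfer_rate": ...})
```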

Results

Attack success: 94% on CLIP ViT-B/32
Transfer rate: 67% to other vision-language models
Preprint: on arXiv, under review
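The transfer-rate figures come from retrieval-style evaluation. Here is a hedged sketch of how cross-modal top-1 retrieval accuracy could be measured on paired embeddings; the synthetic arrays stand in for real CLIP image/text embeddings, and `retrieval_accuracy` is an illustrative helper, not the toolkit's API.

```python
import numpy as np

def retrieval_accuracy(image_embs, text_embs):
    """Top-1 text retrieval: for each image embedding, is the matching
    caption (same row index) the nearest text embedding by cosine?"""
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = img @ txt.T
    return float((sims.argmax(axis=1) == np.arange(len(img))).mean())

# Synthetic paired embeddings: clean images sit near their captions,
# "attacked" images are heavily displaced (stand-in for adversarial shift).
rng = np.random.default_rng(0)
txt = rng.normal(size=(100, 32))
clean = txt + 0.01 * rng.normal(size=txt.shape)
adv = txt + 2.0 * rng.normal(size=txt.shape)

acc_clean = retrieval_accuracy(clean, txt)
acc_adv = retrieval_accuracy(adv, txt)
print(f"clean: {acc_clean:.2f}, attacked: {acc_adv:.2f}")
```

The attack-success metric in the table is the complementary drop in this accuracy under perturbation.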