Beyond Blanket Masking: Examining Granularity for Privacy Protection in Images Captured by Blind and Low Vision Users

1Stony Brook University, 2University of Texas at Austin, 3University of Maryland, College Park
Conference on Language Modeling (COLM) 2025

Comparison with other models

[Figure: side-by-side comparison of masking outputs. Panels: Image, Gemini 2.5, GPT-4o, MistralOCR, Full Fine-Grained (Ours), FiG-Priv (Ours).]

Abstract

As visual assistant systems powered by visual language models (VLMs) become more prevalent, concerns over user privacy have grown, particularly for blind and low vision users who may unknowingly capture personal private information in their images. Existing privacy protection methods rely on coarse-grained segmentation, which uniformly masks entire private objects, often at the cost of usability. In this work, we propose FiG-Priv, a fine-grained privacy protection framework that selectively masks only high-risk private information while preserving low-risk information. Our approach integrates fine-grained segmentation with a data-driven risk scoring mechanism. By leveraging a more nuanced understanding of privacy risk, our method enables more effective protection without unnecessarily restricting users’ access to critical information. We evaluate our framework using the BIV-Priv-Seg dataset and show that FiG-Priv preserves +26% of image content, enhancing the ability of VLMs to provide useful responses by 11% and identify the image content by 45%, while ensuring privacy protection.


Approach

We propose the FiG-Priv framework, which combines fine-grained segmentation with a data-driven risk scoring mechanism.
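The idea of masking only high-risk regions can be illustrated with a minimal sketch. It is not the authors' implementation: the `Segment` class, the example labels, the risk values, and the 0.5 threshold are all hypothetical, and segments are simplified to rectangular boxes on a small grayscale grid. The sketch builds a mask from segments whose risk score meets the threshold, applies it, and reports how much of the image is left untouched.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A fine-grained region inside a private object (hypothetical structure)."""
    label: str    # e.g. "card_number"; labels here are illustrative only
    box: tuple    # (top, left, bottom, right), bottom/right exclusive
    risk: float   # assumed risk score in [0, 1]; higher means more sensitive


def high_risk_mask(height, width, segments, threshold=0.5):
    """Return a 2D boolean grid that is True where a high-risk segment lies."""
    mask = [[False] * width for _ in range(height)]
    for seg in segments:
        if seg.risk < threshold:
            continue  # low-risk content stays visible
        top, left, bottom, right = seg.box
        for r in range(top, bottom):
            for c in range(left, right):
                mask[r][c] = True
    return mask


def apply_mask(image, mask, fill=0):
    """Return a copy of the image with masked pixels replaced by `fill`."""
    return [
        [fill if hidden else pixel for pixel, hidden in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


def preserved_fraction(mask):
    """Fraction of pixels that are not masked."""
    total = sum(len(row) for row in mask)
    hidden = sum(sum(row) for row in mask)
    return (total - hidden) / total


# Toy 4x6 image with two fine-grained segments (all values are made up).
image = [[255] * 6 for _ in range(4)]
segments = [
    Segment("card_number", (1, 1, 2, 5), risk=0.9),
    Segment("bank_logo", (0, 0, 1, 2), risk=0.1),
]
mask = high_risk_mask(4, 6, segments, threshold=0.5)
masked_image = apply_mask(image, mask)
print(preserved_fraction(mask))
```

In this toy setup only the `card_number` box is hidden, while the low-risk `bank_logo` region remains visible. A coarse-grained strategy would instead mask the whole object containing both segments.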


Results

To evaluate VLM performance across the three masking strategies, we first test object recognition in two settings: when the private object is the focus of the question, and when a control object is the focus with the private object in the background.


We also evaluate VLM performance on the VQA task using realistic, human-asked questions. Note that high-risk masking achieves the answerability rate closest to that of the full image.


BibTeX

@article{ji@2025posetraj,
  author    = {Murrugarra-Llerena, Jeffri and Niu, Haoran and Barber, K. Suzanne and Daum{\'e} III, Hal and Cao, Yang Trista and Cascante-Bonilla, Paola},
  title     = {Beyond Blanket Masking: Examining Granularity for Privacy Protection in Images Captured by Blind and Low Vision Users},
  journal   = {COLM},
  year      = {2025},
}