Beyond Blanket Masking: Examining Granularity for Privacy Protection in Images Captured by Blind and Low Vision Users

Jeffri Murrugarra-Llerena¹, Haoran Niu², K. Suzanne Barber², Hal Daumé III³, Yang Trista Cao², Paola Cascante-Bonilla¹,³

¹Stony Brook University, ²University of Texas at Austin, ³University of Maryland, College Park

Conference on Language Modeling (COLM) 2025


[Figure: Comparison with other models. Columns: Image, Gemini 2.5, GPT-4o, MistralOCR, Full Fine-Grained (Ours), FiG-Priv (Ours).]

Abstract

As visual assistant systems powered by visual language models (VLMs) become more prevalent, concerns over user privacy have grown, particularly for blind and low vision users who may unknowingly capture personal private information in their images. Existing privacy protection methods rely on coarse-grained segmentation, which uniformly masks entire private objects, often at the cost of usability. In this work, we propose FiG-Priv, a fine-grained privacy protection framework that selectively masks only high-risk private information while preserving low-risk information. Our approach integrates fine-grained segmentation with a data-driven risk scoring mechanism. By leveraging a more nuanced understanding of privacy risk, our method enables more effective protection without unnecessarily restricting users’ access to critical information. We evaluate our framework using the BIV-Priv-Seg dataset and show that FiG-Priv preserves +26% of image content, enhancing the ability of VLMs to provide useful responses by 11% and identify the image content by 45%, while ensuring privacy protection.

Approach

We propose the FiG-Priv framework, which combines fine-grained segmentation with a data-driven risk scoring mechanism.
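The idea of masking only high-risk regions can be illustrated with a minimal sketch. This is not the authors' implementation: the `Region` type, the region labels, the risk values, and the 0.5 threshold are all hypothetical, and a real system would obtain regions from a segmentation model and scores from the data-driven risk scoring mechanism. The sketch only shows how a per-region risk threshold leaves more of the image visible than masking the whole private object.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A fine-grained segment inside a private object (hypothetical type)."""
    label: str                       # illustrative label, e.g. "card_number"
    box: tuple[int, int, int, int]   # (top, left, bottom, right), end-exclusive
    risk: float                      # illustrative risk score in [0, 1]


def build_mask(height, width, regions, threshold):
    """Return a boolean grid that is True where a region's risk >= threshold."""
    mask = [[False] * width for _ in range(height)]
    for region in regions:
        if region.risk < threshold:
            continue  # low-risk content stays visible
        top, left, bottom, right = region.box
        for y in range(top, bottom):
            for x in range(left, right):
                mask[y][x] = True
    return mask


def apply_mask(image, mask, fill=0):
    """Return a copy of the image with masked pixels replaced by `fill`."""
    return [
        [fill if hidden else pixel for pixel, hidden in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


def preserved_fraction(mask):
    """Fraction of pixels left unmasked."""
    total = sum(len(row) for row in mask)
    hidden = sum(cell for row in mask for cell in row)
    return 1 - hidden / total


# Toy 4x6 grayscale "image" of a card; all values are made up.
image = [[9] * 6 for _ in range(4)]
regions = [
    Region("cardholder_name", (0, 0, 2, 3), risk=0.2),  # assumed low risk
    Region("card_number", (2, 0, 4, 6), risk=0.9),      # assumed high risk
]

fine = build_mask(4, 6, regions, threshold=0.5)    # mask high-risk regions only
coarse = build_mask(4, 6, regions, threshold=0.0)  # mask every private region
masked = apply_mask(image, fine)

print(preserved_fraction(fine))    # 0.5
print(preserved_fraction(coarse))  # 0.25
```

In this toy case the thresholded mask hides only the card number rows, so half of the pixels remain visible, while masking every detected region leaves a quarter.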

Results

To evaluate VLM performance across the three masking strategies, we first test object recognition in two settings: when the private object is the focus of the question, and when a control object is the focus with the private object in the background.

We also evaluate VLM performance on the VQA task using realistic, human-asked questions. Note that high-risk masking achieves an answerability rate closest to that of the full image.