Official Project Page
A unified DIRS framework for reflection separation, reflection scene reconstruction, and polarized multi-image reflection separation.
College of Intelligence and Computing, Tianjin University
† Corresponding author: xj.max.guo@gmail.com
Abstract
DIRS tackles real-world layer entanglement by introducing a learnable nonlinear superposition model and unifying dual-stream architectures under a generalized interaction paradigm.
The Challenge: Separating superimposed reflections from a single image remains a severely ill-posed problem. Existing methods often struggle in real-world scenarios because they either assume that the layers blend linearly in sRGB space, which is a flawed assumption, or treat layer disentanglement as isolated, single-stream subproblems.
Nonlinear Formation: We challenge the conventional linear composition model. DIRS introduces a learnable nonlinear interaction term that faithfully captures the complex layer couplings and biases introduced by real-world ISP pipelines.
Unified Interaction: To resolve the intrinsic ambiguity, we propose a generalized dual-stream interactive architecture. It facilitates deep, bidirectional feature exchange between transmission and reflection pathways. This principled framework seamlessly unifies activation-based, gate-based, and attention-based mechanisms across both CNN and Transformer backbones.
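The objection to linear blending in sRGB space can be checked numerically. The sketch below assumes that light from the two layers adds in linear intensity and that the only camera nonlinearity is the standard sRGB transfer curve; a real ISP also applies tone mapping, white balance, and other steps, so this is an illustration of the effect rather than a model of any particular pipeline. The function names `srgb_encode` and `blend_residual` are hypothetical.

```python
def srgb_encode(x: float) -> float:
    """Map a linear-light value in [0, 1] to its sRGB-encoded value."""
    x = min(max(x, 0.0), 1.0)  # clip, as a saturating sensor would
    if x <= 0.0031308:
        return 12.92 * x
    return 1.055 * x ** (1.0 / 2.4) - 0.055


def blend_residual(t_lin: float, r_lin: float) -> float:
    """Gap between the encoded mixture and the sum of the encoded layers."""
    # Physical mixture: intensities add in linear space, then get encoded.
    mixture = srgb_encode(t_lin + r_lin)
    # What the linear model I = T + R would predict in sRGB space.
    linear_guess = srgb_encode(t_lin) + srgb_encode(r_lin)
    return mixture - linear_guess


if __name__ == "__main__":
    for t_lin, r_lin in [(0.2, 0.05), (0.4, 0.1), (0.6, 0.3)]:
        print(t_lin, r_lin, round(blend_residual(t_lin, r_lin), 4))
```

Under these assumptions the residual is zero only when one layer is absent, and otherwise depends on both layers at once, which is the kind of coupling a fixed linear sum cannot express.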
Method
The Learnable Offset-Residual model captures complex nonlinear physical couplings, while the interactive dual-stream architecture enables explicit bidirectional feature exchange to cleanly disentangle the overlapping layers.
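This page does not state the formation equation itself. One plausible way to write a learnable nonlinear superposition, given here as an assumption and not as the authors' exact model, is to add a learned term to the conventional linear sum. The symbol $\phi_\theta$ and its arguments are illustrative notation only.

```latex
% Conventional linear composition of transmission T and reflection R:
I = T + R
% Assumed general form with a learnable coupling term (hypothetical notation):
I = T + R + \phi_\theta(T, R)
% The linear model is recovered as the special case \phi_\theta \equiv 0.
```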
DIRS-YTMT: Activation-based CNN interaction that recycles suppressed features between streams.
DIRS-MuGI: Mutually gated CNN interaction for spatially varying nonlinear reflection coupling.
DIRS-PAIR: Transformer interaction with dual-stream joint attention and parallel self-attention.
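The activation-based and gate-based interactions can be illustrated on toy feature vectors. The sketch below is one reading of the one-line descriptions above, not the project's implementation: it assumes that "suppressed features" means the negative part that ReLU discards, that streams are merged by addition, and that gates are sigmoids of the other stream. The actual models operate on feature maps with learned layers, and the attention-based variant is not covered. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def relu(x: Vector) -> Vector:
    return [max(v, 0.0) for v in x]


def neg_relu(x: Vector) -> Vector:
    """The part that ReLU throws away, i.e. x - relu(x)."""
    return [min(v, 0.0) for v in x]


def activation_interaction(t_feat: Vector, r_feat: Vector) -> Tuple[Vector, Vector]:
    """Each stream keeps its own positive response and receives the
    other stream's suppressed (negative) response. Additive merge is assumed."""
    t_out = [a + b for a, b in zip(relu(t_feat), neg_relu(r_feat))]
    r_out = [a + b for a, b in zip(relu(r_feat), neg_relu(t_feat))]
    return t_out, r_out


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def mutual_gating(t_feat: Vector, r_feat: Vector) -> Tuple[Vector, Vector]:
    """Each stream is scaled element-wise by a gate computed from the other."""
    t_out = [t * sigmoid(r) for t, r in zip(t_feat, r_feat)]
    r_out = [r * sigmoid(t) for t, r in zip(t_feat, r_feat)]
    return t_out, r_out


if __name__ == "__main__":
    t = [1.0, -2.0, 0.5]
    r = [-0.5, 3.0, -1.0]
    t_out, r_out = activation_interaction(t, r)
    print(t_out)  # [0.5, 0.0, -0.5]
    print(r_out)  # [0.0, 1.0, 0.0]
    print(mutual_gating(t, r))
```

With the additive merge, `t_out[i] + r_out[i]` equals `t[i] + r[i]` for every element, so nothing removed by the activation in one stream is lost; it is handed to the other stream. The gated variant has no such conservation property and instead lets each stream attenuate the other position by position.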
Models
FLOPs and latency are measured on a single NVIDIA RTX 3090 with 256 × 256 inputs. PSNR (dB) and SSIM are averaged over Real20 and SIR2.
| Model | Type | Params | FLOPs | Latency | PSNR (dB) | SSIM |
|---|---|---|---|---|---|---|
| DIRS-YTMT | CNN, activation interaction | 32.42M | 102.91G | 31.35 ms | 24.94 | 0.902 |
| DIRS-MuGI | CNN, mutual gating | 84.47M | 153.98G | 49.95 ms | 25.63 | 0.913 |
| DIRS-PAIR | Transformer, joint attention | 48.80M | 200.22G | 75.36 ms | 26.37 | 0.918 |
| DIRS-PAIR + Nature | Transformer, joint attention | 48.80M | 200.22G | 75.36 ms | 26.95 | 0.926 |
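For readers unfamiliar with the PSNR column, the standard definition can be sketched in a few lines. The page does not state the peak value, color space, or per-image versus per-dataset averaging used for the table, so the default `max_val=1.0` and the flat pixel sequences below are assumptions, and the helper name `psnr` is hypothetical.

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


if __name__ == "__main__":
    predicted = [0.5, 0.5, 0.5, 0.5]
    reference = [0.6, 0.4, 0.6, 0.4]
    print(round(psnr(predicted, reference), 2))
```

Because the scale is logarithmic, a gain of 1 dB corresponds to roughly a 21% reduction in mean squared error.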
Survey
Decades of reflection removal and separation methods organized by input modality, physical cue, prior constraint, and network paradigm.
Results
DIRS separates strong real-world reflections and extends naturally to reflection scene reconstruction and polarized image reflection separation.
Citation
Please cite DIRS if this project is useful for your research.
@article{hu2026dirs,
  title={Principled Reflection Separation via Nonlinear Superposition and Feature Interaction},
  author={Hu, Qiming and Li, Mingjia and Li, Yuntong and Guo, Xiaojie},
  journal={arXiv preprint},
  year={2026}
}