SRFormerV2: Taking a Closer Look at Permuted Self-Attention for Image Super-Resolution
Abstract
Previous works have shown that increasing the window size of Transformer-based image super-resolution models (e.g., SwinIR) can significantly improve performance, but the computational overhead also grows considerably as the window size increases. In this paper, we present SRFormer, a simple but novel method that enjoys the benefit of large-window self-attention while introducing even less computational burden. The core of SRFormer is the permuted self-attention (PSA), which strikes an appropriate balance between channel and spatial information for self-attention. Without any bells and whistles, SRFormer achieves a 33.86dB PSNR score on the Urban100 dataset, 0.46dB higher than that of SwinIR, while using fewer parameters and less computation. In addition, we scale up the model by further enlarging the window size and the number of channels to explore the potential of Transformer-based models. Experiments show that the scaled model, named SRFormerV2, further improves the results and achieves state-of-the-art performance. We hope our simple and effective approach can be useful for future research in super-resolution model design.
Method
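To make the PSA idea concrete, below is a minimal PyTorch sketch of permuted self-attention as we understand it: the keys and values are first projected to C/r^2 channels, then spatial tokens within each window are permuted into the channel dimension, shrinking the key/value window from SxS to (S/r)x(S/r) while restoring C channels. The class name, the reduction ratio r, and the shape conventions here are illustrative assumptions, not the official implementation, and the relative position bias used in the paper is omitted for brevity; please refer to the paper and code for the exact design.

# Minimal sketch of permuted self-attention (PSA); an assumption-based
# illustration, not the official SRFormer implementation.
import torch
import torch.nn as nn


class PermutedSelfAttention(nn.Module):
    def __init__(self, dim, window_size, num_heads, r=2):
        super().__init__()
        self.window_size = window_size      # large attention window S
        self.num_heads = num_heads
        self.r = r                          # spatial reduction ratio (assumed)
        self.scale = (dim // num_heads) ** -0.5

        self.q = nn.Linear(dim, dim, bias=True)
        # Keys/values are first projected to C/r^2 channels ...
        self.kv = nn.Linear(dim, 2 * dim // (r * r), bias=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (num_windows*B, S*S, C) tokens within each window
        B_, N, C = x.shape
        S, r, h = self.window_size, self.r, self.num_heads

        q = self.q(x).reshape(B_, N, h, C // h).transpose(1, 2)

        kv = self.kv(x)                      # (B_, S*S, 2*C/r^2)
        # ... then spatial tokens are permuted into the channel dimension,
        # shrinking the key/value window to (S/r)x(S/r) with C channels.
        kv = kv.reshape(B_, S // r, r, S // r, r, 2 * C // (r * r))
        kv = kv.permute(0, 1, 3, 2, 4, 5).reshape(B_, (S // r) ** 2, 2 * C)
        k, v = kv.chunk(2, dim=-1)
        k = k.reshape(B_, -1, h, C // h).transpose(1, 2)
        v = v.reshape(B_, -1, h, C // h).transpose(1, 2)

        # Attention map is S^2 x (S/r)^2 instead of S^2 x S^2, so a large
        # window costs far less than in plain window attention.
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B_, N, C)
        return self.proj(out)


# Usage (hypothetical sizes): 8 windows of 24x24 tokens, 64 channels.
psa = PermutedSelfAttention(dim=64, window_size=24, num_heads=4, r=2)
y = psa(torch.randn(8, 24 * 24, 64))        # -> (8, 576, 64)

With r=2, each query still attends across the full 24x24 window, but the attention map has only a quarter of the key/value tokens, which is how PSA keeps a large window affordable.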
Results
We provide results on classical image SR, lightweight image SR, and real-world image SR. More results can be found in the paper. The visual results of SRFormer can be found in [Visual Results].
Classical image SR
Model size comparison
Real-world image SR
Citation
@inproceedings{zhou2023srformer,
  title={{SRFormer}: Permuted Self-Attention for Single Image Super-Resolution},
  author={Zhou, Yupeng and Li, Zhen and Guo, Chun-Le and Bai, Song and Cheng, Ming-Ming and Hou, Qibin},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={12780--12791},
  year={2023}
}
The website is borrowed from CodeFormer.