Network compression techniques that combine tensor decompositions and pruning have shown promise in leveraging the advantages of both strategies. In this work, we propose enhanced Network cOmpRession through TensOr decompositions and pruNing (NORTON), a novel method for network compression. NORTON introduces the concept of filter decomposition, enabling a more detailed decomposition of the network while preserving the weights' multidimensional properties. Our method incorporates a novel structured pruning approach that integrates effectively with the decomposed model. Through extensive experiments on various architectures, benchmark datasets, and representative vision tasks, we demonstrate the effectiveness of our method. NORTON achieves superior results compared to state-of-the-art techniques in terms of both complexity and accuracy.
Tensor decomposition has emerged as an effective tool for compressing convolutional neural networks by exploiting the low-rank structure of weight tensors. Existing approaches typically focus on which decomposition to use and how to select the rank, while implicitly assuming that the convolutional weights should be decomposed at the level of the entire layer.
However, a convolutional layer's weight is inherently a fourth-order tensor, and how this tensor is handled prior to decomposition has a major impact on both compression efficiency and approximation quality. Prior works either decompose the whole layer directly or reshape the tensor into a lower-order form before applying decomposition. Both strategies operate at a coarse granularity: the former limits flexibility, while the latter compromises the original multidimensional structure.
In contrast, we observe that convolution operates on a filter-by-filter basis: each output feature map is produced independently by convolving the input with a single filter. This motivates a more fine-grained perspective, in which the fourth-order weight tensor is naturally interpreted as a collection of third-order filter tensors.
Building on this observation, NORTON adopts a filter decomposition strategy, in which each filter is decomposed individually. This preserves the full multidimensional structure, avoids unnecessary tensor reshaping, and yields a narrower and more stable range of ranks. Moreover, filter decomposition replaces a convolutional layer with fewer sublayers than layer-wise decomposition, reducing architectural complexity and facilitating subsequent compression steps.
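To make the idea concrete, here is a minimal sketch that decomposes each filter of a toy convolutional layer with a truncated HOSVD. The `(C_out, C_in, k, k)` layout, the rank choice, and HOSVD itself are illustrative assumptions; the paper specifies NORTON's actual decomposition and rank-selection rule.

```python
import numpy as np

def mode_unfold(t, mode):
    """Matricize a third-order tensor along `mode`."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def hosvd3(filt, ranks):
    """Truncated HOSVD of a single filter of shape (C_in, k, k).
    Returns the core tensor and one orthonormal factor matrix per mode."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(mode_unfold(filt, mode), full_matrices=False)
        factors.append(U[:, :r])                      # (dim_mode, r_mode)
    core = filt
    for mode, U in enumerate(factors):                # core = filt x_m U_m^T
        core = np.moveaxis(
            np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core, factors

# Toy layer in PyTorch's (C_out, C_in, k, k) layout: 64 filters of shape (16, 3, 3).
W = np.random.randn(64, 16, 3, 3)
decomposed = [hosvd3(f, ranks=(4, 3, 3)) for f in W]  # one decomposition per filter
```

Multiplying each core back by its factor matrices yields the low-rank approximation of that filter; in NORTON, these per-filter factors become the shared representation that the pruning phase operates on.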
Among existing model compression paradigms, tensor decomposition and structured pruning have proven to be both effective and practical. Decomposition reduces redundancy by exploiting low-rank structure in weights, while pruning removes redundant filters or channels directly. Importantly, both approaches produce standard architectures that can be deployed on resource-constrained devices without specialized hardware.
Despite their shared goal, these two techniques have largely been studied in isolation. Only limited efforts have explored how to combine them in a truly complementary manner. This separation overlooks a key property of convolutional networks: their weights simultaneously exhibit low-rank and sparse characteristics.
Decomposition methods alone cannot eliminate all redundant filters, while aggressive pruning often leads to severe accuracy degradation when low-rank structure is ignored. As a result, state-of-the-art methods based on either paradigm typically saturate well below extreme compression regimes.
NORTON is motivated by the idea that these two strategies should be integrated rather than stacked mechanically. By operating on a shared representation, namely the factor matrices produced by filter-wise decomposition, NORTON enables decomposition and pruning to reinforce each other. This principled integration allows NORTON to achieve ultra-high compression rates, pushing network compression far beyond what either approach can reach independently.
NORTON is a unified network compression framework that tightly integrates filter-wise tensor decomposition and structured filter pruning in a sequential and mutually reinforcing manner. The key insight is that convolutional filters exhibit both low-rank and redundant structures, which can be exploited more effectively when these two paradigms are combined through a shared representation.
As illustrated in the overview figure, NORTON follows a two-phase pipeline applied simultaneously to all convolutional layers:
1. **Decomposition**: every filter is decomposed individually into compact factor matrices, exposing the low-rank structure of each layer.
2. **Structured pruning**: redundant filters are identified and removed by comparing those shared factor matrices, so that near-duplicate filters are eliminated.
By sharing the same factor matrices for both decomposition and pruning, NORTON achieves extreme compression rates while maintaining accuracy. A single fine-tuning stage is then sufficient to recover performance, resulting in highly compact models suitable for deployment on resource-constrained devices.
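To illustrate how the shared representation can drive pruning, the sketch below applies a hypothetical farthest-point selection over per-filter signature vectors (flattened factor matrices). Both the signature construction and the greedy criterion are illustrative assumptions; NORTON's actual filter-importance measure is defined in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for per-filter factor matrices from the decomposition phase:
# one flattened signature vector per filter (64 filters, 120-dim signatures).
signatures = rng.standard_normal((64, 120))

def prune_by_similarity(sigs, keep):
    """Greedily retain `keep` filters whose normalized signatures are
    farthest from the already-kept set, so near-duplicate filters are
    dropped first. Illustrative criterion only."""
    sigs = sigs / (np.linalg.norm(sigs, axis=1, keepdims=True) + 1e-12)
    kept = [0]
    while len(kept) < keep:
        # each candidate's distance to its nearest already-kept signature
        d = np.linalg.norm(sigs[:, None] - sigs[kept][None], axis=2).min(axis=1)
        d[kept] = -np.inf                 # never re-select a kept filter
        kept.append(int(np.argmax(d)))
    return sorted(kept)

kept_idx = prune_by_similarity(signatures, keep=32)   # keep 32 of 64 filters
```

Because the signatures come directly from the decomposition, no extra statistics need to be collected: the same factors that compress each filter also decide which filters survive.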
To evaluate NORTON's effectiveness on downstream tasks, we used our compressed ResNet-50/ImageNet model as the backbone for training Faster/Mask/Keypoint-RCNN on COCO. Our method delivers strong precision and recall while reaching higher compression levels than competing approaches. Remarkably, NORTON more than doubles inference throughput in frames per second (FPS) compared to the baseline models. For instance, Faster-RCNN's end-to-end latency drops from 100 ms to 42 ms, i.e., roughly 24 FPS, approaching real-time operation. These evaluations were conducted on an RTX 3060 GPU, providing strong evidence of the real-world applicability of our approach. The results highlight NORTON's potential as a valuable tool for demanding tasks such as real-world object detection, instance segmentation, and keypoint detection.
Figure 2: Baseline (left) vs Compressed (right) model inference.
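Throughput figures of this kind can be reproduced with a synchronized timing loop. The sketch below is illustrative only: it times torchvision's stock Faster-RCNN on a dummy 800x800 input; the NORTON-compressed backbone, real inputs, and target GPU would have to be substituted to match the reported setup.

```python
import time
import torch
import torchvision

# Stock Faster-RCNN as a stand-in; the NORTON-compressed backbone would be
# swapped in here to reproduce the reported numbers.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model = model.cuda().eval()
x = [torch.rand(3, 800, 800, device="cuda")]          # dummy 800x800 input

with torch.no_grad():
    for _ in range(10):                               # warm-up iterations
        model(x)
    torch.cuda.synchronize()
    t0 = time.time()
    n = 100
    for _ in range(n):
        model(x)
    torch.cuda.synchronize()
elapsed = time.time() - t0
print(f"latency: {1000 * elapsed / n:.1f} ms/image, {n / elapsed:.1f} FPS")
```

Synchronizing before and after the timed loop matters on GPU: CUDA kernels launch asynchronously, so wall-clock time without synchronization undercounts the true latency.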
We complement the numerical results with a qualitative evaluation of feature preservation. We randomly select five images from the ImageNet validation set and examine three compression ratios (CRs) applied to the original ResNet-50 model: 50%, 64%, and 78%. Using GradCAM, we visually compare the class-discriminative feature maps of the original and compressed models. The visualizations show that our framework retains the crucial features across a diverse range of classes, and that this behavior holds consistently at all tested CRs. This resilience suggests sustained effectiveness and reliability across varying scenarios and compression levels, making our framework a versatile choice for network compression across diverse applications and datasets.
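A minimal Grad-CAM pipeline of the kind used for these visualizations is sketched below. It relies on the third-party pytorch-grad-cam package and stock ResNet-50 weights purely for illustration; a NORTON-compressed checkpoint would be loaded in its place, and the image path is a placeholder.

```python
import numpy as np
import torch
from PIL import Image
from torchvision import models, transforms
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.image import show_cam_on_image

# Stock weights for illustration; load the compressed checkpoint instead.
model = models.resnet50(weights="DEFAULT").eval()

preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
img = Image.open("sample.jpg").convert("RGB")         # placeholder image path
x = preprocess(img).unsqueeze(0)

# Grad-CAM on the last ResNet block; with no explicit target, the
# highest-scoring class is explained.
cam = GradCAM(model=model, target_layers=[model.layer4[-1]])
heatmap = cam(input_tensor=x)[0]                      # (224, 224), values in [0, 1]
rgb = np.asarray(img.resize((224, 224)), dtype=np.float32) / 255.0
overlay = show_cam_on_image(rgb, heatmap, use_rgb=True)  # uint8 overlay image
```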
*(Image grid omitted: each row pairs an input image with GradCAM heatmaps from models at CR = 0%, 50%, 64%, and 78%.)*
Figure 3: Qualitative assessment of feature preservation in compressed models.
If the code and paper help your research, please kindly cite:
```bibtex
@article{pham2025enhanced,
  title={Enhanced Network Compression Through Tensor Decompositions and Pruning},
  author={Pham, Van Tien and Zniyed, Yassine and Nguyen, Thanh Phuong},
  journal={IEEE Transactions on Neural Networks and Learning Systems},
  year={2025},
  volume={36},
  number={3},
  pages={4358--4370},
  doi={10.1109/TNNLS.2024.3370294}
}
```
This work was granted access to the high-performance computing resources of IDRIS under the allocation 2023-103147 made by GENCI. Specifically, our experiments were conducted on the Jean Zay supercomputer at IDRIS, the national computing centre of the French National Centre for Scientific Research (CNRS).
We thank the Agence Nationale de la Recherche (ANR) for partially supporting our work through the ANR ASTRID ROV-Chasseur project (ANR-21-ASRO-0003).