TY - GEN
T1 - Real-time detection and tracking by an array camera with distributed neural processing
AU - Skowronek, James T.
AU - Hageman, Gordon C.
AU - Brady, David J.
N1 - Publisher Copyright:
© 2025 SPIE. All rights reserved.
PY - 2025
Y1 - 2025
N2 - An intelligent array camera capable of detecting, identifying, and tracking anomalies requires real-time processing in a distributed architecture. It also requires a robust, abstract feature space that a central “brain” can probe for prior features. An algorithm is presented for synthesizing multi-focal, multi-perspective, multi-spectral (color and monochrome) video sequences from a single high-resolution color source and fusing them to form a single spatio-temporally super-resolved output. By applying 2D homographies to generate new viewpoints and focal lengths, a miniature camera array is simulated. Each camera’s frames are downsampled and time-decimated, then converted into a robust, structured feature representation using a “PixelSquasher” module that computes biased statistical moments and geometric cues in local patches. A symmetry-aware UNet encoder (asymmUnet) processes these features in separate invariant and equivariant channels for rotation, scale, intensity, and time. The fused latent representation is connected to a forward diffusion network (FDN), which injects inhomogeneous noise (conditioned on the encoder’s features) into a pretrained VAE-based representation of the ground truth. A reverse diffusion network (RDN) then refines the asymmUnet’s decoder outputs back into that same VAE latent space to yield the final high-resolution reconstructions. This design captures higher-frequency details from narrower-FOV monochrome views while preserving color coverage and wide-FOV context from the original camera. It is demonstrated that a single camera’s video can effectively simulate multi-camera input to evaluate spatio-temporal super-resolution in a controlled environment.
AB - An intelligent array camera capable of detecting, identifying, and tracking anomalies requires real-time processing in a distributed architecture. It also requires a robust, abstract feature space that a central “brain” can probe for prior features. An algorithm is presented for synthesizing multi-focal, multi-perspective, multi-spectral (color and monochrome) video sequences from a single high-resolution color source and fusing them to form a single spatio-temporally super-resolved output. By applying 2D homographies to generate new viewpoints and focal lengths, a miniature camera array is simulated. Each camera’s frames are downsampled and time-decimated, then converted into a robust, structured feature representation using a “PixelSquasher” module that computes biased statistical moments and geometric cues in local patches. A symmetry-aware UNet encoder (asymmUnet) processes these features in separate invariant and equivariant channels for rotation, scale, intensity, and time. The fused latent representation is connected to a forward diffusion network (FDN), which injects inhomogeneous noise (conditioned on the encoder’s features) into a pretrained VAE-based representation of the ground truth. A reverse diffusion network (RDN) then refines the asymmUnet’s decoder outputs back into that same VAE latent space to yield the final high-resolution reconstructions. This design captures higher-frequency details from narrower-FOV monochrome views while preserving color coverage and wide-FOV context from the original camera. It is demonstrated that a single camera’s video can effectively simulate multi-camera input to evaluate spatio-temporal super-resolution in a controlled environment.
KW - Spatio-temporal super-resolution
KW - UNet
KW - deep generative models
KW - diffusion
KW - multi-camera fusion
KW - multi-view homographies
KW - scale-space
UR - https://www.scopus.com/pages/publications/105010478459
U2 - 10.1117/12.3054238
DO - 10.1117/12.3054238
M3 - Conference contribution
AN - SCOPUS:105010478459
T3 - Proceedings of SPIE - The International Society for Optical Engineering
BT - Real-Time Image Processing and Deep Learning 2025
A2 - Kehtarnavaz, Nasser
A2 - Shirvaikar, Mukul V.
PB - SPIE
T2 - Real-Time Image Processing and Deep Learning 2025
Y2 - 14 April 2025 through 15 April 2025
ER -