Time and Frequency Network for Human Action Detection in Videos

Human action detection in videos can be applied in areas such as video surveillance, human-computer interaction, and machine control. The task takes an image sequence with a three-dimensional shape as input and detects actions such as running or catching a ball.

Image credit: pxhere.com, CC0 Public Domain

Typically, convolutional neural networks (CNNs) are used for this task. However, they only consider spatiotemporal features, while using frequency features would facilitate learning. A recent paper on arXiv.org proposes an end-to-end single-stage network in the time-frequency domain.

A 3D-CNN and a 2D-CNN are used to extract time and frequency features, respectively. These features are then fused with an attention mechanism to obtain the detection results. The experiments demonstrate the superiority of the proposed approach over other state-of-the-art models and confirm the feasibility of action detection using frequency features.
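The PyTorch sketch below is a minimal illustration of this two-branch idea: a 3D-CNN time branch over the image sequence, a 2D-CNN frequency branch over coefficient maps, and an attention-style weighting that fuses the two before classification. The layer sizes, the gating-style fusion, and the class name `TwoBranchFusionSketch` are illustrative assumptions, not the authors' exact TFNet architecture.

```python
import torch
import torch.nn as nn

class TwoBranchFusionSketch(nn.Module):
    """Illustrative two-branch time/frequency network with attention-weighted
    fusion. Sizes and the fusion scheme are assumptions, not TFNet's design."""

    def __init__(self, num_classes=21, feat_dim=128):
        super().__init__()
        # Time branch: 3D-CNN over an image sequence of shape (N, C, T, H, W).
        self.time_branch = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # Frequency branch: 2D-CNN over coefficient maps of shape (N, C, H, W).
        self.freq_branch = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # Simple attention over the two branch features (illustrative gating).
        self.attn = nn.Sequential(nn.Linear(2 * feat_dim, 2), nn.Softmax(dim=1))
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, frames, dct_coeffs):
        t = self.time_branch(frames)             # (N, feat_dim) time features
        f = self.freq_branch(dct_coeffs)         # (N, feat_dim) frequency features
        w = self.attn(torch.cat([t, f], dim=1))  # (N, 2) branch weights
        fused = w[:, :1] * t + w[:, 1:] * f      # attention-weighted fusion
        return self.classifier(fused)

model = TwoBranchFusionSketch()
frames = torch.randn(2, 3, 16, 112, 112)  # batch of 16-frame clips
coeffs = torch.randn(2, 3, 112, 112)      # per-clip frequency-domain maps
print(model(frames, coeffs).shape)        # torch.Size([2, 21])
```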

At present, spatiotemporal features are embraced by most deep learning approaches for human action detection in videos; however, they neglect important features in the frequency domain. In this work, we propose an end-to-end network that considers the time and frequency features simultaneously, named TFNet. TFNet holds two branches: one is the time branch, formed of a three-dimensional convolutional neural network (3D-CNN), which takes the image sequence as input to extract time features; the other is the frequency branch, which extracts frequency features through a two-dimensional convolutional neural network (2D-CNN) from DCT coefficients. Finally, to obtain the action patterns, these two features are deeply fused under the attention mechanism. Experimental results on the JHMDB51-21 and UCF101-24 datasets demonstrate that our approach achieves remarkable performance for frame-mAP.
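Since the frequency branch operates on DCT coefficients rather than raw pixels, the short SciPy sketch below shows one plausible way to turn a frame into a blockwise DCT coefficient map. The 8x8 block size, grayscale input, and the helper name `frame_to_dct_blocks` are assumptions for illustration; the paper's exact preprocessing may differ.

```python
import numpy as np
from scipy.fft import dctn

def frame_to_dct_blocks(frame, block=8):
    """Blockwise 2D DCT of a grayscale frame, producing the kind of
    coefficient map a frequency branch could consume (illustrative only)."""
    h, w = frame.shape
    h, w = h - h % block, w - w % block   # crop to a multiple of the block size
    coeffs = np.zeros((h, w), dtype=np.float32)
    for i in range(0, h, block):
        for j in range(0, w, block):
            coeffs[i:i + block, j:j + block] = dctn(
                frame[i:i + block, j:j + block], norm="ortho")
    return coeffs

# Example: a random 240x320 grayscale frame.
frame = np.random.rand(240, 320).astype(np.float32)
print(frame_to_dct_blocks(frame).shape)  # (240, 320)
```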

Research paper: Li, C., Chen, H., Lu, J., Huang, Y., and Liu, Y., “Time and Frequency Network for Human Action Detection in Videos”, 2021. Link: https://arxiv.org/abs/2103.04680