Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition
Shuyang Sun, Zhanghui Kuang, Lu Sheng, Wanli Ouyang, Wei Zhang
The University of Sydney, SenseTime Research, The Chinese University of Hong Kong
Abstract
- A novel, compact motion representation named Optical Flow guided Feature (OFF)
- OFF can be embedded in any framework.
1. Introduction
- Temporal information is key for video action recognition.
- Optical flow is a useful motion representation, but expensive to compute.
- 3D CNN does not perform as well as Two-stream networks with optical flow.
- OFF is a new feature representation derived from the space orthogonal to optical flow, computed at the feature level. It consists of:
- Spatial gradients of the feature maps in the horizontal and vertical directions
- Temporal gradients
2. Related Work
- Hand-crafted features
- Deep-features
- Optical flow
- 3D CNN
- RNN
- OFF
- Captures motion patterns well
- Complementary to other motion representations
3. Optical Flow Guided Feature: OFF
- Optical flow
- $I(x, y, t)$: pixel at the location $(x, y)$ of frame $t$
- $(\Delta x, \Delta y)$: spatial pixel displacement along each axis
- Brightness constancy: $I(x, y, t) = I(x + \Delta x, y + \Delta y, t + \Delta t)$
- Apply at feature level
- $f(I; w)$: mapping function for extracting features from an image
- $w$: parameters in $f$
- According to the definition of optical flow, dividing the feature-level constancy equation by $\Delta t$ gives
  $\frac{\partial f(I; w)}{\partial x} v_x + \frac{\partial f(I; w)}{\partial y} v_y + \frac{\partial f(I; w)}{\partial t} = 0$
- $(v_x, v_y)$: feature-level optical flow
- OFF: $F(I; w) = \left[ \frac{\partial f(I; w)}{\partial x}, \frac{\partial f(I; w)}{\partial y}, \frac{\partial f(I; w)}{\partial t} \right]$
- Orthogonal to the feature-level optical flow $[v_x, v_y, 1]^{\top}$, and changes as it changes
- Encodes spatial-temporal information orthogonally and complementarily to the feature-level optical flow (see the numerical check below)
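A minimal numerical illustration of this orthogonality (my own sketch, not from the paper): for a toy feature map whose content simply translates with velocity $(v_x, v_y)$ between frames, the OFF vector is orthogonal to $[v_x, v_y, 1]$.

```python
import numpy as np

# Hypothetical toy "feature map": its content translates with velocity (vx, vy),
# so it satisfies the brightness-constancy assumption exactly.
def feature(x, y, t, vx=0.3, vy=-0.2):
    return np.sin(x - vx * t) * np.cos(y - vy * t)

x, y, t, eps = 1.7, 0.4, 2.0, 1e-4

# Numerical partial derivatives (central differences)
dfdx = (feature(x + eps, y, t) - feature(x - eps, y, t)) / (2 * eps)
dfdy = (feature(x, y + eps, t) - feature(x, y - eps, t)) / (2 * eps)
dfdt = (feature(x, y, t + eps) - feature(x, y, t - eps)) / (2 * eps)

off = np.array([dfdx, dfdy, dfdt])   # OFF vector at this point
flow = np.array([0.3, -0.2, 1.0])    # feature-level optical flow (vx, vy, 1)
print(np.dot(off, flow))             # ~0: OFF lies in the space orthogonal to the flow
```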
4. Using Optical Flow Guided Feature in CNN
4.1. Network Architecture
Feature Generation Sub-network
- BN-Inception for extracting feature maps
OFF Sub-network
- 1x1 convolutional layer
- Apply Sobel operator for spatial gradients
- Element-wise subtraction for temporal gradients
- Concatenate with features from the lower level (a sketch of the OFF unit follows at the end of this section)
Classification Sub-network
- A separate inner-product (fully-connected) classifier for each level of features
- Classification scores are averaged
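A minimal sketch of the OFF sub-network's gradient computation, written as an assumed PyTorch re-implementation (the authors use Caffe; the class name, channel sizes, and the omission of the lower-level concatenation and residual blocks are my simplifications):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OFFUnit(nn.Module):
    """Computes [df/dx, df/dy, df/dt] from the feature maps of two adjacent frames."""
    def __init__(self, in_channels, reduced_channels=128):
        super().__init__()
        # 1x1 convolution reduces the channel dimension before taking gradients
        self.reduce = nn.Conv2d(in_channels, reduced_channels, kernel_size=1)
        # Fixed Sobel kernels for horizontal / vertical spatial gradients
        sobel = torch.tensor([[-1., 0., 1.],
                              [-2., 0., 2.],
                              [-1., 0., 1.]]) / 8.0
        self.register_buffer("kx", sobel.reshape(1, 1, 3, 3))
        self.register_buffer("ky", sobel.t().reshape(1, 1, 3, 3))

    def _sobel(self, x, kernel):
        # Depthwise filtering: the same Sobel kernel applied to every channel
        c = x.shape[1]
        return F.conv2d(x, kernel.expand(c, 1, 3, 3), padding=1, groups=c)

    def forward(self, feat_t, feat_t_next):
        f1, f2 = self.reduce(feat_t), self.reduce(feat_t_next)
        dx = self._sobel(f1, self.kx)   # horizontal spatial gradient
        dy = self._sobel(f1, self.ky)   # vertical spatial gradient
        dt = f2 - f1                    # temporal gradient via element-wise subtraction
        # OFF at this level: the three gradient maps stacked along channels
        # (the full network also concatenates features from the lower level)
        return torch.cat([dx, dy, dt], dim=1)
```

The unit involves only a 1x1 convolution, fixed Sobel filters, and a subtraction, which is why OFF adds little computation on top of the feature generation network.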
4.2. Network Training
- $G_{l,t}$: classification score of the $t$-th segment on level $l$
- $G_l = \mathcal{G}(G_{l,1}, \dots, G_{l,T})$, where the aggregation function $\mathcal{G}$ is average pooling for summarizing the scores over segments
- Cross-entropy loss for each level (a sketch follows after this section): $L_l = -\sum_{c=1}^{C} y_c \left( G_{l,c} - \log \sum_{j=1}^{C} \exp G_{l,j} \right)$
- $C$: number of categories
- $y_c$: ground-truth class label
- Two-stage training
- Train feature generation sub-network first.
- Train classification sub-network with feature network frozen.
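A minimal sketch of the per-level training loss described above, again as an assumed PyTorch-style re-implementation (tensor shapes and function names are my own):

```python
import torch
import torch.nn.functional as F

def per_level_losses(segment_scores, labels):
    """segment_scores: list over levels l, each tensor of shape (batch, T, C)
    holding the segment scores G_{l,t}; labels: class indices of shape (batch,)."""
    losses = []
    for scores_l in segment_scores:
        g_l = scores_l.mean(dim=1)                   # average pooling over the T segments
        losses.append(F.cross_entropy(g_l, labels))  # cross-entropy loss for this level
    return losses

# The total training loss is the sum of the per-level losses:
# total_loss = sum(per_level_losses(scores_per_level, labels))
```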
4.3. Network Testing
- Test under the TSN framework
- 25 segments are sampled from the RGB frames
- The $t$-th segment is treated as frame $F_t$
5. Experiments and Evaluations
5.1. Datasets and Implementation Details
- UCF-101 / HMDB-51 datasets
- 4 NVIDIA TITAN X GPUs
- Caffe & OpenMPI
- Train feature generation network by TSN method
- Train OFF sub-networks from scratch with feature generation networks frozen.
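Freezing the feature generation networks during the second stage amounts to disabling their gradients; a minimal PyTorch-style sketch (the module definitions and optimizer settings are hypothetical placeholders, not the paper's configuration):

```python
import torch
import torch.nn as nn

# Hypothetical placeholder modules standing in for the three sub-networks.
feature_net = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1))   # feature generation
off_net = nn.Sequential(nn.Conv2d(64, 128, 1))                # OFF sub-network
cls_net = nn.Linear(128, 101)                                 # classifier (e.g. 101 UCF classes)

# Stage 2: freeze the feature generation sub-network, train the rest from scratch.
for p in feature_net.parameters():
    p.requires_grad = False

optimizer = torch.optim.SGD(
    list(off_net.parameters()) + list(cls_net.parameters()),
    lr=0.001, momentum=0.9)
```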
5.2. Experimental Investigations of OFF
- Efficiency
- State-of-the-art among real-time methods
- Effectiveness
- Investigate the robustness of OFF when applied to different inputs.
- Comparison
- 2.0% / 5.7% accuracy gain on UCF-101 / HMDB-51 compared with the baseline two-stream TSN
6. Conclusion
- OFF is fast (200 fps) and robust.
- The result with only RGB input is comparable to Two-stream approaches.
- Complementary to other motion representations.