LSSPT
Introduction
conventional deep learning-based methods
- LSTM, GCN
- explore group activity representations under supervised or weakly supervised modes
- require manually annotated personal action labels (costly data annotation)
background
- NLP: unsupervised learning
- self-supervised learning (SSL) develops
- in self-supervised representation learning (SSRL), the temporal evolution of group activities has not yet been explicitly exploited
- predictive coding scheme
group activities
- more complex state dynamics
- lead to failure of SSRL using RNNs (modeling complex sequential relations is difficult)
- LSTM-based models lack attention to long-range dependencies in the history sequence
- Transformer networks in NLP are restricted to regular sequential data
- humans repeat certain motions over long periods within a group activity
- exploiting multiple ranges of historical information
LSSPT
encoder-decoder framework
- encoder: summarize group state
- decoder: anticipate the state in the future
- based on a relation graph and a causal Transformer
sparse graph Transformer
- spatial state context in short time
causal temporal Transformer (CTT)
- long range temporal dynamics
Approach
predictive coding
- spatio-temporal encoding function
- prediction function
- optimization function (scheme sketched below)
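A minimal PyTorch sketch of this encode-predict-optimize scheme; the module names (`f_enc`, `f_pred`), dimensions, and the MSE comparison are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

# Sketch of the predictive-coding scheme (all names/dims are assumptions).
# f_enc: spatio-temporal encoding function mapping observed frames to group states.
# f_pred: prediction function anticipating future states from past states.
# The optimization function compares predicted states with encoded future states.

class PredictiveCoding(nn.Module):
    def __init__(self, feat_dim=256, state_dim=128):
        super().__init__()
        self.f_enc = nn.Linear(feat_dim, state_dim)                    # placeholder encoder
        self.f_pred = nn.GRU(state_dim, state_dim, batch_first=True)   # placeholder predictor

    def forward(self, past_feats, future_feats):
        # past_feats: (B, T_past, feat_dim), future_feats: (B, T_future, feat_dim)
        past_states = self.f_enc(past_feats)        # encode observed frames
        future_states = self.f_enc(future_feats)    # encode future frames (targets)
        _, h = self.f_pred(past_states)             # summarize the history
        pred = h[-1].unsqueeze(1).expand_as(future_states)  # naive roll-out of the prediction
        # optimization function: make predicted states match encoded future states
        return nn.functional.mse_loss(pred, future_states)

# toy usage
model = PredictiveCoding()
loss = model(torch.randn(2, 8, 256), torch.randn(2, 4, 256))
```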
Architecture
- feature extraction
  - a pre-trained I3D model extracts person features
- long-short state encoding
- long-short state decoding
- training and inference (loss combination sketched below)
  - reconstruction loss
  - contrastive loss
  - adversarial loss
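A hedged sketch of how the three losses might be combined during training; the InfoNCE form of the contrastive term, the generator-side adversarial term, and the loss weights are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

# Sketch of combining the three training losses (exact formulations and weights are assumptions).

def reconstruction_loss(pred_state, true_state):
    # predicted future group state vs. the state actually encoded from future frames
    return F.mse_loss(pred_state, true_state)

def contrastive_loss(pred_state, true_state, temperature=0.1):
    # InfoNCE-style: the matching (pred, true) pair is the positive,
    # all other samples in the batch serve as negatives
    pred = F.normalize(pred_state, dim=-1)
    true = F.normalize(true_state, dim=-1)
    logits = pred @ true.t() / temperature                # (B, B) similarity matrix
    labels = torch.arange(pred.size(0), device=pred.device)
    return F.cross_entropy(logits, labels)

def adversarial_loss(discriminator, pred_state):
    # generator-side term: the predictor tries to make predicted states look "real"
    score = discriminator(pred_state)
    return F.binary_cross_entropy_with_logits(score, torch.ones_like(score))

def total_loss(pred_state, true_state, discriminator, w_rec=1.0, w_con=1.0, w_adv=0.1):
    return (w_rec * reconstruction_loss(pred_state, true_state)
            + w_con * contrastive_loss(pred_state, true_state)
            + w_adv * adversarial_loss(discriminator, pred_state))

# toy usage
disc = torch.nn.Linear(128, 1)
loss = total_loss(torch.randn(4, 128), torch.randn(4, 128), disc)
```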
Long-Short State Encoder
sparse graph transformer
building
\(\{p_i^t\}_{i=1}^{N}\), where \(p_i^t \in \mathbb{R}^d\) denotes the feature of the \(i\)-th person at frame \(t\)
sparse graph \(G^t = \{V^t, E^t\}\), where \(V^t = \{p_i^t\}_{i=1}^{N}\) is the node set and \(E^t = \{(i,j) \mid p_i^t \text{ and } p_j^t \text{ are connected at time } t\}\) is the edge set
neighbors of node \(i\): \(\mathrm{Nei}(i,t) = \{p_j^t\}_{j=1}^{M}\), where each \(p_j^t\) satisfies \((i,j) \in E^t\)
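The notes do not specify the connectivity criterion behind \(E^t\), so the sketch below assumes each person is linked to its \(M\) nearest neighbors in feature space, purely for illustration.

```python
import torch

# Sketch of building the sparse graph G^t = {V^t, E^t} at one frame.
# Connecting each person to its M nearest neighbors is an assumption, not the paper's rule.

def build_sparse_graph(p_t, num_neighbors=3):
    """p_t: (N, d) person features at frame t -> (N, M) neighbor indices per node."""
    dist = torch.cdist(p_t, p_t)                        # (N, N) pairwise feature distances
    dist.fill_diagonal_(float('inf'))                   # a node is not its own neighbor
    nei = dist.topk(num_neighbors, largest=False).indices  # Nei(i, t): M closest nodes
    return nei

p_t = torch.randn(12, 256)        # N = 12 persons, d = 256
neighbors = build_sparse_graph(p_t)
```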
update
each node's representation is updated from \(h_i\) to \(\hat{h}_i\) using the keys passed from its neighbor nodes and the node's own query
$$
\hat{h}_i = \mathrm{softmax}\!\left(\left[\frac{q_i\,k_j^{\top}}{\sqrt{d}}\right]_{j=1}^{N}\right)\,[v_j]_{j=1}^{N}
$$

where \(q_i\) denotes the query of node \(i\), \(k_j\) the key and \(v_j\) the value of node \(j\)
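A sketch of this attention-based node update over the sparse neighborhood; the linear projections and the \(1/\sqrt{d}\) scaling are standard choices assumed here.

```python
import torch
import torch.nn as nn

# Sketch of the node update: node i forms a query from its own feature and attends
# over the keys/values of its neighbors, producing the updated embedding \hat{h}_i.

class SparseGraphAttention(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, h, nei):
        # h: (N, dim) node features, nei: (N, M) neighbor indices from the sparse graph
        q = self.q(h)                      # query of each node i
        k = self.k(h)[nei]                 # (N, M, dim) keys of the neighbors
        v = self.v(h)[nei]                 # (N, M, dim) values of the neighbors
        attn = torch.softmax((q.unsqueeze(1) * k).sum(-1) * self.scale, dim=-1)  # (N, M)
        h_hat = (attn.unsqueeze(-1) * v).sum(1)   # weighted sum of neighbor values
        return h_hat                       # updated node embeddings \hat{h}_i
```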
group state modeling
$$
g_t = P_{\max}\big(\mathrm{Norm}(f_o(\hat{h}_1^t), \dots, f_o(\hat{h}_N^t))\big)
$$

where \(g_t\) is the group state at frame \(t\), \(P_{\max}\) is a max-pooling layer, \(\mathrm{Norm}\) denotes layer normalization, and \(f_o\) is a fully connected layer
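A small sketch of this aggregation step, with `LayerNorm` standing in for \(\mathrm{Norm}\) and max pooling over the person dimension; the dimensions are assumptions.

```python
import torch
import torch.nn as nn

# Sketch of the group-state aggregation g_t = P_max(Norm(f_o(h_1), ..., f_o(h_N))).

class GroupStateAggregator(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.f_o = nn.Linear(dim, dim)    # fully connected layer applied per person
        self.norm = nn.LayerNorm(dim)     # Norm

    def forward(self, h_hat):
        # h_hat: (N, dim) updated person embeddings at frame t
        x = self.norm(self.f_o(h_hat))    # per-person projection + normalization
        g_t = x.max(dim=0).values         # P_max: max pooling over the person dimension
        return g_t                        # (dim,) group state of frame t
```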
causal temporal transformer
- masked Transformer
- add temporal positional encoding based on the absolute frame index
- stack multiple CTT layers, each consisting of masked multi-head attention, LayerNorm (layer normalization), and an MLP (feed-forward block)
- the mask ensures the model attends only to the allowed part of the input (similar to causal attention in LLMs, where later tokens cannot influence the attention given to earlier ones); see the sketch below
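A sketch of one CTT layer built from PyTorch's `nn.MultiheadAttention` with a causal (upper-triangular) mask; layer sizes and the residual arrangement are assumptions.

```python
import torch
import torch.nn as nn

# Sketch of one causal temporal Transformer (CTT) layer over the sequence of group states.
# The causal mask lets frame t attend only to frames <= t, as in decoder-style attention.

class CausalTemporalLayer(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, g_seq):
        # g_seq: (B, T, dim) group states with temporal positional encoding already added
        T = g_seq.size(1)
        causal_mask = torch.triu(
            torch.ones(T, T, dtype=torch.bool, device=g_seq.device), diagonal=1
        )                                               # True = attention blocked (future frames)
        x, _ = self.attn(g_seq, g_seq, g_seq, attn_mask=causal_mask)
        g_seq = self.norm1(g_seq + x)                   # masked multi-head attention + residual
        g_seq = self.norm2(g_seq + self.mlp(g_seq))     # MLP (feed-forward) + residual
        return g_seq
```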
Long-Short State Decoder
- state attention modules: build dependencies between the long-term and short-term states
- state update modules: output the fused long- and short-term information (see the sketch below)
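A hedged sketch of the decoder: cross-attention stands in for the state attention module and a small projection for the state update module; the exact module boundaries are assumptions.

```python
import torch
import torch.nn as nn

# Sketch of the long-short state decoder: short-term states query the long-term history
# (state attention), then a projection emits the fused representation (state update).

class LongShortStateDecoder(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.state_attention = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.state_update = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim))

    def forward(self, short_states, long_states):
        # short_states: (B, T_s, dim) recent group states; long_states: (B, T_l, dim) full history
        dep, _ = self.state_attention(short_states, long_states, long_states)  # state attention
        return self.state_update(short_states + dep)                           # state update
```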
https://meteor041.git.io/2024/10/20/LSSPT/