
Fc2 3292343 Jun 2026

| Model | Params (M) | Top‑1 (K700) | mAP (AVA) | mAP (AudioSet) | Latency (ms) |
|---|---|---|---|---|---|
| SlowFast + PANNs (late fusion) [10] | 85 | 77.6 | 41.2 | 54.1 | 48 |
| Cross‑modal Transformer (CMT) [7] | 92 | 78.9 | 42.5 | 55.3 | 71 |
| MMT‑Dual [8] | 61 | 79.5 | 43.8 | 55.8 | 55 |
| | 48 | 81.3 | 45.6 | 56.9 | 28 |

where σ denotes the sigmoid gate, ⊙ the element‑wise product, and LN layer normalization. The fused token f is obtained by concatenating ṽ and ã and passing the result through a linear projection back to ℝⁿ.
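The gating equations themselves are not reproduced in this text, so the following is only a minimal NumPy sketch of one plausible gated fusion step built from the pieces named above (sigmoid gate, element‑wise product, layer norm, concatenation, linear projection). The cross‑modal form of the gates, the weight names `W_gv`, `W_ga`, `W_out`, and the dimension `n = 8` are all assumptions made for illustration, not the authors' definition.

```python
import numpy as np

def sigmoid(x):
    # Logistic function, used here as the gate nonlinearity.
    return 1.0 / (1.0 + np.exp(-x))

def layer_norm(x, eps=1e-5):
    # Layer norm over the last axis, without learned scale/shift (simplification).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def gated_fusion(v, a, W_gv, W_ga, W_out):
    # ASSUMPTION: each modality is gated by a sigmoid of the *other* modality.
    g_v = sigmoid(a @ W_gv)          # gate applied to the visual token
    g_a = sigmoid(v @ W_ga)          # gate applied to the audio token
    v_tilde = layer_norm(g_v * v)    # element-wise product, then LN
    a_tilde = layer_norm(g_a * a)
    # Concatenate the gated tokens and project back to n dimensions.
    f = np.concatenate([v_tilde, a_tilde], axis=-1) @ W_out
    return f, g_v, g_a

# Hypothetical toy inputs: one visual and one audio token of dimension n.
rng = np.random.default_rng(0)
n = 8
v = rng.standard_normal(n)
a = rng.standard_normal(n)
W_gv = 0.1 * rng.standard_normal((n, n))
W_ga = 0.1 * rng.standard_normal((n, n))
W_out = 0.1 * rng.standard_normal((2 * n, n))

f, g_v, g_a = gated_fusion(v, a, W_gv, W_ga, W_out)
print(f.shape)  # (8,)
```

Whatever the exact gate definition, the shape contract is the same: two n‑dimensional tokens go in, and a single n‑dimensional fused token f comes out.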

where ℒ_CE is the cross‑entropy loss for the primary classification, the MSE term encourages similar gating patterns for correlated modalities, and Θ denotes all trainable parameters. Hyper‑parameters are set to λ_cls = 1.0, λ_att = 0.1, and λ_reg = 5 × 10⁻⁴.
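As a rough illustration of how the three weights combine, the sketch below evaluates a weighted objective with the stated hyper‑parameter values. Only the cross‑entropy term and the weights come from the text. Pairing λ_att with an MSE between two gate vectors, and λ_reg with a squared L2 penalty on the parameters, are assumptions, as are all function and variable names.

```python
import numpy as np

# Hyper-parameter values as stated in the text.
LAMBDA_CLS, LAMBDA_ATT, LAMBDA_REG = 1.0, 0.1, 5e-4

def cross_entropy(y_onehot, logits):
    # Numerically stable log-softmax followed by the negative log-likelihood.
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-(y_onehot * log_probs).sum())

def total_loss(y_onehot, logits, g_v, g_a, params):
    l_ce = cross_entropy(y_onehot, logits)
    # ASSUMPTION: the MSE term compares the two modalities' gate vectors.
    l_mse = float(np.mean((g_v - g_a) ** 2))
    # ASSUMPTION: the regularizer is a squared L2 norm over all parameters.
    l_reg = float(sum(np.sum(p ** 2) for p in params))
    return LAMBDA_CLS * l_ce + LAMBDA_ATT * l_mse + LAMBDA_REG * l_reg

# Toy example: uniform logits over 4 classes, identical gates, two unit weights.
y = np.array([1.0, 0.0, 0.0, 0.0])
logits = np.zeros(4)
gates = np.full(3, 0.5)
loss = total_loss(y, logits, gates, gates, [np.ones(2)])
print(round(loss, 6))  # 1.387294, i.e. ln(4) + 5e-4 * 2
```

With identical gates the MSE term vanishes, so the toy value is just the cross‑entropy of a uniform prediction over four classes plus the small regularization term.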

Proceedings of the 2026 International Conference on Computer Vision & Pattern Recognition (ICCV‑2026)

\[ \mathcal{L} = \lambda_{\text{cls}}\,\mathcal{L}_{\text{CE}}(y,\hat{y}) + \cdots \]
