Using Kinetics-pretrained models can make a huge difference in video accuracy, because the test videos all come from the Kinetics dataset. It is also hard to detect whether a submission used such models, even if its code is open-sourced. Maybe it would be more suitable to evaluate only the other metrics, including human and part box/state accuracy.
Posted by: fangwudi @ Sept. 18, 2021, 3:06 a.m.