Mobile AI 2021 Real-Time Video Super-Resolution Challenge Forum


> A question about the input and output data format that does not make sense

Rule: “The input tensor of your model should accept 10 subsequent video frames and have a size of [1 x 180 x 320 x 30], where the first dimension is the batch size, the second and third dimensions are the height and width of the input frames from the REDS dataset, and the last dimension is the number of channels (3 color channels x 10 frames). The size of the output tensor of your model should be [1 x 720 x 1280 x 30].”
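The packing implied by the rule can be sketched as follows. This is a minimal numpy illustration, assuming frames are stacked frame-by-frame along the channel axis (the rule does not specify the ordering):

```python
import numpy as np

# Ten dummy 180x320 RGB frames, as described in the rule.
frames = [np.zeros((180, 320, 3), dtype=np.float32) for _ in range(10)]

# Concatenate along the channel axis: 10 frames x 3 channels = 30 channels.
# NOTE: frame-major ordering is an assumption, not specified by the rule.
packed = np.concatenate(frames, axis=-1)    # shape (180, 320, 30)
model_input = packed[np.newaxis, ...]       # shape (1, 180, 320, 30)

print(model_input.shape)  # (1, 180, 320, 30)
```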

This limitation doesn't make sense.

State-of-the-art video super-resolution models take multiple reference frames (or model state information) as input and output a single frame.
For example, EDVR takes (a) as input: the adjacent frames I_{t-1}, I_t, I_{t+1}.
It produces (b) as output: the reconstructed frame I'_t.
So it does not comply with the rule.

We can input 10 frames and get 10 output frames, but it isn't necessary to restrict the input and output tensor shapes like this.

Posted by: Finn_zhang @ Jan. 29, 2021, 1:41 a.m.

Hello,
I am not quite sure I understand the issue.
If EDVR does not comply with the rule, then participants are encouraged to create models that comply with it, that is, a model that takes 10 frames as input and produces 10 frames as output.
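The required input/output contract can be checked with a trivial stand-in model. This is only a shape-contract sketch, not a real super-resolution network; nearest-neighbor 4x upsampling via `np.repeat` is used purely as a placeholder:

```python
import numpy as np

def dummy_model(x: np.ndarray) -> np.ndarray:
    """Placeholder for an SR network: map [1, 180, 320, 30] -> [1, 720, 1280, 30].

    Nearest-neighbor 4x upsampling stands in for the real model.
    """
    x = np.repeat(x, 4, axis=1)   # height: 180 -> 720
    x = np.repeat(x, 4, axis=2)   # width:  320 -> 1280
    return x

x = np.zeros((1, 180, 320, 30), dtype=np.float32)
y = dummy_model(x)
print(y.shape)  # (1, 720, 1280, 30)
```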
As it is stated in the overview of the challenge: "Participants are encouraged to check out research areas of not only video super-resolution but neural architecture search, network quantization, and network pruning such as BasicVSR, RRN-L, RLSP, AdderNet, ProxylessNAS, FGNAS, DAQ, and TA+TSNet".
If this does not clarify your doubt, could you please elaborate more on it?
- Andrés

Posted by: afromero @ Jan. 29, 2021, 4:52 p.m.