Hi guys,
Since I am a rookie in Affective Computing, I would like to know what types of video and audio features can be extracted from the training videos.
Also, how can I merge the audio and video features for a given video? Apart from this, can Convolutional Neural Networks be used in this problem? If yes, how can we use them?
Thank you
Posted by: ayushrai @ May 18, 2016, 10:17 a.m.
Dear Participant, for most of your questions the answer is "up to you" :) See some advice below:
Hi guys,
Since I am a rookie in Affective Computing, I would like to know what types of video and audio features can be extracted from the training videos.
-Any state-of-the-art features you consider relevant for this problem can be applied: face alignment and facial expression analysis, tracking, MFCC audio features, CNN features, etc. It is up to you to define the set of features you consider relevant, or to let a CNN model find and learn them (with or without a prior normalization of the image, such as face alignment). The possibilities are huge; you should define your own approach and outperform the other participants' methods ;)
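As an illustration of frame-level audio features, here is a minimal NumPy sketch that computes a log-magnitude short-time spectrum per frame, a crude stand-in for MFCC-style features (the frame sizes and the synthetic 440 Hz test signal are made up for the example, not part of the challenge data):

```python
import numpy as np

def frame_spectrogram(signal, frame_len=512, hop=256):
    """Split a 1-D audio signal into overlapping windowed frames and
    return the log-magnitude spectrum of each frame (a simplified
    stand-in for MFCC-style frame features)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Magnitude spectrum per frame; the small constant avoids log(0).
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-8)

# Synthetic 1-second "audio" clip at 16 kHz (illustrative only).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
audio = np.sin(2 * np.pi * 440 * t) + 0.01 * rng.standard_normal(16000)

features = frame_spectrogram(audio)
print(features.shape)  # (n_frames, frame_len // 2 + 1)
```

A real pipeline would typically go further (mel filter banks, DCT, deltas), but the frame-then-transform structure is the same.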
Also, how can I merge the audio and video features for a given video? Apart from this, can Convolutional Neural Networks be used in this problem? If yes, how can we use them?
-This is a huge question. The answer is yes, you can/should fuse features (in an early- or late-fusion way), and of course you can apply CNNs here. Note that, as an example, you can render the audio features as an image and synchronize them with the video so a CNN can learn from both. CNNs have not been applied to this problem before, given that these data and labels are quite novel. See an example of gesture recognition with an audio-video CNN here:
http://arxiv.org/pdf/1501.00102v2.pdf
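The early/late fusion distinction mentioned above can be sketched in a few lines (a toy example: the feature dimensions, the random vectors, and the linear sigmoid scorers are all placeholders, not part of any provided baseline):

```python
import numpy as np

rng = np.random.default_rng(42)

# Placeholder per-video feature vectors (dimensions are made up).
audio_feat = rng.standard_normal(64)   # e.g. pooled MFCC statistics
video_feat = rng.standard_normal(128)  # e.g. pooled CNN face features

# Early fusion: concatenate the modalities, then train ONE model
# on the joint vector.
early = np.concatenate([audio_feat, video_feat])  # shape (192,)

# Late fusion: score each modality with its OWN model, then combine
# the scores (here: a simple average of two toy sigmoid scorers).
w_audio = rng.standard_normal(64)
w_video = rng.standard_normal(128)
score_audio = 1.0 / (1.0 + np.exp(-audio_feat @ w_audio))
score_video = 1.0 / (1.0 + np.exp(-video_feat @ w_video))
late = 0.5 * (score_audio + score_video)

print(early.shape, late)
```

Early fusion lets one model learn cross-modal interactions; late fusion keeps the modalities independent until the decision, which is simpler when the two streams have very different sampling rates.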
Have a nice competition
best
Alright, thanks
Posted by: ayushrai @ May 19, 2016, 9:03 a.m.