Hello :),
Is there any metadata file that describes what the rows and columns of the csv files in the audio_descriptors and vis_descriptors folders represent?
An example of a csv file in the audio_descriptors folder: https://www.dropbox.com/sh/0zvqd4h941omc18/AAD9GOGmHgP2BmIA3r1RsDVCa/Dev_Set/audio_descriptors/A_Goofy_Movie.csv?dl=0
An example of a csv file in the vis_descriptors folder: https://www.dropbox.com/sh/0zvqd4h941omc18/AADJtOsYUtCKA2cxfJiqu2oOa/Dev_Set/vis_descriptors/A_Goofy_Movie.csv?dl=0
Also, why are there 2 rows in the csv files of vis_descriptors?
Posted by: prantoran @ July 19, 2016, 6:35 p.m.
Hi,
Visual descriptors
- HOG Gray
3x3 cell blocks of 4x4 pixel cells with 9 histogram channels
- Color moments
Mean and standard deviation for each channel of RGB space
- Local Binary Patterns
Size=13 and Radius=1 for Radial Filter
- Graylevel run length (GLRL) matrix:
The following statistics are computed on the gray-level run length (GLRL) matrix using the zigzag scan method:
% Short Run Emphasis (SRE)
% Long Run Emphasis (LRE)
% Gray-Level Nonuniformity (GLN)
% Run Length Nonuniformity (RLN)
% Run Percentage (RP)
% Low Gray-Level Run Emphasis (LGRE)
% High Gray-Level Run Emphasis (HGRE)
% Short Run Low Gray-Level Emphasis (SRLGE)
% Short Run High Gray-Level Emphasis (SRHGE)
% Long Run Low Gray-Level Emphasis (LRLGE)
% Long Run High Gray-Level Emphasis (LRHGE)
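For anyone who wants to see how statistics like these are usually derived, here is a minimal Python sketch. It is an illustration only, not the code used to produce the csv files. It assumes the textbook run-length definitions, gray levels already quantized to 0..num_levels-1, and pixels already flattened into a 1-D sequence in scan order (for example by a zigzag scan). The function names glrl_matrix and glrl_stats are hypothetical.

```python
import numpy as np

def glrl_matrix(seq, num_levels):
    """Build P where P[i, j] counts runs of gray level i with length j + 1.

    `seq` is a non-empty 1-D sequence of integer gray levels in
    0..num_levels-1, assumed to be the pixels in scan order.
    """
    seq = np.asarray(seq)
    P = np.zeros((num_levels, len(seq)), dtype=float)
    start = 0
    for k in range(1, len(seq) + 1):
        # A run ends at the end of the sequence or when the level changes.
        if k == len(seq) or seq[k] != seq[start]:
            P[seq[start], k - start - 1] += 1
            start = k
    return P

def glrl_stats(P, num_pixels):
    """Compute the 11 run-length statistics from a run-length matrix P."""
    # Use 1-based gray-level and run-length indices so divisions are defined.
    i = np.arange(1, P.shape[0] + 1, dtype=float)[:, None]
    j = np.arange(1, P.shape[1] + 1, dtype=float)[None, :]
    n_r = P.sum()  # total number of runs
    return {
        "SRE": (P / j**2).sum() / n_r,
        "LRE": (P * j**2).sum() / n_r,
        "GLN": (P.sum(axis=1) ** 2).sum() / n_r,
        "RLN": (P.sum(axis=0) ** 2).sum() / n_r,
        "RP": n_r / num_pixels,
        "LGRE": (P / i**2).sum() / n_r,
        "HGRE": (P * i**2).sum() / n_r,
        "SRLGE": (P / (i**2 * j**2)).sum() / n_r,
        "SRHGE": (P * i**2 / j**2).sum() / n_r,
        "LRLGE": (P * j**2 / i**2).sum() / n_r,
        "LRHGE": (P * i**2 * j**2).sum() / n_r,
    }

# Toy example: three runs (level 0 x2, level 1 x3, level 0 x1) over 6 pixels.
toy_seq = [0, 0, 1, 1, 1, 0]
toy_P = glrl_matrix(toy_seq, num_levels=2)
toy_stats = glrl_stats(toy_P, num_pixels=len(toy_seq))
```

The order of the keys follows the list above, but whether the csv columns use that same order is not confirmed in this thread.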
Audio descriptors
MFCC coefficients computed using the following params:
Tw = 25; % analysis frame duration (ms)
Ts = 10; % analysis frame shift (ms)
alpha = 0.97; % preemphasis coefficient
M = 20; % number of filterbank channels
C = 12; % number of cepstral coefficients
L = 22; % cepstral sine lifter parameter
LF = 300; % lower frequency limit (Hz)
HF = 3700; % upper frequency limit (Hz)
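To make the framing and liftering parameters more concrete, here is a small Python sketch. It is only an illustration under assumptions: the sampling rate is not stated in this thread, so 16000 Hz below is a placeholder; frames are counted without padding; and the lifter uses the common sinusoidal form 1 + (L/2) sin(pi n / L), which may differ from the exact code used for the dataset. The function names are hypothetical.

```python
import math

TW_MS = 25  # analysis frame duration (ms), from the post
TS_MS = 10  # analysis frame shift (ms), from the post
C = 12      # number of cepstral coefficients, from the post
L = 22      # cepstral sine lifter parameter, from the post

def frame_params(fs_hz, n_samples):
    """Return (frame length, frame shift, frame count) in samples.

    Assumes no padding: only frames that fit entirely in the signal count.
    """
    frame_len = round(fs_hz * TW_MS / 1000)
    frame_shift = round(fs_hz * TS_MS / 1000)
    if n_samples < frame_len:
        return frame_len, frame_shift, 0
    n_frames = 1 + (n_samples - frame_len) // frame_shift
    return frame_len, frame_shift, n_frames

def sine_lifter(n_coeffs, lifter):
    """Sinusoidal lifter weights w[n] = 1 + (lifter / 2) * sin(pi * n / lifter)."""
    return [1 + 0.5 * lifter * math.sin(math.pi * n / lifter)
            for n in range(n_coeffs)]

# Placeholder input: one second of audio at an assumed 16000 Hz.
frame_len, frame_shift, n_frames = frame_params(16000, 16000)
weights = sine_lifter(C, L)
```

With these assumptions, each frame would correspond to one vector of cepstral coefficients, but how frames map to rows or columns in the csv files is not stated here.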
Please let me know if you need other info.
Posted by: mriegler @ Aug. 1, 2016, 2:58 p.m.
Thank you!!!! :) I wasn't able to figure anything out on my own.
Posted by: prantoran @ Aug. 1, 2016, 5:30 p.m.
You are very welcome!
Posted by: mriegler @ Aug. 1, 2016, 6 p.m.