Diagnostic Questions - The NeurIPS 2020 Education Challenge Forum


> Stuck

Hi all,
I implemented the Partial Variational AutoEncoder as described in the paper below:

Large-Scale Educational Question Analysis with Partial Variational Auto-encoders

However, my model gets stuck at 67% accuracy. It's as if it were just learning to predict the majority class.

Looking at the confusion matrix, the model has a strong bias to predict 1 (IsCorrect), making a lot of mistakes when IsCorrect is false.

Any tips on what I might be doing wrong and how to improve?
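For reference, this is roughly how I confirmed the majority baseline and the class weight I'm considering for a weighted BCE loss. The label counts below are made up to match the ~67% IsCorrect rate; it's a sketch, not my actual pipeline:

```python
import numpy as np

def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most common class."""
    p = labels.mean()
    return max(p, 1 - p)

def positive_class_weight(labels):
    """Weight for the positive class in a weighted BCE loss: n_neg / n_pos.

    A value below 1 downweights the over-represented IsCorrect == 1 class.
    """
    n_pos = labels.sum()
    return (len(labels) - n_pos) / n_pos

# Illustrative labels: ~67% of answers correct, roughly as in the training data
labels = np.array([1] * 67 + [0] * 33)
print(majority_baseline_accuracy(labels))  # 0.67 -- what my model is stuck at
print(positive_class_weight(labels))
```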


Posted by: carlossouza @ Aug. 6, 2020, 2:30 a.m.

I have seen in a few other educational data competitions that some complex modeling approaches do not beat the baseline. This can happen for various reasons, and people have shown repeatedly that simpler models can do as well as complex ones in the education domain. So a less complex model might help.

Posted by: nirmalpatel @ Aug. 6, 2020, 8:42 a.m.

First of all, smart move :D. I hadn't thought of checking out the authors' work, especially given that it uses data from this exact platform.

I suspect there might be an implementation error in your code. The paper reported 0.734 accuracy on their test data, and I strongly feel p-VAE can get there for task 1 (or even slightly higher, given that the train/test split in their data makes things more challenging than the competition's).

That said, I doubt using the p-VAE will help you win task 1, as I don't think it is sophisticated enough. Don't get me wrong: the math is definitely complicated, and the model is definitely good for measuring item quality. But it is not designed for task 1.

A relevant remark:
I like the novel loss the authors use to learn item quality. But I don't think human evaluators are anything close to a fair baseline. Statistical models are much better than humans at estimating item parameters; that is why there is a field called educational psychology. Experts are better than the average human, but still far worse than a simple item response model. I assume the concern is that most questions are not answered by most of the student population. But I think if the authors found a small subsample of items that are answered by a good portion of the student population and compared it against a decent baseline, it would make the paper stronger.

Posted by: scott.pu.pennstate @ Aug. 6, 2020, 7:14 p.m.

Hello, I've got a problem. What is the input of the p-VAE?
I'd be glad if you could give me a little inspiration!

Posted by: candice1996 @ Oct. 5, 2020, 2:24 a.m.