Thanks to the organizers and congratulations to the winners and to all the participants! I had a lot of fun participating in this competition. I have released my solution writeup and code.
Any feedback is welcome!
Also, I hope the organizers will open the testing phase in the future, so we can all verify our results and, especially for my team, figure out what went wrong with our submission.
Sanchit
Posted by: overfitting @ June 13, 2020, 2:16 a.m.
Thanks for writing such a great article about your experience in OpenKBP! It's really neat to read about all the steps you took to work your way up the leaderboard, and it's really cool that you teamed up with others who have domain knowledge. I really liked your observation that decreasing the batch size, which I've always done out of necessity, acts as a good regularizer; it's almost as if the technology/memory limitations are a blessing in disguise for KBP.
Based on this write-up, thinking about KBP as a 2.5D learning task sounds like a really good research direction. I think it's important for the model to "know" about neighbouring slices because the doses of adjacent slices are so dependent on each other. At the same time, using the full 3D space is probably unnecessary because the doses at very distant slices are relatively independent, so the network is likely to "learn" dependencies that don't actually exist and overfit. Does the architecture of your 2.5D model use only 2D convolutions, or is it a mix of 2D and 3D? It sounded like the 3D volumes were collapsed into 2D images with 108 channels (i.e., 128x128x108). If that's the case, can I ask why you chose that approach rather than keeping the 128x128x9x12 volumes and using 3D convolutions?
I'll open up the testing leaderboard later today (in the next ~10 hours), and post here when I do.
Aaron
Posted by: OpenKBP @ June 14, 2020, 3:52 p.m.
Thanks for the kind words. I agree that 2.5D looks like the most suitable modelling approach for this problem, and using full 3D data would be a waste of resources. The issues are two-fold. First, when using full 3D scans, we have only 200 samples to learn from, rather than 128x200 for 2D models. Second, the ratio of the number of voxels (or pixels) in each data point to the number of parameters in the model is much higher for a 3D model than for a 2D one, which in my opinion would make learning more difficult. (The second argument may not be valid, because it's DL and the number of learnable parameters is already very high.)
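To make the first point concrete, here is a minimal sketch of the sample-count difference. The shapes are illustrative assumptions (200 patients, 128x128x128 volumes), not the exact OpenKBP data layout:

```python
import numpy as np

# Assumed shapes for illustration: 200 patients, each a 128x128x128 volume.
volumes = np.zeros((200, 128, 128, 128), dtype=np.float32)

# A fully 3D model sees one sample per patient: 200 samples.
n_3d_samples = volumes.shape[0]

# A 2D model treats every axial slice as its own sample: 128 x 200 = 25,600.
slices = volumes.reshape(-1, 128, 128)
n_2d_samples = slices.shape[0]

print(n_3d_samples, n_2d_samples)  # 200 25600
```

The 2D formulation gives roughly two orders of magnitude more training samples from the same data, at the cost of discarding inter-slice context within each sample.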
1) Yes, my final model uses only 2D convolutions, not 3D.
2) I did try what you described, i.e. using a 12x9x128x128 input with a fully 3D network, but it had approximately the same performance and was very slow in comparison. I didn't have time to try a mixed 3D-2D model, maybe a 3D encoder with the rest in 2D, but based on the performance of the fully 3D model, I don't think that would've helped either.
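The collapse from a 12x9x128x128 window to a 108-channel 2D image can be sketched as below. This is my own illustrative reconstruction, not the author's code; the clipping of edge slices and the channel/depth ordering are assumptions:

```python
import numpy as np

# Assumed layout: one patient volume with 12 feature channels
# (e.g. CT plus structure masks) and 128 axial slices of 128x128.
volume = np.zeros((12, 128, 128, 128), dtype=np.float32)  # (channels, depth, H, W)

def make_25d_input(vol, center, half_window=4):
    """Collapse a window of 2*half_window + 1 neighbouring slices into channels.

    Edge slices are handled by clipping indices (an assumption).
    Returns shape (channels * window, H, W), e.g. 12 * 9 = 108 channels.
    """
    depth = vol.shape[1]
    idx = np.clip(np.arange(center - half_window, center + half_window + 1),
                  0, depth - 1)
    window = vol[:, idx]                        # (12, 9, 128, 128)
    return window.reshape(-1, *vol.shape[2:])   # (108, 128, 128)

x = make_25d_input(volume, center=64)
print(x.shape)  # (108, 128, 128)
```

With this representation a plain 2D convolutional network sees the neighbouring slices as extra input channels, so the first convolution can still mix inter-slice information without any 3D kernels.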
But again, my teammate made 3D models work somehow. His best score for a 5-fold 3D model ensemble was 2.53. So I can't say that they are ineffective, just that 2D models are much easier to train.
Posted by: overfitting @ June 14, 2020, 10:05 p.m.