Whitepaper: Multi-Stage Ensemble and Feature Engineering

In this paper, we present the winning solution of KDD Cup 2015, in which participants were asked to predict dropouts on a Massive Open Online Course (MOOC) platform. Our approach demonstrates best practices in feature engineering for complex real-world data and advances the state of the art in ensemble methods.

The first step was feature engineering: we extracted hand-crafted and autoencoder features from raw student activity logs, course enrollment data, and course material data. We then trained 64 classifiers using 8 different algorithms and different subsets of the extracted features. Finally, we blended the classifiers' predictions with a multi-stage ensemble framework. Our final solution achieved AUC scores of 0.90918 and 0.90744 on the competition's public and private leaderboards, respectively, placing 1st out of 821 teams.
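To make the blending and scoring steps concrete, the following is a minimal sketch, not the authors' framework. It assumes a simplified two-stage scheme in which predictions are averaged within groups of models and the group outputs are then averaged again; the actual multi-stage ensemble is not specified here and may differ substantially. The function names (`auc`, `blend`, `two_stage_blend`) and the toy data are hypothetical and chosen only for illustration.

```python
from statistics import mean


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def blend(prediction_lists):
    """Average several models' predictions example by example."""
    return [mean(column) for column in zip(*prediction_lists)]


def two_stage_blend(groups):
    """Stage 1 blends within each group; stage 2 blends the stage-1 outputs.

    Plain averaging is an assumption made for this sketch.
    """
    stage1 = [blend(group) for group in groups]
    return blend(stage1)


# Toy data invented for illustration: 1 = dropout, 0 = no dropout.
labels = [1, 0, 1, 0]
group_a = [[0.9, 0.2, 0.4, 0.5], [0.8, 0.3, 0.6, 0.4]]  # two hypothetical models
group_b = [[0.7, 0.6, 0.8, 0.1]]                        # one hypothetical model
final = two_stage_blend([group_a, group_b])
score = auc(labels, final)
```

In this toy case, the first model in `group_a` misranks one positive/negative pair on its own, while the blended prediction ranks every dropout above every non-dropout. This illustrates why blending diverse classifiers can improve AUC, though it says nothing about the size of the gain on real data.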
