Emotion prediction is important when interacting with computers. However, emotions are complex and difficult to assess, understand, and classify. Current emotion classification strategies do not explain why a specific emotion was predicted, which complicates the user’s understanding of affective and empathic interface behaviors. Advances in deep learning have shown that convolutional networks can learn powerful time-series patterns while exposing classification decisions and feature importances. We present a novel convolution-based model that classifies emotions robustly. Our model not only offers high emotion-prediction performance but also makes the model’s decisions transparent. It provides a time-aware feature interpretation of classification decisions using saliency maps. We evaluate the system on a contextual, real-world driving dataset involving twelve participants. Our model achieves a mean accuracy of 70% in 5-class emotion classification on unknown roads and outperforms in-car facial expression recognition by 14%. We conclude by discussing how emotion prediction can be improved by incorporating emotion sensing into interactive computing systems.