Automatic scoring systems are a current trend in large-scale assessment. Such systems have been widely applied in the electronic engineering field but remain quite limited in education. In this work, we employ a behavioral signal processing (BSP)-based methodology to develop a computational framework that automates the scoring of pre-service school principals' oral presentations given at the yearly training program. We use an audio-video feature extraction approach with session-level representation techniques based on bag-of-words and Fisher-vector encoding to characterize each candidate principal's multimodal behavior during an impromptu speech examination. Two approaches were used. The first selects the top and bottom 20% of all samples and labels them as high- and low-scoring oral presentations, respectively. The second uses the entire database. The first approach achieves a higher agreement level. In future work, incorporating lexical content, exploring more approaches to handle the varying noise structures, and collecting more samples and raters could increase the accuracy of this system. Contributions: This study constructs an automatic scoring system, a topic on which studies and discussions in Taiwan remain quite limited. The results of this study can serve as a preliminary study for performance assessment in teacher certification or recruitment.
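To make two of the steps above concrete, the following is a minimal sketch, not the authors' implementation. It shows (a) bag-of-words encoding, which turns a variable-length sequence of frame-level feature vectors into one fixed-length, session-level histogram, and (b) the first labeling approach, which keeps only the top and bottom 20% of sessions by rater score. The codebook is assumed to be given (in practice it would typically be learned, e.g. by k-means, on training frames), and the function names `bag_of_words` and `label_extremes` are hypothetical. Fisher-vector encoding, which additionally captures first- and second-order statistics relative to a Gaussian mixture model, is not shown.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_codeword(frame: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codeword closest to `frame` (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((f - c) ** 2 for f, c in zip(frame, codebook[k])),
    )


def bag_of_words(frames: Sequence[Vector], codebook: Sequence[Vector]) -> List[float]:
    """Encode a session of frame-level features as an L1-normalized histogram
    of codeword occurrences (hypothetical helper; codebook assumed precomputed)."""
    counts = [0] * len(codebook)
    for frame in frames:
        counts[nearest_codeword(frame, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


def label_extremes(scores: Sequence[float], fraction: float = 0.2) -> List[Tuple[int, int]]:
    """Keep the bottom and top `fraction` of sessions by score, labeled 0 (low)
    and 1 (high); sessions in the middle are discarded."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = int(len(scores) * fraction)
    low = [(i, 0) for i in order[:n]]
    high = [(i, 1) for i in order[len(order) - n:]]
    return low + high


if __name__ == "__main__":
    codebook = [(0.0, 0.0), (1.0, 1.0)]  # toy 2-word codebook (assumed)
    frames = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.8), (0.2, 0.1)]
    print(bag_of_words(frames, codebook))  # [0.5, 0.5]

    scores = [55, 90, 70, 40, 85, 60, 75, 65, 80, 50]  # toy rater scores
    print(label_extremes(scores))  # [(3, 0), (9, 0), (4, 1), (1, 1)]
```

The histogram gives every session a vector of the same length regardless of its duration, which is what allows a standard classifier to be trained on sessions; with the 20% rule, ten toy sessions yield two low and two high examples, while the second approach would instead use all sessions.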