A combined multiple action recognition and summarization for surveillance video sequences