A Model for the Automatic Mixing of Multiple Audio and Video Clips