Image and audio caps: automated captioning of background sounds and images using deep learning