Cross-modal generalization performance of FocusGate-Net.