Fine-tuning on same-modal feature matching tasks. X-FFN and Y-FFN denote the pre-trained assistants for images of any two different modalities in the second stage of Fig. 4. The X-modal and Y-modal images are fine-tuned independently of each other.