Abstract: Medical image reporting, which focuses on automatically generating diagnostic reports from medical images, has garnered growing research attention. In this task, learning cross-modal alignment ...
Abstract: Document Image Translation (DIT) aims to translate the text in document images from one language to another. It is a multi-modal task that involves the cooperation of text and layout. Current ...