Abstract: The Video-Sentence Retrieval and Grounding (VSRG) task aims to retrieve the corresponding video from a video corpus given a single sentence query and to accurately localize the temporal boundary ...