Alexander Martin

Ph.D. Student at Johns Hopkins University

I am a Ph.D. student at Johns Hopkins University, advised by Dr. Ben Van Durme. I am broadly interested in multimodal understanding and retrieval, especially towards advancing end-to-end content generation and reasoning. The core of my current research focuses on generating text that is grounded in retrieved documents and video. The north star goal of my research is to be able to take a query from a user and render a Wikipedia-style response with integrated multimodal information.

The core of my research focuses on generating Wikipedia-style text that is grounded in both documents and videos. I have published on:

My keywords are: video understanding, multimodal RAG (retrieval-augmented generation), multimodal generation, grounded generation.

Before Johns Hopkins, I received my B.S. from the University of Rochester, where I was advised by Dr. Jiebo Luo and Dr. Aaron Steven White.

[Resume]

news

Feb 26, 2025 2/2 for papers at CVPR 2025!
Aug 26, 2024 Starting Ph.D. at JHU

selected publications

  1. WikiVideo: Article Generation from Multiple Videos
    Alexander Martin, Reno Kriz, William Gantt Walden, Kate Sanders, Hannah Recknor, Eugene Yang, Francis Ferraro, and Benjamin Van Durme
    2025
  2. Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval
    Arun Reddy*, Alexander Martin*, Eugene Yang, Andrew Yates, Kate Sanders, Kenton Murray, Reno Kriz, Celso M Melo, Benjamin Van Durme, and Rama Chellappa
    In IEEE Conference on Computer Vision and Pattern Recognition, Jun 2025
  3. MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval
    Reno Kriz, Kate Sanders, David Etter, Kenton Murray, Cameron Carpenter, Kelly Van Ochten, Hannah Recknor, Jimena Guallar-Blasco, Alexander Martin, Ronald Colaianni, Nolan King, Eugene Yang, and Benjamin Van Durme
    In IEEE Conference on Computer Vision and Pattern Recognition, Jun 2025
  4. Grounding Partially-Described Events in Multimodal Data
    Kate Sanders, Reno Kriz, David Etter, Hannah Recknor, Alexander Martin, Cameron Carpenter, Jingyang Lin, and Benjamin Van Durme
    In Conference on Empirical Methods in Natural Language Processing, Nov 2024