AI Research Internships for Undergraduates

The Johns Hopkins University Center for Language and Speech Processing is hosting the Tenth Frederick Jelinek Memorial Summer Workshop this summer and is seeking outstanding members of the current junior class enrolled in U.S. universities to join this residential research workshop on human language technologies (HLT) from June 10 to August 2, 2024.

The internship includes a comprehensive 2-week summer school on HLT, followed by intensive research projects on select topics for 6 weeks.

The 8-week workshop provides an intense, dynamic intellectual environment.  Undergraduates work closely alongside senior researchers as part of a multi-university research team, which has been assembled for the summer to attack HLT problems of current interest.

Teams and Topics

The teams and topics being considered for 2024 are:

  • Sign Language Translation (SLT)
  • Multimodal Multi-Task Audio-Visual Transformer
  • Evaluating LLM Performance in Research Astronomy
  • AI-Curated Democratic Discourse

Full descriptions appear in the Team Descriptions section below.

We hope that this stimulating and highly selective experience will encourage students to pursue graduate study in HLT and AI, as it has for many years.

The summer workshop provides:

  • An opportunity to explore an exciting new area of research
  • A two-week tutorial on current speech and language technology
  • Mentoring by experienced researchers
  • Participation in project planning activities
  • Use of cloud computing services
  • A $6,000 stipend and $2,800 towards meals and incidental expenses
  • Private furnished accommodation for the duration of the workshop
  • Travel expenses to and from the workshop venue

Applications should be received by Monday, March 4th, 2024. The applicant must provide the name and contact information of a faculty nominator, who will be asked to upload a recommendation by Friday, March 15th, 2024.

Questions can be directed to the JSALT 2024 organizing committee at [email protected] 

Applicants are evaluated only on relevant skills, employment experience, past academic record, and the strength of letters of recommendation.  No limitation is placed on the undergraduate major.  Women and minorities are encouraged to apply.

 APPLY HERE

The Application Process

The application process has three stages.

  1. Completion and submission of the application form by March 4, 2024.
  2. Submission of the applicant’s CV to [email protected] by March 8, 2024.
  3. Submission of a recommendation letter by the applicant’s faculty nominator, whose contact information was provided in stage 1, in support of the applicant’s admission to the program. The letter is to be submitted electronically to [email protected] by March 15, 2024.

Please note that the application will not be considered complete until it includes both the CV and the letter.

Team Descriptions

Sign Language Translation (SLT)

About the Project: In the realm of Sign Language Translation (SLT), groundbreaking advancements have emerged, creating a bridge between sign language and spoken language. This rapidly developing field spans Natural Language Processing, Machine Translation, Visual Processing, Multi-modal Data Representation, and SL Linguistics. Our project aims to harness the collective expertise of these domains toward a common goal: robust translation between signed and spoken language.

Why SLT Matters: Sign Language Translation stands at the forefront of inclusivity, breaking barriers in communication accessibility for the Deaf and hard-of-hearing communities. It’s not merely a technological advancement; it’s a pivotal tool that bridges the gap between sign language—a rich, expressive mode of communication—and the spoken word. By enabling seamless translation between these languages, SLT empowers individuals to interact, learn, and engage universally, fostering social integration and equal opportunities. This field holds the key to creating a world where communication knows no bounds.

Leveraging Large Language Models (LLMs): We’re at the forefront of leveraging the power of Large Language Models (LLMs) in SLT. By tapping into pre-trained LLMs, we can use them as efficient decoders for translation models, drawing on their inherent linguistic knowledge. These models also play a pivotal role in guiding image/pose encoders toward robust linguistic representations of input sequences.
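As a rough illustration of this coupling (a minimal sketch only, assuming a PyTorch setting; the module names, dimensions, and the use of a frozen Hugging Face-style causal LM are illustrative assumptions, not the project’s actual design), a pose encoder can be bridged into the embedding space of a pre-trained language model that acts as the decoder:

    import torch
    import torch.nn as nn

    class PoseToLLMTranslator(nn.Module):
        """Hypothetical sketch: a pose encoder feeding a frozen pre-trained LM decoder."""

        def __init__(self, pose_dim=150, d_model=512, llm_embed_dim=4096, num_layers=4):
            super().__init__()
            # Encode the sign-language pose sequence (e.g., keypoint coordinates per frame).
            self.pose_proj = nn.Linear(pose_dim, d_model)
            layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
            # Map encoder states into the LLM's embedding space so they can be consumed
            # as a soft "prefix" by the frozen decoder.
            self.bridge = nn.Linear(d_model, llm_embed_dim)

        def forward(self, pose_seq, llm):
            # pose_seq: (batch, frames, pose_dim); llm: a frozen causal LM whose forward
            # accepts inputs_embeds (as Hugging Face causal LMs do).
            states = self.encoder(self.pose_proj(pose_seq))   # (batch, frames, d_model)
            prefix = self.bridge(states)                      # (batch, frames, llm_embed_dim)
            return llm(inputs_embeds=prefix)                  # decode text conditioned on poses

In such a setup the LLM would typically stay frozen or be only lightly adapted, so that its linguistic knowledge both decodes the output and steers the pose encoder toward linguistically meaningful representations during training.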

Why You Should Join: This workshop offers a unique opportunity for young minds to delve into a pioneering field that not only merges technology and language but also bridges communication gaps for the hearing-impaired. As a participant, you’ll collaborate with experts, engage in hands-on experimentation, and contribute to shaping the future of communication technology.

Who Should Apply: Passionate students with a keen interest in Natural Language Processing, Machine Translation, Visual Processing, Multi-modal Data Representation, SL Linguistics, or related fields. No prior experience in SLT is required—just a thirst for knowledge and a drive to innovate.

Join us in unraveling the mysteries of Sign Language Translation. Together, we’ll pioneer advancements that redefine communication paradigms and make a lasting impact on society.

Multimodal Multi-Task Audio-Visual Transformer

This groundbreaking proposal heralds the development of an unprecedented AI system geared toward universal audio processing. At its core lies an ambitious initiative—the creation of an “All-In-One audio (AIO) transformer,” leveraging cutting-edge modular transformer architecture, notably the Mixture of Experts (MoE) and Self-Supervised Learning (SSL).
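The Mixture-of-Experts idea behind such a modular design can be sketched in a few lines. The toy PyTorch layer below is illustrative only (the sizes, the top-1 routing rule, and the dense dispatch are simplifying assumptions, not the workshop’s architecture): a learned router sends each audio frame’s representation to one of several expert feed-forward networks.

    import torch
    import torch.nn as nn

    class Top1MoELayer(nn.Module):
        """Toy Mixture-of-Experts layer: a router picks one expert FFN per token."""

        def __init__(self, d_model=256, d_hidden=1024, num_experts=4):
            super().__init__()
            self.router = nn.Linear(d_model, num_experts)   # per-token expert scores
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
                for _ in range(num_experts)
            )

        def forward(self, x):
            # x: (batch, time, d_model), e.g., self-supervised audio features
            gate = torch.softmax(self.router(x), dim=-1)    # (batch, time, num_experts)
            chosen = gate.argmax(dim=-1)                    # winning expert per token
            out = torch.zeros_like(x)
            for i, expert in enumerate(self.experts):
                mask = (chosen == i).unsqueeze(-1)          # tokens routed to expert i
                # For clarity every expert sees all tokens; a real MoE dispatches only routed tokens.
                out = out + expert(x) * mask * gate[..., i:i + 1]
            return out

In a multi-task audio system, different experts could come to specialize in different tasks or signal types (speech, music, environmental sound), while self-supervised features feed the shared router.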

Our aim is to unite visionary minds in the pursuit of a transformative AI solution. We seek collaborators passionate about pushing the boundaries of audio processing and AI. Together, we aspire to emulate the intricate complexities of the human auditory system and develop an adaptable, all-encompassing AIO transformer.

By harnessing the power of MoE and SSL, we envisage a platform capable of transcending traditional limitations, revolutionizing how we approach diverse audio tasks. This proposal invites enthusiastic researchers and experts in AI, audio processing, and transformer architectures to join forces, contributing their unique insights and expertise to sculpt an unparalleled innovation in the realm of AI-driven audio processing.

Join us in this pioneering venture to shape the future of AI-powered audio technology. Together, let’s create an AIO transformer that not only echoes the prowess of the human auditory system but redefines the possibilities of AI in audio processing.

Evaluating LLM Performance in Research Astronomy

Large language models (LLMs) are being used not only for common-knowledge information retrieval, but also in specialized disciplines such as cutting-edge astronomical research. However, in specialized domains we lack robust, realistic, and user-oriented evaluations of LLM capabilities. Human evaluations are time-intensive, subjective, and difficult to reproduce, while automated metrics like perplexity or task benchmarks fail to reflect realistic performance.

We seek to advance understanding of LLM capabilities for supporting scientific research through user-centric analysis and the development of robust evaluation standards; while we expect outputs from this workshop to be generalizable, we will focus on astronomy. Astronomy has open data and a vibrant, active community that is open to partnering in the design, experimentation, and evaluation processes. The primary goal of the workshop is to develop a quantifiable metric or objective function for evaluating LLMs in astronomy research, thereby taking humans out of the evaluation loop. A secondary goal is to understand how the evaluation criteria for a specialized use case (astronomy) compare to the evaluation criteria for typical English conversations. Our proposal explores the first step toward a lofty goal: how AI can transform science for the better, by first evaluating what “better” means.
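To make “quantifiable metric” concrete, a minimal evaluation harness might look like the sketch below (purely illustrative: the question set, the ask_llm interface, and the scoring function are placeholders, not the evaluation design the workshop will produce):

    from statistics import mean

    def evaluate_llm(questions, references, ask_llm, score):
        """Score an LLM on research questions against expert reference answers.

        ask_llm(question) -> model answer (str); score(answer, reference) -> float in [0, 1].
        Both are placeholders for whatever model interface and metric the study adopts.
        """
        details = []
        for question, reference in zip(questions, references):
            answer = ask_llm(question)
            details.append({"question": question, "answer": answer,
                            "score": score(answer, reference)})
        return {"mean_score": mean(d["score"] for d in details), "details": details}

    # Trivial usage example with placeholder components.
    questions = ["What does the abbreviation 'FRB' stand for in astronomy?"]
    references = ["fast radio burst"]
    dummy_llm = lambda q: "fast radio burst"
    exact_match = lambda answer, ref: float(answer.strip().lower() == ref.strip().lower())
    print(evaluate_llm(questions, references, dummy_llm, exact_match))

The substance of the project lies in replacing the placeholder score function with a metric that tracks what astronomers actually consider a good answer.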

AI-Curated Democratic Discourse 

We seek ways to make the social media experience more prosocial. We will develop a new user interface designed to increase the rate of substantive and constructive conversations, including conversations across political differences and conversations between like-minded strangers. Specifically, we will use generative AI to: 

  1. augment the current conversation by showing relevant, high-quality posts from other conversations;
  2. react to a user’s draft post with advice and simulated replies while they are still writing it. 

This design goes beyond the traditional threaded conversation model (Usenet, Reddit, Facebook, Twitter, Nextdoor) where disjoint conversations grow one post at a time. It situates posts in a broader curated landscape of viewpoints and supporting information. Users have varying reasons to use social media, but we conjecture that they will sometimes look at good argumentation and competing viewpoints if we make these easy and enjoyable to see. User interfaces shape user behavior. In this more diverse landscape, posters will have to raise their game. They will be challenged more often by direct responses from strangers, or indirectly by the automatic display of related posts alongside theirs. Thus it becomes harder for them to get away with lazy or specious arguments. We will help them as they write their posts, by previewing simulated reactions and offering suggestions before they submit.
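As a sketch of the second idea, the snippet below (hypothetical: the complete function stands in for any text-generation API, and the prompt wording is an assumption, not the interface we will build) asks a generative model to preview likely replies and suggest an improvement while the user is still drafting:

    def preview_draft(draft, complete):
        """Return simulated replies and advice for a draft post.

        complete(prompt) -> str is a placeholder for any text-generation API.
        """
        prompt = (
            "A user is drafting the following social media post:\n\n"
            f"{draft}\n\n"
            "1. Write two short replies that readers with differing viewpoints might post.\n"
            "2. Suggest one concrete edit that would make the post more substantive "
            "and constructive."
        )
        return complete(prompt)

The first idea, surfacing relevant high-quality posts from other conversations, could be prototyped in the same spirit with standard embedding-based retrieval over an archive of existing posts.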
