Research Scientist (Intern): Foundation Models for Data Management & Lakehouses


IBM Research


United States - Multiple cities


Your Role and Responsibilities

This posting is for a 2024 summer internship. Internships run May – August, or June – September for students at quarter-system schools.

We are broadly interested in making foundation models (FMs) effective for a range of data management tasks, particularly those related to the management of structured data in enterprise data lakes and lakehouses.

Topics of interest include research on effective and efficient tuning techniques, knowledge-driven reasoning, and causality-driven alignment for better control and run-time performance of FMs and their use in enterprise data tasks. Tasks of interest include semantic enrichment of structured data, semantic data management with metadata and knowledge graphs, code generation for data retrieval with transformations, and various data wrangling tasks in the end-to-end data lifecycle in data lakes.

Tuning-related research spans full-space and parameter-efficient tuning techniques, using supervised learning as well as reinforcement learning with reward functions that capture end-use performance. Also of interest are grounding the generation of tuned models in domain-specific vocabulary, efficient techniques for human-in-the-loop adaptation at inference time, and retrieval augmentation techniques for data management tasks.

For knowledge-driven reasoning, we are interested in formulations and benchmarks that treat the database query-answering process as a knowledge-extraction task. These are useful for experimenting with reasoning over database tables at different levels of complexity, with the goal of improving and expanding the reasoning skills of FMs.

For causal alignment, we’re interested in formulations that study and show the causal relationships behind the effectiveness of different prompt optimization methods, where a small set of prompt augmentation tokens improves FMs for issues like delusions, alignment, and transfer.



IBM Research Scientists are charting the future of Artificial Intelligence, creating breakthroughs in quantum computing, discovering how blockchain will reshape the enterprise, and much more. Join a team that is dedicated to applying science to some of today’s most complex challenges, whether it’s discovering a new way for doctors to help patients, teaming with environmentalists to clean up our waterways or enabling retailers to personalize customer service.


Required Technical and Professional Expertise

  •  Current PhD or MS student pursuing graduate studies in computer science or a related field
  •  At least one research publication, preferably at a top conference in AI or data management
  •  Familiarity with the basics of data management and data lakes
  •  Familiarity and working experience with large language models (LLMs) or other foundation models


Preferred Technical and Professional Experience

Candidates should have basic knowledge of one or more of the following areas:

  •  Familiarity with ontologies, knowledge graphs, and description logic
  •  Familiarity with reinforcement learning, causal graphical models, and prompt optimization