A political science professor from the University of New Mexico has joined a major national effort to develop transparent artificial intelligence. The $152 million project aims to create open-source AI models, built on high-quality, verifiable data, that are specifically designed to accelerate scientific research.
Sarah Dreier, an assistant professor at UNM, is a key investigator in the Open Multimodal AI Infrastructure to Accelerate Science (OMAI) project. The initiative is led by the Allen Institute for AI and seeks to address a fundamental problem with many existing AI systems: their reliance on unverified internet data, which often contains inaccuracies and biases.
Key Takeaways
- The OMAI project has secured $152 million in funding to create open and transparent AI models for scientific use.
- Funding comes from the U.S. National Science Foundation ($75 million) and Nvidia Corp. ($77 million).
- UNM professor Sarah Dreier is the sole social scientist on the research team, focusing on data curation.
- The project aims to solve the problem of AI models being trained on unreliable internet data, which hinders scientific progress.
- The new models will be fully open, allowing researchers to inspect, adapt, and build upon them, fostering collaboration and reproducibility.
A National Effort for Transparent AI
The OMAI project represents a significant investment in the future of scientific research. It is designed to provide the U.S. scientific community with a suite of advanced, open-source AI models. This work aligns with a broader national strategy to ensure the United States remains a leader in developing trustworthy and effective AI technologies.
The project is led by the Allen Institute for AI, a prominent non-profit research organization. The substantial funding is provided by two major entities: the U.S. National Science Foundation is contributing $75 million, and technology company Nvidia Corp. is providing $77 million.
Project Funding Breakdown
- Total Funding: $152 million
- National Science Foundation: $75 million
- Nvidia Corp.: $77 million
- Project Duration: Five years
Noah Smith, a senior director at the Allen Institute and a professor at the University of Washington, leads the initiative. He emphasized the project's goal of providing essential tools for researchers nationwide.
"This funding will provide critical infrastructure — advanced computing systems, open-source models, and tools — that will enable researchers across partner universities, including the University of New Mexico, to accelerate breakthroughs in fields ranging from energy to biology," Smith stated.
The Challenge of 'Dirty Data' in AI
A primary motivation for the OMAI project is the questionable quality of the data used to train many popular AI models. These systems often learn from vast amounts of text and images scraped from the internet, a source filled with misinformation, errors, and biases.
For scientific applications, where accuracy and reliability are non-negotiable, this presents a major obstacle. Professor Dreier explained that the engineers building these models often lack insight into the data itself.
"The engineers [who] are training these models, they don’t know what the data is," Dreier said. "They’re not reading unfathomably large amounts of text to feed into their model."
Closed vs. Open Models
Another concern is that many of the most powerful large language models are "closed." This means the data, code, and methods used to create them are kept private by the companies that own them. According to Smith, this secrecy creates a significant barrier to scientific advancement.
Why Open Models Matter for Science
Scientific progress relies on principles of transparency, collaboration, and reproducibility. When AI models are closed, other scientists cannot verify the results, understand the model's limitations, or adapt it for new research questions. Open models allow the entire community to inspect the underlying data and code, which builds trust and accelerates innovation.
"Open models are essential for transparency, reproducibility, and collaboration — the core of how scientific progress happens," Smith explained. The OMAI project is committed to making its models fully accessible to the research community.
A Social Scientist's Unique Role in a Tech Project
While the research team is largely composed of computer scientists and engineers, Sarah Dreier's role as a political scientist is critical. Dreier, whose share of the funding is $200,000, will primarily guide the data curation process. She will help the team think expansively about the types of high-quality data needed to make the AI useful across different scientific disciplines.
Her perspective is meant to ensure that the models are not just technically proficient but also relevant to the tasks that researchers in sociology, political science, and other social sciences need to perform, such as analyzing research papers or generating code for statistical analysis.
"Obviously, as a social scientist, I’m going to be thinking most immediately [about] the kinds of data that could be useful to political scientists, sociologists," Dreier noted. She aims to identify datasets that would be most valuable if large language models were used to support the scientific pipeline in these fields.
Dreier previously worked as a postdoctoral research fellow in Smith's lab at the University of Washington before joining the faculty at UNM, and the two have continued to collaborate on work with aims similar to those of the OMAI project.
Accelerating Scientific Discovery Across Fields
The ultimate goal of the OMAI project is to create AI that acts as a powerful tool for scientists. By training models on clean, curated, and relevant data, the team hopes to enable faster breakthroughs in numerous areas.
The five-year project will tackle two intertwined challenges: advancing the fundamental science of AI and applying that technology to accelerate discoveries in other fields of science and engineering.
Smith outlined the practical benefits for researchers.
"Our models will help scientists in other fields to be able to process and analyze vast amounts of research, generate code and visualizations, and connect new insights to past discoveries," he said. "In practice, that means faster breakthroughs in areas like materials science, protein function prediction, and energy research."
The team of principal investigators also includes Hanna Hajishirzi from the University of Washington, Travis Mandel from the University of Hawaiʻi at Hilo, and Samuel Carton from the University of New Hampshire, giving the project a broad base of expertise.