Carnegie-Mellon University
university · Pittsburgh, PA
Total disclosed
$123,882,735
Award count
258
Distinct programs
3
First → last award
1980 → 2031
Disclosed awards
Showing 51–75 of 258. Public data only — SR&ED tax credits are confidential and not shown.
- SoS: BIO: The Anatomy of Scientific Biomedical Open-Source Software--From Code to Communities · $241,506
NIH Research Projects · FY 2025 · 2025-09
This project addresses critical gaps in understanding biomedical open-source software (OSS) infrastructure that underpins modern health research. The broad objective is to systematically characterize, model, and evaluate the sustainability of biomedical OSS to optimize research productivity and ensure reliable scientific outcomes that advance NIH's mission of improving human health. Open-source software has become essential infrastructure for biomedical research, yet the origins, sustainability, and impact of scientific biomedical OSS remain poorly understood. Unlike generic OSS, scientific software requires domain expertise and operates within unique constraints of academic funding cycles and publication incentives. The research addresses three specific aims: (1) identify and characterize biomedical OSS infrastructure through computational census of software origins, evolution, and usage patterns; (2) model the health and sustainability of biomedical OSS by analyzing maintenance practices, team dynamics, and abandonment risk factors; and (3) quantify the relationship between software maintenance status and scientific outcomes, including research productivity, reproducibility, and resource efficiency. The study employs a mixed-methods approach using three integrated datasets. Popular biomedical OSS packages will be identified from the Chan Zuckerberg Initiative's dataset of 15 million software citations. Repository contribution histories will be analyzed using World of Code, which aggregates metadata from OSS repositories worldwide. Contributors will be linked to academic institutions through OpenAlex data and NSF/NIH funding records.
- Temporal dynamics and network mechanisms of articulatory feature encoding during speech production · $54,538
NIH Research Projects · FY 2025 · 2025-09
The overall goal of this fellowship is to provide excellent training in representational analysis, intracranial neurophysiology, and network-level dynamics by exploring articulatory feature encoding in ventral sensorimotor cortex (vSMC). The vSMC is the primary cortical region responsible for controlling the precise articulatory movements required for fluent speech. Despite its well-established role in speech production, it is not clear when it is engaged relative to earlier stages of speech processing (lexical access) and overt response execution, or how its motor plans are shaped by upstream language areas. This project addresses two core hypotheses regarding the timing and input dynamics of articulatory feature encoding in vSMC. In Aim 1, we test when articulatory feature encoding, or the word-specific motor plan for production, emerges in vSMC. Using stereo-EEG (SEEG) recordings in a delayed naming paradigm, patients will prepare to name pictures but will only speak after a variable delay (0, 400, or 800 ms). This temporally dissociates 'early' lexical access from 'late' motor execution, allowing us to determine whether articulatory representations in vSMC are time-locked to stimulus presentation or to the initiation of speech. If encoding is linked to lexical access, representational structure should appear at a consistent post-stimulus time across delays. If it is tied to the go-cue, we expect a [delay × onset time] interaction in stimulus-locked analyses, and consistent timing in go-cue-locked analyses. These results will clarify whether articulatory plans in vSMC are accessed automatically with lexical access or gated by response initiation. In Aim 2, we investigate how upstream language regions, specifically the posterior middle temporal gyrus (pMTG), angular gyrus, inferior frontal gyrus (IFG), and supplementary motor area (SMA), contribute functional input to vSMC.
Using event-related causality and high-definition fiber tractography (HDFT) to identify structurally connected contacts, we will test whether input from pMTG and angular gyrus is time-locked to lexical access (stimulus onset), while input from IFG and SMA is time-locked to speech initiation (go-cue). This predicts a [delay × onset time] interaction in stimulus-locked analyses for IFG and SMA, but not pMTG and angular gyrus. Secondary analyses will examine whether psycholinguistic complexity (e.g., word frequency, phoneme length) modulates the strength of causal input from upstream regions, providing insight into whether these inputs are modulated by content-specific information. By advancing our understanding of how linguistic units are mapped onto motor plans, this work has the potential to constrain language processing models, improve functional neurosurgical mapping, and inform the development of brain-computer interfaces for communication in patients with primary cortical aphasias. Through advanced training in analytic and clinical methods under expert mentorship, this project is well placed to prepare me for the next steps of my training and sets an important foundation for a career as a clinician-scientist at the intersection of speech neuroscience and translational care.
NSF Awards · FY 2025 · 2025-09
With the support of the Chemical Catalysis Program in the Division of Chemistry, Professor Isaac Garcia-Bosch at Carnegie Mellon University is studying metal complexes bearing redox-active ligands (electron-coupled proton buffers, ECPBs) that perform multi-electron multi-proton transformations. The goal of this research is to use these new complexes to perform challenging reactions in a selective fashion and under mild conditions. This project will lead to a better understanding of ubiquitous reactions in biological and industrial processes, and contribute to the design of new catalysts based on cheap metals such as copper or iron. During the execution of this project, several graduate and undergraduate students at Carnegie Mellon University will be trained in the synthesis and characterization of metal complexes, as well as in the study of reaction mechanisms. The project also includes scientific collaborations with renowned experts in the fields of spectroscopy and computational chemistry. Outreach activities in elementary schools and high schools in the Pittsburgh area will be carried out, benefitting K-12 students with limited access to science. With the support of the Chemical Catalysis Program in the Division of Chemistry, Professor Isaac Garcia-Bosch at Carnegie Mellon University is studying fundamental questions regarding the geometry, electronic structure, spectroscopy, and reactivity of ECPBs based on 3d metals and redox-active ligands. Current project activities include 1) synthesis and characterization of ECPBs in different protonation/oxidation states; 2) study of the thermochemistry of ECPB reactivity; and 3) the use of ECPBs to promote substrate oxidation (using O2 and H2O2 as oxidants) and substrate reduction (using H2 and N2H4 as reductants). The results of these activities will provide fundamental knowledge on metal-ligand cooperativity, leading to the development of efficient, practical, and selective catalysts based on first-row transition metals.
The work will be presented at research conferences and reported in scientific journals. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
NSF Awards · FY 2025 · 2025-09
The cascading societal, economic, and ecological consequences of extreme weather across the Northeast highlight the urgent need for resilient urban planning. Nature-based solutions (NBS), including scalable green infrastructure (GI) interventions like meadows, woodlands, rain gardens, and bioswales, offer a promising pathway to mitigate urban heat, manage stormwater, improve water quality, stabilize slopes, and enhance ecosystem services and resilience. Allegheny County, home to Pittsburgh, PA, over 1.2 million people, and multiple municipalities, is a mixture of urban and suburban communities that can consider different types of GI. For a GI initiative to gain traction at scale, the solutions must recognize and address the needs, concerns, priorities, and perceived barriers of the communities in which the work will be done. The project will apply real-time methods to improve GI implementation, maintenance, and monitoring practices. This Northeast region project will convene a broad group of stakeholders with expertise in hydrological systems, stormwater engineering, community engagement, land management, and education to develop an Allegheny County-wide GI adoption and installation plan that will address stormwater runoff, water quality, urban heat, and carbon sequestration. The goals of the project are to prioritize and monitor GI installations, engage and support municipalities, and advance GI education and outreach. These goals build toward the longer-term aim of developing a comprehensive county-wide GI implementation and adoption plan that aligns GI solutions, municipal priorities, and education strategies. The project will assess municipal staff policy and management needs in order to produce a municipal GI implementation guide, as well as a communications framework for municipalities to share the value and benefits of GI with their constituents.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
- RAPID: Flight Planning and Immediate Post-Campaign Data Analysis for the HALO-South Field Campaign · $128,741
NSF Awards · FY 2025 · 2025-09
This RAPID project supports the participation of US scientists in the international HALO-South field campaign to measure clouds and aerosols over the Southern Ocean in austral spring. The HALO-South mission is a large collaborative project with partners in both Germany and New Zealand. The Southern Ocean is one of the cloudiest areas on Earth and is key for hemispheric constraints on Earth’s radiation budget and global cloud feedbacks. Currently, atmospheric models poorly represent both aerosols and clouds in this region. This effort will lead to a better understanding of the sources and sinks of cloud-nucleating aerosols over the Southern Ocean. This project will make a substantial contribution to guiding flight planning and mission science during the HALO-South field campaign. Aerosol number concentrations will be forecast, with a focus on predicting new particle formation events and aerosol-cloud interactions. Data collected during the field campaign will be analyzed to better understand aerosol sources and sinks in the Southern Ocean and why most general circulation models fail to accurately predict Southern Ocean cloud condensation nuclei and cloud droplet concentrations. Several graduate students will participate in the project. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
- Probability flows for high-dimensional problems and applications to sampling and generative modeling · $300,000
NSF Awards · FY 2025 · 2025-09
Machine learning, and more generally artificial intelligence (AI), is having a transformational impact on science, technology, and our daily lives. Approaches based on deep neural networks have achieved remarkable success in a variety of applications, such as the impressive performance on image, video, and text generation tasks when combined with diffusion models. Despite this, the present architectures lack reliability and performance guarantees. The goal of this project is to develop mathematically sound approaches to some of the relevant sampling tasks in high dimensions, which is typical in the AI setting and where classical computational approaches are unfeasible. The project is motivated by two tasks: one is the sampling of measures given by their density, which is the typical setting in many applications in physical sciences and Bayesian inference, and the other is creating new samples based on a family of examples, namely sampling in the generative setting. The goal is to create efficient approaches with rigorous performance guarantees. Graduate and undergraduate students will be involved in the research of this project, training a new generation of mathematicians who both have knowledge of modern techniques of applied analysis and are cognizant of important questions arising in data science and artificial intelligence. The project will investigate flow-based approaches to sampling that take the view of variational inference. In contrast to popular approaches to sampling where one generates individual samples (e.g. Hamiltonian Monte Carlo sampling), the goal is to flow an initial measure, parameterized in a tractable way, towards the desired target measure. The project will also investigate flows in new geometries on the space of measures, namely the Radon-Wasserstein geometry and its modifications. 
The aim is to show that the gradient flows of the Kullback-Leibler divergence converge towards the desired target measure and can be approximated well and efficiently by interacting particles in high dimensions. For generative sampling, two approaches will be studied. One is based on modifications of denoising diffusion models that ensure that the reverse flow converges to a close approximation of the true target measure rather than the training samples, thus resolving the memorization issue. The other is based on flows of particle configurations with respect to maximum mean discrepancy (MMD), which can be approximated well in high dimensions by particles. The project will investigate fundamental questions regarding the MMD-based models and how accurately both approaches recover the target measure. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
NSF Awards · FY 2025 · 2025-09
Living systems, from cancer cells to microbial colonies and human populations, are inherently spatial, meaning that individuals interact within complex networks of relationships. However, current theory attempting to understand and predict the future evolution of these systems largely overlooks this spatial complexity. This project will build new, more realistic frameworks to understand how the intricate "shape" of biological populations, such as their patterns of interaction and reproduction, influences their future evolution. This includes studying how quickly mutations spread, how pathogenic populations expand and how organisms adapt to new treatments and environments. By combining cutting-edge mathematics, computational tools, and data from real biological systems, the project will create open-source software and visualization tools to help researchers interpret emerging, highly detailed spatial datasets. These tools will provide a foundation for studying evolutionary dynamics in spatially structured populations across diverse areas, including cancer progression, aging and microbial ecosystems. The project also emphasizes education and public engagement, incorporating interactive visualizations and undergraduate and graduate research experiences to make complex evolutionary concepts accessible and inspiring to a broad audience and train the future US scientific and industrial workforce. At a technical level, the project develops novel mathematical and computational models to study evolution in populations with complex spatial topologies, going beyond traditional assumptions of symmetry and homogeneity in the arrangement of individuals in a population. 
Key research objectives include modeling and understanding the role of heterogeneous spatial structure in shaping the competition between beneficial mutations, how populations cross rugged fitness landscapes, and understanding its effects on the genetic hitchhiking of neutral mutations on the background of advantageous ones. The project introduces new metrics to quantify how the topological properties of population structure amplify or suppress natural selection and affect the speed of evolutionary change. These models and topological metrics will then be used to develop inference tools for estimating the evolutionary forces shaping observed genetic data. A major innovation of the project is the creation of algorithms and tools to compare, compress, and visualize spatial population structures, providing both theoretical insights and practical applications for interpreting evolutionary processes in real-world biological systems. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
- EAGER: Extinct Does Not Imply Unfit: Paleobiology, Defossilization, and New Sources for Novel Robots · $271,912
NSF Awards · FY 2025 · 2025-09
This EArly-concept Grant for Exploratory Research (EAGER) award supports research to extend the sources of bio-inspired robotics beyond extant species and into the fossil record. Present-day challenges in robot locomotion may be solved by unique strategies evolved by animals that subsequently became extinct for unrelated reasons. These strategies may be revealed through careful study of surviving skeletons, including computer simulation of likely muscle and tendon arrangements, a process sometimes referred to as “defossilization.” The specific goal of this collaboration between a roboticist and a paleobiologist is to demonstrate the potential for transformative robot engineering through reconstructing backbone and limb anatomy from a variety of four-legged animals that lived over 235 million years ago. New robots created based on the results look to highlight how body shape, joint configurations, and movement patterns may be customized to robustly traverse uneven ground and confined spaces, allowing freedom of movement through dense vegetation areas or in industrial clutter. The project will also develop a "Dinosaurs and Robots" educational module suitable for afterschool programs. Ninety-nine percent of all species that have ever lived are now extinct. Many of these species had unique morphologies not seen today, and their extinction was typically unrelated to their functional fitness. This project seeks to demonstrate the value of these lost adaptations through a study of spine and limb features in a group of Triassic and Permian species, including Massetognathus, Lycaenops, and Orobates. These species differ notably from the configurations seen in living animals, namely pitch spines with upright legs that dominate in mammals, or yaw spines with sprawled legs that prevail among modern reptiles. 
Feasible force space analysis will be applied to reveal how these animals may have circumvented small obstacles and navigated narrow passages and to identify relationships and trade-offs in environmental conditions, anatomical morphology, and musculature performance. The project looks to incorporate these insights into new robot designs, which will be tested in both simulation and hardware experiments, to demonstrate performance advantages over conventional robots. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
NSF Awards · FY 2025 · 2025-09
This Pathways to Enable Open-Source Ecosystems (POSE) project centers on an ecosystem for universal and accessible generative artificial intelligence (genAI) deployment. As the utilization of genAI techniques becomes increasingly prevalent across various sectors, such as government, enterprise, and personal applications, the need for a variety of adaptable deployment capabilities grows. This project develops a foundational software infrastructure that supports genAI applications across a range of environments, from cloud platforms and personal laptops to mobile devices and web browsers. The ecosystem enhances deployment flexibility, enabling users to make informed decisions regarding cost, accessibility, and data privacy. By providing unified solutions that streamline genAI deployments, this ecosystem empowers users to explore the full potential of genAI. This Pathways to Enable Open-Source Ecosystems (POSE) project builds on foundational technical components of machine learning (ML) compilers and distributed, portable ML runtimes that span both cloud and edge environments. The project integrates cloud-edge development flows, sets up formal governance processes, builds reusable community infrastructures for continuous quality and efficiency monitoring, and develops comprehensive documentation. By supporting more informed tradeoffs between privacy and accuracy across different deployment scenarios, the ecosystem empowers research communities to better adapt generative AI techniques to their specific needs. The ecosystem also brings shared infrastructure to help significantly reduce the runtime and operational costs of developing generative AI solutions, especially for newly emerging platforms. In doing so, the open-source ecosystem will significantly improve the accessibility of customized generative AI solutions.
Such efficiency and accessibility improvements enable researchers and practitioners to share and leverage a broader spectrum of customized models, opening up new avenues for collaborative research. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
NSF Awards · FY 2025 · 2025-09
This project introduces a new way to develop speech-to-text systems for language varieties in which little digital data is available, especially when a related variety already has plenty of digital data. Languages differ from one another, but they also show a great deal of variation within themselves. People in different regions or countries often pronounce words in the same language differently. For example, many people in the southern US pronounce the “i” in words like “ride” as an “a” sound (similar to the one found in “rad”), while most other Americans do not. Most speech-to-text systems—which turn spoken words into written text—focus on just one variety of each language. However, in everyday life, many people speak other varieties. Existing speech-to-text systems often do not work well for these varieties. Improving speech recognition for low-resource varieties represents a business opportunity, one that can help more Americans access voice-powered tools and services. This project takes one step towards this goal. It innovates by leveraging the fact that the differences between various varieties of the same language often follow predictable patterns. For example, since the pronunciations of words in different regions change following rules that apply to the whole vocabulary, one can often predict how a word will be pronounced in one variety if one knows how it is pronounced in another. The project will develop a powerful AI model (POWSM) that can transcribe pronunciation (using a universal system developed by linguists to represent sounds). This approach enables the development of speech-to-text systems for previously unsupported language variants, even when little recorded training data exists for them. It works by learning both the similarities and the systematic differences between well-resourced varieties and others. 
The project builds an encoder-decoder foundation speech model called POWSM (Phonetic Open Whisper-style Speech Model), which is trained to recognize speech as sequences of phones (consonants and vowels) in any language. The project will include three applications using this model: 1) Prompting POWSM with vector representations of the systematic sound correspondences between a low-resource variety (LRV) and a high-resource variety (HRV), enabling the model to recognize the LRV as a transformed variant of the HRV. 2) Constructing stochastic weighted finite-state transducers that can generate synthetic LRV data based on linguist-curated knowledge about sound changes in the HRV and LRV, then using this synthetic data to train a language model that can be used to decode LRV output from POWSM. 3) Learning phonetic correspondences between language varieties automatically from transcribed audio using a novel form of unsupervised bilingual lexicon induction (UBLI) that leverages both text and audio. Audio aligned in this way can be used to train basic speech-to-speech translation models that, when used in conjunction with POWSM, enable speech technologies for LRVs without requiring linguistic annotation beyond transcription. The proposed methods are intended to cover most HRV-LRV scenarios. They will be evaluated on varieties of major languages including English, Italian, Chinese, German, Arabic, and Dutch, as well as endangered languages like Nahuatl and Mixtec. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
NSF Awards · FY 2025 · 2025-09
Modern data analysis and statistical learning are characterized by two defining features: complex data structures and black-box algorithms. The complexity of data structures arises from advanced data collection technologies and data-sharing infrastructures, such as imaging, remote sensing, wearable devices, and genomic sequencing. In parallel, black-box algorithms, particularly those stemming from advances in deep neural networks, have demonstrated remarkable success on modern datasets. This confluence of complex data and opaque models introduces new challenges for uncertainty quantification and statistical inference, a problem we refer to as "black-box inference". This research project aims to develop flexible, valid inference procedures for modern complex data that harness the strengths of black-box machine learning algorithms. These contributions have potential applications in areas such as policy evaluation, model selection, treatment effect identification, and algorithmic fairness auditing. A central focus of the project is the development of novel variants of a classical statistical tool: cross-validation, repurposed to enable adaptive inference in conjunction with powerful black-box models. Although cross-validation is widely used for evaluating estimator performance, its theoretical foundations remain limited, particularly in the context of complex data and modern algorithms. This research will begin with a multi-population comparison problem, using a stabilized cross-validation framework, and will then investigate performance guarantees of cross-validation in more general settings. The project will also develop new methods for adaptive population comparisons in high-dimensional and nonparametric regimes. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
NSF Awards · FY 2025 · 2025-09
Modern AI & robotics systems, such as general-purpose humanoid robots, must make intelligent decisions in real time, in unstructured environments. At the heart of these capabilities lies a fundamental problem: How can we compute optimal actions when such robotic systems are governed by highly complex dynamics (nonlinear, even discontinuous)? Traditional optimal control tools, while powerful, often fall short: they can be slow, unreliable, or rely on overly simplified assumptions about the real physics. This project aims to advance a promising category of methods known as Sampling-Based Optimal Control, which has recently gained popularity due to its flexibility (can easily handle complex systems), scalability (can leverage massively parallel computation on GPUs), and empirical success in solving complex robotic planning and control problems. However, these methods currently lack a solid theoretical foundation and systematic design principles. This project will fill that gap by developing new mathematical frameworks to understand, analyze, and improve these methods, eventually leading to more reliable, efficient, and intelligent decision-making methods for real-world robotic and autonomous systems. By bridging theory and practice, this work supports NSF’s mission to promote transformative research in AI & robotics, and has the potential to impact a wide range of fields where autonomous systems must operate reliably and effectively in the real world. Optimal control for complex nonlinear and contact‑rich systems is notoriously nonconvex and often discontinuous, rendering classical optimization tools computationally burdensome and prone to suboptimal local minima. Sampling‑Based Optimal Control (SBOC) has emerged as an attractive alternative because its stochastic trajectory rollouts handle severe nonlinearity, discontinuities, and intricate cost landscapes while exploiting GPU‑based parallelism; it now underpins path planning, robotic control, and model‑based RL.
Yet SBOC remains largely empirical: there is a lack of principled guidance on parameter tuning, sampling schedules, and, crucially, theoretical guarantees of convergence or sub‑optimality. This project closes these gaps through three coordinated thrusts. Thrust 1 develops asymptotic and finite‑time convergence theories that characterize when SBOC attains global optima and how performance scales with algorithmic and system parameters. Thrust 2 translates these insights into new, more efficient algorithms via automated hyper‑parameter optimization and a diffusion‑style annealing strategy that accelerates exploration while preserving convergence. Thrust 3 couples the resulting controllers with learned value functions, nominal policies, and dynamics models, yielding a powerful subroutine for model‑based reinforcement learning that unites the strengths of control and learning. The framework will be validated on demanding robotic benchmarks, including humanoid whole‑body locomotion and dexterous manipulation through extensive real hardware experiments. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
NSF Awards · FY 2025 · 2025-09
The final stages in the evolution of the most massive stars play a crucial role in astrophysics, injecting energy and enriched metals into the interstellar medium, and producing compact objects – neutron stars, black holes, and the wide range of transient phenomena associated with them. The predictive power of theoretical models for massive stars, however, is severely limited by the large uncertainties associated with this regime of stellar evolution. The project will constrain some of the most uncertain processes in massive stellar evolution, such as the efficiency of mass loss and the impact of supernova kicks. This project will develop new tools that can maximize the science extracted from the wealth of data on resolved stellar populations and object catalogs that will be available in the coming decade with the advent of facilities, like the Vera C. Rubin Observatory. The project will also broaden the impact of the research through a mixture of new and well-established outreach projects designed to foster engagement by educators and students in science. The spatially resolved stellar populations in Local Group galaxies will be used to place unprecedented constraints on the evolution of the most massive stars. To this end, the investigators will precisely measure the formation efficiencies and evolutionary timescales (i.e., the delay time distributions, or DTDs), for the main outcomes of massive stellar evolution: Wolf-Rayet and Intermediate Mass Stripped Stars, Blue, Yellow and Red Supergiants, X-ray Binaries, and Supernova Remnants. The investigators will model each of these DTDs using the state-of-the-art population synthesis code COSMIC. A systematic comparison between measured DTDs and COSMIC predictions will therefore constrain some of the most uncertain processes in massive stellar evolution, such as the efficiency of mass loss, the effects of the common envelope phase, and the impact of supernova kicks. 
These constraints will provide a new level of detail in our ability to trace and quantify the energetics and enrichment from the progenitors of many astrophysical transients and gravitational wave sources. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
NSF Awards · FY 2025 · 2025-09
Neural and generative artificial intelligence (AI) is opening up vast new opportunities for mathematical reasoning. This project introduces novel approaches for developing AI for mathematics, combining neural networks with symbolic, logic-based methods. The project goals aim to achieve a deeper understanding of how AI can learn and reason, as well as how to best combine symbolic and neural approaches. Additionally, the project involves the development of new practical tools that mathematicians can use to formalize mathematical definitions and proofs, enabling proofs to be represented in a digital format so that they can be processed and verified by a computer. The methods to be developed in this project are intended to make the digitization of mathematics easier and more accessible. In turn, this is intended to lead to new paradigms for using, teaching, and learning mathematics, as well as a deeper understanding of mathematics itself. The project is designed to be translational through collaborations with Lean developers as well as with industry. Educational and outreach efforts include developing tutorials on this topic, training students who will contribute to this research, and integrating the proposed research into existing courses. Open-source course material and code will be made available. The project aims to achieve these goals by pursuing three thematic lines of research. First, the investigators will develop novel techniques that synergistically combine the features of machine learning and symbolic AI. This includes using machine learning for tasks that symbolic methods are unable to achieve and using symbolic methods to produce data and signals that can be used to train neural systems effectively. Second, the investigators will develop new ways of making mathematical understanding explicit with the goal of enabling AI to understand mathematical proofs as well as help humans understand proofs generated by AI. 
The team will develop novel methods for extracting and learning from symbolic information, inferring the informal reasoning underlying a formal proof, and working with new definitions and lemmas. Finally, the investigators will explore mechanisms for training machine learning systems to carry out focused tasks requiring specific mathematical expertise. The mathematical focus will center on proving inequalities. Novel methods will be designed to learn on their own by exploring a space of actions and consequences. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
NSF Awards · FY 2025 · 2025-09
Statistical experiments can improve our understanding of complex systems, including the communities, organizations, and local economies that make up our society. In these systems, individuals may be connected by mechanisms such as social influence, economic competition, sharing of information, or the transmission of disease. The presence of such mechanisms is known as interference. Interference greatly complicates statistical analysis. It weakens the conclusions that may be drawn from an experiment, and requires the usage of assumptions whose correctness may be difficult to judge. As a result, statistical conclusions drawn under interference can have considerable caveats or limitations. This project will study interference and how it can be more safely modeled. Doing so can help researchers think more clearly about their experimental results when interference is present. It can also help researchers make fewer assumptions when they interpret their data. This research will help investigators in fields like economics, which influence daily lives of people in society. The project will extend a promising approach for detecting and describing interference, so that it may be applied to a broader variety of settings with no assumptions beyond what is known about the design of an experiment. It will also develop a new semiparametric approach for modeling interference, which can help researchers to draw credible statistical conclusions even when interference is strongly measurable over long distances. These conclusions will include test statistics with improved standard errors, and confidence intervals for numbers of individuals violating assumptions about interference. As part of this project, freely available software will be released to help researchers use the methods that are developed. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
NSF Awards · FY 2025 · 2025-09
Frontier AI models have pushed the boundaries of machine learning and artificial intelligence research and sparked transformative technological innovation in many US industries. These large-scale AI models are able to process and generate text, image, audio, and video, and currently require massive amounts of data and computing. This Mathematical Foundations of Artificial Intelligence (MFAI) project aims to uncover the mathematical principles that explain when and why these highly advanced AI models are so effective, and to overcome the fundamental limits of brute-force scale presently employed to surpass human expert intelligence in benchmarks. The project will advance the capabilities of AI models to conduct inference in new situations in which there is no training data, and to perform complex reasoning and problem-solving tasks. This research will ensure that the US remains the global leader in AI, advancing economic prosperity, national security, and global competitiveness. This project aims to rigorously characterize the mathematical frontiers of generative AI models, including state-of-the-art large language models (LLMs), by developing new theoretical frameworks and modeling principles rooted in machine learning, probability theory, variational analysis, mathematical statistics, and information theory. The research will investigate how frontier AI models achieve remarkable performance despite fundamental theoretical barriers and will identify the key mathematical quantities that drive their generalization abilities. The project will develop new mathematical analyses of diffusion-based generative AI models, design novel data strategies for AI models used towards zero-shot inference, and discover scaling laws enabling models to achieve compute-optimal accuracy tradeoffs for inference and generation. 
This award is jointly funded by the Directorate for Mathematics and Physical Sciences, Division Of Mathematical Sciences; Directorate for Engineering, Division of Civil, Mechanical, & Manufacturing Innovation, and Directorate for Computer & Information Science & Engineering, Division of Computing and Communication Foundations and Division of Information & Intelligent Systems. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
NSF Awards · FY 2025 · 2025-09
Hyperspectral remote sensing is a data gathering technique that uses advanced sensors - attached to satellites, drones or other devices - to measure the reflection of light off of the Earth's surface. These data can be used to analyze the shape and makeup of a landscape, and can be used to make inferences about the underlying mineral patterns, tectonics, and magmatic processes of a scanned region. This project will develop a new artificial intelligence (AI) framework to analyze hyperspectral data to test critical hypotheses about the formation of ore deposits. By improving the effectiveness of hyperspectral mineral mapping, this project will accelerate the identification of critical mineral resources to improve the nation's economic competitiveness and security. The project will also help develop a modern workforce by training graduate students at the intersection of geosciences and AI. Outreach through workshops, mentorship opportunities, undergraduate internships, and participation of community college students will further broaden the impact. By demonstrating the power of integrating AI with domain expertise in geosciences, this work will serve as a model for interdisciplinary collaboration that can be applied to other disciplines facing similar data-intensive challenges. The proposed research introduces significant innovations at the intersection of AI and geosciences. First, a novel encoder-decoder architecture will be developed for decomposing hyperspectral data into physically meaningful latent structures, enabling efficient compression while preserving the nonlinear spectral relationships essential for accurate mineral identification. Second, a new hierarchical spectral alignment approach will coherently integrate multi-resolution hyperspectral data while systematically quantifying uncertainties inherent in real-world data. Third, the AI models will be aligned with geological principles to support geoscience. 
Together, these innovations will yield a unified framework that improves the accuracy, efficiency, and scientific rigor of hyperspectral data analysis. This research will simultaneously test geoscience hypotheses about the spatial relationships between surface mineral assemblages and the underlying tectonic and magmatic processes, enabling quantitative analysis of the geological parameters which coincide with ore deposit formation, regardless of their age or location. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
NIH Research Projects · FY 2025 · 2025-09
String similarity is one of the foundational problems in computational biology: it is what allows piecing together a genome from short sequenced read substrings, determining that human genomes share a more recent common ancestor with chimpanzee than they do with octopus, and detecting the horizontal transfer of genetic material across bacterial species, one of our current application focuses. Finding exact sequence matches between genomes is a long-solved problem, but in biology, due to the prevalence of both mutation and sequencing error, we care more about approximate matches, which are much harder to find and characterize. One of the major recent advances in bioinformatics has been the advent of increasingly sophisticated string transformations (sketching, k-mer-ization, alphabet reductions) that change the distribution of exact matches on the transformed strings, allowing for the development of faster software. My nascent research lab has been one of the pioneers both in developing rigorous mathematical theory to understand sequence transformations and in engineering software that turns that theory into usable bioinformatics tools. Relevantly, in prior work, we gave the first rigorous proofs that k-mer sketching works with alignment [Shaw & Yu, Genome Research, 2023] and successfully translated our theoretical understanding of k-mer sketching into a new metagenomics software “skani” [Shaw & Yu, Nature Methods, 2023] for computing pairwise average nucleotide identity (ANI), a standard measure of genome similarity. Skani is both more accurate and orders of magnitude (20x) faster than the state-of-the-art. Building on skani, we further produced skandiver [Zhang et al., Bioinformatics, 2024], a tool for detecting large intercellular mobile genetic elements by comparing all the chunks of a sequence against all the whole genomes in a database, without needing a reference database of mobile genetic elements specifically.
Over the next five years, my research program will continue to straddle the line between advancing string algorithm theory and using that theory to build bioinformatics tools. On the theory side, we want to combine ideas from k-mer sketching theory with alphabet reductions to expand the utility of sketching for more dissimilar sequences (such as those found in protein databases). On the applied side, we are going to push forward string similarity tools for microbial analysis, focusing on better characterizing mobile genetic elements (MGEs): our proof-of-concept skandiver only does long-all-to-all-sequence similarity, its design principles don’t work for intra-species MGEs or small MGEs, and it does not even annotate the boundaries of the MGEs it does find.
NSF Awards · FY 2025 · 2025-09
The Institute for Computer-Aided Reasoning in Mathematics (ICARM) is a national institute dedicated to catalyzing fundamental advances in mathematics by harnessing the ongoing revolution in AI and computer-assisted reasoning. Its mission is to empower mathematicians by providing them with the tools and expertise to effectively integrate artificial intelligence, machine learning, formal methods, and automated reasoning into their research. Mathematics is integral to scientific and technological achievement, underpinning advances across critical areas such as quantum computing, cybersecurity, data science, computational modeling, and engineering. The institute serves as a crucial resource for leveraging recent breakthroughs in artificial intelligence to accelerate progress in mathematics. ICARM emphasizes interdisciplinary collaboration among mathematicians, computer scientists, and students. The institute helps to train the next generation of researchers in computational methodologies, preparing them for a broad range of scientific careers, and actively expanding the research community in mathematics. ICARM provides specialized technical expertise to support mathematicians in adopting and utilizing advanced tools for computer-aided reasoning to power mathematical research. The institute provides direct support for automated reasoning, formal verification, machine learning, and AI, significantly enhancing mathematical research capabilities. ICARM organizes targeted events, including workshops, collaborative research visits, and intensive summer training programs, to disseminate these skills and promote the integration of these advanced techniques into mathematical practice. These collective efforts foster new mathematical insights, stimulate interdisciplinary collaboration, and ensure robust, hands-on support for advancing research in mathematics through cutting-edge machine-driven reasoning. 
This award by the Division of Mathematical Sciences is also supported by the Office of Advanced Cyberinfrastructure and the Division of Computing and Communication Foundations in the NSF Directorate for Computer and Information Science and Engineering and the Office of Strategic Initiatives in the NSF Directorate for Mathematical and Physical Sciences. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
NSF Awards · FY 2025 · 2025-09
Although gravitational wave (GW) instruments have been detecting the mergers of two compact objects (either black holes or neutron stars) for nearly a decade, there is still uncertainty about how these binary systems form and develop over time. A research collaboration between Carnegie Mellon University (CMU) and the University of Arizona (UA) will investigate the formation of merging double compact objects by combining state-of-the-art population synthesis tools, used to model large populations of stellar objects, with detailed modelling of binary system development. The project will also support science teacher training programs at both universities: the Physics Teacher Program to connect high school physics teachers with CMU researchers, and the UA University Borderlands Education Center to create workshops that empower high school teachers to use research products in their classrooms. The use of binary population synthesis and detailed binary development modeling has been widely applied to understanding how isolated binary star populations can produce merging double compact objects. However, the assumptions usually made in population synthesis are unable to resolve the effects of the interior structural development of each stellar component in a given binary. This project will unite these previously disparate efforts through a new technique, BackPop, which simulates joint posterior distributions for uncertain binary interaction parameters that reproduce the observed properties of individual binary systems. These joint distributions can then be used to initialize detailed binary development models, which capture the effects of binary mass exchange on the interior structure of each star, thus testing the interaction parameters. 
The research will focus on three key populations: the binary black holes that make up the global merger rate maximum, the asymmetric mass ratio mergers that are treated as outliers in GW population analysis, and finally, the remaining population consisting of more massive black holes. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
NSF Awards · FY 2025 · 2025-08
As advancements in connected technology and AI produce new threats to national security, the federal government needs proven cybersecurity engineers to protect its people, systems and interests. This project builds upon a 24-year effort at Carnegie Mellon University (CMU) to strengthen the pipeline of federal cybersecurity professionals while increasing cross-disciplinary collaboration to meet the nation's evolving security needs in the era of AI. To address the unfulfilled demand for skilled security professionals in federal service, this project provides graduate students at CMU’s Information Networking Institute with full scholarships to support their successful pursuit of a master’s degree in information security or artificial intelligence and information security. The foundation of this project is the MS in Information Security (MSIS), CMU’s flagship cybersecurity degree founded in 2003, which is the cornerstone of the university’s designation as a National Center for Academic Excellence in Cybersecurity. Coursework that pairs a deeply technical curriculum in cybersecurity and AI engineering with interdisciplinary skills in policy, ethics, business and research results in Scholarship for Service graduates that are exceptionally prepared for service in the federal executive branch after graduation. Through this project, CMU leverages its expertise in cybersecurity and AI to advance the field and deepen national relationships. This proposal leverages CMU’s existing strengths in cybersecurity and AI research and education to craft new opportunities for advanced research, cross-disciplinary collaboration and intellectual exchange. The MSIS program continues to produce graduates who excel in cybersecurity roles, equipped with the interdisciplinary knowledge and career-ready skills they need to thrive. The M.S. in Artificial Intelligence Engineering - Information Security degree responds to the rapidly evolving cybersecurity field by integrating security foundations and AI engineering, providing students with a deep understanding of AI and machine learning methods, systems, toolchains and cross-cutting issues, including security, privacy and ethical and policy challenges. All participating scholars complete coursework from both degree programs, ensuring that these students are well equipped to handle the intersecting challenges and opportunities of cybersecurity and AI. Through a CyberFX Capstone Semester, Scholarship for Service students apply their skills to projects that will have a direct impact on the broader cybersecurity community, choosing from: security and AI-focused research or application; creation of software or tools to support cybersecurity education; or cybersecurity competitions or hackathons that result in knowledge transfer. Publications will benefit other students, researchers and professionals in the field. This project is supported by the CyberCorps® Scholarship for Service (SFS) program, which funds proposals establishing or continuing scholarship programs in cybersecurity and aligns with the U.S. National Cyber Strategy to develop a superior cybersecurity workforce. Following graduation, scholarship recipients are required to work in cybersecurity for a federal, state, local, or tribal Government organization for the same duration as their scholarship support. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
NSF Awards · FY 2025 · 2025-08
Awards are made to Carnegie Mellon University and Purdue University to enable the development of a cyberinfrastructure that supports the analysis of cryo-electron tomography (cryo-ET) data. Cryo-ET is a cutting-edge imaging technology for revealing the structures and spatial organizations of subcellular components, in particular, macromolecular complexes, inside cells. This project will build an open-access, annotated database of cryo-ET images, both simulated and experimentally obtained, alongside a robust toolbox of computational methods for their analysis. The resulting resources will lower the entry barrier for new researchers, promote collaboration, and accelerate scientific discoveries across the life sciences. Educational outreach includes training Ph.D., graduate, and undergraduate students through interdisciplinary coursework, hands-on research, and workshops at both institutions. Workshops will also be held for the broader research community, including educators and students at the high school level. Beyond biology, the tools developed will support innovations in medical imaging and materials science, ultimately contributing to workforce development in data-driven scientific fields. The intellectual merit of this project lies in establishing a foundational infrastructure for cryo-ET data analysis that addresses a critical gap in the field: the lack of well-curated, annotated datasets and standardized computational tools. By developing realistic simulated datasets, manually and semi-automatically annotated experimental data, and a benchmark database, the project supports rigorous method development and validation. The project also integrates state-of-the-art machine learning and computer vision algorithms, including novel simulation methods and segmentation frameworks. 
Together, these innovations will catalyze the development of new computational techniques and deepen our understanding of the structures and spatial organizations of subcellular components within cells, advancing the frontiers of structural and cell biology. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
NSF Awards · FY 2025 · 2025-08
The aim of this project is to better understand mathematical structures that are (1) discrete and (2) geometric and algebraic in nature. Examples of such discrete structures include networks, matrices, and arrangements of convex sets. There are a number of instances where the optimal discrete objects necessarily possess non-trivial algebro-geometric properties. This project is devoted to understanding this phenomenon. Students will be mentored as part of the project. The specific problems include algebraic and geometric questions related to the Turán problems and combinatorial questions in finite geometry. Particular attention will be devoted to algebraic constructions, especially in mixed characteristic. The potential impacts include a new method for testing conjectures in discrete geometry and better locally decodable codes. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
NSF Awards · FY 2025 · 2025-08
This project aims to make progress on several areas at the interface of probability, geometry, and combinatorics. Three specific directions will be investigated that range from more theoretical to more applied. The project also involves a research experience for undergraduates. The first direction of the project is that of enabling statistical inference in the realm of probability distributions characterized by geometric constraints. For example, can we efficiently choose a uniformly random partition of a region into geometrically nice pieces? When can we efficiently choose a uniformly random partition of a graph into, say, connected subgraphs? The second direction concerns the minimum lengths of combinatorial structures like spanning trees and Hamilton cycles among "cities" in geometric space. Here, some aims push our knowledge in directions that are more closely motivated by applications, while others probe the limits of our current theoretical approaches. Finally, the third direction aims to explore competitive/online analogs of Radon's theorem for high-dimensional point-sets. This direction places results in geometry and classical machine learning in a new context, while offering new paradigms for inquiry. One example is the notion of a pseudo-randomized trial, in which controlling curse-of-dimensionality effects can replace randomization in rigorous comparison between groups under a linear effects model. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
NSF Awards · FY 2025 · 2025-08
Financial practitioners routinely face statistical challenges involving data that is gathered or observed sequentially over time, rather than as a fixed and complete dataset. Examples include risk assessment and monitoring, evaluation of hedging and trading strategies, and estimation of price impact from trades. This project uses recent advances from the field of anytime-valid statistics to address these issues. This is a fast-growing field at the intersection of statistics, online learning, information theory, and game theory. Using ideas from mathematical finance, this project will make meaningful contributions to the theoretical foundations and broader goals of anytime-valid statistics, in particular toward mitigating what is known as the replication crisis in science. The project revolves around the concept of the numeraire e-variable, an optimal test statistic for general statistical hypotheses that is based on the testing-by-betting framework. This approach creates strong links to mathematical finance and offers powerful methodologies in complex, non-parametric, and temporally dependent settings, which are common in financial applications. The primary objectives of this project are to (i) develop a general theory of the numeraire process, a dynamic analog of the numeraire e-variable; (ii) establish a formal framework for the effective null hypothesis, a fundamental object associated with the numeraire e-variable and e-process; and (iii) develop a particular application in finance, namely anytime-valid price impact estimation. Estimating price impact is an important and difficult inference problem which is amenable to the approaches developed here. This project will showcase the power and usefulness of anytime-valid statistics in finance and pave the way for its use in a broad range of financial applications. 
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.