Stanford Data Science

Annual Update

Fall 2023 - Fall 2024

A message from the director



Dear Friends of Stanford Data Science,

Since 2018, SDS has fostered an ecosystem at Stanford where data-intensive research will drive breakthroughs that were once thought impossible. This update illustrates our exciting progress and accomplishments from the past year.

In addition to nurturing data science leaders through our flagship Data Science Scholars and SDS Postdoctoral Fellows programs, we recruited four talented early-career faculty data scientists (two of whom are featured below). These scholars are already sparking innovation in fields such as psychology, the neurosciences, chemical engineering, medicine, and computer science, among others.

We launched a fifth research center, the Center for Decoding the Universe @ Stanford. In addition to fueling exciting astrophysics research, the center will generate methodological innovations for multimodal data and anomaly detection that have potential applications in fields like financial services.

As an early advocate for the university's recent investment in GPU-centric infrastructure, SDS is thrilled to see strong faculty demand for access to Marlowe, Stanford’s most advanced computing resource. Data-intensive methods are changing almost every field, and Marlowe will help ensure Stanford’s continued leadership in research. It is already helping Stanford recruit top-tier faculty.

SDS kicked off 2025 by moving to its new base in the Computing and Data Science (CoDa) complex, adjacent to the quad. This interdisciplinary hub will boost our efforts to cultivate a campus-wide community where data-intensive science can thrive.

I am grateful for the many students, faculty, researchers, staff, and generous supporters who have helped SDS ensure that Stanford remains at the forefront of discovery for years to come. Together we are opening new avenues for what will become an extraordinary legacy.

Sincerely,

Emmanuel Candès
Barnum-Simons Chair in Mathematics and Statistics
Professor of Mathematics
Director, Stanford Data Science


Recruiting data science pioneers


Over the past 18 months, SDS partnered with two schools and one institute to hire four impressive junior faculty: Laura Gwilliams (SDS, Psychology, Wu Tsai Neurosciences); Brian Hie (SDS and Chemical Engineering); Brian Trippe (SDS and Statistics); and Ludwig Schmidt (SDS and Computer Science). All SDS faculty teach in the undergraduate Data Science major in the School of Humanities and Sciences. See the Stanford Report for a deeper dive on Laura Gwilliams and Brian Hie, two of our first hires.

Laura Gwilliams

Laura aims to create an algorithmically precise account of how the human brain acquires, produces, and understands language. Using a new non-invasive brain recording device, she will develop the first dataset that spans single neurons, cortical columns, and region-wide structures. Laura’s research will help us better understand human intelligence, build more intelligent machines, and infuse innovative data science techniques into the neurosciences. See her discuss the algorithms of human language at the 2024 SDS conference.

Brian Hie

As the Dieter Schwarz Foundation SDS Faculty Fellow and an Innovation Investigator at the Arc Institute, Brian works at the intersection of biology and machine learning. Brian collaborated with Sarafan ChEM-H Faculty Scholar Peter Kim (Biochemistry) to develop a large language model with information about a protein’s 3D shape that could help scientists probe evolution, investigate diseases, and even develop new treatments. The model, Evo, was featured on the November 14, 2024, cover of Science and in the Stanford Report, among other outlets, and it earned Brian and his co-investigators a New York Times Good Tech Award, which celebrates technological advancements that significantly benefit humanity. All photos of Brian are courtesy of Stanford Engineering.

Training the next generation

Since 2018, the Data Science Scholars and Postdoctoral Fellows Programs have fostered a cross-campus network of rising talent. Participants pursue independent research and form a multidisciplinary community of practice, learning techniques from one another and gaining a deeper understanding of how advanced data science and computing methods can accelerate discovery in their fields.

Sydney Erikson, PhD ’27 (Physics), studies strong gravitational lenses to gain insights into the expansion of the universe. Because she works with vast amounts of multimodal astrophysical data, Sydney must use machine learning and novel data science approaches to analyze data sets and unearth patterns. Researchers like Sydney will help pioneer cutting-edge data science methodologies as members of the new Center for Decoding the Universe.

Rylan Schaeffer, PhD ’27 (Computer Science), won an "Outstanding Paper" award at NeurIPS 2023, the world's preeminent conference on AI. Rylan demonstrated that emergent properties of large language models (LLMs) aren’t dependent on scaling up; rather, these properties disappear with different metrics or with better statistics. This finding has major implications given the significant cost and energy consumption of LLMs. You can watch his presentation here and learn more about his development as a scholar here.

As a graduate student interested in geotechnics, SDS Postdoctoral Fellow Haojie Wang (Medicine) used machine learning, satellite imagery, and geospatial data to identify and forecast landslides. Now, he leverages his expertise in data science and remote sensing to develop a new way of monitoring global health indicators from space. Compared to traditional in-person monitoring, this approach stands to update health indicators much more rapidly, enabling governments and decision-makers around the world to craft better healthcare policy and more effectively allocate medical resources.



Building communities of practice

Stanford Data Science hosted its third annual conference in May 2024. This student-organized conference included poster sessions and panel discussions featuring the work of early-career Stanford researchers, networking sessions, and talks by SDS faculty. A recap of the event and videos of each session can be found here.

Five faculty-led research centers at Stanford Data Science continue to foster diverse communities of practice around specific themes with faculty, staff, and students who are deeply invested in data-intensive science. In addition to the new Center for Decoding the Universe (see below), these hubs include the Causal Science Center (SC²), Center for Open and Reproducible Science (CORES), Center for Sustainability Data Science (SuDS), and Data Science for Health Center.

Launched in October 2024, the Center for Decoding the Universe (CDU) is a joint venture of SDS and the Kavli Institute for Particle Astrophysics and Cosmology (KIPAC). It aims to help us better understand how the universe works by using innovative data science to extract insights from massive observational astrophysics data sets.

Photo left to right: Emmanuel Candès, Risa Wechsler, Susan Clark, and Chris Mentzel.

In October, faculty co-directors Susan Clark (Physics) and Risa Wechsler (Physics, and Director of KIPAC) hosted the inaugural CDU Quarterly Forum. Researchers from a variety of disciplines, such as physics, computer science, electrical engineering, and bioengineering, discussed the latest machine learning, computation, statistics, data science, and analysis techniques. A recap can be found here.


“On the astrophysics side of things, we are in a data revolution. We are moving into an era where the volume and rate of new astronomical data absolutely can’t be taken advantage of by many of our traditional techniques.”

- Susan Clark



Shared computational resources to power research

In September 2024, President Jonathan Levin and Provost Jenny Martinez announced that advancing the university’s leadership in data-driven discovery and AI would be one of their first-year priorities. They highlighted the expected completion of Marlowe, an on-premises, GPU-centric hub first proposed by Stanford Data Science.

This resource – comprising hardware, software, and expert scientific staff – is unleashing breakthrough science across Stanford by enabling faculty across a wide range of disciplines (including physics, sustainability, the life sciences, and social sciences) to efficiently analyze and perform computation on large and complex data sets.

Following beta testing in the fall, Marlowe is now open to applications from all faculty. SDS hired Craig Kapfer, formerly with Chan Zuckerberg Biohub, as its inaugural Senior Director of Research Data Science. Craig is hiring a team of highly skilled Research Data Scientists who will partner closely with faculty and students throughout the discovery process, from pre-processing data to strategizing advanced methodologies and interpreting results. See the Stanford Report to learn more about the faculty and kinds of research that will benefit from Marlowe.

“Marlowe has been a game-changer for our research. Its 10x faster training speed enabled us to achieve state-of-the-art results in computer graphics. Without Marlowe, the project would have stalled despite months of prior work.”
– Gordon Wetzstein (Electrical Engineering)
Gordon Wetzstein working with a graduate student in the Stanford Computational Imaging Lab.
“Marlowe is turbocharging antimicrobial discovery. Pairing it with our pre-trained genomic model, Evo, we could quickly prioritize promising candidates for lab testing. Though still early, our results are promising: several candidates are already being tested against harmful bacteria, showcasing the power of this innovative process!”
– Brian Hie

Stanford Data Science

To learn about Stanford Data Science, please contact Chris Mentzel, Executive Director (cmentzel@stanford.edu).

To learn about making a gift to Stanford Data Science, please contact Krysten Hommel, Senior Associate Director of Strategic Initiatives (khommel@stanford.edu).