Jiecao Yu 俞杰草

Research Scientist

Meta Platforms, Inc.

Email: jiecaoyu [AT] meta [DOT] com


[ About Me ] [ Experiences ] [ Education ] [ Publications ]


About Me

I am currently a Staff Research Scientist at Meta Platforms, Inc. My work focuses on optimizing LLM training and inference efficiency, especially low-precision (e.g., FP8) training/inference, efficiency-aware model architecture design, distillation, and pruning. I was the main contributor to the FP8 training framework for Llama 4.

I obtained my Ph.D. from the University of Michigan, Ann Arbor, under the supervision of Prof. Scott Mahlke. Before joining UMich in August 2014, I received my B.Eng. degree in Electronic & Information Engineering from Zhejiang University, China.

Google Scholar   /   LinkedIn   /   Github   /   Contact Me


Experiences

-  Meta Platforms, Inc.
   Staff Research Scientist, 02/2023 - Present
   Senior Research Scientist, 08/2021 - 02/2023
   Research Scientist, 10/2019 - 08/2021
   AI System SW/HW Co-Design Group
   Menlo Park, CA

-  Facebook, Inc.
   Research Intern, 05/2018 - 08/2018, Menlo Park, CA

-  ARM Inc.
   Research Intern, 05/2017 - 07/2017, Austin, TX
   Research Intern, 06/2016 - 08/2016, Austin, TX


Education

-  Ph.D. Candidate, Computer Science & Engineering, 08/2014 - 09/2019
   Advisor: Prof. Scott Mahlke
   University of Michigan, Ann Arbor, MI

-  M.S. Computer Science & Engineering, 08/2014 - 12/2015
   University of Michigan, Ann Arbor, MI

-  B.Eng. Electronic & Information Engineering, 08/2010 - 06/2014
   Honored Minor, Advanced Honor Class of Engineering Education (ACEE)
   Zhejiang University, Hangzhou, China


Publications

Fast and Simplex: 2-Simplicial Attention in Triton
Aurko Roy, Timothy Chou, Sai Surya Duvvuri, Sijia Chen, Jiecao Yu, Xiaodong Wang, Manzil Zaheer, Rohan Anil
arXiv:2507.02754

Scaling Llama 3 Training with Efficient Parallelism
Weiwei Chu*, Xinfeng Xie*, Jiecao Yu*, Jie Wang*, Jongsoo Park, Naman Goyal, Vedanuj Goswami, Abhishek Kadian, Yuchen Hao, Jianyu Huang, Andrew Gu, Min Si, Ching-Hsiang Chu, Pavan Balaji, Feng Tian, Xiaodong Wang, Chris Cai, Jun Wang, Mustafa Ozdal, Amar Phanishayee, CQ Tang (*: co-primary authors)
The International Symposium on Computer Architecture (ISCA), Jun, 2025

Accelerating Transformer Inference and Training with 2:4 Activation Sparsity
Daniel Haziza, Timothy Chou, Dhruv Choudhary, Jesse Cai, Luca Wehrstedt, Francisco Massa, Jiecao Yu, Geonhwa Jeong, Supriya Rao, Patrick Labatut
Workshop on Sparsity in LLMs (SLLM) at the International Conference on Learning Representations (ICLR), Apr, 2025

The Llama 3 Herd of Models
Jiecao Yu, one of the core contributors.
Llama 3 release; also a core contributor to the pruning/distillation framework for the Llama 3.2 1B and 3B models.
arXiv:2407.21783

BitSET: Bit-Serial Early Termination for Computation Reduction in Convolutional Neural Networks
Yunjie Pan, Jiecao Yu, Andrew Lukefahr, Reetuparna Das, Scott Mahlke
ACM Transactions on Embedded Computing Systems / the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), Sep, 2023

First-Generation Inference Accelerator Deployment at Facebook
Michael Anderson et al. (with Jiecao Yu as a co-author)
arXiv:2107.04140

Compute-Capable Block RAMs for Efficient Deep Learning Acceleration on FPGAs
Xiaowei Wang, Vidushi Goyal, Jiecao Yu, Valeria Bertacco, Andrew Boutros, Eriko Nurvitadhi, Charles Augustine, Ravi Iyer, Reetuparna Das
International Symposium On Field-Programmable Custom Computing Machines (FCCM), May, 2021

Alternate Model Growth and Pruning for Efficient Training of Recommendation Systems
Xiaocong Du, Bhargav Bhushanam, Jiecao Yu, Dhruv Choudhary, Tianxiang Gao, Sherman Wong, Louis Feng, Jongsoo Park, Yu Cao, Arun Kejariwal
IEEE International Conference on Machine Learning and Applications (ICMLA), Dec, 2021

Adaptive Dense-to-Sparse Paradigm for Pruning Online Recommendation System with Non-Stationary Data
Mao Ye, Dhruv Choudhary, Jiecao Yu, Ellie Wen, Zeliang Chen, Jiyan Yang, Jongsoo Park, Qiang Liu, Arun Kejariwal
arXiv:2010.08655

TF-Net: Deploying Sub-Byte Deep Neural Networks on Microcontrollers
Jiecao Yu, Andrew Lukefahr, Reetuparna Das, Scott Mahlke
ESWEEK-TECS special issue / the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), Oct, 2019

Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks
Xiaowei Wang, Jiecao Yu, Charles Augustine, Ravi Iyer, Reetuparna Das
The 25th IEEE International Symposium on High-Performance Computer Architecture (HPCA), Feb, 2019

Spatial-Winograd Pruning Enabling Sparse Winograd Convolution
Jiecao Yu, Jongsoo Park, Maxim Naumov
arXiv:1901.02132

Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism
Jiecao Yu, Andrew Lukefahr, David Palframan, Ganesh Dasika, Reetuparna Das, Scott Mahlke
The 44th International Symposium on Computer Architecture (ISCA), Jun, 2017
Selected in ISCA@50 25-Year Retrospective: 1996-2020 (98 significant papers selected out of 1,077 accepted papers from 1996 through 2020)

Adaptive Cache Partitioning on a Composite Core
Jiecao Yu, Andrew Lukefahr, Shruti Padmanabha, Reetuparna Das, Scott Mahlke
The PRISM-3 Workshop at ISCA, Jun, 2015


  Last updated: Jul, 2025