
Jiecao Yu 俞杰草
Research Scientist
Meta Platforms, Inc.
Email: jiecaoyu [AT] meta [DOT] com
About Me
I am a Staff Research Scientist at Meta Platforms, Inc. My work focuses on optimizing the efficiency of LLM training and inference, especially low-precision (e.g., FP8) training and inference, efficiency-aware model architecture design, distillation, and pruning. I was the main contributor to the FP8 training framework for Llama 4.
I obtained my Ph.D. from the University of Michigan - Ann Arbor under the supervision of Prof. Scott Mahlke. Before joining UMich in August 2014, I received my B.Eng. degree in Electrical Engineering from Zhejiang University, China.
Google Scholar / LinkedIn / GitHub / Contact Me
Experience
- Meta Platforms, Inc.
Staff Research Scientist, 02/2023 - Present
Senior Research Scientist, 08/2021 - 02/2023
Research Scientist, 10/2019 - 08/2021
AI System SW/HW Co-Design Group
Menlo Park, CA
- Facebook, Inc.
Research Intern, 05/2018 - 08/2018, Menlo Park, CA
- ARM Inc.
Research Intern, 05/2017 - 07/2017, Austin, TX
Research Intern, 06/2016 - 08/2016, Austin, TX
Education
- Ph.D., Computer Science & Engineering, 08/2014 - 09/2019
Advisor: Prof. Scott Mahlke
University of Michigan, Ann Arbor, MI
- M.S. Computer Science & Engineering, 08/2014 - 12/2015
University of Michigan, Ann Arbor, MI
- B.Eng. Electronic & Information Engineering, 08/2010 - 06/2014
Honors Minor, Advanced Honor Class of Engineering Education (ACEE)
Zhejiang University, Hangzhou, China
Publications
Fast and Simplex: 2-Simplicial Attention in Triton
Aurko Roy, Timothy Chou, Sai Surya Duvvuri, Sijia Chen, Jiecao Yu, Xiaodong Wang, Manzil Zaheer, Rohan Anil
arXiv:2507.02754
Scaling Llama 3 Training with Efficient Parallelism
Weiwei Chu*, Xinfeng Xie*, Jiecao Yu*, Jie Wang*, Jongsoo Park, Naman Goyal, Vedanuj Goswami, Abhishek Kadian, Yuchen Hao, Jianyu Huang, Andrew Gu, Min Si, Ching-Hsiang Chu, Pavan Balaji, Feng Tian, Xiaodong Wang, Chris Cai, Jun Wang, Mustafa Ozdal, Amar Phanishayee, CQ Tang (*: co-primary authors)
The International Symposium on Computer Architecture (ISCA), Jun, 2025
Accelerating Transformer Inference and Training with 2:4 Activation Sparsity
Daniel Haziza, Timothy Chou, Dhruv Choudhary, Jesse Cai, Luca Wehrstedt, Francisco Massa, Jiecao Yu, Geonhwa Jeong, Supriya Rao, Patrick Labatut
Workshop on Sparsity in LLMs (SLLM) at the International Conference on Learning Representations (ICLR), Apr, 2025
The Llama 3 Herd of Models
Jiecao Yu, as one of the core contributors to the Llama 3 release. Also a core contributor to the pruning/distillation framework for the Llama 3.2 1B and 3B models.
arXiv:2407.21783
BitSET: Bit-Serial Early Termination for Computation Reduction in Convolutional Neural Networks
Yunjie Pan, Jiecao Yu, Andrew Lukefahr, Reetuparna Das, Scott Mahlke.
ACM Transactions on Embedded Computing Systems / the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), Sep, 2023
First-Generation Inference Accelerator Deployment at Facebook
Michael Anderson et al. (including Jiecao Yu)
arXiv:2107.04140
Compute-Capable Block RAMs for Efficient Deep Learning Acceleration on FPGAs
Xiaowei Wang, Vidushi Goyal, Jiecao Yu, Valeria Bertacco, Andrew Boutros, Eriko Nurvitadhi, Charles Augustine, Ravi Iyer and Reetuparna Das
International Symposium On Field-Programmable Custom Computing Machines (FCCM), May, 2021
Alternate Model Growth and Pruning for Efficient Training of Recommendation Systems
Xiaocong Du, Bhargav Bhushanam, Jiecao Yu, Dhruv Choudhary, Tianxiang Gao, Sherman Wong, Louis Feng, Jongsoo Park, Yu Cao, Arun Kejariwal
IEEE International Conference on Machine Learning and Applications (ICMLA), Dec, 2021
Adaptive Dense-to-Sparse Paradigm for Pruning Online Recommendation System with Non-Stationary Data
Mao Ye, Dhruv Choudhary, Jiecao Yu, Ellie Wen, Zeliang Chen, Jiyan Yang, Jongsoo Park, Qiang Liu, Arun Kejariwal
arXiv:2010.08655
TF-Net: Deploying Sub-Byte Deep Neural Networks on Microcontrollers
Jiecao Yu, Andrew Lukefahr, Reetuparna Das, Scott Mahlke
ESWEEK-TECS special issue / the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), Oct, 2019
Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks
Xiaowei Wang, Jiecao Yu, Charles Augustine, Ravi Iyer, Reetuparna Das
The 25th IEEE International Symposium on High-Performance Computer Architecture (HPCA), Feb, 2019
Spatial-Winograd Pruning Enabling Sparse Winograd Convolution
Jiecao Yu, Jongsoo Park, Maxim Naumov
arXiv:1901.02132
Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism
Jiecao Yu, Andrew Lukefahr, David Palframan, Ganesh Dasika, Reetuparna Das, Scott Mahlke
The 44th International Symposium on Computer Architecture (ISCA), Jun, 2017
Selected for the ISCA@50 25-Year Retrospective: 1996-2020 (98 significant papers selected out of 1,077 papers accepted from 1996 through 2020)
Adaptive Cache Partitioning on a Composite Core
Jiecao Yu, Andrew Lukefahr, Shruti Padmanabha, Reetuparna Das, Scott Mahlke
The PRISM-3 Workshop at ISCA, Jun, 2015
Last updated: Jul, 2025