
Jiecao Yu 俞杰草
Senior Staff Research Scientist
Meta Platforms, Inc.
Email: jiecaoyu [AT] meta [DOT] com
About Me
I am currently a senior staff research scientist at Meta Superintelligence Labs, Meta Platforms, Inc. My work focuses on LLM training infrastructure, especially kernel, numeric and parallelism optimizations.
I obtained my Ph.D. from the University of Michigan, Ann Arbor, under the supervision of Prof. Scott Mahlke. Before joining UMich in August 2014, I received my B.Eng. degree in Electrical Engineering from Zhejiang University, China.
Google Scholar / LinkedIn / GitHub / Contact Me
Experiences
- Meta Platforms, Inc.
Senior Staff Research Scientist, 08/2025 - Present
Staff Research Scientist, 02/2023 - 08/2025
Senior Research Scientist, 08/2021 - 02/2023
Research Scientist, 10/2019 - 08/2021
Menlo Park, CA
- Facebook, Inc.
Research Intern, 05/2018 - 08/2018, Menlo Park, CA
- ARM Inc.
Research Intern, 05/2017 - 07/2017, Austin, TX
Research Intern, 06/2016 - 08/2016, Austin, TX
Education
- Ph.D., Computer Science & Engineering, 08/2014 - 09/2019
Advisor: Prof. Scott Mahlke
University of Michigan, Ann Arbor, MI
- M.S., Computer Science & Engineering, 08/2014 - 12/2015
University of Michigan, Ann Arbor, MI
- B.Eng., Electronic & Information Engineering, 08/2010 - 06/2014
Honors Minor, Advanced Honor Class of Engineering Education (ACEE)
Zhejiang University, Hangzhou, China
Publications
LoKA: Low-precision Kernel Applications for Recommendation Models At Scale
Liang Luo, Yinbin Ma, Quanyu Zhu, Vasiliy Kuznetsov, Yuxin Chen, Jian Jiao, Jiecao Yu, Buyun Zhang, Tongyi Tang, Xiaohan Wei, Yanli Zhao, Zeliang Chen, Yuchen Hao, Venkatesh Ranganathan, Sandeep Parab, Yantao Yao, Maxim Naumov, Chunzhi Yang, Shen Li, Ellie Wen, Wenlin Chen, Santanu Kolay, Chunqiang Tang
The International Symposium on Computer Architecture (ISCA), Jun, 2026
Unveiling the Potential of Quantization with MXFP4: Strategies for Quantization Error Reduction
Jatin Chhugani, Geonhwa Jeong, Bor-Yiing Su, Yunjie Pan, Hanmei Yang, Aayush Ankit, Jiecao Yu, Summer Deng, Yunqing Chen, Nadathur Satish, Changkyu Kim
International Conference on Machine Learning (ICML), Jul, 2026
Fast and Simplex: 2-Simplicial Attention in Triton
Aurko Roy, Timothy Chou, Sai Surya Duvvuri, Sijia Chen, Jiecao Yu, Xiaodong Wang, Manzil Zaheer, Rohan Anil
arXiv:2507.02754
Scaling Llama 3 Training with Efficient Parallelism
Weiwei Chu*, Xinfeng Xie*, Jiecao Yu*, Jie Wang*, Jongsoo Park, Naman Goyal, Vedanuj Goswami, Abhishek Kadian, Yuchen Hao, Jianyu Huang, Andrew Gu, Min Si, Ching-Hsiang Chu, Pavan Balaji, Feng Tian, Xiaodong Wang, Chris Cai, Jun Wang, Mustafa Ozdal, Amar Phanishayee, CQ Tang (*: co-primary authors)
The International Symposium on Computer Architecture (ISCA), Jun, 2025
Accelerating Transformer Inference and Training with 2:4 Activation Sparsity
Daniel Haziza, Timothy Chou, Dhruv Choudhary, Jesse Cai, Luca Wehrstedt, Francisco Massa, Jiecao Yu, Geonhwa Jeong, Supriya Rao, Patrick Labatut
Workshop on Sparsity in LLMs (SLLM) at the International Conference on Learning Representations (ICLR), Apr, 2025
The Llama 3 Herd of Models
Jiecao Yu is a core contributor to the Llama 3 release and to the pruning/distillation framework for the Llama 3.2 1B and 3B models.
arXiv:2407.21783
BitSET: Bit-Serial Early Termination for Computation Reduction in Convolutional Neural Networks
Yunjie Pan, Jiecao Yu, Andrew Lukefahr, Reetuparna Das, Scott Mahlke
ACM Transactions on Embedded Computing Systems / the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), Sep, 2023
First-Generation Inference Accelerator Deployment at Facebook
Michael Anderson, et al., Jiecao Yu
arXiv:2107.04140
Compute-Capable Block RAMs for Efficient Deep Learning Acceleration on FPGAs
Xiaowei Wang, Vidushi Goyal, Jiecao Yu, Valeria Bertacco, Andrew Boutros, Eriko Nurvitadhi, Charles Augustine, Ravi Iyer, Reetuparna Das
International Symposium on Field-Programmable Custom Computing Machines (FCCM), May, 2021
Alternate Model Growth and Pruning for Efficient Training of Recommendation Systems
Xiaocong Du, Bhargav Bhushanam, Jiecao Yu, Dhruv Choudhary, Tianxiang Gao, Sherman Wong, Louis Feng, Jongsoo Park, Yu Cao, Arun Kejariwal
IEEE International Conference on Machine Learning and Applications (ICMLA), Dec, 2021
Adaptive Dense-to-Sparse Paradigm for Pruning Online Recommendation System with Non-Stationary Data
Mao Ye, Dhruv Choudhary, Jiecao Yu, Ellie Wen, Zeliang Chen, Jiyan Yang, Jongsoo Park, Qiang Liu, Arun Kejariwal
arXiv:2010.08655
TF-Net: Deploying Sub-Byte Deep Neural Networks on Microcontrollers
Jiecao Yu, Andrew Lukefahr, Reetuparna Das, Scott Mahlke
ESWEEK-TECS special issue / the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), Oct, 2019
Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks
Xiaowei Wang, Jiecao Yu, Charles Augustine, Ravi Iyer, Reetuparna Das
The 25th IEEE International Symposium on High-Performance Computer Architecture (HPCA), Feb, 2019
Spatial-Winograd Pruning Enabling Sparse Winograd Convolution
Jiecao Yu, Jongsoo Park, Maxim Naumov
arXiv:1901.02132
Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism
Jiecao Yu, Andrew Lukefahr, David Palframa, Ganesh Dasika, Reetuparna Das, Scott Mahlke
The 44th International Symposium on Computer Architecture (ISCA), Jun, 2017
Selected for the ISCA@50 25-Year Retrospective: 1996-2020 (98 significant papers selected out of 1,077 accepted papers from 1996 through 2020)
Last updated: May, 2026