Zhipeng Zhang (张智鹏)

I am a software engineer at Alibaba, working on pre-training and post-training infrastructure for LLMs. I am a core contributor to the pre-training infrastructure that powers Qwen2.5, Qwen3, and Qwen3.5. My recent work focuses on Agentic RL infrastructure for the Qwen3.5/3.6 series. Previously, I co-founded Flink ML, a distributed machine learning framework built on Apache Flink, and I serve as a Flink Committer.

I received my Ph.D. in Computer Science and Technology from Peking University in 2020, advised by Prof. Bin Cui. My doctoral research focused on big data systems and distributed machine learning systems. I earned my B.S. from Shandong University (Taishan College) in 2015, advised by Prof. Xiaohui Yu.

Work Experience

  • Staff Engineer, Alibaba, Beijing. (07/2020 - Present)
  • Research Intern, Tencent, Beijing. (11/2018 - 11/2019)
  • Visiting Researcher, ETH Zurich, Switzerland. (07/2017 - 01/2018)

Publications

  • VRouter: Micro-batch Level Load Balance via Inter-EP Routing for MoE Training

    Haiquan Wang, Zhipeng Zhang, Guanshujie Fu, Youhui Bai, Jiangfei Duan, Yuan Man, Langshi Chen, Hongqing Chen, Siyu Wang, Xiulong Yuan, Yunfei Mao, Si Chang, Linlang Jiang, Yingtao Li, Yan Wang, Yong Li, Wei Lin, Cheng Li

    Preprint, 2026

  • AdaHC: Accelerating Multi-Token Prediction with Adaptive Head Chunking with Pipeline Parallelism

    Yan Wang, Chang Si, Kaiming Yang, Zhipeng Zhang, Weijian Liu, Man Yuan, Mingzhen Li, Yong Li

    ICML, 2026

  • Continuum: An Interruption-Resilient Runtime for ML Training

    ChonLam Lao, Jiaqi Gao, Jiamin Cao, Zhipeng Zhang, Pengcheng Zhang, Jiangfei Duan, Minlan Yu, Aditya Akella, Zhilong Zheng, Yu Guan, Yichi Xu, Yong Li, Ennan Zhai, Dennis Cai, Zhengping Qian, Jingren Zhou

    OSDI, 2026

  • Qwen3 Technical Report

    Qwen Team (LLM infra contributor)

    arXiv, 2025

  • Qwen2.5 Technical Report

    Qwen Team (LLM infra contributor)

    arXiv, 2024

  • Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache

    Bin Lin, Chen Zhang, Tao Peng, Hanyu Zhao, Wencong Xiao, Minmin Sun, Anmin Liu, Zhipeng Zhang, Lanbo Li, Xiafei Qiu, Shen Li, Zhigang Ji, Tao Xie, Yong Li, Wei Lin

    arXiv, 2024

  • Model Averaging in Distributed Machine Learning: A Case Study with Apache Spark

    Yunyan Guo, Zhipeng Zhang, Wentao Wu, Jiawei Jiang, Ce Zhang, Bin Cui, Jianzhong Li

    VLDBJ, 2021

  • Distributed Optimization and Implementation of Graph Embedding Algorithms

    Wentao Zhang, Bin Yuan, Zhipeng Zhang, Bin Cui

    JOS, 2021

  • ColumnSGD: A Column-oriented Framework for Distributed Stochastic Gradient Descent

    Zhipeng Zhang, Wentao Wu, Jiawei Jiang, Lele Yu, Bin Cui, Ce Zhang

    ICDE, 2020

  • Category-aware Graph Neural Networks for Improving E-commerce Review Helpfulness Prediction

    Xiaoru Qu, Zhao Li, Jialin Wang, Zhipeng Zhang, ..., Jun Gao

    CIKM, 2020

  • PSGraph: How Tencent trains large-scale graphs with Spark?

    Jiawei Jiang, Pin Xiao, Lele Yu, Xiaosen Li, Jiefeng Cheng, Xupeng Miao, Zhipeng Zhang, Bin Cui

    ICDE, 2020

  • A Reinforcement Learning-based Method for Join Optimization

    Xinyi Zhang, Zhipeng Zhang, Bin Cui

    NDBC, 2020 (Best Student Paper)

  • PS2: Parameter Server on Spark

    Zhipeng Zhang, Bin Cui, Yingxia Shao, Lele Yu, Jiawei Jiang, Xupeng Miao

    SIGMOD, 2019

  • MLlib*: Fast Training of GLMs using Spark MLlib

    Zhipeng Zhang, Jiawei Jiang, Wentao Wu, Ce Zhang, Lele Yu, Bin Cui

    ICDE, 2019

  • Angel+: A Large-Scale Machine Learning Platform on Angel

    Zhipeng Zhang, Jiawei Jiang, Lele Yu, Bin Cui

    Frontiers of Data and Computing, 2019

  • An Experimental Evaluation of SimRank-based Similarity Search Algorithms

    Zhipeng Zhang, Yingxia Shao, Bin Cui, Ce Zhang

    VLDB, 2017

  • StroMAX: Partitioning-based Scheduler for Real-time Stream Processing System

    Jiawei Jiang, Zhipeng Zhang, Bin Cui, Yunhai Tong, Ning Xu

    DASFAA, 2017

  • Resume Activeness Prediction in Online Recruitment Scenarios

    Shuyang Shi, Zhipeng Zhang, Bin Cui

    NDBC, 2017

Awards

  • SIGMOD Systems Award 2023, for Apache Flink: Aljoscha Krettek, Andrey Zagrebin, ..., Zhipeng Zhang, ..., and Zili Chen
  • NDBC 2020 Best Student Paper
  • Second-Class Scholarship of Peking University, 2019
  • Miaozhen Scholarship of Peking University, 2019
  • Academic Innovation Award of Peking University, 2018 & 2017
  • President Scholarship of Peking University, 2017 & 2016
  • Sohu Scholarship of Peking University, 2016

Open Source