Zhengxu Yu

AI Researcher, Huawei London Research Centre (ex-Alibaba)

Email: yuzxfred AT gmail.com

Email: yuzxfred@gmail.com | DBLP: https://dblp.org/pid/246/3155 | Website: https://zhengxuyu.github.io

Experience

AI Research Scientist, Huawei R&D (UK) Ltd., London, UK (Jan. 2026 - Present)

Research on self-organizing multi-agent systems.
Research on reinforcement learning for LLMs and continual learning methods.

Senior Research Engineer, Inephany Ltd., London, UK (Aug. 2025 - Jan. 2026)

Developed a meta-learning-based reinforcement learning (RL) system that automatically trains conventional models such as LLMs and ViTs.

Algorithm Expert, Apsara Lab. (former City Brain Lab., DAMO Institute), Alibaba Cloud, Alibaba Group (Apr. 2021 - Jul. 2025)

Researched post-training methods for reasoning LLMs, focusing primarily on reinforcement learning-based approaches. Proposed several methods to improve LLM reasoning performance and efficiency, accepted at NeurIPS 2025.
Developed an LLM agent system with dynamic reasoning and tool-use capabilities that autonomously solves challenging real-world agentic tasks (e.g., deep research) and operational research tasks; the system was applied in Olympic Games schedule optimisation projects.
Led the development of a city-level digital twin system integrating reinforcement learning-based layout optimisation and multimodal deep learning, improving urban CCTV deployment efficiency by more than 20%.
Supervised research interns and drove high-impact publications on reinforcement learning.

Research Intern, City Brain Lab., DAMO Institute, Alibaba Group (Jan. 2018 - Mar. 2021)

Conducted research on large-scale multi-agent reinforcement learning and published two first-author papers in CCF-A journals/conferences.
Granted 7 national invention patents in reinforcement learning and computer vision algorithms.

Education

Ph.D., Zhejiang University, Department of Computer Science (Sep. 2017 - Mar. 2021)
- Under the supervision of Prof. Deng Cai & Prof. Xiaofei He
- Research interests: large-scale reinforcement learning, deep representation learning.
M.Sc., University of Surrey, Department of Computer Science (Sep. 2015 - Dec. 2016)
- Under the supervision of Prof. H. Lilian Tang
- Thesis: CNN-based Mycobacterium Cells Segmentation for Time-lapse Images
Bachelor, Jilin University, Department of Communication Engineering (Sep. 2011 - Jun. 2015)

Awards and Honours

Outstanding Intern Award, Alibaba Group DAMO Institute (2018, 2019, 2021)
Outstanding Graduate Student Award, Zhejiang University (2019, 2020)

Academic Services

Reviewer for top AI journals, including IEEE TIP, IEEE TMM, and IEEE TCDS; PC member of top AI conferences, including NeurIPS, IJCAI, AAAI, ECCV, and ICLR.

Publications

Zhang, Y.*, Yu, Z.* (*Co-first author), Pan, W., Jin, Z., Fu, Q., Lin, B., Cai, D., Ye, J. “TokenSqueeze: Performance-Preserving Compression for Reasoning LLMs.” NeurIPS 2025 (main track).
Pan, W., Yu, Z., Wu, Y., Liang, X., Jin, Z., Fu, Q., Shang, P., Lin, B., He, X., Ye, J. “FGD-Align: Pluralistic Alignment for Large Language Models via Fuzzy Group Decision-Making.” AAAI 2026.
Pan, W., Lin, B., Wang, Y., Yu, Z., Zhao, X., He, X., Ye, J. “Cooperative Driving at Multiple Unsignalized Intersections in Fully Autonomous Driving Scenarios.” IEEE Transactions on Intelligent Transportation Systems (T-ITS), 2025.
Xiang, C., Jin, Z., Yu, Z., Hua, X.S., Hu, Y., Qian, W., …, He, X. “Optimizing Traffic Efficiency via a Reinforcement Learning Approach Based on Time Allocation.” International Journal of Machine Learning and Cybernetics, vol. 14, no. 10, pp. 3381-3391, 2023.
Peng, L., Liu, F., Yu, Z., Yan, S., Deng, D., Yang, Z., …, Cai, D. “Lidar Point Cloud Guided Monocular 3D Object Detection.” European Conference on Computer Vision (ECCV) 2022, pp. 123-139, Springer.
Yu, Z., Jin, Z., Wei, L., Huang, J., Cai, D., He, X., Hua, X.S. “Progressive Transfer Learning.” IEEE Transactions on Image Processing (TIP), vol. 31, pp. 1340-1348, 2022, doi: 10.1109/TIP.2022.3141258.
Wang, W., Yu, Z., Fu, C., Cai, D., He, X. “COP: Customized Correlation-based Filter Level Pruning Method for Deep CNN Compression.” Neurocomputing, vol. 464, pp. 533-545, 2021.
Guo, X.*, Yu, Z.* (*Co-first author), Wang, P., Jin, Z., Huang, J., Cai, D., He, X., Hua, X.S. “Urban Traffic Light Control via Active Multi-agent Communication and Supply-Demand Modeling.” IEEE Transactions on Knowledge and Data Engineering (TKDE), 2021, doi: 10.1109/TKDE.2021.3130258.
Yu, Z.*, Liang, S.* (*Co-first author), Wei, L., Jin, Z., Huang, J., Cai, D., He, X., Hua, X.S. “MaCAR: Urban Traffic Light Control via Active Multi-agent Communication and Action Rectification.” IJCAI 2020 (acceptance rate: 12.6%, 592/4717).
Yu, Z., Jin, Z., Wei, L., Guo, J., Huang, J., Cai, D., He, X., Hua, X.S. “Progressive Transfer Learning for Person Re-identification.” IJCAI 2019 (acceptance rate: 17.9%, 850/4752).
Yu, Z.*, Zhao, Y.* (*Co-first author), Hong, B., Jin, Z., Huang, J., Cai, D., Hua, X.S. “Apparel-invariant Feature Learning for Person Re-identification.” IEEE Transactions on Multimedia (TMM), doi: 10.1109/TMM.2021.3119133.
Xie, L., Xiang, C., Yu, Z., Xu, G., Yang, Z., Cai, D., He, X. “PI-RCNN: An efficient multi-sensor 3D object detector with point-based attentive cont-conv fusion module.” AAAI 2020 (acceptance rate: 16.2%, 1150/7095).
Wei, L., Wei, Z., Jin, Z., Yu, Z., Huang, J., Cai, D., He, X., Hua, X.S. “SIF: Self-Inspirited Feature Learning for Person Re-Identification.” IEEE Transactions on Image Processing (TIP), vol. 29, pp. 4942-4951, 2020.