
About me


I am Cheng Zhang, a PhD student in the Circuits and Systems Group, Department of Electrical and Electronic Engineering, Imperial College London, supervised by Dr Yiren (Aaron) Zhao and Prof George A. Constantinides.

My research is currently sponsored by the Scaling Compute Program under ARIA, and was previously sponsored by SpatialML. My research interests are mainly efficient machine learning and AI acceleration. My CV can be found here.

Education

  • PhD in Electrical and Electronic Engineering, Imperial College London, Jan 2023 - Current
  • MSc in Electronics, The University of Edinburgh, Sep 2021 - Aug 2022
  • BEng in Automation, Beihang University, Sep 2017 - Jun 2021

Publications

\(^\dag\) denotes equal contribution

  • [ISCA2026] Haoran Wu, Can Xiao, Jiayi Nie, Xuan Guo, Binglei Lou, Jeffrey T. H. Wong, Zhiwen Mo, Cheng Zhang, Przemyslaw Forys, Wayne Luk, Hongxiang Fan, Jianyi Cheng, Timothy M. Jones, Rika Antonova, Robert Mullins, Aaron Zhao. PLENA: Breaking the Memory Walls for Agentic LLM Inference. The 53rd Annual International Symposium on Computer Architecture.

  • [ACL2025] Xinxin Liu, Aaron Thomas, Cheng Zhang, Jianyi Cheng, Yiren Zhao, Xitong Gao. Refining Salience-Aware Sparse Fine-Tuning Strategies for Language Models. The 63rd Annual Meeting of the Association for Computational Linguistics.

  • [ICML2025] Cheng Zhang\(^\dag\), Hanna Foerster\(^\dag\), Robert D. Mullins, Yiren Zhao, Ilia Shumailov. Hardware and Software Platform Inference. The 42nd International Conference on Machine Learning.

  • [ICLR2025] Cheng Zhang, Jeffrey T. H. Wong, Can Xiao, George A. Constantinides, Yiren Zhao. QERA: an Analytical Framework for Quantization Error Reconstruction. The 13th International Conference on Learning Representations.

  • [IEEE SaTML2025] Eleanor Clifford, Adhithya Saravanan, Harry Langford, Cheng Zhang, Yiren Zhao, Robert Mullins, Ilia Shumailov, Jamie Hayes. Locking Machine Learning Models into Hardware. The 3rd IEEE Conference on Secure and Trustworthy Machine Learning.

  • [ICML2024] Cheng Zhang, Jianyi Cheng, George A. Constantinides, Yiren Zhao. LQER: Low-Rank Quantization Error Reconstruction for LLMs. The 41st International Conference on Machine Learning.

  • [FPL2024] Zhewen Yu, Sudarshan Sreeram, Krish Agrawal, Junyi Wu, Alexander Montgomerie-Corcoran, Cheng Zhang, Jianyi Cheng, Christos-Savvas Bouganis, Yiren Zhao. HASS: Hardware-Aware Sparsity Search for Dataflow DNN Accelerator.

  • [EMNLP2023] Cheng Zhang, Jianyi Cheng, Ilia Shumailov, George Anthony Constantinides, Yiren Zhao. Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference? Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing.

  • [ICLR2026 Workshop] Jeffrey T. H. Wong, Cheng Zhang, Louis Mahon, Wayne Luk, Anton Isopoussu, Yiren Zhao. On the Existence and Behavior of Secondary Attention Sinks. ICLR 2026 Unifying Concept Representation Learning Workshop.

  • [ICML2024 Workshop] Yuang Chen, Cheng Zhang, Xitong Gao, Robert D. Mullins, George A. Constantinides, Yiren Zhao. Optimised Grouped-Query Attention Mechanism for Transformers.

  • [ICML2024 Workshop] Zixi Zhang, Cheng Zhang, Xitong Gao, Robert D. Mullins, George A. Constantinides, Yiren Zhao. Unlocking the Global Synergies in Low-Rank Adapters.

  • [NeurIPS2023 Workshop] Cheng Zhang, Jianyi Cheng, Zhewen Yu, Yiren Zhao. MASE: An Efficient Representation for Software-Defined ML Hardware System Exploration. Workshop on ML for Systems at NeurIPS 2023.