Wei Tan

Senior Research Engineer
Citadel LLC



quantitative research, machine learning, GPU, high-performance computing, distributed systems.


Before joining Citadel, he was a Research Staff Member in the Cognitive Computing division at the IBM T. J. Watson Research Center. Wei has a wide range of research interests in distributed computing, machine learning and GPU computing. Specifically, he worked on a GPU-accelerated platform for large-scale machine learning. He developed cuMF, by far the fastest matrix factorization library on GPUs.

Prior to that, he worked with Prof. Ian Foster on grid computing, at the University of Chicago and Argonne National Laboratory. He received his Ph.D. from Tsinghua University, China.

He held adjunct professor positions at Tsinghua University and Tianjin University. He received the IEEE Peter Chen Big Data Young Researcher Award in 2016 and the IBM Outstanding Technical Achievement Award multiple times. He also received best (student) paper awards from IEEE ICWS, IEEE SCC and IEEE/ACM CCGrid. His research and software have been incorporated into IBM products, open-source offerings and IBM's patent portfolio.


  1. "CuLDA: Solving Large-scale LDA Problems on GPUs", HPDC 2019. (arXiv, code)
  2. The most recent paper on cuMF, "Matrix Factorization on GPUs with Memory Optimization and Approximate Computing", ICPP, August 13-16, 2018, Eugene, Oregon, USA. (arXiv)
  3. Area Vice Chair, IEEE IPDPS, Vancouver, Canada. May 21-25, 2018.
  4. Invited Talk, "Matrix Factorization on GPUs: A Tale of Two Algorithms", ParLearning Workshop, IEEE IPDPS 2018.

Selected Publications

For the full publication list see Google Scholar.

  1. Business and Scientific Workflows: A Web Service-Oriented Approach.
    Wei Tan, MengChu Zhou.
    Wiley-IEEE Press, 2013. [Book]
  2. CuMF SGD: Fast and Scalable Matrix Factorization.
    Xiaolong Xie, Wei Tan, Liana L. Fong, Yun Liang.
    ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2017.
    [arXiv Preprint] [code]
  3. Faster and Cheaper: Parallelizing Large-Scale Matrix Factorization on GPUs.
    Wei Tan, Liangliang Cao, Liana L. Fong.
    ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2016.
    [arXiv Preprint] [code]
  4. SR-LDA: Mining Effective Representations for Generating Service Ecosystem Knowledge Maps.
    Bing Bai, Yushun Fan, Wei Tan, Jia Zhang.
    IEEE International Conference on Services Computing (SCC), 2017. [Paper]
    Best Paper Award
  5. Dilated Recurrent Neural Networks.
    Shiyu Chang, Yang Zhang, Wei Han, Mo Yu, Xiaoxiao Guo, Wei Tan, Xiaodong Cui, Michael J. Witbrock, Mark A. Hasegawa-Johnson, Thomas S. Huang.
    Neural Information Processing Systems (NIPS), 2017. [Paper]
  6. DeepSea: Progressive Workload-Aware Partitioning of Materialized Views in Scalable Data Analytics.
    Jiang Du, Renée J. Miller, Boris Glavic, Wei Tan.
    International Conference on Extending Database Technology (EDBT), 2017. [Paper]

Open-source Software

cuMF: CUDA Matrix Factorization library for recommender systems, link prediction, word embedding and other latent models. It is also integrated with Spark to accelerate ALS in MLlib (cuMF, IBMSparkGPU).


  1. 2017, 2016, 2014 Outstanding Technical Achievement Award, IBM
  2. 2017 Best Paper Award, IEEE SCC
  3. 2016 Peter Chen Big Data Young Researcher Award, IEEE
    Awarded to a researcher who "has made significant contributions to Big Data research as evidenced by top publications, citations and awards"
  4. 2016 Best Student Paper Award Runner-Up, IEEE ICWS
  5. 2015 Best Paper Award, IEEE/ACM CCGrid
  6. 2014 Best Student Paper Award, IEEE ICWS
  7. 2011 Best Paper Award, IEEE SCC
  8. 2010 Pacesetter Award, Argonne National Laboratory, USA
    "for excellence in achievement and performance which truly surpasses normal job expectations"
  9. 2008 caBIG Teamwork Award, National Cancer Institute, USA
    For those who "made significant contributions to the caBIG community"
  10. 2008 Outstanding Poster Award, Biomedical Informatics Without Borders Meeting, National Cancer Institute (NCI), USA and National Cancer Research Institute (NCRI), UK
  11. 2006 IBM Ph.D. Fellowship Award
    "honors exceptional Ph.D. students who have an interest in solving problems that are important to IBM and fundamental to innovation"


  1. US10319069, Matrix factorization with approximate computing.
  2. US10310908, Dynamic usage balance of central processing units and accelerators.
  3. US10203988, Adaptive parallelism of task execution on machines with accelerators.
  4. US10346505, System, method, and recording medium for differentiated and partial feature update in alternating least square.
  5. US10169275, System, method, and recording medium for topology-aware parallel reduction in an accelerator.
  6. US9626736, Memory-aware matrix factorization.
  7. US10423575, Computational storage for distributed computing.
  8. US9998531, Computer-based, balanced provisioning and optimization of data transfer resources for products and services.
  9. US9998550, Network based service composition with variable conditions.
  10. US9460147, Partition-based index management in Hadoop-like data stores.
  11. US9218383, Differentiated secondary index maintenance in log structured NoSQL data stores.
  12. US9996568, Index maintenance based on a comparison of rebuild vs. update.
  13. US9311252, Hierarchical storage for LSM-based NoSQL stores.
  14. US9736199, Dynamic and collaborative workflow authoring with cloud-supported live feedback.
  15. US8843894, Preferential execution of method calls in hybrid systems.

Last updated in September 2019.