Predicting LLM Inference Latency: A Roofline-Driven ML Method
- Saki Imai
- Rina Nakazawa
- et al.
- 2024
- NeurIPS 2024
Dr. Marcelo is a Research Scientist at IBM Research Tokyo, where he is part of the IBM Hybrid-Cloud department. His research focuses on cloud computing workloads, workload-optimized systems, resource management, parallel and distributed systems, and system performance evaluation. His work also includes novel methods for optimizing the performance of microservice workloads and improving the scalability of the hybrid-cloud control plane. Most recently, he has worked on AI inference optimization analysis, identifying AI system and infrastructure limitations and proposing optimizations for both GPUs and the IBM Artificial Intelligence Unit (AIU).
He received his Ph.D. in computer engineering from the Polytechnic University of Catalonia (Spain) and his M.Sc. and B.Sc. degrees in computer engineering from the University of Sao Paulo (Brazil).