Scientific Publications

DBI2 publications

Oikonomos-II: A Reinforcement-Learning, Resource-Recommendation System for Cloud HPC

Type Conference paper
Year 2025
Author River (J.L.F.) Betting, Chris I. De Zeeuw, Christos Strydis
DOI 10.1109/HiPC58850.2023.00044

The cloud has become a powerful and useful environment for deploying High-Performance Computing (HPC) applications, but the large number of available instance types makes selecting the optimal platform challenging. Users often lack the time or knowledge needed to make an optimal choice. Recommender systems have been developed for this purpose, but current state-of-the-art systems require either large amounts of training data or multiple runs of the application, both of which are costly. In this work, we propose Oikonomos-II, a resource-recommendation system based on reinforcement learning for HPC applications in the cloud. Oikonomos-II models the relationship between different input parameters, instance types, and execution times. The system does not require any preexisting training data or repeated job executions, as it gathers its own training data opportunistically from user-submitted jobs, employing a variant of the Neural-LinUCB algorithm. When deployed on a mix of HPC applications, Oikonomos-II quickly converged towards an optimal policy. The system eliminates the need for preexisting training data or auxiliary runs, providing an economical, general-purpose, resource-recommendation system for cloud HPC.
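To make the idea of learning from user-submitted jobs concrete, the following is a minimal sketch of a plain linear LinUCB contextual bandit. It is not the Oikonomos-II implementation and not the Neural-LinUCB variant the paper uses: the neural feature extractor is omitted, and the instance-type names, the two-element context vector, the reward (negative runtime), and the simulated runtimes are all assumptions made for illustration.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


class LinUCBArm:
    """Ridge-regression reward model for one (hypothetical) instance type."""

    def __init__(self, dim, alpha):
        self.alpha = alpha  # exploration strength (assumed tunable)
        # Inverse of A = I + sum(x x^T); starts as the identity matrix.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * x

    def _apply_inverse(self, v):
        return [dot(row, v) for row in self.A_inv]

    def ucb(self, x):
        """Predicted reward plus an upper-confidence exploration bonus."""
        theta = self._apply_inverse(self.b)
        a_inv_x = self._apply_inverse(x)
        bonus = self.alpha * math.sqrt(max(dot(x, a_inv_x), 0.0))
        return dot(theta, x) + bonus

    def update(self, x, reward):
        """Rank-one Sherman-Morrison update of A_inv, then accumulate b."""
        a_inv_x = self._apply_inverse(x)
        denom = 1.0 + dot(x, a_inv_x)
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.A_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


class Recommender:
    """Recommends the instance type with the highest upper confidence bound."""

    def __init__(self, instance_types, dim, alpha=1.0):
        self.arms = {name: LinUCBArm(dim, alpha) for name in instance_types}

    def recommend(self, x):
        return max(self.arms, key=lambda name: self.arms[name].ucb(x))

    def update(self, name, x, reward):
        self.arms[name].update(x, reward)


if __name__ == "__main__":
    random.seed(0)

    def simulated_runtime(name, size):
        # Made-up runtimes standing in for real job executions.
        base = {"small": 4.0, "large": 1.5}[name]
        return base * size + random.uniform(0.0, 0.1)

    rec = Recommender(["small", "large"], dim=2)
    for _ in range(200):
        size = random.uniform(0.5, 2.0)   # hypothetical input parameter
        x = [1.0, size]                   # context: bias term + input size
        choice = rec.recommend(x)
        # One job, one observation: reward is the negative runtime (assumed).
        rec.update(choice, x, -simulated_runtime(choice, size))
    print(rec.recommend([1.0, 1.0]))
```

In this sketch each submitted job produces exactly one model update for the instance type it ran on, which mirrors the abstract's point that no preexisting training data or auxiliary runs are required. The exploration bonus shrinks as an instance type accumulates observations, so recommendations shift from exploring towards the best-performing type.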

Keywords High-Performance Computing (HPC), Cloud Deployment, Resource-Recommendation System, Reinforcement Learning, Neural-LinUCB Algorithm


