Euro-Par 2024 — International European Conference on Parallel and Distributed Computing

Accepted Papers
  • Yunkun Liao, Jingya Wu, Wenyan Lu, Xiaowei Li and Guihai Yan. Efficient RNIC Cache Side-channel Attack Detection through DPU-driven Architecture
  • Dian-Lun Lin, Tsung-Wei Huang, Joshua San Miguel and Umit Ogras. TaroRTL: Accelerating RTL Simulation using Coroutine-based Heterogeneous Task Graph Scheduling
  • Yunkun Liao, Hanyue Lin, Jingya Wu, Wenyan Lu, Huawei Li, Xiaowei Li and Guihai Yan. Athena: Add More Intelligence to RMT-based Network Data Plane with Low-bit Quantization
  • Alexis Bandet, Francieli Boito and Guillaume Pallez. Scheduling distributed I/O resources in HPC systems
  • Danilo Carastan-Santos, Georges Da Costa, Millian Poquet, Patricia Stolf and Denis Trystram. Light-weight prediction for improving energy consumption in HPC platforms
  • Jonas Hahnfeld, Jakob Blomer and Thorsten Kollegger. Parallel Writing of Nested Data in Columnar Formats
  • Zhi Lu, Songfeng Lu, Yongquan Cui, Junjun Wu, Hewang Nie, Jue Xiao and Zepu Yi. Lightweight Byzantine-Robust and Privacy-Preserving Federated Learning
  • Guangyao Zhou, Haocheng Lan, Yuanlun Xie, Wenhong Tian, Jiahong Qian and Teng Su. CSIMD: Cross-Search Algorithm with Improved Multi-Dimensional Dichotomy for Micro-batch-based Pipeline Parallel Training in DNN
  • Thiago Maltempi, Sandro Rigo, Marcio Pereira, Hervé Yviquel, Jessé Costa and Guido Araujo. Combining Compression and Prefetching to Improve Checkpointing for Inverse Seismic Problems in GPUs
  • Eunji Lee, Yoonsang Han and Gordon Moon. Accelerated Block-Sparsity-Aware Matrix Reordering for Leveraging Tensor Cores in Sparse Matrix-Multivector Multiplication
  • Xingbin Wang, Yan Wang, Rui Hou and Dan Meng. FakeGuard: A Novel Accelerator Architecture for Deepfake Detection Networks
  • Stef Graillat, Fabienne Jézéquel, Théo Mary, Roméo Molina and Daichi Mukunoki. Reduced-Precision and Reduced-Exponent Formats for Adaptive-Precision Sparse Matrix-Vector Product
  • Yiming Yao, Yingwei Luo, Xiaolin Wang, Zhenlin Wang, Liujia Li, Jianyu Wu and Liren Zhu. EKRM: Efficient Key-Value Retrieval Method to Reduce Data Lookup Overhead for Redis
  • Stepan Nassyr and Dirk Pleiter. Exploring processor micro-architectures optimised for BLAS3 micro-kernels
  • Tingkai Liu, Huili Tao, Yicheng Lu, Zhongbo Zhu, Marquita Ellis, Sara Kokkila-Schumacher and Volodymyr Kindratenko. Automated Data Management and Learning-based Scheduling for Ray-based Hybrid HPC-Cloud Systems
  • Guofeng Feng, Hongyu Wang, Zhuoqiang Guo, Mingzhen Li, Tong Zhao, Zhou Jin, Weile Jia, Guangming Tan and Ninghui Sun. Accelerating Large-scale Sparse LU Factorization for RF Circuit Simulation
  • M.A. Anju and Rupesh Nasre. FlexiGran: Flexible Granularity Locking in Hierarchies
  • Matthieu Robeyns, Marc Baboulin, Simplice Donfack, Oguz Kaya and Theo Mary. Mixed precision randomized low-rank approximation with GPU tensor cores
  • Dario Muñoz-Muñoz, Félix García-Carballeira, Diego Camarmas-Alonso, Alejandro Calderón-Mateos and Jesús Carretero. Fault tolerant in the Expand Ad-Hoc parallel file system
  • Hongbing Tan, Xiaowei He, Libo Huang, Guichu Sun, Yuanhu Cheng, Jing Zhang, Zhong Zheng, Quan Deng, Bingcai Sui, Yongwen Wang and Liquan Xiao. ImSPU: Implicit Sharing of Computation Resources between Vector and Scalar Processing Units
  • Andrzej Lingas. Boolean Matrix Multiplication for Highly Clustered Data on the Congested Clique
  • Dengke Han, Meng Wu, Runzhen Xue, Mingyu Yan, Xiaochun Ye and Dongrui Fan. ADE-HGNN: Accelerating HGNNs through Attention Disparity Exploitation
  • Jiajun Song, Jiajun Luo, Rongwei Lu, Shuzhao Xie, Bin Chen and Zhi Wang. A Joint Approach to Local Updating and Gradient Compression for Efficient Asynchronous Federated Learning
  • Xianlong Zhou, Pei Li, Jiageng Chen and Shixiong Yao. Accelerating Stencil Computation with Fully Homomorphic Encryption Using GPU
  • Cristian Tatu, Javier Conejero, Fernando Vazquez and Rosa M. Badia. GPU Cache System for COMPSs: A Task-Based Distributed Computing Framework
  • Suren Harutyunyan Gevorgyan, Anna Sikora, Eduardo Cesar, Jiří Filipovič, Akash Dutta, Ali Janessari and Jordi Alcaraz. Efficient Code Region Characterization through Automatic Performance Counters Reduction using Machine Learning Techniques
  • Richard Angersbach, Sebastian Kuckuk and Harald Köstler. Code Generation for Octree-Based Multigrid Solvers with Fused Higher-Order Interpolation and Communication
  • Louis-Claude Canon, Anthony Dugois and Loris Marchal. Solving the Restricted Assignment Problem to Schedule Multi-Get Requests in Key-Value Stores
  • Zhuoyao Huang, Nan Zhang, Jingran Shen, Georgios Diamantopoulos, Zhengchang Hua, Nikos Tziritas and Georgios Theodoropoulos. Distributed Simulation for Digital Twins of Large-Scale Real-World DiffServ-Based Networks
  • Yuxiang Zhang, Xin Liu, Meng Wu, Mingyu Yan, Wei Yan, Xiaochun Ye and Dongrui Fan. Disttack: Graph Adversarial Attacks Toward Distributed GNN Training
  • Yuandou Wang, Neel Kanwal, Kjersti Engan, Chunming Rong, Paola Grosso and Zhiming Zhao. PriCE: Privacy-Preserving and Cost-Effective Scheduling for Parallelizing the Large Medical Image Processing Workflow over Hybrid Clouds
  • Mohammad Zubair and Christoph Bauinger. ESIMD GPU implementations of Deep Learning Sparse Matrix Kernels
  • L. Felipe Romero, Marcos Lupión Lorente, N. C. Cruz, Luis F. Romero and Pilar M. Ortigosa. On the use of hybrid computing for accelerating EEG preprocessing
  • Xuan Zhang, Zhuoran Song, Fangxin Liu, Zhezhi He, Li Jiang and Xiaoyao Liang. Watt: A Write-optimized RRAM-based Accelerator for Attention
  • Marius Meyer, Tobias Kenter, Kenneth O'Brien, Lucian Petrica, Michaela Blott and Christian Plessl. Optimizing Communication for Latency Sensitive HPC Applications on up to 48 FPGAs Using ACCL
  • Roy Nissim, Oded Schwartz and Yuval Spiizer. Communication Minimizing Toom-Cook Algorithms
  • Filippo Ziche, Federico Busato, Rosalba Giugno and Nicola Bombieri. GPU-Accelerated BFS for Dynamic Networks
  • Dazheng Liu, Xiaoli Ren, Jianping Wu, Wenjuan Liu, Juan Zhao and Shaoliang Peng. Pipe-AGCM: A Fine-grain Pipelining Scheme for Optimizing the Parallel Atmospheric General Circulation Model
  • Helena Schubert da Incarnacao Lima da Silva, Maria Clicia Stelling de Castro, Fabricio Alves Barbosa da Silva and Alba Cristina Magalhaes Alves de Melo. A Framework for Automated Parallel Execution of Scientific Multi-Workflow Applications in the Cloud with Work Stealing
  • Olivier Beaumont, Rémi Bouzel, Lionel Eyraud-Dubois, Esragul Korkmaz, Laercio Pilla and Alexandre Van Kempen. 1.25(1+ε)-Approximation Algorithm for Scheduling with Rejection Costs Proportional to Processing Times
  • Krishna Teja Chitty-Venkata, Sanjif Shanmugavelu, Varuni Katti Sastry, Murali Emani, Venkatram Vishwanath and Sylvia Howland. WActiGrad: Structured Pruning for Efficient Finetuning and Inference of Large Language Models on AI Accelerators
  • Subhajit Sahu, Kishore Kothapalli, Hemalatha Eedi and Sathya Peri. DF* PageRank: Incrementally Expanding Approaches for Updating PageRank on Dynamic Graphs
  • Alexander Lyashevsky, Peter Caday, Greg Henry and Eric Petit. Deconstructing HPL-MxP benchmark: a numerical perspective
  • Zechun Zhou, Jingwei Sun, Hengquan Mei and Guangzhong Sun. DProbe: Profiling and Predicting Multi-Tenant Deep Learning Workloads for GPU Resource Scaling
  • Hewang Nie, Songfeng Lu, Mu Wang, Jue Xiao, Zhi Lu and Zepu Yi. VeriChroma: Ownership Verification for Federated Models via RGB Filters
  • Andoni Salcedo Navarro, Juan Gutiérrez Aguado, Miguel Garcia Pineda, Raúl Peña Ortiz and José M. Claver. Cloud-native GPU-enabled architecture for parallel video encoding
  • Haoran Dang, Meng Wu, Mingyu Yan, Xiaochun Ye and Dongrui Fan. GDL-GNN: Applying GPU Dataloading of Large Datasets for Graph Neural Network Inference
  • Bizhao Shi, Tuo Dai, Sunan Zou, Xinming Wei and Guojie Luo. ImageMap: Enabling Efficient Mapping from Image Processing DSL to CGRA
  • Thorsten Wittkopp, Philipp Wiesner and Odej Kao. LogRCA: Log-based Root Cause Analysis for Distributed Services
  • Filip Mikina, Paweł Żuk and Krzysztof Rzadca. sAirflow: Adopting Serverless in a Legacy Workflow Scheduler
  • Jie Jia, Yi Liu, Yifan Chen, Yanke Liu and Fang Lin. AdapCK: Optimizing I/O for Checkpointing on Large-scale High Performance Computing Systems
  • Yuang Chen and Jeffery Xu Yu. Vectorizing Sparse Blocks of Graph Matrices for SpMV
  • Handong Luo, Wenhao Liu, Qi Zhang, Ziheng Yang, Quanwei Lin, Wenjun Zhu, Kun Qiu, Zhe Chen and Yue Gao. Hurry: Dynamic Collaborative Framework For Low-orbit Mega-Constellation Data Downloading
  • Chuhui Wang, Zewen Ye, Haibin Shen and Kejie Huang. A Folded Computation-in-Memory Accelerator for Fast Polynomial Multiplication in BIKE
  • Lucas Van Lanker, Hugo Taboada, Elisabeth Brunet and François Trahay. Predicting GPU kernel's performance on upcoming architectures
  • Farah Ait Salaht, Nora Izri and Maher Rebai. Optimizing Service Replication and Placement for IoT Applications in Fog Computing Systems
  • Zhengda Wu, Yixiao Feng, Mingtai Lv, Sining Yang and Bo Zhang. Deadline-driven Enhancements and Response Time Analysis of ROS2 Multi-threaded Executors
  • Milo Lurati, Stijn Heldens, Alessio Sclocco and Ben van Werkhoven. Bringing auto-tuning to HIP: Analysis of tuning impact and difficulty on AMD and Nvidia GPUs
  • Qasim Abbas, Mohsen Koohi Esfahani, Ian Overton and Hans Vandierendonck. QClique: Optimizing Performance and Accuracy in Maximum Weighted Clique
  • Jiazhi Jiang, Hongbin Zhang, Deyin Liu, Jiangsu Du, Xiaojiao Yao, Jinhui Wei, Pin Chen, Dan Huang and Yutong Lu. Efficient Coupling Streaming AI and Ensemble Simulations on HPC Clusters
  • Héctor Martínez, Francisco D. Igual, Rafael Rodríguez-Sánchez, Sandra Catalan, Adrián Castelló and Enrique S. Quintana-Orti. Inference with Transformer Encoders on ARM and RISC-V Multicore Processors
  • Ivo Gabe de Wolff, Daniel Anderson, Gabriele K. Keller and Aleksei Seletskiy. A Fast Wait-Free Solution to Read-Reclaim Races in Reference Counting
  • Hamidreza Ramezanikebrya and Matei Ripeanu. (re)Assessing PiM Effectiveness for Sequence Alignment
  • Leandro Fiorin and Cristina Silvano. MEPAD: A Memory-efficient Parallelized Direct Convolution Algorithm for Deep Neural Networks
  • Keegan Sanchez, Alex Gavin, Suren Byna, Kesheng Wu and Xuechen Zhang. A High-Performance Collective I/O Framework Leveraging Node-Local Persistent Memory
  • Xiaokang Fan, Zhen Ge, Sifan Long, Tao Tang, Chun Huang, Lin Peng and Canqun Yang. VLASPH: Smoothed Particle Hydrodynamics on VLA SIMD Architectures
  • Mohammad Hafezan, Reza Jahadi and Ehsan Atoofian. PCTC: Hardware and Software Co-Design for Pruned Capsule Networks on Tensor Cores
  • Yibing Lin, Binbin Feng and Zhijun Ding. Context-aware Runtime Type Prediction for Heterogeneous Microservices
  • Jiguang Lv, Shuchun Xu, Xiaodong Zhan, Tao Liu, Dapeng Man and Wu Yang. FedGG: Leveraging Generative Adversarial Networks and Gradient Smoothing for Privacy Protection in Federated Learning
  • Yuhang Li, Tong Liu, Wenfeng Shen, Yangguang Cui and Weijia Lu. Improving Generalization and Personalization in Long-Tailed Federated Learning via Classifier Retraining
  • Weigang Zhang, Biyu Zhou, Xing Wu, Chaochen Gao, Zhibing Liu, Xuehai Tang, Ruixuan Li, Jizhong Han and Songlin Hu. Quartet: A Holistic Hybrid Parallel Framework for Training Large Language Models
  • Vincent Fagnon, Giorgio Lucarelli and Christophe Rapine. Makespan Minimization for Scheduling on Heterogeneous Platforms with Precedence Constraints
  • Qian Yang, Xuyan Jiang, Wei Quan, Rulin Liu and Zhigang Sun. Node Bundle Scheduling: An Ultra-Low Latency Traffic Scheduling Algorithm for TAS-based Time-Sensitive Networks
  • Pedro Rigon, Brenda Schussler, Alexandre Sardinha, Pedro Mario Silva, Fábio Alves de Oliveira, Alexandre Carissimi, Jairo Panetta, Arthur Lorenzon and Philippe Navaux. Harnessing Data Movement Strategies to Optimize Performance-Energy Efficiency of Oil & Gas Simulations in HPC
  • Tiago Carneiro, Engin Kayraklioglu, Guillaume Helbecque and Nouredine Melab. Investigating Portability in Chapel for Tree-based Optimization on GPU-powered Clusters
  • Steef Hegeman, Daan Wöltgens, Anton Wijs and Alfons Laarman. Compact Parallel Hash Tables on the GPU
  • Kåre von Geijer and Philippas Tsigas. How to Relax Instantly: Elastic Relaxation of Concurrent Data Structures
  • Bengisu Elis, David Boehme, Olga Pearce and Martin Schulz. A Mechanism to Generate Interception Based Tools for HPC Libraries
  • Júnior Löff, Dalvan Griebler, Luiz Gustavo Fernandes and Walter Binder. MPR: An MPI Framework for Distributed Self-Adaptive Stream Processing
  • Gabriel Gomez Lopez, Miguel Sánchez de la Rosa, Jesus Escudero-Sahuquillo, Pedro Javier Garcia, Francisco J. Quiles and Pierre-Axel Lagadec. Hybrid Congestion Control for BXI-based Interconnection Networks
  • Haibo Tang, Huan Zhang, Zhenyu Zhang, Zhao Zhang, Cheqing Jin and Aoying Zhou. Towards High-Performance Transactions via Hierarchical Blockchain Sharding
  • Sharon Boddu and Maleq Khan. ALZI: An Improved Parallel Algorithm for Finding Connected Components in Large Graphs
  • Kohei Hiraga and Osamu Tatebe. PEANUTS: A Persistent Memory-Based Network Unilateral Transfer System for Enhanced MPI-IO Data Transfer
  • Mengde Zhu, Wanyi Ning, Qi Qi, Jingyu Wang, Zirui Zhuang, Haifeng Sun, Jun Huang and Jianxin Liao. FLUK: Protecting Federated Learning against Malicious Clients for Internet of Vehicles
  • Pranjal Naman and Yogesh Simmhan. Optimizing Federated Learning Over Graph Neural Networks
  • Le Chen, Arijit Bhattacharjee, Nesreen Ahmed, Niranjan Hasabnis, Gal Oren, Vy Vo and Ali Jannesari. OMPGPT: A Generative Pre-trained Transformer Model for OpenMP
  • Sixing Yu, Pablo Munoz and Ali Jannesari. Resource-Aware Heterogeneous Federated Learning with Specialized Local Models
  • Lin Wang, Yuchong Hu, Yuxue Liu, Renzhi Xiao and Dan Feng. Asymmetric Coded Distributed Computation for Resilient Prediction Serving Systems