Yunkun Liao, Jingya Wu, Wenyan Lu, Xiaowei Li and Guihai Yan. Efficient RNIC Cache Side-channel Attack Detection through DPU-driven Architecture
Dian-Lun Lin, Tsung-Wei Huang, Joshua San Miguel and Umit Ogras. TaroRTL: Accelerating RTL Simulation using Coroutine-based Heterogeneous Task Graph Scheduling
Yunkun Liao, Hanyue Lin, Jingya Wu, Wenyan Lu, Huawei Li, Xiaowei Li and Guihai Yan. Athena: Add More Intelligence to RMT-based Network Data Plane with Low-bit Quantization
Alexis Bandet, Francieli Boito and Guillaume Pallez. Scheduling distributed I/O resources in HPC systems
Danilo Carastan-Santos, Georges Da Costa, Millian Poquet, Patricia Stolf and Denis Trystram. Light-weight prediction for improving energy consumption in HPC platforms
Jonas Hahnfeld, Jakob Blomer and Thorsten Kollegger. Parallel Writing of Nested Data in Columnar Formats
Zhi Lu, Songfeng Lu, Yongquan Cui, Junjun Wu, Hewang Nie, Jue Xiao and Zepu Yi. Lightweight Byzantine-Robust and Privacy-Preserving Federated Learning
Guangyao Zhou, Haocheng Lan, Yuanlun Xie, Wenhong Tian, Jiahong Qian and Teng Su. CSIMD: Cross-Search Algorithm with Improved Multi-Dimensional Dichotomy for Micro-batch-based Pipeline Parallel Training in DNN
Thiago Maltempi, Sandro Rigo, Marcio Pereira, Hervé Yviquel, Jessé Costa and Guido Araujo. Combining Compression and Prefetching to Improve Checkpointing for Inverse Seismic Problems in GPUs
Eunji Lee, Yoonsang Han and Gordon Moon. Accelerated Block-Sparsity-Aware Matrix Reordering for Leveraging Tensor Cores in Sparse Matrix-Multivector Multiplication
Xingbin Wang, Yan Wang, Rui Hou and Dan Meng. FakeGuard: A Novel Accelerator Architecture for Deepfake Detection Networks
Stef Graillat, Fabienne Jézéquel, Théo Mary, Roméo Molina and Daichi Mukunoki. Reduced-Precision and Reduced-Exponent Formats for Adaptive-Precision Sparse Matrix-Vector Product
Yiming Yao, Yingwei Luo, Xiaolin Wang, Zhenlin Wang, Liujia Li, Jianyu Wu and Liren Zhu. EKRM: Efficient Key-Value Retrieval Method to Reduce Data Lookup Overhead for Redis
Stepan Nassyr and Dirk Pleiter. Exploring processor micro-architectures optimised for BLAS3 micro-kernels
Tingkai Liu, Huili Tao, Yicheng Lu, Zhongbo Zhu, Marquita Ellis, Sara Kokkila-Schumacher and Volodymyr Kindratenko. Automated Data Management and Learning-based Scheduling for Ray-based Hybrid HPC-Cloud Systems
Guofeng Feng, Hongyu Wang, Zhuoqiang Guo, Mingzhen Li, Tong Zhao, Zhou Jin, Weile Jia, Guangming Tan and Ninghui Sun. Accelerating Large-scale Sparse LU Factorization for RF Circuit Simulation
M.A. Anju and Rupesh Nasre. FlexiGran: Flexible Granularity Locking in Hierarchies
Matthieu Robeyns, Marc Baboulin, Simplice Donfack, Oguz Kaya and Theo Mary. Mixed precision randomized low-rank approximation with GPU tensor cores
Dario Muñoz-Muñoz, Félix García-Carballeira, Diego Camarmas-Alonso, Alejandro Calderón-Mateos and Jesús Carretero. Fault tolerant in the Expand Ad-Hoc parallel file system
Hongbing Tan, Xiaowei He, Libo Huang, Guichu Sun, Yuanhu Cheng, Jing Zhang, Zhong Zheng, Quan Deng, Bingcai Sui, Yongwen Wang and Liquan Xiao. ImSPU: Implicit Sharing of Computation Resources between Vector and Scalar Processing Units
Andrzej Lingas. Boolean Matrix Multiplication for Highly Clustered Data on the Congested Clique
Dengke Han, Meng Wu, Runzhen Xue, Mingyu Yan, Xiaochun Ye and Dongrui Fan. ADE-HGNN: Accelerating HGNNs through Attention Disparity Exploitation
Jiajun Song, Jiajun Luo, Rongwei Lu, Shuzhao Xie, Bin Chen and Zhi Wang. A Joint Approach to Local Updating and Gradient Compression for Efficient Asynchronous Federated Learning
Xianlong Zhou, Pei Li, Jiageng Chen and Shixiong Yao. Accelerating Stencil Computation with Fully Homomorphic Encryption Using GPU
Cristian Tatu, Javier Conejero, Fernando Vazquez and Rosa M. Badia. GPU Cache System for COMPSs: A Task-Based Distributed Computing Framework
Suren Harutyunyan Gevorgyan, Anna Sikora, Eduardo Cesar, Jiří Filipovič, Akash Dutta, Ali Jannesari and Jordi Alcaraz. Efficient Code Region Characterization through Automatic Performance Counters Reduction using Machine Learning Techniques
Richard Angersbach, Sebastian Kuckuk and Harald Köstler. Code Generation for Octree-Based Multigrid Solvers with Fused Higher-Order Interpolation and Communication
Louis-Claude Canon, Anthony Dugois and Loris Marchal. Solving the Restricted Assignment Problem to Schedule Multi-Get Requests in Key-Value Stores
Zhuoyao Huang, Nan Zhang, Jingran Shen, Georgios Diamantopoulos, Zhengchang Hua, Nikos Tziritas and Georgios Theodoropoulos. Distributed Simulation for Digital Twins of Large-Scale Real-World DiffServ-Based Networks
Yuxiang Zhang, Xin Liu, Meng Wu, Mingyu Yan, Wei Yan, Xiaochun Ye and Dongrui Fan. Disttack: Graph Adversarial Attacks Toward Distributed GNN Training
Yuandou Wang, Neel Kanwal, Kjersti Engan, Chunming Rong, Paola Grosso and Zhiming Zhao. PriCE: Privacy-Preserving and Cost-Effective Scheduling for Parallelizing the Large Medical Image Processing Workflow over Hybrid Clouds
Mohammad Zubair and Christoph Bauinger. ESIMD GPU implementations of Deep Learning Sparse Matrix Kernels
L. Felipe Romero, Marcos Lupión Lorente, N. C. Cruz, Luis F. Romero and Pilar M. Ortigosa. On the use of hybrid computing for accelerating EEG preprocessing
Xuan Zhang, Zhuoran Song, Fangxin Liu, Zhezhi He, Li Jiang and Xiaoyao Liang. Watt: A Write-optimized RRAM-based Accelerator for Attention
Marius Meyer, Tobias Kenter, Kenneth O'Brien, Lucian Petrica, Michaela Blott and Christian Plessl. Optimizing Communication for Latency Sensitive HPC Applications on up to 48 FPGAs Using ACCL
Roy Nissim, Oded Schwartz and Yuval Spiizer. Communication Minimizing Toom-Cook Algorithms
Filippo Ziche, Federico Busato, Rosalba Giugno and Nicola Bombieri. GPU-Accelerated BFS for Dynamic Networks
Dazheng Liu, Xiaoli Ren, Jianping Wu, Wenjuan Liu, Juan Zhao and Shaoliang Peng. Pipe-AGCM: A Fine-grain Pipelining Scheme for Optimizing the Parallel Atmospheric General Circulation Model
Helena Schubert da Incarnacao Lima da Silva, Maria Clicia Stelling de Castro, Fabricio Alves Barbosa da Silva and Alba Cristina Magalhaes Alves de Melo. A Framework for Automated Parallel Execution of Scientific Multi-Workflow Applications in the Cloud with Work Stealing
Olivier Beaumont, Rémi Bouzel, Lionel Eyraud-Dubois, Esragul Korkmaz, Laercio Pilla and Alexandre Van Kempen. 1.25(1+ε)-Approximation Algorithm for Scheduling with Rejection Costs Proportional to Processing Times
Krishna Teja Chitty-Venkata, Sanjif Shanmugavelu, Varuni Katti Sastry, Murali Emani, Venkatram Vishwanath and Sylvia Howland. WActiGrad: Structured Pruning for Efficient Finetuning and Inference of Large Language Models on AI Accelerators
Subhajit Sahu, Kishore Kothapalli, Hemalatha Eedi and Sathya Peri. DF* PageRank: Incrementally Expanding Approaches for Updating PageRank on Dynamic Graphs
Alexander Lyashevsky, Peter Caday, Greg Henry and Eric Petit. Deconstructing HPL-MxP benchmark: a numerical perspective
Zechun Zhou, Jingwei Sun, Hengquan Mei and Guangzhong Sun. DProbe: Profiling and Predicting Multi-Tenant Deep Learning Workloads for GPU Resource Scaling
Hewang Nie, Songfeng Lu, Mu Wang, Jue Xiao, Zhi Lu and Zepu Yi. VeriChroma: Ownership Verification for Federated Models via RGB Filters
Andoni Salcedo Navarro, Juan Gutiérrez Aguado, Miguel Garcia Pineda, Raúl Peña Ortiz and José M. Claver. Cloud-native GPU-enabled architecture for parallel video encoding
Haoran Dang, Meng Wu, Mingyu Yan, Xiaochun Ye and Dongrui Fan. GDL-GNN: Applying GPU Dataloading of Large Datasets for Graph Neural Network Inference
Bizhao Shi, Tuo Dai, Sunan Zou, Xinming Wei and Guojie Luo. ImageMap: Enabling Efficient Mapping from Image Processing DSL to CGRA
Thorsten Wittkopp, Philipp Wiesner and Odej Kao. LogRCA: Log-based Root Cause Analysis for Distributed Services
Filip Mikina, Paweł Żuk and Krzysztof Rzadca. sAirflow: Adopting Serverless in a Legacy Workflow Scheduler
Jie Jia, Yi Liu, Yifan Chen, Yanke Liu and Fang Lin. AdapCK: Optimizing I/O for Checkpointing on Large-scale High Performance Computing Systems
Yuang Chen and Jeffery Xu Yu. Vectorizing Sparse Blocks of Graph Matrices for SpMV
Handong Luo, Wenhao Liu, Qi Zhang, Ziheng Yang, Quanwei Lin, Wenjun Zhu, Kun Qiu, Zhe Chen and Yue Gao. Hurry: Dynamic Collaborative Framework For Low-orbit Mega-Constellation Data Downloading
Chuhui Wang, Zewen Ye, Haibin Shen and Kejie Huang. A Folded Computation-in-Memory Accelerator for Fast Polynomial Multiplication in BIKE
Lucas Van Lanker, Hugo Taboada, Elisabeth Brunet and François Trahay. Predicting GPU kernel's performance on upcoming architectures
Farah Ait Salaht, Nora Izri and Maher Rebai. Optimizing Service Replication and Placement for IoT Applications in Fog Computing Systems
Zhengda Wu, Yixiao Feng, Mingtai Lv, Sining Yang and Bo Zhang. Deadline-driven Enhancements and Response Time Analysis of ROS2 Multi-threaded Executors
Milo Lurati, Stijn Heldens, Alessio Sclocco and Ben van Werkhoven. Bringing auto-tuning to HIP: Analysis of tuning impact and difficulty on AMD and Nvidia GPUs
Qasim Abbas, Mohsen Koohi Esfahani, Ian Overton and Hans Vandierendonck. QClique: Optimizing Performance and Accuracy in Maximum Weighted Clique
Jiazhi Jiang, Hongbin Zhang, Deyin Liu, Jiangsu Du, Xiaojiao Yao, Jinhui Wei, Pin Chen, Dan Huang and Yutong Lu. Efficient Coupling Streaming AI and Ensemble Simulations on HPC Clusters
Héctor Martínez, Francisco D. Igual, Rafael Rodríguez-Sánchez, Sandra Catalan, Adrián Castelló and Enrique S. Quintana-Orti. Inference with Transformer Encoders on ARM and RISC-V Multicore Processors
Ivo Gabe de Wolff, Daniel Anderson, Gabriele K. Keller and Aleksei Seletskiy. A Fast Wait-Free Solution to Read-Reclaim Races in Reference Counting
Hamidreza Ramezanikebrya and Matei Ripeanu. (re)Assessing PiM Effectiveness for Sequence Alignment
Leandro Fiorin and Cristina Silvano. MEPAD: A Memory-efficient Parallelized Direct Convolution Algorithm for Deep Neural Networks
Keegan Sanchez, Alex Gavin, Suren Byna, Kesheng Wu and Xuechen Zhang. A High-Performance Collective I/O Framework Leveraging Node-Local Persistent Memory
Xiaokang Fan, Zhen Ge, Sifan Long, Tao Tang, Chun Huang, Lin Peng and Canqun Yang. VLASPH: Smoothed Particle Hydrodynamics on VLA SIMD Architectures
Mohammad Hafezan, Reza Jahadi and Ehsan Atoofian. PCTC: Hardware and Software Co-Design for Pruned Capsule Networks on Tensor Cores
Yibing Lin, Binbin Feng and Zhijun Ding. Context-aware Runtime Type Prediction for Heterogeneous Microservices
Jiguang Lv, Shuchun Xu, Xiaodong Zhan, Tao Liu, Dapeng Man and Wu Yang. FedGG: Leveraging Generative Adversarial Networks and Gradient Smoothing for Privacy Protection in Federated Learning
Yuhang Li, Tong Liu, Wenfeng Shen, Yangguang Cui and Weijia Lu. Improving Generalization and Personalization in Long-Tailed Federated Learning via Classifier Retraining
Weigang Zhang, Biyu Zhou, Xing Wu, Chaochen Gao, Zhibing Liu, Xuehai Tang, Ruixuan Li, Jizhong Han and Songlin Hu. Quartet: A Holistic Hybrid Parallel Framework for Training Large Language Models
Vincent Fagnon, Giorgio Lucarelli and Christophe Rapine. Makespan Minimization for Scheduling on Heterogeneous Platforms with Precedence Constraints
Qian Yang, Xuyan Jiang, Wei Quan, Rulin Liu and Zhigang Sun. Node Bundle Scheduling: An Ultra-Low Latency Traffic Scheduling Algorithm for TAS-based Time-Sensitive Networks
Pedro Rigon, Brenda Schussler, Alexandre Sardinha, Pedro Mario Silva, Fábio Alves de Oliveira, Alexandre Carissimi, Jairo Panetta, Arthur Lorenzon and Philippe Navaux. Harnessing Data Movement Strategies to Optimize Performance-Energy Efficiency of Oil & Gas Simulations in HPC
Tiago Carneiro, Engin Kayraklioglu, Guillaume Helbecque and Nouredine Melab. Investigating Portability in Chapel for Tree-based Optimization on GPU-powered Clusters
Steef Hegeman, Daan Wöltgens, Anton Wijs and Alfons Laarman. Compact Parallel Hash Tables on the GPU
Kåre von Geijer and Philippas Tsigas. How to Relax Instantly: Elastic Relaxation of Concurrent Data Structures
Bengisu Elis, David Boehme, Olga Pearce and Martin Schulz. A Mechanism to Generate Interception Based Tools for HPC Libraries
Júnior Löff, Dalvan Griebler, Luiz Gustavo Fernandes and Walter Binder. MPR: An MPI Framework for Distributed Self-Adaptive Stream Processing
Gabriel Gomez Lopez, Miguel Sánchez de la Rosa, Jesus Escudero-Sahuquillo, Pedro Javier Garcia, Francisco J. Quiles and Pierre-Axel Lagadec. Hybrid Congestion Control for BXI-based Interconnection Networks
Haibo Tang, Huan Zhang, Zhenyu Zhang, Zhao Zhang, Cheqing Jin and Aoying Zhou. Towards High-Performance Transactions via Hierarchical Blockchain Sharding
Sharon Boddu and Maleq Khan. ALZI: An Improved Parallel Algorithm for Finding Connected Components in Large Graphs
Kohei Hiraga and Osamu Tatebe. PEANUTS: A Persistent Memory-Based Network Unilateral Transfer System for Enhanced MPI-IO Data Transfer
Mengde Zhu, Wanyi Ning, Qi Qi, Jingyu Wang, Zirui Zhuang, Haifeng Sun, Jun Huang and Jianxin Liao. FLUK: Protecting Federated Learning against Malicious Clients for Internet of Vehicles
Pranjal Naman and Yogesh Simmhan. Optimizing Federated Learning Over Graph Neural Networks
Le Chen, Arijit Bhattacharjee, Nesreen Ahmed, Niranjan Hasabnis, Gal Oren, Vy Vo and Ali Jannesari. OMPGPT: A Generative Pre-trained Transformer Model for OpenMP
Sixing Yu, Pablo Munoz and Ali Jannesari. Resource-Aware Heterogeneous Federated Learning with Specialized Local Models
Lin Wang, Yuchong Hu, Yuxue Liu, Renzhi Xiao and Dan Feng. Asymmetric Coded Distributed Computation for Resilient Prediction Serving Systems