S. Kounev, K.-D. Lange, J. von Kistowski
For Scientists and Engineers
Second Edition
– from the Foreword by Ian T. Foster, Distinguished Fellow, Argonne National Laboratory.
– from the Foreword by David Patterson, 2017 ACM A.M. Turing Award Laureate.
– from the Foreword by John R. Mashey, SPEC Co-Founder and Former Silicon Graphics VP/Chief Scientist.
This book serves as both a textbook and handbook on the benchmarking of systems and components used as building blocks of modern information and communication technology applications. It provides theoretical and practical foundations as well as an in-depth exploration of modern benchmarks and benchmark development.
The book is divided into two parts: foundations and applications. The first part introduces the foundations of benchmarking as a discipline, covering the three fundamental elements of each benchmarking approach: metrics, workloads, and measurement methodology. The second part focuses on different application areas, presenting contributions in specific fields of benchmark development. These contributions address the unique challenges that arise in the conception and development of benchmarks for specific systems or subsystems, and they demonstrate how the foundations and concepts in the first part of the book are being used in existing benchmarks. Further, the book presents a number of concrete applications and case studies based on input from leading benchmark developers from consortia such as the Standard Performance Evaluation Corporation (SPEC) and the Transaction Processing Performance Council (TPC). Besides a number of updates in almost all chapters, this new edition adds three chapters in Part II of the book: (1) “Machine Learning and Artificial Intelligence,” catering to the growing need to evaluate and benchmark ML and AI systems; (2) “Scalability of Networks and Systems,” focusing on novel metrics and techniques to evaluate scalability; and (3) “PC, Workstation, Graphics, and Network Benchmarks,” covering popular benchmarks like SYSmark, PCMark, the Phoronix Test Suite, 3DMark, the Blender benchmark, and end-to-end network performance tools.
Providing both practical and theoretical foundations, as well as a detailed discussion of modern benchmarks and their development, the book is intended as a handbook for professionals and researchers working in areas related to benchmarking. It offers an up-to-date point of reference for existing work, as well as the latest results, research challenges, and future research directions. It can also be used as a textbook for graduate and postgraduate students studying any of the many subjects related to benchmarking. While readers are assumed to be familiar with the principles and practices of computer science, as well as software and systems engineering, no specific expertise in any subfield of these disciplines is required.
Samuel Kounev is a Professor of Computer Science and Chair of Software Engineering at the University of Würzburg (Germany). He has been actively involved in the Standard Performance Evaluation Corporation (SPEC), the largest standardization consortium in the area of computer systems benchmarking, since 2002. He serves as the elected chair of the SPEC Research Group, which he initiated in 2010 with the goal of providing a platform for collaborative research efforts between academia and industry in the area of quantitative system evaluation. Samuel is also co-founder of several conferences in the field, including the ACM/SPEC International Conference on Performance Engineering (ICPE) and the IEEE International Conference on Autonomic Computing and Self-Organizing Systems (ACSOS), for which he serves on the steering committees. He has published extensively in the area of systems benchmarking, modeling, and evaluation of performance, energy efficiency, reliability, and security.
Klaus-Dieter Lange is a Distinguished Technologist at Hewlett Packard Enterprise (HPE), where he started his professional career in 1998. His focus is on performance and workload optimization, industry-standard benchmark development, server efficiency, and the design of sustainable secure enterprise solutions. He serves on the SPEC Board of Directors and has been on the ICPE Steering Committee since its inception. Klaus is the founding chair of the SPECpower Committee, which, under his technical leadership, develops and maintains the SPECpower_ssj2008 benchmark, the SPEC PTDaemon Interface, the Chauffeur Worklet Development Kit, and the Server Efficiency Rating Tool (SERT) suite, among others. In 2020, Klaus drove the establishment of the SPEC International Standards Group (ISG), which he has chaired ever since.
Jóakim von Kistowski is a Professor of Software Design at Aschaffenburg University of Applied Sciences (Germany). He focuses on modern software architectures, cloud-native software development, DevOps, and Green IT. In his prior role as senior software architect, Jóakim contributed to the adoption of new software performance, load testing, and benchmarking methods in industry. Jóakim has a strong SPEC background, having actively contributed to the SPECpower Committee and served as former chair of the SPEC RG Power Working Group.
The field of computer systems benchmarking is foundational to the advancement of technology. Benchmarks not only serve as vital tools for evaluating the performance, reliability, and efficiency of systems but also enable fair and meaningful comparisons that drive innovation and informed decision-making across academia, industry, and beyond. This indispensable discipline continues to evolve, motivated by the growing complexities of modern computing systems—frequently deployed in vast cloud infrastructures—and the surging demands of applications like artificial intelligence.
In this context, Systems Benchmarking—For Scientists and Engineers emerges as a comprehensive and authoritative resource that bridges the gap between theoretical foundations and practical applications. With its meticulous coverage of metrics, methodologies, and case studies, the book stands as a testament to the authors’ exceptional expertise and dedication to the field.
Samuel Kounev, Klaus-Dieter Lange, and Jóakim von Kistowski are luminaries in the domain of benchmarking. Their collective contributions have shaped how benchmarks are conceived, developed, and utilized. From Professor Kounev’s pioneering leadership in establishing collaborative platforms like the SPEC Research Group to Mr. Lange’s groundbreaking work on energy-efficient server benchmarking and Professor von Kistowski’s innovations in software and Green IT benchmarking, the authors embody a rare combination of academic rigor and industry impact. Their leadership in consortia such as the Standard Performance Evaluation Corporation (SPEC), and their involvement in creating benchmarks like SPECpower and SERT, have set the gold standard for benchmarking practices globally.
This second edition of the book is particularly timely. The addition of chapters on machine learning and AI benchmarking, scalability of systems and networks, and popular consumer-oriented benchmarks reflects the rapid advancements in technology and the diverse needs of the benchmarking community. Moreover, the expanded discussion on reproducibility aligns with the critical importance of FAIR principles in modern research, ensuring benchmarks are not only rigorous but also replicable and trustworthy.
What truly sets this book apart is its dual focus. For professionals, it offers a definitive guide to the state-of-the-art in benchmarking practices, complete with actionable insights and case studies. For educators and students, it serves as an invaluable textbook, seamlessly integrating foundational concepts with practical examples to nurture the next generation of benchmarking experts. Having spent years designing computer systems and teaching their use, I found myself reflecting on how this book could have helped me make better decisions and achieve more impactful results. Fortunately, you, the reader, have the opportunity to benefit from its insights and avoid repeating the mistakes of the past. This book equips you with the tools and knowledge to navigate the challenges of benchmarking, ensuring that future endeavors are informed, precise, and impactful.
I commend the authors for their dedication to advancing the field and for producing a work of such depth, clarity, and utility. Systems Benchmarking—For Scientists and Engineers is an essential resource for anyone engaged in the design, evaluation, or research of computer systems. It is a book that will inspire, educate, and empower its readers to set new benchmarks in their own work.
Chicago, IL, USA
January 2025
Ian T. Foster
Distinguished Fellow, Argonne National Laboratory
In January 2010, I met Sam and Klaus at the inaugural International Conference on Performance Engineering (ICPE) in San Jose, USA. I gave the keynote address "Software Knows Best: Portable Parallelism Requires Standardized Measurements of Transparent Hardware" to an audience that was half from industry and half from academia. That was by design: in their roles as co-founders and steering committee members of ICPE, they set out to establish this forum for sharing ideas and experiences between industry and academia. Thus, I was not surprised to see that their book "Systems Benchmarking—For Scientists and Engineers" has the same underlying tone: to foster the integration of theory and practice in the field of systems benchmarking. Their work is twofold: Part I can be used as a textbook for graduate students, as it introduces the foundations of benchmarking, covering the three fundamental elements of every benchmarking approach: metrics, workloads, and measurement methodology.
Part II features a number of concrete applications and case studies based on input from leading benchmark developers from consortia such as the Standard Performance Evaluation Corporation (SPEC) or the Transaction Processing Performance Council (TPC). It describes a broad range of state-of-the-art benchmarks, their development, and their effective use in engineering and research. In addition to covering classical performance benchmarks—including CPU, energy efficiency, virtualization, and storage benchmarks—the book looks at benchmarks and measurement methodologies for evaluating elasticity, performance isolation, and security aspects. Moreover, some further topics related to benchmarking are covered in detail, such as resource demand estimation.
The authors also ventured to share some insightful retrospectives regarding benchmark development in industry-standard bodies, as they have been active in SPEC for many years. The information about the formation and growth of SPEC and TPC over the last 30 years is valuable when starting new initiatives such as Embench or MLPerf.
One of my observations is that benchmarks shape a field, for better or for worse. Good benchmarks are in alignment with real applications, but bad benchmarks are not, forcing engineers to choose between making changes that help end users or making changes that only help with marketing.
This book should be required reading for anyone interested in making good benchmarks.
Berkeley, CA, USA
January 2020
David Patterson
2017 ACM A.M. Turing Award Laureate
I am delighted to write a foreword for this thorough, comprehensive book on the theory and practice of benchmarking. I will keep it short, so people can quickly start on the substantial text itself.
Creating good benchmarks is harder than most imagine. Many have been found to have subtle flaws or have become obsolete. In addition, benchmark audiences differ in their goals and needs. Computer system designers use benchmarks to compare potential design choices, so they need benchmarks small enough to simulate before creating hardware. Software engineers need larger examples to help design software and tune its performance. Vendors want realistic benchmarks that deter gimmicks by competitors. They dislike wasting time on those they know to be unrepresentative. Buyers might like to run their own complete workloads, but that is often impractical. They certainly want widely reported, realistic benchmarks they trust that correlate with their own workloads. Researchers like good, relevant examples they can analyze and use in textbooks.
In the 1980s, benchmarks were still often confusing and chaotic, driven by poor examples and much hype. Vendors boasted of poorly defined MIPS, MFLOPS, or transactions, and universities often studied tiny benchmarks. Luckily, the last few decades have seen huge progress, some contributed by the authors themselves. From personal experience, the close interaction of academia and industry has long been very fruitful. The three authors have extensive experience combining academic research, industrial practice, and the nontrivial methods to create good industry-standard benchmarks on which competitors can agree.
I am especially impressed by the pervasive balance of treatments in this book. It aims to serve as both a handbook for practitioners and a textbook for students. It certainly is the former and if I were still teaching college, I would use it as a text.
It starts with the basics of benchmarks and their taxonomies, then covers the theoretical foundations of benchmarking: statistics, measurements, experimental design, and queueing theory. That is very important: from experience giving guest lectures, I have often found that many computer science students had not studied the relevant statistical methods, even at very good schools. The theory is properly complemented with numerous case studies.
The book explores the current state of the art in benchmark development but, just as important, provides crucial context by examining decades of benchmark evolution, with its failures and successes. It recounts the history of the move from scattered benchmarks to the more disciplined efforts of industry–academic consortia, such as the Transaction Processing Performance Council (TPC) and especially the Standard Performance Evaluation Corporation (SPEC), both started in late 1988. Much was learned not just about benchmarking technology and good reporting, but also about effective ways to organize such groups. Both organizations are still quite active three decades later, an eternity in computing. Chapter 10’s history of the evolution of the SPEC CPU benchmarks is especially instructive.
From history and long-established benchmarks, the book then moves to modern topics—energy efficiency, virtualization, storage, web, cloud elasticity, performance isolation in complex data centers, resource demand estimation, and research in software and system security. Some of these topics were barely imaginable for benchmarking when we started SPEC in 1988 just to create reasonable CPU benchmarks!
This is a fine book by experts. It offers many good lessons and is well worth the time to study.
Portola Valley, CA, USA
January 2020
John R. Mashey
SPEC Co-Founder and
Former Silicon Graphics VP/Chief Scientist
Teaching materials (lecture slides, exercises, code examples in R) are available on request.
Please contact Samuel Kounev (samuel.kounev@uni-wuerzburg.de) if you are interested.
The supplementary materials will be further extended and refined; if you would like to be informed when updated materials are available, you can sign up for our mailing list.
@book{KoLaKi-2025-SystemsBenchmarking,
author = {Samuel Kounev and Klaus-Dieter Lange and Jóakim von Kistowski},
title = {{Systems Benchmarking}},
subtitle = {{For Scientists and Engineers}},
publisher = {Springer International Publishing},
year = {2025},
edition = {2},
isbn = {978-3-031-85633-4},
doi = {10.1007/978-3-031-85634-1},
}
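For reference, the entry above can be used in a LaTeX document roughly as follows. This is a minimal sketch, assuming the entry is saved in a file named references.bib (a hypothetical filename); note that the subtitle field is supported by biblatex but is ignored by classic BibTeX styles:

```latex
\documentclass{article}
% biblatex (with the biber backend) understands the subtitle field
% used in the entry above; plain BibTeX styles would drop it.
\usepackage[backend=biber]{biblatex}
\addbibresource{references.bib}

\begin{document}
Benchmarking foundations are covered in depth by
Kounev et al.~\cite{KoLaKi-2025-SystemsBenchmarking}.
\printbibliography
\end{document}
```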