Preface
Part I: Preliminaries
1. Preliminaries
1.1. How to Use This Book
1.2. Who This Book Is For
1.3. What Makes OCaml Fast
1.4. The Observability Stack
2. Philosophy and Methodology
2.1. The USE Method for OCaml
2.2. The Performance Triage Process
2.3. Setting Up Reproducible Benchmarks
2.4. The Performance Checklist
3. Build Configuration for Performance
3.1. Compiler Optimization Levels
3.2. Flambda: The Optimizing Backend
3.3. Native Code vs Bytecode
Part II: Measurement, Profiling, and Observation
4. Profiling and Observability Tools Overview
4.1. The Observability Landscape
4.2. Tool Categories for OCaml
4.3. Choosing the Right Tool
4.4. Profiling Ecosystem
5. CPU Profiling
5.1. Generating Flamegraphs
5.1.1. Flamegraphs on Linux
5.1.2. Flamegraphs on macOS
5.2. Using Instruments on macOS
5.3. Time-based Profiling with Landmarks
6. Dynamic Tracing and USDT Probes
6.1. OCaml Runtime USDT Probes
6.2. Application-Level USDT
6.3. Tracing Workflows
6.4. Building Custom Analysis Tools
7. Memory Profiling
7.1. Statistical Memory Profiling with statmemprof
7.1.1. statmemprof in OCaml 5.x
7.2. Using memtrace and memtrace-viewer
7.3. Other Memory Tools
7.4. GC Statistics and Tuning
7.5. Combining Memory Profiling with USDT
8. Debugging for Performance
8.1. GDB on Linux Debugging Session
8.2. LLDB on macOS Debugging Session
8.3. Bytecode Debugging
8.3.1. Debugging OCaml with Emacs DAP
8.4. Inspecting Compiler Output
9. Benchmarking
9.1. Macro-benchmarking Strategies
9.2. Continuous Benchmarking in CI
Part III: Optimizations
10. Compiler-Level Optimizations
10.1. Inlining
10.2. Unboxing and Specialization
10.3. Tail Call Optimization
11. Data Structure Choices
11.1. Hash Tables and Maps
11.2. Strings and Buffers
11.3. Records and Tuples
11.4. Custom Data Structures
12. Memory and Allocation
12.1. Boxing and Unboxing
12.2. GC-Friendly Programming
12.3. Memory Layout Optimization
13. Function Optimization
13.1. Partial Application
13.2. Higher-Order Functions
13.3. Exception Handling
14. I/O Performance
14.1. Network I/O
14.2. Serialization
Part IV: Parallelism and Concurrency (OCaml 5)
15. Understanding OCaml 5's Parallelism Model
15.1. Domains: True Parallelism
15.2. The Multicore GC
15.3. Effects for Concurrency
15.4. Observing Multicore Behavior
16. Parallel Programming with Domainslib
16.1. Async/Await Patterns
16.2. Channels and Communication
16.3. Common Pitfalls
17. Concurrent I/O with Eio
17.1. Fiber Scheduling
17.2. Combining Eio and Domainslib
17.3. Performance Tuning
18. Memory and Synchronization
18.1. Lock-Free Data Structures
18.2. Avoiding Contention
18.3. Cache Coherency Costs
19. Migrating from OCaml 4 to 5
19.1. Adapting Multi-process Code
19.2. Common Migration Issues
Part V: Interoperability
20. C Bindings Performance
20.1. The Foreign Function Interface
20.2. Minimizing Crossing Overhead
20.3. Callback Costs
20.4. Memory Management Across Boundaries
20.5. ctypes vs Hand-Written Stubs
21. JavaScript Compilation
21.1. Optimization Flags
21.2. Dead Code Elimination
Part VI: Compile Times
22. Reducing Compilation Time
22.1. Understanding What's Slow
22.2. Module Structure for Fast Builds
22.3. Flambda Compile Time Trade-offs
22.4. Parallel Compilation
22.5. Incremental Builds and Caching
Part VII: Case Studies
23. Real-World Optimization Examples
23.1. Case Studies
24. Common Performance Patterns
24.1. Continuation-Passing for Tail Calls
24.2. Memoization Strategies
24.3. Lazy Evaluation Trade-offs
24.4. Data-Oriented Design in OCaml
Appendices
25. Quick Reference: Compiler Flags
26. Quick Reference: Dune Configuration
27. Tool Installation Guide
28. Glossary
29. Further Reading
OCaml Debugging and Performance
The Performance Checklist
Native vs Bytecode Compilation
Optimization Flags (-O2, -O3)
Flambda Enabled?
Debug Symbols and Their Cost
Release Build Configuration in Dune