public inbox for [email protected]
help / color / mirror / Atom feedFrom: Alexandre Felipe <[email protected]>
To: Peter Geoghegan <[email protected]>
Cc: Tomas Vondra <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Thomas Munro <[email protected]>
Cc: Nazir Bilal Yavuz <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Melanie Plageman <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Georgios <[email protected]>
Cc: Konstantin Knizhnik <[email protected]>
Cc: Dilip Kumar <[email protected]>
Subject: Re: index prefetching
Date: Sun, 15 Feb 2026 00:13:39 +0000
Message-ID: <CAE8JnxN_EwnTLLMWGhvgwaomYZ0ysm7NeogA-BqBd=Rs3S7Oqw@mail.gmail.com> (raw)
In-Reply-To: <CAH2-Wznv9_KGqHQ1vCW2pkiA6QskBGcx5NC_-UXnD6GEQasvAQ@mail.gmail.com>
References: <[email protected]>
<5pltwb73d7cynsxo2yb54ygjk7haviatkrx43mnzihc6kkield@ahnstpgof46i>
<CA+hUKGKL3MRvEftAE+kwBuL2PLg2CwUoHEMr=-KSvsWN8pHq9w@mail.gmail.com>
<[email protected]>
<e33gafg4p7iwvo24ytrxuw43nafm5xm3jefpdspnarcbkfurs7@3jbgdiinxem5>
<[email protected]>
<CAH2-Wz=DfvzasnzLv43cu36Q1Ca8Qi70_JjZ7SRbNhDwwgvirg@mail.gmail.com>
<qdl4fojnbfcnm2k7b4zpvgd6gwzwdgtbl5c7shpimrb76dbyy6@scdnspus3ejh>
<bpdeohyqvltb77viyft4bza4xc4peed3jcoep74d2ih6ynqlke@wbnhcwmq3ril>
<CAH2-WznFdjY_OB2S7_BY4iAyeffK+XrE2qsX6aghgP63VocRfQ@mail.gmail.com>
<4zeu5yb73byiquvf3eefsunnrydyqfxy3eup66jrliutrtd4xl@5iifjey4n5m5>
<[email protected]>
<CAH2-WzkAC_gWQgc3MV3MWBx2EUaE4TfPk8XQPkJgs1R_qNvo=A@mail.gmail.com>
<CAH2-WznLRnUZHC4kmR+J3MgGpN0iTUaZ-4xeSHPiaYR=y3C8Og@mail.gmail.com>
<CAH2-WzkHK0++yn3_nmWL2rWzXiu8Qc=2=NPQDG4nfVjvYkRDrQ@mail.gmail.com>
<CAH2-Wzk9=x=a2TbcqYcX+XXmDHQr5=1v9m4Z_v8a-KwF1Zoz0A@mail.gmail.com>
<CAH2-WzmYqhacBH161peAWb5eF=Ja7CFAQ+0jSEMq=qnfLVTOOg@mail.gmail.com>
<CAH2-Wzmm+mXuv_r+eYgcwOKZzNSmbYoEmxi5mdKMUtkmYa_qXw@mail.gmail.com>
<CAH2-WzkC_EdGB_tdEQr63BiOuJbcvZiV=xO+OJYux4wOaxPD8w@mail.gmail.com>
<CAH2-Wzn8whv-RE4E5L2WQB_Ha_jy0UReJq1yypu83gDYHeXq5g@mail.gmail.com>
<CAH2-Wzm7-QuDOs6TcqfhhDsGEZCuHtn=D-SriOTnTZ_fiXNBvA@mail.gmail.com>
<CAH2-WzmH7pVQ0-mYAxb82aWbz29_BiBPq2wV5p7+1o2sRFqDRQ@mail.gmail.com>
<CAH2-Wz=6a7fGz2rALDX+xiFDuEaGQWpZ49xEaBUDKiPH8gcL+Q@mail.gmail.com>
<CAH2-WzkehuhxyuA8quc7rRN3EtNXpiKsjPfO8mhb+0Dr2K0Dtg@mail.gmail.com>
<CAH2-WzmymSyOt5Y2RGbm6cJXg18J_ttfqjdcpodHe6Gp23ConQ@mail.gmail.com>
<CAH2-Wznv9_KGqHQ1vCW2pkiA6QskBGcx5NC_-UXnD6GEQasvAQ@mail.gmail.com>
Hi,
I decided to test this PR.
I didn't take much time to go through the thread or the code in detail yet.
But I have my first benchmark results and I would like to share.
EXPERIMENT
I tested [CF 4351] v10 - Index Prefetching
I created a table with 100k rows and
Sequential is, as guessed, 1,2,3,4 (indexed)
Periodic is a quasi random (i * jump) % num_rows, where gcd(jump, num_rows)
= 1, guarantee that there are no repeated entries (indexed)
Random is a `row_number() over (order by random())` (indexed)
The payload is a fixed 200 character long string, just to make it more
realistic.
For the tests, I disable sorting, sequential scans, index only scans and
bitmap scans.
Since buffer cache always has a significant impact on the query
performance, I shuffled the tests, and tried to adjust for the number of
buffer hit/read, but later I found that the best way to control that was to
use a table small enough to be entirely held in cache, and evict the
buffers.
* off: buffers are kept in cache
* pg: buffers evicted from postgres pg_buffercache_evict from
pg_buffercache extension.
* os: supported only in python, I separated the buffer eviction in
purge_cache as it requires sudo (tested only in MacOS).
I varied
* max_parallel_workers_per_gather (although I guess it wasn't exploited),
* enable_index_prefetch
* the column used as sorting key and, as a result, the index used.
* and buffer eviction mode.
Running from python with psycopg
SUMMARY
I could not see the expected positive impact and when using the python
script and buffers evicted prefetch had a detrimental impact.
APPENDIX
psql run
column_name io_method num_workers evict n off_ms on_ms pct_change
ci_low ci_high
periodic worker 0 off 10 172.4 182.8 6.024162
-3.7 15.7
periodic worker 0 pg 10 222.7 214.8 -3.539448
-13.0 5.9
periodic worker 2 off 10 173.5 172.7 -0.442660
-1.6 0.7
periodic worker 2 pg 10 226.3 213.2 -5.795049
-16.9 5.3
random worker 0 off 10 173.3 174.1 0.476657
-1.0 2.0
random worker 0 pg 10 216.6 218.1 0.704020
-2.8 4.3
random worker 2 off 10 182.7 175.3 -4.031139
-15.6 7.5
random worker 2 pg 10 217.2 213.7 -1.586813
-4.4 1.2
sequential worker 0 off 10 131.1 130.4 -0.568573
-3.4 2.3
sequential worker 0 pg 10 150.7 188.2 24.923924
22.2 27.6
sequential worker 2 off 10 129.7 130.2 0.421814
-1.0 1.9
sequential worker 2 pg 10 151.0 184.1 21.925935
19.5 24.3
psycopg run
column_name io_method num_workers evict n off_ms on_ms pct_change
ci_low ci_high
periodic worker 0 off 10 186.7 193.7 3.701950
-7.3 14.7
periodic worker 0 os 10 207.0 535.2 158.507266
153.7 163.3
periodic worker 0 pg 10 201.2 537.0 166.923995
161.5 172.3
periodic worker 2 off 10 181.2 189.0 4.303359
-0.2 8.8
periodic worker 2 os 10 221.7 548.2 147.322934
131.8 162.9
periodic worker 2 pg 10 203.1 533.5 162.688738
160.4 165.0
random worker 0 off 10 194.6 186.8 -3.986974
-11.5 3.5
random worker 0 os 10 211.4 338.2 59.979252
52.9 67.1
random worker 0 pg 10 212.3 336.8 58.686711
50.2 67.1
random worker 2 off 10 183.3 187.7 2.364585
0.4 4.4
random worker 2 os 10 222.5 341.9 53.659704
42.1 65.2
random worker 2 pg 10 204.1 333.3 63.348746
62.3 64.4
sequential worker 0 off 10 129.9 129.5 -0.260586
-1.9 1.3
sequential worker 0 os 10 150.9 182.4 20.909424
16.6 25.2
sequential worker 0 pg 10 150.3 185.0 23.132583
19.2 27.1
sequential worker 2 off 10 129.9 132.9 2.323179
-4.3 9.0
sequential worker 2 os 10 153.8 189.8 23.402255
13.9 32.9
sequential worker 2 pg 10 151.1 185.9 22.997500
20.1 25.9
Regards,
Alexandre
On Mon, Feb 9, 2026 at 10:45 PM Peter Geoghegan <[email protected]> wrote:
> On Fri, Jan 30, 2026 at 7:18 PM Peter Geoghegan <[email protected]> wrote:
> > Attached is v9.
>
> Attached is v10. There are 2 major areas of improvement in this latest
> revision:
>
> 1. We have enhanced the read stream callback (heapam_getnext_stream,
> which is added by
> v10-0005-Add-prefetching-to-index-scans-using-batch-inter.patch),
> making it yield at key intervals. When we yield, we temporarily
> suspend prefetching -- but only for long enough to give the scan the
> opportunity to return one more matching tuple (don't confuse yielding
> with pausing; we do both, but the goals are rather different in each
> case).
>
> Yielding like this keeps the scan responsive during prefetching: the
> scan should never go too long without returning at least one tuple
> (except when that just isn't possible at all). Testing has shown that
> keeping the scan responsive in this sense is particularly important
> during scans that appear in "ORDER BY ... LIMIT N" queries, as well as
> during scans that feed into certain merge joins. IOW, it is
> particularly important that we "keep the scan responsive" whenever it
> has the potential to allow the scan to shut down before it has
> performed work that turns out to be unnecessary (though it also seems
> to have some benefits even when that isn't the case).
>
> There is a complex trade-off here: we want to yield when we expect
> some benefit from doing so. But we don't want to yield when doing so
> risks compromising the read stream's ability to maintain an adequate
> prefetch distance. There are comments in heapam_getnext_stream that
> describe the theory in more detail. There are heuristics that were
> derived using adversarial benchmarking/stress-testing. I'm sure that
> they need more work, but this does seem like roughly the right idea.
> Note that we now test whether the scan's read stream is using its
> fast-path mode (read stream uses this when the scan reads heap pages
> that are all cached).
>
> 2. A new patch
> (v10-0007-Limit-get_actual_variable_range-to-scan-one-inde.patch)
> compensates for the fact that the main prefetching commit removes
> get_actual_variable_range's VISITED_PAGES_LIMIT mechanism. Since
> VISITED_PAGES_LIMIT cannot easily be ported over to the new table AM
> interface selfuncs.c now uses.
>
> I described the problem that we need to address when I posted v9:
>
> > selfuncs.c problem
> > ------------------
> >
> > Also worth noting that we recently found a problem with selfuncs.c:
> > the VISITED_PAGES_LIMIT stuff in selfuncs.c is broken right now. v9
> > tears that code out, pending adding back a real fix (earlier versions
> > of the patch had VISITED_PAGES_LIMIT, but it didn't work).
> >
> > The underlying problem is that the existing VISITED_PAGES_LIMIT design
> > is incompatible with our new table_index_getnext_slot interface. The
> > new interface doesn't stop scanning until it is able to at least
> > return 1 tuple. But VISITED_PAGES_LIMIT was invented precisely because
> > get_actual_variable_endpoint's index scans took far too long, even
> > though they're only ever required to locate 1 tuple. So that just
> > won't work.
> >
> > We'll need to invent some kind of API that directly acknowledges the
> > needs of the selfuncs.c caller, and others like it. Doing it in an
> > ad-hoc way just doesn't seem possible anymore. That will have to wait
> > for the next revision, though (or the one after that).
>
> The new patch deals with the problem in a completely different way,
> and at a completely different layer: it adds a new
> IndexScanDescData.xs_read_extremal_only field, set only by
> get_actual_variable_range. When nbtree sees that the field has been
> set, it gives up after scanning only one leaf page (the page that
> contains extremal values that are of interest to
> get_actual_variable_range). Note that we completely stop caring about
> heap page fetches with this new approach.
>
> There are good reasons to believe that the new
> IndexScanDescData.xs_read_extremal_only approach will solve existing
> problems with VISITED_PAGES_LIMIT. Recent benchmarking from Mark
> Callaghan [1] (which I've independently recreated with my own minimal
> test suite) shows that VISITED_PAGES_LIMIT becomes completely
> ineffective, once we reach a tipping point where many index tuples at
> one end of the index all have their LP_DEAD bit set.
>
> I'm going to start a new thread to discuss the issues in this area
> later today. I'm aiming to fix an existing, independent issue in this
> new patch, so it makes sense to discuss it on a completely separate
> thread.
>
> [1]
> https://smalldatum.blogspot.com/2026/01/cpu-bound-insert-benchmark-vs-postgres.html
> --
> Peter Geoghegan
>
Attachments:
[image/png] psycopg.png (114.5K, 3-psycopg.png)
download | view image
[image/png] psql.png (82.7K, 4-psql.png)
download | view image
[text/x-python-script] run_analysis.py (6.3K, 5-run_analysis.py)
download | inline:
#!/usr/bin/env python3
"""
Prefetch regression analysis - compute effects and generate forest plot.
Usage: python run_analysis.py [options]
"""
import argparse
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import psycopg
from scipy import stats
from matplotlib.lines import Line2D
def parse_args():
p = argparse.ArgumentParser(description='Analyze prefetch regression test results')
p.add_argument('--prefix', '-o', type=str, default='',
help='Output file prefix')
p.add_argument('--dbname', '--db', '-d', type=str, default='postgres',
help='Database name (default: postgres)')
p.add_argument('--host', '-H', type=str, default='/tmp',
help='Database host (default: /tmp)')
p.add_argument('--port', '-p', type=int, default=None,
help='Database port')
return p.parse_args()
def load_timing_data(conn):
"""Load raw results from database."""
return pd.read_sql('''
SELECT column_name, io_method, num_workers, prefetch_enabled,
evict_mode, iteration, execution_time_ms
FROM prefetch_test_results
''', conn)
def compute_ci(group):
"""Compute effect with CI for a single configuration group."""
off = group[~group['prefetch_enabled']]['execution_time_ms']
on = group[group['prefetch_enabled']]['execution_time_ms']
assert len(off) == len(on), "Off and on should have the same length"
if len(off) < 2 or len(on) < 2:
return None
result = stats.ttest_ind(on, off, equal_var=False)
ci = result.confidence_interval(0.95)
return pd.Series({
'n': min(len(off), len(on)),
'off_ms': off.mean(),
'on_ms': on.mean(),
'pct_change': (on.mean() / off.mean() - 1) * 100,
'ci_low': ci.low / off.mean() * 100,
'ci_high': ci.high / off.mean() * 100,
'p_value': result.pvalue,
'significant': result.pvalue < 0.05
})
def compute_effects_with_ci(df):
"""Compute prefetch effect with confidence intervals using Welch's t-test."""
group_keys = ['num_workers', 'column_name', 'io_method', 'evict_mode']
results = df.groupby(group_keys).apply(compute_ci).dropna().reset_index()
results = results.rename(columns={'evict_mode': 'evict'})
return results.sort_values(['column_name', 'io_method', 'num_workers', 'evict'])
def plot_forest(df, prefix=''):
"""Generate forest plot with table labels as yticks."""
from matplotlib.lines import Line2D
df = df.sort_values(['evict', 'column_name', 'num_workers', 'n']).reset_index(drop=True)
n = len(df)
if n == 0:
print("No data to plot!")
return None
# Format table as aligned text for ytick labels
table = df[['num_workers', 'column_name','evict']].copy()
table.columns = ['num_workers', 'column_name', 'eviction']
table_str = table.to_string(index=False, header=True)
lines = table_str.split('\n')
header = lines[0]
labels = lines[1:] + [header] # header at top (highest y)
fig, ax = plt.subplots(figsize=(10, max(4, (n + 1) * 0.35)))
# Forest plot
for i, row in df.iterrows():
color = 'gray'
if row['significant']:
color = 'red' if row['pct_change'] > 0 else 'green'
ax.errorbar(
row['pct_change'], i,
xerr=[[row['pct_change'] - row['ci_low']], [row['ci_high'] - row['pct_change']]],
fmt='s', color=color, markersize=8, capsize=3, linewidth=2
)
ax.annotate(f"{row['pct_change']:.1f}%",
(row['ci_high'] + 2, i), fontsize=8, va='center')
ax.axvline(0, color='black', linewidth=1)
ax.set_yticks(range(n + 1))
ax.set_yticklabels(labels, fontfamily='monospace', fontsize=9)
ax.set_xlabel('% Change in Execution Time (95% CI)')
ax.set_title('Prefetch Effect (Positive = SLOWER)')
ax.grid(axis='x', alpha=0.3)
# Legend
legend = [
Line2D([0], [0], marker='s', color='w', markerfacecolor='green', markersize=10, label='Faster'),
Line2D([0], [0], marker='s', color='w', markerfacecolor='red', markersize=10, label='Slower'),
Line2D([0], [0], marker='s', color='w', markerfacecolor='gray', markersize=10, label='Not significant'),
]
ax.legend(handles=legend, loc='lower right')
plt.tight_layout()
plot_path = f'{prefix}prefetch_forest.png'
plt.savefig(plot_path, dpi=150, bbox_inches='tight')
plt.close()
print(f"Saved: {plot_path}")
return plot_path
def analyze_factors(results):
"""Analyze the effect of each factor on prefetch performance."""
print("\n" + "=" * 50)
print("AVERAGE EFFECT BY FACTOR")
print("=" * 50)
factors = ['evict_mode', 'column_name', 'num_workers', 'prefetch_enabled']
rows = []
for factor in factors:
for state, mean in results.groupby(factor)['execution_time_ms'].mean().sort_index().items():
rows.append({'factor': factor, 'state': str(state), 'mean_ms': round(mean, 1)})
df = pd.DataFrame(rows).set_index(['factor', 'state'])
print(df.to_string())
def main():
args = parse_args()
prefix = args.prefix + "_" if args.prefix else ""
# Connect
connstr = f"dbname={args.dbname} host={args.host}"
if args.port:
connstr += f" port={args.port}"
conn = psycopg.connect(connstr)
# Load data
print("Loading data...")
df = load_timing_data(conn)
conn.close()
print(f"Loaded {len(df)} rows\n")
if len(df) == 0:
print("No data found!")
return
# Compute effects with CI
results = compute_effects_with_ci(df)
# Print summary
print("=" * 90)
print("PREFETCH EFFECT (positive = slower)")
print("=" * 90)
display = results.copy()
for k in ['off_ms', 'on_ms', 'ci_low', 'ci_high']:
display[k] = display[k].round(1)
print(display[['column_name', 'io_method', 'num_workers', 'evict', 'n',
'off_ms', 'on_ms', 'pct_change', 'ci_low', 'ci_high']].to_string(index=False))
# Factor analysis
analyze_factors(df)
# Generate plot
plot_forest(results, prefix)
# Save CSV
csv_path = f'{prefix}prefetch_summary.csv'
results.to_csv(csv_path, index=False)
print(f"Saved: {csv_path}")
print("\nDone.")
if __name__ == '__main__':
main()
[text/x-sh] run_benchmarks.sh (7.3K, 6-run_benchmarks.sh)
download | inline:
#!/bin/bash
# =============================================================================
# INDEX PREFETCH ADVERSARIAL BENCHMARK RUNNER
# =============================================================================
# Usage:
# ./run_benchmarks.sh setup # Create test tables
# ./run_benchmarks.sh run baseline # Run benchmarks, save as baseline
# ./run_benchmarks.sh run patched # Run benchmarks, save as patched
# ./run_benchmarks.sh compare # Compare baseline vs patched
# ./run_benchmarks.sh pgbench baseline # Run pgbench tests as baseline
# ./run_benchmarks.sh pgbench patched # Run pgbench tests as patched
# ./run_benchmarks.sh cold baseline # Run with cold cache
# =============================================================================
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
RESULTS_DIR="${SCRIPT_DIR}/results"
DB="${PGDATABASE:-postgres}"
PGBENCH_TIME="${PGBENCH_TIME:-60}"
PGBENCH_CLIENTS="${PGBENCH_CLIENTS:-4}"
mkdir -p "${RESULTS_DIR}"
drop_caches() {
echo "Skip dropping OS caches (requires sudo)..."
}
restart_postgres() {
echo "Restarting PostgreSQL to clear shared buffers..."
pg_ctl restart -D "${PGDATA}" -w -m fast 2>/dev/null || \
echo "Warning: Could not restart PostgreSQL automatically"
}
case "$1" in
setup)
echo "Setting up adversarial benchmark tables..."
psql -d "${DB}" -f "${SCRIPT_DIR}/adversarial_benchmarks.sql" 2>&1 | \
tee "${RESULTS_DIR}/setup.log"
echo "Setup complete. Tables created in database '${DB}'"
;;
run)
TAG="${2:-$(date +%Y%m%d_%H%M%S)}"
OUTFILE="${RESULTS_DIR}/benchmark_${TAG}.log"
echo "Running benchmarks, saving to ${OUTFILE}..."
echo "=== Benchmark run: ${TAG} ===" > "${OUTFILE}"
echo "Date: $(date)" >> "${OUTFILE}"
echo "PostgreSQL version:" >> "${OUTFILE}"
psql -d "${DB}" -c "SELECT version();" >> "${OUTFILE}" 2>&1
echo "" >> "${OUTFILE}"
# Run the main benchmark queries
psql -d "${DB}" -f "${SCRIPT_DIR}/adversarial_benchmarks.sql" >> "${OUTFILE}" 2>&1
echo "Benchmark complete. Results saved to ${OUTFILE}"
;;
cold)
TAG="${2:-cold_$(date +%Y%m%d_%H%M%S)}"
OUTFILE="${RESULTS_DIR}/benchmark_${TAG}.log"
echo "Running COLD CACHE benchmarks..."
echo "This will restart PostgreSQL and drop OS caches."
# Drop OS caches
drop_caches
# Restart PostgreSQL to clear shared buffers
restart_postgres
sleep 2
echo "Running benchmarks with cold cache, saving to ${OUTFILE}..."
echo "=== COLD CACHE Benchmark run: ${TAG} ===" > "${OUTFILE}"
echo "Date: $(date)" >> "${OUTFILE}"
psql -d "${DB}" -c "SELECT version();" >> "${OUTFILE}" 2>&1
echo "" >> "${OUTFILE}"
psql -d "${DB}" -f "${SCRIPT_DIR}/adversarial_benchmarks.sql" >> "${OUTFILE}" 2>&1
echo "Cold cache benchmark complete. Results saved to ${OUTFILE}"
;;
pgbench)
TAG="${2:-$(date +%Y%m%d_%H%M%S)}"
echo "Running pgbench adversarial tests..."
# Test 1: Short query overhead
echo "=== Short Query Overhead Test ==="
OUTFILE="${RESULTS_DIR}/pgbench_short_${TAG}.log"
echo "Running pgbench short queries for ${PGBENCH_TIME}s..."
pgbench -d "${DB}" \
-f "${SCRIPT_DIR}/pgbench_short_queries.sql" \
-c "${PGBENCH_CLIENTS}" -j 2 \
-T "${PGBENCH_TIME}" \
-P 10 \
2>&1 | tee "${OUTFILE}"
# Test 2: Mixed hot/cold pattern
echo ""
echo "=== Mixed Access Pattern Test ==="
OUTFILE="${RESULTS_DIR}/pgbench_mixed_${TAG}.log"
echo "Running pgbench mixed pattern for ${PGBENCH_TIME}s..."
pgbench -d "${DB}" \
-f "${SCRIPT_DIR}/pgbench_mixed_pattern.sql" \
-c "${PGBENCH_CLIENTS}" -j 2 \
-T "${PGBENCH_TIME}" \
-P 10 \
2>&1 | tee "${OUTFILE}"
# Test 3: LIMIT waste
echo ""
echo "=== LIMIT Waste Test ==="
OUTFILE="${RESULTS_DIR}/pgbench_limit_${TAG}.log"
echo "Running pgbench limit waste for ${PGBENCH_TIME}s..."
pgbench -d "${DB}" \
-f "${SCRIPT_DIR}/pgbench_limit_waste.sql" \
-c "${PGBENCH_CLIENTS}" -j 2 \
-T "${PGBENCH_TIME}" \
-P 10 \
2>&1 | tee "${OUTFILE}"
echo ""
echo "pgbench tests complete. Results in ${RESULTS_DIR}/pgbench_*_${TAG}.log"
;;
compare)
if [ ! -f "${RESULTS_DIR}/benchmark_baseline.log" ] || \
[ ! -f "${RESULTS_DIR}/benchmark_patched.log" ]; then
echo "Error: Need both baseline and patched results."
echo "Run: $0 run baseline"
echo " $0 run patched"
exit 1
fi
echo "=== COMPARISON: baseline vs patched ==="
echo ""
# Extract timing information
echo "--- Execution Times (ms) ---"
echo "Query | Baseline | Patched | Delta"
echo "------|----------|---------|------"
# Simple grep for "Time:" lines and compare
paste <(grep -E "Time:|Execution Time:" "${RESULTS_DIR}/benchmark_baseline.log" | head -20) \
<(grep -E "Time:|Execution Time:" "${RESULTS_DIR}/benchmark_patched.log" | head -20) | \
awk -F'\t' '{print NR, $1, $2}'
echo ""
echo "--- Buffer Statistics ---"
echo "Baseline shared hits:"
grep -o "shared hit=[0-9]*" "${RESULTS_DIR}/benchmark_baseline.log" | \
awk -F= '{sum+=$2} END {print " Total:", sum}'
echo "Patched shared hits:"
grep -o "shared hit=[0-9]*" "${RESULTS_DIR}/benchmark_patched.log" | \
awk -F= '{sum+=$2} END {print " Total:", sum}'
echo ""
echo "Baseline shared reads:"
grep -o "shared read=[0-9]*" "${RESULTS_DIR}/benchmark_baseline.log" | \
awk -F= '{sum+=$2} END {print " Total:", sum}'
echo "Patched shared reads:"
grep -o "shared read=[0-9]*" "${RESULTS_DIR}/benchmark_patched.log" | \
awk -F= '{sum+=$2} END {print " Total:", sum}'
echo ""
echo "For detailed analysis, compare the full log files manually."
;;
*)
echo "Usage: $0 {setup|run|cold|pgbench|compare} [tag]"
echo ""
echo "Commands:"
echo " setup Create test tables in database"
echo " run <tag> Run benchmarks, save with tag (e.g., 'baseline', 'patched')"
echo " cold <tag> Run with cold cache (restarts PG, drops OS cache)"
echo " pgbench <tag> Run pgbench stress tests"
echo " compare Compare baseline vs patched results"
echo ""
echo "Environment variables:"
echo " PGDATABASE Database to use (default: postgres)"
echo " PGDATA Data directory for pg_ctl (for cold cache tests)"
echo " PGBENCH_TIME pgbench duration in seconds (default: 60)"
echo " PGBENCH_CLIENTS pgbench client count (default: 4)"
exit 1
;;
esac
[text/x-python-script] run_regression_test.py (8.5K, 7-run_regression_test.py)
download | inline:
#!/usr/bin/env python3
"""
Run prefetch regression tests.
Usage:
python run_regression_test.py [options]
Examples:
python run_regression_test.py --iterations=5
python run_regression_test.py --evict=off,pg --workers=0,2
python run_regression_test.py --columns=sequential,random --reset
"""
import argparse
import subprocess
import itertools
import random
import os
import psycopg
import pandas as pd
from math import gcd
def parse_args():
p = argparse.ArgumentParser(description='Run prefetch regression tests')
p.add_argument('--iterations', '-n', type=int, default=10,
help='Number of test iterations (default: 10)')
p.add_argument('--columns', '-c', type=str, default='sequential,periodic,random',
help='Columns to test (default: sequential,periodic,random)')
p.add_argument('--workers', '-w', type=str, default='0,2',
help='Worker counts to test (default: 0,2)')
p.add_argument('--evict', '-e', type=str, default='off',
help='Evict modes: off,pg,os or "all" (default: off)')
p.add_argument('--rows', '-r', type=int, default=100000,
help='Number of rows in test table (default: 100000)')
p.add_argument('--reset', action='store_true',
help='Reset tables before running')
p.add_argument('--dbname', '--db', '-d', type=str, default='postgres',
help='Database name (default: postgres)')
p.add_argument('--host', '-H', type=str, default='/tmp',
help='Database host (default: /tmp)')
p.add_argument('--port', '-p', type=int, default=None,
help='Database port (default: use socket)')
return p.parse_args()
def setup_tables(cur, num_rows, reset=False):
"""Create tables and populate test data."""
if reset:
cur.execute('DROP TABLE IF EXISTS prefetch_test_results')
cur.execute('DROP TABLE IF EXISTS prefetch_test_data')
cur.execute('''
CREATE EXTENSION IF NOT EXISTS pg_buffercache
''');
cur.execute('''
CREATE TABLE IF NOT EXISTS prefetch_test_results (
id SERIAL PRIMARY KEY,
run_timestamp TIMESTAMPTZ DEFAULT now(),
io_method TEXT NOT NULL,
num_workers INT NOT NULL DEFAULT 0,
prefetch_enabled BOOLEAN NOT NULL,
evict_mode TEXT NOT NULL DEFAULT 'off',
column_name TEXT NOT NULL,
iteration INT,
execution_time_ms NUMERIC,
rows_returned BIGINT,
blks_hit BIGINT,
blks_read BIGINT
)
''')
cur.execute('''
CREATE TABLE IF NOT EXISTS prefetch_test_data (
id SERIAL PRIMARY KEY,
sequential INT,
periodic INT,
random INT,
payload TEXT
)
''')
cur.execute("SELECT count(*) FROM prefetch_test_data")
row_count = cur.fetchone()[0]
if row_count == 0:
print(f"Populating test data ({num_rows} rows)...")
r = min(10000, num_rows // 5)
while gcd(r, num_rows) != 1:
r += 1
cur.execute(f'''
INSERT INTO prefetch_test_data (sequential, periodic, random, payload)
SELECT
i,
((i * {r}::bigint) % {num_rows} + 1)::int,
row_number() OVER (ORDER BY random()),
repeat('x', 200)
FROM generate_series(1, {num_rows}) i
ORDER BY i;
''')
cur.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_sequential ON prefetch_test_data(sequential)")
cur.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_periodic ON prefetch_test_data(periodic)")
cur.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_random ON prefetch_test_data(random)")
print("Data populated.")
else:
print(f"Test data exists: {row_count} rows")
def purge_os_cache():
"""Purge OS filesystem cache using external script."""
script = os.path.join(os.path.dirname(__file__), 'purge_cache.sh')
try:
subprocess.run([script], capture_output=True, timeout=10)
except:
pass
def evict_pg_buffers(cur):
"""Evict PostgreSQL shared buffers for test table."""
cur.execute('''
SELECT count(pg_buffercache_evict(bufferid))
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE c.relname = 'prefetch_test_data'
''').fetchall()
def apply_eviction(cur, evict_mode):
"""Apply eviction based on mode."""
if evict_mode == 'pg':
evict_pg_buffers(cur)
elif evict_mode == 'os':
evict_pg_buffers(cur)
purge_os_cache()
def run_test(cur, column_name, prefetch_enabled, num_workers, evict_mode, iteration):
"""Run a single test and record results."""
cur.execute(f"SET enable_indexscan_prefetch = {'on' if prefetch_enabled else 'off'}")
cur.execute(f"SET max_parallel_workers_per_gather = {num_workers}")
cur.execute("SET enable_bitmapscan = off")
cur.execute("SET enable_seqscan = off")
cur.execute("SET enable_indexonlyscan = off")
cur.execute("SET enable_sort = off")
apply_eviction(cur, evict_mode)
cur.execute(f'''
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT length(payload) FROM prefetch_test_data
ORDER BY {column_name}
''')
result = cur.fetchone()[0]
plan = result[0]['Plan']
exec_time = result[0]['Execution Time']
blks_hit = plan.get('Shared Hit Blocks', 0)
blks_read = plan.get('Shared Read Blocks', 0)
rows = plan.get('Actual Rows', 0)
cur.execute("SHOW io_method")
io_method = cur.fetchone()[0]
cur.execute('''
INSERT INTO prefetch_test_results
(io_method, num_workers, prefetch_enabled, evict_mode, column_name,
iteration, execution_time_ms, rows_returned, blks_hit, blks_read)
VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
''', (io_method, num_workers, prefetch_enabled, evict_mode,
column_name, iteration, exec_time, rows, blks_hit, blks_read))
pf = 'Y' if prefetch_enabled else 'N'
print(f" [{column_name}] pf={pf} w={num_workers} evict={evict_mode}: {exec_time:.1f}ms")
def print_summary(cur):
"""Print summary of results."""
df = pd.read_sql('''
WITH avgs AS (
SELECT column_name, io_method, num_workers,
prefetch_enabled AS prefetch, evict_mode AS evict,
count(*) AS n, avg(execution_time_ms) AS ms
FROM prefetch_test_results
GROUP BY column_name, io_method, num_workers, prefetch_enabled, evict_mode
)
SELECT a.column_name, a.io_method, a.num_workers, a.evict, a.n,
round(a.ms::numeric, 1) AS off_ms,
round(b.ms::numeric, 1) AS on_ms,
round(((b.ms - a.ms) / NULLIF(b.ms + a.ms, 0) * 100)::numeric, 1) AS effect_pct
FROM avgs a JOIN avgs b USING (column_name, io_method, num_workers, evict)
WHERE NOT a.prefetch AND b.prefetch
ORDER BY column_name, io_method, num_workers, evict
''', cur.connection)
print("\n" + "=" * 80)
print("SUMMARY: Prefetch Effect (positive = slower)")
print("=" * 80)
print(df.to_string(index=False))
def main():
args = parse_args()
# Parse list arguments
columns = [c.strip() for c in args.columns.split(',')]
workers = [int(w.strip()) for w in args.workers.split(',')]
evict_modes = ['off', 'pg', 'os'] if args.evict == 'all' else [e.strip() for e in args.evict.split(',')]
print(f"Config: iterations={args.iterations}, columns={columns}, workers={workers}, evict={evict_modes}")
# Connect
connstr = f"dbname={args.dbname} host={args.host}"
if args.port:
connstr += f" port={args.port}"
conn = psycopg.connect(connstr)
conn.autocommit = True
cur = conn.cursor()
# Setup
setup_tables(cur, args.rows, args.reset)
# Show io_method
cur.execute("SHOW io_method")
io_method = cur.fetchone()[0]
print(f"\nio_method = {io_method}\n")
# Run tests
prefetch_opts = [False, True]
for i in range(1, args.iterations + 1):
print(f"Iteration {i}/{args.iterations}")
configs = list(itertools.product(columns, prefetch_opts, workers, evict_modes))
random.shuffle(configs)
for col, pf, w, evict in configs:
run_test(cur, col, pf, w, evict, i)
print()
# Summary
print_summary(cur)
conn.close()
if __name__ == "__main__":
main()
[application/octet-stream] test_prefetch_regressions.sql (9.0K, 8-test_prefetch_regressions.sql)
download
view thread (87+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: index prefetching
In-Reply-To: <CAE8JnxN_EwnTLLMWGhvgwaomYZ0ysm7NeogA-BqBd=Rs3S7Oqw@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox