Re: index prefetching - Alexandre Felipe

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Alexandre Felipe <[email protected]>
To: Peter Geoghegan <[email protected]>
Cc: Tomas Vondra <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Thomas Munro <[email protected]>
Cc: Nazir Bilal Yavuz <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Melanie Plageman <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Georgios <[email protected]>
Cc: Konstantin Knizhnik <[email protected]>
Cc: Dilip Kumar <[email protected]>
Subject: Re: index prefetching
Date: Sun, 15 Feb 2026 00:13:39 +0000
Message-ID: <CAE8JnxN_EwnTLLMWGhvgwaomYZ0ysm7NeogA-BqBd=Rs3S7Oqw@mail.gmail.com> (raw)
In-Reply-To: <CAH2-Wznv9_KGqHQ1vCW2pkiA6QskBGcx5NC_-UXnD6GEQasvAQ@mail.gmail.com>
References: <[email protected]>
	<5pltwb73d7cynsxo2yb54ygjk7haviatkrx43mnzihc6kkield@ahnstpgof46i>
	<CA+hUKGKL3MRvEftAE+kwBuL2PLg2CwUoHEMr=-KSvsWN8pHq9w@mail.gmail.com>
	<[email protected]>
	<e33gafg4p7iwvo24ytrxuw43nafm5xm3jefpdspnarcbkfurs7@3jbgdiinxem5>
	<[email protected]>
	<CAH2-Wz=DfvzasnzLv43cu36Q1Ca8Qi70_JjZ7SRbNhDwwgvirg@mail.gmail.com>
	<qdl4fojnbfcnm2k7b4zpvgd6gwzwdgtbl5c7shpimrb76dbyy6@scdnspus3ejh>
	<bpdeohyqvltb77viyft4bza4xc4peed3jcoep74d2ih6ynqlke@wbnhcwmq3ril>
	<CAH2-WznFdjY_OB2S7_BY4iAyeffK+XrE2qsX6aghgP63VocRfQ@mail.gmail.com>
	<4zeu5yb73byiquvf3eefsunnrydyqfxy3eup66jrliutrtd4xl@5iifjey4n5m5>
	<[email protected]>
	<CAH2-WzkAC_gWQgc3MV3MWBx2EUaE4TfPk8XQPkJgs1R_qNvo=A@mail.gmail.com>
	<CAH2-WznLRnUZHC4kmR+J3MgGpN0iTUaZ-4xeSHPiaYR=y3C8Og@mail.gmail.com>
	<CAH2-WzkHK0++yn3_nmWL2rWzXiu8Qc=2=NPQDG4nfVjvYkRDrQ@mail.gmail.com>
	<CAH2-Wzk9=x=a2TbcqYcX+XXmDHQr5=1v9m4Z_v8a-KwF1Zoz0A@mail.gmail.com>
	<CAH2-WzmYqhacBH161peAWb5eF=Ja7CFAQ+0jSEMq=qnfLVTOOg@mail.gmail.com>
	<CAH2-Wzmm+mXuv_r+eYgcwOKZzNSmbYoEmxi5mdKMUtkmYa_qXw@mail.gmail.com>
	<CAH2-WzkC_EdGB_tdEQr63BiOuJbcvZiV=xO+OJYux4wOaxPD8w@mail.gmail.com>
	<CAH2-Wzn8whv-RE4E5L2WQB_Ha_jy0UReJq1yypu83gDYHeXq5g@mail.gmail.com>
	<CAH2-Wzm7-QuDOs6TcqfhhDsGEZCuHtn=D-SriOTnTZ_fiXNBvA@mail.gmail.com>
	<CAH2-WzmH7pVQ0-mYAxb82aWbz29_BiBPq2wV5p7+1o2sRFqDRQ@mail.gmail.com>
	<CAH2-Wz=6a7fGz2rALDX+xiFDuEaGQWpZ49xEaBUDKiPH8gcL+Q@mail.gmail.com>
	<CAH2-WzkehuhxyuA8quc7rRN3EtNXpiKsjPfO8mhb+0Dr2K0Dtg@mail.gmail.com>
	<CAH2-WzmymSyOt5Y2RGbm6cJXg18J_ttfqjdcpodHe6Gp23ConQ@mail.gmail.com>
	<CAH2-Wznv9_KGqHQ1vCW2pkiA6QskBGcx5NC_-UXnD6GEQasvAQ@mail.gmail.com>

Hi,
I decided to test this PR.

I didn't take much time to go through the thread or the code in detail yet.
But I have my first benchmark results and I would like to share.

EXPERIMENT

I tested [CF 4351] v10 - Index Prefetching

I created a table with 100k rows and
Sequential is, as guessed, 1,2,3,4 (indexed)
Periodic is a quasi random (i * jump) % num_rows, where gcd(jump, num_rows)
= 1, guarantee that there are no repeated entries (indexed)
Random is a `row_number() over (order by random())` (indexed)
The payload is a fixed 200 character long string, just to make it more
realistic.

For the tests, I disable sorting, sequential scans, index only scans and
bitmap scans.
Since buffer cache always has a significant impact on the query
performance, I shuffled the tests, and tried to adjust for the number of
buffer hit/read, but later I found that the best way to control that was to
use a table small enough to be entirely held in cache, and evict the
buffers.

* off: buffers are kept in cache
* pg: buffers evicted from postgres pg_buffercache_evict from
pg_buffercache extension.
* os: supported only in  python, I separated the buffer eviction in
purge_cache as it requires sudo (tested only in MacOS).

I varied
 * max_parallel_workers_per_gather (although I guess it wasn't exploited),
 * enable_index_prefetch
 * the column used as sorting key and, as a result, the index used.
 * and buffer eviction mode.

Running from python with psycopg

SUMMARY

I could not see the expected positive impact and when using the python
script and buffers evicted prefetch had a detrimental impact.


APPENDIX

psql run
column_name io_method  num_workers evict  n  off_ms  on_ms  pct_change
 ci_low  ci_high
   periodic    worker            0   off 10   172.4  182.8    6.024162
 -3.7     15.7
   periodic    worker            0    pg 10   222.7  214.8   -3.539448
-13.0      5.9
   periodic    worker            2   off 10   173.5  172.7   -0.442660
 -1.6      0.7
   periodic    worker            2    pg 10   226.3  213.2   -5.795049
-16.9      5.3
     random    worker            0   off 10   173.3  174.1    0.476657
 -1.0      2.0
     random    worker            0    pg 10   216.6  218.1    0.704020
 -2.8      4.3
     random    worker            2   off 10   182.7  175.3   -4.031139
-15.6      7.5
     random    worker            2    pg 10   217.2  213.7   -1.586813
 -4.4      1.2
 sequential    worker            0   off 10   131.1  130.4   -0.568573
 -3.4      2.3
 sequential    worker            0    pg 10   150.7  188.2   24.923924
 22.2     27.6
 sequential    worker            2   off 10   129.7  130.2    0.421814
 -1.0      1.9
 sequential    worker            2    pg 10   151.0  184.1   21.925935
 19.5     24.3

psycopg run
column_name io_method  num_workers evict  n  off_ms  on_ms  pct_change
 ci_low  ci_high
   periodic    worker            0   off 10   186.7  193.7    3.701950
 -7.3     14.7
   periodic    worker            0    os 10   207.0  535.2  158.507266
153.7    163.3
   periodic    worker            0    pg 10   201.2  537.0  166.923995
161.5    172.3
   periodic    worker            2   off 10   181.2  189.0    4.303359
 -0.2      8.8
   periodic    worker            2    os 10   221.7  548.2  147.322934
131.8    162.9
   periodic    worker            2    pg 10   203.1  533.5  162.688738
160.4    165.0
     random    worker            0   off 10   194.6  186.8   -3.986974
-11.5      3.5
     random    worker            0    os 10   211.4  338.2   59.979252
 52.9     67.1
     random    worker            0    pg 10   212.3  336.8   58.686711
 50.2     67.1
     random    worker            2   off 10   183.3  187.7    2.364585
0.4      4.4
     random    worker            2    os 10   222.5  341.9   53.659704
 42.1     65.2
     random    worker            2    pg 10   204.1  333.3   63.348746
 62.3     64.4
 sequential    worker            0   off 10   129.9  129.5   -0.260586
 -1.9      1.3
 sequential    worker            0    os 10   150.9  182.4   20.909424
 16.6     25.2
 sequential    worker            0    pg 10   150.3  185.0   23.132583
 19.2     27.1
 sequential    worker            2   off 10   129.9  132.9    2.323179
 -4.3      9.0
 sequential    worker            2    os 10   153.8  189.8   23.402255
 13.9     32.9
 sequential    worker            2    pg 10   151.1  185.9   22.997500
 20.1     25.9

Regards,
Alexandre


On Mon, Feb 9, 2026 at 10:45 PM Peter Geoghegan <[email protected]> wrote:

> On Fri, Jan 30, 2026 at 7:18 PM Peter Geoghegan <[email protected]> wrote:
> > Attached is v9.
>
> Attached is v10. There are 2 major areas of improvement in this latest
> revision:
>
> 1. We have enhanced the read stream callback (heapam_getnext_stream,
> which is added by
> v10-0005-Add-prefetching-to-index-scans-using-batch-inter.patch),
> making it yield at key intervals. When we yield, we temporarily
> suspend prefetching -- but only for long enough to give the scan the
> opportunity to return one more matching tuple (don't confuse yielding
> with pausing; we do both, but the goals are rather different in each
> case).
>
> Yielding like this keeps the scan responsive during prefetching: the
> scan should never go too long without returning at least one tuple
> (except when that just isn't possible at all). Testing has shown that
> keeping the scan responsive in this sense is particularly important
> during scans that appear in "ORDER BY ... LIMIT N" queries, as well as
> during scans that feed into certain merge joins. IOW, it is
> particularly important that we "keep the scan responsive" whenever it
> has the potential to allow the scan to shut down before it has
> performed work that turns out to be unnecessary (though it also seems
> to have some benefits even when that isn't the case).
>
> There is a complex trade-off here: we want to yield when we expect
> some benefit from doing so. But we don't want to yield when doing so
> risks compromising the read stream's ability to maintain an adequate
> prefetch distance. There are comments in heapam_getnext_stream that
> describe the theory in more detail. There are heuristics that were
> derived using adversarial benchmarking/stress-testing. I'm sure that
> they need more work, but this does seem like roughly the right idea.
> Note that we now test whether the scan's read stream is using its
> fast-path mode (read stream uses this when the scan reads heap pages
> that are all cached).
>
> 2. A new patch
> (v10-0007-Limit-get_actual_variable_range-to-scan-one-inde.patch)
> compensates for the fact that the main prefetching commit removes
> get_actual_variable_range's VISITED_PAGES_LIMIT mechanism. Since
> VISITED_PAGES_LIMIT cannot easily be ported over to the new table AM
> interface selfuncs.c now uses.
>
> I described the problem that we need to address when I posted v9:
>
> > selfuncs.c problem
> > ------------------
> >
> > Also worth noting that we recently found a problem with selfuncs.c:
> > the VISITED_PAGES_LIMIT stuff in selfuncs.c is broken right now. v9
> > tears that code out, pending adding back a real fix (earlier versions
> > of the patch had VISITED_PAGES_LIMIT, but it didn't work).
> >
> > The underlying problem is that the existing VISITED_PAGES_LIMIT design
> > is incompatible with our new table_index_getnext_slot interface. The
> > new interface doesn't stop scanning until it is able to at least
> > return 1 tuple. But VISITED_PAGES_LIMIT was invented precisely because
> > get_actual_variable_endpoint's index scans took far too long, even
> > though they're only ever required to locate 1 tuple. So that just
> > won't work.
> >
> > We'll need to invent some kind of API that directly acknowledges the
> > needs of the selfuncs.c caller, and others like it. Doing it in an
> > ad-hoc way just doesn't seem possible anymore. That will have to wait
> > for the next revision, though (or the one after that).
>
> The new patch deals with the problem in a completely different way,
> and at a completely different layer: it adds a new
> IndexScanDescData.xs_read_extremal_only field, set only by
> get_actual_variable_range. When nbtree sees that the field has been
> set, it gives up after scanning only one leaf page (the page that
> contains extremal values that are of interest to
> get_actual_variable_range). Note that we completely stop caring about
> heap page fetches with this new approach.
>
> There are good reasons to believe that the new
> IndexScanDescData.xs_read_extremal_only approach will solve existing
> problems with VISITED_PAGES_LIMIT. Recent benchmarking from Mark
> Callaghan [1] (which I've independently recreated with my own minimal
> test suite) shows that VISITED_PAGES_LIMIT becomes completely
> ineffective, once we reach a tipping point where many index tuples at
> one end of the index all have their LP_DEAD bit set.
>
> I'm going to start a new thread to discuss the issues in this area
> later today. I'm aiming to fix an existing, independent issue in this
> new patch, so it makes sense to discuss it on a completely separate
> thread.
>
> [1]
> https://smalldatum.blogspot.com/2026/01/cpu-bound-insert-benchmark-vs-postgres.html
> --
> Peter Geoghegan
>


Attachments:

  [image/png] psycopg.png (114.5K, 3-psycopg.png)
  download | view image

  [image/png] psql.png (82.7K, 4-psql.png)
  download | view image

  [text/x-python-script] run_analysis.py (6.3K, 5-run_analysis.py)
  download | inline:
#!/usr/bin/env python3
"""
Prefetch regression analysis - compute effects and generate forest plot.

Usage: python run_analysis.py [options]
"""

import argparse
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import psycopg
from scipy import stats
from matplotlib.lines import Line2D


def parse_args():
    p = argparse.ArgumentParser(description='Analyze prefetch regression test results')
    p.add_argument('--prefix', '-o', type=str, default='',
                   help='Output file prefix')
    p.add_argument('--dbname', '--db', '-d', type=str, default='postgres',
                   help='Database name (default: postgres)')
    p.add_argument('--host', '-H', type=str, default='/tmp',
                   help='Database host (default: /tmp)')
    p.add_argument('--port', '-p', type=int, default=None,
                   help='Database port')
    return p.parse_args()


def load_timing_data(conn):
    """Load raw results from database."""
    return pd.read_sql('''
        SELECT column_name, io_method, num_workers, prefetch_enabled,
               evict_mode, iteration, execution_time_ms
        FROM prefetch_test_results
    ''', conn)


def compute_ci(group):
    """Compute effect with CI for a single configuration group."""
    off = group[~group['prefetch_enabled']]['execution_time_ms']
    on = group[group['prefetch_enabled']]['execution_time_ms']
    assert len(off) == len(on), "Off and on should have the same length"
    if len(off) < 2 or len(on) < 2:
        return None
    
    result = stats.ttest_ind(on, off, equal_var=False)
    ci = result.confidence_interval(0.95)
    
    return pd.Series({
        'n': min(len(off), len(on)),
        'off_ms': off.mean(),
        'on_ms': on.mean(),
        'pct_change': (on.mean() / off.mean() - 1) * 100,
        'ci_low': ci.low / off.mean() * 100,
        'ci_high': ci.high / off.mean() * 100,
        'p_value': result.pvalue,
        'significant': result.pvalue < 0.05
    })


def compute_effects_with_ci(df):
    """Compute prefetch effect with confidence intervals using Welch's t-test."""
    group_keys = ['num_workers', 'column_name', 'io_method', 'evict_mode']
    results = df.groupby(group_keys).apply(compute_ci).dropna().reset_index()
    results = results.rename(columns={'evict_mode': 'evict'})
    return results.sort_values(['column_name', 'io_method', 'num_workers', 'evict'])


def plot_forest(df, prefix=''):
    """Generate forest plot with table labels as yticks."""
    from matplotlib.lines import Line2D
    
    df = df.sort_values(['evict', 'column_name', 'num_workers', 'n']).reset_index(drop=True)
    n = len(df)
    
    if n == 0:
        print("No data to plot!")
        return None
    
    # Format table as aligned text for ytick labels
    table = df[['num_workers',  'column_name','evict']].copy()
    table.columns = ['num_workers', 'column_name', 'eviction']
    table_str = table.to_string(index=False, header=True)
    lines = table_str.split('\n')
    header = lines[0]
    labels = lines[1:] + [header]  # header at top (highest y)
    
    fig, ax = plt.subplots(figsize=(10, max(4, (n + 1) * 0.35)))
    
    # Forest plot
    for i, row in df.iterrows():
        color = 'gray'
        if row['significant']:
            color = 'red' if row['pct_change'] > 0 else 'green'
        
        ax.errorbar(
            row['pct_change'], i,
            xerr=[[row['pct_change'] - row['ci_low']], [row['ci_high'] - row['pct_change']]],
            fmt='s', color=color, markersize=8, capsize=3, linewidth=2
        )
        ax.annotate(f"{row['pct_change']:.1f}%", 
                    (row['ci_high'] + 2, i), fontsize=8, va='center')
    
    ax.axvline(0, color='black', linewidth=1)
    ax.set_yticks(range(n + 1))
    ax.set_yticklabels(labels, fontfamily='monospace', fontsize=9)
    ax.set_xlabel('% Change in Execution Time (95% CI)')
    ax.set_title('Prefetch Effect (Positive = SLOWER)')
    ax.grid(axis='x', alpha=0.3)
    
    # Legend
    legend = [
        Line2D([0], [0], marker='s', color='w', markerfacecolor='green', markersize=10, label='Faster'),
        Line2D([0], [0], marker='s', color='w', markerfacecolor='red', markersize=10, label='Slower'),
        Line2D([0], [0], marker='s', color='w', markerfacecolor='gray', markersize=10, label='Not significant'),
    ]
    ax.legend(handles=legend, loc='lower right')
    
    plt.tight_layout()
    plot_path = f'{prefix}prefetch_forest.png'
    plt.savefig(plot_path, dpi=150, bbox_inches='tight')
    plt.close()
    print(f"Saved: {plot_path}")
    return plot_path


def analyze_factors(results):
    """Analyze the effect of each factor on prefetch performance."""
    print("\n" + "=" * 50)
    print("AVERAGE EFFECT BY FACTOR")
    print("=" * 50)
    
    factors = ['evict_mode', 'column_name', 'num_workers', 'prefetch_enabled']
    
    rows = []
    for factor in factors:
        for state, mean in results.groupby(factor)['execution_time_ms'].mean().sort_index().items():
            rows.append({'factor': factor, 'state': str(state), 'mean_ms': round(mean, 1)})
    
    df = pd.DataFrame(rows).set_index(['factor', 'state'])
    print(df.to_string())

def main():
    args = parse_args()
    prefix = args.prefix + "_" if args.prefix else ""
    
    # Connect
    connstr = f"dbname={args.dbname} host={args.host}"
    if args.port:
        connstr += f" port={args.port}"
    conn = psycopg.connect(connstr)
    
    # Load data
    print("Loading data...")
    df = load_timing_data(conn)
    conn.close()
    
    print(f"Loaded {len(df)} rows\n")
    
    if len(df) == 0:
        print("No data found!")
        return
    
    # Compute effects with CI
    results = compute_effects_with_ci(df)
    
    # Print summary
    print("=" * 90)
    print("PREFETCH EFFECT (positive = slower)")
    print("=" * 90)
    display = results.copy()
    for k in ['off_ms', 'on_ms', 'ci_low', 'ci_high']:
        display[k] = display[k].round(1)
    print(display[['column_name', 'io_method', 'num_workers', 'evict', 'n', 
                   'off_ms', 'on_ms', 'pct_change', 'ci_low', 'ci_high']].to_string(index=False))
    
    # Factor analysis
    analyze_factors(df)
    
    # Generate plot
    plot_forest(results, prefix)
    
    # Save CSV
    csv_path = f'{prefix}prefetch_summary.csv'
    results.to_csv(csv_path, index=False)
    print(f"Saved: {csv_path}")
    
    print("\nDone.")


if __name__ == '__main__':
    main()

  [text/x-sh] run_benchmarks.sh (7.3K, 6-run_benchmarks.sh)
  download | inline:
#!/bin/bash
# =============================================================================
# INDEX PREFETCH ADVERSARIAL BENCHMARK RUNNER
# =============================================================================
# Usage:
#   ./run_benchmarks.sh setup              # Create test tables
#   ./run_benchmarks.sh run baseline       # Run benchmarks, save as baseline
#   ./run_benchmarks.sh run patched        # Run benchmarks, save as patched
#   ./run_benchmarks.sh compare            # Compare baseline vs patched
#   ./run_benchmarks.sh pgbench baseline   # Run pgbench tests as baseline
#   ./run_benchmarks.sh pgbench patched    # Run pgbench tests as patched
#   ./run_benchmarks.sh cold baseline      # Run with cold cache 
# =============================================================================

set -e

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
RESULTS_DIR="${SCRIPT_DIR}/results"
DB="${PGDATABASE:-postgres}"
PGBENCH_TIME="${PGBENCH_TIME:-60}"
PGBENCH_CLIENTS="${PGBENCH_CLIENTS:-4}"

mkdir -p "${RESULTS_DIR}"

drop_caches() {
    echo "Skip dropping OS caches (requires sudo)..."
}

restart_postgres() {
    echo "Restarting PostgreSQL to clear shared buffers..."
    pg_ctl restart -D "${PGDATA}" -w -m fast 2>/dev/null || \
        echo "Warning: Could not restart PostgreSQL automatically"
}

case "$1" in
    setup)
        echo "Setting up adversarial benchmark tables..."
        psql -d "${DB}" -f "${SCRIPT_DIR}/adversarial_benchmarks.sql" 2>&1 | \
            tee "${RESULTS_DIR}/setup.log"
        echo "Setup complete. Tables created in database '${DB}'"
        ;;
        
    run)
        TAG="${2:-$(date +%Y%m%d_%H%M%S)}"
        OUTFILE="${RESULTS_DIR}/benchmark_${TAG}.log"
        
        echo "Running benchmarks, saving to ${OUTFILE}..."
        echo "=== Benchmark run: ${TAG} ===" > "${OUTFILE}"
        echo "Date: $(date)" >> "${OUTFILE}"
        echo "PostgreSQL version:" >> "${OUTFILE}"
        psql -d "${DB}" -c "SELECT version();" >> "${OUTFILE}" 2>&1
        echo "" >> "${OUTFILE}"
        
        # Run the main benchmark queries
        psql -d "${DB}" -f "${SCRIPT_DIR}/adversarial_benchmarks.sql" >> "${OUTFILE}" 2>&1
        
        echo "Benchmark complete. Results saved to ${OUTFILE}"
        ;;
        
    cold)
        TAG="${2:-cold_$(date +%Y%m%d_%H%M%S)}"
        OUTFILE="${RESULTS_DIR}/benchmark_${TAG}.log"
        
        echo "Running COLD CACHE benchmarks..."
        echo "This will restart PostgreSQL and drop OS caches."
        
        # Drop OS caches
        drop_caches
        
        # Restart PostgreSQL to clear shared buffers
        restart_postgres
        
        sleep 2
        
        echo "Running benchmarks with cold cache, saving to ${OUTFILE}..."
        echo "=== COLD CACHE Benchmark run: ${TAG} ===" > "${OUTFILE}"
        echo "Date: $(date)" >> "${OUTFILE}"
        psql -d "${DB}" -c "SELECT version();" >> "${OUTFILE}" 2>&1
        echo "" >> "${OUTFILE}"
        
        psql -d "${DB}" -f "${SCRIPT_DIR}/adversarial_benchmarks.sql" >> "${OUTFILE}" 2>&1
        
        echo "Cold cache benchmark complete. Results saved to ${OUTFILE}"
        ;;
        
    pgbench)
        TAG="${2:-$(date +%Y%m%d_%H%M%S)}"
        
        echo "Running pgbench adversarial tests..."
        
        # Test 1: Short query overhead
        echo "=== Short Query Overhead Test ===" 
        OUTFILE="${RESULTS_DIR}/pgbench_short_${TAG}.log"
        echo "Running pgbench short queries for ${PGBENCH_TIME}s..."
        pgbench -d "${DB}" \
            -f "${SCRIPT_DIR}/pgbench_short_queries.sql" \
            -c "${PGBENCH_CLIENTS}" -j 2 \
            -T "${PGBENCH_TIME}" \
            -P 10 \
            2>&1 | tee "${OUTFILE}"
        
        # Test 2: Mixed hot/cold pattern
        echo ""
        echo "=== Mixed Access Pattern Test ==="
        OUTFILE="${RESULTS_DIR}/pgbench_mixed_${TAG}.log"
        echo "Running pgbench mixed pattern for ${PGBENCH_TIME}s..."
        pgbench -d "${DB}" \
            -f "${SCRIPT_DIR}/pgbench_mixed_pattern.sql" \
            -c "${PGBENCH_CLIENTS}" -j 2 \
            -T "${PGBENCH_TIME}" \
            -P 10 \
            2>&1 | tee "${OUTFILE}"
        
        # Test 3: LIMIT waste
        echo ""
        echo "=== LIMIT Waste Test ==="
        OUTFILE="${RESULTS_DIR}/pgbench_limit_${TAG}.log"
        echo "Running pgbench limit waste for ${PGBENCH_TIME}s..."
        pgbench -d "${DB}" \
            -f "${SCRIPT_DIR}/pgbench_limit_waste.sql" \
            -c "${PGBENCH_CLIENTS}" -j 2 \
            -T "${PGBENCH_TIME}" \
            -P 10 \
            2>&1 | tee "${OUTFILE}"
        
        echo ""
        echo "pgbench tests complete. Results in ${RESULTS_DIR}/pgbench_*_${TAG}.log"
        ;;
        
    compare)
        if [ ! -f "${RESULTS_DIR}/benchmark_baseline.log" ] || \
           [ ! -f "${RESULTS_DIR}/benchmark_patched.log" ]; then
            echo "Error: Need both baseline and patched results."
            echo "Run: $0 run baseline"
            echo "     $0 run patched"
            exit 1
        fi
        
        echo "=== COMPARISON: baseline vs patched ==="
        echo ""
        
        # Extract timing information
        echo "--- Execution Times (ms) ---"
        echo "Query | Baseline | Patched | Delta"
        echo "------|----------|---------|------"
        
        # Simple grep for "Time:" lines and compare
        paste <(grep -E "Time:|Execution Time:" "${RESULTS_DIR}/benchmark_baseline.log" | head -20) \
              <(grep -E "Time:|Execution Time:" "${RESULTS_DIR}/benchmark_patched.log" | head -20) | \
            awk -F'\t' '{print NR, $1, $2}'
        
        echo ""
        echo "--- Buffer Statistics ---"
        echo "Baseline shared hits:"
        grep -o "shared hit=[0-9]*" "${RESULTS_DIR}/benchmark_baseline.log" | \
            awk -F= '{sum+=$2} END {print "  Total:", sum}'
        echo "Patched shared hits:"
        grep -o "shared hit=[0-9]*" "${RESULTS_DIR}/benchmark_patched.log" | \
            awk -F= '{sum+=$2} END {print "  Total:", sum}'
        
        echo ""
        echo "Baseline shared reads:"
        grep -o "shared read=[0-9]*" "${RESULTS_DIR}/benchmark_baseline.log" | \
            awk -F= '{sum+=$2} END {print "  Total:", sum}'
        echo "Patched shared reads:"
        grep -o "shared read=[0-9]*" "${RESULTS_DIR}/benchmark_patched.log" | \
            awk -F= '{sum+=$2} END {print "  Total:", sum}'
        
        echo ""
        echo "For detailed analysis, compare the full log files manually."
        ;;
        
    *)
        echo "Usage: $0 {setup|run|cold|pgbench|compare} [tag]"
        echo ""
        echo "Commands:"
        echo "  setup              Create test tables in database"
        echo "  run <tag>          Run benchmarks, save with tag (e.g., 'baseline', 'patched')"
        echo "  cold <tag>         Run with cold cache (restarts PG, drops OS cache)"
        echo "  pgbench <tag>      Run pgbench stress tests"
        echo "  compare            Compare baseline vs patched results"
        echo ""
        echo "Environment variables:"
        echo "  PGDATABASE         Database to use (default: postgres)"
        echo "  PGDATA             Data directory for pg_ctl (for cold cache tests)"
        echo "  PGBENCH_TIME       pgbench duration in seconds (default: 60)"
        echo "  PGBENCH_CLIENTS    pgbench client count (default: 4)"
        exit 1
        ;;
esac

  [text/x-python-script] run_regression_test.py (8.5K, 7-run_regression_test.py)
  download | inline:
#!/usr/bin/env python3
"""
Run prefetch regression tests.

Usage:
    python run_regression_test.py [options]

Examples:
    python run_regression_test.py --iterations=5
    python run_regression_test.py --evict=off,pg --workers=0,2
    python run_regression_test.py --columns=sequential,random --reset
"""

import argparse
import subprocess
import itertools
import random
import os
import psycopg
import pandas as pd
from math import gcd


def parse_args():
    p = argparse.ArgumentParser(description='Run prefetch regression tests')
    p.add_argument('--iterations', '-n', type=int, default=10,
                   help='Number of test iterations (default: 10)')
    p.add_argument('--columns', '-c', type=str, default='sequential,periodic,random',
                   help='Columns to test (default: sequential,periodic,random)')
    p.add_argument('--workers', '-w', type=str, default='0,2',
                   help='Worker counts to test (default: 0,2)')
    p.add_argument('--evict', '-e', type=str, default='off',
                   help='Evict modes: off,pg,os or "all" (default: off)')
    p.add_argument('--rows', '-r', type=int, default=100000,
                   help='Number of rows in test table (default: 100000)')
    p.add_argument('--reset', action='store_true',
                   help='Reset tables before running')
    p.add_argument('--dbname', '--db', '-d', type=str, default='postgres',
                   help='Database name (default: postgres)')
    p.add_argument('--host', '-H', type=str, default='/tmp',
                   help='Database host (default: /tmp)')
    p.add_argument('--port', '-p', type=int, default=None,
                   help='Database port (default: use socket)')
    return p.parse_args()


def setup_tables(cur, num_rows, reset=False):
    """Create tables and populate test data."""
    
    if reset:
        cur.execute('DROP TABLE IF EXISTS prefetch_test_results')
        cur.execute('DROP TABLE IF EXISTS prefetch_test_data')
    
    cur.execute('''
        CREATE EXTENSION IF NOT EXISTS pg_buffercache
    ''');
    cur.execute('''
        CREATE TABLE IF NOT EXISTS prefetch_test_results (
            id SERIAL PRIMARY KEY,
            run_timestamp TIMESTAMPTZ DEFAULT now(),
            io_method TEXT NOT NULL,
            num_workers INT NOT NULL DEFAULT 0,
            prefetch_enabled BOOLEAN NOT NULL,
            evict_mode TEXT NOT NULL DEFAULT 'off',
            column_name TEXT NOT NULL,
            iteration INT,
            execution_time_ms NUMERIC,
            rows_returned BIGINT,
            blks_hit BIGINT,
            blks_read BIGINT
        )
    ''')

    cur.execute('''
        CREATE TABLE IF NOT EXISTS prefetch_test_data (
            id SERIAL PRIMARY KEY,
            sequential INT,
            periodic INT,
            random INT,
            payload TEXT
        )
    ''')

    cur.execute("SELECT count(*) FROM prefetch_test_data")
    row_count = cur.fetchone()[0]

    if row_count == 0:
        print(f"Populating test data ({num_rows} rows)...")
        r = min(10000, num_rows // 5)
        while gcd(r, num_rows) != 1:
            r += 1
        cur.execute(f'''
            INSERT INTO prefetch_test_data (sequential, periodic, random, payload)
            SELECT 
                i,
                ((i * {r}::bigint) % {num_rows} + 1)::int,
                row_number() OVER (ORDER BY random()),
                repeat('x', 200)
            FROM generate_series(1, {num_rows}) i
            ORDER BY i;
        ''')
        cur.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_sequential ON prefetch_test_data(sequential)")
        cur.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_periodic ON prefetch_test_data(periodic)")
        cur.execute("CREATE UNIQUE INDEX IF NOT EXISTS idx_random ON prefetch_test_data(random)")
        print("Data populated.")
    else:
        print(f"Test data exists: {row_count} rows")


def purge_os_cache():
    """Purge OS filesystem cache using external script."""
    script = os.path.join(os.path.dirname(__file__), 'purge_cache.sh')
    try:
        subprocess.run([script], capture_output=True, timeout=10)
    except:
        pass


def evict_pg_buffers(cur):
    """Evict PostgreSQL shared buffers for test table."""
    cur.execute('''
        SELECT count(pg_buffercache_evict(bufferid))
        FROM pg_buffercache b 
        JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
        WHERE c.relname = 'prefetch_test_data'
    ''').fetchall()


def apply_eviction(cur, evict_mode):
    """Apply eviction based on mode."""
    if evict_mode == 'pg':
        evict_pg_buffers(cur)
    elif evict_mode == 'os':
        evict_pg_buffers(cur)
        purge_os_cache()


def run_test(cur, column_name, prefetch_enabled, num_workers, evict_mode, iteration):
    """Run a single test and record results."""
    
    cur.execute(f"SET enable_indexscan_prefetch = {'on' if prefetch_enabled else 'off'}")
    cur.execute(f"SET max_parallel_workers_per_gather = {num_workers}")
    cur.execute("SET enable_bitmapscan = off")
    cur.execute("SET enable_seqscan = off")
    cur.execute("SET enable_indexonlyscan = off")
    cur.execute("SET enable_sort = off")
    
    apply_eviction(cur, evict_mode)
    
    cur.execute(f'''
        EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) 
        SELECT length(payload) FROM prefetch_test_data 
        ORDER BY {column_name}
    ''')
    
    result = cur.fetchone()[0]
    plan = result[0]['Plan']
    exec_time = result[0]['Execution Time']
    blks_hit = plan.get('Shared Hit Blocks', 0)
    blks_read = plan.get('Shared Read Blocks', 0)
    rows = plan.get('Actual Rows', 0)
    
    cur.execute("SHOW io_method")
    io_method = cur.fetchone()[0]
    
    cur.execute('''
        INSERT INTO prefetch_test_results 
        (io_method, num_workers, prefetch_enabled, evict_mode, column_name, 
         iteration, execution_time_ms, rows_returned, blks_hit, blks_read)
        VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
    ''', (io_method, num_workers, prefetch_enabled, evict_mode, 
          column_name, iteration, exec_time, rows, blks_hit, blks_read))
    
    pf = 'Y' if prefetch_enabled else 'N'
    print(f"  [{column_name}] pf={pf} w={num_workers} evict={evict_mode}: {exec_time:.1f}ms")


def print_summary(cur):
    """Print summary of results."""
    df = pd.read_sql('''
        WITH avgs AS (
            SELECT column_name, io_method, num_workers, 
                   prefetch_enabled AS prefetch, evict_mode AS evict,
                   count(*) AS n, avg(execution_time_ms) AS ms
            FROM prefetch_test_results
            GROUP BY column_name, io_method, num_workers, prefetch_enabled, evict_mode
        )
        SELECT a.column_name, a.io_method, a.num_workers, a.evict, a.n,
               round(a.ms::numeric, 1) AS off_ms, 
               round(b.ms::numeric, 1) AS on_ms,
               round(((b.ms - a.ms) / NULLIF(b.ms + a.ms, 0) * 100)::numeric, 1) AS effect_pct
        FROM avgs a JOIN avgs b USING (column_name, io_method, num_workers, evict)
        WHERE NOT a.prefetch AND b.prefetch
        ORDER BY column_name, io_method, num_workers, evict
    ''', cur.connection)
    
    print("\n" + "=" * 80)
    print("SUMMARY: Prefetch Effect (positive = slower)")
    print("=" * 80)
    print(df.to_string(index=False))


def main():
    args = parse_args()
    
    # Parse list arguments
    columns = [c.strip() for c in args.columns.split(',')]
    workers = [int(w.strip()) for w in args.workers.split(',')]
    evict_modes = ['off', 'pg', 'os'] if args.evict == 'all' else [e.strip() for e in args.evict.split(',')]
    
    print(f"Config: iterations={args.iterations}, columns={columns}, workers={workers}, evict={evict_modes}")
    
    # Connect
    connstr = f"dbname={args.dbname} host={args.host}"
    if args.port:
        connstr += f" port={args.port}"
    conn = psycopg.connect(connstr)
    conn.autocommit = True
    cur = conn.cursor()
    
    # Setup
    setup_tables(cur, args.rows, args.reset)
    
    # Show io_method
    cur.execute("SHOW io_method")
    io_method = cur.fetchone()[0]
    print(f"\nio_method = {io_method}\n")
    
    # Run tests
    prefetch_opts = [False, True]
    
    for i in range(1, args.iterations + 1):
        print(f"Iteration {i}/{args.iterations}")
        
        configs = list(itertools.product(columns, prefetch_opts, workers, evict_modes))
        random.shuffle(configs)
        
        for col, pf, w, evict in configs:
            run_test(cur, col, pf, w, evict, i)
        
        print()
    
    # Summary
    print_summary(cur)
    conn.close()


if __name__ == "__main__":
    main()

  [application/octet-stream] test_prefetch_regressions.sql (9.0K, 8-test_prefetch_regressions.sql)
  download

view thread (87+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: index prefetching
  In-Reply-To: <CAE8JnxN_EwnTLLMWGhvgwaomYZ0ysm7NeogA-BqBd=Rs3S7Oqw@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox