Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bYe5D-0007w1-Ci for pgsql-performance@arkaria.postgresql.org; Sat, 13 Aug 2016 18:54:59 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84_2) (envelope-from ) id 1bYe5C-0002fu-05 for pgsql-performance@arkaria.postgresql.org; Sat, 13 Aug 2016 18:54:58 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1bYe5A-0002ff-DT for pgsql-performance@postgresql.org; Sat, 13 Aug 2016 18:54:56 +0000 Received: from mail-io0-x232.google.com ([2607:f8b0:4001:c06::232]) by magus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.84_2) (envelope-from ) id 1bYe56-0004RG-LM for pgsql-performance@postgresql.org; Sat, 13 Aug 2016 18:54:55 +0000 Received: by mail-io0-x232.google.com with SMTP id 38so51584292iol.0 for ; Sat, 13 Aug 2016 11:54:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=telsasoft-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:mime-version:content-disposition :in-reply-to:user-agent; bh=IuZec+3uh+WR+AX4X51eVTmChuuELmg/Xg0E3JlRAEw=; b=KnOp2Q2XSFZuUifoxaze6hzVTbrFkafvu/O46xHA460CyWC+m5cTZCBjXjW0AUfqpy Zja+pB9y9RWUYNsA3LiQfiU4K1kO2jPzscJf+FCr5daTvqo1lTxJ1pA9njTW6OgqQnfi wg8XAGKvJ05qTo/TcaqDFGnRbVT4agZ6eUFpg9pO/k8NACzxEA1BcIYzjvFKFT8AOb5D rm64QCUhNUXsPgFU8VpMVioJeLHRk3tDvC15T8+vXzfrH3vU4rQV2LN1/pRA5bA60t6H hPAtuMmQehGMmn/zZmj+gswpN+yqNJGnoQ/hiPPuEt4ibE4wMMECSkukrKK87AKS8VGU eVdw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:cc:subject:message-id:mime-version :content-disposition:in-reply-to:user-agent; bh=IuZec+3uh+WR+AX4X51eVTmChuuELmg/Xg0E3JlRAEw=; b=a3nwKewz90bRynEcnYum89G44zra691+fst0OeevT7CoohE5FN4WIyZoMN0/AVL0C+ tMH9b7f4+NAkKVTV8iOwyvaVX99fA18mBo16Cw3FOGT1mnl3idaGUKSOrXPm750CA1Nz VDCKptVElELwEgz0AmBF2cz0yiXOiRaZ8eiTMA7UHHJ/lD9zhJSUgyMFvpD7Jbd3BYfz RFnK64/CmA4SqaxC2Ll2o7I0G2qNRytADccXJm0OiaVCuucOWiderkD2HZ6mCg4H84qX jr7VHG255TnTuAMeOtDeZle0xDu5Z41QHg8+UdStt6Wpx5OMr3cqV3a9oBpngJ9Cgli3 Ngdw== X-Gm-Message-State: AEkoouuyFDmmwnf6FhkJ0S1mxIY8PWW/WPJ4rfXmcgoWx6lcaz5wOylKYfbDTJBbxmP2IA== X-Received: by 10.107.20.206 with SMTP id 197mr27416987iou.103.1471114490352; Sat, 13 Aug 2016 11:54:50 -0700 (PDT) Received: from pryzbyj (charmander.telsasoft.com. [50.244.222.1]) by smtp.gmail.com with ESMTPSA id r190sm820469ith.8.2016.08.13.11.54.49 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sat, 13 Aug 2016 11:54:49 -0700 (PDT) Received: by pryzbyj (Postfix, from userid 1000) id 7B6CA801FC0; Sat, 13 Aug 2016 13:54:48 -0500 (CDT) Date: Sat, 13 Aug 2016 13:54:48 -0500 From: Justin Pryzby To: pgsql-performance@postgresql.org Cc: Jeff Janes , Tom Lane , Claudio Freire Subject: Re: index fragmentation on insert-only table with non-unique column Message-ID: <20160813185448.GP1179@telsasoft.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) X-Pg-Spam-Score: -1.1 (-) List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org Regarding this earlier thread: https://www.postgresql.org/message-id/flat/20160524173914.GA11880%40telsasoft.com#20160524173914.GA11880@telsasoft.com On Tue, May 24, 2016 at 10:39 AM, Justin Pryzby wrote: > Summary: Non-unique btree indices are returning CTIDs for rows with same > value of indexed column not in logical order, imposing a high performance > penalty. I have to point out that by "logical" I clearly meant "physical", hopefully nobody was too misled.. On Sun, Jun 05, 2016 at 12:28:47PM -0700, Jeff Janes wrote: > On Sun, Jun 5, 2016 at 9:03 AM, Tom Lane wrote: > > Claudio Freire writes: > >> So correlated index scans look extra favourable vs bitmap index scans > >> because bitmap heap scans consider random page costs sans correlation > >> effects (even though correlation applies to bitmap heap scans as > >> well). > > > > Really? How? The index ordering has nothing to do with the order in > > which heap tuples will be visited. > > It is not the order itself, but the density. > > If the index is read in a range scan (as opposed to =ANY scan), and > the index lead column is correlated with the table ordering, then the > parts of the table that need to be visited will be much denser than if > there were no correlation. But Claudio is saying that this is not > being accounted for. I didn't completely understand Claudio/Jeff here, and not sure if we're on the same page. For queries on these tables, the index scan was very slow, due to fragmented index on non-unique column, and seq scan would have been (was) faster (even if it means reading 70GB and filtering out 6 of 7 days' data). That was resolved by added a nightly reindex job (.. which sometimes competes with other maintenance and has trouble running every table every night). But I did find that someone else had previously reported this problem (in a strikingly similar context and message, perhaps clearer than mine): https://www.postgresql.org/message-id/flat/520D6610.8040907%40emulex.com#520D6610.8040907@emulex.com I also found this older thread: https://www.postgresql.org/message-id/flat/n6cmpug13b9rk1srebjvhphg0lm8dou1kn%404ax.com#n6cmpug13b9rk1srebjvhphg0lm8dou1kn@4ax.com There was mention of a TODO item: * Compute index correlation on CREATE INDEX and ANALYZE, use it for index * scan cost estimation .. but perhaps I misunderstand and that's long since resolved ? Justin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance