public inbox for [email protected]help / color / mirror / Atom feed
index of only not null, use function index? 5+ messages / 4 participants [nested] [flat]
* index of only not null, use function index? @ 2017-05-22 15:17 Ariel <[email protected]> 2017-06-08 14:47 ` Re: index of only not null, use function index? Merlin Moncure <[email protected]> 0 siblings, 1 reply; 5+ messages in thread From: Ariel @ 2017-05-22 15:17 UTC (permalink / raw) To: pgsql-performance I need to be able to quickly find rows where a column is not null (only a small percent of the rows will have that column not null). Should I do: CREATE INDEX ON table ((col IS NOT NULL)) WHERE col IS NOT NULL or: CREATE INDEX ON table (col) WHERE col IS NOT NULL I'm thinking the first index will make a smaller, simpler, index since I don't actually need to index the value of the column. But are there any drawbacks I may not be aware of? Or perhaps there are no actual benefits? -Ariel -- Sent via pgsql-performance mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance ^ permalink raw reply [nested|flat] 5+ messages in thread
* Re: index of only not null, use function index? 2017-05-22 15:17 index of only not null, use function index? Ariel <[email protected]> @ 2017-06-08 14:47 ` Merlin Moncure <[email protected]> 2017-06-08 14:58 ` Re: index of only not null, use function index? Tom Lane <[email protected]> 0 siblings, 1 reply; 5+ messages in thread From: Merlin Moncure @ 2017-06-08 14:47 UTC (permalink / raw) To: Ariel <[email protected]>; +Cc: pgsql-performance On Mon, May 22, 2017 at 10:17 AM, Ariel <[email protected]> wrote: > > I need to be able to quickly find rows where a column is not null (only a > small percent of the rows will have that column not null). > > Should I do: > > CREATE INDEX ON table ((col IS NOT NULL)) WHERE col IS NOT NULL > > or: > > CREATE INDEX ON table (col) WHERE col IS NOT NULL > > I'm thinking the first index will make a smaller, simpler, index since I > don't actually need to index the value of the column. But are there any > drawbacks I may not be aware of? Or perhaps there are no actual benefits? You are correct. I don't see any downside to converting to bool; this will be more efficient especially if 'col' is large at the small cost of some generality. Having said that, what I typically do in such cases (this comes a lot in database driven work queues) something like this: CREATE INDEX ON table (OrderCol) WHERE col IS NOT NULL; Where "OrderCol" is some field that defines some kind of order to the items that you are marking off. This will give very good performance of queries in the form of: SELECT Col FROM table WHERE col IS NOT NULL ORDER BY OrderCol LIMIT 1; merlin -- Sent via pgsql-performance mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance ^ permalink raw reply [nested|flat] 5+ messages in thread
* Re: index of only not null, use function index? 2017-05-22 15:17 index of only not null, use function index? Ariel <[email protected]> 2017-06-08 14:47 ` Re: index of only not null, use function index? Merlin Moncure <[email protected]> @ 2017-06-08 14:58 ` Tom Lane <[email protected]> 2017-06-08 16:05 ` Re: index of only not null, use function index? Jeremy Finzel <[email protected]> 0 siblings, 1 reply; 5+ messages in thread From: Tom Lane @ 2017-06-08 14:58 UTC (permalink / raw) To: Merlin Moncure <[email protected]>; +Cc: Ariel <[email protected]>; pgsql-performance Merlin Moncure <[email protected]> writes: > On Mon, May 22, 2017 at 10:17 AM, Ariel <[email protected]> wrote: >> Should I do: >> >> CREATE INDEX ON table ((col IS NOT NULL)) WHERE col IS NOT NULL >> >> or: >> >> CREATE INDEX ON table (col) WHERE col IS NOT NULL >> >> I'm thinking the first index will make a smaller, simpler, index since I >> don't actually need to index the value of the column. But are there any >> drawbacks I may not be aware of? Or perhaps there are no actual benefits? > You are correct. I don't see any downside to converting to bool; this > will be more efficient especially if 'col' is large at the small cost > of some generality. Depends on the datatype really. Because of alignment considerations, the index tuples will be the same size for any column value <= 4 bytes, or <= 8 bytes on 64-bit hardware. So if this is an integer column, or even bigint on 64-bit, you won't save any space with the first index definition. If it's a text column with an average width larger than what I just mentioned, you could save some space that way. In general, indexes on expressions are a tad more expensive to maintain than indexes on plain column values. And the second index at least has the potential to be useful for other queries than the one you're thinking about. So personally I'd go with the second definition unless you can show that there's a really meaningful space savings with the first one. > Having said that, what I typically do in such > cases (this comes a lot in database driven work queues) something like > this: > CREATE INDEX ON table (OrderCol) WHERE col IS NOT NULL; Right, you can frequently get a lot of mileage out of indexing something that's unrelated to the predicate condition, but is also needed by the query you want to optimize. regards, tom lane -- Sent via pgsql-performance mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance ^ permalink raw reply [nested|flat] 5+ messages in thread
* Re: index of only not null, use function index? 2017-05-22 15:17 index of only not null, use function index? Ariel <[email protected]> 2017-06-08 14:47 ` Re: index of only not null, use function index? Merlin Moncure <[email protected]> 2017-06-08 14:58 ` Re: index of only not null, use function index? Tom Lane <[email protected]> @ 2017-06-08 16:05 ` Jeremy Finzel <[email protected]> 2017-06-08 19:57 ` Re: index of only not null, use function index? Merlin Moncure <[email protected]> 0 siblings, 1 reply; 5+ messages in thread From: Jeremy Finzel @ 2017-06-08 16:05 UTC (permalink / raw) To: Tom Lane <[email protected]>; +Cc: Merlin Moncure <[email protected]>; Ariel <[email protected]>; pgsql-performance Normally, I find that in these situations, it makes sense to index the primary key of the table WHERE col is not null, because it will usually cover the largest number of cases, and is much better than a two-value boolean index, for example. On Thu, Jun 8, 2017 at 9:58 AM, Tom Lane <[email protected]> wrote: > Merlin Moncure <[email protected]> writes: > > On Mon, May 22, 2017 at 10:17 AM, Ariel <[email protected]> wrote: > >> Should I do: > >> > >> CREATE INDEX ON table ((col IS NOT NULL)) WHERE col IS NOT NULL > >> > >> or: > >> > >> CREATE INDEX ON table (col) WHERE col IS NOT NULL > >> > >> I'm thinking the first index will make a smaller, simpler, index since I > >> don't actually need to index the value of the column. But are there any > >> drawbacks I may not be aware of? Or perhaps there are no actual > benefits? > > > You are correct. I don't see any downside to converting to bool; this > > will be more efficient especially if 'col' is large at the small cost > > of some generality. > > Depends on the datatype really. Because of alignment considerations, > the index tuples will be the same size for any column value <= 4 bytes, > or <= 8 bytes on 64-bit hardware. So if this is an integer column, > or even bigint on 64-bit, you won't save any space with the first > index definition. If it's a text column with an average width larger > than what I just mentioned, you could save some space that way. > > In general, indexes on expressions are a tad more expensive to maintain > than indexes on plain column values. And the second index at least has > the potential to be useful for other queries than the one you're thinking > about. So personally I'd go with the second definition unless you can > show that there's a really meaningful space savings with the first one. > > > Having said that, what I typically do in such > > cases (this comes a lot in database driven work queues) something like > > this: > > CREATE INDEX ON table (OrderCol) WHERE col IS NOT NULL; > > Right, you can frequently get a lot of mileage out of indexing something > that's unrelated to the predicate condition, but is also needed by the > query you want to optimize. > > regards, tom lane > > > -- > Sent via pgsql-performance mailing list ([email protected]) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance > ^ permalink raw reply [nested|flat] 5+ messages in thread
* Re: index of only not null, use function index? 2017-05-22 15:17 index of only not null, use function index? Ariel <[email protected]> 2017-06-08 14:47 ` Re: index of only not null, use function index? Merlin Moncure <[email protected]> 2017-06-08 14:58 ` Re: index of only not null, use function index? Tom Lane <[email protected]> 2017-06-08 16:05 ` Re: index of only not null, use function index? Jeremy Finzel <[email protected]> @ 2017-06-08 19:57 ` Merlin Moncure <[email protected]> 0 siblings, 0 replies; 5+ messages in thread From: Merlin Moncure @ 2017-06-08 19:57 UTC (permalink / raw) To: Jeremy Finzel <[email protected]>; +Cc: Tom Lane <[email protected]>; Ariel <[email protected]>; pgsql-performance On Thu, Jun 8, 2017 at 11:05 AM, Jeremy Finzel <[email protected]> wrote: > On Thu, Jun 8, 2017 at 9:58 AM, Tom Lane <[email protected]> wrote: >> Merlin Moncure <[email protected]> writes: >> > Having said that, what I typically do in such >> > cases (this comes a lot in database driven work queues) something like >> > this: >> > CREATE INDEX ON table (OrderCol) WHERE col IS NOT NULL; >> >> Right, you can frequently get a lot of mileage out of indexing something >> that's unrelated to the predicate condition, but is also needed by the >> query you want to optimize. > Normally, I find that in these situations, it makes sense to index the > primary key of the table WHERE col is not null, because it will usually > cover the largest number of cases, and is much better than a two-value > boolean index, for example. [meta note: please try to avoid top-posting] Yeah, if you index the primary key and query it like this: CREATE INDEX ON table (pkey) WHERE col IS NOT NULL; SELECT pkey FROM table WHERE col IS NOT NULL ORDER BY pkey LIMIT n; This can give the best possible results since this can qualify for an index only scan :-). merlin -- Sent via pgsql-performance mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance ^ permalink raw reply [nested|flat] 5+ messages in thread
end of thread, other threads:[~2017-06-08 19:57 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2017-05-22 15:17 index of only not null, use function index? Ariel <[email protected]> 2017-06-08 14:47 ` Merlin Moncure <[email protected]> 2017-06-08 14:58 ` Tom Lane <[email protected]> 2017-06-08 16:05 ` Jeremy Finzel <[email protected]> 2017-06-08 19:57 ` Merlin Moncure <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox