To: Kenneth Tanzer
Cc: pgsql-general@postgresql.org
Subject: Re: Regexp matching: bug or operator error?
In-reply-to: <41A7ADA8.5080205@desc.org>
References: <41A3C6C6.2090605@desc.org> <1713.1101254499@sss.pgh.pa.us> <1846.1101255318@sss.pgh.pa.us> <41A4CAA5.2040908@desc.org> <22644.1101340967@sss.pgh.pa.us> <41A7ADA8.5080205@desc.org>
Comments: In-reply-to Kenneth Tanzer message dated "Fri, 26 Nov 2004 14:26:48 -0800"
Date: Fri, 26 Nov 2004 18:39:16 -0500
Message-ID: <20226.1101512356@sss.pgh.pa.us>
From: Tom Lane

Kenneth Tanzer writes:
> But what about these two queries:
> SELECT substring('a' FROM 'a?|a?');
> This returns a greedy 'a', similar to the example above.  But then why does
> SELECT substring('ba' FROM 'a?|a?');
> return a non-greedy empty string?

You're ignoring the first rule of matching: when there is more than one
possible match, the match starting earliest in the string is chosen.
The longer-or-shorter business only applies when there are multiple
legal ways to form a match starting at the same place.  In this case 'a?'
can form a legal match at the beginning of the string (i.e., match no
characters), so the fact that a longer match is available later in the
string doesn't enter into it.

> With regard to the documentation, after re-reading it many times I'd
> have to say the information is all there, but it's hard to absorb.

I'd agree.  This section was taken nearly verbatim from Henry Spencer's
man page for the regexp package, and with all due respect to Henry, it's
definitely written in geek reference-page-speak.  Maybe a few examples
would help.  On the other hand, I don't want to try to turn the section
into a regexp tutorial; there are entire books written about regexps
(I quite like the O'Reilly one, btw).  So there's a bulk-vs-friendliness
tradeoff to be made.

> I think the main problem is that the term "preference" is used to
> discuss greedy/non-greediness, as well as the words greedy &
> non-greedy.

Good point.  It would help to use only one term.

> As an example, here's a couple of different possibilities for the second
> sentence of the section:

I like this one:

> b) If the RE could match more than one substring starting at that point,
> the match can be either greedy (matching the longest substring) or
> non-greedy (matching the shortest substring).  Whether an RE is greedy
> or not is determined by the following rules...

Given that intro, there's no need to use the word "preference" at all.
Or almost: what term will you use for "RE with no preference"?
Perhaps you can avoid the question by pointing out that greediness only
matters for quantifiers, since unquantified REs can only match
fixed-length strings.

The point you make here:

> c) Like individual components of an RE, the entire RE can be either
> greedy (matching the longest substring) or non-greedy (matching the
> shortest substring).

is also important, but probably needs to be a completely separate
paragraph containing its own example.

> Do you think an edit along these lines would be helpful?
> If so, I'd be willing to take a shot at re-writing that section.  Let me
> know.  Thanks.

Fire away.  Please send whatever you come up with to the pgsql-docs
list.

			regards, tom lane
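
As a rough illustration of the two rules discussed in this message
(earliest start wins; greediness only decides among matches that start
at the same place), here is a minimal sketch in Python.  It assumes
Python's re module, which is a different engine from the one PostgreSQL
uses and does not follow every rule in the PostgreSQL documentation; it
does share the earliest-match behavior shown here.  The helper
first_match is hypothetical and exists only for this example.

```python
import re


def first_match(pattern, text):
    """Return (start_offset, matched_text) for the first match, or None."""
    m = re.search(pattern, text)
    return (m.start(), m.group(0)) if m else None


# Rule 1: the match that starts earliest in the string is chosen,
# even when that match is empty and a longer one exists further on.
print(first_match(r"a?|a?", "a"))    # match starts at offset 0 and consumes 'a'
print(first_match(r"a?|a?", "ba"))   # empty match at offset 0, before the 'b'

# Rule 2: greedy vs. non-greedy only matters among matches that
# start at the same position.
print(first_match(r"a*", "aaa"))     # greedy quantifier: longest match at offset 0
print(first_match(r"a*?", "aaa"))    # non-greedy quantifier: shortest (empty) match
```

In the 'ba' case the pattern can already succeed at offset 0 by matching
no characters, so the 'a' at offset 1 is never considered.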