public inbox for [email protected]  
Lexical Structure - String Constants

From: Sérgio Saquetim @ 2014-06-17 03:14 UTC (permalink / raw)
  To: pgsql-docs

Hi,

As a hobby project, I'm building a SQL lexer/parser in Java from scratch,
compliant with PostgreSQL 9.3. While reading chapter 4, section 4.1 (
http://www.postgresql.org/docs/9.3/interactive/sql-syntax-lexical.html),
I noticed a few things I thought I should mention:

In section 4.1.2.1, the following text introduces us to SQL's bizarre
multiline/multisegment split style: "Two string constants that are only
separated by whitespace with at least one newline are concatenated and
effectively treated as if the string had been written as one constant."

The text does not say whether comments are allowed between segments, so I
ran a few tests in psql (PostgreSQL 9.3.4):

                                               version
------------------------------------------------------------------------------------------------------
 PostgreSQL 9.3.4 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-16ubuntu6) 4.8.2, 64-bit
(1 row)

postgres=# SELECT 'a'
'b';
 ?column?
----------
 ab
(1 row)

postgres=# SELECT 'a' --comment
'b';
 ?column?
----------
 ab
(1 row)

So far everything worked, but I got different results with C-style block
comments:

postgres=# SELECT 'a' /*comment*/
'b';
ERROR:  syntax error at or near "'b'"
LINE 2: 'b';

So line style comments (--) are accepted between segments but not C style
block comments (/* */). Do you think this difference in behavior should be
mentioned in the docs?
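
The behavior observed in the tests above can be modeled with a small check.
This is only a sketch, written in Python for brevity rather than Java, and
it assumes the rule is "whitespace and -- comments, with at least one
newline" purely from the outputs shown in this message. The names SEPARATOR
and joins_segments are hypothetical and are not taken from the PostgreSQL
sources.

```python
import re

# Assumed model of the text allowed between two string segments:
# spaces/tabs and "--" comments, then a mandatory newline, then any mix of
# whitespace and newline-terminated "--" comments. Block comments are not
# part of the pattern, which matches the error seen with /*comment*/.
SEPARATOR = re.compile(
    r"(?:[ \t\f]|--[^\n\r]*)*"                 # horizontal space or line comment
    r"[\n\r]"                                  # at least one newline
    r"(?:[ \t\n\r\f]+|--[^\n\r]*[\n\r])*"      # more whitespace, or comment + newline
)

def joins_segments(separator):
    """Return True if `separator` (the text between the closing quote of one
    segment and the opening quote of the next) lets the segments concatenate."""
    return SEPARATOR.fullmatch(separator) is not None

print(joins_segments("\n"))                # True:  'a' <newline> 'b'
print(joins_segments(" --comment\n"))      # True:  'a' --comment <newline> 'b'
print(joins_segments(" /*comment*/\n"))    # False: block comment is rejected
print(joins_segments(" "))                 # False: no newline between segments
```

A lexer that wants to accept block comments between segments as well would
need an extra alternative in the pattern; this sketch deliberately mirrors
only what 9.3.4 did in the tests above.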

I've also noticed the following statement in section 4.1.2.6: "At
least one digit must follow the exponent marker (e), if one is present."

As I understand it, this means the following statement should not be valid,
because the exponent marker is not followed by at least one digit, yet it is
evaluated successfully:

postgres=# SELECT 10e;
 e
----
 10
(1 row)

That said, I live in Brazil and English is not my first language so I may
be mistaken, but I thought I should bring this to this list.

Regards,

Sérgio Saquetim


From: Tom Lane @ 2014-06-17 04:12 UTC (permalink / raw)
  To: Sérgio Saquetim <[email protected]>; +Cc: pgsql-docs

Sérgio Saquetim <[email protected]> writes:
> So line style comments (--) are accepted between segments but not C style
> block comments (/* */). Do you think this difference in behavior should be
> mentioned in the docs?

Hm, interesting.  It looks to me like modern versions of the SQL spec
require either -- or /* ... */ style comments to be allowed between
segments of a quoted literal.  This is pretty bad taste in language
design, if you ask me, but that's what it seems to say.  I think that
our current lexer rules date from before the SQL standard even had
/* ... */ style comments, which is why the lexer isn't taking it.

> I've also noticed that in section 4.1.2.6, the following statement: "At
> least one digit must follow the exponent marker (e), if one is present."

> As I've understood the statement, I think it says that the following
> instruction should not be valid because the exponent marker is not followed
> by at least one digit, but the expression is successfully evaluated:

> postgres=# SELECT 10e;
>  e
> ----
>  10
> (1 row)

"10e" is not a valid number, just like the manual says.  But "10" is a
valid number, and "e" is a valid column alias, so this is equivalent
to "SELECT 10 AS e".  There's no requirement for white space between
adjacent tokens, if the tokens couldn't validly be run together into
one token.
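
For a hand-written lexer, one way to get this behavior is to make the
exponent part of the number pattern match only when a digit actually
follows the marker. A rough sketch, in Python for brevity rather than Java;
the NUMBER pattern and split_number_prefix are made-up names for
illustration, not the rules from the PostgreSQL lexer:

```python
import re

# Hypothetical number pattern: digits with an optional fraction (or a leading
# dot), plus an exponent that is consumed only if at least one digit follows
# the "e" and its optional sign. Otherwise the "e" is left for the next token.
NUMBER = re.compile(r"(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")

def split_number_prefix(text):
    """Return (number, rest) for text that starts with a numeric literal."""
    match = NUMBER.match(text)
    if match is None:
        raise ValueError("text does not start with a number")
    return match.group(0), text[match.end():]

print(split_number_prefix("10e"))      # ('10', 'e')
print(split_number_prefix("10e5"))     # ('10e5', '')
print(split_number_prefix("1.5e-3x"))  # ('1.5e-3', 'x')
```

With "10e" the leftover "e" is then scanned as an identifier, which is how
the query ends up meaning "SELECT 10 AS e".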

			regards, tom lane



From: Sérgio Saquetim @ 2014-06-17 22:19 UTC (permalink / raw)
  To: ; +Cc: pgsql-docs

> "10e" is not a valid number, just like the manual says.  But "10" is a
> valid number, and "e" is a valid column alias, so this is equivalent
> to "SELECT 10 AS e".  There's no requirement for white space between
> adjacent tokens, if the tokens couldn't validly be run together into
> one token.

Thanks Tom,

I hadn't noticed that. I'll refactor my lexer to handle it.

Regards,

Sérgio Saquetim

