public inbox for [email protected]
help / color / mirror / Atom feed[PATCH] pgweb: Search: bugfix: replace sites.baseurl with sites.hostname
4+ messages / 2 participants
[nested] [flat]
* [PATCH] pgweb: Search: bugfix: replace sites.baseurl with sites.hostname
@ 2021-10-22 14:59 Célestin Matte <[email protected]>
0 siblings, 1 reply; 4+ messages in thread
From: Célestin Matte @ 2021-10-22 14:59 UTC (permalink / raw)
To: PostgreSQL WWW <[email protected]>
Hello,
There is an issue in the sql files for search in pgweb, as schema.sql does not correspond to what is used in functions.sql.
functions.sql uses sites.baseurl, which is not a field in schema.sql. I replaced it with sites.hostname, which I guess is the targeted field.
Without that patch, I get the following error when performing a search:
Traceback (most recent call last):
File "/srv/pgweb/local/pgweb/search/views.py", line 284, in search
'internal': include_internal,
psycopg2.errors.UndefinedColumn: column sites.baseurl does not exist
LINE 1: SELECT sites.id AS siteid, sites.baseurl, webpages.suburl, t...
^
QUERY: SELECT sites.id AS siteid, sites.baseurl, webpages.suburl, ts_rank_cd(fti,tsq) * relprio AS ts_rank_cd FROM web
pages INNER JOIN sites ON webpages.site=sites.id WHERE fti @@ tsq AND site=1 AND (includeinternal OR NOT isinternal) OR
DER BY ts_rank_cd(fti,tsq) * relprio DESC LIMIT 1000
CONTEXT: PL/pgSQL function site_search(text,integer,integer,text,boolean) line 21 at OPEN
Cheers,
--
Célestin Matte
Attachments:
[text/x-patch] 0001-Search-bugfix-replace-sites.baseurl-with-sites.hostn.patch (2.0K, 2-0001-Search-bugfix-replace-sites.baseurl-with-sites.hostn.patch)
download | inline diff:
From fa7d93df7236721ed97f1f85d974eb1c91a01537 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9lestin=20Matte?= <[email protected]>
Date: Fri, 22 Oct 2021 16:32:04 +0200
Subject: [PATCH] Search: bugfix: replace sites.baseurl with sites.hostname
---
tools/search/sql/functions.sql | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/search/sql/functions.sql b/tools/search/sql/functions.sql
index 9ee5c4a5..b3dcc6c4 100644
--- a/tools/search/sql/functions.sql
+++ b/tools/search/sql/functions.sql
@@ -20,9 +20,9 @@ BEGIN
SELECT INTO pagecount sites.pagecount FROM sites WHERE id=1;
IF _suburl IS NULL THEN
- OPEN curs FOR SELECT sites.id AS siteid, sites.baseurl, webpages.suburl, ts_rank_cd(fti,tsq) * relprio AS ts_rank_cd FROM webpages INNER JOIN sites ON webpages.site=sites.id WHERE fti @@ tsq AND site=1 AND (includeinternal OR NOT isinternal) ORDER BY ts_rank_cd(fti,tsq) * relprio DESC LIMIT 1000;
+ OPEN curs FOR SELECT sites.id AS siteid, sites.hostname AS baseurl, webpages.suburl, ts_rank_cd(fti,tsq) * relprio AS ts_rank_cd FROM webpages INNER JOIN sites ON webpages.site=sites.id WHERE fti @@ tsq AND site=1 AND (includeinternal OR NOT isinternal) ORDER BY ts_rank_cd(fti,tsq) * relprio DESC LIMIT 1000;
ELSE
- OPEN curs FOR SELECT sites.id AS siteid, sites.baseurl, webpages.suburl, ts_rank_cd(fti,tsq) * relprio AS ts_rank_cd FROM webpages INNER JOIN sites ON webpages.site=sites.id WHERE fti @@ tsq AND site=1 AND webpages.suburl LIKE _suburl||'%' AND (includeinternal OR NOT isinternal) ORDER BY ts_rank_cd(fti,tsq) * relprio DESC LIMIT 1000;
+ OPEN curs FOR SELECT sites.id AS siteid, sites.hostname AS baseurl, webpages.suburl, ts_rank_cd(fti,tsq) * relprio AS ts_rank_cd FROM webpages INNER JOIN sites ON webpages.site=sites.id WHERE fti @@ tsq AND site=1 AND webpages.suburl LIKE _suburl||'%' AND (includeinternal OR NOT isinternal) ORDER BY ts_rank_cd(fti,tsq) * relprio DESC LIMIT 1000;
END IF;
LOOP
FETCH curs INTO hit;
--
2.33.1
^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: [PATCH] pgweb: Search: bugfix: replace sites.baseurl with sites.hostname
@ 2021-10-23 14:03 Magnus Hagander <[email protected]>
parent: Célestin Matte <[email protected]>
0 siblings, 1 reply; 4+ messages in thread
From: Magnus Hagander @ 2021-10-23 14:03 UTC (permalink / raw)
To: Célestin Matte <[email protected]>; +Cc: PostgreSQL WWW <[email protected]>
On Fri, Oct 22, 2021 at 4:59 PM Célestin Matte <[email protected]>
wrote:
> Hello,
>
> There is an issue in the sql files for search in pgweb, as schema.sql does
> not correspond to what is used in functions.sql.
> functions.sql uses sites.baseurl, which is not a field in schema.sql. I
> replaced it with sites.hostname, which I guess is the targeted field.
>
> Without that patch, I get the following error when performing a search:
>
> Traceback (most recent call last):
>
> File "/srv/pgweb/local/pgweb/search/views.py", line 284, in search
>
> 'internal': include_internal,
>
> psycopg2.errors.UndefinedColumn: column sites.baseurl does not exist
>
> LINE 1: SELECT sites.id AS siteid, sites.baseurl, webpages.suburl, t...
>
> ^
>
> QUERY: SELECT sites.id AS siteid, sites.baseurl, webpages.suburl,
> ts_rank_cd(fti,tsq) * relprio AS ts_rank_cd FROM web
> pages INNER JOIN sites ON webpages.site=sites.id WHERE fti @@ tsq AND
> site=1 AND (includeinternal OR NOT isinternal) OR
> DER BY ts_rank_cd(fti,tsq) * relprio DESC LIMIT 1000
>
> CONTEXT: PL/pgSQL function site_search(text,integer,integer,text,boolean)
> line 21 at OPEN
>
Is that really right/enough?
All the instances I run have both fields, and for example hostname is
www.postgresql.org and baseurl is https://www.postgresql.org.
Now, today that's the only site we have, so we could just replace it with
putting the https:// part into it int he templates, but I think this
originally comes from the times when we indexed multiple sites in a
previous system -- and it appears to have been incorrectly imported in the
very first commit. The search system is really due for a
cleanup/replacement since that has been removed, because a lot of things
could be simplified around that.
But short term, I believe the correct thing to do is to add the baseurl
field to the table in the SQL file. I'll go push a patch taht does that.
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/;
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/;
^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: [PATCH] pgweb: Search: bugfix: replace sites.baseurl with sites.hostname
@ 2021-10-25 16:19 Célestin Matte <[email protected]>
parent: Magnus Hagander <[email protected]>
0 siblings, 1 reply; 4+ messages in thread
From: Célestin Matte @ 2021-10-25 16:19 UTC (permalink / raw)
To: [email protected]
Hello,
> But short term, I believe the correct thing to do is to add the baseurl field to the table in the SQL file. I'll go push a patch taht does that.
This broke data.sql as baseurl is NOT NULL. Attached patch fixes that.
This might have also broken the crawler but I haven't installed it to test that.
Cheers,
--
Célestin Matte
Attachments:
[text/x-patch] 0001-Fix-data.sql-after-addition-of-baseurl-in-52455885.patch (1.4K, 2-0001-Fix-data.sql-after-addition-of-baseurl-in-52455885.patch)
download | inline diff:
From 0e876a52caf70dfa9f0bc3b6bbf2668be0258fe2 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9lestin=20Matte?= <[email protected]>
Date: Mon, 25 Oct 2021 18:16:48 +0200
Subject: [PATCH] Fix data.sql after addition of baseurl in 52455885
---
tools/search/sql/data.sql | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/tools/search/sql/data.sql b/tools/search/sql/data.sql
index 5e12e9d2..64be2429 100644
--- a/tools/search/sql/data.sql
+++ b/tools/search/sql/data.sql
@@ -1,11 +1,11 @@
-INSERT INTO sites (id, hostname, description, pagecount)
- VALUES (1, 'www.postgresql.org', 'Main PostgreSQL Website', 0);
+INSERT INTO sites (id, hostname, baseurl, description, pagecount)
+ VALUES (1, 'www.postgresql.org', 'https://www.postgresql.org', 'Main PostgreSQL Website', 0);
-INSERT INTO sites (id, hostname, description, pagecount)
- VALUES (2, 'www.pgadmin.org','pgAdmin III', 0);
+INSERT INTO sites (id, hostname, baseurl, description, pagecount)
+ VALUES (2, 'www.pgadmin.org', 'https://www.pgadmin.org', 'pgAdmin III', 0);
-INSERT INTO sites (id, hostname, description, pagecount)
- VALUES (3, 'jdbc.postgresql.org','JDBC driver', 0);
+INSERT INTO sites (id, hostname, baseurl, description, pagecount)
+ VALUES (3, 'jdbc.postgresql.org', 'https://jdbc.postgresql.org', 'JDBC driver', 0);
INSERT INTO site_excludes VALUES (2,'^/archives');
--
2.33.1
^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: [PATCH] pgweb: Search: bugfix: replace sites.baseurl with sites.hostname
@ 2021-10-27 13:47 Magnus Hagander <[email protected]>
parent: Célestin Matte <[email protected]>
0 siblings, 0 replies; 4+ messages in thread
From: Magnus Hagander @ 2021-10-27 13:47 UTC (permalink / raw)
To: Célestin Matte <[email protected]>; Jonathan S. Katz <[email protected]>; +Cc: PostgreSQL WWW <[email protected]>
On Mon, Oct 25, 2021 at 6:19 PM Célestin Matte <[email protected]>
wrote:
> Hello,
>
> > But short term, I believe the correct thing to do is to add the baseurl
> field to the table in the SQL file. I'll go push a patch taht does that.
>
> This broke data.sql as baseurl is NOT NULL. Attached patch fixes that.
> This might have also broken the crawler but I haven't installed it to test
> that.
>
Indeed it does. Oops.
That said, most of that file are not-current examples, and should just be
cleaned out, so I'll do that while fixing this.
Yeah, another sign that this needs some proper spring cleaning to get rid
of configuration of the flexibility that we don't care about. But tihs one
for now, to unbreak things short-term.
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/;
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/;
^ permalink raw reply [nested|flat] 4+ messages in thread
end of thread, other threads:[~2021-10-27 13:47 UTC | newest]
Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2021-10-22 14:59 [PATCH] pgweb: Search: bugfix: replace sites.baseurl with sites.hostname Célestin Matte <[email protected]>
2021-10-23 14:03 ` Magnus Hagander <[email protected]>
2021-10-25 16:19 ` Célestin Matte <[email protected]>
2021-10-27 13:47 ` Magnus Hagander <[email protected]>
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox