Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1meHcx-0003aZ-DX for pgsql-www@arkaria.postgresql.org; Sat, 23 Oct 2021 14:04:03 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.92) (envelope-from ) id 1meHcw-00033S-9p for pgsql-www@arkaria.postgresql.org; Sat, 23 Oct 2021 14:04:02 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1meHcv-00031n-Uv for pgsql-www@lists.postgresql.org; Sat, 23 Oct 2021 14:04:02 +0000 Received: from mail-lj1-x236.google.com ([2a00:1450:4864:20::236]) by magus.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.92) (envelope-from ) id 1meHct-00047p-7F for pgsql-www@lists.postgresql.org; Sat, 23 Oct 2021 14:04:01 +0000 Received: by mail-lj1-x236.google.com with SMTP id e19so1241845ljk.12 for ; Sat, 23 Oct 2021 07:03:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=hagander-net.20210112.gappssmtp.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=oX8wPEMx5MvEShg5OyBnonJTaTKVbsthNPUViZ7WY/E=; b=e+3QfCJfm1KzDQ5+qr5SJkibjUZC3/g/ldKaXPKpKyu26K2ascrY2WLQgMpoWZhnNr Zi84v6hg5/fFvDNUdRgjnIDWsEP21iDYVn8XZosyZSWvD07hhlJCHzNb5e8tYFEEcpkV uP9DxYYdElR6Uno3pjUfpEm9x4GNffN6j9f2cfzyw/pI+VVlQsKMb4qibBG1vDs3BsM6 SnHXxGNFT+gBeQgybH5w1oIUuyNQrESniDsxzA9KbP9jxAG3z8Y28eOtUjmJRS9vow4j AVvteGYR7nMTWRJKMAJN0B3rpAiRmGJP+OZP1Xg3AUyuQzB9UpNDe4elKzmAt+d5izMM yPhw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=oX8wPEMx5MvEShg5OyBnonJTaTKVbsthNPUViZ7WY/E=; b=40OFUHW0FMIgCY2LMAjDTmOPZMCoLgvy7EEmN+zY61jXJ9A0cw6usN7Ubq4HnQZCJA Ct4bAwV0thm23bN8M8kF3TzGVqZHR8dTuNlpDh2UWdPJnG7JkJK1Q57K8xnmIuhaVgIi SHcVjR0dVhYEuFCsX2QNIC55spsWuXKjyLrd/6YO6Kz5WE2oSA5GzL2y3Yr97dTu5iIU hwZyhErTr9zdU27zUD8T9AaMe0mAQM9AS5zD/wKUryfyuj0vQOzGEH+BOYmtxeuDCl/V Gpff06+3MXTLYPtOMe0g+9jKA2RIaSWQbw9MqOsosmHpyEdWEUdNawoXN6XAgzVDcoFZ FsCw== X-Gm-Message-State: AOAM532VwSBSskLEc0cA/MnoPw726duR6/CSi+4jROGvfwn/GHKcWhOo eSBpJEyBN8uXmcz10dlGEFe5Xk6c0p0GeuGPn8xhu8dHVLI= X-Google-Smtp-Source: ABdhPJwZe3A+6vtOU/P3AuqeN8QVMtn2PRKd9bTY0SEAvenjXn1oV9XN3rKvTLYlepy5Q9zFV6xgHw3NHuVlg2OhYFA= X-Received: by 2002:a2e:a710:: with SMTP id s16mr6524925lje.444.1634997838171; Sat, 23 Oct 2021 07:03:58 -0700 (PDT) MIME-Version: 1.0 References: <21fbb020-83c6-33e5-9cfe-0aee56badf3e@cmatte.me> In-Reply-To: <21fbb020-83c6-33e5-9cfe-0aee56badf3e@cmatte.me> From: Magnus Hagander Date: Sat, 23 Oct 2021 16:03:46 +0200 Message-ID: Subject: Re: [PATCH] pgweb: Search: bugfix: replace sites.baseurl with sites.hostname To: =?UTF-8?Q?C=C3=A9lestin_Matte?= Cc: PostgreSQL WWW Content-Type: multipart/alternative; boundary="0000000000005a660305cf05996f" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000005a660305cf05996f Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Fri, Oct 22, 2021 at 4:59 PM C=C3=A9lestin Matte wrote: > Hello, > > There is an issue in the sql files for search in pgweb, as schema.sql doe= s > not correspond to what is used in functions.sql. > functions.sql uses sites.baseurl, which is not a field in schema.sql. I > replaced it with sites.hostname, which I guess is the targeted field. > > Without that patch, I get the following error when performing a search: > > Traceback (most recent call last): > > File "/srv/pgweb/local/pgweb/search/views.py", line 284, in search > > 'internal': include_internal, > > psycopg2.errors.UndefinedColumn: column sites.baseurl does not exist > > LINE 1: SELECT sites.id AS siteid, sites.baseurl, webpages.suburl, t... > > ^ > > QUERY: SELECT sites.id AS siteid, sites.baseurl, webpages.suburl, > ts_rank_cd(fti,tsq) * relprio AS ts_rank_cd FROM web > pages INNER JOIN sites ON webpages.site=3Dsites.id WHERE fti @@ tsq AND > site=3D1 AND (includeinternal OR NOT isinternal) OR > DER BY ts_rank_cd(fti,tsq) * relprio DESC LIMIT 1000 > > CONTEXT: PL/pgSQL function site_search(text,integer,integer,text,boolean= ) > line 21 at OPEN > Is that really right/enough? All the instances I run have both fields, and for example hostname is www.postgresql.org and baseurl is https://www.postgresql.org. Now, today that's the only site we have, so we could just replace it with putting the https:// part into it int he templates, but I think this originally comes from the times when we indexed multiple sites in a previous system -- and it appears to have been incorrectly imported in the very first commit. The search system is really due for a cleanup/replacement since that has been removed, because a lot of things could be simplified around that. But short term, I believe the correct thing to do is to add the baseurl field to the table in the SQL file. I'll go push a patch taht does that. --=20 Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/ --0000000000005a660305cf05996f Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


=
On Fri, Oct 22, 2021 at 4:59 PM C=C3= =A9lestin Matte <celestin.ma= tte@cmatte.me> wrote:
Hello,

There is an issue in the sql files for search in pgweb, as schema.sql does = not correspond to what is used in functions.sql.
functions.sql uses sites.baseurl, which is not a field in schema.sql. I rep= laced it with sites.hostname, which I guess is the targeted field.

Without that patch, I get the following error when performing a search:

Traceback (most recent call last):=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0
=C2=A0 File "/srv/pgweb/local/pgweb/search/views.py", line 284, i= n search=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0
=C2=A0 =C2=A0 'internal': include_internal,=C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0
psycopg2.errors.UndefinedColumn: column sites.baseurl does not exist=C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0
LINE 1: SELECT sites.id AS siteid, sites.baseurl, webpages.suburl, t...=C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0^=C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0
QUERY:=C2=A0 SELECT sites.id AS siteid, sites.baseurl, webpages.suburl, ts_rank= _cd(fti,tsq) * relprio AS ts_rank_cd FROM web
pages INNER JOIN sites ON webpages.site=3Dsites.id WHERE fti @@ tsq AND site=3D= 1 AND (includeinternal OR NOT isinternal) OR
DER BY ts_rank_cd(fti,tsq) * relprio DESC LIMIT 1000=C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0
CONTEXT:=C2=A0 PL/pgSQL function site_search(text,integer,integer,text,bool= ean) line 21 at OPEN


Is = that really right/enough?

All the instances I run = have both fields, and for example hostname is www.postgresql.org and baseurl is https://www.postgresql.org.

Now, to= day that's the only site we have, so we could just replace it with putt= ing the https:// part into it int he templates, but I think this originally= comes from the times when we indexed multiple sites in a previous system -= - and it appears to have been incorrectly imported in the very first commit= . The search system is really due for a cleanup/replacement since that has = been removed, because a lot of things could be simplified around that.

But short term, I believe the correct thing to do is t= o add the baseurl field to the table in the SQL file. I'll go push a pa= tch taht does that.
=C2=A0
--
=C2=A0Magnus Hagander
=C2= =A0Me: https://www.h= agander.net/
=C2=A0Work: https://www.redpill-linpro.com/
--0000000000005a660305cf05996f--