From: Florents Tselai
Subject: Re: How to read an external pdf file from postgres?
Date: Mon, 10 Feb 2025 15:01:48 +0200
To: Ian Lawrence Barwick
Cc: Amine Tengilimoglu, PG-General Mailing List (pgsql-general@postgresql.org)

> On 12 Jan 2022, at 4:35 PM, Ian Lawrence Barwick <barwick@gmail.com> wrote:
>
> On Wed, 12 Jan 2022 at 20:16, Amine Tengilimoglu <aminetengilimoglu@gmail.com> wrote:
>>
>> Hi;
>>
>> I want to read an external PDF file from Postgres. The PDF file will
>> exist on disk; Postgres only knows its full path as metadata. Is there
>> any software or extension that can be used for this? Or do we have to
>> develop software for it? Or what is the best approach for this? I'd
>> appreciate it if anyone with experience could make suggestions.
>
> By "read" do you mean "open the file and meaningfully extract data from it"? If
> so, speaking from prior experience, don't. And if you really have to, make sure
> the source PDF is guaranteed to be in a well-defined, predictable format
> enforceable by contract law and/or people with sharp pointy sticks. I have
> successfully suppressed the memories of whatever it is I once had to do with
> reading data from PDFs, but though the data was eventually imported into
> PostgreSQL, there was a lot of mangling, probably involving a Perl module (other
> languages are probably available), before it got anywhere near the database.
>
>
> Regards
>
> Ian Barwick
>
> --
> EnterpriseDB: https://www.enterprisedb.com

https://github.com/Florents-Tselai/pgpdf
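
For the out-of-database route Ian describes (extract and clean the text first, then import it), a minimal Python sketch might look like the following. It assumes poppler's `pdftotext` command is installed; the table `pdf_documents`, its `path` and `body` columns, and all helper names are hypothetical, and the `%s` placeholders assume a DB-API driver such as psycopg. It is an illustration under those assumptions, not what anyone in this thread actually ran.

```python
import subprocess
from typing import Callable, Iterable, List, Tuple

# Hypothetical target table; %s is the placeholder style used by psycopg.
INSERT_SQL = "INSERT INTO pdf_documents (path, body) VALUES (%s, %s)"


def pdftotext_extract(path: str) -> str:
    """Extract text with poppler's pdftotext; "-" sends the text to stdout."""
    result = subprocess.run(
        ["pdftotext", path, "-"],
        check=True,            # raise CalledProcessError if extraction fails
        capture_output=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def normalize(text: str) -> str:
    """Drop NUL bytes (PostgreSQL text values reject them) and collapse whitespace."""
    return " ".join(text.replace("\x00", "").split())


def build_rows(
    paths: Iterable[str],
    extract: Callable[[str], str] = pdftotext_extract,
) -> List[Tuple[str, str]]:
    """Turn file paths into (path, cleaned_text) rows ready for insertion."""
    return [(path, normalize(extract(path))) for path in paths]


def load_rows(cursor, rows: List[Tuple[str, str]]) -> None:
    """Insert the rows through any DB-API cursor that accepts %s placeholders."""
    cursor.executemany(INSERT_SQL, rows)
```

The extractor is passed in as a parameter so that the mangling step can be swapped or tested without touching the database code; with a real connection, something like `load_rows(conn.cursor(), build_rows(paths))` followed by a commit would complete the import.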