From: Jacob Champion
Date: Thu, 25 Jul 2024 10:51:05 -0700
Subject: Re: Serverside SNI support in libpq
To: Daniel Gustafsson
Cc: Pgsql Hackers

On Fri, May 10, 2024 at 7:23 AM Daniel Gustafsson wrote:
> The way multiple certificates are handled is that libpq creates one SSL_CTX for
> each at startup, and switch to the appropriate one when the connection is
> inspected.

I fell in a rabbit hole while testing this patch, so this review isn't
complete, but I don't want to delay it any more. I see a few
possibly-related problems with the handling of SSL_context.

The first is that reloading the server configuration doesn't reset the
contexts list, so the server starts behaving in really strange ways
the longer you test. That's an easy enough fix, but things got weirder
when I did. Part of that weirdness is that SSL_context gets set to the
last initialized context, so fallback doesn't always behave in a
deterministic fashion. But we do have to set it to something, to
create the SSL object itself...

I tried patching all that, but I continue to see nondeterministic
behavior, including the wrong certificate chain occasionally being
served, and the servername callback being called twice for each
connection (?!). Since I can't reproduce the weirdest bits under a
debugger yet, I don't really know what's happening. Maybe my patches
are buggy. Or maybe we're running into some chicken-and-egg madness?
The order of operations looks like this:

1. Create a list of contexts, selecting one as an arbitrary default
2. Create an SSL object from our default context
3. During the servername_callback, reparent that SSL object (which has
   an active connection underway) to the actual context we want to use
4. Complete the connection

It's step 3 that I'm squinting at. I wondered how, exactly, that
worked in practice, and based on this issue the answer might be "not
well":

    https://github.com/openssl/openssl/issues/6109

Matt Caswell appears to be convinced that SSL_set_SSL_CTX() is
fundamentally broken. So it might just be FUD, but I'm wondering if we
should instead be using the SSL_ flavors of the API to reassign the
certificate chain on the SSL pointer directly, inside the callback,
instead of trying to set them indirectly via the SSL_CTX_ API.

Have you seen any weird behavior like this on your end? I'm starting
to doubt my test setup... On the plus side, I now have a handful of
debugging patches for a future commitfest.

Thanks,
--Jacob
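
To make the fallback concern concrete, here is a minimal sketch of a
deterministic selection rule: the default is pinned to the first
configured entry instead of whichever context happened to be
initialized last. The HostCert struct, hostname_equal() and
select_host_cert() are hypothetical names made up for illustration;
they are not taken from the patch or from PostgreSQL, and the sketch
deliberately contains no OpenSSL calls.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical per-host TLS configuration entry (not a PostgreSQL struct). */
typedef struct HostCert
{
    const char *hostname;   /* name expected in the SNI extension */
    const char *cert_file;  /* certificate chain for that name */
    const char *key_file;   /* matching private key */
} HostCert;

/* Case-insensitive comparison, since DNS hostnames ignore case. */
static int
hostname_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0')
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/*
 * Return the entry matching the SNI hostname. If the client sent no
 * name (sni_name == NULL) or nothing matches, always fall back to
 * entries[0], so the result does not depend on initialization order.
 */
static const HostCert *
select_host_cert(const HostCert *entries, size_t n, const char *sni_name)
{
    assert(entries != NULL && n > 0);

    if (sni_name != NULL)
    {
        for (size_t i = 0; i < n; i++)
        {
            if (hostname_equal(entries[i].hostname, sni_name))
                return &entries[i];
        }
    }
    return &entries[0];
}
```

In a real servername callback the chosen entry would then have to be
applied to the connection, for instance with SSL_-level calls such as
SSL_use_certificate_chain_file() and SSL_use_PrivateKey_file() on the
SSL pointer. That part is left out here, and whether it avoids the
behavior described above is untested.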