Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1snJOe-002b1N-AW for pgsql-hackers@arkaria.postgresql.org; Sun, 08 Sep 2024 15:00:13 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1snJOc-00FNIW-94 for pgsql-hackers@arkaria.postgresql.org; Sun, 08 Sep 2024 15:00:10 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1snJOb-00FNIO-RM for pgsql-hackers@lists.postgresql.org; Sun, 08 Sep 2024 15:00:09 +0000 Received: from mail-lf1-x12d.google.com ([2a00:1450:4864:20::12d]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1snJOW-0009aP-3L for pgsql-hackers@postgresql.org; Sun, 08 Sep 2024 15:00:08 +0000 Received: by mail-lf1-x12d.google.com with SMTP id 2adb3069b0e04-5365c060f47so2245784e87.2 for ; Sun, 08 Sep 2024 08:00:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1725807603; x=1726412403; darn=postgresql.org; h=in-reply-to:from:references:cc:to:content-language:subject :user-agent:mime-version:date:message-id:from:to:cc:subject:date :message-id:reply-to; bh=bql5k+DQMu2TwWklDK4zKnURRCjwjp7r2tGIrSFtys0=; b=glBJxK6rvFi09bkvOFIDbPE214vqxbuBCXixkiIvVcqrIcX+NYxvsIgkEALnRTPMHG 5O03XnhWogu/br/9Cb6q4AVz15QIjZ/hkXw+DfS4FS982hDGJHmEJEW1LPUNh6YZQjcT 9QUVWu+XRXZZBbHkKEQKXC+A4nF1tyTH4LEJLkxOGxdNrhwHWPk5SHkH7TWd0CwCrEEP TrSdyQmJGXTJcCu8DJyrIZU1CuMpBjXKIghRK4szAF/jADJLEzNtarg5rAWEWR4vtAUL 6yVwVt9HXSzTzpgPXnmQ7jMH4DnG/BWaFIBwr+tWNdeLZKth2k3XiVSZdbboulQP82hp bWhw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725807603; x=1726412403; h=in-reply-to:from:references:cc:to:content-language:subject :user-agent:mime-version:date:message-id:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=bql5k+DQMu2TwWklDK4zKnURRCjwjp7r2tGIrSFtys0=; b=DM0gE5McLSQPkmAdxc4RWuCFXamLrE4CPMm77zI2ekrQ/TWA7MpgXrs0FhaZKYD8KJ qMouRwprw0N+8PgW2rZ5dRp8En3a6sxTXL4ZiUfiphNLlkOR5rlzVBV8M+CXBpmM68XR cb6e2ec//QKC6qCYbu78KXFBpWWfhYx7LGNKKPzYZY9dMExm1Ad2fFwq9sjb6wtVhcaD B6XemIBtAKimVw7ZI/UgtEVTZxiu8KxSI6YoP7kTo6noeq7CA6BfAh0hBsgFO0/M8okm 0J6dmj/836CRzfYIZm5lM2eg4K7zpZmsoJFcWuKdhnDvn/MuGqMzalqp23GfsTsSXn4l SfCg== X-Gm-Message-State: AOJu0Yy+OWmRXb4s2UQX0qrBOpEfSKQiYjoA50J16azI9s3TYgwSbbgL Vs7GQ03CT2mFNnyhpJOT13GO5JtbdvQsB/y7T11JsSfOiwKuSzqZ X-Google-Smtp-Source: AGHT+IE0Ut7WIzu2vw9gcdr1ycaEOJ1HG40kkV+nXj7v204A6Q3dqxEyMMiJMkSfTP4paa9hBjrIRg== X-Received: by 2002:a05:6512:3510:b0:533:46cc:a736 with SMTP id 2adb3069b0e04-536587f56dbmr4215620e87.37.1725807601719; Sun, 08 Sep 2024 08:00:01 -0700 (PDT) Received: from [1.0.0.7] ([178.155.16.7]) by smtp.gmail.com with ESMTPSA id 2adb3069b0e04-5365f86fb8fsm424573e87.66.2024.09.08.08.00.00 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sun, 08 Sep 2024 08:00:00 -0700 (PDT) Content-Type: multipart/mixed; boundary="------------rL4iKe0sAhy13aHkfDO6jk02" Message-ID: Date: Sun, 8 Sep 2024 18:00:00 +0300 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2 Subject: Re: Yet another way for pg_ctl stop to fail on Windows Content-Language: en-US To: Noah Misch Cc: pgsql-hackers References: <20240907181143.11.nmisch@google.com> From: Alexander Lakhin In-Reply-To: <20240907181143.11.nmisch@google.com> List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk This is a multi-part message in MIME format. --------------rL4iKe0sAhy13aHkfDO6jk02 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit 07.09.2024 21:11, Noah Misch wrote: > >> Noah, what do you think of handling this error in line with handling of >> ERROR_BROKEN_PIPE and ERROR_BAD_PIPE (which was done in 0ea1f2a3a)? >> >> I tried the following change: >>         switch (GetLastError()) >>         { >>                 case ERROR_BROKEN_PIPE: >>                 case ERROR_BAD_PIPE: >> +               case ERROR_PIPE_BUSY: >> and saw no issues. > That would be a strict improvement over returning EINVAL like we do today. We > do use PIPE_UNLIMITED_INSTANCES, so I expect the causes of ERROR_PIPE_BUSY are > process exit and ENOMEM-like situations. While that change is the best thing > if the process is exiting, it could silently drop the signal in ENOMEM-like > situations. Consider the following alternative. If sig==0, just return 0 > like you propose, because the process isn't completely gone. Otherwise, sleep > and retry the signal, like pgwin32_open_handle() retries after certain errors. > What do you think of that? Thank you for your attention to the issue! I agree with your approach. It looks like Microsoft recommends to loop on ERROR_PIPE_BUSY: [1] (they say "Calling CallNamedPipe is equivalent to calling the CreateFile ..." at [2]). So if we aim to not only fix "pg_ctl stop", but to make pgkill() robust, it's the way to go, IMHO. I'm not sure about an infinite loop they show, I'd vote for a loop with the same characteristics as in pgwin32_open_handle(). I've managed to trigger ERROR_PIPE_BUSY with "pg_ctl reload", when running 20 TAP tests (see attached) in parallel (with 20 threads executing $node->reload() in each), and with the kill() call inside do_reload() repeated x100 (see the modification patch attached too): ... # Running: pg_ctl -D .../099_test_pgkill\data/t_099_test_pgkill_node_data/pgdata reload ### Reloading node "node" # Running: pg_ctl -D .../099_test_pgkill\data/t_099_test_pgkill_node_data/pgdata reload [13:41:46.850](2.400s) # 18 server signaled server signaled server signaled server signaled server signaled server signaled server signaled server signaled !!!pgkill| GetLastError(): 231 pg_ctl: could not send reload signal (PID: 3912), iteration: 81, res: -1, errno: 22: Invalid argument server signaled [13:41:52.594](5.744s) # 19 ... [1] https://learn.microsoft.com/en-us/windows/win32/ipc/named-pipe-client [2] https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-callnamedpipea Best regards, Alexander --------------rL4iKe0sAhy13aHkfDO6jk02 Content-Type: text/x-patch; charset=UTF-8; name="pgkill-debugging.patch" Content-Disposition: attachment; filename="pgkill-debugging.patch" Content-Transfer-Encoding: base64 ZGlmZiAtLWdpdCBhL3NyYy9iaW4vcGdfY3RsL3BnX2N0bC5jIGIvc3JjL2Jpbi9wZ19jdGwv cGdfY3RsLmMKaW5kZXggZTdlODc4YzIyZi4uY2FiNmViOTQ3MiAxMDA2NDQKLS0tIGEvc3Jj L2Jpbi9wZ19jdGwvcGdfY3RsLmMKKysrIGIvc3JjL2Jpbi9wZ19jdGwvcGdfY3RsLmMKQEAg LTExNjcsMTIgKzExNzQsMTcgQEAgZG9fcmVsb2FkKHZvaWQpCiAJCWV4aXQoMSk7CiAJfQog Ci0JaWYgKGtpbGwocGlkLCBzaWcpICE9IDApCitmb3IgKGludCBpID0gMTsgaSA8PSAxMDA7 IGkrKykKK3sKK2ludCByZXMsIGVuOworCWlmICgocmVzID0ga2lsbChwaWQsIHNpZykpICE9 IDApCiAJewotCQl3cml0ZV9zdGRlcnIoXygiJXM6IGNvdWxkIG5vdCBzZW5kIHJlbG9hZCBz aWduYWwgKFBJRDogJWQpOiAlbVxuIiksCi0JCQkJCSBwcm9nbmFtZSwgKGludCkgcGlkKTsK K2VuID0gZXJybm87CisJCXdyaXRlX3N0ZGVycihfKCIlczogY291bGQgbm90IHNlbmQgcmVs b2FkIHNpZ25hbCAoUElEOiAlZCksIGl0ZXJhdGlvbjogJWQsIHJlczogJWQsIGVycm5vOiAl ZDogJW1cbiIpLAorCQkJCQkgcHJvZ25hbWUsIChpbnQpIHBpZCwgaSwgcmVzLCBlbik7CiAJ CWV4aXQoMSk7CiAJfQorfQogCiAJcHJpbnRfbXNnKF8oInNlcnZlciBzaWduYWxlZFxuIikp OwogfQpkaWZmIC0tZ2l0IGEvc3JjL3BvcnQva2lsbC5jIGIvc3JjL3BvcnQva2lsbC5jCmlu ZGV4IDQxMmMyZjE5YzEuLmIzZDJjMDA3OTYgMTAwNjQ0Ci0tLSBhL3NyYy9wb3J0L2tpbGwu YworKysgYi9zcmMvcG9ydC9raWxsLmMKQEAgLTcwLDYgKzcwLDcgQEAgcGdraWxsKGludCBw aWQsIGludCBzaWcpCiAJCXJldHVybiAwOwogCX0KIAoraW50IGxlID0gR2V0TGFzdEVycm9y KCk7CiAJc3dpdGNoIChHZXRMYXN0RXJyb3IoKSkKIAl7CiAJCWNhc2UgRVJST1JfQlJPS0VO X1BJUEU6CkBAIC04OSw2ICs5MCw3IEBAIHBna2lsbChpbnQgcGlkLCBpbnQgc2lnKQogCQkJ ZXJybm8gPSBFUEVSTTsKIAkJCXJldHVybiAtMTsKIAkJZGVmYXVsdDoKK2ZwcmludGYoc3Rk ZXJyLCAiISEhcGdraWxsfCBHZXRMYXN0RXJyb3IoKTogJWRcbiIsIGxlKTsKIAkJCWVycm5v ID0gRUlOVkFMOwkJLyogdW5leHBlY3RlZCAqLwogCQkJcmV0dXJuIC0xOwogCX0K --------------rL4iKe0sAhy13aHkfDO6jk02 Content-Type: application/x-perl; name="099_test_pgkill.pl" Content-Disposition: attachment; filename="099_test_pgkill.pl" Content-Transfer-Encoding: 7bit use strict; use warnings; use PostgreSQL::Test::Cluster; use PostgreSQL::Test::Utils; use Test::More tests => 1; use threads; use threads::shared; sub reload { my @args = @_; my ($thread_num, $node) = @args; my $i = 1; $SIG{'KILL'} = sub { print("Thread killed on iteration $i.\n"); threads->exit(); }; print("Thread $thread_num started\n"); while (1) { $node->reload(); $i++; } } sub shutdown_threads { my @args = @_; my (@threads) = @args; for my $thread (@threads) { if ($thread->is_running()) { $thread->kill('KILL')->detach; } } return; } my $node = PostgreSQL::Test::Cluster->new("node"); $node->init; $node->start; my $numthreads = 20; my @threads; foreach my $ti (1 .. $numthreads) { push @threads, threads->create('reload', $ti, $node); } my $max_duration = 100; my $failed_threads = 0; my $duration = 0; while (1) { foreach my $ti (1 .. $numthreads) { my $thread = $threads[$ti - 1]; if ($thread->is_joinable()) { my $result = $thread->join(); $failed_threads += 1; diag("Thread $ti failed:\n$result"); } elsif (!$thread->is_running()) { $failed_threads += 1; diag("Thread $ti is not running:\n"); } } if ($failed_threads > 0) { last; } sleep(1); diag("$duration\n"); $duration += 1; if ($duration >= $max_duration) { last; } } diag($failed_threads); shutdown_threads(@threads); ok($failed_threads == 0); --------------rL4iKe0sAhy13aHkfDO6jk02--