Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tq6De-00GzGj-GT for pgsql-general@arkaria.postgresql.org; Thu, 06 Mar 2025 08:04:38 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tq6Dc-002PEl-J5 for pgsql-general@arkaria.postgresql.org; Thu, 06 Mar 2025 08:04:36 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tq6Dc-002PDQ-6v for pgsql-general@lists.postgresql.org; Thu, 06 Mar 2025 08:04:36 +0000 Received: from mail-ed1-x533.google.com ([2a00:1450:4864:20::533]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1tq6Da-001GCH-0s for pgsql-general@lists.postgresql.org; Thu, 06 Mar 2025 08:04:35 +0000 Received: by mail-ed1-x533.google.com with SMTP id 4fb4d7f45d1cf-5e0573a84fcso424167a12.2 for ; Thu, 06 Mar 2025 00:04:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cybertec.at; s=google; t=1741248273; x=1741853073; darn=lists.postgresql.org; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:cc:to:from:subject:message-id:from:to:cc:subject :date:message-id:reply-to; bh=drda1LGs2XWzRCeAnGW2jgx5BPKMcvYRLSuG3ajOj0Y=; b=anLZGK3rkCsmP8zJSwGpPK5TdWXz8x4aCyQAzoeMKYh6tgl+cMYWfQSQjxGT5q2XXp /0h91bRgx2ZDvyR8/ggvtaklTZDqCBqR3C0hvNZ9CxLk3Ijrfd39i6cU9/QxgY7vLp0V UN8ZAqfVUk7BZka8rI57wqrOrWVtXIIbWDtKvgk2ZqMTjO//Dw8Kt6NVNG35KhnLHAwF MgIETc/Bz5GZqm6UBFky2ud36eBPnFHHYN5vYmRpL0FN2UOODo1Sj/KJUdzTBqPu7Hfu JSIB9vZhsXOfLg2XuyxhwGtJV8+ROpZoiQn2vmk8jINGU+UoNZKXeM5SQmrCgdZU4lA9 +cXw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741248273; x=1741853073; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:cc:to:from:subject:message-id:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=drda1LGs2XWzRCeAnGW2jgx5BPKMcvYRLSuG3ajOj0Y=; b=H1o7lKe0KUHpZTrM/4VnSgC3rXL3J674V3ZWlsXkSgNhB/cIw3bupNfc2O4KlbVydq 1iZ5ag6wFA45um7upWrYf7xFRZvUiw+EYykTSQx8p2RdMoEZBGNfmAEP7NKwiuiGqGQK Pe2xo+S48NGY8CCbbOj8naDE1ZGLkAJPCO2LT4Uvom9yRiYW7gtVBlVCc42Sr9REnoQT OrAC1VR3qw2OBIQhqcsNgMZSd42bX0xHWxvB54CbdIY2wzZgM6ua2bcqF5U7WER5uqq6 8TFh9cD0tz1pe4+knH4eRJAjGG3MbEfJOwjmOrgOEdCPlCAgABnCUYsxOyd6SOWPa2hs 5iWA== X-Gm-Message-State: AOJu0YxeToacPe8Ul2frO99/NMlPy7q2aJdEkKbZsp2IbZeq02HHntYD bplB26xfRbIdXno+YrZelbdENjzJ+UFBae510VJepbBM0JnD+H61b+/NOBU5pvHkftZtiT01yVb b2Ws= X-Gm-Gg: ASbGncuUx98ysjz9Gi0QJUEeaPWZ/PExH4kdS6fSX3FxV2Aww/qDvUHfQATypXLmg90 bs5OL3zjsvhdkQaHJvivy4U4BeYqTea6x0O4++lG2mpVOxtAM3jJ5qeMZy8U3rXvSx+K9tbFUpe ZlZ5GmJ7iPrrtnYT32+VcIDWIcZxS0c9OMaI4jy1SV44pkZdR3vYbsc7hFlInUk1gyhL02L1C2L pR4xoOEvOjYr6y5Y+ysAg97r+DzvyN2whLVZs4iDGte4Y2hAUjwmz8CCnEZag/U613iXbZxOMuF Ad2CLPyeJKONbWEuFrhiA8aBSSy4OYGeK2DLjcicDIjckiQ3kSeCbV4Zo11/ X-Google-Smtp-Source: AGHT+IHwjTCo3P0hq+59Z4U04F++z3JTKhhwsX9yu/L3RyeA4MQG5bOofXPwsQBgGuViWQu7YgDxcw== X-Received: by 2002:a17:907:7f0b:b0:ac1:fb60:2269 with SMTP id a640c23a62f3a-ac20d925048mr607045966b.27.1741248272628; Thu, 06 Mar 2025 00:04:32 -0800 (PST) Received: from localhost.localdomain ([41.66.98.91]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-ac239517802sm52306966b.86.2025.03.06.00.04.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 06 Mar 2025 00:04:32 -0800 (PST) Message-ID: Subject: Re: Quesion about querying distributed databases From: Laurenz Albe To: me nefcanto , Adrian Klaver Cc: pgsql-general@lists.postgresql.org Date: Thu, 06 Mar 2025 09:04:31 +0100 In-Reply-To: References: <099b49ebae94e23f19afdad3f8c9c6e702a3a2d5.camel@cybertec.at> <6d7e1022-6404-4dab-8467-8d1f6e8b63cb@aklaver.com> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable User-Agent: Evolution 3.54.3 (3.54.3-1.fc41) MIME-Version: 1.0 List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu, 2025-03-06 at 06:13 +0330, me nefcanto wrote: > I once worked with a monolithic SQL Server database with more than 10 bil= lion > records and about 8 Terabytes=C2=A0of data. A single backup took us more = than 21 days. > It was a nightmare. Almost everybody knows that scaling up has a ceiling,= but > scaling out has no boundaries. I hear you, and I agree with that. > We initially chose to break the database into smaller databases, because = it > seemed natural for our modularized monolith architecture. And it worked g= reat > for SQL Server. If you're small, we host them all on one server. If you g= et > bigger, we can put heavy databases on separate machines. So you mean that you had those databases on different servers? How would a cross-database query work in that case? It must be something akin to foreign data in PostgreSQL. If that worked well, then it should also work well with PostgreSQL and foreign data wrappers. Look at the execution plan you got on SQL Server and see where PostgreSQL chooses a different plan. Then try to improve tha= t. We can try to help if we see actual plans. > However, I don't have experience working with other types of database > scaling. I have used table partitioning, but I have never used sharding. Well, if you split the data into several databases, that *was* sharding. > Anyway, that's why I asked you guys. However, encouraging me to go back t= o > monolith without giving solutions on how to scale, is not helping. To be > honest, I'm somehow disappointed by how the most advanced open source > database does not support cross-database querying just like how SQL Serve= r > does. But if it doesn't, it doesn't. Our team should either drop it as a > choice or find a way (by asking the experts who built it or use it) how > to design based on its features. That's why I'm asking. Excluding options from the start is limiting yourself. Consider using other, better databases than PostgreSQL (if you can find them). It is difficult to come up with a concrete design based on the information you provided. Perhaps you should get a consultant; the mailing list does not seem to be the right format for that request. Typically, you split the data in a ways that they have few interconnections= , for example per customer, so that you don't regularly end up joining data from different databases (shards). > One thing that comes to my mind, is to use custom types. Instead of stori= ng > data in ItemCategories and ItemAttributes, store them as arrays in the > relevant tables in the same database. Don't ever store arrays in the database. It will be a nightmare. You seem to be drawn to questionable data design... Yours, Laurenz Albe