Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tOwzl-009FGQ-Jn for psycopg@arkaria.postgresql.org; Sat, 21 Dec 2024 10:46:05 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tOwzj-007SUh-Gj for psycopg@arkaria.postgresql.org; Sat, 21 Dec 2024 10:46:03 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tOwzj-007SUR-AN for psycopg@lists.postgresql.org; Sat, 21 Dec 2024 10:46:03 +0000 Received: from mail-il1-x132.google.com ([2607:f8b0:4864:20::132]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1tOwzg-000uJr-9g for psycopg@postgresql.org; Sat, 21 Dec 2024 10:46:02 +0000 Received: by mail-il1-x132.google.com with SMTP id e9e14a558f8ab-3a9cb80dbfdso20136655ab.0 for ; Sat, 21 Dec 2024 02:46:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1734777957; x=1735382757; darn=postgresql.org; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=vhBN53MJ1LMeZBAJB0Yt4eIMo6w+JKqsF7FPEOqMscc=; b=lI7DN9EZmj/3IMZFN14Z7sDB1JhLjAudo92jndpsnzHItRxRfHAVgbvDbhJWJbOcNM +tqZLceLR/9U+cel4dOnFhJuq1twRurbj5LFUq0vVWcuNm4nA0ngcthv/kNi218J6gPI DJRygnopjD64Bs9in5FNaeQGkOwvOvduQYx+Pr9SuzLykp0hAPq9FJWFiDF2AA/s5H88 IShkWsc6gb+OBWh8jFRFv+hVg6rMsNcisnGn3dkX0Zd11yf9P3CU20wsX8wQSunXlmts X2cb5bXPHDJqmpN6MyzrrPmEAwz5Wympt7pKknNJQ38YDHv67pgWt7gdOcuUffO3jqUc gObQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1734777957; x=1735382757; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=vhBN53MJ1LMeZBAJB0Yt4eIMo6w+JKqsF7FPEOqMscc=; b=WV8BCtoegULWRT1abAoDKF6xOC9J96v1wWgS/xiBZEuyM1NxQJp9l2UP3PeIDbPhI+ O+vGP3KQRlwrB/Nen97AT7VcKukp3ojDdNuq/gwEz91FO+NFegSfcn2zgoyV/kyixgYZ Ledy5I1En4eHg5Yjhom+dy2tpP3brJWZrQP4prJrrcfo0vz7ikb49iPZnimZQu/NQN8F ZvdXb1mbixWtIdyZmGOMPqlwjf3kjUrY3bHToe6Kxf3FT0VfFRNDTWIXuisO303v8Dq7 6Y67RFt9CAPtutKl2T5g6PL8VhimQM8p/jYeS/xtGomFvM32Uh8nKTevdVi06pZTvhwy 2RQg== X-Gm-Message-State: AOJu0YwjhSYfwsq8xfUtczs14OyNWrb97j7IhAT/7VqEX5sth/KsnnfU 1okj2HYp/cdS5BWicHb2QNJNwELrSQcgzAw1g2BZT75/65udVZtLb9L4FlhcIv3dK8mjjnpLhIN YtvusdUzSvCPnMXvto2czxGiqSWP+Kf106Ds= X-Gm-Gg: ASbGncv55Xwb+Fyqotc6BcG/aM6eZK7tZexZANjpEjiTBw75OvcKHO4fIWOUezwEJau 2IhoAFG2NWeka5xFHpNVh8UtPpvDObPNG6MPSS+Il X-Google-Smtp-Source: AGHT+IGf8seDKJaW2EKAOYI6+KSJaUkCsGMuywTRdK/jrZvALZCWYGYE3Xm/SxPNk9/ZzeeWBzl1VbuHJABrPkQSwyU= X-Received: by 2002:a05:6e02:1889:b0:3a7:e286:a565 with SMTP id e9e14a558f8ab-3c2d5a28056mr55795885ab.23.1734777957359; Sat, 21 Dec 2024 02:45:57 -0800 (PST) MIME-Version: 1.0 From: Vladimir Ryabtsev Date: Sat, 21 Dec 2024 02:45:45 -0800 Message-ID: Subject: Memory To: psycopg@postgresql.org Content-Type: multipart/alternative; boundary="000000000000e9aa4f0629c577c3" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000e9aa4f0629c577c3 Content-Type: text/plain; charset="UTF-8" Hi community, I am reading a big dataset using code similar to this: query = ''' SELECT timestamp, data_source, tag, agg_value FROM my_table '''I batch_size = 10_000_000 with psycopg.connect(cs, cursor_factory=psycopg.ClientrCursor) as conn: with conn.cursor('my_table') as cur: cur = cur.execute(query) while True: rows = cur.fetchmany(batch_size) # ... if not rows: break The code is executed on a Databricks node, if that matters. The library version is the latest. I found that despite fetching in batches, memory consumption grows continuously throughout the loop iterations and eventually the node goes OOM. My code does not save any references, so it might be something internal to the library. If I change the factory to ServerCursor, the issue fixes, memory does not grow after the first iteration. I looked the documentation, but did not find specifics related to performance differences between Server and Client cursors. I am fine with ServerCursor, but I need to ask, is it by design that with ClientCursor the result set is copied into memory despite fetchmany() limit? ClientCursor is the default class, so may be worth documenting the difference (sorry, if I missed that). Thank you. --000000000000e9aa4f0629c577c3 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi community,

I am reading a big datase= t using code similar to this:

query =3D '''
SELECT timestamp, data_source, tag, agg_v= alue
FROM my_table
'''I=C2=A0
batch_size =3D 10_000_00= 0

with psycopg.connect(cs, cursor_fa= ctory=3Dpsycopg.ClientrCursor) as conn:
=C2=A0 with conn.cursor('my_= table') as cur:
=C2=A0 =C2=A0 cur =3D cur.execute(query)
=C2=A0 = =C2=A0 while True:
=C2=A0 =C2=A0 =C2=A0 rows =3D cur.fetchmany(batch_siz= e)
=C2=A0 =C2=A0 =C2=A0 # ...
= =C2=A0 =C2=A0 =C2=A0 if not rows:
=C2=A0 =C2=A0 =C2=A0 =C2=A0 break

The code is executed on a Databricks node, if that mat= ters. The library version is the latest.

I found t= hat despite fetching in batches, memory=C2=A0consumption grows continuously= throughout the loop iterations and eventually the node goes OOM. My code d= oes not save any references, so it might be something internal to the libra= ry.

If I change the factory to ServerCursor, the i= ssue fixes, memory does not grow after the first iteration.

<= /div>
I looked the documentation, but did not find specifics related to= performance differences between=C2=A0Server and Client cursors.
=
I am fine with ServerCursor, but I need to ask, is it by des= ign that with ClientCursor the result set is copied into memory despite fet= chmany() limit? ClientCursor is the default class, so may be worth document= ing the difference (sorry, if I missed that).

Than= k you.

--000000000000e9aa4f0629c577c3--