public inbox for [email protected]
From: Vladimir Ryabtsev <[email protected]>
To: [email protected]
Subject: Memory
Date: Sat, 21 Dec 2024 02:45:45 -0800
Message-ID: <CAMqTPqn8y8Y+uDY0FPvX6ghD1DftLyz2nD6n6HhGOg-gHP4JdA@mail.gmail.com>
Hi community,
I am reading a big dataset using code similar to this:
import psycopg

query = '''
SELECT timestamp, data_source, tag, agg_value
FROM my_table
'''
batch_size = 10_000_000
with psycopg.connect(cs, cursor_factory=psycopg.ClientCursor) as conn:
    with conn.cursor('my_table') as cur:
        cur = cur.execute(query)
        while True:
            rows = cur.fetchmany(batch_size)
            # ...
            if not rows:
                break
The code is executed on a Databricks node, if that matters. The library
version is the latest.
I found that despite fetching in batches, memory consumption grows
continuously throughout the loop iterations and eventually the node goes
OOM. My code does not save any references, so it might be something
internal to the library.
If I change the factory to ServerCursor, the issue goes away: memory does not
grow after the first iteration.
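The server-side variant described above can be sketched roughly as follows. This is only a minimal sketch, assuming psycopg 3, where passing a name to conn.cursor() creates a server-side cursor. The helper names iter_batches and stream_query and the cursor name my_table_cur are hypothetical, chosen for illustration.

```python
from typing import Any, Iterator, Sequence


def iter_batches(cur: Any, batch_size: int) -> Iterator[Sequence[Any]]:
    """Yield non-empty batches from any cursor-like object with fetchmany()."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        yield rows


def stream_query(conninfo: str, query: str, batch_size: int) -> Iterator[Sequence[Any]]:
    """Stream a query through a named (server-side) cursor, batch by batch."""
    import psycopg  # imported lazily so iter_batches has no hard dependency

    with psycopg.connect(conninfo) as conn:
        # Assumption: a named cursor is server-side, so each fetchmany()
        # asks the server for at most batch_size rows.
        with conn.cursor(name="my_table_cur") as cur:
            cur.execute(query)
            yield from iter_batches(cur, batch_size)
```

Used as "for rows in stream_query(cs, query, batch_size): ...", each batch can be processed and dropped before the next one is requested.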
I looked through the documentation, but did not find specifics about
performance differences between Server and Client cursors.
I am fine with ServerCursor, but I need to ask: is it by design that with
ClientCursor the result set is copied into memory despite the fetchmany()
limit? ClientCursor is the default class, so it may be worth documenting the
difference (sorry if I missed that).
Thank you.