public inbox for [email protected]  
Used memory calculation in containers - docker stats and file cache
4+ messages / 2 participants

* Used memory calculation in containers - docker stats and file cache
@ 2024-10-27 15:23 Costa Alexoglou <[email protected]>
  2024-11-01 19:56 ` Re: Used memory calculation in containers - docker stats and file cache Peter J. Holzer <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Costa Alexoglou @ 2024-10-27 15:23 UTC (permalink / raw)
  To: [email protected]

Hey folks,

I noticed some behaviour that I was not expecting at all. After hours of
debugging I am getting somewhere, but any help understanding what is
happening would still be appreciated.

I was running a PostgreSQL server in a Docker container. After I started
generating a moderate, I/O-heavy load, I noticed memory usage climbing to
the container limit and then dropping again. It reminded me of a heap GC
event, but it was not one.

The container limit was 16GB of RAM. As soon as this limit was reached,
there was no restart and no OOM error; instead there was a huge drop in
memory usage (image `ContainerRelativeAbsolute`).

After digging into this issue I found that this is file cache (image
`BelowCLI`). By default this measurement is included in the `docker stats`
CLI output, and as far as I can tell it is also the default way of
measuring memory in some observability tools.

I have two questions:
1. Should the file cache indeed be part of the reported memory that is
allocated to the container? The issue I see is that an alerting system
that fires at, for example, 90% of container memory used will definitely
trigger many false positives. If we ignore the file cache, could this lead
to any reliability issues?
2. What is happening at the OS level when 15GB of file cache is suddenly
erased (image `ContainerRelativeAbsolute`)? I would expect incremental
evictions rather than so many GB of cache being evicted at once.
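
To make question 1 concrete, here is a minimal sketch of how much the alert
percentage depends on whether file cache is counted. It assumes cgroup v2,
where `memory.stat` exposes `anon` and `file` counters. The sample values
are made up for illustration (not taken from the screenshots), and
`parse_memory_stat` and `usage_percent` are hypothetical helpers, not part
of Docker.

```python
# Sketch only: the memory.stat text below is illustrative, not measured data.
# Field names follow the cgroup v2 memory.stat format (values in bytes).
SAMPLE_MEMORY_STAT = """\
anon 1073741824
file 15032385536
inactive_file 12884901888
active_file 2147483648
"""

GIB = 1024 ** 3


def parse_memory_stat(text):
    """Parse 'key value' lines from a memory.stat-style text into a dict."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value:
            stats[key] = int(value)
    return stats


def usage_percent(used_bytes, limit_bytes):
    """Return used memory as a percentage of the container limit."""
    return 100.0 * used_bytes / limit_bytes


stats = parse_memory_stat(SAMPLE_MEMORY_STAT)
limit = 16 * GIB  # the container limit mentioned above

# Simplification: treat usage as anonymous memory plus file cache only,
# ignoring kernel memory and other counters.
with_cache = stats["anon"] + stats["file"]
without_cache = stats["anon"]

print(f"with file cache:    {usage_percent(with_cache, limit):.2f}%")
print(f"without file cache: {usage_percent(without_cache, limit):.2f}%")
```

With these made-up numbers, a 90% alert would fire when file cache is
counted and stay far below the threshold when it is not, which is the
false-positive pattern I am describing.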

Cheers,
Costa


Attachments:

  [image/png] BelowCLI.png (621.3K)
  [image/png] ContainerRelativeAbsolute.png (350.1K)


* Re: Used memory calculation in containers - docker stats and file cache

From: Peter J. Holzer @ 2024-11-01 19:56 UTC (permalink / raw)
  To: [email protected]

On 2024-10-27 16:23:44 +0100, Costa Alexoglou wrote:
> The container limit was 16GB of RAM, and as soon as this limit was reached,
> there was no restart or OOM errors, rather than a huge drop in memory (image
> `ContainerRelativeAbsolute`).
[...]
> 2. What is happening on the OS level when suddenly 15GB of file cache is
> getting erased (image `ContainerRelativeAbsolute`)? I would expect for
> incremental deletes rather than so many GB of cache being evicted.

I don't know if Docker does anything strange here. I can think of two
scenarios which would normally result in a sudden drop in filesystem
cache size:

1) A large file (or many smaller files) which is cached is deleted
2) Something else briefly needs a lot of RAM, evicting data from the
   cache.

Both can happen in a database (for example, a large sort operation might
need a few GB of either RAM or temporary files, depending on your
work_mem setting), but I wouldn't expect them to happen just before the
configured limit is reached. So I'd double-check the logs for any
errors.
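
One way to check the temporary-file scenario, as a rough sketch: if the
PostgreSQL setting `log_temp_files` is enabled, the server logs each
temporary file with its size when the file is removed. The sample log
lines below are made up, the timestamp prefix depends on your
`log_line_prefix`, and `temp_file_sizes` is a hypothetical helper.

```python
import re

# Illustrative log lines (made up). The "temporary file: path ..., size ..."
# part is the shape emitted with log_temp_files enabled; the prefix before
# "LOG:" is an assumption and depends on log_line_prefix.
SAMPLE_LOG = '''\
2024-10-27 15:01:02 UTC LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp4683.0", size 1073741824
2024-10-27 15:02:10 UTC LOG:  checkpoint starting: time
2024-10-27 15:03:40 UTC LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp4683.1", size 536870912
'''

TEMP_FILE_RE = re.compile(r'temporary file: path "([^"]+)", size (\d+)')


def temp_file_sizes(log_text):
    """Return a list of (path, size_in_bytes) for each logged temporary file."""
    return [(m.group(1), int(m.group(2))) for m in TEMP_FILE_RE.finditer(log_text)]


entries = temp_file_sizes(SAMPLE_LOG)
total_bytes = sum(size for _, size in entries)
print(f"{len(entries)} temporary files, {total_bytes / 1024**3:.2f} GiB in total")
```

If the logged sizes add up to something close to the size of the drop, and
the timestamps line up with it, that would point to scenario 1.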

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | [email protected]         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"



* Re: Used memory calculation in containers - docker stats and file cache

From: Costa Alexoglou @ 2024-11-04 13:35 UTC (permalink / raw)
  To: [email protected]; [email protected]

> I don't know if Docker does anything strange here.

I am not sure whether this is Docker-specific or whether cgroups come into
play. The measurement is implemented in the docker CLI, but I would assume
that the eviction happens within the cgroup scope.

> A large file (or many smaller files) which is cached is deleted

The increase pattern is "incremental" until the huge eviction, and this is
my question.
Couldn't the eviction also happen incrementally, rather than 15GB of file
cache being evicted in an instant?

> So I'd double check the logs if there are any errors.

Not any error in the logs, unfortunately (or fortunately).


Seems like this issue <https://github.com/hashicorp/nomad/issues/16230>, or
the parent one <https://github.com/moby/moby/issues/10824> that everyone is
linking to.



* Re: Used memory calculation in containers - docker stats and file cache

From: Peter J. Holzer @ 2024-11-04 20:45 UTC (permalink / raw)
  To: [email protected]

On 2024-11-04 14:35:23 +0100, Costa Alexoglou wrote:
> > I don't know if Docker does anything strange here.
> 
> I am not sure if this is docker specific or cgroup comes into play. 
> The measurement is implemented in docker CLI, but I would make the
> assumption that the eviction is done within the cgroup scope.

I was trying to come up with possible *causes* for the eviction.


> > A large file (or many smaller files) which is cached is deleted
> 
> The increase pattern is "incremental" until the huge eviction, and
> this is my question. Couldn't also the eviction happen incrementally
> rather than 15GB of file cache evicted on an instant?

It (usually) takes longer to write a file than to delete it.

If a temporary file is slowly written and then deleted after it is no
longer needed, you would see such a "sawtooth" as your screenshots show:
While the file is written, the kernel will cache the data on the
assumption that it will be read again sometime in the near future. But
when it is deleted, the kernel knows that it can't be read again - so it
will throw away all this (now useless) data.
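
The write-then-delete pattern described above can be sketched as follows.
This is only an illustration under assumptions: `write_then_delete` is a
hypothetical helper, the 8 MiB size is arbitrary, and the script does not
measure the page cache itself. To see an effect you would run it with a
much larger size and watch the file cache counter of the container (or of
the host) while it runs.

```python
import os
import tempfile

MIB = 1024 * 1024


def write_then_delete(total_bytes, chunk_bytes=MIB):
    """Write a temporary file in chunks, then delete it.

    Returns (path, bytes_written). While the file is being written, the
    kernel may keep its pages in the file cache; once the file is deleted,
    those pages can no longer be read and can be dropped all at once.
    """
    chunk = b"\0" * chunk_bytes
    written = 0
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            while written < total_bytes:
                n = min(chunk_bytes, total_bytes - written)
                f.write(chunk[:n])  # slow, incremental growth
                written += n
    finally:
        os.remove(path)  # fast: one call removes the whole file
    return path, written


path, written = write_then_delete(8 * MIB)
print(f"wrote and deleted {written} bytes at {path}")
```

The asymmetry is visible in the code itself: growth takes many write calls,
while removal is a single call.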


> Seems like this issue, or the parent one that everyone is linking to this.

That seems to be just about the way it is reported, not the behaviour.

        hp




