Recommendations on how to combine SSD and HDD drives in bare metal PostgreSQL server

public inbox for [email protected]  
help / color / mirror / Atom feed

Recommendations on how to combine SSD and HDD drives in bare metal PostgreSQL server
6+ messages / 5 participants
[nested] [flat]

* Recommendations on how to combine SSD and HDD drives in bare metal PostgreSQL server
@ 2024-10-16 14:06  Onni Hakala <[email protected]>
  0 siblings, 3 replies; 6+ messages in thread

From: Onni Hakala @ 2024-10-16 14:06 UTC (permalink / raw)
  To: [email protected]

Hey,

I have a large dataset of > 100TB which would be very expensive to store solely into SSD drives.

I have access to a server which has 2x 3.84TB NVME SSD disks and large array of HDD drives 8 x 22TB.

Most of the data that I have in my dataset is very rarely accessed and is stored only for archival purposes.

What would be the de-facto way to use both SSD and HDD together in a way use where commonly used data would be fast to access and old data would eventually only be stored in compressed format in the HDDs?

I was initially looking into building zpool using zfs with raidz3 and zstd compression for my HDDs but I’m unsure how to add the SSDs into this equation and I thought that this is probably a common scenario and wanted to ask opinions from here.

Thanks in advance,
Onni Hakala

^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Recommendations on how to combine SSD and HDD drives in bare metal PostgreSQL server
@ 2024-10-16 14:56  Bzzzz <[email protected]>
  parent: Onni Hakala <[email protected]>
  2 siblings, 0 replies; 6+ messages in thread

From: Bzzzz @ 2024-10-16 14:56 UTC (permalink / raw)
  To: [email protected]

On Wed, 16 Oct 2024 17:06:24 +0300
Onni Hakala <[email protected]> wrote:

> Hey,

Hi,

> I have a large dataset of > 100TB which would be very expensive to
> store solely into SSD drives.
>
> I have access to a server which has 2x 3.84TB NVME SSD disks and
> large array of HDD drives 8 x 22TB.
>
> Most of the data that I have in my dataset is very rarely accessed
> and is stored only for archival purposes.

I can't answer to the rest but with the configuration you describe,
you might meet some slow down because HDD speed is not what you think
it is when you reach a barrier in disks filling.


A test on a 2TB/7200RPM returns that, at the beginning of the disk it is
quite fast :
========================================================================
hdparm --offset 1g --direct -t /dev/sdb1
/dev/sdb1:
 geometry      = 243201/255/63, sectors = 3907027087, start = 2048
 Timing O_DIRECT disk reads (offset 1 GB): 504 MB in  3.00 seconds =
 167.97 MB/sec


@ 50%, there's even a small gain (NB : this is a PITA, hdparm displays
GB where it uses in fact GiB:( :
=======================================================================
hdparm --offset 931g --direct -t /dev/sdb1
/dev/sdb1:
 geometry      = 243201/255/63, sectors = 3907027087, start = 2048
 Timing O_DIRECT disk reads (offset 931 GB): 520 MB in  3.01 seconds =
 172.73 MB/sec


@ 70% of the disk, the loss is only 11% :
=========================================
hdparm --offset 1304g --direct -t /dev/sdb1
/dev/sdb1:
 geometry      = 243201/255/63, sectors = 3907027087, start = 2048
 Timing O_DIRECT disk reads (offset 1304 GB): 440 MB in  3.01 seconds =
 146.38 MB/sec


@ 80% of the disk the read speed has already lost 21% :
=======================================================
hdparm --offset 1490g --direct -t /dev/sdb1
/dev/sdb1:
 geometry      = 243201/255/63, sectors = 3907027087, start = 2048
 Timing O_DIRECT disk reads (offset 1490 GB): 390 MB in  3.01 seconds =
129.68 MB/sec


but @ 90% of the disk, the loss climbs @ 30.5% :
================================================
hdparm --offset 1676g --direct -t /dev/sdb1
/dev/sdb1:
 geometry      = 243201/255/63, sectors = 3907027087, start = 2048
 Timing O_DIRECT disk reads (offset 1676 GB): 344 MB in  3.01 seconds =
 114.45 MB/sec


So, for a good response time, it is best to never fill your disks
further than 75% and it is also always a good practice to avoid
some inconvenience, especially when using a COW FS, such as ZFS.

Jean-Yves

--





^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Recommendations on how to combine SSD and HDD drives in bare metal PostgreSQL server
@ 2024-10-16 16:01  TIM CHILD <[email protected]>
  parent: Onni Hakala <[email protected]>
  2 siblings, 0 replies; 6+ messages in thread

From: TIM CHILD @ 2024-10-16 16:01 UTC (permalink / raw)
  To: Onni Hakala <[email protected]>; [email protected]

Omni,

Firstly, it looks like to have a solution, ( a fixed number and type of disks) looking for a problem. Its better to consider what is right mix of disks for your application the server. 

To understand the best physical layout, you need to know the logical  access patterns to your data. When you access your data are you doing random queries, or index scans or table scans or  computing aggregates? What queries are most important and what response time are you targeting? 
Once you have an idea of your logical access patterns you map that to you physical layout.  

Postgres Tablespaces are useful as they allow you locate tables  and indexes of different physical devices. For example it will be better to put small frequently used tables of SSD's. Or just put frequently used indexes on the SSD.  

There are many factors to consider when planning the physical storage layout. There could me a need for AI apps to do just that!

-Tim

> On 10/16/2024 7:06 AM PDT Onni Hakala <[email protected]> wrote:
> 
>  
> Hey,
> 
> I have a large dataset of > 100TB which would be very expensive to store solely into SSD drives.
> 
> I have access to a server which has 2x 3.84TB NVME SSD disks and large array of HDD drives 8 x 22TB.
> 
> Most of the data that I have in my dataset is very rarely accessed and is stored only for archival purposes.
> 
> What would be the de-facto way to use both SSD and HDD together in a way use where commonly used data would be fast to access and old data would eventually only be stored in compressed format in the HDDs?
> 
> I was initially looking into building zpool using zfs with raidz3 and zstd compression for my HDDs but I’m unsure how to add the SSDs into this equation and I thought that this is probably a common scenario and wanted to ask opinions from here.
> 
> Thanks in advance,
> Onni Hakala

^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Recommendations on how to combine SSD and HDD drives in bare metal PostgreSQL server
@ 2024-10-16 16:23  Laurenz Albe <[email protected]>
  parent: Onni Hakala <[email protected]>
  2 siblings, 0 replies; 6+ messages in thread

From: Laurenz Albe @ 2024-10-16 16:23 UTC (permalink / raw)
  To: Onni Hakala <[email protected]>; [email protected]

On Wed, 2024-10-16 at 17:06 +0300, Onni Hakala wrote:
> I have a large dataset of > 100TB which would be very expensive to store solely into SSD drives.
> 
> I have access to a server which has 2x 3.84TB NVME SSD disks and large array of HDD drives 8 x 22TB.
> 
> Most of the data that I have in my dataset is very rarely accessed and is stored only for archival purposes.
> 
> What would be the de-facto way to use both SSD and HDD together in a way use where commonly used
> data would be fast to access and old data would eventually only be stored in compressed format in the HDDs?
> 
> I was initially looking into building zpool using zfs with raidz3 and zstd compression for my HDDs
> but I’m unsure how to add the SSDs into this equation and I thought that this is probably a common
> scenario and wanted to ask opinions from here.

The traditionalway to handle that would be to create an extra tablespace on a file system built
on the slow disks.  You'd move tables that are no longer hot to that slow tablespace.
If you cannot move the whole table, use partitioning and move old partitions to the slow
tablespace.

Yours,
Laurenz Albe






^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* RE: Recommendations on how to combine SSD and HDD drives in bare metal PostgreSQL server [SEC=UNOFFICIAL]
@ 2024-10-17 03:26  Pilling, Michael DR <[email protected]>
  0 siblings, 0 replies; 6+ messages in thread

From: Pilling, Michael DR @ 2024-10-17 03:26 UTC (permalink / raw)
  To: Laurenz Albe <[email protected]>; Onni Hakala <[email protected]>; [email protected] <[email protected]>

UNOFFICIAL
I would say the best way to handle this would be to do it in the hardware drivers or file system components of the OS so that the SSD drives were used as a cache for the HDDs. This would generally result in all the active indices of the tables residing on the SSDs. It all depends whether your OS and its files systems are to the job. Certainly, Reisier4 file system aimed for all of its indices to be held in memory with only writes going to the HDDs - by using the SSDs for swap space this would have the desired effect under Reiser4.

Dr Michael Pilling (him / they)
Defence Analyst / Software Specialist 
Human and Decision Sciences Division | Analytical Wargaming | Capability Analysis and Design 
Defence Science and Technology Group
_____________________________________________________________________________
Department of Defence | Edinburgh  | 23L Bay 35
PO Box 1500 | EDINBURGH SA 5111 
M: 0414 405 711 | www.defence.gov.au 
E: [email protected]

Defence acknowledges the Traditional Custodians of the Country throughout Australia. We recognise their continuing connection to land, waters and community. We pay our respects to them, their culture and to their Elders past and present. 

-----Original Message-----
From: Laurenz Albe <[email protected]> 
Sent: Thursday, 17 October 2024 2:53 AM
To: Onni Hakala <[email protected]>; [email protected]
Subject: Re: Recommendations on how to combine SSD and HDD drives in bare metal PostgreSQL server

EXTERNAL EMAIL: Do not click any links or open any attachments unless you trust the sender and know the content is safe.

On Wed, 2024-10-16 at 17:06 +0300, Onni Hakala wrote:
> I have a large dataset of > 100TB which would be very expensive to store solely into SSD drives.
> 
> I have access to a server which has 2x 3.84TB NVME SSD disks and large array of HDD drives 8 x 22TB.
> 
> Most of the data that I have in my dataset is very rarely accessed and is stored only for archival purposes.
> 
> What would be the de-facto way to use both SSD and HDD together in a 
> way use where commonly used data would be fast to access and old data would eventually only be stored in compressed format in the HDDs?
> 
> I was initially looking into building zpool using zfs with raidz3 and 
> zstd compression for my HDDs but I’m unsure how to add the SSDs into 
> this equation and I thought that this is probably a common scenario and wanted to ask opinions from here.

The traditionalway to handle that would be to create an extra tablespace on a file system built on the slow disks.  You'd move tables that are no longer hot to that slow tablespace.
If you cannot move the whole table, use partitioning and move old partitions to the slow tablespace.

Yours,
Laurenz Albe

^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* RE: Recommendations on how to combine SSD and HDD drives in bare metal PostgreSQL server [SEC=UNOFFICIAL]
@ 2024-10-18 00:22  Pilling, Michael DR <[email protected]>
  0 siblings, 0 replies; 6+ messages in thread

From: Pilling, Michael DR @ 2024-10-18 00:22 UTC (permalink / raw)
  To: Pilling, Michael DR <[email protected]>; Laurenz Albe <[email protected]>; [email protected] <[email protected]>

UNOFFICIAL
Further to this, the advantages of using the SSD as a cache in the hardware/file system/operating system are:
* You don't have to understand the deep system performance issues of PostgreSQL or your application
* The result will be optimal or near optimal
* The system will self tune even as your usage of your PostgreSQL database changes overtime

The main thing you will need to monitor is the health of the SSD (and of course also the HDDs) but hopefully you will have the HDDs in a RAID configuration anyway.

Do not forget to do backups.

Here are some links that go into further detail which will hopefully help.

https://robots.net/tech/how-to-use-a-solid-state-drive-as-a-cache/
https://superuser.com/questions/390071/how-can-i-use-my-small-ssd-as-a-cache-for-a-larger-hard-disk
https://bytebitebit.com/tips-tricks/how-to-use-ssd-as-cache-for-hdd/

Happy configuring
Cheers,
Michael

Dr Michael Pilling (him / they)
Defence Analyst / Software Specialist 
_____________________________________________________________________________
Defence acknowledges the Traditional Custodians of the Country throughout Australia. We recognise their continuing connection to land, waters and community. We pay our respects to them, their culture and to their Elders past and present. 

-----Original Message-----
From: Pilling, Michael DR <[email protected]> 
Sent: Thursday, 17 October 2024 1:57 PM
To: Laurenz Albe <[email protected]>; Onni Hakala <[email protected]>; [email protected]
Subject: RE: Recommendations on how to combine SSD and HDD drives in bare metal PostgreSQL server [SEC=UNOFFICIAL]

EXTERNAL EMAIL: Do not click any links or open any attachments unless you trust the sender and know the content is safe.

UNOFFICIAL
I would say the best way to handle this would be to do it in the hardware drivers or file system components of the OS so that the SSD drives were used as a cache for the HDDs. This would generally result in all the active indices of the tables residing on the SSDs. It all depends whether your OS and its files systems are to the job. Certainly, Reisier4 file system aimed for all of its indices to be held in memory with only writes going to the HDDs - by using the SSDs for swap space this would have the desired effect under Reiser4.

Dr Michael Pilling (him / they)
Defence Analyst / Software Specialist
Human and Decision Sciences Division | Analytical Wargaming | Capability Analysis and Design Defence Science and Technology Group _____________________________________________________________________________
Department of Defence | Edinburgh  | 23L Bay 35 PO Box 1500 | EDINBURGH SA 5111
M: 0414 405 711 | www.defence.gov.au
E: [email protected]

Defence acknowledges the Traditional Custodians of the Country throughout Australia. We recognise their continuing connection to land, waters and community. We pay our respects to them, their culture and to their Elders past and present. 

-----Original Message-----
From: Laurenz Albe <[email protected]>
Sent: Thursday, 17 October 2024 2:53 AM
To: Onni Hakala <[email protected]>; [email protected]
Subject: Re: Recommendations on how to combine SSD and HDD drives in bare metal PostgreSQL server

EXTERNAL EMAIL: Do not click any links or open any attachments unless you trust the sender and know the content is safe.

On Wed, 2024-10-16 at 17:06 +0300, Onni Hakala wrote:
> I have a large dataset of > 100TB which would be very expensive to store solely into SSD drives.
> 
> I have access to a server which has 2x 3.84TB NVME SSD disks and large array of HDD drives 8 x 22TB.
> 
> Most of the data that I have in my dataset is very rarely accessed and is stored only for archival purposes.
> 
> What would be the de-facto way to use both SSD and HDD together in a 
> way use where commonly used data would be fast to access and old data would eventually only be stored in compressed format in the HDDs?
> 
> I was initially looking into building zpool using zfs with raidz3 and 
> zstd compression for my HDDs but I’m unsure how to add the SSDs into 
> this equation and I thought that this is probably a common scenario and wanted to ask opinions from here.

The traditionalway to handle that would be to create an extra tablespace on a file system built on the slow disks.  You'd move tables that are no longer hot to that slow tablespace.
If you cannot move the whole table, use partitioning and move old partitions to the slow tablespace.

Yours,
Laurenz Albe

^ permalink  raw  reply  [nested|flat] 6+ messages in thread

end of thread, other threads:[~2024-10-18 00:22 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-10-16 14:06 Recommendations on how to combine SSD and HDD drives in bare metal PostgreSQL server Onni Hakala <[email protected]>
2024-10-16 14:56 ` Bzzzz <[email protected]>
2024-10-16 16:01 ` TIM CHILD <[email protected]>
2024-10-16 16:23 ` Laurenz Albe <[email protected]>
2024-10-17 03:26 RE: Recommendations on how to combine SSD and HDD drives in bare metal PostgreSQL server [SEC=UNOFFICIAL] Pilling, Michael DR <[email protected]>
2024-10-18 00:22 RE: Recommendations on how to combine SSD and HDD drives in bare metal PostgreSQL server [SEC=UNOFFICIAL] Pilling, Michael DR <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox