Why?
As a project manager, I want to size James deployments.
How?
This document details the set-up, costs and resource usage of an existing deployment, in order to help your planning.
Acceptable loads
Defining acceptable running conditions
We (arbitrarily) decide that, in peak hours:
- CPU usage should not exceed 50%
- Load should not be higher than the number of CPU cores
- Disk utilisation (of bandwidth) should not exceed 50%
- IO waits should stay low (< 50ms)
- Network utilisation should stay below 50%
https://web.twake.app/linagora-software/DevOps-f3h9d54age83hg1heag92c3g0242ac120i4-cbh306begeb5cg1heag842bg0242ac120i4-36209ha0g1437g1hebg8f18g0242ac120i4
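The acceptable running conditions listed above can be expressed as a small check. This is only a sketch: the `PeakSample` structure, its field names and the sample values are hypothetical, and the strict or non-strict comparisons are one reading of the thresholds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PeakSample:
    """One peak-hour measurement for a single machine (hypothetical structure)."""
    cpu_usage_pct: float
    load_average: float
    cpu_cores: int
    disk_utilisation_pct: float
    io_wait_ms: float
    network_utilisation_pct: float


def violations(sample: PeakSample) -> List[str]:
    """Return the list of acceptable-load rules broken by this sample."""
    checks = [
        ("CPU usage above 50%", sample.cpu_usage_pct > 50),
        ("load above number of CPU cores", sample.load_average > sample.cpu_cores),
        ("disk utilisation above 50%", sample.disk_utilisation_pct > 50),
        ("IO wait not below 50ms", sample.io_wait_ms >= 50),
        ("network utilisation not below 50%", sample.network_utilisation_pct >= 50),
    ]
    return [name for name, broken in checks if broken]


# Hypothetical values, for illustration only.
healthy = PeakSample(30, 0.8, 8, 6, 25, 10)
overloaded = PeakSample(30, 9.0, 8, 6, 25, 10)
print(violations(healthy))
print(violations(overloaded))
```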
openpaas.linagora.com deployment
Set up & costs
We use 1 OVH b2-30 to deploy James & RabbitMQ.
We use a 3 node Cassandra cluster - b2-30 each.
We use a 3 node ElasticSearch cluster - b2-30 each.
So, a b2-30 costing 1.000€ a year, we spend ~7.000€ a year on renting these 7 OVH instances.
We also rely on OVH object storage, which costs 15€/month, so 180€/year.
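The cost figures above add up as follows. This is a minimal sketch of the arithmetic; the variable and function names are illustrative, and the prices are the rounded ones quoted in this document, not an OVH price list.

```python
# Rounded yearly price of one b2-30 instance, as quoted above.
B2_30_YEARLY_EUR = 1000
# Object storage price per month, as quoted above.
OBJECT_STORAGE_MONTHLY_EUR = 15

# Number of b2-30 instances per component.
INSTANCES = {
    "James + RabbitMQ": 1,
    "Cassandra": 3,
    "ElasticSearch": 3,
}


def yearly_cost_eur(instances: dict, instance_price: int, storage_monthly: int) -> int:
    """Yearly instance rental plus yearly object storage."""
    instance_cost = sum(instances.values()) * instance_price
    storage_cost = storage_monthly * 12
    return instance_cost + storage_cost


total = yearly_cost_eur(INSTANCES, B2_30_YEARLY_EUR, OBJECT_STORAGE_MONTHLY_EUR)
print(f"{total} EUR/year")
```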
Overall Load
- 223 users (in the database), 150 active users for the webmail (the INBOX poll does 2 requests every minute per user)
- 9.000 mailboxes (folders)
- 6.100.000 messages
In peak hours…
- 300 OPM (operations per minute) at the JMAP level - stable (150 users)
- 1000 OPM at the IMAP level, peaks at 1600 OPM
- 7 OPM maximum mail processing - peaks at 15 OPM
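The stable 300 OPM at the JMAP level matches what INBOX polling alone would produce for 150 active users. A short sketch of that check, assuming polling is the only steady JMAP traffic (the function name is illustrative):

```python
def polling_opm(active_users: int, requests_per_minute_per_user: int) -> int:
    """Operations per minute generated by periodic INBOX polling."""
    return active_users * requests_per_minute_per_user


# 150 active webmail users, 2 poll requests per minute each.
jmap_opm = polling_opm(150, 2)
print(jmap_opm)
```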
ElasticSearch Load
Used space: 48GB
Given a 50% free space security margin (segment merges) we should plan 16KB per message.
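The 16KB per message figure can be reproduced from the used space and the message count. This is a sketch assuming decimal units (1GB = 10^9 bytes) and reading the 50% margin as "half of the disk stays free"; the function name is illustrative.

```python
def planned_bytes_per_message(used_bytes: float, message_count: int,
                              free_space_margin: float) -> float:
    """Disk to provision per message so that `free_space_margin` of it stays free."""
    if not 0 <= free_space_margin < 1:
        raise ValueError("free_space_margin must be in [0, 1)")
    bytes_per_message = used_bytes / message_count
    return bytes_per_message / (1.0 - free_space_margin)


# ElasticSearch: 48GB used for 6.100.000 messages, 50% kept free.
es_plan = planned_bytes_per_message(48e9, 6_100_000, 0.5)
print(f"{es_plan / 1000:.1f} KB per message")
```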
The Linagora deployment uses 3 b2-30 instances.
During peak hours we measured (iostat & htop):
Disk utilisation: 2% on average - sometimes 4% for several minutes - up to 6%
IO waits: few ms up to 25ms
Read workload: 0KB -> 80KB (mostly 0 KB)
Write workload: 20 -> 500KB/s, spikes up to 12MB/s, 2MB/s not uncommon
Server load: 0.2, spikes at 0.8, mostly 0
Network usage: 40KB/s average, spikes at 1.2MB/s
Extrapolating, the Linagora deployment (3 instances) could handle up to 1250 users (x8.333). IO bound.
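One reading that reproduces the x8.333 factor is the ratio between the acceptable disk utilisation (50%) and the observed peak (6%), applied to the 150 active users. The sketch below assumes that reading and a linear relation between users and disk utilisation; neither is stated explicitly above.

```python
def headroom_factor(observed_peak: float, acceptable_limit: float) -> float:
    """How many times the current load fits under the acceptable limit."""
    if observed_peak <= 0:
        raise ValueError("observed_peak must be positive")
    return acceptable_limit / observed_peak


ACTIVE_USERS = 150

# ElasticSearch: disk utilisation peaks at 6%, limit is 50%.
es_factor = headroom_factor(observed_peak=6.0, acceptable_limit=50.0)
es_users = round(ACTIVE_USERS * es_factor)
print(f"x{es_factor:.3f} -> {es_users} users")
```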
Cassandra Load
Used space: 6GB
Given a 50% free space security margin (compaction) we should plan 2KB per message.
The Linagora deployment uses 3 b2-30 instances with a replication factor of 3 and quorum reads. Writes hit all nodes, reads hit 2 nodes out of 3.
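The "2 nodes out of 3" for reads follows from the usual Cassandra quorum rule, floor(RF / 2) + 1. A minimal sketch (the function name is illustrative):

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must answer a QUORUM request."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return replication_factor // 2 + 1


RF = 3
replicas_per_read = quorum(RF)            # replicas contacted by a quorum read
tolerated_down = RF - replicas_per_read   # replicas that may be down while quorum still succeeds
print(replicas_per_read, tolerated_down)
```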
During peak hours we measured (iostat & htop):
Disk utilisation: avg of 1%, spikes at 2.5%
IO waits: 5-20ms
Read workload: 0KB (all files are cached to memory?)
Write workload: from 20KB/s to 200-500KB/s, avg ~40KB/s
Server load: 0.6
Network usage: 100KB usage, burst at 200KB
Extrapolating, the Linagora deployment could handle up to 2000 users (x13.3). CPU bound? Likely more if at scale we are IO bound (3000 users? x20?).
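Both Cassandra factors can be reproduced by comparing each measurement with its acceptable limit and keeping the smallest ratio as the binding one. This is a sketch under two assumptions not stated above: that a b2-30 exposes 8 cores (which makes the load limit 8), and that load grows linearly with the number of users.

```python
ACTIVE_USERS = 150
ASSUMED_B2_30_CORES = 8  # ASSUMPTION: core count of a b2-30, not given in this document

# (observed peak, acceptable limit) per resource, from the Cassandra measurements.
measurements = {
    "cpu": (0.6, ASSUMED_B2_30_CORES),  # server load vs number of cores
    "io": (2.5, 50.0),                  # disk utilisation spikes (%) vs 50% limit
}

# Headroom factor per resource: how many times the current load fits under the limit.
factors = {name: limit / observed for name, (observed, limit) in measurements.items()}

# The resource with the smallest headroom is the first one to saturate.
binding = min(factors, key=factors.get)
users = {name: round(ACTIVE_USERS * f) for name, f in factors.items()}
print(binding, users)
```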
James Load
Memory: 9.32GB used out of 28.8GB (including RabbitMQ)
CPU load: ~1.0, spikes at 1.5
Disk utilisation: 1% -> 3% (logging?)
Network usage: 1MB usage, burst at 12MB
Extrapolating, the Linagora deployment could handle up to 1200 users (x8). CPU bound.
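The x8 factor is consistent with dividing an assumed 8 cores by the observed load of ~1.0. Taking the three per-component extrapolations together, the smallest one indicates which component would saturate first. This is a sketch assuming 8 cores per b2-30 (not stated in this document), linear scaling, and components that do not influence each other.

```python
ACTIVE_USERS = 150
ASSUMED_B2_30_CORES = 8  # ASSUMPTION: core count of a b2-30, not given in this document

# James: CPU load ~1.0 against a limit equal to the number of cores.
james_factor = ASSUMED_B2_30_CORES / 1.0
james_users = round(ACTIVE_USERS * james_factor)

# Extrapolated capacities (in users) quoted in this document.
capacities = {
    "ElasticSearch": 1250,
    "Cassandra": 2000,
    "James": james_users,
}

# The component with the lowest capacity caps the whole deployment.
first_bottleneck = min(capacities, key=capacities.get)
print(first_bottleneck, capacities[first_bottleneck])
```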
Object storage
Total storage space: 717 GB - 117KB/mail?
Object count: 8.140.000 - 1.33 per mail.
With 3.500.000 attachments and 6.100.000 mails, and expecting 2 objects per mail and 1 object per attachment, we have a deduplication ratio of 50%.
Traffic: 659 GB/month
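The object storage ratios above can be recomputed from the raw counts. This is a sketch assuming decimal units (1GB = 10^9 bytes); the variable names are illustrative.

```python
MAILS = 6_100_000
ATTACHMENTS = 3_500_000
STORED_OBJECTS = 8_140_000
TOTAL_BYTES = 717e9  # 717 GB, decimal units assumed

# Average storage and object count per mail.
bytes_per_mail = TOTAL_BYTES / MAILS
objects_per_mail = STORED_OBJECTS / MAILS

# Without deduplication: 2 objects per mail plus 1 object per attachment.
expected_objects = 2 * MAILS + ATTACHMENTS

# Fraction of the expected objects that are actually stored.
stored_fraction = STORED_OBJECTS / expected_objects

print(f"{bytes_per_mail / 1000:.1f} KB/mail, {objects_per_mail:.2f} objects/mail")
print(f"{expected_objects} expected objects, {stored_fraction:.0%} actually stored")
```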
