JAMES: Planning deployments

Why?

As a project manager, I want to size James deployments.

How?

This document details the costs, set-up, and resource usage of an existing deployment, in order to help you plan your own.

Acceptable loads

Defining acceptable running conditions

We (arbitrarily) decide that, in peak hours:

https://web.twake.app/linagora-software/DevOps-f3h9d54age83hg1heag92c3g0242ac120i4-cbh306begeb5cg1heag842bg0242ac120i4-36209ha0g1437g1hebg8f18g0242ac120i4

openpaas.linagora.com deployment

Set up & costs

We use 1 OVH b2-30 to deploy James & RabbitMQ.

We use a 3 node Cassandra cluster - b2-30 each.

We use a 3 node ElasticSearch cluster - b2-30 each.

With a b2-30 costing about 1,000€ a year, these 7 instances cost us ~7,000€ a year in OVH instance rental.

We also rely on OVH object storage, which costs 15€/month, so 180€/year.
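
The yearly bill can be recomputed from the figures above. This is a minimal sketch: the dictionary keys and the function name are illustrative, and the prices are the approximate ones quoted in this section, not an OVH price list.

```python
# Yearly infrastructure cost, using the figures quoted in this section.
B2_30_YEARLY_EUR = 1000          # approximate yearly price of one OVH b2-30
OBJECT_STORAGE_MONTHLY_EUR = 15  # OVH object storage bill

# Instance count per component (James and RabbitMQ share one instance).
instances = {"james_rabbitmq": 1, "cassandra": 3, "elasticsearch": 3}


def yearly_cost_eur(instances, instance_price, storage_monthly):
    """Return (instance_cost, storage_cost, total) in euros per year."""
    instance_cost = sum(instances.values()) * instance_price
    storage_cost = storage_monthly * 12
    return instance_cost, storage_cost, instance_cost + storage_cost


instance_cost, storage_cost, total = yearly_cost_eur(
    instances, B2_30_YEARLY_EUR, OBJECT_STORAGE_MONTHLY_EUR
)
print(f"instances: {instance_cost}€, object storage: {storage_cost}€, total: {total}€")
```

Changing the instance counts in the dictionary gives a first estimate for a differently sized cluster.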

Overall Load

In peak hours…

ElasticSearch Load

Used space: 48GB

Given a 50% free-space safety margin (needed for segment merges), we should plan for 16KB per message.
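
The 16KB figure can be approximated by dividing the used space by the number of indexed messages, then keeping half of the disk free. The sketch below assumes that the index holds the 6,100,000 mails quoted in the object storage figures and that GB means decimal gigabytes. Neither is stated explicitly here, and the function name is illustrative.

```python
def planned_bytes_per_message(used_bytes, message_count, free_space_ratio):
    """Disk to provision per message so that `free_space_ratio` of it stays free."""
    if not 0 <= free_space_ratio < 1:
        raise ValueError("free_space_ratio must be in [0, 1)")
    raw_bytes_per_message = used_bytes / message_count
    return raw_bytes_per_message / (1 - free_space_ratio)


MESSAGE_COUNT = 6_100_000  # assumption: mail count from the object storage figures
USED_BYTES = 48 * 10**9    # 48GB, assuming decimal gigabytes

per_message = planned_bytes_per_message(USED_BYTES, MESSAGE_COUNT, 0.5)
print(f"{per_message / 1000:.1f} KB per message")
```

Under these assumptions the result lands just under 16KB. The same function applied to the 6GB Cassandra footprint gives about 2KB per message.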

The Linagora deployment uses 3 b2-30 instances.

During peak hours we measured (iostat & htop):

Disk utilisation: 2% on average, sometimes 4% for several minutes, peaks at 6%
IO waits: a few ms, up to 25ms
Read workload: 0KB to 80KB (mostly 0KB)
Write workload: 20KB/s to 500KB/s, spikes up to 12MB/s, 2MB/s not uncommon
Server load: 0.2, spikes at 0.8, mostly 0
Network usage: 40KB/s average, spikes at 1.2MB/s

Extrapolating, the Linagora deployment (3 instances) could handle up to 1250 users (x8.333). IO bound.
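
The x8.333 factor is consistent with dividing an acceptable peak disk utilisation of 50% by the observed 6% peak, starting from roughly 150 users (1250 / 8.333). Both the 50% threshold and the 150-user figure are assumptions inferred from the numbers, not values stated in this document. The sketch also assumes that load grows linearly with the number of users.

```python
def headroom_factor(observed_peak, acceptable_peak):
    """How many times the current peak load fits under the acceptable threshold."""
    if observed_peak <= 0:
        raise ValueError("observed_peak must be positive")
    return acceptable_peak / observed_peak


CURRENT_USERS = 150          # assumption: inferred from 1250 / 8.333
ACCEPTABLE_DISK_UTIL = 50.0  # assumption: % disk utilisation tolerated at peak
OBSERVED_DISK_UTIL = 6.0     # % peak measured on the ElasticSearch nodes

factor = headroom_factor(OBSERVED_DISK_UTIL, ACCEPTABLE_DISK_UTIL)
max_users = round(CURRENT_USERS * factor)
print(f"x{factor:.3f} -> {max_users} users")
```

The same ratio can be applied to any other measured metric (CPU load, network) once an acceptable threshold has been chosen for it.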

Cassandra Load

Used space: 6GB

Given a 50% free-space safety margin (needed for compaction), we should plan for 2KB per message.

The Linagora deployment uses 3 b2-30 instances with a replication factor of 3 and quorum reads. Writes hit all nodes, reads hit 2 nodes out of 3.
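
The "2 nodes out of 3" figure follows from how Cassandra sizes a quorum: half of the replicas, rounded down, plus one. A minimal sketch, with illustrative function names:

```python
def quorum(replication_factor):
    """Number of replicas that must answer a QUORUM request."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor):
    """Replicas that can be down while QUORUM requests still succeed."""
    return replication_factor - quorum(replication_factor)


RF = 3  # replication factor used by this deployment
print(f"RF={RF}: quorum={quorum(RF)}, tolerated failures={tolerated_failures(RF)}")
```

With 3 nodes and a replication factor of 3, every node holds every row, so each write is sent to all 3 nodes while a quorum read only needs 2 of them.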

During peak hours we measured (iostat & htop):

Disk utilisation: avg of 1%, spikes at 2.5%
IO waits: 5-20ms
Read workload: 0KB (all files are cached in memory?)
Write workload: from 20KB/s to 200-500KB/s, average ~40KB/s
Server load: 0.6
Network usage: 100KB, bursts at 200KB

Extrapolating, the Linagora deployment could handle up to 2000 users (x13.3). CPU bound? Likely more if we turn out to be IO bound at scale (3000 users? x20?).

James Load

Memory: 9.32GB used out of 28.8GB (including RabbitMQ)
CPU load: ~1.0, spikes at 1.5
Disk utilisation: 1% to 3% (logging?)
Network usage: 1MB, bursts at 12MB

Extrapolating, the Linagora deployment could handle up to 1200 users (x8). CPU bound.
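
Taken together, the three extrapolations indicate which component would saturate first as the user count grows. The sketch below only compares the estimates given in this document. The dictionary layout and function name are illustrative, and the conclusion is only as reliable as the extrapolations themselves.

```python
# Estimated user capacity per component, as extrapolated in this document.
capacity = {
    "ElasticSearch": 1250,  # IO bound
    "Cassandra": 2000,      # CPU bound (possibly 3000 if IO bound)
    "James": 1200,          # CPU bound
}


def bottleneck(capacity):
    """Return the (component, users) pair with the lowest estimated capacity."""
    return min(capacity.items(), key=lambda item: item[1])


component, users = bottleneck(capacity)
print(f"{component} saturates first, at about {users} users")
```

With these estimates, James and ElasticSearch are close to each other, so both would need attention before Cassandra does.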

Object storage

Total storage space: 717GB (117KB per mail?)

Object count: 8,140,000 (1.33 per mail).

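With 3,500,000 attachments and 6,100,000 mails, and expecting 2 objects per mail plus 1 object per attachment, we would store 15,700,000 objects without deduplication. The 8,140,000 objects actually stored give a deduplication ratio of about 50%.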
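
The object storage ratios can be rechecked from the raw counts. A short sketch, assuming decimal gigabytes; the variable names are illustrative.

```python
MAILS = 6_100_000
ATTACHMENTS = 3_500_000
STORED_OBJECTS = 8_140_000
STORED_BYTES = 717 * 10**9  # 717GB, assuming decimal gigabytes

# Objects expected without deduplication: 2 per mail, 1 per attachment.
expected_objects = 2 * MAILS + ATTACHMENTS

# Share of the expected objects that deduplication avoided storing.
dedup_ratio = 1 - STORED_OBJECTS / expected_objects
objects_per_mail = STORED_OBJECTS / MAILS
kb_per_mail = STORED_BYTES / MAILS / 1000

print(f"expected objects: {expected_objects}")
print(f"deduplication ratio: {dedup_ratio:.0%}")
print(f"objects per mail: {objects_per_mail:.2f}")
print(f"storage per mail: {kb_per_mail:.1f} KB")
```

The computed ratio comes out slightly under the rounded 50% quoted above.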

Traffic: 659 GB/month