2.7 Repository design considerations

Storage design choices

In the previous chapter, we discussed all the possible options a provider has when designing the storage for Veeam Cloud Connect. However, in a typical Cloud Connect environment, not every possible storage solution is commonly used, and some options are not even available in Veeam Cloud Connect.

We can reduce the list of possible options to three:

  1. Linux Hardened Repository (self-built or using the Veeam Hardened Repository ISO)
  2. Windows Repository
  3. Object Storage

Each solution has its pros and cons, and to guide users to the correct choice for their needs, we’ll walk you through some questions.

Physical or Virtual?

TL;DR = Physical.

There are multiple reasons to choose physical repositories; the two main points are performance and security.

About performance: removing every additional layer in a server (even a tiny and very optimized hypervisor) will increase performance, but two additional aspects of virtualization may work against our repositories. The first one is CPU/memory contention: the repository is probably not the only machine running on the hypervisor, and unless we configure memory and CPU reservations, the virtual machine will have to compete with others for resources. The second aspect is network contention: people wrongly believe that a vmxnet3 adapter can only go up to 10 Gbps. This is only meta information the driver passes to the operating system; in reality it can and will fully utilize the entire available bandwidth. But at high speeds, contention between multiple VMs on the same host comes into play again, and you will also run into CPU contention, as all those packets have to be processed.

About security: a virtual repository runs abstracted on top of an additional layer, which brings its own additional set of vulnerabilities. Even hardening the virtual machine with all the best practices suggested by Veeam will still leave an incomplete setup.

Finally, security is also about availability: a problem with the hypervisor can impact all the hosted virtual machines, including the Veeam repository.

For all these reasons, we suggest having a physical repository.

Do you want immutability?

TL;DR = Yes!

These days, ransomware is probably the biggest threat to customers’ data, and among the possible safeguards, immutability is probably one of the most effective. Veeam has multiple choices for immutable backups, but in the context of Cloud Connect two main options are available:

  • Linux Hardened Repository, if you want to build a block storage solution
  • Object Storage with immutability, if you want to build an object storage solution

You may notice that this is the same list from the start of the chapter, minus the Windows Repository. At the time of writing (April 2025), immutability is not possible on a Windows repository. In case a provider wants to build block-based repositories, we suggest using Linux Hardened Repositories. Windows repositories are still supported, but in this case providers will need to compensate for the lack of immutability with other solutions, like Capacity Tier (using object storage with immutability) or Tenants to Tape.

Immutability obviously increases the used storage space: this can be partially compensated by the use of BlockCloning, available both on Linux repositories (when using the XFS filesystem) and on Object Storage. See below for more information.

Simple Repositories or SOBR?

TL;DR = It depends…

When there are multiple repositories, providers need to decide if they want to create cloud repositories directly over each of them, or if they first want to group them into one or more SOBRs (Scale-Out Backup Repositories). This is especially true for block storage; object storage, with its capability to grow linearly without any configuration change, can be used even as a simple repository.

With simple repositories, a service provider needs to plan how to manually distribute customers among the several repositories they could have. In addition, they must keep some free space for future increases in the cloud repositories’ quotas and additional operations. A customer may start with a small amount of space, but after some time they could ask for an increase in the storage quota. If there is no free space left in the repository, the service provider will be able to satisfy the customer’s request only by migrating the customer’s backup files to another repository. This can be done almost transparently, but it involves some manual activity on the part of the service provider and some downtime in the customer’s Veeam Cloud Connect service. Cloud repository quotas are strictly applied, but as long as the customer is not using the entire amount of the assigned quota, the service provider can use some over-commitment. However, the service provider should carefully evaluate the level of over-commitment to avoid any interruption of the service.

2.12: Simple Repositories

Scale-Out Backup Repository has several advantages, and we highly suggest using this design (with a warning we’ll explain later). Veeam Scale-Out Backup Repository (SOBR) groups multiple simple repositories into a unique logical entity that can be dynamically expanded, modified and shrunk, while the logical entity as a whole is constantly seen as unchanged by the client component.

To create it, multiple Veeam simple repositories are grouped together into a SOBR as extents.

2.13: Scale-Out Repository logical architecture

This solution helps service providers avoid capacity problems in their repository design, since they can react quickly to a capacity shortage without changing any configuration of the repository structure.

Even if a provider is planning to start with a small Cloud Connect deployment and thus is only going to deploy one simple repository, we suggest immediately creating a SOBR with one extent:

2.14: Scale-Out Repository with one extent

By starting immediately with SOBR, its file and folder structure is in place from the first received backup, and thus any future expansion of the group with the addition of other extents will not require any migration. Placement policies will take care of automatically placing new backups into the newer extents.

NOTE: existing Veeam Cloud Connect deployments using simple repositories can be migrated to Scale-Out Backup Repository. By simply opening a support ticket, providers can engage Veeam Support, which will execute the migration for the service provider.

So, back to the original answer “It depends…”: the reason for it is Configuration Backup, which can only be sent to a simple repository, not to a SOBR. If a provider wants to leverage SOBR capabilities and also offer remote and secure storage for Configuration Backup, the solution is to offer both options:

  • One or more cloud repositories created from a SOBR for backups of virtual machines and agents
  • One cloud repository created from a Simple Repository (or Object Storage) to receive Configuration Backups

Design Considerations

Storage space sizing is not covered in this book. Cloud Connect is a highly variable system where data may grow fast, so sizing such an environment would be an impossible task. Providers should instead focus on the flexibility and rapid scalability of their design.

When choosing a repository design, service providers may plan to have large repositories or build smaller systems to reduce the failure domain of each repository. But regardless of the strategy, some considerations are worth sharing.

Server sizing

Regarding the memory sizing of a backup repository, it is important to understand how a Veeam repository uses memory. Veeam Backup & Replication has four different levels of storage optimization for a backup job:

Storage optimization options for a backup job

2.15: Storage optimization options for a backup job

A repository uses memory to store incoming blocks in a queue. This queue collects all blocks coming from the source data movers and caches them in memory; after some optimization, its content is flushed to disk. This reduces the random I/O affecting the backup files to a minimum, while trying to serialize as many write operations as possible.

Also, Veeam backup files contain deduplicated information about the saved blocks. As with any deduplicated storage, metadata is stored along with the file itself in order to keep track of the stored blocks.

To improve performance, the repository dynamically loads this metadata into memory. Starting from Veeam Backup & Replication v8 Update 2, the cache accelerates both write and read operations, but there are also differences in the way the cache is populated and used. The amount of memory consumed for metadata depends on the selected block size for deduplication:

| VBK size | Optimization | VBK block size | Memory consumption for VBK metadata |
|----------|--------------|----------------|--------------------------------------|
| 1 TB | WAN target | 256 KB | 700 MB |
| 1 TB | LAN target | 512 KB | 350 MB |
| 1 TB | Local target | 1024 KB | 175 MB |
| 1 TB | Local target 16+ TB | 4096 KB | 44 MB |

Note: Starting from Veeam Backup & Replication v9, the new block size for Local target 16+ TB is 4 MB instead of 8 MB. The previous value for memory consumption was 22 MB.

By adjusting these values to a real scenario, service providers can estimate how much data a given repository will be able to process at a certain point in time; or said differently, how much memory will be needed for an expected amount of processed data.
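
To turn the table above into a quick estimate, here is a minimal Python sketch. The per-TB figures are the ones from the table (a rough approximation, not an official Veeam formula): metadata memory scales linearly with backup size and inversely with the block size.

```python
# Rough estimate of repository RAM needed for VBK metadata, based on the
# table above: ~700 MB per 1 TB of VBK at a 256 KB block size, halving
# each time the block size doubles. Approximation only, not a Veeam formula.

MB_PER_TB_AT_256KB = 700

def metadata_memory_mb(vbk_size_tb: float, block_size_kb: int) -> float:
    """Estimated RAM (in MB) for the metadata of one backup set."""
    return vbk_size_tb * MB_PER_TB_AT_256KB * (256 / block_size_kb)

# Worst case for a tenant with a 10 TB quota using WAN target (256 KB blocks):
print(metadata_memory_mb(10, 256))    # ~7000 MB
# The same quota with Local target (1024 KB blocks):
print(metadata_memory_mb(10, 1024))   # ~1750 MB
```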

If a given backup repository is assigned to different customers and all of them are executing their jobs at the same time, the total memory must be divided among all the incoming jobs. The Veeam repository doesn’t constantly consume the same amount of RAM, because it can dynamically load and offload metadata, but planning for the maximum possible consumption is a good choice to be prepared for the worst-case scenario.

Finally, it’s worth remembering that backup and backup copy jobs are configured by the customer and not by the service provider. There is no direct way for the service provider to plan for an accurate utilization of the backup repository memory, because the provider does not know in advance which block size will be used or what the total size of a backup set will be. However, the quota configured for a tenant in Veeam Cloud Connect can also be considered the maximum possible size of a customer’s backup file. For these reasons, proper monitoring of the backup repository is paramount, so the provider can quickly identify when the system is too stressed.

Ultimately, it is up to the service provider to design a single large backup repository, to have multiple simple repositories, or to leverage Scale-Out Backup Repositories or even Object Storage, and to size their memory accordingly.

Concurrency

Concurrency in a backup repository is an interesting and important topic: the Load Control section of a repository has two main values, and while the data rate limit is pretty easy to understand, the limit applied to concurrent tasks can be a bit tricky, also because the behaviour in a Cloud Connect environment is different from an end-user installation of Veeam Backup & Replication.

The limits of a repository should be carefully evaluated by the service provider, in order to find a balance between two risks:

  • overloading the Cloud Connect environment with too many tasks (we also recommend never removing the limit, but always setting a number);
  • making customers’ jobs wait too long for available resources at the service provider because too few task slots are assigned to a tenant.

Configure carefully the repository load control

2.16: Configure carefully the repository load control

Let’s start with the basic concept: a task is an operation that can be executed by a Veeam repository. A backup job, a backup copy job, but also compact or merge operations: they all consume a task slot. So, Max concurrent tasks defines how many of these operations a repository can run at the same time.

This is the general concept, but in Veeam Cloud Connect the behaviour is different: in order to guarantee to tenants that a job is always executed when the tenant scheduled it, every job is executed as long as the tenant still has free concurrent task slots:

Max concurrent tasks for a tenant

2.17: Max concurrent tasks for a tenant

This is the most important concept to understand when planning Veeam Cloud Connect backup services, so let’s repeat it:

Veeam Cloud Connect allows each tenant to execute up to the maximum amount of concurrent tasks assigned to the tenant itself, REGARDLESS of the available concurrent tasks slots in the repositories.

This is done on purpose to let tenants consume the slots they are paying for, but it can lead to undesired results if the environment has not been sized correctly.

Let’s use this example: there are 4 repositories joined together into a SOBR. The first three have 4 CPUs and 16 GB of memory, and the fourth has double those resources; they all offer 400 GB of disk space for backups.

The overall SOBR is this:

| Server | CPUs | Memory | Storage space | Max concurrent tasks |
|--------|------|--------|---------------|----------------------|
| REPO1 | 4 | 16 GB | 400 GB | 16 |
| REPO2 | 4 | 16 GB | 400 GB | 16 |
| REPO3 | 4 | 16 GB | 400 GB | 16 |
| REPO4 | 8 | 32 GB | 400 GB | 32 |
| Total | 20 | 80 GB | 1600 GB | 80 |

In an end-user environment, this SOBR would be able to accept up to 80 concurrent tasks. But in Cloud Connect, task limits are regulated by the tenants’ configurations. If, for example, we have 100 tenants, each with 5 allocated tasks, the total amount of tasks that the Cloud Connect environment will accept is 500, way more than 80.

This case will obviously only happen if every tenant is using all their assigned tasks at the same time; the assigned value is a hard limit, so even if there’s only one tenant actively running tasks, its limit of 5 will never be surpassed, even if there are 80 task slots in the SOBR.
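
As a quick sanity check, a few lines of Python make the oversubscription of this example explicit (the numbers are the ones used in this chapter):

```python
# Compare the task slots offered by the SOBR extents against the sum of
# the tenants' task limits. Numbers taken from the example above.

extent_slots = {"REPO1": 16, "REPO2": 16, "REPO3": 16, "REPO4": 32}
tenants = 100
tasks_per_tenant = 5

sobr_slots = sum(extent_slots.values())        # 80
tenant_tasks = tenants * tasks_per_tenant      # 500

print(f"SOBR slots: {sobr_slots}, tenant task limit: {tenant_tasks}, "
      f"oversubscription: {tenant_tasks / sobr_slots:.2f}x")   # 6.25x
```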

So, why do we need to configure Max concurrent tasks in each extent of a SOBR group? This is still an important parameter to set, because it influences the way the internal load balancing of a SOBR works.

Let’s explain how a SOBR selects which extent has to be used.

For an existing backup chain, as long as there is enough free space, the choice is simple: SOBR selects the same extent where the full was stored (for the data locality policy) or the extent where the previous incremental was stored (for the performance policy). Since BlockCloning is the choice for any block storage these days, and it requires the data locality policy, we can assume data locality is the policy in use. So, SOBR will try to write data where the existing chain is already stored.

For a completely new chain, or (in case of the performance policy) for the first incremental, the placement of the backup file has to be decided. The SOBR algorithm works in this way:

  • First, SOBR selects only those extents that will not break the placement policy;
  • Then, SOBR ranks the extents based on their actual load, measured by dividing the used slots by the Max concurrent tasks value. We may have a situation like this:

| Order | Server | Max concurrent tasks | Used slots | Load |
|-------|--------|----------------------|------------|--------|
| 1 | REPO4 | 32 | 13 | 40.6 % |
| 2 | REPO2 | 16 | 7 | 43.7 % |
| 3 | REPO3 | 16 | 9 | 56.2 % |
| 4 | REPO1 | 16 | policy-breaking | N/A |

In our example, REPO4 will be the selected extent: it has the most active sessions, but its load is the lowest among the SOBR extents. The load is calculated again at each new session, and since the real limit is set by the sum of the tenants’ task limits rather than by the extent itself, we may also see load values above 100%.

Finally, if there’s a tie between two or more extents in terms of load, SOBR uses free space as the second parameter: among extents with the same load, the one with more free space will be selected.
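
The following Python sketch illustrates the selection logic just described. It is only an illustration of the documented behaviour, not Veeam’s actual code, and the policy_ok flag is a simplification standing in for the full placement-policy check:

```python
from dataclasses import dataclass

@dataclass
class Extent:
    name: str
    max_tasks: int        # Max concurrent tasks configured on the extent
    used_slots: int       # task slots currently in use
    free_space_gb: int
    policy_ok: bool       # False if placement here breaks the policy

    @property
    def load(self) -> float:
        return self.used_slots / self.max_tasks

def select_extent(extents: list[Extent]) -> Extent:
    # 1) drop policy-breaking extents, 2) pick the lowest load,
    # 3) break ties by preferring the extent with more free space.
    candidates = [e for e in extents if e.policy_ok]
    return min(candidates, key=lambda e: (e.load, -e.free_space_gb))

extents = [
    Extent("REPO1", 16, 0, 300, policy_ok=False),
    Extent("REPO2", 16, 7, 250, policy_ok=True),
    Extent("REPO3", 16, 9, 250, policy_ok=True),
    Extent("REPO4", 32, 13, 180, policy_ok=True),
]
print(select_extent(extents).name)   # REPO4: 40.6% load, the lowest
```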

Design Tip: if you want a SOBR to be used uniformly, plan to have extents, and thus underlying servers, with the same amount of CPU cores.

BlockCloning

BlockCloning clones blocks by just updating filesystem metadata: instead of effectively reading and writing the same block multiple times, only the reference count of the existing block is updated.

BlockCloning was first leveraged in Veeam Backup & Replication 9.5 when combined with Microsoft Windows Server 2016 and its new ReFS 3.1 filesystem. The same concept now applies to Linux repositories using the XFS filesystem, and to Object Storage. We highly suggest leveraging BlockCloning, since it reduces both the used storage space and the time needed to complete backup operations.

To understand how it works, let’s suppose we have two files made with multiple blocks (images are taken from Microsoft MSDN):

Cluster layout before clone

2.18: Cluster layout before clone

Now suppose an application issues a block clone operation from File X, over file regions A and B, to File Y at the offset where E currently is. This is the same operation that a Veeam Backup transform operation does, where an incremental backup is merged into the full backup. The result on the file system after the clone operation is this:

Cluster layout after clone

2.19: Cluster layout after clone

The new file Y is not effectively written again; it’s just an update operation on the reference count in the ReFS file table, and block regions A and B are now used two times in the filesystem, by both files X and Y. The net result is that transform operations in Veeam Backup & Replication are now extremely fast, as only metadata needs to be updated. There are also advantages for GFS retention (a typical choice for a backup copy job sent to Veeam Cloud Connect): a complete full backup is written each time, but it doesn’t consume additional space on disk, as the same block is just referenced multiple times.

NOTE: there is NO space saving on incremental backups during transform operations, as the same block is always written once; it’s only moved from the incremental backup file to the full one. Transform operations are about time saving, not space saving. You get space savings when you run a synthetic full, either in backup or backup copy jobs (like GFS retention).

In order to leverage this technology, service providers need to have at least one repository that supports BlockCloning: ReFS on Windows, or XFS on Linux. BlockCloning on Object Storage is automatic.
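
For the curious, the filesystem primitive behind this on Linux is the reflink clone, exposed through the FICLONE ioctl. Here is a minimal Python sketch, assuming an XFS volume formatted with reflink support; the file names are purely illustrative, and this is not Veeam’s code:

```python
import fcntl

# FICLONE ioctl number from <linux/fs.h>: asks the filesystem to share all
# extents of the source file with the destination file. Works only when
# both files sit on the same reflink-capable filesystem (e.g. XFS with
# reflink=1); otherwise the call fails with EOPNOTSUPP.
FICLONE = 0x40049409

def reflink_clone(src_path: str, dest_path: str) -> None:
    """Clone src into dest without copying data blocks: only extent
    metadata is written, and the shared blocks' reference counts go up."""
    with open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        fcntl.ioctl(dest.fileno(), FICLONE, src.fileno())

# Illustrative only: a synthetic full "written" at near-zero storage cost.
reflink_clone("full_backup.vbk", "synthetic_full.vbk")
```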

The Veeam data mover in Veeam Backup & Replication recognizes the ReFS/XFS filesystem and can then leverage BlockCloning. Service providers and their customers can recognize the effectiveness of the API first of all by looking at the time needed to complete a merge operation, and by looking at this line in the job statistics:

Fast Clone leveraged in a Veeam backup job

2.20: Fast Clone leveraged in a Veeam backup job

Design Tip: one optimal design we suggest is the usage of multiple Linux Hardened Repositories, each with XFS volumes, grouped together into a SOBR with the data locality policy.