FlashArray and Amazon AWS: Snap to AWS / CloudSnap to AWS



My post "FlashArray and FlashBlade: Snap to NFS / Snap to Flashblade" from August 17, 2019 already introduced snapshot offloading to NFS storage. With Purity 5.2, the "Snap to AWS" feature was released, another feature based on Pure Storage's portable snapshot technology. With the latest Purity 5.3, support for Microsoft Azure Blob storage has been added as well.


CloudSnap works like Snap to NFS; the only difference is the target interface.


As is well known, snapshots are also used for test/dev scenarios or cloning operations. "CloudSnap" extends this functionality so that you no longer tie up capacity on your primary systems with snapshots, but offload it to inexpensive cloud storage instead. The transferred snapshots are compressed (not deduplicated), but they are not stored in a directly readable user/application format: a FlashArray is required to restore them. A restore is therefore possible to any supported FlashArray system (including a different model, as long as it runs at least Purity 5.2).

As always, there are no additional software licenses/costs to use this functionality.


CloudSnap also works agentless, meaning no Pure software is needed in AWS. Data is compressed during the transfer, which saves network bandwidth and increases the efficiency of the target. After the initial baseline transfer, only the deltas of subsequent volume snapshots are transferred (incremental snapshots); this delta matching is done within Purity.


During a restore, Purity already knows which data blocks are present on the FlashArray and only needs to transfer changed/missing blocks. Restores are also deduplication-optimized: data restored from the offload target is deduplicated directly during the transfer and therefore does not occupy valuable space.


"CloudSnap" is an app/integration which runs in the "heads" of the FlashArray controllers - known as PurityRUN. The overhead required for this is minimal and does not have a large impact (max. 10% of performance) on the primary storage traffic.

A low-priority reservation of 4-8 cores and 8-16 GB RAM is created. If the load on the system can no longer guarantee proper operation of front-end IO traffic, the PurityRUN functionalities are throttled.


In practice


CloudSnap can be managed through the FlashArray GUI or CLI and also monitored through Pure1. Of course, supported tools can also control operations on the arrays via the REST API.
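
For scripted environments, here is a minimal sketch of such a REST call, assuming the open-source "purestorage" Python client (REST 1.x); the management address and API token are placeholders, and method names may vary with the client and Purity version:

```python
# Sketch: query a FlashArray via the REST API using the purestorage 1.x client.
# The management address and API token are placeholders for your environment.
import purestorage

array = purestorage.FlashArray("flasharray-mgmt.example.local",
                               api_token="<api-token>")

print(array.get())               # basic array information
for pg in array.list_pgroups():  # protection groups drive CloudSnap offloads
    print(pg["name"])

array.invalidate_cookie()        # close the REST session
```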


Similar to asynchronous Pure Storage replication, CloudSnap uses so-called "Protection Groups".


System requirements


A dedicated Amazon AWS S3 bucket is required. Dedicated in the sense that it must not contain any other data that is not used by CloudSnap.


Limitations


What annoyed me at this point: Purity currently allows a maximum of one offload target per array. For me, that meant I had to disconnect my Snap to NFS target first. The limit of 100,000 offloaded volume snapshots should not be a problem. Currently, a maximum of 4 FlashArrays* can be attached to one offload target.


* 2 FlashArrays for backup and restore + 2 FlashArrays for restore only.


Basic setup


The initial configuration must be done by Pure Storage Support. To do this, simply create a ticket (subject: "Pure Storage PurityRUN CloudSnap enablement"). In advance, you should have a free IP address ready (with connectivity to the network); support needs this IP address to set up the offload.


After Remote Assist (RA) has been enabled, the support staff can perform the configuration. The prepared IP address is placed as a virtual interface on top of a replication interface on both controllers (ct0-eth2, ct1-eth2). Important to know: this has no influence on the operation of features such as ActiveCluster!

Finally, both controllers must be restarted one after the other (no downtime), and "CloudSnap" is then fully usable.



INFO: as of Purity 5.2.0, PurityRUN* is already active by default and contains prepared but deactivated apps. This means that no resources are wasted while they are not in use.


Two prepared apps are displayed by default in the Settings > Software > App Catalog tab. You can install them, but not configure them.



PurityRUN* = a KVM virtualization platform for deploying integrations/apps on the Pure Storage system.

AWS configuration


Setting up the AWS bucket is quick, and anyone who just wants to test the feature can take advantage of the AWS Free Tier for one year.


Login to AWS


First, we log in to the AWS Management Console and switch to the AWS S3 Console Dashboard.




Bucket creation


In the dashboard, we click on "Create Bucket" and follow the wizard. A name for the bucket and the region of the AWS data center must be assigned. It is recommended to always use the shortest path, i.e. the closest region (unless geo-redundancy is required).

As always, I use a unique name for creation and identification: "offloadtoawsfrankfurt".
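
If you prefer to script this step, here is a minimal boto3 sketch, assuming the bucket name "offloadtoawsfrankfurt" and the Frankfurt region eu-central-1 used in this post; AWS credentials are expected to come from your usual configuration/environment:

```python
# Sketch: create the dedicated S3 bucket with boto3 (bucket name and region
# are the values used in this post; adjust them to your environment).
import boto3

s3 = boto3.client("s3", region_name="eu-central-1")
s3.create_bucket(
    Bucket="offloadtoawsfrankfurt",
    CreateBucketConfiguration={"LocationConstraint": "eu-central-1"},
)
```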




Next, encryption must be enabled for this bucket. This step is mandatory for a proper CloudSnap configuration. The setting is made on the bucket itself under "Properties" > "Default Encryption". In contrast to AWS-KMS, AES-256 encryption is free of charge.
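
The same setting as a boto3 sketch, enabling SSE-S3 (AES-256) as the default encryption on the bucket created above:

```python
# Sketch: enable default encryption (SSE-S3 / AES-256) on the bucket.
import boto3

s3 = boto3.client("s3", region_name="eu-central-1")
s3.put_bucket_encryption(
    Bucket="offloadtoawsfrankfurt",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
        ]
    },
)
```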




This completes the bucket configuration itself. Important at this point: less (i.e. the defaults) is sometimes more. Under no circumstances should "Lifecycle rules" be configured or the bucket be made public, in order to ensure proper function and security.
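
To enforce both points from the script side, a sketch that blocks public access and checks that no lifecycle configuration exists on the bucket:

```python
# Sketch: block all public access and confirm that no lifecycle rules exist.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3", region_name="eu-central-1")
s3.put_public_access_block(
    Bucket="offloadtoawsfrankfurt",
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)

try:
    s3.get_bucket_lifecycle_configuration(Bucket="offloadtoawsfrankfurt")
    print("WARNING: lifecycle rules found - remove them for CloudSnap!")
except ClientError as err:
    if err.response["Error"]["Code"] == "NoSuchLifecycleConfiguration":
        print("OK: no lifecycle rules configured.")
    else:
        raise
```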


AWS User Creation


Now, to complete the AWS configuration, a corresponding user with access to the bucket must be created. User administration in AWS is done via "IAM", which can be accessed via the search:

I created the user "purebucketuser-PURE-X50-2" with "Programmatic access" and assigned it the "AmazonS3FullAccess" policy from the AWS managed policies. For arrays that only restore, read-only permissions should be sufficient. I left all other settings at their default values.




With "Create user" the user is created and the access data is generated. You can export them as CSV (which I recommend for later use) and save them to a secure location.

Relevant for later use are the "Access key", the "Secret access key" and the bucket name.
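
The same user creation as a boto3 sketch; user name and policy follow the example above, and the printed key pair corresponds to the CSV export:

```python
# Sketch: create the IAM user, attach the AmazonS3FullAccess managed policy
# and generate an access key pair (equivalent to the CSV export).
import boto3

iam = boto3.client("iam")
user = "purebucketuser-PURE-X50-2"

iam.create_user(UserName=user)
iam.attach_user_policy(
    UserName=user,
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess",
)

key = iam.create_access_key(UserName=user)["AccessKey"]
print("Access key:       ", key["AccessKeyId"])
print("Secret access key:", key["SecretAccessKey"])  # store securely!
```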


FlashArray configuration


Integrating the offload target


The prepared AWS bucket must now be connected to the FlashArray. This is done via Storage > Array > "+":

An alias must be specified: I am, as always, a fan of unique names here as well, so I chose the bucket name "offloadtoawsfrankfurt". I take the Access Key and Secret Access Key from the previously exported CSV; the bucket name can be read from the AWS S3 Console.

If the bucket has not yet been used for CloudSnap, it can be prepared with the "initialize bucket as offload target" checkbox. The following placement strategies are available:

  • aws-standard-class: with this option, all offloads are placed on standard S3 storage.

  • retention-based: with this option, offloads are stored either on S3 standard storage or on "cold" S3 Infrequent Access storage (S3-IA), depending on the retention values defined for the Protection Groups in Purity. All snapshots retained on the target for more than 30 days are placed in S3-IA storage.

  • unchanged: this option MUST be used when connecting to an existing S3 bucket that has already been used with Purity.

The connection is established with "Connect", and the FlashArray is now connected to the bucket. The status is displayed in the Purity overview. In case of problems, check the network reachability between the systems and the configured permissions.

The specified S3 bucket is scanned first; if existing snapshots are detected, they become visible in Purity.
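
For this troubleshooting step, a small boto3 sketch that verifies whether the generated credentials can actually reach and write to the bucket (access key and secret are the values from the CSV export):

```python
# Sketch: verify that the IAM credentials can reach and write to the bucket
# (useful if Purity reports the offload target as unreachable).
import boto3

s3 = boto3.client(
    "s3",
    region_name="eu-central-1",
    aws_access_key_id="<access-key-from-csv>",
    aws_secret_access_key="<secret-access-key-from-csv>",
)

bucket = "offloadtoawsfrankfurt"
s3.head_bucket(Bucket=bucket)                                   # reachability/permissions
s3.put_object(Bucket=bucket, Key="cloudsnap-test", Body=b"ok")  # write test
s3.delete_object(Bucket=bucket, Key="cloudsnap-test")           # clean up
print("Bucket reachable and writable.")
```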



Configuring the snapshot offload job


As mentioned earlier, CloudSnap is based on Protection Groups. All volumes within a Protection Group (hereafter PGROUP) can be replicated to one or more defined targets. Within a PGROUP, volumes, snapshot schedules, and replication targets/schedules/retention periods/windows can be defined.


Therefore, we first create a new PGROUP via Storage > Protection Groups > Create Protection Group (the PGROUP must not be a member of a container) with a unique name: "PGROUP-offload-TO-FB-01".


Then we define the volumes to replicate, the CloudSnap target, the snapshot schedules and the replication schedule.
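
The same steps scripted via the REST API, as a sketch with the "purestorage" 1.x Python client; the PGROUP name and offload target are the examples from this post, the volume names are placeholders, and the keyword arguments (targetlist, addvollist, schedule attributes) are assumptions that may differ between client and Purity versions:

```python
# Sketch: create the protection group, attach the CloudSnap target and volumes,
# and enable snapshot/replication schedules. Volume names and keyword arguments
# are assumptions based on the REST 1.x client; adjust to your environment.
import purestorage

array = purestorage.FlashArray("flasharray-mgmt.example.local",
                               api_token="<api-token>")

pgroup = "PGROUP-offload-TO-FB-01"
array.create_pgroup(pgroup, targetlist=["offloadtoawsfrankfurt"])
array.set_pgroup(pgroup, addvollist=["vol-demo-01", "vol-demo-02"])

# Hourly local snapshots, replicated to the offload target every 4 hours.
array.set_pgroup(pgroup, snap_enabled=True, snap_frequency=3600)
array.set_pgroup(pgroup, replicate_enabled=True, replicate_frequency=14400)

array.invalidate_cookie()
```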


Setting up the Protection Group



Customizing the Protection Group