What is it?
The L1TF (L1 Terminal Fault) vulnerability has been identified as another speculative execution side-channel vulnerability affecting Intel CPUs, similar to the Spectre and Meltdown vulnerabilities that made headlines earlier this year. It is currently considered a theoretical vulnerability, with no evidence of malicious use in the wild.
How is it a risk?
The risk is that when specially crafted malicious code is executed on an unprotected VM in an unprotected shared virtualisation environment, CPU cache memory belonging to other VMs running on the same CPU can be read. The contents of this cache memory could include sensitive data such as passwords or private keys from other unprotected VMs.
What is the impact?
If exploited by a bad actor, the impact is the potential leaking of sensitive information across virtual machine boundaries, which should not normally be possible. Obviously this is more of a concern for shared and public cloud offerings than for private clouds.
What systems are at risk?
- All unpatched operating systems
- Virtual machines running on VMware vSphere and other hypervisors
How do we patch for it?
To completely protect your VMware environment from L1TF, the following steps need to be taken:
- Identifying if your VMware environment and VMs are affected and to what degree
- Identifying the impact of the fixes (see next question for details)
- Implementing the fixes, which will most likely include the following sub-steps:
– Upgrading vCenter to a patched version
– Rebooting each host and updating the firmware for the Intel CPU
– Installing the ESXi patch and restarting each host again
– Enabling the ESXi L1TF threat prevention setting and restarting each host
– Installing L1TF patches in any applicable virtual appliances and guest OSs
What is the impact of patching and enabling the threat prevention?
As outlined in the previous question, completely protecting your VMware environment from L1TF means enabling the L1TF threat prevention setting on each ESXi host. This setting has a CPU impact on the host, as it includes disabling the hyper-threading feature of Intel CPUs. Hyper-threading is responsible for giving, in some cases, up to double the CPU performance within servers; recent benchmarks across environments indicate the real-world figure is closer to 10-20%, but it really depends on the types of applications running within your environment. By disabling hyper-threading, CPU capacity will be reduced, and this may or may not affect your ability to comfortably run your current virtual machine workloads on existing hardware, especially in a failed-server or maintenance situation.
VMware has provided a PowerCLI utility that can be run against a VMware environment to give an indication of the risk of making this change. Its recommendations are based on the current workloads and on rolled-up performance data going back a number of months, provided by vCenter.
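As a rough illustration of that capacity question (and not a substitute for VMware's utility), the minimal Python sketch below checks whether a cluster still has CPU headroom with hyper-threading disabled. The host count, per-host GHz, peak demand and the 15% loss figure are all made-up assumptions chosen from within the 10-20% range mentioned above:

```python
# Rough capacity check for disabling hyper-threading.
# All inputs below are illustrative assumptions, not measured values.
hosts = 4
ghz_per_host = 2.6 * 20          # 20 physical cores at 2.6 GHz each
peak_cluster_usage_ghz = 120.0   # assumed peak CPU demand across the cluster
ht_loss = 0.15                   # assumed usable-capacity loss without hyper-threading

capacity_now = hosts * ghz_per_host
capacity_after = capacity_now * (1 - ht_loss)
capacity_after_n1 = (hosts - 1) * ghz_per_host * (1 - ht_loss)  # one host failed or in maintenance

for label, cap in [("current", capacity_now),
                   ("HT disabled", capacity_after),
                   ("HT disabled, N-1 hosts", capacity_after_n1)]:
    headroom = cap - peak_cluster_usage_ghz
    print(f"{label}: {cap:.0f} GHz capacity, {headroom:+.0f} GHz headroom")
```

If the N-1 headroom goes negative with your own numbers, the workloads may no longer fit comfortably once the mitigation is enabled.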
Stay tuned for more in depth information on the L1TF threat and how it relates to VMware environments.
Since 2002, Perfekt has been delivering IT infrastructure solutions, services and consulting to address the business challenges IT organisations face within their respective companies from the continually changing IT landscape.
This information pack is intended to provide an overview of the recent documented exploitation of modern CPU (mis)features.
It is not designed to be a comprehensive document on the fine details of the exploits and specific platforms.
Where possible, links will be provided to full documentation and disclosure.
This document is a description of the situation, remediation options and impacts, as well as suggestions for a remediation strategy.
Information is rolling out daily on these subjects.
How long does a bridge or building last?
Been to Europe lately? It could be over 500 years. The trouble is that in Australia we don't often have that long-term thinking. Certain regulations ask us to retain records for:
- The life of the patient (or until the child turns 18 plus 7 years = age 25)
- The length of employment, plus 7 years
- The term of the contract
- At least 7 years for financial records
- After an OH&S incident, plus 5 years
- 70 years after the end of the year of a copyright creator’s death
- And so on
- And some regulations simply say “at least…”, with no maximum specified.
What could this look like?
- Every day you add new records to the database
- You may update some records (depending on how the application handles edits)
- You don't purge old records to save space, because disk space is relatively cheap, and the application updates existing records to show an account or transaction as closed or complete.
For protection purposes, do you:
- Back up the database daily and keep each backup for 7 years, or
- Keep the database with historical records online, and apply data protection techniques to meet the regulatory requirements? This could take the form of:
– an occasional backup for recovery purposes
– retaining each backup only until you take the next one, with a small number of levels of recoverability
A rough comparison of the two approaches is sketched below.
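As a back-of-the-envelope illustration only, the following Python sketch compares the two approaches in copy count and raw storage. The database size, backup frequency and retention figures are made-up assumptions, and compression and deduplication are ignored:

```python
# Option 1 vs Option 2, using illustrative numbers only.
db_size_gb = 500
years = 7
backups_per_year = 365

# Option 1: keep every daily backup for 7 years.
option1_copies = years * backups_per_year
option1_storage_tb = option1_copies * db_size_gb / 1024

# Option 2: the database itself retains the history online; keep only a
# small number of recent recovery points (say, the last 30 daily backups).
option2_copies = 30
option2_storage_tb = option2_copies * db_size_gb / 1024

print(f"Option 1: {option1_copies} copies, ~{option1_storage_tb:.0f} TB of backups")
print(f"Option 2: {option2_copies} copies, ~{option2_storage_tb:.1f} TB of backups")
```

Even with these rough numbers the difference is stark: thousands of retained copies versus a few dozen, for the same regulatory outcome, provided the application keeps its history online.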
Levels Of Recoverability
The ongoing mistake is over-catering for multiple “levels of recoverability” from copies made years ago. If a file such as a database is updated every day for 7 years, there would be very few circumstances where you would need to go back to a copy from, specifically, 2374 days ago (about 6½ years, in case you were wondering). Most organisations keep only a monthly “archive” copy of the backup for just-in-case recovery, and very few would ever recover an application back this far; on the rare occasions they do, they are more likely to run up a parallel copy and examine the aged records. However, if the online database retains records all the way back to 7 years ago, then this typically meets the record retention requirements without any backup copies. What does this mean? Unless there is some hefty purging of content (by users or inside the application), there is no need to retain backups for such a long period.
What about files on a file server, where there are ongoing updates and even deletions of files? Management of whole files is actually more challenging than for records in a database. Enterprise backup software applications have an archive module for exactly this functionality. If you intend to purge entire files from the active primary data store, for example to relieve pressure on backup times and on primary storage usage, then the best option is to archive this data, usually to a much lower cost tier such as nearline disk or tape, or possibly to the cloud. For compliance purposes, to retain a copy of a file some type of protection scheme must be in place. This could be a backup, archive, records/document management system, or repository.
Data Classification and Retention
A 2012 study across a broad range of industries showed that:
So what would be a simple and effective backup retention regime?
Assuming deep purging of in-database records does not occur, your life can be quite simple. Here's an outline of sound practice:
“Current” data set
- How long to keep: determined by your restore profile; if not sure, 1 month is a good benchmark.
- Databases, frequency of copies: could be backed up hourly (or in extreme cases every 10-15 minutes). High-frequency database backups would have a single backup nominated as the retained representative daily backup, with the other intermediate copies expired within a small number of hours.
- Files, frequency of copies: usually daily (perhaps augmented with a snapshot regime on the file server/NAS).
Archived data
- Method: archive and replace with a tiny stub file (or completely remove from primary storage, though this is less common).
- Frequency of copies: monthly, or perhaps fortnightly.
- Archive candidature: typically data older than 24 months, but this can be lower depending on data access patterns.
- Where to keep the backup and archive copies: commonly nearline disk, as this allows for immediate and easy recovery at a low cost; tape is also highly economical, especially for vast data volumes. Cloud is an option too, and many modern data management platforms have strong cloud interfaces.
- Archive protection: archived files should be retained in at least 2 places: disk and tape, or 2 disk copies in different sites to cater for a disaster event.
When aged data grows to an unmanageable scale, how should this be managed?
Some data, especially voluminous data sets such as geo-seismic data, clinical records such as pathology tissue scans or x-rays, or files from a complex engineering project, may need to be retained for many years. Clinical records often need to be retained for the life of the patient. Engineering drawings can provide great insight into a building for refurbishment or later demolition, and the life of a building or a bridge can exceed 100 years. How do we retain all of this data, and where is it best kept, if it grows to 50 or 100TB (or more) over the life of an organisation? Keeping such aged data on the primary storage disk array is uneconomical, especially on new and expensive all-flash arrays, and even more so when the access rate is low. Relative likelihood of access:
- WORM (write-once, read many)
- Version control
- A minimum of 2 copies of every file to cater for corruption
- Self-healing from corruption
- Geographically dispersed copies and replication, to cater for DR
- Therefore its data doesn't need backing up, because of the features above
Are you one of those super-paranoid organisations who have decided to keep all backups forever?
Then you are certainly not alone. The vendors will love you; however, you must consider how you will handle technology obsolescence. While the shelf life of an LTO tape cartridge is 30 years if stored under ideal conditions, the chances are that in 15 years you won't actually have the hardware or software technology necessary to recover the data.
- Focus on data retention, not backup retention
- Understand your restore profile and work out how many levels of recoverability are needed – it will be less than you think
- Investigate whether your applications actually retain most historical data online (likely!)
- Aggressively reduce the number of backups you retain, in line with the points above.
Animal Logic makes the visual effects that go into films like the first two Matrix films, The Lego Movie, and The Great Gatsby.
In this video case study, their head of IT, Alex Timbs, explains why they selected a prefab data centre from our partners at Schneider Electric.
This video will be relevant to you, if you need:
- very high compute or storage or both
- a flexible solution that is modular, expandable or portable
- a solution that is very quick to provision
I don’t know about you, but when I was working as an infrastructure systems administrator my life was ruled by a list. A list of to-do’s to get the environment to a secure and happy place. There would be weeks where my list got smaller as I got things crossed off it. But mostly, weeks where the list ended up longer than before, as I uncovered another upgrade to do or issue to fix which always took priority over the things I actually wanted to get crossed off.
One thing that I never got to tick off while I was in this role was a proper password management system. For far too long an Excel spreadsheet was the home of the most important and sensitive information my network held, and it deserved better.
It was only a few years later, when I started working in a consulting role, that I actually got to design and implement such a system, and I'm not ashamed to say it felt great!
So let’s talk about Password Management Systems and the review points I used when evaluating various products and tailoring them to fit the customer.
- What is so bad about the current methods used and how can we fix these with a new system?
- Who will the users of the system be and what access should each user have?
- How will these users access the system?
- How will we make it easy to administer/modify/add to etc?
- How can we make it as secure, robust and yet user friendly as possible?
- How are we going to get the current password information into the new system?
- Can we do this with free tools or does it have to be a paid product?
The answers to these questions have been pretty consistent across the implementations I have done since. I will answer them below; see if they sound familiar to you. As you will see, the source is usually KeePass or Excel spreadsheets.
- What is so bad about the current methods used and how can we fix these with a new system?
The current method usually lacks the following major features:
- the ability to shield the passwords from prying eyes (although products like KeePass do have this)
- auditability, to report (or alert) on when something was changed or accessed
- granularity, as anyone who can access the current system can see every username and password we have
- controlled portability, as nothing stops someone from copying an Excel spreadsheet or KeePass file and brute-forcing the password off-site
- Who will the users of the system be, what access should each user have and what sort of information should reside in there?
Every member of IT should be able to access the system but only see the passwords appropriate to their role. It would be great if they could also store their own personal website passwords, as long as they are business related, and they should be the only ones able to access these personal passwords.
It would be great if it could hold things such as license keys too, as these are considered as sensitive as passwords in some cases.
- How will these users access the system?
Active Directory integration is a must here. The system should not be accessible outside of the organisation (except via VPN/Citrix/VDI etc). In the event of AD being down, there would be a master user password that could be used to access the system, but this would be a break-glass situation only.
- How will we make it easy to administer/modify/add to etc?
The key is that we don't want to apply individual permissions to every password, but instead have a folder-like structure in which each password resides, where permissions are set per folder and a password can be moved as required. Passwords that need different permissions are treated as exceptions and assigned individually, in a place where we know special permissions exist.
With the structure in place, creating a new password is as easy as putting it in the right folder; the inherited permissions take care of the rest.
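As an illustration only (the folder names, groups and entries below are hypothetical, and no particular product's API is implied), a minimal Python sketch of that folder-inheritance model with per-entry exceptions might look like this:

```python
# Toy model: permissions are set per folder; an entry inherits them from its
# parent folder chain unless an explicit exception is set on the entry itself.
folders = {
    "Root": {"parent": None, "groups": {"IT-Admins"}},
    "Root/Networking": {"parent": "Root", "groups": {"Network-Team"}},
    "Root/Servers": {"parent": "Root", "groups": {"Server-Team"}},
}

passwords = {
    "core-switch-admin": {"folder": "Root/Networking", "exception": None},
    "payroll-db-sa": {"folder": "Root/Servers", "exception": {"DBA-Team"}},
}

def effective_access(entry: str) -> set:
    """Explicit exception wins; otherwise merge groups up the folder chain."""
    record = passwords[entry]
    if record["exception"] is not None:
        return record["exception"]
    groups, folder = set(), record["folder"]
    while folder is not None:
        groups |= folders[folder]["groups"]
        folder = folders[folder]["parent"]
    return groups

for name in passwords:
    print(name, "->", sorted(effective_access(name)))
```

The point of the design is visible in the output: moving an entry to a different folder changes who can see it, while the odd special-case password keeps its own explicit permissions.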
- How can we make it as secure, robust and yet user friendly as possible?
Different products handle this differently, but since cloud-based systems are more often than not ruled out, we want to keep the system as isolated and self-contained as possible, with encryption enabled, for security and rapid recovery purposes. User friendliness is key to keeping IT staff wanting to use the system rather than going back to old habits.
- How are we going to get the current password information into the new system?
Again different products have different methods, whether we are talking about the source of the passwords or the destination, but this is important to check when comparing products.
- Do we need any sort of automation or advanced functionality?
Some of the other features that can be useful, but are usually not required at the time of implementation, are:
- Automatic discovery of accounts
- Automatic testing of passwords to ensure they are still correct and active
- Automatic changing of passwords on target systems once they are changed in the Password Management System
- Password workflows
- Alerting when a password is due to expire
- Server clustering support for high availability
Mostly these are available only in the higher-tier editions of products.
- Can we do this with free tools or does it have to be a paid product?
Depending on all the answers above, especially whether any of the advanced functionality is needed, we work out the best product for the job. Surprisingly, more often than not there is a free option available.
I always found it strange that after external audits of the organisation, the lack of a password management system was never made a priority or even highlighted as part of the findings. I suspect this hasn't changed, because I still see the vast majority of environments either running the old Excel spreadsheet or a product that doesn't achieve much better security.
TL;DR: Password Management Systems don't need to be overly complex or even overly expensive, but they can give you a great improvement over what you currently use to store sensitive information. Perfekt have experience in this area and can help if you have it on your to-do list too.
For further information please contact your friendly Perfekt account manager.
Many years ago, before most backup products had backup-to-disk options, vendors launched disk-based appliances, known as Virtual Tape Libraries, as a backup target. These improved the backup and restore experience over tape. Over time these appliances were enhanced to provide in-box deduplication, which supplemented the lack of such features in some backup solutions.
Time has of course moved on; products such as Commvault have had backup-to-disk as an inherent part of their architecture since inception, and have supported deduplication for many years now.
So the question becomes: What is the best way to perform deduplication? Within backup software, or with an appliance?
Appliances seem easy: you don't need to consider deduplication within the backup software, you just buy a box with 3x the capacity of the data being backed up and you might get a month or so of backups on disk. The backup software sees it as a generic disk target.
However, it really isn’t that simple, and there are a number of factors often overlooked when contemplating this approach. This blog article is designed to flesh these out for you so you can make a considered decision. These are broken down into a few distinct categories:
A dedupe appliance is a self-contained unit and generally can't be messed with by outside factors. For some users this "black box" approach is simple, but it has a number of notable downsides.
They perform the deduplication task far too late in the data path to be useful: deduplication is done at the very end of the data movement process. Software dedupe (e.g. by Commvault) is at the start of the data path; nothing leaves the client computer unless the Media Agent confirms that it is needed. With a dedupe appliance, all of the data has to be sent by the client across the network and be processed by the Media Agent. It then has to be sent out of the Media Agent to the appliance, at which point it is finally deduped. There is no aid to backup performance when you still need to move all of the data.
When you compare the typical amount of data sent out of the client using Commvault dedupe (somewhere around 2% to 5% daily is quite normal) against what gets sent across the network when using a hardware appliance (100%), there is really no contest. Commvault dedupe saves significant amounts of two things: time and space. Time is saved because very little is sent across the network from client to Media Agent, and space is saved because only a single copy of everything is stored. A dedupe appliance saves space but no time at all, because 100% of the data is transmitted.
Dedupe performed at the client (as Commvault does) is "content aware". This means that for every item it backs up, it restarts the block alignment.
For example, if a system has file 1, file 2 and file 3, and a user edits file 1 and makes it bigger, this won't disrupt the deduplication for file 2 and file 3, because when it has finished with file 1 the agent will open file 2 and start again at the beginning of that file, so everything lines up again. A dedupe appliance has no idea what is going on in the client and is not content aware. Commvault (and other backup products) will write big chunk files to the appliance, and these don't dedupe very well once some content has been edited as described: today's chunk files don't look much like yesterday's. For that reason an appliance simply cannot get the same sort of space savings as deduplicating at the source, the way Commvault does. No dedupe appliance can achieve dedupe savings of 98% or 99%, yet this is often achievable with Commvault dedupe.
Note the Size of Application (original data size, at source), the Data Written (data sent across the network to the Media Agent), and the Savings Percentage – a whopping 98%! Only 2% of the data was sent across the network and only 2% was stored in the disk library. No dedupe appliance can achieve ratios like that.
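To illustrate the alignment point, here is a small, self-contained Python sketch (toy block size and synthetic file contents; it does not represent how any particular product chunks data) showing why an edit to one file disturbs far more blocks in an opaque chunk stream than in a content-aware, per-file stream:

```python
import hashlib

BLOCK = 8  # tiny block size so the effect is visible; real systems use ~128KB

def blocks(data: bytes) -> set:
    """Hash fixed-size blocks of a byte stream."""
    return {hashlib.sha256(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)}

file1, file2, file3 = b"A" * 20, b"B" * 20, b"C" * 20

# Content-aware dedupe: block alignment restarts at the start of every file,
# so growing file1 does not disturb the blocks produced for file2 and file3.
before_aware = blocks(file1) | blocks(file2) | blocks(file3)
after_aware = blocks(file1 + b"XX") | blocks(file2) | blocks(file3)
print("content-aware blocks still matched:", len(before_aware & after_aware))

# Appliance-style dedupe over an opaque chunk stream: the same edit shifts
# every block boundary after it, so far fewer blocks match the previous run.
before_chunk = blocks(file1 + file2 + file3)
after_chunk = blocks(file1 + b"XX" + file2 + file3)
print("chunk-stream blocks still matched:", len(before_chunk & after_chunk))
```

Even in this toy example, most per-file blocks still match after the edit, while most blocks in the concatenated stream no longer do; at real scale this is the gap between source-side and appliance-side space savings.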
The picture gets murkier when considering multiple sites and DR protection of backup copies. Such a site-to-site copy can only be performed efficiently when you have purchased two such dedupe appliances. Commvault cannot participate in or assist this process, as it is handled in the back end by the appliances, and Commvault is (mostly) unaware that the copy has been made. Now you have hardware vendor lock-in across two sites that must both be upgraded together.
Often, in such circumstances you have limited control over the mirror site policies. Commvault solves all of this with DASH copy, where the hardware can be dissimilar and your retention policies can be as varied as you like: keep some jobs for 3 months at Prod and 1 month at DR, keep other jobs for 3 months on each site, etc. As granular as you might need.
Since dedupe appliances only see what data is sent to them, large and complex sites with a mix of backup and archive data are not able to take advantage of global site dedupe. Every site, no matter how big or small, would need the same vendor's dedupe appliance, and this gets inordinately complex (not to mention expensive) when trying to coordinate large-scale fan-in of remote site content. It is simply not feasible. Contrast that with Commvault software-based dedupe: you can protect anything from a single desktop right through to a massive NAS device of >1PB, and dedupe will work for all enterprise data, backup and archive, with cross-site data management handled efficiently and seamlessly.
Despite purchasing the dedupe appliance, you also need to consider the licensing obligations of the backup software itself. Within Commvault, depending on your licensing model, in the early days you needed to license the total addressable (effective) capacity of the dedupe appliance. So if it had 40TB of usable disk and offered 150TB of effective space with deduplication, you would need to purchase a Standard Disk Option license for 150TB.
With the newer Commvault licensing schemes such as a Capacity License Agreement and the VM protection Solution Bundles, your license already includes Commvault deduplication capability! Why would you want to go out and buy another/different solution when you have paid for it? 95% of backup systems are configured with deduplication as it is the market expectation. It makes no sense to bypass that and buy an appliance to do the same thing.
Once you have Commvault deduplication licensing, there is no more to pay for back-end dedupe capacity expansion, just the cost of the disk. Dedupe appliances will cost more than generic JBOD disk, thus you are paying more than you should.
Related to this, is that a dedupe appliance locks you in to a specific HW vendor. Your upgrades need to come from them. Your second site appliance must be the same, so that must also come from them.
Further, capacity upgrades to the dedupe appliance are limited to what the vendor offers, and some boxes are restrictive in their capacity options. With Commvault software dedupe you can add JBOD after JBOD separately (even from different vendors) and therefore not be concerned with the library device itself. You are free to choose any vendor's Media Agent server, so long as the specification meets the needs for performing deduplication according to Commvault guidelines. If your shop has a preference for HP, Dell, Cisco or IBM/Lenovo, then you can stay with that choice.
All of the points above add up to a total cost of ownership for dedupe appliances which can only be more expensive than using the Commvault deduplication you probably have already in your environment.
- Earlier in the data path = faster backups, less user impact, less network traffic (thinner pipes)
- Software implementation = regular JBOD, no vendor lock-in with proprietary algorithms and hardware
- Client aware = more efficient deduplication
- Multiple site protection = easy implementation, allow for cost-effective tapeless DR
- Remote copy policy flexibility = limit disk capacity to rules that follow a business retention process, not one mandated by vendor design
- Global dedupe = corporate benefits
- And don’t forget that dedupe appliances may also have a Commvault license obligation
On the 8th of December, 2015, VMware released a patch for ESXi 5.5 to address the POODLE vulnerability in SSLv3. The patch disables SSLv3 on the host altogether and only allows the more secure TLS protocol to be used instead. The patch is called ESXi550-201512101-SG, is titled "Updates esx-base" in Update Manager, and could cause you problems if you install it before upgrading vCenter.
If you are already at the vCenter version they released at the same time (vCenter 5.5 U3b) then it is safe to upgrade the hosts to this patch level as communication will continue to work fine over TLS.
However, if your vCenter is below this latest 5.5 Update 3b level and you install the ESXi patch, you will not be able to connect to the host in vCenter after the patch is installed and the host is subsequently restarted. This is because vCenter will still be trying to communicate with the host via SSLv3, which the host now has disabled.
If you do install the patch, you have two options to re-enable communication: either re-enable SSLv3 on the host (following the procedure here: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2139396#hostd) or upgrade vCenter to 5.5 U3b (the preferred method).
If you are reading this before installing the patch, then planning to upgrade to vCenter 5.5 U3b first would be the ideal solution. There are also newer versions of VMware's other products that use SSLv3, and these need to be upgraded too, e.g. SRM, vRealize Operations, VMware Tools, etc. VMware has an article on the order in which to upgrade these applications here – http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2057795
Something else to be mindful of is that external applications that communicate with vCenter and ESXi may currently do so with SSLv3, so upgrading to 5.5 U3b may stop this communication from working if TLS support is not implemented in the application. This can be tested by going into the Advanced Settings of vCenter, disabling SSLv3 in the "SSL.Version" setting and restarting vCenter. Testing at the ESXi host level can be done with this procedure: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2139396#hostd.
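If you want a quick, scriptable sanity check that an endpoint still negotiates TLS after the change, a minimal Python sketch along these lines can help. The hostnames are placeholders, and certificate validation is deliberately skipped since many vSphere environments use self-signed certificates; this only confirms that a TLS handshake succeeds, not which legacy protocols remain enabled:

```python
import socket
import ssl

def check_tls(host: str, port: int = 443) -> str:
    """Attempt a TLS handshake and report the negotiated protocol version."""
    context = ssl.create_default_context()
    # Lab-style check only: skip certificate validation.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()  # e.g. "TLSv1.2"

if __name__ == "__main__":
    # Hypothetical endpoint names; replace with your own vCenter and hosts.
    for endpoint in ["vcenter.example.local", "esxi01.example.local"]:
        try:
            print(f"{endpoint}: negotiated {check_tls(endpoint)}")
        except (ssl.SSLError, OSError) as err:
            print(f"{endpoint}: TLS handshake failed ({err})")
```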
- Cluster size up to 64 hosts (32 previously).
- Each host can run up to 1024 VMs, and have up to 480 logical CPUs and 12TB of RAM.
- Local ESXi account management is now through vCenter (no more having to rely solely on the root account). This also comes with account lockout and password complexity options.
- Improved audit logging within the host, where vCenter users' details are now logged against actions within the host's log files.
- Compatibility level 11 now supports VMs with up to 128 vCPUs, 4TB RAM and you can add USB 3 controllers.
- Clustering support for Windows 2012 R2 and SQL 2012, and the ability to vMotion clustered VMs with physical-mode RDMs between hosts.
- NVIDIA GRID vGPU support for VDI VMs
- FT now supports VMs with up to 4 vCPUs and 64GB of memory.
- FT also now supports snapshots, which increases the chances that you will be able to back the VM up via a VMware-level backup. Check with your backup vendor first though.
- FT also now creates a duplicate copy of the VM's storage, which means the primary can potentially run on local storage with the secondary copy on local storage on another host.
- Now supports up to 24 recovery points per VM
- vSphere Replication can now compress the replication traffic reducing the bandwidth requirements
- Supports interface and bandwidth control of vSphere Replication traffic
- Ability to vMotion a replica without having to fully resynchronise
- vMotion can now be completed across vSwitches (useful for cross cluster migrations) and even across vCenter Servers.
- Long Distance vMotion allows migrations across large geographical areas (assuming <100ms latency). It requires 250Mbps of bandwidth per migration and a stretched Layer 2 network spanning both sites, but can be very useful for moving VMs from site to site with no downtime.
- The backend components of vCenter, such as SSO, Inventory Service and Web Client have now been combined together into a role known as the Platform Services Controller (PSC). This role can either exist on the vCenter Server itself as an embedded PSC, or can be installed outside of the vCenter Server in a separate VM. The PSC can either be installed within Windows or as an appliance.
- The embedded database for vCenter has now been replaced with PostgreSQL, which scales much larger than the previous SQL Express editions. For example, when using Windows and the embedded PostgreSQL database, VMware now supports up to 20 hosts and 200 VMs (much more if you use external DBs), and the vCenter Appliance using the embedded PostgreSQL DB now supports 1,000 hosts and 10,000 VMs. External Microsoft SQL Server support is not available with the vCenter Appliance, but you can still use an external Oracle DB if need be.
- Linked Mode is now called “Enhanced” Linked Mode and the information is replicated between PSCs instead of vCenter Servers. This means that no special configuration needs to be done on the vCenter servers, and as long as the PSCs are in the same Single Sign-On domain, the vCenters that use them will work together in linked mode. You can even mix appliance and Windows installs of vCenter in linked mode.
- Certificate management has had a huge overhaul too. The PSC now acts as the VMware root Certificate Authority and handles certificate generation for hosts and VMware solutions. There are various ways this CA can be set up, but in most cases setting it up as a subordinate CA of an existing Active Directory certificate authority would be the best way forward, instead of using the self-signed certificates of a default configuration.
- Multisite Content Library is a new feature that allows templates, ISOs and scripts to be replicated between vCenters. As content is updated at one site it will automatically update at the other site(s). This replication can be configured with bandwidth limits and set replication hours if required.
The Traditional vSphere Client
VMware has made it clear since the 5.1 days that the future client of choice for vSphere management is the vSphere Web Client, and that one day the traditional C# vSphere Client would no longer exist. That day has not yet come, but more and more functionality is being added to the Web Client and not made available to those using the traditional client. Thankfully VMware has added the ability to edit most of the properties of VMs upgraded to hardware versions above 9; it is just the new features added since 5.1 that cannot be changed without going into the Web Client.
Virtual Volumes (V-VOLs)
V-VOLs are a new way of storing and managing virtual machine disks on storage arrays. At a high level, they allow the back-end storage provisioning and management operations to be done inside the vSphere Web Client at a VM-disk level rather than a datastore level. The end outcome of a V-VOL implementation is that when you provision a virtual machine disk you specify what size and performance you need for the VM, and an automation engine creates the volume on the array directly, in the best RAID group or pool for you. Hopefully in the near future V-VOL orchestration will also handle array replication and snapshots of V-VOLs, so you won't need to go into the array or replication devices to configure and manage these features. V-VOLs are still in their early stages, with only certain vendors and arrays currently supporting them, and what they support and how they achieve it varies between vendors. You can use this link to find the vendors and arrays that currently support V-VOLs – https://www.vmware.com/resources/compatibility/search.php?deviceCategory=vvols
Topology Changes
So if this all sounds great and you want to upgrade ASAP, there is one change you need to be aware of before you dive into the upgrade notes, prerequisites etc. VMware has changed its supported topologies and has deprecated support for a very common set-up in version 6.0. If you run a single vCenter in your environment and have no foreseeable plans of increasing this (i.e. to include a DR site, for example), then this doesn't apply to you. But if you do currently run more than one vCenter (or plan to), then you most likely also run SSO, the Web Client and the Inventory Service on your current vCenter servers. Remember that SSO, the Inventory Service and the Web Client are combined into the PSC in version 6, so a straightforward upgrade leaves an embedded PSC on each vCenter Server. This makes the upgrade process a much more arduous task than previous upgrades and, depending on the environment and the way it's set up, may involve a full reinstall of vCenter.
Conclusion
Hopefully this gives you a helpful quick rundown of the new features of vSphere 6 and helps prepare you for the challenges ahead with the upgrade path. If you would like further information on any of these points, please talk to your Perfekt account manager.
Let's face it, this topic has been in the back of everyone's thinking for quite some time, yet few organisations of scale can achieve it. Tape has been around since the 1950s, when it was pioneered by IBM as a low-cost, offline and portable storage medium. In the last 65 years it has seen significant transformation, with the market now fairly singularly centred on the LTO Ultrium cartridge format.
LTO-6 is the current generation offering roughly 5TB of compressed data per cartridge, with a roadmap that extends to LTO-7 in October 2015, and LTO-8 which will see this increase even further over coming years.
The reality is that since I worked at Quantum, between 2000 and 2007, there has been a dramatic change in the paradigm for tape usage. Because of its portability and sequential nature, tape became the reason many people disliked backup. Yet backup need not be so dull!
These days, backups are staged to disk first before being copied to tape. Smart backup solutions are able to electronically copy backup content from one second-tier disk system to another, usually in an alternate site, so that the need for making regular tape copies is significantly diminished.
In CommVault's terminology this is called a DASH copy. DASH is a horrible acronym for Dedupe-Accelerated Streaming Hash, which is about as bad as all of those terrible acronyms IBM made up in the 1980s for its products. Forget the acronym; DASH just means FAST, and that is what it delivers, by only transferring new and unique sub-blocks of data between the primary and secondary copies of backup content.
This technology means that you can copy backup (or archive) data in any of these scenarios:
- From Production to DR
- From one or more remote sites to head office/data centre
- From any site to a cloud data centre
- Or all of the above together in any combination
The upshot of this is that if you are copying data between disk arrays at your sites, your reliance on tape is significantly diminished.
When DASH copy is implemented, Perfekt often finds that clients purchase a 1, 2 or 4 drive tape library or autoloader and make just weekly, fortnightly or monthly tape copies, which are more for archival purposes than for traditional restores.
Because of the licensing schemes available with the CommVault Capacity License Agreement and the new Solution Bundles, clients are no longer metered on the back-end capacity of backup data stored. You can retain a day, a month, a year or a decade on disk for no additional license charge. You just need:
- The disk space to retain it
- A sufficiently large dedupe database on your media agent server
What do you need to get DASH Copy Working?
There are a couple of “considerations”. A consideration is a problem if you don’t think it through. If you plan ahead, then you will not run into issues.
The first is how to make the initial copy of data. DASH copy is incredibly efficient at moving backup content between sites. However, there is no special magic. That first copy will take some time to move. How long depends on:
- The data volume
- The network link (and how much of it you can use for this)
- A whole bunch of other “overheads”.
The devil is in the detail, so at Perfekt we have devised a simple formula to help you work this out which provides an approximation of the duration, in days, for the initial copy:
Duration (days) = Data Volume (GB) ÷ Available Link Speed (Mb/sec) × Constant
The constant factors in compression, TCP overhead, the CommVault index and the dedupe hash size, as well as the unit conversion. The following is a summary of the estimated numbers used for these factors:
- An estimated -15% allowance for the benefits of compression is given
- A +30% overhead for TCP/IP on the link speed
- +5% for the CommVault Index of the Data
- The Dedupe database creates a hash of each 128K block, which is 4K in size (+3%)
- Finally a unit conversion is made to account for data in GB and link speed in Mbps to output a duration in days
As an example, a site with 500GB of data on a link with 10Mbps available would take at least 5.7 days to complete the initial copy process.
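Put into a few lines of Python, a minimal sketch of the same estimate (using the approximate adjustment factors listed above, so treat the output as a rough guide only) reproduces the 5.7-day figure and also previews the nightly copy times discussed further below:

```python
# Rough DASH copy duration estimate using the factors described above:
# -15% compression, +30% TCP overhead, +5% index, +3% dedupe hash.
def dash_copy_days(data_gb: float, link_mbps: float) -> float:
    raw_seconds = data_gb * 8 * 1024 / link_mbps          # GB -> megabits -> seconds
    adjusted = raw_seconds * 0.85 * 1.30 * 1.05 * 1.03    # compression, TCP, index, hash
    return adjusted / 86400                               # seconds -> days

# The worked example above: 500 GB over a 10 Mbps link.
print(f"Initial seed: {dash_copy_days(500, 10):.1f} days")   # ~5.7 days

# Nightly DASH copies only move the changed sub-blocks (2-5% of the data).
for change in (0.02, 0.05):
    hours = dash_copy_days(500 * change, 10) * 24
    print(f"Daily copy at {change:.0%} change: {hours:.1f} hours")
```

The daily figures it prints line up closely with the 2% and 5% examples quoted below; small differences come from rounding the constants.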
As an alternative to transferring the initial backups over the WAN, it is possible to seed the data using a portable USB-attached hard drive. In this approach, this hard drive transports the initial data set manually before establishing the regular (eg daily) DASH copy process.
Such a process, however, involves considerable time and effort in handling and shipping the drives; as a result, Perfekt would suggest considering USB seeding where the WAN transfer time would exceed 14 days.
Of course, once the seeding is complete, the ongoing copies are small, since users do not rewrite entire reports, databases, presentations or spreadsheets every day. What is captured is just the sub-block changes, and these are efficiently replicated to the alternate site after each backup.
You can use the same formula as above, but take the daily sub-block change rate of between 2% and 5% of the data volume to determine the nightly DASH copy duration.
Taking our example of 500GB of data at a site with a 10Mbps link, assume this has a 2% or 5% daily change rate. Pop that into the formula and you will see that the DASH copy duration on the same 10Mbps link is:
- 2%: 2hrs and 45 mins
- 5%: 6 hrs and 51 mins
These are certainly achievable in an overnight window.
We recommend a minimum link speed of 10Mbps to support DASH copy. This ensures that it can make that first copy in sufficient time, but is also fast enough to handle the nightly copy should there be a rare occasion where something dramatic causes the change rate to be 10 or 15%; it may then take a day or two to catch up. If the link were too slow, it could fall behind for so long that there is an exposure in getting the data off site.
With ongoing data growth and general system changes it is important to monitor transfer times of the DASH copies to ensure that they are completing in a reasonable time period and not lagging behind. Perfekt suggests that this is done with Aux Copy Fall Behind Alerts in console progress reporting.
Also the DASH copy summary report should be reviewed each month to monitor the overall health of the copies. This will help identify sites where greater link speeds may be required in the near future.
What if you don’t have a second site? Look up in the sky!
Not a problem. There are oodles (the technical term meaning more than you could imagine) of cloud providers wanting you to store your backup data with them. There are two ways of storing CommVault backup data in cloud storage (I hate using the term "the cloud" as if there were only one; the reality is there are many offerings, they are all different, their costs are not the same, and a good number will be out of business in less than 5 years).
The first way is to DASH copy to a cloud provider. This is the preferred approach. Using it, you would stand up a virtual CommVault media agent server in the cloud and purchase some cloud storage. The media agent does some hefty work, so the only gotcha here is the compute cost of virtual servers if your chosen cloud provider charges this way. It is best not to use this type of model for backup unless you pilot the process, measure the IOPS and extrapolate this within the costing model of your cloud provider.
The second way is to move data directly to some type of cloud storage without DASH copy. The issue with this is that you usually pay cloud providers per GB per month, and any attempt to push large data volumes to a cloud service without the benefit of dedupe will be unaffordable after a few years of a lengthy backup retention strategy. [It is affordable if you only want 1-6 months of content but that is not the normal business data retention cycle for most organisations, especially if you are looking to remove tape altogether. Any longer than a few years and you will quickly work out that you can buy a small tape library with LTO-6 drives and have plenty of change compared to the cloud costings].
Removing Tape – What Disk is Needed?
In such a topology, tape provides two key functions:
- A point in time complete “archive” copy beyond the longest disk-based retention period
- A copy of data as a last chance of recovery if all else fails
Because deduplication means you can quite effectively retain many years of data copies on disk, the need for point 1 is negated. Addressing point 2 is a business decision, and many sites do not have this today.
Back on point 1, there are a few basic factors that need to be determined in order to estimate the size of the disk array required to retain your online backup content:
- How large is the first full copy of data: typically we see about 20% reduction due to compression and some deduplication
- Retention: for how many years you will retain backup copies
- Number of backups: eg 5 days per week or 7 days per week, 52 weeks per year
- Daily rate of change: typically between 2% and 5%, depending on the workload
The disk space required can then be approximated using this formula:
Disk Space Required (TB) = Protected Data Volume (TB) × Allowance for compression and some dedupe + Protected Data Volume (TB) × Rate of daily change (2-5%) × Number of backup days retained
So in a site with 10TB of data (normal 20% savings on the first backup, backups occurring 5 days per week, 52 weeks per year, online retention of 10 years, and 2% daily change), the usable disk volume required is 528TB. Utilising 4TB nearline SAS drives, this could be accomplished in a storage array with dense enclosures in a tidy 9 rack units of footprint!
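As a minimal sketch of how that number falls out (assuming the formula is applied as the first full copy after savings, plus one increment per retained backup day at the daily change rate, which reproduces the 528TB example above):

```python
# Rough backup disk sizing: first full (after ~20% savings) plus one
# incremental per retained backup day at the daily change rate.
def backup_disk_tb(protected_tb: float, first_copy_savings: float,
                   backups_per_week: int, weeks_per_year: int,
                   retention_years: int, daily_change: float) -> float:
    first_full = protected_tb * (1 - first_copy_savings)
    retained_days = backups_per_week * weeks_per_year * retention_years
    incrementals = retained_days * protected_tb * daily_change
    return first_full + incrementals

# The worked example above: 10TB, 20% savings, 5 x 52 backups for 10 years, 2% change.
print(backup_disk_tb(10, 0.20, 5, 52, 10, 0.02))  # -> 528.0 TB usable
```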
Of course this is simplified: volumes will start out smaller and grow with increasing retention, and understandably there will be primary data growth and fluctuations in usage patterns over the retention period. Still, it provides an indication of the likely data capacity required.
Aren’t Spreadsheets Wonderful!
To extrapolate running costs of the required backup storage, here is a quick comparison of the disk array outlined above for the second (remote) site copy of the data, retained for 10 years:
- On-premise/co-lo high density storage array, 528TB usable, purchased up front, with 10 year vendor support, inclusive of running costs: $334K ex GST
- Cloud storage, incrementally growing over 10 years, based on an ingest tier of $0.0259/GB per month, a storage tier of $0.012/GB per month and compute (to run the Media Agent) at $1.169/hr: $393K ex GST. This does not include costs for retrievals, and retrievals will be "problematic" at best, only to be required if all else has failed.
So, there is not a great deal in it when you factor this over a 10 year period, but it is useful to benchmark the differences between the available options. Of course, this is simply to protect 10TB of data, without taking into account its own growth due to new workloads etc. The operational note on retrieving data is important: the on-premise storage will be very simple for restoration, whereas the cloud-based storage will be very slow ("tape-like") and only to be used in emergencies.
And if the numbers just don’t work, there is still tape
Full-scale recoveries are rare, and most restore jobs are for small data sets. Depending on data volumes, retention requirements and other business factors, we are finding today that tape is still a very low-cost way of creating archival data copies. Made once per month, for example, a single or dual-drive LTO-6 autoloader is all that is needed to push a retention copy to tape that will probably never be needed, but which gives surety and another process to demonstrate strong data governance.
Should any of this be within your thinking, then give the experts at Perfekt a call. We love to help with your backup strategies.
Traditional Feature Based Licensing (also known as a la carte)
Back in the year 2000, when CommVault entered the Australian marketplace, it had a licensing scheme similar to many other backup products, where the features purchased matched each environment's components.
- Agents for specific operating systems, databases, applications. This included Windows, Unix, SQL, DB2, Lotus Notes, Novell, Exchange, Active Directory, and so on. For each one of these environments you needed an agent so that the data could be protected to allow for the best recovery
- In addition, you had to license the Media Agent (backup) server, or multiples in a multi-site environment
- You licensed the tape library and each drive
- You licensed the capacity (in TBs) that you wrote to in a backup-to-disk environment
- You also had to license other options and features from a very rich and comprehensive list
- Many times customers purchased these in special bundles to reduce the price
- The feature set grew with the addition of new functions. One key addition was deduplication (aka the Advanced Disk Option). This meant that clients licensed the disk space that CommVault's deduplication system wrote to, measured in TB. This was more expensive than the "Standard Disk" backup method, yet you could retain many more backups in significantly less space, thus improving data protection and, importantly, speed of recovery.
- There were also a range of features of email and file server archiving, content indexing and more.
Capacity Based Licensing (CLA)
CLA Meter Example
Capacity Based Licensing was introduced around 2010 with version 8, and it dramatically simplified the way in which licenses were consumed. Instead of being tied to a specific feature set, sites were licensed by the number of TBs they protected at the "front end". This is equivalent to measuring the size of a single full backup of all important data and basing the licensing on that volume.
The CLA scheme became very popular because it meant a change to the environment, e.g. moving from Novell to Microsoft, didn't mean you had to purchase new features. Importantly, the CLA scheme allowed as many "back end" TBs of data to be retained as you liked, without regard for retention period or multi-site copies. Organisations were then able to create DR copies of their backups for no extra licensing and significantly, if not completely, reduce the need for tape in their data protection scheme. The CLA "front end" TBs were measured in a few ways:
- Data Protection Enterprise DPE – all you can eat in the way of backup, included all features
- Data Protection Advanced DPA (previously called ADM) – suited most virtualised environments except the very high end
- Data Protection Foundation DPF (for server-level backups only, without application agents), and very suitable for physical server data protection
“Solution Bundle” Licensing
A new feature set launched in late 2014 means that certain CommVault features are now very affordable. One popular example is hypervisor-based backups, which are becoming an industry standard; in acknowledgement, CommVault has released simplified licensing at the price point of its much less mature competitors. Available standalone or in addition to a CLA, the Cloud Simpana "cSim" licensing can be purchased in packs of 10 VMs or by hypervisor processor socket (similar to VMware licensing). When purchased by processor socket it allows unlimited VMs to be protected on the licensed ESX or Hyper-V hosts. This is ideal for many organisations as it makes it easy to accommodate growth: add another ESX or Hyper-V host? Don't forget to get backup licensing for it!
VMs protected under cSim licensing do not consume the CLA TB-based licensing, and cSim licensing also provides dedupe functionality, tape support, media agents, DR copies, and so on. This is great value for new and existing customers alike. New customers receive all the basic licensing required to run a Simpana environment with dedupe, tape and VM backup functionality, while existing CLA customers free up significant amounts of backup license utilisation, allowing for growth in application-level backups.
CommVault has a range of offerings in this new solution bundle category that work as an adjunct to CLA licensing (it is important to note these do not intermix with traditional feature licensing, and they do require version 10 of CommVault). The areas covered by the solution bundles include:
Virtual Machines
- By socket or by 10-pack of VMs (as described above)
- Intellisnap (hardware snapshot integration) and end user self-restore add-on
- VM cloud management: provision VMs locally or in the cloud, for example for spin-up of VM for site recovery/testing, add-on
- VM lifecycle management, whole of VM archiving for dormant VMs, for example, add-on
- Basic backup and recovery, per device up to 2TB ea
- File Sharing eg corporate drop-box replacement with “Edge Drive”
- Endpoint compliance search add-on
- Bundle of the above
- Entry (7TB), mid-range (14TB) and enterprise (21TB) bundles
- Email archive and content indexing, per mailbox
- Compliance archive add-on
- Bundle of the above
- Stay with feature licensing or convert to CLA?
- Straight CLA or supplement with the new solution bundles?