Faking vCenter alarms

Larger environments tend to integrate their monitoring and ticketing systems. Some also add automated workflows based on alarms. The problem when setting up these workflows is: how do you test that the workflow is triggered by a specific alarm?

With thanks to William Lam for the tips, it’s possible to trigger specific events that make up alarms. It’s not a simple task (for someone who’s not a developer), but you can do some trial and error to work out how to trigger each event.

You need to find the event you want to trigger from the API at http://pubs.vmware.com/vsphere-60/index.jsp#com.vmware.wssdk.apiref.doc/index-do_types.html. Pretty much anything ending in ‘Event’.

A list of events can be viewed with PowerCLI by connecting to vCenter:
$evt =get-view eventManager
$evt.Description.eventinfo

Let’s choose HostConnectionLostEvent. The description says “This event records the loss of a host connection”.

The PowerCLI script below posts a HostConnectionLostEvent, which in turn triggers an alarm.

# Usual load modules and connections
import-module VMware.VimAutomation.Core
connect-viserver 192.168.10.99
$server = $global:DefaultVIServer
$entity = Get-View (Get-VMHost -Name 192.168.10.240)
$eventMgr = Get-View $server.ExtensionData.Content.EventManager
# Substitute your event to trigger
$event = New-Object VMware.Vim.HostConnectionLostEvent
# This property is specific to this event, and is required. The domain and user don't have to be an existing account.
$event.UserName = "CAGE.LOCAL\Chaos Monkey"
$hostEventArg = New-Object VMware.Vim.HostEventArgument
$hostEventArg.Host = $entity.MoRef
$hostEventArg.Name = "Host-is-a-label"
$event.Host = $hostEventArg
# Sends the event to vCenter
$eventMgr.PostEvent($event,$null)

When trying different events, the main line is:

$event = New-Object VMware.Vim.HostConnectionLostEvent

If you view $event now, you’ll see its properties:
Key                  : 0
ChainId              : 0
CreatedTime          : 1/01/0001 12:00:00 AM
UserName             : 
Datacenter           : 
ComputeResource      : 
Host                 : 
Vm                   : 
Ds                   : 
Net                  : 
Dvs                  : 
FullFormattedMessage : 
ChangeTag            : 

You’ll see other events have different properties. Some of these properties are required. It’s a bit hit and miss.
Let’s try using UplinkPortVlanUntrunkedEvent.
Substitute this line – $event = New-Object VMware.Vim.UplinkPortVlanUntrunkedEvent
Initially you’ll get an error:
Exception calling “PostEvent” with “2” argument(s): “
Required property switchUuid is missing from data object of type UplinkPortVlanUntrunkedEvent
while parsing serialized DataObject of type vim.event.UplinkPortVlanUntrunkedEvent
So we are missing a required input of switchUuid. View the properties of $event:
SwitchUuid           : 
HealthResult         : 
Key                  : 0
ChainId              : 0
CreatedTime          : 1/01/0001 12:00:00 AM
UserName             : CAGE.LOCAL\Chaos Monkey
Datacenter           : 
ComputeResource      : 
Host                 : VMware.Vim.HostEventArgument
Vm                   : 
Ds                   : 
Net                  : 
Dvs                  : 
FullFormattedMessage : 
ChangeTag            : 
So add in a value for SwitchUuid:
$event.SwitchUuid = "Fake UUID"
and execute the remaining lines, and it triggers the alarm.
From there it was trial and error to see which properties were mandatory for each event.
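If you’re working through a lot of events, the error text itself tells you which property to add next. Here’s a minimal Python sketch (not PowerCLI, and not part of the script above) that pulls the property name out of a message shaped like the one shown earlier. The regular expression is an assumption based on that single error; other faults may be worded differently.

```python
import re

# Pattern based on the one fault message quoted in this post (assumption:
# other vSphere versions or faults may word it differently).
MISSING_PROPERTY = re.compile(
    r"Required property (\w+) is missing from data object of type (\w+)"
)


def missing_property(fault_text):
    """Return (property, event_type) named in the fault, or None if absent."""
    match = MISSING_PROPERTY.search(fault_text)
    return match.groups() if match else None


fault = (
    "Required property switchUuid is missing from data object of type "
    "UplinkPortVlanUntrunkedEvent"
)
print(missing_property(fault))
```

You could paste each PostEvent error into this and build up a list of the mandatory properties per event type as you go.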
This was a quick hack to demonstrate the ability to trigger alarms for a customer. 
It also has other use cases:
Feeling lonely and on-call? Trigger an alarm and hopefully you’ll receive a call.
Convince a co-worker they are causing errors by using their username.

Have fun.

ESXi kickstart with Python

I recently spent some time building kickstart files, something I hadn’t done much since 2008. The most noticeable change in the process is I’m older and fatter. I re-learnt a few things, and wrote my first ever Python script.

First up, William Lam has plenty of articles on the topic, and the official vSphere installation doco is pretty good too.
While testing, it’s much easier to link to a kickstart file or scripts from a webserver than directly embedded on the boot CD. For a lightweight webserver on Windows, I’ve used mongoose-free-6.4 available at https://www.cesanta.com/products/binary.
If you need to customise the installation process further, you can use BusyBox ash shell commands or Python 2.7, by adding the switch --interpreter=busybox or --interpreter=python to the section.

To see which python modules are available, view /lib/python2.7 on your ESXi host.

Mixing busybox and python.

I found it easier to start off with busybox, then call the python script.

%firstboot --interpreter=busybox

# For troubleshooting only
vim-cmd hostsvc/enable_esx_shell
vim-cmd hostsvc/start_esx_shell

wget -O myscript.py http://webserver/myscript.py

chmod u+x myscript.py
/bin/python myscript.py

When testing the whole kickstart file, put sleep 600; in %pre, %post or %firstboot sections, and jump to the console (Alt-F1) to test commands. You can also watch detailed messages using Alt-F12.
Keep in mind you may not be in a full-blown ESXi environment, depending on whether you’re in %pre, %post or %firstboot.
%pre – Runs BEFORE the installation
%post – Runs AFTER installation. No hostd services are running. There is NO root password set and commands like esxcli won’t be available. You’ll need to use localcli instead.

%firstboot script is running now.

%firstboot – Runs at the very end of the first boot after installation. All services should be available. It’s the last thing that runs before the console shows the ESXi host name, IP. Look for “Running 001.firstboot_001”.

Deploy multiple ESXi hosts using kickstart and embedded CSV.

In a home lab or remote environment there may not be DHCP/PXE services, or justification for Auto Deploy. For that use case I created a boot CD that would install ESXi and set the hostname and IP based on matching the host’s MAC address with a CSV file included on the CDROM. Perfect for where there is no DHCP. If you have DHCP and a webserver available, you can host the CSV there so it can be easily updated.

First, customise an ESXi boot CD from William Lam.

Create the CSV file (HOSTS.CSV) in the format Hostname,MAC,IP,Subnet Mask,Gateway

HostA,00:0c:29:aa:bb:cc,192.168.0.10,255.255.255.0,192.168.0.1
HostB,00:0c:29:aa:bb:cd,192.168.0.11,255.255.255.0,192.168.0.1
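A typo in the CSV only shows up once a host has booted with the wrong (or no) settings, so it can be worth checking the file before burning it to the CD. Below is a rough sketch of such a check, meant to run with Python 3 on your workstation rather than on ESXi. The function name validate_hosts and the specific checks are my own illustration, not part of the build process described here.

```python
import csv
import io
import ipaddress
import re

# Six colon-separated hex pairs, upper or lower case
MAC_PATTERN = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$", re.IGNORECASE)


def validate_hosts(csv_text):
    """Return a list of problems found in HOSTS.CSV text (empty list if clean)."""
    problems = []
    seen_macs = set()
    for line_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 5:
            problems.append("line %d: expected 5 fields, got %d" % (line_no, len(row)))
            continue
        hostname, mac, ip, mask, gateway = [field.strip() for field in row]
        if not hostname:
            problems.append("line %d: empty hostname" % line_no)
        if not MAC_PATTERN.match(mac):
            problems.append("line %d: bad MAC %r" % (line_no, mac))
        elif mac.lower() in seen_macs:
            problems.append("line %d: duplicate MAC %s" % (line_no, mac))
        seen_macs.add(mac.lower())
        for label, value in (("IP", ip), ("subnet mask", mask), ("gateway", gateway)):
            try:
                ipaddress.IPv4Address(value)
            except ValueError:
                problems.append("line %d: bad %s %r" % (line_no, label, value))
    return problems


SAMPLE = (
    "HostA,00:0c:29:aa:bb:cc,192.168.0.10,255.255.255.0,192.168.0.1\n"
    "HostB,00:0c:29:aa:bb:cd,192.168.0.11,255.255.255.0,192.168.0.1\n"
)
print(validate_hosts(SAMPLE))  # an empty list means no problems were found
```

It only checks the shape of each field. It doesn’t know whether the subnet mask is a sensible mask or whether the gateway is on the same subnet.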

Create the python script (SETNET.PY) to check the CSV for the matching MAC address:

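#!/usr/bin/python
# Written for the Python 2.7 interpreter that ships with ESXi.
import csv, os, subprocess

# Grab the "MAC Address: xx:xx:..." line(s) for the VMkernel interfaces
MAC = subprocess.check_output("esxcli network ip interface list | grep MAC", shell=True)

# Split on whitespace: ['MAC', 'Address:', '<mac>', ...], so index 2 is the first MAC
MACADDR = MAC.split()

with open('/vmfs/volumes/datastore1/HOSTS.CSV', 'rb') as f:
        reader = csv.reader(f)
        for row in reader:
                # row = Hostname,MAC,IP,Subnet Mask,Gateway
                if MACADDR[2] == row[1]:
                        os.system("esxcli system hostname set --fqdn=" + row[0])
                        os.system("esxcli network ip interface ipv4 set --interface-name=vmk0 --ipv4=" + row[2] + " --netmask=" + row[3] + " --type=static")
print "End of loop"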


[This was my first time using Python, so I chose a range of functions that worked at the time. Be wary of Python’s strict indentation requirements.]
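If you want to test the matching logic without booting a host, it helps to separate the parsing from the esxcli calls. The sketch below is one way that could look. The function names are made up for illustration, and the sample esxcli output is an assumption about the format (a "MAC Address:" line per interface), so check it against a real host. It only builds the same two command strings used in SETNET.PY and doesn’t run anything.

```python
import csv
import io


def parse_mac(esxcli_output):
    """Return the first MAC from 'esxcli network ip interface list' output."""
    for line in esxcli_output.splitlines():
        label, _, value = line.partition(":")
        if label.strip() == "MAC Address":
            return value.strip().lower()
    return None


def find_host(csv_text, mac):
    """Return the CSV row whose MAC column matches (case-insensitive)."""
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 5 and row[1].strip().lower() == mac:
            return row
    return None


def build_commands(row):
    """Build the two esxcli commands from SETNET.PY for a matched row."""
    hostname, _mac, ip, netmask, _gateway = row[:5]
    return [
        "esxcli system hostname set --fqdn=" + hostname,
        "esxcli network ip interface ipv4 set --interface-name=vmk0"
        " --ipv4=" + ip + " --netmask=" + netmask + " --type=static",
    ]


HOSTS_CSV = (
    "HostA,00:0c:29:aa:bb:cc,192.168.0.10,255.255.255.0,192.168.0.1\n"
    "HostB,00:0c:29:aa:bb:cd,192.168.0.11,255.255.255.0,192.168.0.1\n"
)
# Assumed shape of the esxcli output, trimmed to the relevant lines
sample_output = "   Name: vmk0\n   MAC Address: 00:0C:29:AA:BB:CD\n   Enabled: true\n"

mac = parse_mac(sample_output)
row = find_host(HOSTS_CSV, mac)
for command in build_commands(row):
    print(command)
```

Lower-casing both sides avoids a silent miss when the CSV and the host disagree on the case of the MAC address. Like the original script, this ignores the gateway column.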
Copy HOSTS.CSV and SETNET.PY to the custom ESXi boot CD just before the mkisofs step in William’s post.
The hard part was getting the files from the CD to a location available during firstboot. If you’re installing to an ESXi disk bigger than 6GB, a VMFS datastore will be created and we can use that for persistent storage.

Include the following in your ks.cfg:

%post --interpreter=busybox

cp /vmfs/volumes/CDROM/BUILD/HOSTS.CSV /vmfs/volumes/datastore1
cp /vmfs/volumes/CDROM/BUILD/SETNET.PY /vmfs/volumes/datastore1

%firstboot --interpreter=busybox

sleep 30;

chmod u+x /vmfs/volumes/datastore1/SETNET.PY
/bin/python /vmfs/volumes/datastore1/SETNET.PY

This gets the ESXi host configured with a hostname and IP where you can join it to vCenter and do the remaining configuration or apply Host Profiles.

2016 Australian / New Zealand vExperts

The list of 2016 vExperts was officially announced by VMware. This year sees a total of 1374 people awarded the title of vExpert. Australia / New Zealand represent 4.2% of the total vExpert numbers.

This year the vExpert nomination form asked for your country, so let’s hope VMware publish that info. For now, this is a list of names we recognise as being from Australia or New Zealand.

This year sees the number of locals grow to 58, up from 48 last year. There were quite a few that made it onto the 2015 second half year announcement.

Melbourne still leads, up 3 from last year, even though 3 people weren’t renewed from last year, which means there were 5 people added from last year.

Canberra added their first during the mid year announcements last year, and they’ve added another since.

Sydney added 3 to their total from 2015, now up to 14.

Contrary to my previous comment about Brisbane, they snuck in 2 more this year.

Adelaide IS on the map: Amin Naserpour (VCDX 188). He actually made the list last year.

The eastern island of Australia, also known as New Zealand, quietly gained 1, now up to 9.

There’s been a lot of noise from the guys in Perth, but that’s about it. No change from last year.

It’s great to see the community growing each year, and hope to catch up at vForum.

I’ve picked up a few errors in the previous vExpert blogs, so I might have to go back and update those ones. Stay tuned.

If you see anyone I’ve missed, let me know.

Region Total
Adelaide 1
Brisbane 7
Canberra 2
Melbourne 22
New Zealand 9
Perth 3
Sydney 14
Grand Total 58

Name Twitter Region
David Barclay @davidbarclay99 Brisbane
Nick Bowie @nickbowienz New Zealand
Steven Bridle @virtuallyeuc Canberra
Luke Brown @luke_br Perth
Andrew Brydon @sidbrydon Melbourne
Andre Carpenter @andrecarpenter Melbourne
Luis Concistre @luisconcistre Sydney
Alastair Cooke @DemitasseNZ New Zealand
Andrew Dauncey @daunce_ Melbourne
Donovan Durand @donovanjd Melbourne
Mark Elliott @eggme1 New Zealand
Frank Fan @frankfan7 Melbourne
Andrew Firth Sydney
Michael Francis Brisbane
Dan Frith @penguinpunk Brisbane
Ramesh Geddam Brisbane
Kevin Gorman @Kev_McCloud Melbourne
Matthew Healy @matt232h Melbourne
Boris Jelic @Boris_jelic Melbourne
Chris Jones @cpjones44 Melbourne
Steven Kang @ssbkang New Zealand
Pravesh Khanna @pravesh2012 Melbourne
Askar Kopbayev @Akopbayev Sydney
Sanit Kumar @sanitkumar New Zealand
Willy Lee Sydney
David Lloyd @davlloyd Sydney
David Manconi @dmanconi New Zealand
Will Mansfield @aussiewjm Sydney
Ryan McBride @RyanMcBride81 Sydney
Greg Mulholland @g_mulholland Melbourne
Niraj Naidu @mr_champy Sydney
Amin Naserpour @AminNaserpour Adelaide
Scott Norris @auscottnorris Canberra
Jeff O’Connor @JeffOConnorAU Sydney
Grant Orchard @grantorchard Sydney
Aaron Parker @stealthpuppy Melbourne
Clinton Prentice New Zealand
David Quinney @quinney_david Sydney
Jahnin Rajamoni @jahnin Sydney
Simon Sharwood @ssharwood Sydney
Keiran Shelden @Keiran_Shelden Brisbane
Manny Sidhu @MannySidhu2 Melbourne
Brett Sinclair @Pragmatic_IO Melbourne
Anthony Spiteri @anthonyspiteri Perth
Arron Stebbing @ArronStebbing Melbourne
Tas Tareq @justonetaz Brisbane
Tyson Then Melbourne
Mark Ukotic @oringinaluko Melbourne
Rob Waite @rob_waite_oz Melbourne
Jon Waite @jondwaite New Zealand
Justin Warren @jpwarren Melbourne
Craig Waters @cswaters1 Melbourne
Michael Webster @vcdxnz001 New Zealand
Nathan Wheat @wheatcloud Melbourne
Shane White @ausvmguy Melbourne
Tim Williams @ymmit85 Perth
Jeff Wong @jumpyjw Sydney
Travis Wood @vTravWood Brisbane

A summary of the changes between 2015 & 2016:

IN
Steven Bridle @virtuallyeuc Canberra
Kevin Gorman @Kev_McCloud Melbourne
Boris Jelic @Boris_jelic Melbourne
Chris Jones @cpjones44 Melbourne
Pravesh Khanna @pravesh2012 Melbourne
Askar Kopbayev @Akopbayev Sydney
Will Mansfield @aussiewjm Sydney
Tyson Then Melbourne
Travis Wood @vTravWood Brisbane
Clinton Prentice New Zealand
Willy Lee Sydney
OUT
Harsha Hosur @harsha_hour Melbourne
Josh Odgers @josh_odgers Melbourne
Shanon Olsson @sfolsson Melbourne

Combining PowerCLI & ESXCLI to change PSP on a Large Scale

Using PowerCLI you can use Set-ScsiLun -MultipathPolicy "RoundRobin" to set the PSP (Path Selection Policy), but I found it quite slow on a large scale. It would update one datastore on one host every 5 seconds. With 10 ESXi hosts and 200 datastores, that’s 2000 operations at 5 seconds each, which adds up to nearly 3 hours. The same can be done with ESXCLI extremely quickly, but it has to be run on each host. If only there was a way to combine PowerCLI and ESXCLI.
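To see how that estimate scales for your own environment, here’s the back-of-the-envelope sum as a few lines of Python. The 5 seconds per operation is just what I observed; treat it as an assumption and plug in your own numbers.

```python
def total_hours(hosts, datastores, seconds_per_operation):
    """Return (operations, hours) for one PSP change per datastore per host."""
    operations = hosts * datastores
    return operations, operations * seconds_per_operation / 3600.0


operations, hours = total_hours(10, 200, 5)
print(operations, round(hours, 2))
```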

There is… Get-EsxCli

A quick and dirty script that combines PowerCLI & ESXCLI to change the PSP on multiple hosts & datastores.

Use at your own risk. This will change the PSP to RoundRobin for any non local device.

Get existing PSP to see how many you need to change:

$AllESXHosts = Get-Cluster CLUSTERNAME | Get-VMHost | Where { ($_.ConnectionState -eq "Connected") -or ($_.ConnectionState -eq "Maintenance") } | Sort Name
ForEach ($esxhost in $AllESXHosts) {
    Get-VMHost $esxhost | Get-ScsiLun -LunType disk | Where { $_.MultipathPolicy -notlike "RoundRobin" } | Where { $_.IsLocal -notlike "True" } | Select CanonicalName,MultipathPolicy,IsLocal | ft -autosize
}

Set PSP:

$AllESXHosts = Get-Cluster CLUSTERNAME | Get-VMHost | Where { ($_.ConnectionState -eq "Connected") -or ($_.ConnectionState -eq "Maintenance") } | Sort Name
ForEach ($esxhost in $AllESXHosts) {
    $esxcli = Get-EsxCli -VMHost $esxhost
    # Non-local disks that are not already RoundRobin
    $targetdevice = Get-VMHost $esxhost | Get-ScsiLun -LunType disk | Where { $_.MultipathPolicy -notlike "RoundRobin" } | Where { $_.IsLocal -notlike "True" }
    ForEach ($device in $targetdevice) {
        # $() is needed to expand a property inside a double-quoted string
        Write-Host "Updating $($esxhost.Name) $($device.CanonicalName)"
        # Arguments are: default, device, psp. Pass the device's canonical name.
        $esxcli.storage.nmp.device.set($null, $device.CanonicalName, "VMW_PSP_RR")
    }
}

For a more cautious approach, try one host at a time:

# $esxhost is the single host you want to update
$esxcli = Get-EsxCli -VMHost $esxhost
Get-VMHost $esxhost | Get-ScsiLun -LunType disk | Where { $_.MultipathPolicy -notlike "RoundRobin" } | Where { $_.IsLocal -notlike "True" } | ForEach-Object { $esxcli.storage.nmp.device.set($null, $_.CanonicalName, "VMW_PSP_RR") }

Run the script to get PSP again to confirm it worked.

The interesting thing is there are no logs of the changes in vCenter when using esxcli, as opposed to Set-ScsiLun, and SSH is NOT enabled, so it’s still using your vCenter credentials.

So you don’t have to go through this process again, remember to set the default PSP on each host, or use Host Profiles:
esxcli storage nmp satp set --default-psp=policy --satp=your_SATP_name 
(See http://kb.vmware.com/kb/1017760)

Disclaimer: I’m pretty terrible with PowerCLI, so this must look pretty ugly. I’m sure there are better ways to do it, but it gets the job done.

2015 Australian / New Zealand vExperts

VMware announced the list of 2015 vExperts during the week. Congratulations to all that made it. The list now contains over 1,000 vExperts.

For something perhaps a bit more relevant in the A/NZ region, I’ve broken it down for us.
The numbers haven’t changed that much due to the 3 selection rounds of vExperts in 2014.

For the A/NZ region, the numbers increased by 3, with 4 people not renewed as vExperts (most likely because they didn’t submit an entry). Melbourne still holds as a clear winner with the most active VMware community, but I’m sure the other regions will gain momentum this year.

New Zealand has the most potential, with lots of talented guys there, so hopefully with the launch of the Auckland VMUG, they’ll see an increase in activity.

And Perth.. We’ll give them the encouragement award 2014. C’mon guys, you can do it. 

Melbourne: 17
Sydney: 6
New Zealand: 5
Brisbane: 4
Perth: 3
If there’s anyone I’ve missed, let me know.

If you’re interested in seeing what some of the people in the list above do, check out aussievmafia.com for links to each of their blogs.

Office Arguments – Maximum VMDK size is NOT 2TB-512bytes

…if you want to use snapshots

Pop Quiz
Q: What’s the maximum size VMDK you should create in vSphere 5.1 or earlier?

A: Most people that have studied for VCP will know the maximum VMDK size is 2TB minus 512 bytes. If you create a disk in the GUI, it allows you to choose 2TB, but it’s smart enough to subtract 512 bytes.

So technically that’s the maximum VMDK size, but you should NOT create it that big.

Why?

If you plan to take snapshots, there’s additional overhead you need to take into account. For a 2TB VMDK, there’s up to 16GB of overhead for the snapshot. So in reality, the VMDK needs to be 2032GB or smaller to allow for that overhead.
If you have created a 2TB VMDK and attempt to take snapshots, you may get the error:
File is larger than maximum file size supported (1012384)
or:
File is larger than the maximum size supported by the datastore ‘.

As mentioned in KB 1012384, the limits depend on the block size of the VMFS volume:

Maximum VMDK size    Maximum overhead    Maximum size less overhead
256GB – 512B         ~ 2GB               254GB
512GB – 512B         ~ 4GB               508GB
1TB – 512B           ~ 8GB               1016GB
2TB – 512B           ~ 16GB              2032GB

If you plan to take snapshots, the maximum VMDK size is 2032GB. Or better still, embrace Decimal, and round it off to 2000GB.
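The numbers in the table follow a simple pattern: the overhead is roughly 1/128 of the maximum VMDK size. That ratio is my own reading of the table, not something the KB states, so treat this Python sketch as a way to reproduce the figures rather than an official formula.

```python
def snapshot_safe_size_gb(max_vmdk_gb):
    """Return (overhead_gb, usable_gb) using the ~1/128 ratio read off the table."""
    overhead_gb = max_vmdk_gb / 128.0
    return overhead_gb, max_vmdk_gb - overhead_gb


for size_gb in (256, 512, 1024, 2048):
    overhead, usable = snapshot_safe_size_gb(size_gb)
    print("%dGB: ~%dGB overhead, %dGB usable" % (size_gb, overhead, usable))
```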

The younger generation will know a TB as 1000GB (http://en.wikipedia.org/wiki/Terabyte). Older folks will still be used to a TB being 1024GB, now known as a tebibyte, or TiB (http://en.wikipedia.org/wiki/Tebibyte). Hey VMware, please update all your doco from TB to TiB 😉

Maximum Disks Per SCSI Controller is NOT 15

Pop Quiz

Q: What’s the maximum number of disks per SCSI Controller?

A: It depends.. On your VCP exam, you would have said 15. Correct.

Although if you want to clone or snapshot and quiesce a VM, the maximum is 7 disks per SCSI controller.

Each SCSI Controller can control 15 disks and the quiesced snapshots in Windows 2008 require one available slot per existing disk.
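The arithmetic behind that limit is easy to check. Here’s a small Python sketch, assuming the rule exactly as stated: 15 slots per controller, and one free slot needed per existing disk. The function names are only for illustration.

```python
SLOTS_PER_CONTROLLER = 15


def quiesce_check(disks_on_controller, slots=SLOTS_PER_CONTROLLER):
    """Return (needed, found, ok) for a quiesced snapshot on one controller."""
    needed = disks_on_controller          # one free slot per existing disk
    found = slots - disks_on_controller   # slots still empty
    return needed, found, found >= needed


def max_quiescable_disks(slots=SLOTS_PER_CONTROLLER):
    """Largest disk count that still leaves one free slot per disk."""
    return slots // 2


print(max_quiescable_disks())
print(quiesce_check(8))
```

With 8 disks on a controller this gives needed 8, found 7, which lines up with the "not enough empty nodes" message in the vmware.log excerpt in this post.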

If you have more than 7 disks, the clone / quiesce part will fail, and you’ll have the following errors in vCenter and the VM’s vmware.log:

An error occurred while quiescing the virtual machine. See the virtual machine’s event log for details.

2014-12-22T00:19:16.188Z| vmx| ToolsBackup: not enough empty nodes (needed 8, found 7)
2014-12-22T00:19:16.188Z| vmx| ToolsBackup: changing quiesce state: IDLE -> DONE
2014-12-22T00:19:16.188Z| vmx| SnapshotVMXTakeSnapshotComplete done with snapshot ‘clone-temp-1419207556192169’: 0
2014-12-22T00:19:16.188Z| vmx| SnapshotVMXTakeSnapshotComplete: Snapshot 0 failed: Failed to quiesce the virtual machine. (40)

It’s explained in http://kb.vmware.com/kb/2015181.

Just one more way snapshots can bring you pain.

It’s listed as only affecting Windows 2008, but I’m not sure why other versions are exempt. At this time there’s no mention of this affecting vSphere 5.5.

Updated 23/12/2014:  This affects Windows 2008 and above (ie: Win 2012), and is still an issue in vSphere 5.5. KB article will be updated.

For the lazy, see here:

Creating a quiesced snapshot of a Windows 2008 virtual machine fails with the error: Snapshot 0 failed: Failed to quiesce the virtual machine. (2015181)

Symptoms

  • Cannot create a quiesced snapshot of a Windows 2008 virtual machine.
  • Creating a quiesced snapshot of a Windows 2008 virtual machine fails.
  • In the vmware.log file of the affected virtual machine, you see entries like this:

    XXXX-03-08T04:10:09.790Z| vmx| SnapshotVMXTakeSnapshotComplete done with snapshot 'test4': 0
    XXXX-03-08T04:10:09.790Z| vmx| SnapshotVMXTakeSnapshotComplete: Snapshot 0 failed: Failed to quiesce the virtual machine. (40).
    XXXX-07-01T15:30:43.244Z| vmx| ToolsBackup: not enough empty nodes (needed 9, found 6)

    where the values provided in the errors can vary.

Cause

Each SCSI Controller can control 15 disks and the quiesced snapshots in Windows 2008 require one available slot per existing disk.
This issue occurs if the virtual machine has more than seven disks attached to a single controller.

Resolution

To resolve this issue:

  1. Create a new thin virtual disk. This allows you to add a new SCSI Controller.
  2. Ensure the new thin virtual disk is attached to SCSI1:0 node.
  3. Retry creating the quiesced snapshot. You are now able to create the snapshot successfully.

Alternatively, for virtual machines with more than 7 disks on a single SCSI controller, instead of creating a new thin disk and adding this to a new SCSI controller (SCSI1:0):

  1. Shut down the virtual machine.
  2. Spread out the existing disks between multiple SCSI controllers.

    To do this:

    1. Right-click the virtual machine and click Edit Settings.
    2. Change the Virtual Device Node to your new desired SCSI controller.
  3. Power on the virtual machine.

    Note: Use this method if there is limited storage space or limited authority to create new disks.