While using Splunk, I noticed that one ESX host was producing a huge amount of logs compared with the others in its cluster. Looking into it, there were about 23,000 entries every hour for:
I logged an SR for it, but the next day the error magically stopped appearing in the logs.
The response from VMware Support:
Here is what I could find regarding this error.
A bug in storageRM sets the magic number to zero. Since this magic number is only used for internal data structure bookkeeping, if the reset is due to this bug then there is no effect on storageRM functions and the data is safe. If this error recurs frequently even after the reset, then there could be disk corruption.
Thank you for your response. We do have a workaround to stop these messages from occurring. However this is not a tested solution. We do not have much information on this at this time, as we have seen this occur very few times and stop randomly (like in your case). We have not been able to collect much information and hence we do not have a tested solution for this as yet. We do have a bug report filed for this, however.
The workaround outline is as follows:
To reset the magic number and stop the error from appearing, please follow these instructions:
1. Increase the storageRM log level on the host where the failure is seen:
vsish -e set /config/Misc/intOpts/SIOControlLoglevel 5
2. Wait for the error to show up
3. Disable storage I/O control on the datastore which exhibits this problem.
4. Stop storage I/O control on all hosts sharing the datastore
5. Run command /sbin/storageRM -R
* “-R” is a troubleshooting option, hence it is not listed in the man page
* Example: /sbin/storageRM -R /vmfs/volumes/FDLD_VMTEST0508
6. Start storage I/O control on all hosts
7. Enable storage I/O control on datastore.
The purpose of posting this workaround is for information only. As it says, “this is not a tested solution”. If you see this in your logs, log an SR with VMware.
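To keep the order of the steps straight for a given datastore, the workaround above can be sketched as a small checklist printer. This is only an illustration under stated assumptions: the function names (`loglevel_cmd`, `reset_cmd`, `print_checklist`) are made up for this post, the only real commands are the two quoted in the support response, and the script deliberately executes nothing. How Storage I/O Control is disabled, stopped, started and enabled is not spelled out in the response, so those steps stay as plain text.

```shell
#!/bin/sh
# Prints the untested workaround as a checklist for one datastore.
# It only echoes text; no command is run against the host.

# Step 1 command, taken verbatim from the support response.
loglevel_cmd() {
    echo "vsish -e set /config/Misc/intOpts/SIOControlLoglevel $1"
}

# Step 5 command, with the datastore path supplied by the caller.
reset_cmd() {
    echo "/sbin/storageRM -R $1"
}

# Hypothetical helper: print all seven steps in order for the given path.
print_checklist() {
    ds_path="$1"
    echo "1. On the affected host: $(loglevel_cmd 5)"
    echo "2. Wait for the error to show up"
    echo "3. Disable storage I/O control on the datastore"
    echo "4. Stop storage I/O control on all hosts sharing the datastore"
    echo "5. Run: $(reset_cmd "$ds_path")"
    echo "6. Start storage I/O control on all hosts"
    echo "7. Enable storage I/O control on the datastore"
}

# Print the checklist only when a datastore path is given as an argument.
if [ -n "${1:-}" ]; then
    print_checklist "$1"
fi
```

Saved under any name and run with a datastore path such as /vmfs/volumes/FDLD_VMTEST0508, it prints the seven steps with the path filled in at step 5. Printing rather than executing is intentional, since steps 2 to 4 have to happen between the two commands.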