

System hung? or dead?

You gotta be kidding me, another dead server?

For the past 4 weeks or more, several servers have been locking up in the morning or the middle of the day (luckily not at night, as I am on call some days), especially when there are several users on the server. Sadly, by the time the help desk reports the issue to me, I have no remote access to the server.
Using iLO doesn’t work, as I do not see the login screen, and remote procedures (such as reading the event logs) just don’t respond. (You know what to do when you’re at this stage.. reboot the server and get complaints from users + managers. In other words, SHIT happens.)

A complete LOCK up would be much better, and even a BSOD would give me less grief. BUT no: RPC never times out and IMASRV.EXE never dies but somehow keeps responding, so the data collector keeps redirecting users to the dead server. Even worse, users get disconnected but cannot be redirected to other “working” servers.

Yeah, 2nd blow to me. (Again, I curse at the air and apologise to users.) Until the problem server is rebooted, or IMA finally realises (after 1-5 mins) that the server is truly DEAD, the user is STUCK in the middle of nowhere.

First things first, what do you do? Several options (other than reboot):
1 read the event log (yes, the general rule of thumb, but not in this case: the event log was long dead by the time the issue was discovered)
2 gather perfdata (well, using RM, I have started to gather more metrics such as memory, threads, etc. Again, by the time the issue is reported, perfmon too is dead and the data all looks green on the RM graph. Yay.. no perfmon..)
3 login! (went to the data centre to see if I could log on physically, but no.. the login screen doesn’t come up)
4 telnet? (screw security, I enabled the telnet service on all the servers and tried to log in, but no, it failed too)
5 RDP, ICA, VNC, Radmin, iLO (the list continues): no REMOTE tool works, regardless.

By the time I reached step 5, more than 3 weeks had passed and I had around 10 reports (incidents), and about the same number of issues had been reported but never logged (call me stupid, but I never tracked them as I thought the problem could be a one-off).

6 log in to ALL the servers and wait for the problem to hit.
Yes, this worked, and I FINALLY observed a server dying in front of my eyes.
Task Manager just stopped responding, Process Explorer was hung when I flipped the screen to the problem server, Explorer was frozen too, and no new process could be started. BUT funnily enough the GUI was not frozen and I actually didn’t get kicked out (disconnected).
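
The symptom seen here (existing sessions stay up, but no new process can be started) is something a small canary script can watch for, instead of a human staring at every server. The following is only a rough sketch, not what was run at the time: the function names, the timeout, and the idea of using a trivial child process as the probe are all assumptions made for illustration.

```python
import subprocess
import sys
import time


def can_spawn_process(timeout_seconds: float = 5.0) -> bool:
    """Return True if a trivial child process starts and exits cleanly in time."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", "pass"],  # child that does nothing and exits
            timeout=timeout_seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (subprocess.TimeoutExpired, OSError):
        # The child did not finish in time, or could not be created at all.
        return False
    return result.returncode == 0


def run_checks(checks: int, interval_seconds: float, log=print) -> list:
    """Run the canary several times, logging before and after every attempt."""
    results = []
    for i in range(checks):
        # Log BEFORE the attempt: if process creation itself blocks,
        # this is the last line that ever reaches the log.
        log(f"{time.strftime('%H:%M:%S')} attempt {i + 1}: starting")
        ok = can_spawn_process()
        log(f"{time.strftime('%H:%M:%S')} attempt {i + 1}: {'ok' if ok else 'FAILED'}")
        results.append(ok)
        if i < checks - 1:
            time.sleep(interval_seconds)
    return results
```

Something like `run_checks(checks=1000, interval_seconds=30)` left running in a session on each server would do. One caveat: if the hang is low enough that process creation itself never returns, the timeout cannot fire, so a log whose last line is "starting" with nothing after it is the real alarm.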

The problem seems to be somewhere at a LOW level that I have no idea about.

Also, I had a force-system-crash tool ready to kick in, in order to gather a crash dump.

Well, you may have guessed by now, this tool didn’t work either. It failed to crash the system and simply complained that files were not found.

7 last-minute jump… call MS.

I didn’t know what MS was going to say, but they shed some light on how to crash the server using NMI.
By then I was jumping for joy (people in the office were looking at me like I was some weirdo).

Finally! I can CRASH the system!!!
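
The exact steps MS gave are not recorded here. For reference, one commonly documented way to make a Windows server of that era write a crash dump on NMI is the `NMICrashDump` registry value under the `CrashControl` key, set ahead of time and followed by a reboot, with the NMI itself then triggered from the hardware side. The sketch below only renders a `.reg` fragment for that value. The function name is made up for illustration, and whether this is precisely what MS recommended in this case is an assumption.

```python
# Registry key that holds the crash dump settings on Windows.
CRASH_CONTROL_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl"
)


def render_nmi_reg_file(enabled: bool = True) -> str:
    """Render .reg file text that turns the NMICrashDump value on or off.

    Hypothetical helper: it only builds the text, it does not touch the
    registry. Importing the file and rebooting is left to the admin.
    """
    value = 1 if enabled else 0
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{CRASH_CONTROL_KEY}]",
        f'"NMICrashDump"=dword:{value:08x}',
        "",
    ]
    return "\r\n".join(lines)


reg_text = render_nmi_reg_file()
print(reg_text)
```

The point of doing it this way is that the setting has to be in place BEFORE the server hangs. Once the box is in the state described above, nothing new can be started on it, so the fragment would need to be rolled out to every server in advance.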

Well, the rest is history. I obtained a debug analysis from MS saying that a registry lock may be the cause, and applied the patch from MS KB 935926.

How annoying….

