2014-10-07

Originally posted on: http://geekswithblogs.net/akraus1/archive/2014/10/07/159558.aspx

Threading can sometimes lead to seemingly impossible bugs where you have a hard time finding out how you got into a given state. Let's suppose you have a nice WPF application which you dutifully test via the UIAutomation provider, which can alter some state in your application.

When you look at the call stacks all is normal:

So far so expected. Now let's make our well-behaved application a little less predictable and hold a lock for some time. To keep it simple, I deadlock the main UI thread by trying to enter a lock which is owned by another thread, like this:
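A minimal sketch of such a setup, assuming a console program rather than the WPF application and a hypothetical lock object named Gate. It is not the original listing, and it uses Monitor.TryEnter with a timeout instead of a plain lock statement so that the program terminates instead of hanging:

```csharp
using System;
using System.Threading;

static class BarrierDeadlockSketch
{
    // Hypothetical lock object shared by both threads.
    static readonly object Gate = new object();

    static void Main()
    {
        // Two participants: the background thread and the main thread.
        using (var barrier = new Barrier(2))
        {
            var owner = new Thread(() =>
            {
                lock (Gate)
                {
                    // Released only once the main thread has signaled as well.
                    barrier.SignalAndWait();
                    // Keep the lock for a while so the main thread has to wait.
                    Thread.Sleep(2000);
                }
            });
            owner.IsBackground = true;
            owner.Start();

            // After this call returns, the other thread is guaranteed to own the lock.
            barrier.SignalAndWait();

            // A plain lock (Gate) { ... } would block here until the owner lets go.
            bool taken = Monitor.TryEnter(Gate, TimeSpan.FromMilliseconds(500));
            Console.WriteLine(taken
                ? "lock acquired"
                : "still blocked: the lock is owned by the other thread");
            if (taken)
            {
                Monitor.Exit(Gate);
            }

            owner.Join();
        }
    }
}
```

In the WPF case the waiting thread is the STA UI thread, which is what makes the wait interesting.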

If you have never seen Barrier: it is a class introduced with .NET 4.0 which is very handy for synchronizing threads at a specific point in time. Only when both threads have signaled are both released at the same time. In our case this ensures a nice deadlock. Here is the quiz:

Can something bad happen while the main UI thread is deadlocked?

As it turns out, it can. Let's have a look at this impossible call stack:

Hm, the lock is still held, so how can AlterData be called while we are trying to enter a lock which was clearly not released? We even see it in the call stack (green line). To understand this you need to take a deep look back into the history of STA threads and window message pumping. On an STA thread, Monitor.Enter is not a plain substitute for WaitForSingleObject. Instead it calls into CoWaitForMultipleHandles, which ultimately calls MsgWaitForMultipleObjectsEx. Here is the (simplified) call stack:

MsgWaitForMultipleObjectsEx is just like WaitForMultipleObjects, with the important exception that the wait can also return when specific window messages are put into the window message queue. The official docs are silent about which window messages are pumped. I have tried:

Keyboard press

Mouse move

Mouse button

WM_PAINT

BeginInvoke

Invoke

Dispatcher.Invoke



NONE of these window messages are pumped while the thread is blocked waiting for a lock. Chris Brumme writes about this in his blog:

On lower operating systems, the CLR uses MsgWaitForMultipleHandles / PeekMessage / MsgWaitForMultipleHandlesEx and whatever else is available on that version of the operating system to try to mirror the behavior of CoWaitForMultipleHandles. The net effect is that we will always pump COM calls waiting to get into your STA. And any SendMessages to any windows will be serviced. But most PostMessages will be delayed until you have finished blocking.

Although the blog post is over 10 years old, the information is still relevant. From what I have seen so far, all window messages are blocked except for COM calls. That is the case for UIAutomation calls, which use COM as the transport to marshal your UI automation calls across processes. If you have a custom UI Automation provider, you should check twice whether you are calling into state-altering methods of your UI objects, because these can be called reentrantly while your application is trying to get access to a lock. While you were waiting for the lock, your window could already have been disposed! Bad things can and will happen.

UIAutomation is not only used for integration testing but also by screen readers and other tools for people who have problems reading a computer screen. It is therefore possible that these issues show up not only during tests but also on customer machines. If you get mysterious crashes at places where you would not have expected them, check the call stack for AutomationInteropProvider calls. If there are some, you have most likely hit a problem caused by unexpected "continuations" while taking a lock.

Unfortunately, I know of no really good solution except to pay vigilant attention to your UI Automation provider objects, which must not call any methods that could alter the state of objects. If you have BeginInvoke/Invoke calls pending in your message queue, these messages will still not get pumped. Only the direct COM calls and their side effects can tunnel through blocking calls.

Perhaps I am telling you things you already knew, but this one was quite surprising to me. Perhaps it would be best to stop pumping COM messages altogether so that your calls nicely line up with the other window messages, as it should be. But I guess there is still too much legacy code out there which would deadlock if message pumping no longer occurred. The current Windows 10 build still pumps messages as always. Perhaps we could get a flag to switch over to the "new" behavior to prevent such nasty problems.
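One way to at least make such reentrant calls visible is to mark the time a thread spends acquiring a lock and let state-altering provider methods check that mark. The following is only a sketch under assumptions: BlockingWaitTracker, Model, UpdateUnderLock and TryAlterData are hypothetical names, the lock has to be taken through the helper instead of the lock statement, and other blocking calls that pump COM messages on an STA thread are not covered.

```csharp
using System;
using System.Threading;

// Hypothetical helper: records that the current thread is inside a blocking lock acquisition.
static class BlockingWaitTracker
{
    [ThreadStatic]
    private static int _depth;

    public static bool IsInsideBlockingWait
    {
        get { return _depth > 0; }
    }

    // Enters the monitor like the lock statement would, but marks the duration of the wait.
    // The caller is responsible for calling Monitor.Exit afterwards.
    public static void Enter(object gate)
    {
        _depth++;
        try
        {
            Monitor.Enter(gate);
        }
        finally
        {
            _depth--;
        }
    }
}

// Hypothetical stand-in for a UI object that a UI Automation provider exposes.
sealed class Model
{
    private readonly object _gate = new object();

    public int Value { get; private set; }

    // Regular application code path that takes the lock through the helper.
    public void UpdateUnderLock(int value)
    {
        BlockingWaitTracker.Enter(_gate);
        try
        {
            Value = value;
        }
        finally
        {
            Monitor.Exit(_gate);
        }
    }

    // Intended to be called from the provider. Refuses to mutate state when the
    // call arrived on a thread that is currently blocked acquiring a lock.
    public bool TryAlterData(int value)
    {
        if (BlockingWaitTracker.IsInsideBlockingWait)
        {
            return false; // reentrant call detected; the caller can retry later
        }

        Value = value;
        return true;
    }
}

static class Program
{
    static void Main()
    {
        var model = new Model();
        model.UpdateUnderLock(1);

        // No wait is in progress on this thread, so the change is applied.
        Console.WriteLine(model.TryAlterData(2));
    }
}
```

A provider method that gets false back could defer the change with Dispatcher.BeginInvoke, which, as described above, is not pumped while the thread is blocked and therefore lines up with the other window messages.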
