UTF-8 issues in MSN Messenger

Please note that the following was written some time ago, when the described flaw had only just been fixed. I found the document recently and have cleaned up the text in minor places; other than that, it is as written in early 2004.

MSN Messenger inconsistent UTF-8 handling

(Or, How I Learned To Start Worrying and Hate UTF-8)

Introduction

UTF-8 is a godsend. It allows encoding of foreign (including non-Latin) characters into a file that’s still compatible with US-ASCII for all English-only use. Editors that don’t understand it just show garbage where there’s a foreign character, and handle the “normal” A-Z as they should.

You just have to remember when you’re using it and when you’re not.

I’d like to point out that this write-up is about vulnerabilities eliminated almost a year ago, so it can no longer be verified.

UTF-8 in Messenger

UTF-8 is used in most places in Messenger, which is great. What you have to keep track of is which text you consider UTF-8 and which you don’t, and then treat UTF-8 text consistently. The problem with Messenger’s UTF-8 support was in its handling of Display Names. MSN 4.x (and probably 5.x) were never affected, since they didn’t seem to support UTF-8 display names (only messages). MSN 6 was another issue altogether…

MSN 6 supported UTF-8 display names. In a couple of functions, it even checked that the display name was valid. One of those functions was the callback for when a “RNG” is received. RNGs are sent by the server to the client as an invitation to a chat. If MSN6 thought the display name was empty, the “ring” was never answered, and so the client never joined the chat. With the advent of UTF-8, this left open a small loophole - the two-byte display name “\300\200” (octal; the bytes 0xC0 0x80) is an invalid, overlong UTF-8 encoding of the “NUL” character, which Microsoft’s software - against the UTF-8 specification - decodes to “NUL” anyway. Invalid UTF-8 is supposed to be rejected, and for good reason.
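To make the loophole concrete, here is a minimal Python sketch of why the two bytes discussed above (octal 300 200, hex C0 80) come out as NUL when a decoder skips the overlong check. The function names are made up for illustration and this is not Messenger’s actual code; Python’s own strict decoder is used only as an example of a decoder that follows the specification.

```python
def decode_two_byte_lenient(data: bytes) -> int:
    """Decode one two-byte UTF-8 sequence WITHOUT rejecting overlong forms.

    Hypothetical helper: it mimics a decoder that only does the bit
    arithmetic and never checks that the shortest encoding was used.
    """
    lead, cont = data  # exactly two bytes expected
    if lead & 0xE0 != 0xC0 or cont & 0xC0 != 0x80:
        raise ValueError("not a two-byte UTF-8 sequence")
    # 5 payload bits from the lead byte, 6 from the continuation byte
    return ((lead & 0x1F) << 6) | (cont & 0x3F)


def strict_accepts(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


overlong_nul = b"\xc0\x80"  # octal \300\200

# The lenient arithmetic yields code point 0, i.e. NUL.
code_point = decode_two_byte_lenient(overlong_nul)

# A strict decoder refuses the same bytes outright.
accepted = strict_accepts(overlong_nul)
```

Both payload fields are all zero bits, so the lenient path produces code point 0, while a strict decoder rejects the sequence because NUL must be encoded as the single byte 0x00.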

Server-side, when an empty display name was set, the server would not accept it. However, the server wasn’t using a UTF-8-understanding string comparison function, so when it saw “\300\200” it thought that the name was non-empty, and allowed it. This has now been partially fixed, resolving the problem this document describes. When MSN6 sees “\300\200” as the display name, it gets decoded from UTF-8 into standard ASCII and compared to "" (a blank string). It is considered a “match”, despite the fact that they are technically different. Thus, “\300\200” as a display name would stop MSN6 from answering a RNG.
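The disagreement between server and client can be sketched as follows. This is only a model under stated assumptions: the server is assumed to test the raw byte length, and the client is assumed to decode leniently and then compare as a C-style string, where a leading NUL ends the string. All three function names are hypothetical, and the last one is just one possible stricter check, not a description of Microsoft’s fix.

```python
OVERLONG_NUL = b"\xc0\x80"  # octal \300\200


def old_server_accepts(raw: bytes) -> bool:
    """Assumed old behaviour: a byte-level check that knows nothing of UTF-8."""
    return len(raw) > 0


def client_sees_empty(raw: bytes) -> bool:
    """Assumed client behaviour: lenient decode, then compare with ""."""
    # Stand-in for lenient decoding: the overlong form becomes a real NUL.
    decoded = raw.replace(OVERLONG_NUL, b"\x00")
    # A C-style string ends at the first NUL byte.
    c_string = decoded.split(b"\x00", 1)[0]
    return c_string == b""


def stricter_server_accepts(raw: bytes) -> bool:
    """One possible stricter check: reject invalid UTF-8 and empty results."""
    try:
        text = raw.decode("utf-8")  # strict: overlong forms raise an error
    except UnicodeDecodeError:
        return False
    return text.split("\x00", 1)[0] != ""


name = OVERLONG_NUL
mismatch = old_server_accepts(name) and client_sees_empty(name)
```

For the two-byte name, the byte-level check says “non-empty” while the decoded comparison says “empty”, which is exactly the inconsistency described above.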

Repercussions

The mis-handling of a small piece of UTF-8 text, fixed to one line long, seems unimportant. However, due to MSN6’s erroneous assertion that the name is empty - something that is supposed to be impossible, due to the assumption that the server would filter out empty names - there is suddenly a serious problem.

The problem is that the RNG is never answered. Any method to force a client to ignore a RNG is very dangerous, because it exposes a flaw in another part of the MSN spec - the CKI challenges. In order to join a chat, the client must supply two things - an email address, and a “CKI” key, which is an automatically generated “password” to allow entry into the switchboard session. The CKI is (currently) just the current time in unix-epoch-style format, and what appears to be a random unsigned 15-bit number. Because the time is easy to guess, the only real “security” to stop one user joining a room as another user is the random number.

15 bits cannot be brute-forced in the time it normally takes a client to respond to a RNG. However, if the client can be stopped from joining, the invite lasts a lot longer than the usual 2-3 seconds. (I tested it up to around five minutes.) All it would take to join a chat as another online user is to set the display name to “\300\200”, invite the user into a switchboard, then connect yourself and authenticate as that user, brute-forcing the random part of the CKI.
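To put rough numbers on this, a 15-bit value has 32,768 possibilities. The small calculation below assumes that the timestamp part of the CKI is already known exactly and that every guess costs the same; the function name is made up for illustration.

```python
KEYSPACE = 2 ** 15  # every possible 15-bit random value


def guesses_per_second_needed(window_seconds: float) -> float:
    """Guess rate required to try the whole keyspace within the window."""
    return KEYSPACE / window_seconds


# Normal case: the client answers the RNG within about 3 seconds.
normal_rate = guesses_per_second_needed(3)

# Stalled case: the invite stays open for about five minutes.
stalled_rate = guesses_per_second_needed(5 * 60)
```

Under these assumptions, covering the whole space in 3 seconds needs roughly eleven thousand guesses per second, while a five-minute window needs only about a hundred per second. Any uncertainty in the timestamp would multiply both figures.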

Solutions

Microsoft already solved enough of the problem that this exact attack is no longer possible - the server now rejects UTF-8 that resolves to an empty string. However, more could still be done. In particular, the CKIs - which are considered opaque strings by clients - should be changed or extended so that it is not feasible to brute-force them even with days of attempts.
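As an illustration of what a harder-to-guess key might look like, the sketch below generates an opaque 128-bit token from a cryptographic random source and compares keys in constant time. The 128-bit size and both function names are my own assumptions, not anything taken from the MSN protocol.

```python
import hmac
import secrets


def new_session_key() -> str:
    """Return an opaque key with 128 bits of randomness (32 hex digits)."""
    return secrets.token_hex(16)


def key_matches(expected: str, supplied: str) -> bool:
    """Compare two keys without leaking how many characters matched."""
    return hmac.compare_digest(expected.encode("utf-8"),
                               supplied.encode("utf-8"))


key = new_session_key()
ok = key_matches(key, key)
```

Since clients treat the key as an opaque string anyway, a change of this kind would not need any client-side support.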

Anything else?

The idea of using invalid UTF-8 wasn’t mine; when looking at MSN security I found some old Windows Messenger-era clients that could be “invisible” in a group chat by using invalid names. Although that trick no longer worked, further experimentation with the idea brought this problem to light. I also found that at least some MSN versions would log out if they saw a user come online with an invalid UTF-8 string as their display name.

The level of simplicity with which one could masquerade as another user in this case brings to light trust issues with Microsoft’s implementation of instant messaging. The time that this exploit would have been available is also concerning: it could have affected users from the release of MSN 6.0 to its resolution over six months later. What else is possible, and how long will people be unknowingly at risk before that is fixed? Can you trust a homogeneous environment that you can’t see into?