MU-MIMO is not about speed, it's about capacity.
It originated in the WWAN community, where you have many more users per cell (sector, in CDMA speak) - the rule of thumb there is 200 users per cell (sector). Here MU-MIMO makes a tremendous amount of sense: whether the system is FDD (transmit and receive on different frequencies) or TDD (Tx/Rx on a common channel), the downlink from the cell is scheduled, and MU-MIMO allows different code masks to be used (oversimplifying here a bit) to send data to multiple subscriber terminals in the same frame.
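To make "multiple terminals in the same frame" a bit more concrete, here is a minimal sketch of one common way the separation is done in practice: zero-forcing precoding. This is a swapped-in textbook technique, not necessarily what any given WWAN or WiFi chip does, and the channel values in H, the helper names, and the 2-antenna/2-user setup are all made up for illustration. Noise and transmit power limits are ignored.

```python
# Toy 2-antenna, 2-user downlink with zero-forcing precoding.
# H[k][n] is a hypothetical complex channel gain from antenna n to user k.
H = [[1.0 + 0.5j, 0.3 - 0.2j],
     [0.2 + 0.4j, 0.9 - 0.1j]]

def inverse_2x2(m):
    """Invert a 2x2 complex matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("channel is (nearly) singular; users cannot be separated")
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    """Multiply matrix m by vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def zero_forcing_transmit(channel, symbols):
    """Per-antenna signals chosen so each user receives only its own symbol."""
    return matvec(inverse_2x2(channel), symbols)

symbols = [1 + 0j, -1 + 0j]           # one symbol per user, sent in the same frame
tx = zero_forcing_transmit(H, symbols)
rx = matvec(H, tx)                    # what each user's antenna sees (noise-free)

for user, (sent, got) in enumerate(zip(symbols, rx)):
    print(f"user {user}: sent {sent}, received {got:.3f}")
```

The catch is that the transmitter has to know H, which is why sounding and scheduling show up as a cost as soon as MU-MIMO enters the picture.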
In the WiFi space there isn't as much benefit, first because WiFi is OFDM, not OFDMA, and second because it's not as tightly scheduled as a WWAN setup.
One of the challenges with WiFi is that the chips themselves have to be relatively simple and small - silicon is a fixed price, so the question is how we allocate resources on the chip. We can either:
a) Put in a lot of SU-MIMO streams - 3 streams are common enough in 802.11n, and perhaps 4 streams soon. Most of this is in the MAC/Baseband in any event, so it's fairly easy actually, even though we need discrete RF chains for each stream - but then we can blast lots of bits per frame...
b) Add TxBF - now we have to put more computation on the MAC to determine which streams to put the bits on, and on the Baseband for the extra time delays needed to push the analog waveforms out to the RF chains...
c) Add MU-MIMO - now we have to do channel sounding and scheduling of the frames at the MAC layer - even more computation...
Option A is the general trend. When you add B, you get a bit more gain, about 2-3 dB in the best of cases. C adds a whole world of hurt: it takes a lot more transistors on the die, and at a given geometry node MU-MIMO is expensive. For the typical SOHO/SMB WLAN it's just not a net positive, as a MU-MIMO frame can support more users, but at a lower data rate per user.
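For a rough feel of the numbers, here is a back-of-the-envelope sketch. It converts the 2-3 dB figure to a linear power ratio, then uses the Shannon formula as a toy model of the "more users, lower rate per user" trade-off. The 20 MHz bandwidth, the 25 dB SNR, the equal power split across users, the perfect (interference-free) user separation, and the single-stream baseline are all assumptions picked for illustration, not measurements from any real AP.

```python
import math

def db_to_linear(db):
    """Convert a gain in dB to a linear power ratio."""
    return 10 ** (db / 10)

def shannon_rate_mbps(bandwidth_mhz, snr_linear):
    """Shannon capacity in Mbit/s for a bandwidth in MHz and a linear SNR."""
    return bandwidth_mhz * math.log2(1 + snr_linear)

BANDWIDTH_MHZ = 20.0   # assumed channel width
SNR_DB = 25.0          # assumed SNR when one user gets all the transmit power

def mu_rates(num_users):
    """Per-user and aggregate rate when power is split equally across users.

    Toy model: assumes the users are separated perfectly, which real
    MU-MIMO only approximates.
    """
    snr_each = db_to_linear(SNR_DB) / num_users
    per_user = shannon_rate_mbps(BANDWIDTH_MHZ, snr_each)
    return per_user, per_user * num_users

single_user = shannon_rate_mbps(BANDWIDTH_MHZ, db_to_linear(SNR_DB))

for db in (2, 3):
    print(f"{db} dB of TxBF gain = {db_to_linear(db):.2f}x received power")

print(f"1 user : {single_user:6.1f} Mbit/s")
for k in (2, 3, 4):
    per_user, total = mu_rates(k)
    print(f"{k} users: {per_user:6.1f} Mbit/s each, {total:6.1f} Mbit/s aggregate")
```

Even in this idealized model the aggregate goes up while each user's rate goes down, which is the capacity-not-speed point. A real chip also pays for sounding overhead and imperfect separation, so treat these as upper bounds at best.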
Not to say that Qualcomm, Broadcom, Marvell, Ralink, and MediaTek won't do it, but it does add a lot of complexity to what is a commodity chip - and complexity is cost. What we're finding at the moment is that $249 USD is about as much as customers are willing to pay for APs.
Just my thoughts...
sfx