Recently, here at Reverge we have been adding DirectX 10/11 support to some of our internal libraries. In the process we discovered a couple of problems in the new API design.
In DirectX 9 there were three types of "surfaces":
- textures that are not render targets,
- render targets that are not textures, and
- textures that are also render targets.
With this system, pure textures can be used by the CPU to efficiently write data to video memory for the GPU to read, and render targets can be used to efficiently read back data that was written by the GPU.
Textures that are also render targets cannot be accessed by the CPU at all, but that's fine: if the CPU needs to read the result of a GPU pass, that pass can render into a render target (one that is not a texture) which the CPU can read.
In DirectX 10/11 render targets that are not textures are no longer supported, so while the CPU can still efficiently write data for the GPU to read, there is no efficient way for the application to read data written by the GPU. Instead, the application is required to copy the data from a render-target texture into another (non-render-target) texture, which can then be mapped by the CPU for reading.
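This extra copy step looks roughly like the following sketch. It assumes an existing ID3D11Device (device), ID3D11DeviceContext (context), and render-target texture (renderTarget); these names are illustrative and error handling is omitted.

```cpp
// Reading back a D3D11 render target on the CPU requires an intermediate
// staging texture, since render targets can no longer be mapped directly.
D3D11_TEXTURE2D_DESC desc;
renderTarget->GetDesc(&desc);
desc.Usage          = D3D11_USAGE_STAGING;     // CPU-readable copy
desc.BindFlags      = 0;                       // staging textures cannot be bound
desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
desc.MiscFlags      = 0;

ID3D11Texture2D* staging = nullptr;
device->CreateTexture2D(&desc, nullptr, &staging);

// The extra copy DirectX 10/11 forces on us:
context->CopyResource(staging, renderTarget);

D3D11_MAPPED_SUBRESOURCE mapped;
if (SUCCEEDED(context->Map(staging, 0, D3D11_MAP_READ, 0, &mapped)))
{
    // mapped.pData points at the pixel data, mapped.RowPitch bytes per row.
    context->Unmap(staging, 0);
}
staging->Release();
```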
In the good old DirectX 9 days, if the application needed to write data into a texture, it called the LockRect function to get a pointer. The data was written in the memory pointed by the pointer, and then UnlockRect was called.
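In code, that pattern looks something like this sketch, assuming `texture` is an IDirect3DTexture9* created with suitable usage flags:

```cpp
// Updating a DirectX 9 texture via LockRect/UnlockRect.
D3DLOCKED_RECT locked;
if (SUCCEEDED(texture->LockRect(0, &locked, nullptr, 0)))
{
    // locked.pBits points at whatever memory the driver chose for mip
    // level 0; rows are locked.Pitch bytes apart, which may be larger
    // than width * bytes-per-pixel.
    unsigned char* dst = static_cast<unsigned char*>(locked.pBits);
    // ... write pixel data row by row, advancing by locked.Pitch ...
    (void)dst;
    texture->UnlockRect(0);
}
```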
The nice thing about this API is that it does not specify what kind of memory the returned pointer points to. Depending on texture creation flags, the requested access (read/write/write-discard), the current state of the Direct3D device, and the capabilities of the graphics card, the driver might choose to map the texture's video memory directly, or it might map a separate temporary video memory buffer, or it might even return a pointer to a system memory buffer which is copied to video memory when UnlockRect is called.
Instead, in DirectX 10 and 11 there seems to be only one choice: video memory is mapped directly if possible; if not, we get an error.
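The DirectX 11 equivalent is sketched below. It assumes `texture` was created with D3D11_USAGE_DYNAMIC and D3D11_CPU_ACCESS_WRITE, and that `context` is an existing ID3D11DeviceContext; note that Map simply fails with an HRESULT error instead of falling back to a slower path.

```cpp
// Updating a dynamic D3D11 texture via Map/Unmap.
D3D11_MAPPED_SUBRESOURCE mapped;
HRESULT hr = context->Map(texture, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
if (SUCCEEDED(hr))
{
    // mapped.pData is, where possible, directly mapped video memory;
    // rows are mapped.RowPitch bytes apart.
    // ... write pixel data row by row using mapped.RowPitch ...
    context->Unmap(texture, 0);
}
// On failure there is no driver-provided fallback: hr carries the error.
```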
The new semantics are in fact in line with a major architectural shift introduced with DirectX 10: the removal of so-called "API magic", undocumented internal behavior designed to hide inefficiencies and/or missing capabilities in the hardware. The rationale for avoiding API magic is that it makes application performance inconsistent across different hardware and driver versions.
It is true that depending on the actual behavior of LockRect, application performance can vary significantly. That is indeed a downside, but I wonder if the designers of DirectX 10 and 11 are old enough to remember DirectX 3. It also lacked any API magic. The idea was that the application would create execute buffers that would reside on the graphics card, removing all abstraction and maximizing performance. Yet, because the interface was designed with hardware engineers in mind, it was basically unusable. The more abstract DirectX 5 interface was much easier to use yet (surprise!) didn't lead to lower frame rates.
Similarly, the more abstract interface of LockRect in DirectX 9 is more practical than the less sophisticated DirectX 10/11 Map behavior. Sure, in the simplest use cases the difference is negligible, but then again in the simplest cases LockRect wouldn't use API magic either, so the user would not be penalized by it. It is the non-trivial, less frequent yet performance-critical uses that need a more abstract interface.
For example, if the texture is created and locked with the correct flags, DirectX 9 could let the application download new data to a texture even while it is currently used by a shader, which isn't possible in DirectX 10/11. How does that work in DirectX 9? If write-discard locking is requested, the driver could map a separate temporary video memory buffer for access by the application. As soon as the device is done using the texture, the new memory buffer can be associated with the texture at the cost of a pointer swap.
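The rename trick can be modeled in plain C++. This is a toy sketch with hypothetical names, not driver code: a real driver juggles video-memory allocations, but the key idea, writing into a fresh buffer while the old one is still in use and publishing it with a pointer swap, is the same.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Toy model of the write-discard "rename" a DirectX 9 driver could do:
// the texture keeps its current buffer (still being read by the GPU)
// while the CPU writes into a freshly allocated one; once the GPU is
// done, the new buffer is attached at the cost of a pointer swap.
class RenamedTexture {
public:
    explicit RenamedTexture(std::size_t bytes)
        : current_(std::make_unique<std::vector<unsigned char>>(bytes)),
          bytes_(bytes) {}

    // Lock with write-discard: hand out a separate buffer so the
    // GPU can keep reading current_ undisturbed.
    unsigned char* lockDiscard() {
        pending_ = std::make_unique<std::vector<unsigned char>>(bytes_);
        return pending_->data();
    }

    // Called once the device has finished with the old contents:
    // publishing the new data is just a pointer swap.
    void gpuFinished() {
        if (pending_) {
            current_ = std::move(pending_);
        }
    }

    const unsigned char* data() const { return current_->data(); }

private:
    std::unique_ptr<std::vector<unsigned char>> current_;  // what the "GPU" reads
    std::unique_ptr<std::vector<unsigned char>> pending_;  // what the "CPU" writes
    std::size_t bytes_;
};
```

Note that from the application's point of view nothing changed: it locked, wrote, and unlocked; the buffering happened entirely inside the "driver".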
It can be argued that this is done behind the user's back: if that's what the application wanted to do, it could do it itself, right? The problem is that this type of system is difficult to program. Besides, even if the user has the skill and resources needed to provide a high-quality implementation, they can't possibly compete with a driver programmer, who is not only a specialist but also has the advantage of writing hardware-specific code with much lower-level access to the graphics card.