Vulkan - new-generation cross-platform graphics and GPGPU computing API

Since it's a major release and new effort, making a new thread for it.

https://www.khronos.org/vulkan
http://arstechnica.com/gadgets/2015/03/khronos-unveils-vulkan-opengl-built-for-modern-systems/

Some important bits:

* Rapid progress since June 2014, significant proposals and IP contributions received from members
* Participants come from all segments of the graphics industry, including an unprecedented level of participation from game engine ISVs
* Initial specs and implementations expected this year
* Will work on any platform that supports OpenGL ES 3.1 and up

---------- Updated at 07:27 PM ----------

UPDATE:

Various Vulkan demos are surfacing: http://blog.imgtec.com/powervr/trying-out-the-new-vulkan-graphics-api-on-powervr-gpus
 
Sounds interesting, but I'd like to know more about the GPGPU API. For instance, having a unified "command queue" that automatically allocates resources for compute threads only works for independent tasks. The point of GPGPU is exploiting the SIMD architecture, e.g. vectorized data with parameterized, fine-grained parallelism, and therefore fully synchronized threads.

Edit: I think I misread the chart. It probably means multiple CPU threads may simultaneously submit tasks to the GPU, each through a dedicated "command queue". I assume each "command buffer" is SIMD-parallel code with the appropriate data-to-thread mapping. Does this mean we can perform multiple device memory reads/writes simultaneously?
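For reference, here is roughly what that model looks like in the Vulkan C API: the data-parallel work (a compute dispatch) is recorded into a command buffer, and handing it to the queue is a separate, explicit call. This is only an illustrative fragment - the device, queue, command pool and compute pipeline are assumed to exist already, and error handling and descriptor-set binding are omitted.

#include <vulkan/vulkan.h>

// Record one compute dispatch into a command buffer and submit it to a queue.
// Assumes device, queue, cmdPool and computePipeline were created elsewhere.
void submit_compute_job(VkDevice device, VkQueue queue,
                        VkCommandPool cmdPool, VkPipeline computePipeline)
{
    VkCommandBufferAllocateInfo allocInfo = {};
    allocInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
    allocInfo.commandPool = cmdPool;
    allocInfo.level = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
    allocInfo.commandBufferCount = 1;

    VkCommandBuffer cmd;
    vkAllocateCommandBuffers(device, &allocInfo, &cmd);

    VkCommandBufferBeginInfo beginInfo = {};
    beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
    vkBeginCommandBuffer(cmd, &beginInfo);

    // The command buffer carries the data-parallel work: one dispatch of
    // 1024 workgroups of the bound compute shader.
    vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, computePipeline);
    vkCmdDispatch(cmd, 1024, 1, 1);
    vkEndCommandBuffer(cmd);

    // Submission to the queue is a separate, explicit step.
    VkSubmitInfo submit = {};
    submit.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    submit.commandBufferCount = 1;
    submit.pCommandBuffers = &cmd;
    vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);
    vkQueueWaitIdle(queue);
}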
 
We'll have to wait until more details are published, but I assume it can support accessing multiple GPUs in parallel; at least that makes sense from a performance perspective.
 
We'll have to wait until more details are published, but I assume it can support accessing multiple GPUs in parallel; at least that makes sense from a performance perspective.

Oh, I'm sure multiple GPUs will be supported. In CUDA you simply parameterize the device number and run a "kernel". In a cluster environment, you distribute tasks across multiple nodes with something like MPI and run CUDA kernels on however many devices each node has. Easily scalable, roughly like the sketch below.
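Something along these lines (an untested sketch - the kernel, sizes and per-device split are made up for illustration, and data upload is omitted):

#include <cuda_runtime.h>
#include <vector>

__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main()
{
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    const int n = 1 << 20;
    std::vector<float*> buffers(deviceCount, nullptr);

    // One kernel launch per GPU; launches are asynchronous, so the devices
    // work concurrently. In a cluster, an MPI rank per node would wrap this
    // loop over the GPUs local to that node.
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaSetDevice(dev);                      // parameterize the device number
        cudaMalloc(&buffers[dev], n * sizeof(float));
        scale<<<(n + 255) / 256, 256>>>(buffers[dev], 2.0f, n);
    }

    // Wait for every device to finish, then release its memory.
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaSetDevice(dev);
        cudaDeviceSynchronize();
        cudaFree(buffers[dev]);
    }
    return 0;
}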

What I meant was multiple simultaneous memory accesses on one device.
 
Ah, parallel memory access through the API on the same device - that I'm not sure about. It's also hardware-dependent. The recent fiasco with the Nvidia GTX 970 was about exactly that: the part of memory from 3.5 GB up was not built for parallel access.
 
If this turns out to be cleaner and *as* efficient as CUDA, I'd promote it in all the labs I work at.
 
It looks like they have taken note of, and intend to address, the principal limitation of OpenGL, which is that it is pervasively single-threaded. They will have to implement support for multiple contexts active simultaneously in multiple threads, and do it well, or it will be inferior to work that has already been done in DirectX 12 and Mantle.
 
Looks like they are doing it, see the second picture above.

I know, but the question I have is "how well". They attempted multithread support in the existing OpenGL. It's a botch. It needs to be done better, and the presence of a single userland thread doing all the command queuing does not immediately reassure me.
 
From the diagram it looks like the GPU handles commands sequentially. But shouldn't it handle them in parallel?

Eventually, though, something has to produce the final result from all the parallel contexts. I assume that shouldn't be the driver's task, in order to keep the driver's complexity down.
 
Yeah, but something has to produce the final result from all the parallel contexts. I assume that shouldn't be the driver's task, in order to keep the driver's complexity down.

From the diagram it looks like the GPU handles commands sequentially. But shouldn't it handle them in parallel?

Modern GPUs do handle commands in parallel. In fact, they allow command queues to be delivered in parallel and execute multiple kernels in parallel. Being able to exploit this feature is the fundamental motivation for DirectX 12.

This design looks like it takes no advantage of that basic improvement in GPU architecture; it merely creates a (one hopes) better-engineered synchronization point for multiple userland threads.
 
Maybe that thread simply works as a dispatcher. I.e. the queue is a command channel, and the dispatcher thread pulls from the queue and pushes to the GPU, where those tasks run in parallel. I'm not sure why the earlier threads can't do that directly.

A queue is usually used when some ordering logic is involved, but that doesn't mean the commands on the queue need to run sequentially. They are fetched sequentially but can then run in parallel; fetching a command from the queue is, I'd guess, a much faster operation than running it.

---------- Updated at 08:42 PM ----------

Usually when dispatching becomes a bottleneck you simply add another dispatcher.

I.e. imagine a set of workers who load wagons with coal in parallel (the wagons then travel sequentially on a circular rail), and another set of workers who unload the wagons, also in parallel.

I have no idea, though, why you would need that queue at all (in the example above, just build more rails). The reasons aren't really explained there.
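For what it's worth, the dispatcher pattern itself needs nothing GPU-specific at all - plain threads and a locked queue. The Command type and the shutdown convention below are made up for the example:

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// A hypothetical "command": just something the dispatcher hands to the device.
using Command = std::function<void()>;

class CommandQueue {
public:
    void push(Command c) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(c)); }
        cv_.notify_one();
    }
    Command pop() {                       // blocks until a command is available
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty(); });
        Command c = std::move(q_.front());
        q_.pop();
        return c;
    }
private:
    std::queue<Command> q_;
    std::mutex m_;
    std::condition_variable cv_;
};

int main() {
    CommandQueue queue;

    // Single dispatcher: fetching from the queue is cheap; the "submit" it
    // performs could fan work out to hardware that then runs in parallel.
    std::thread dispatcher([&] {
        for (;;) {
            Command c = queue.pop();
            if (!c) break;                // empty function = shutdown signal
            c();                          // stand-in for submitting to the GPU
        }
    });

    // Many producers build commands concurrently and enqueue them.
    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i)
        workers.emplace_back([&] { queue.push([] { /* recorded work */ }); });

    for (auto &w : workers) w.join();
    queue.push(Command{});                // tell the dispatcher to stop
    dispatcher.join();
    return 0;
}

Whether the single dispatcher becomes a bottleneck then depends only on how cheap push/pop is relative to the work each command represents - which is the same question the diagram raises.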
 
It's a sequence point that is real enough to warrant explicit mention in an architecture diagram. Neither DirectX 12 nor Mantle has such a sequence point: you have multiple threads interacting directly with the GPU. This puts Vulkan on more or less the same architecture as DirectX 11 - not Mantle, not 12.
 
Maybe it's just one example of how to use it. They do mention that the application is responsible for thread management and synchronization - i.e. they give you the building blocks and you decide how to apply them. You could construct it differently (as I said, with multiple queues, multiple dispatchers or whatever).

I'd expect them not to bake any such mandatory bottleneck into the API.
 
Others who have actually worked with Vulkan have mentioned the use of a dedicated thread for submitting command queues. So I do not think it is just an example. Maybe it is similar to nVidia "bindless", which allows working threads to interact with reserved memory, then submit through a still-single command queue. But it does not appear to be like Mantle, which has multiple command queues simultaneously in communication with the GPU.

http://blog.imgtec.com/powervr/trying-out-the-new-vulkan-graphics-api-on-powervr-gpus

So far, they seem to have put a lot more of their effort into shader language and compatibility, which is entirely a Good Thing.

Off topic: specifically for gaming, an effort with maybe equal or better payoff would be multithreading Lua. Lua has been somewhat resistant to efforts to map coroutines onto threads that the OS can schedule. This leaves games in a situation where all the scripts run on a single core, so single-core performance ends up limiting game performance.
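(For context: the usual workaround today is one lua_State per OS thread, since a single state is not thread-safe and its coroutines are cooperative rather than OS-schedulable. A rough sketch using the Lua C API - the script is just a stand-in:)

extern "C" {
#include <lua.h>
#include <lauxlib.h>
#include <lualib.h>
}
#include <thread>
#include <vector>

// Each OS thread gets its own lua_State: a single state is not thread-safe,
// and coroutines within one state never use more than one core.
static void run_script(int id)
{
    lua_State *L = luaL_newstate();
    luaL_openlibs(L);
    luaL_dostring(L, "local x = 0; for i = 1, 1e6 do x = x + i end");
    lua_close(L);
    (void)id;
}

int main()
{
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i)
        threads.emplace_back(run_script, i);
    for (auto &t : threads) t.join();
    return 0;
}

The catch, and probably why this does not help game scripts much, is that separate states share nothing, so any shared game state has to be passed between them by explicit messaging.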
 
It's somewhat unclear:

Command buffers can be created on a different thread to the thread they are submitted on. This means rendering commands could be created on all cores of a CPU.

Maybe it means one submitting thread, but that might not preclude several of them. I'd wait for more architectural details to be published.
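If it is one submitting thread, the workflow quoted above would look roughly like this (an illustrative sketch against the Vulkan C API: one command pool per recording thread, because pools must be externally synchronized, and a single vkQueueSubmit at the end; the thread count and empty command buffers are placeholders):

#include <vulkan/vulkan.h>
#include <thread>
#include <vector>

// Record trivial command buffers on several threads, submit them on one.
// Assumes device, queue and queueFamily were created elsewhere.
void record_and_submit(VkDevice device, VkQueue queue, uint32_t queueFamily)
{
    const int threadCount = 4;
    std::vector<VkCommandPool> pools(threadCount);
    std::vector<VkCommandBuffer> cmds(threadCount);

    // One pool per thread: a command pool must not be used from two threads
    // at once, so each recording thread gets its own.
    for (int i = 0; i < threadCount; ++i) {
        VkCommandPoolCreateInfo poolInfo = {};
        poolInfo.sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO;
        poolInfo.queueFamilyIndex = queueFamily;
        vkCreateCommandPool(device, &poolInfo, nullptr, &pools[i]);
    }

    std::vector<std::thread> recorders;
    for (int i = 0; i < threadCount; ++i) {
        recorders.emplace_back([&, i] {
            VkCommandBufferAllocateInfo allocInfo = {};
            allocInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
            allocInfo.commandPool = pools[i];
            allocInfo.level = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
            allocInfo.commandBufferCount = 1;
            vkAllocateCommandBuffers(device, &allocInfo, &cmds[i]);

            VkCommandBufferBeginInfo beginInfo = {};
            beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
            vkBeginCommandBuffer(cmds[i], &beginInfo);
            // ... real rendering/compute commands would be recorded here ...
            vkEndCommandBuffer(cmds[i]);
        });
    }
    for (auto &t : recorders) t.join();

    // A single thread hands everything to the queue in one call.
    VkSubmitInfo submit = {};
    submit.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    submit.commandBufferCount = threadCount;
    submit.pCommandBuffers = cmds.data();
    vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);
    vkQueueWaitIdle(queue);

    for (int i = 0; i < threadCount; ++i)
        vkDestroyCommandPool(device, pools[i], nullptr);
}

Nothing in the sketch prevents several such submitting threads, each with its own queue; whether the API allows that is exactly the open question here.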

A unified shader compiler is a major win - vendor-specific bugs in shader compilers were a major obstacle to using OpenGL.

---------- Updated at 09:26 PM ----------

Multithreading Lua sounds interesting, but I'm not familiar with the language, so I'm not sure how well it's suited for it. Maybe something like Rust would be more suitable, since it's inherently geared toward parallelism (though Lua is popular for game scripting and already widely used).
 
Gilrond,

If this thread is going to become a reference for this technology, it may be worth fixing a small mistake in the title. Or maybe you meant something else and I'm the one who's confused.

General-purpose computation on graphics processing units is normally called GPGPU.
 
On a semi-related note, in a move surprising no one (if anything, it was expected), the Mantle SDK has been cancelled; the spec/code won't be open-sourced as they planned earlier.
 