This post is a deep dive into how an OpenGL application interacts with the underlying software stack: the OpenGL implementation and the graphics driver in the kernel. The stack looks something like this:
User Land ------------> OpenGL implementation (Mesa) -------------> Intel driver (i915) -----------> HW
Userland1: application and GLUT
Let's start with a simple application that shows a square polygon. It uses libglut for window management. Cool!
#include <GL/glut.h>
/* Called by GLUT whenever the window needs to be redrawn. */
void displayMe(void)
{
    glClear(GL_COLOR_BUFFER_BIT);
    glBegin(GL_POLYGON);
        glVertex3f(0.0, 0.0, 0.0);
        glVertex3f(0.5, 0.0, 0.0);
        glVertex3f(0.5, 0.5, 0.0);
        glVertex3f(0.0, 0.5, 0.0);
    glEnd();
    glFlush();
}
int main(int argc, char **argv)
{
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_SINGLE);      /* single-buffered window */
    glutInitWindowSize(300, 300);
    glutInitWindowPosition(100, 100);
    glutCreateWindow("Hello world");
    glutDisplayFunc(displayMe);
    glutMainLoop();                        /* never returns in classic GLUT */
    return 0;
}
For reference, GLUT is the OpenGL Utility Toolkit, a minimal toolkit for window management. The popular implementation is freeglut3 since, according to the linked page, the original GLUT library was deprecated.
/usr/include/GL/freeglut_std.h includes the headers GL/gl.h and GL/glu.h:
# include <GL/gl.h>
# include <GL/glu.h>
Userland2: OpenGL implementation libraries - Mesa
OpenGL is just an API specification, and the underlying system needs to implement those APIs. I am using Mesa because it's a popular option (it implements other APIs such as Vulkan as well).
OK, let's look at the next layer, libmesa. As expected, the shared objects show that the binary links to libglut.so and libGL.so.
$ ldd main
libglut.so.3 => /lib/x86_64-linux-gnu/libglut.so.3 (0x00007fcb71176000)
libGL.so.1 => /lib/x86_64-linux-gnu/libGL.so.1 (0x00007fcb710ef000)
Listing the dynamic symbols of libGL.so shows that glClear is exported, as expected:
$ nm -D /lib/x86_64-linux-gnu/libGL.so.1 | rg " glClear$"
0000000000047700 T glClear
glClear eventually reaches Mesa's core through the GL dispatch table. The function clear is defined in src/mesa/main/clear.c, and the important part is its call to st_Clear:
static ALWAYS_INLINE void
clear(struct gl_context *ctx, GLbitfield mask, bool no_error)
{
   FLUSH_VERTICES(ctx, 0, 0);
   if (!no_error) {
      if (mask & ~(GL_COLOR_BUFFER_BIT |
                   GL_DEPTH_BUFFER_BIT |
                   GL_STENCIL_BUFFER_BIT |
                   GL_ACCUM_BUFFER_BIT)) {
         _mesa_error( ctx, GL_INVALID_VALUE, "glClear(0x%x)", mask);
         return;
      }
   ....
   ....
   st_Clear(ctx, bufferMask);
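Here is the validation step at the top of clear() pulled out as a standalone sketch. validate_clear_mask is a hypothetical name, not a Mesa function. The bit values are copied from the OpenGL headers so the snippet compiles without them.

```c
/* Values as defined in the OpenGL headers (GL/gl.h). */
#define GL_NO_ERROR           0x0000u
#define GL_INVALID_VALUE      0x0501u
#define GL_DEPTH_BUFFER_BIT   0x00000100u
#define GL_ACCUM_BUFFER_BIT   0x00000200u
#define GL_STENCIL_BUFFER_BIT 0x00000400u
#define GL_COLOR_BUFFER_BIT   0x00004000u

/* Hypothetical helper mirroring the mask check in Mesa's clear(). */
static unsigned validate_clear_mask(unsigned mask)
{
    const unsigned allowed = GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT |
                             GL_STENCIL_BUFFER_BIT | GL_ACCUM_BUFFER_BIT;
    /* Any bit outside the four buffer bits is an error. */
    return (mask & ~allowed) ? GL_INVALID_VALUE : GL_NO_ERROR;
}
```

Any stray bit yields GL_INVALID_VALUE. Mesa records that error with _mesa_error and returns early, before touching the driver.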
st_Clear eventually calls pipe->clear:
st->pipe->clear(st->pipe, clear_buffers, have_scissor_buffers ? &scissor_state : NULL,
                (union pipe_color_union*)&ctx->Color.ClearColor,
                ctx->Depth.Clear, ctx->Stencil.Clear);
For Intel i915, clear is set to one of the i915 clear functions defined in i915_clear.c:
if (i915_screen(screen)->debug.use_blitter)
   i915->base.clear = i915_clear_blitter;
else
   i915->base.clear = i915_clear_render;
i915->base.draw_vbo = i915_draw_vbo;
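The assignment above is the usual C vtable pattern. The driver picks an implementation once, and the state tracker always calls through pipe->clear without knowing which one it got. Below is a minimal sketch. mini_pipe_context and the mini_* functions are hypothetical stand-ins for Gallium's much larger struct pipe_context.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-in for struct pipe_context. */
struct mini_pipe_context {
    const char *(*clear)(struct mini_pipe_context *pipe, unsigned buffers);
};

/* Two driver-side implementations; they only report which path ran. */
static const char *mini_clear_blitter(struct mini_pipe_context *pipe, unsigned buffers)
{
    (void)pipe; (void)buffers;
    return "blitter";
}

static const char *mini_clear_render(struct mini_pipe_context *pipe, unsigned buffers)
{
    (void)pipe; (void)buffers;
    return "render";
}

/* Driver init chooses the implementation once, like the i915 code above. */
static void mini_context_init(struct mini_pipe_context *pipe, bool use_blitter)
{
    pipe->clear = use_blitter ? mini_clear_blitter : mini_clear_render;
}

/* State-tracker side: it only knows the function pointer. */
static const char *mini_st_clear(struct mini_pipe_context *pipe, unsigned buffers)
{
    return pipe->clear(pipe, buffers);
}
```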
In src/gallium/drivers/i915/i915_clear.c, i915_clear_render calls i915_clear_emit. Note that Gallium is the interface Mesa's graphics drivers are built on, as described on its docs page:
Gallium3D is a new architecture for building 3D graphics drivers. Initially supporting Mesa and Linux graphics drivers, Gallium3D is designed to allow portability to all major operating systems and graphics interfaces.
Gallium is a layered library:
- API state tracker
- GPU-specific driver
- winsys (OS-dependent; on Linux, it's DRM)
For the state tracker, the documentation says:
The Mesa state tracker is the piece which interfaces core Mesa to the Gallium3D interface. It’s responsible for translating Mesa state (blend modes, texture state, etc) and drawing commands (like glDrawArrays and glDrawPixels) into pipe objects and operations.
The winsys is basically the DRM interface. More on that later.
Back to i915_clear_render in src/gallium/drivers/i915/i915_clear.c:
void
i915_clear_render(struct pipe_context *pipe, unsigned buffers,
                  const struct pipe_scissor_state *scissor_state,
                  const union pipe_color_union *color, double depth,
                  unsigned stencil)
{
   struct i915_context *i915 = i915_context(pipe);
   if (i915->dirty)
      i915_update_derived(i915);
   i915_clear_emit(pipe, buffers, color, depth, stencil, 0, 0,
                   i915->framebuffer.width, i915->framebuffer.height);
}
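The "if (i915->dirty) i915_update_derived(i915);" line is a lazy-update pattern. Setters only mark what changed, and derived hardware state is recomputed right before it is needed. The sketch below follows that assumption. The struct, the single dirty bit and the 4-bytes-per-pixel pitch are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical dirty bit; the real driver tracks many more. */
#define MINI_NEW_FRAMEBUFFER (1u << 0)

struct mini_context {
    uint32_t dirty;          /* inputs changed since the last update */
    unsigned fb_width;       /* raw state set through the API */
    unsigned derived_pitch;  /* state computed from the raw state */
    unsigned update_count;   /* how often the derivation actually ran */
};

static void mini_set_framebuffer(struct mini_context *ctx, unsigned width)
{
    ctx->fb_width = width;
    ctx->dirty |= MINI_NEW_FRAMEBUFFER;  /* defer the work, just mark it */
}

static void mini_update_derived(struct mini_context *ctx)
{
    if (ctx->dirty & MINI_NEW_FRAMEBUFFER)
        ctx->derived_pitch = ctx->fb_width * 4;  /* assume 4 bytes per pixel */
    ctx->dirty = 0;
    ctx->update_count++;
}

/* Same shape as i915_clear_render: re-derive only when something changed. */
static void mini_clear(struct mini_context *ctx)
{
    if (ctx->dirty)
        mini_update_derived(ctx);
    /* ... emit the clear commands using ctx->derived_pitch ... */
}
```

A second clear with no state change in between skips the derivation entirely. That is the point of the dirty check.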
And i915_clear_emit is defined in the same file:
void
i915_clear_emit(struct pipe_context *pipe, unsigned buffers,
                const union pipe_color_union *color, double depth,
                unsigned stencil, unsigned destx, unsigned desty,
                unsigned width, unsigned height)
In i915_clear_emit, several OUT_BATCH calls fill the batch memory that is later handed to the kernel driver:
OUT_BATCH(_3DSTATE_SCISSOR_ENABLE_CMD | DISABLE_SCISSOR_RECT);
OUT_BATCH(_3DSTATE_CLEAR_PARAMETERS);
OUT_BATCH(CLEARPARAM_WRITE_COLOR | CLEARPARAM_CLEAR_RECT);
/* Used for zone init prim */
OUT_BATCH(clear_color);
OUT_BATCH(clear_depth);
/* Used for clear rect prim */
OUT_BATCH(clear_color8888);
OUT_BATCH_F(f_depth);
OUT_BATCH(clear_stencil);
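clear_color8888 is a single dword that holds the clear color with 8 bits per channel. Here is a sketch of how float RGBA could be packed into such a word. The A8R8G8B8 channel order and the round-to-nearest conversion are assumptions, because the real driver derives the layout from the render target format.

```c
#include <stdint.h>

/* Convert a float in [0,1] to an 8-bit unorm value with rounding. */
static uint8_t mini_unorm8(float v)
{
    if (v <= 0.0f) return 0;
    if (v >= 1.0f) return 255;
    return (uint8_t)(v * 255.0f + 0.5f);
}

/* Hypothetical helper: pack RGBA into one dword, alpha in the top byte. */
static uint32_t mini_pack_argb8888(float r, float g, float b, float a)
{
    return ((uint32_t)mini_unorm8(a) << 24) | ((uint32_t)mini_unorm8(r) << 16) |
           ((uint32_t)mini_unorm8(g) << 8) | (uint32_t)mini_unorm8(b);
}
```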
So, OUT_BATCH is just a 32-bit write to that memory, after which the pointer advances by 4 bytes. You can see this in i915_winsys_batchbuffer_dword_unchecked. Those macros are defined in src/gallium/drivers/i915/i915_batch.h:
#define OUT_BATCH(dword) i915_winsys_batchbuffer_dword(i915->batch, dword)
static inline void
i915_winsys_batchbuffer_dword_unchecked(struct i915_winsys_batchbuffer *batch,
                                        unsigned dword)
{
   *(unsigned *)batch->ptr = dword;
   batch->ptr += 4;
}
static inline void
i915_winsys_batchbuffer_dword(struct i915_winsys_batchbuffer *batch,
                              unsigned dword)
{
   assert(i915_winsys_batchbuffer_space(batch) >= 4);
   i915_winsys_batchbuffer_dword_unchecked(batch, dword);
}
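The batch helpers can be reproduced in a few lines of standalone C. This sketch uses a hypothetical mini_batch with a fixed 64-byte buffer. It assumes OUT_BATCH_F writes the raw IEEE-754 bits of the float as one dword. memcpy replaces the pointer cast so the code stays portable.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simplified batch: map is CPU memory, ptr is the write cursor. */
struct mini_batch {
    uint8_t map[64];
    uint8_t *ptr;
};

static void mini_batch_init(struct mini_batch *b)
{
    b->ptr = b->map;
}

static size_t mini_batch_space(const struct mini_batch *b)
{
    return sizeof(b->map) - (size_t)(b->ptr - b->map);
}

static size_t mini_batch_used(const struct mini_batch *b)
{
    return (size_t)(b->ptr - b->map);
}

/* OUT_BATCH equivalent: store one 32-bit dword and advance by 4 bytes. */
static void mini_batch_dword(struct mini_batch *b, uint32_t dword)
{
    assert(mini_batch_space(b) >= 4);
    memcpy(b->ptr, &dword, sizeof dword);
    b->ptr += 4;
}

/* OUT_BATCH_F equivalent (assumption): emit the float's bit pattern. */
static void mini_batch_float(struct mini_batch *b, float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    mini_batch_dword(b, bits);
}
```

After one dword and one float, the cursor has moved 8 bytes. The float 1.0f lands in memory as 0x3f800000.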
The important part is the call to FLUSH_BATCH, which sends the batch memory above to the kernel. But how?
FLUSH_BATCH(NULL, I915_FLUSH_ASYNC);
FLUSH_BATCH is a call to i915_flush, which calls batchbuffer_flush:
#define FLUSH_BATCH(fence, flags) i915_flush(i915, fence, flags)
void
i915_flush(struct i915_context *i915, struct pipe_fence_handle **fence,
           unsigned flags)
{
   struct i915_winsys_batchbuffer *batch = i915->batch;
   batch->iws->batchbuffer_flush(batch, fence, flags);
   i915->vbo_flushed = 1;
   ...
}
batchbuffer_flush is set to i915_drm_batchbuffer_flush, which calls drm_intel_bo_exec from libdrm:
static void
i915_drm_batchbuffer_flush(struct i915_winsys_batchbuffer *ibatch,
                           struct pipe_fence_handle **fence,
                           enum i915_winsys_flush_flags flags)
{
   struct i915_drm_batchbuffer *batch = i915_drm_batchbuffer(ibatch);
   unsigned used;
   int ret;
   ...
   ...
   /* Do the sending to HW */
   ret = drm_intel_bo_subdata(batch->bo, 0, used, batch->base.map);
   if (ret == 0 && i915_drm_winsys(ibatch->iws)->send_cmd)
      ret = drm_intel_bo_exec(batch->bo, used, NULL, 0, 0);
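The flush path is one more function-pointer hop (iws->batchbuffer_flush), followed by an upload and an exec. The sketch below models that shape with hypothetical mini_* types. The memcpy stands in for drm_intel_bo_subdata, and recording exec_len stands in for the exec ioctl. Nothing here touches a real GPU.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "buffer object" that the kernel would own. */
struct mini_bo {
    uint8_t data[64];
    size_t exec_len;   /* bytes the pretend GPU was asked to run */
};

/* Hypothetical winsys vtable with a flush hook. */
struct mini_winsys {
    int (*batchbuffer_flush)(struct mini_winsys *ws, const uint8_t *map,
                             size_t used, struct mini_bo *bo);
};

/* DRM-flavoured flush: upload the CPU batch, then "exec" it. */
static int mini_drm_flush(struct mini_winsys *ws, const uint8_t *map,
                          size_t used, struct mini_bo *bo)
{
    (void)ws;
    if (used > sizeof(bo->data))
        return -1;                 /* batch does not fit the bo */
    memcpy(bo->data, map, used);   /* upload step */
    bo->exec_len = used;           /* stands in for the exec ioctl */
    return 0;
}
```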
Note that libdrm is a set of user-space wrappers around the DRM ioctls.
Before I jump to the kernel, I want to highlight that libmesa writes commands into a buffer for the HW to execute and uses an ioctl to trigger execution. One example is the clear command _3DSTATE_CLEAR_PARAMETERS, which is defined by Intel (see the PDF for the opcode breakdown):
OUT_BATCH(_3DSTATE_CLEAR_PARAMETERS);
OUT_BATCH((clear_params & ~CLEARPARAM_WRITE_COLOR) |
CLEARPARAM_CLEAR_RECT);
/* Used for zone init prim */
OUT_BATCH(clear_color);
OUT_BATCH(clear_depth);
/* Used for clear rect prim */
OUT_BATCH(clear_color8888);
OUT_BATCH_F(f_depth);
OUT_BATCH(clear_stencil);
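The first dword of such a command is a header that packs the command type, the opcode and a length. The sketch below shows that packing under two assumptions: bits 31:29 hold the command type (3 for the 3D pipeline), and the low byte holds the length as total dwords minus 2. The opcode bits in the usage line are illustrative. Check the Intel PDF for the real field layout of _3DSTATE_CLEAR_PARAMETERS.

```c
#include <stdint.h>

#define MINI_CMD_TYPE_SHIFT 29u
#define MINI_CMD_TYPE_3D    3u
#define MINI_LENGTH_MASK    0xffu   /* assumption: 8-bit length field */

/* Build a header; length convention assumed: total dwords minus 2. */
static uint32_t mini_3d_header(uint32_t opcode_bits, unsigned total_dwords)
{
    return (MINI_CMD_TYPE_3D << MINI_CMD_TYPE_SHIFT) | opcode_bits |
           ((uint32_t)(total_dwords - 2) & MINI_LENGTH_MASK);
}

static unsigned mini_header_type(uint32_t header)
{
    return (unsigned)(header >> MINI_CMD_TYPE_SHIFT);
}

static unsigned mini_header_total_dwords(uint32_t header)
{
    return (unsigned)(header & MINI_LENGTH_MASK) + 2;
}
```

Under these assumptions, mini_3d_header(0x1du << 24, 7) describes a seven-dword command. That matches the clear sequence above: a header followed by six payload dwords.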
We have now seen one way to communicate with the HW: commands written to batch buffers. Next is the DRM interface between libmesa and the Linux kernel.
libmesa uses libdrm, and libdrm eventually calls ioctl. In the libdrm code, drm_intel_bo_exec uses DRM_IOCTL_I915_GEM_EXECBUFFER2_WR:
ret = drmIoctl(bufmgr_gem->fd,
               DRM_IOCTL_I915_GEM_EXECBUFFER2_WR,
               &execbuf);
if (ret != 0) {
    ret = -errno;
    if (ret == -ENOSPC) {
        DBG("Execbuffer fails to pin. "
            "Estimate: %u. Actual: %u. Available: %u\n",
            drm_intel_gem_estimate_batch_space(bufmgr_gem->exec_bos,
                                               bufmgr_gem->exec_count),
            drm_intel_gem_compute_batch_space(bufmgr_gem->exec_bos,
                                              bufmgr_gem->exec_count),
            (unsigned int) bufmgr_gem->gtt_size);
    }
}
drm_intel_update_buffer_offsets2(bufmgr_gem);
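drmIoctl itself is a thin retry loop around ioctl(2). The sketch below works the same way. retry_ioctl is a hypothetical name. It restarts the call when a signal interrupts it or when the driver asks to try again.

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Retry while the call was interrupted (EINTR) or should be retried (EAGAIN). */
static int retry_ioctl(int fd, unsigned long request, void *arg)
{
    int ret;
    do {
        ret = ioctl(fd, request, arg);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));
    return ret;
}
```

On any other failure, the caller still sees -1 with errno set. The libdrm code above then converts it with ret = -errno.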
Linux Intel driver side
In this section, we go through the Intel Linux driver i915, which is built on the kernel's DRM subsystem. The files are in drivers/gpu/drm/i915.
In i915.rst:
The Intel GPU family is a family of integrated GPU’s using Unified Memory Access. For having the GPU “do work”, user space will feed the GPU batch buffers via one of the ioctls DRM_IOCTL_I915_GEM_EXECBUFFER2 or DRM_IOCTL_I915_GEM_EXECBUFFER2_WR. Most such batchbuffers will instruct the GPU to perform work (for example rendering) and that work needs memory from which to read and memory to which to write. All memory is encapsulated within GEM buffer objects (usually created with the ioctl DRM_IOCTL_I915_GEM_CREATE). An ioctl providing a batchbuffer for the GPU to create will also list all GEM buffer objects that the batchbuffer reads and/or writes. For implementation details of memory management see GEM BO Management Implementation Details.
The ioctl callbacks are registered in i915_driver.c. I am only showing the i915_gem_execbuffer2_ioctl entry, because it is the one called for command execution.
static const struct drm_ioctl_desc i915_ioctls[] = {
    ....
    DRM_IOCTL_DEF_DRV(I915_GEM_EXECBUFFER2_WR, i915_gem_execbuffer2_ioctl, DRM_RENDER_ALLOW),
    ....
};
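DRM_IOCTL_DEF_DRV builds one entry of a table that the DRM core consults on every ioctl. The sketch below is a miniature of that idea, with hypothetical names and command numbers. It looks up the handler, rejects commands without the render flag on render nodes with -EACCES, and returns -EINVAL for unknown numbers. The real core indexes the table directly by ioctl number; a linear scan just keeps the sketch short.

```c
#include <errno.h>
#include <stddef.h>

#define MINI_RENDER_ALLOW (1u << 0)   /* hypothetical permission flag */

typedef int (*mini_ioctl_fn)(void *data);

struct mini_ioctl_desc {
    unsigned nr;          /* hypothetical command number */
    mini_ioctl_fn func;   /* handler */
    unsigned flags;       /* permission flags */
};

static int mini_execbuffer2(void *data)
{
    int *submitted = data;
    (*submitted)++;       /* pretend a batch was queued */
    return 0;
}

static int mini_privileged(void *data)
{
    (void)data;
    return 0;
}

static const struct mini_ioctl_desc mini_ioctls[] = {
    { 0x01, mini_execbuffer2, MINI_RENDER_ALLOW },
    { 0x02, mini_privileged,  0 },
};

/* Find the handler and enforce the render-node permission flag. */
static int mini_dispatch(unsigned nr, int is_render_node, void *data)
{
    for (size_t i = 0; i < sizeof mini_ioctls / sizeof mini_ioctls[0]; i++) {
        const struct mini_ioctl_desc *d = &mini_ioctls[i];
        if (d->nr != nr)
            continue;
        if (is_render_node && !(d->flags & MINI_RENDER_ALLOW))
            return -EACCES;
        return d->func(data);
    }
    return -EINVAL;
}
```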
i915_gem_execbuffer2_ioctl copies the list of execution objects from user space and calls i915_gem_do_execbuffer. That list holds the buffer descriptors, not the commands themselves. I will stop here to recover from PTSD.
if (copy_from_user(exec2_list,
                   u64_to_user_ptr(args->buffers_ptr),
                   sizeof(*exec2_list) * count)) {
    drm_dbg(&i915->drm, "copy %zd exec entries failed\n", count);
    kvfree(exec2_list);
    return -EFAULT;
}
err = i915_gem_do_execbuffer(dev, file, args, exec2_list);
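The copy step follows a common kernel shape: a bounded allocation, a copy, and freeing the allocation if the copy faults. The sketch below is a user-space analogue under these assumptions:
- mini_exec_object is a hypothetical two-field stand-in for the real exec object struct.
- malloc and memcpy replace the kernel allocation and copy_from_user.
- A NULL source pointer plays the role of a faulting user address.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one exec object entry. */
struct mini_exec_object {
    uint32_t handle;
    uint64_t offset;
};

/* Allocate, copy, and clean up on failure; returns 0 or a negative errno. */
static int mini_copy_exec_list(const struct mini_exec_object *src, size_t count,
                               struct mini_exec_object **out)
{
    if (count == 0 || count > SIZE_MAX / sizeof(*src))
        return -EINVAL;                  /* reject empty or overflowing sizes */
    struct mini_exec_object *list = malloc(count * sizeof(*list));
    if (!list)
        return -ENOMEM;
    if (!src) {                          /* the "copy_from_user failed" path */
        free(list);
        return -EFAULT;
    }
    memcpy(list, src, count * sizeof(*list));
    *out = list;                         /* caller owns and frees the copy */
    return 0;
}
```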