This post is a deep dive into how an OpenGL application interacts with the underlying software stack: the OpenGL implementation in user space and the graphics driver in the kernel. The stack looks roughly like this:

User Land ------------> OpenGL implementation (Mesa) -------------> Intel driver (i915) -----------> HW

Userland1: application and GLUT

Let's start with a simple application that draws a square polygon. It uses libglut for window management to display the OpenGL output. Cool!

#include <GL/glut.h>

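/* Display callback: clears the window and draws one square */
void displayMe(void)
{
    glClear(GL_COLOR_BUFFER_BIT);   /* the call traced through the stack below */
    glBegin(GL_POLYGON);            /* immediate-mode polygon with 4 vertices */
    glVertex3f(0.0, 0.0, 0.0);
    glVertex3f(0.5, 0.0, 0.0);
    glVertex3f(0.5, 0.5, 0.0);
    glVertex3f(0.0, 0.5, 0.0);
    glEnd();
    glFlush();                      /* single-buffered: push queued commands out */
}

int main(int argc, char **argv)
{
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_SINGLE);   /* single-buffered window */

    glutInitWindowSize(300, 300);
    glutInitWindowPosition(100, 100);
    glutCreateWindow("Hello world");

    glutDisplayFunc(displayMe);     /* called whenever the window needs redrawing */

    glutMainLoop();                 /* event loop; normally does not return */
    return 0;
}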

For reference, GLUT is the OpenGL Utility Toolkit, a minimal window-management toolkit. The popular implementation is freeglut3 since, according to the linked page, the original GLUT library is deprecated.

/usr/include/GL/freeglut_std.h includes the GL/gl.h and GL/glu.h headers:

#   include <GL/gl.h>
#   include <GL/glu.h>

Userland2: OpenGL implementation libraries - Mesa

OpenGL is just an API specification; the underlying system has to implement those APIs. I am using Mesa since it's a popular option (it implements other APIs such as Vulkan as well).

OK, let's look at the next layer, libmesa. Listing the shared objects with ldd shows that, as expected, the binary links to libglut.so and libGL.so:

$ ldd main
	libglut.so.3 => /lib/x86_64-linux-gnu/libglut.so.3 (0x00007fcb71176000)
	libGL.so.1 => /lib/x86_64-linux-gnu/libGL.so.1 (0x00007fcb710ef000)

Listing the dynamic symbols of libGL.so shows that glClear is exported, as expected:

$ nm -D /lib/x86_64-linux-gnu/libGL.so.1 | rg " glClear$"
0000000000047700 T glClear

The internal clear() helper is defined in src/mesa/main/clear.c. The important part is the call to st_Clear:

static ALWAYS_INLINE void
clear(struct gl_context *ctx, GLbitfield mask, bool no_error)
{
   FLUSH_VERTICES(ctx, 0, 0);

   if (!no_error) {
      if (mask & ~(GL_COLOR_BUFFER_BIT |
                   GL_DEPTH_BUFFER_BIT |
                   GL_STENCIL_BUFFER_BIT |
                   GL_ACCUM_BUFFER_BIT)) {
         _mesa_error( ctx, GL_INVALID_VALUE, "glClear(0x%x)", mask);
         return;
      }

....
....
      st_Clear(ctx, bufferMask);

st_Clear eventually calls pipe->clear:

      st->pipe->clear(st->pipe, clear_buffers, have_scissor_buffers ? &scissor_state : NULL,
                      (union pipe_color_union*)&ctx->Color.ClearColor,
                      ctx->Depth.Clear, ctx->Stencil.Clear);

For Intel i915, the driver points clear at one of the two implementations in i915_clear.c, depending on whether the use_blitter debug option is enabled:

   if (i915_screen(screen)->debug.use_blitter)
      i915->base.clear = i915_clear_blitter;
   else
      i915->base.clear = i915_clear_render;

   i915->base.draw_vbo = i915_draw_vbo;
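The indirection above is plain C function pointers: the state tracker only ever calls pipe->clear, and the driver decides once, when the context is created, which implementation sits behind it. Below is a minimal sketch of that pattern; toy_pipe_context and the toy_* functions are hypothetical stand-ins, not Mesa's real struct pipe_context or i915 functions.

```c
#include <stdbool.h>

/* Which driver path ran; only used to observe the dispatch in this demo. */
enum toy_path { TOY_PATH_NONE, TOY_PATH_BLITTER, TOY_PATH_RENDER };

/* Hypothetical, heavily simplified stand-in for Gallium's pipe_context:
 * the frontend only sees function pointers, the driver fills them in. */
struct toy_pipe_context {
   void (*clear)(struct toy_pipe_context *pipe, unsigned buffers);
   enum toy_path last_path;
};

/* Two driver-side implementations, mirroring the blitter/render split. */
static void toy_clear_blitter(struct toy_pipe_context *pipe, unsigned buffers)
{
   (void)buffers;
   pipe->last_path = TOY_PATH_BLITTER;
}

static void toy_clear_render(struct toy_pipe_context *pipe, unsigned buffers)
{
   (void)buffers;
   pipe->last_path = TOY_PATH_RENDER;
}

/* Driver: choose the hook once, at context creation. */
static void toy_create_context(struct toy_pipe_context *pipe, bool use_blitter)
{
   pipe->clear = use_blitter ? toy_clear_blitter : toy_clear_render;
   pipe->last_path = TOY_PATH_NONE;
}

/* Frontend (state tracker): calls through the pointer without knowing
 * which hardware driver implements it. */
static void toy_st_clear(struct toy_pipe_context *pipe, unsigned buffers)
{
   pipe->clear(pipe, buffers);
}
```

The frontend code stays identical whichever hook was picked, which is what lets one state tracker sit on top of many hardware drivers.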

In src/gallium/drivers/i915/i915_clear.c, i915_clear_render calls i915_clear_emit. Note that Gallium is the internal API between Mesa's API implementations and the hardware drivers, as described on its docs page:

Gallium3D is a new architecture for building 3D graphics drivers. Initially supporting Mesa and Linux graphics drivers, Gallium3D is designed to allow portability to all major operating systems and graphics interfaces.

Gallium is a layered library:

  • API state tracker
  • GPU-specific driver
  • winsys (OS-dependent; on Linux, it's DRM)

For the state tracker, the documentation says:

The Mesa state tracker is the piece which interfaces core Mesa to the Gallium3D interface. It’s responsible for translating Mesa state (blend modes, texture state, etc) and drawing commands (like glDrawArrays and glDrawPixels) into pipe objects and operations.

The winsys is basically the DRM interface. More on that later.

Back to i915_clear_render in src/gallium/drivers/i915/i915_clear.c.

void
i915_clear_render(struct pipe_context *pipe, unsigned buffers,
                  const struct pipe_scissor_state *scissor_state,
                  const union pipe_color_union *color, double depth,
                  unsigned stencil)
{
   struct i915_context *i915 = i915_context(pipe);

   if (i915->dirty)
      i915_update_derived(i915);

   i915_clear_emit(pipe, buffers, color, depth, stencil, 0, 0,
                   i915->framebuffer.width, i915->framebuffer.height);
}

And i915_clear_emit is defined in the same file.

void
i915_clear_emit(struct pipe_context *pipe, unsigned buffers,
                const union pipe_color_union *color, double depth,
                unsigned stencil, unsigned destx, unsigned desty,
                unsigned width, unsigned height)

In i915_clear_emit, a series of OUT_BATCH calls fills the batch buffer that is later handed to the kernel driver:

      OUT_BATCH(_3DSTATE_SCISSOR_ENABLE_CMD | DISABLE_SCISSOR_RECT);

      OUT_BATCH(_3DSTATE_CLEAR_PARAMETERS);
      OUT_BATCH(CLEARPARAM_WRITE_COLOR | CLEARPARAM_CLEAR_RECT);
      /* Used for zone init prim */
      OUT_BATCH(clear_color);
      OUT_BATCH(clear_depth);
      /* Used for clear rect prim */
      OUT_BATCH(clear_color8888);
      OUT_BATCH_F(f_depth);
      OUT_BATCH(clear_stencil);

OUT_BATCH is just a 32-bit write into that memory followed by advancing the pointer by 4 bytes, as seen in i915_winsys_batchbuffer_dword_unchecked (the checked variant first asserts that there is room). These macros are defined in src/gallium/drivers/i915/i915_batch.h:

#define OUT_BATCH(dword) i915_winsys_batchbuffer_dword(i915->batch, dword)

static inline void
i915_winsys_batchbuffer_dword_unchecked(struct i915_winsys_batchbuffer *batch,
                                        unsigned dword)
{
   *(unsigned *)batch->ptr = dword;
   batch->ptr += 4;
}

static inline void
i915_winsys_batchbuffer_dword(struct i915_winsys_batchbuffer *batch,
                              unsigned dword)
{
   assert(i915_winsys_batchbuffer_space(batch) >= 4);
   i915_winsys_batchbuffer_dword_unchecked(batch, dword);
}
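To make the "write a dword, bump the pointer" idea concrete, here is a minimal standalone sketch of a batch buffer. The toy_batch type and its fixed 64-byte size are assumptions for illustration, loosely modelled on the map/ptr pair in i915_winsys_batchbuffer, not the real Mesa structure.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy batch buffer: fixed-size storage plus a write cursor. */
#define TOY_BATCH_BYTES 64

struct toy_batch {
   uint8_t map[TOY_BATCH_BYTES]; /* CPU-side copy of the batch */
   uint8_t *ptr;                 /* next free byte */
};

static void toy_batch_reset(struct toy_batch *b)
{
   b->ptr = b->map;
}

/* Bytes left before the end of the buffer. */
static size_t toy_batch_space(const struct toy_batch *b)
{
   return (size_t)(b->map + TOY_BATCH_BYTES - b->ptr);
}

/* Same idea as OUT_BATCH: store one 32-bit dword, advance by 4 bytes. */
static void toy_out_batch(struct toy_batch *b, uint32_t dword)
{
   assert(toy_batch_space(b) >= 4);
   memcpy(b->ptr, &dword, sizeof dword); /* avoids alignment/aliasing issues */
   b->ptr += 4;
}

/* Bytes written so far, i.e. what a flush would hand to the kernel. */
static size_t toy_batch_used(const struct toy_batch *b)
{
   return (size_t)(b->ptr - b->map);
}
```

Unlike the driver's direct `*(unsigned *)batch->ptr = dword` store, this sketch uses memcpy so it stays well-defined C regardless of the buffer's alignment; the effect on the bytes is the same.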

The important part is the call to FLUSH_BATCH, which sends the batch memory above to the kernel. But how?

   FLUSH_BATCH(NULL, I915_FLUSH_ASYNC);

FLUSH_BATCH expands to a call to i915_flush, which calls batchbuffer_flush through the winsys:

#define FLUSH_BATCH(fence, flags) i915_flush(i915, fence, flags)
void
i915_flush(struct i915_context *i915, struct pipe_fence_handle **fence,
           unsigned flags)
{
   struct i915_winsys_batchbuffer *batch = i915->batch;

   batch->iws->batchbuffer_flush(batch, fence, flags);
   i915->vbo_flushed = 1;

batchbuffer_flush is set to i915_drm_batchbuffer_flush, which uploads the batch contents with drm_intel_bo_subdata and then submits them with drm_intel_bo_exec; both functions are part of libdrm:

static void
i915_drm_batchbuffer_flush(struct i915_winsys_batchbuffer *ibatch,
                           struct pipe_fence_handle **fence,
                           enum i915_winsys_flush_flags flags)
{
   struct i915_drm_batchbuffer *batch = i915_drm_batchbuffer(ibatch);
   unsigned used;
   int ret;
...
...

   /* Do the sending to HW */
   ret = drm_intel_bo_subdata(batch->bo, 0, used, batch->base.map);
   if (ret == 0 && i915_drm_winsys(ibatch->iws)->send_cmd)
      ret = drm_intel_bo_exec(batch->bo, used, NULL, 0, 0);

Note that libdrm is a set of user-space wrappers around the kernel's DRM ioctls.

Before jumping to the kernel, I want to highlight that libmesa writes commands into a buffer for the hardware to execute and then uses an ioctl to trigger execution. For example, the clear command _3DSTATE_CLEAR_PARAMETERS is defined by Intel (see the PDF for the opcode breakdown):

      OUT_BATCH(_3DSTATE_CLEAR_PARAMETERS);
      OUT_BATCH((clear_params & ~CLEARPARAM_WRITE_COLOR) |
                CLEARPARAM_CLEAR_RECT);
      /* Used for zone init prim */
      OUT_BATCH(clear_color);
      OUT_BATCH(clear_depth);
      /* Used for clear rect prim */
      OUT_BATCH(clear_color8888);
      OUT_BATCH_F(f_depth);
      OUT_BATCH(clear_stencil);
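Each command starts with a header dword whose bit fields identify it. The sketch below packs and unpacks such a header under an assumed layout for the gen2/gen3 3DSTATE commands (bits 31:29 command type, 28:24 opcode, 23:16 sub-opcode, 7:0 length field); check the field widths against the Intel PDF before relying on them. The toy_* names are hypothetical, and the sub-opcode used in the usage check is an arbitrary value, not the real encoding of _3DSTATE_CLEAR_PARAMETERS.

```c
#include <stdint.h>

/* ASSUMED header layout (verify against the Intel PRM):
 *   bits 31:29  command type (3 = 3D)
 *   bits 28:24  opcode
 *   bits 23:16  sub-opcode
 *   bits  7:0   DWord length field */
struct toy_cmd_header {
   unsigned type;
   unsigned opcode;
   unsigned subop;
   unsigned length;
};

/* Pack the fields into one header dword, masking each to its width. */
static uint32_t toy_encode_header(struct toy_cmd_header h)
{
   return ((uint32_t)(h.type & 0x7u) << 29) |
          ((uint32_t)(h.opcode & 0x1fu) << 24) |
          ((uint32_t)(h.subop & 0xffu) << 16) |
          (uint32_t)(h.length & 0xffu);
}

/* Unpack a header dword back into its fields. */
static struct toy_cmd_header toy_decode_header(uint32_t dw)
{
   struct toy_cmd_header h;
   h.type = (dw >> 29) & 0x7u;
   h.opcode = (dw >> 24) & 0x1fu;
   h.subop = (dw >> 16) & 0xffu;
   h.length = dw & 0xffu;
   return h;
}
```

For example, type 3, opcode 0x1d, sub-opcode 0x42 and length 5 pack into 0x7d420005.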


Now that we have seen how commands reach the hardware through batch buffers, let's look at the DRM interface between libmesa and the Linux kernel. libmesa uses libdrm, and libdrm eventually calls ioctl. In the libdrm code, drm_intel_bo_exec ends up issuing DRM_IOCTL_I915_GEM_EXECBUFFER2_WR:

	ret = drmIoctl(bufmgr_gem->fd,
		       DRM_IOCTL_I915_GEM_EXECBUFFER2_WR,
		       &execbuf);
	if (ret != 0) {
		ret = -errno;
		if (ret == -ENOSPC) {
			DBG("Execbuffer fails to pin. "
			    "Estimate: %u. Actual: %u. Available: %u\n",
			    drm_intel_gem_estimate_batch_space(bufmgr_gem->exec_bos,
							       bufmgr_gem->exec_count),
			    drm_intel_gem_compute_batch_space(bufmgr_gem->exec_bos,
							      bufmgr_gem->exec_count),
			    (unsigned int) bufmgr_gem->gtt_size);
		}
	}
	drm_intel_update_buffer_offsets2(bufmgr_gem);
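drmIoctl itself is a thin wrapper: it retries ioctl(2) while the call fails with EINTR or EAGAIN, so callers like the code above only ever see a "real" error in errno. Here is a minimal sketch of that loop; toy_drm_ioctl is a hypothetical name, and the usage check exercises it with FIONREAD on a pipe only because that needs no GPU.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <unistd.h> /* pipe(), write() for the usage check */

/* Restart the ioctl while it is interrupted by a signal (EINTR) or asks
 * to be retried (EAGAIN); any other failure is returned as -1 with errno
 * set, just like ioctl(2). */
static int toy_drm_ioctl(int fd, unsigned long request, void *arg)
{
   int ret;

   do {
      ret = ioctl(fd, request, arg);
   } while (ret == -1 && (errno == EINTR || errno == EAGAIN));

   return ret;
}
```

This is why the libdrm snippet reads errno right after a non-zero return: by then, transient interruptions have already been retried away.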

Linux Intel driver side

In this section, we go through the Intel Linux driver, i915, which is built on the kernel's DRM subsystem. Its files live under drivers/gpu/drm/i915.

From the kernel documentation in i915.rst:

The Intel GPU family is a family of integrated GPU’s using Unified Memory Access. For having the GPU “do work”, user space will feed the GPU batch buffers via one of the ioctls DRM_IOCTL_I915_GEM_EXECBUFFER2 or DRM_IOCTL_I915_GEM_EXECBUFFER2_WR. Most such batchbuffers will instruct the GPU to perform work (for example rendering) and that work needs memory from which to read and memory to which to write. All memory is encapsulated within GEM buffer objects (usually created with the ioctl DRM_IOCTL_I915_GEM_CREATE). An ioctl providing a batchbuffer for the GPU to create will also list all GEM buffer objects that the batchbuffer reads and/or writes. For implementation details of memory management see GEM BO Management Implementation Details.
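In other words, an execbuffer request is "a batch plus the list of every buffer object it touches". The sketch below models that shape with hypothetical toy_* structs; the real uapi structs are drm_i915_gem_execbuffer2 and drm_i915_gem_exec_object2 in i915_drm.h and carry many more fields. It assumes i915's default convention that the batch buffer is the last object in the list.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "GPU writes this BO" flag for the demo. */
#define TOY_OBJECT_WRITE 0x1u

/* One buffer object referenced by the batch. */
struct toy_exec_object {
   uint32_t handle; /* GEM handle, e.g. from DRM_IOCTL_I915_GEM_CREATE */
   uint64_t flags;
};

/* Simplified execbuffer request. */
struct toy_execbuf {
   const struct toy_exec_object *objects; /* every BO the batch touches */
   uint32_t count;
   uint32_t batch_len;                    /* bytes of commands in the batch */
};

/* Default convention assumed here: the batch is the last object.
 * Returns 0 (never a valid GEM handle) for an empty list. */
static uint32_t toy_batch_handle(const struct toy_execbuf *eb)
{
   if (eb->count == 0)
      return 0;
   return eb->objects[eb->count - 1].handle;
}

/* How many listed objects the batch declares it will write. */
static uint32_t toy_count_writes(const struct toy_execbuf *eb)
{
   uint32_t n = 0;
   for (uint32_t i = 0; i < eb->count; i++)
      if (eb->objects[i].flags & TOY_OBJECT_WRITE)
         n++;
   return n;
}
```

Listing all objects up front is what lets the kernel make every buffer resident and track read/write dependencies before the GPU runs the batch.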

The ioctl callbacks are registered in i915_driver. I am only showing i915_gem_execbuffer2_ioctl, since it is the one called for command execution:

static const struct drm_ioctl_desc i915_ioctls[] = {
   ....
	DRM_IOCTL_DEF_DRV(I915_GEM_EXECBUFFER2_WR, i915_gem_execbuffer2_ioctl, DRM_RENDER_ALLOW),
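A table like i915_ioctls boils down to an array of handler/flag pairs indexed by the ioctl number, with a permission check before the handler runs. The sketch below is a hypothetical, simplified model of that dispatch (toy_* names and numbering are invented; real DRM also offsets driver ioctls by a command base and checks more permissions). DRM_RENDER_ALLOW is modelled as a flag that lets render-node clients call the handler.

```c
#include <errno.h>
#include <stddef.h>

#define TOY_RENDER_ALLOW 0x1u /* hypothetical stand-in for DRM_RENDER_ALLOW */

typedef int (*toy_ioctl_fn)(void *data);

struct toy_ioctl_desc {
   toy_ioctl_fn func;
   unsigned flags;
};

/* Stand-in handler: just counts submissions. */
static int toy_execbuffer(void *data)
{
   int *submitted = data;
   *submitted += 1;
   return 0;
}

/* Table indexed by (hypothetical) ioctl number; slot 0 is unused. */
static const struct toy_ioctl_desc toy_ioctls[] = {
   [0] = { NULL, 0 },
   [1] = { toy_execbuffer, TOY_RENDER_ALLOW },
};

static int toy_dispatch(unsigned nr, int is_render_node, void *data)
{
   size_t n = sizeof(toy_ioctls) / sizeof(toy_ioctls[0]);

   if (nr >= n || !toy_ioctls[nr].func)
      return -EINVAL;                 /* unknown ioctl number */
   if (is_render_node && !(toy_ioctls[nr].flags & TOY_RENDER_ALLOW))
      return -EACCES;                 /* not permitted on render nodes */
   return toy_ioctls[nr].func(data);
}
```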

i915_gem_execbuffer2_ioctl copies the array of exec objects (args->buffers_ptr) from user space and calls i915_gem_do_execbuffer. I will stop here to recover from PTSD.

	if (copy_from_user(exec2_list,
			   u64_to_user_ptr(args->buffers_ptr),
			   sizeof(*exec2_list) * count)) {
		drm_dbg(&i915->drm, "copy %zd exec entries failed\n", count);
		kvfree(exec2_list);
		return -EFAULT;
	}

	err = i915_gem_do_execbuffer(dev, file, args, exec2_list);
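The copy step above follows a common kernel pattern: validate the user-supplied count, allocate an array while guarding the size multiplication against overflow, then copy the entries in. Here is a user-space sketch of that pattern, where memcpy plays the role of copy_from_user and toy_exec_entry / toy_copy_exec_list are hypothetical names; the exact limits the real ioctl enforces may differ.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one exec object entry. */
struct toy_exec_entry {
   uint32_t handle;
   uint64_t flags;
};

/* Copy `count` entries from a caller-provided buffer into a freshly
 * allocated array. Returns 0 on success or a negative errno value. */
static int toy_copy_exec_list(const void *user_ptr, size_t count,
                              struct toy_exec_entry **out)
{
   struct toy_exec_entry *list;

   *out = NULL;
   /* Reject empty lists and counts whose byte size would overflow. */
   if (count == 0 || count > SIZE_MAX / sizeof(*list))
      return -EINVAL;

   list = malloc(sizeof(*list) * count);
   if (!list)
      return -ENOMEM;

   memcpy(list, user_ptr, sizeof(*list) * count); /* "copy_from_user" */
   *out = list;
   return 0;
}
```

Checking the count before multiplying matters because the count comes straight from user space: an overflowed size would allocate a small buffer and then copy past its end.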