`

Padding and alignment: How I fixed a bug and learned this 1 GCC trick

While working with the python-can library, I ran into a problem with the SocketCAN interface that ended up with me looking into why the periodic backend wasn't working as expected, finding an incorrect assumption about alignment, and becoming a maintainer.

My fix is now available on PyPI as part of python-can 3.2.0!

Background

Padding is added to C structures in order to satisfy the alignment requirements of various types. The Lost Art of Structure Packing is the classic article that covers this information—it's how I initially learned about this when studying for internship interviews.

However, in a nutshell, in addition to each scalar type having a size, they also each have their own alignment requirements regarding where they should begin. Moreover, structs may add additional padding in order to make the size of the struct aligned to the width of its largest scalar member.

If you've worked with low-level programming and attempted to write portable code (or interviewed for a low-level position, since struct packing questions seem to come up very frequently), you're probably very aware that the size of various integer types is not guaranteed to be consistent across various platforms. As such, the general recommendation is to use the fixed width integer types from <stdint.h> if a specific width is needed.

However, what's often glossed over is that the alignment of these types is also not guaranteed.

SocketCAN is a set of Networking Drivers used to interface with CAN buses. It provides a unified API exposed through Berkeley sockets (and allows developers to take advantage of the network queuing code) instead of custom device drivers, and perhaps most importantly, provides a layer of abstraction for communicating with CAN controllers.

Now, the python-can library overloads the generic userspace implementation for transmitting cyclic CAN messages, and instead, pushes it into the kernel's Broadcast Manager (BCM) APIs for the SocketCAN interface. The kernel exposes a set of BCM headers to userspace, which provides an API for userspace applications to communicate with the kernel.

While reading through the kernel APIs for the Broadcast Manager, I learned about this trick that you can use with zero-length arrays in GNU C (with extensions) that the kernel uses to declare structs of variable length.

/**
 * struct bcm_msg_head - head of messages to/from the broadcast manager
 * @opcode:    opcode, see enum below.
 * @flags:     special flags, see below.
 * @count:     number of frames to send before changing interval.
 * @ival1:     interval for the first @count frames.
 * @ival2:     interval for the following frames.
 * @can_id:    CAN ID of frames to be sent or received.
 * @nframes:   number of frames appended to the message head.
 * @frames:    array of CAN frames.
 */
struct bcm_msg_head {
    __u32 opcode;
    __u32 flags;
    __u32 count;
    struct bcm_timeval ival1, ival2;
    canid_t can_id;
    __u32 nframes;
    struct can_frame frames[0];
};

I guess the main benefit of this is so that you can allocate the memory in one contiguous block.

struct {
  struct bcm_msg_head msg_head;
  struct can_frame frame[1];
} msg;

However, due to this, despite the length of the array member being zero, it still affects the padding of the enclosing struct, which makes sense. The idea is that we're going to allocate a variable-length array of that type immediately after the header, so aligning the struct is a good idea.

In the Linux headers, can_frame specifies __attribute__((aligned(8))), which denotes that the struct should use 8-byte wide alignment. As such, despite the array of frames being 0, the 8-byte alignment requirement is carried over.

/**
 * struct can_frame - basic CAN frame structure
 * @can_id:  CAN ID of the frame and CAN_*_FLAG flags, see canid_t definition
 * @can_dlc: frame payload length in byte (0 .. 8) aka data length code
 *           N.B. the DLC field from ISO 11898-1 Chapter 8.4.2.3 has a 1:1
 *           mapping of the 'data length code' to the real payload length
 * @__pad:   padding
 * @__res0:  reserved / padding
 * @__res1:  reserved / padding
 * @data:    CAN frame payload (up to 8 byte)
 */
struct can_frame {
    canid_t can_id;  /* 32 bit CAN_ID + EFF/RTR/ERR flags */
    __u8    can_dlc; /* frame payload length in byte (0 .. CAN_MAX_DLEN) */
    __u8    __pad;   /* padding */
    __u8    __res0;  /* reserved / padding */
    __u8    __res1;  /* reserved / padding */
    __u8    data[CAN_MAX_DLEN] __attribute__((aligned(8)));
};

The Problem

The periodic SocketCAN backend didn't seem to work correctly on 32-bit operating systems. I was running 32-bit Ubuntu 16.04, and was getting an Exception whenever I tried sending cyclic messages.

can.CanError: Couldn't send CAN BCM frame. OS Error 22: Invalid argument
You are probably referring to a non-existing frame.

Debugging the Problem

As it turns out, someone had reported the same bug a few months earlier, but no progress had been made. Some initial debugging in the ticket noted that the size of the struct format string being used was different on two 32-bit platforms—Raspbian Stretch and Debian Stretch. Could this be due to the alignment of the types used in the struct format string?

I first verified that the size of struct bcm_msg_head was the 40 bytes that I expected. So I wrote a test C program and compiled it on each platform. I also took advantage of this opportunity to dump some additional information that I thought might be useful.

#include <linux/can/bcm.h>

#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

int main() {
  struct bcm_msg_head test = { 0 };

  printf("sizeof(long): %zu\n", sizeof(long));
  printf("alignof(long): %zu\n", alignof(long));
  printf("sizeof(long long): %zu\n", sizeof(long long));
  printf("alignof(long long): %zu\n", alignof(long long));
  printf("sizeof(struct bcm_msg_head): %zu\n", sizeof(struct bcm_msg_head));
  printf("alignof(struct bcm_msg_head): %zu\n", alignof(struct bcm_msg_head));
  return 0;
}

The original format string used was @3I4l2I0q, which forces padding bytes to be inserted at the end in order to meet alignment rules.

So I also dumped the size (using sizeof) and alignment (using alignof) of long long.

We can write a GDB script to dump the offsets of the struct members in order to check where the padding inserted.

b test.c:11
r
p sizeof(long)
p sizeof(test)
p &test.opcode
p &test.flags
p &test.count
p &test.ival1
p &test.ival1.tv_sec
p &test.ival1.tv_usec
p &test.ival2
p &test.ival2.tv_sec
p &test.ival2.tv_usec
p &test.can_id
p &test.nframes

On my laptop (a T440s running Ubuntu 18.04 LTS), this is how the layout looks like:

struct bcm_msg_head {
    __u32 opcode;               /* 4 bytes */
    __u32 flags;                /* 4 bytes */
    __u32 count;                /* 4 bytes */
    __u32 pad;                  /* pad 4 bytes to an 8-byte boundary */
    struct bcm_timeval ival1;   /* 16 bytes */
    struct bcm_timeval ival2;   /* 16 bytes */
    canid_t can_id;             /* 4 bytes */
    __u32 nframes;              /* 4 bytes */
};

On Debian Stretch (32-bit) and Raspbian Stretch, this is how the layout looks like:

struct bcm_msg_head {
    __u32 opcode;               /* 4 bytes */
    __u32 flags;                /* 4 bytes */
    __u32 count;                /* 4 bytes */
    struct bcm_timeval ival1;   /* 8 bytes */
    struct bcm_timeval ival2;   /* 8 bytes */
    canid_t can_id;             /* 4 bytes */
    __u32 nframes;              /* 4 bytes */
    __u32 pad;                  /* pad 4 bytes to an 8-byte boundary */
};

This revealed that on Raspbian Stretch, alignof(long long) == 8, but on 32-bit Debian Stretch, alignof(long long) == 4.

As such, if we apply the alignment requirements to the struct format string, we would see the following in the successful case (when alignof(long long) == 8):

// @: assuming native types and alignments are being used
// 3l
uint32_t opcode;     // sizeof: 4, alignof: 4
uint32_t flags;      // sizeof: 4, alignof: 4
uint32_t count;      // sizeof: 4, alignof: 4
// 4l
long ival1.tv_sec;   // sizeof: 4, alignof: 4
long ival1.tv_usec;  // sizeof: 4, alignof: 4
long ival2.tv_sec;   // sizeof: 4, alignof: 4
long ival2.tv_usec;  // sizeof: 4, alignof: 4
// 2l
uint32_t can_id;     // sizeof: 4, alignof: 4
uint32_t nframes;    // sizeof: 4, alignof: 4
// 0q
// long long frames; // sizeof: 0, alignof: 8
// Add padding bytes due to the the alignment requirements of above
// until we are a multiple of 8
uint8_t pad[4];

However, when alignof(long long) == 4, we see the following

// @: assuming native types and alignments are being used
// 3l
uint32_t opcode;     // sizeof: 4, alignof: 4
uint32_t flags;      // sizeof: 4, alignof: 4
uint32_t count;      // sizeof: 4, alignof: 4
// 4l
long ival1.tv_sec;   // sizeof: 4, alignof: 4
long ival1.tv_usec;  // sizeof: 4, alignof: 4
long ival2.tv_sec;   // sizeof: 4, alignof: 4
long ival2.tv_usec;  // sizeof: 4, alignof: 4
// 2l
uint32_t can_id;     // sizeof: 4, alignof: 4
uint32_t nframes;    // sizeof: 4, alignof: 4
// 0q
// long long frames; // sizeof: 0, alignof: 4
// No padding bytes are required, since everything is 4-byte aligned!

Since everything was already 4-byte aligned, no additional padding was needed. As such, this is only 36 bytes, instead of the expected 40.

Creating the Fix

I first thought that maybe we could salvage the struct format string, and find a type that was guaranteed to be 8-byte aligned. Unfortunately, I don't believe one exists.

I then looked into ctypes and CFFI, and they both seemed promising. I initially thought that I could just specify the _pack_ attribute on the ctypes object, and then it would align everything to the 8-byte boundary. This would result in a cleaner solution, since everything would be encapsulated by the ctypes library. No additional code would need to be written—the only change would be to convert the struct format string to the ctypes equivalent.

However, upon a closer inspection of the documentation, this attribute only specifies the minimum alignment, and not the maximum. As such, this could not be used to solve the problem I was facing.

At this point, I decided to look into CFFI. CFFI allows you to directly copy your C types over, and it tries to "just work". I was hopeful that it would be able to derive the alignment rules automatically, but unfortunately, it doesn't support __attribute__((aligned)), so that was another dead end.

It was looking more and more like we would need to somehow insert padding bytes ourselves. I was hoping to find some library that I could pull in to do that work, banking that someone else had done the legwork already, but none seemed to exist.

Then I thought about it, and it didn't seem too bad.

The whole thing could be handled by a greedy function that tried to place a scalar member, and inserted padding either when the current position wasn't aligned for the current member trying to get placed, or at the end in order for the size of the struct to become a multiple of the widest scalar member. ctypes exposed alignment functionality, so all I needed to do was programatically build a ctypes.Structure for struct bcm_msg_head.

The only snag was that ctypes requires that _fields_ must be known in advance when a class is declared. However, since we don't know how much padding needs to be inserted, this meant that I needed to create a factory that would derive the alignment rules and then return the BcmMsgHeader class.

I wrote a few test cases using mocks providing the data I had gathered from the various platforms as a sanity check to make sure I wasn't breaking something.

Once I finished programming and ran my test SocketCAN application, it ran without any problems. I had to check to make sure I was ssh'd into the right box, and not using one of the working reproduction cases.

Reflection

This was a fun problem to debug, and is one of those problems where the solution becomes obvious when you understand what is going on. Once I understood what the problem was, coding a solution didn't take more than 5 minutes. It was fairly isolated from the rest of the code base, and was reasonably easy to reproduce.

I think after interning at NVIDIA and Tesla, I've become more comfortable diving into unfamiliar code bases in order to figure out what's going on.

NVIDIA was probably the most challenging, since not only was it a large codebase, but I had never worked on the Linux kernel before, and so everything was new to me. I had no mental model of how things should work, besides what was covered in my Operating Systems course (which I didn't find too helpful). At Tesla, I at least had some background about the components of an Electric Vehicle, which helped me build a mental model of how things should work. Working on Tegra Device Drivers at NVIDIA showed me just how much knowledge other individuals have about various subjects that I don't, which was inspiring to see. It really highlighted how there's always more for me to learn.

Anyways, I've been enjoying my time off between graduating and starting full-time just working on things like this. It's nice not having any assignment deadlines hanging over me, and just being able to relax, learn, and work on things I find interesting.