Vulnerability Localization
In the initial phase of vulnerability analysis, due to the absence of readily available PoCs or detailed analysis reports, we first attempted to read and understand the patch code for CVE-2023-4863 in the upstream repository of webmproject/libwebp. However, the WebM Project’s official patch was relatively complex, making it difficult for us to accurately pinpoint the root cause of the vulnerability.
Thus, we turned our attention to Apple’s official patch for CVE-2023-41064, and performed a comparison of the ImageIO framework before and after the update using BinDiff. We noticed that Apple’s patch involved fewer code changes and was much easier to understand.
In short, Apple’s fix introduces an additional check in the WebP decoder: if there is an out-of-bounds write while constructing the Huffman Table, it directly returns an error instead of continuing the decoding process.
diff --git a/src/dec/vp8l_dec.c b/src/dec/vp8l_dec.c
index 45012162..06b142bc 100644
--- a/src/dec/vp8l_dec.c
+++ b/src/dec/vp8l_dec.c
@@ -438,6 +438,7 @@ static int ReadHuffmanCodes(VP8LDecoder* const dec, int xsize, int ysize,
goto Error;
}
+ bound = &huffman_tables[num_htree_groups * table_size];
huffman_table = huffman_tables;
for (i = 0; i < num_htree_groups_max; ++i) {
// If the index "i" is unused in the Huffman image, just make sure the
diff --git a/src/utils/huffman_utils.c b/src/utils/huffman_utils.c
index 90c2fbf7..13054715 100644
--- a/src/utils/huffman_utils.c
+++ b/src/utils/huffman_utils.c
@@ -191,6 +191,7 @@ static int BuildHuffmanTable(HuffmanCode* const root_table, int root_bits,
}
code.bits = (uint8_t)(len - root_bits);
code.value = (uint16_t)sorted[symbol++];
+ if (bound && &table[key >> root_bits + table_size] >= bound) return 0;
ReplicateValue(&table[key >> root_bits], step, table_size, code);
key = GetNextKey(key, len);
}
Therefore, it is highly likely that the vulnerability was caused by the lack of validity checks on input data when constructing the Huffman Table, leading to an overflow of the allocated memory for the table, i.e., a buffer overflow occurred.
Vulnerability Analysis
The root of the vulnerability lies within the code logic that handles WebP images. When parsing a lossless WebP image, the decoder uses the sequential data compression (LZ77), prefix coding, and a color cache to compress the image’s ARGB data. For specific details of this process, Google provides an exhaustive description in its technical documentation and offers a clear structural specification of the WebP file format.
In this process, the decoder first reads prefix-encoded data from the image data stream and uses this data to construct a complete Huffman coding table. Then, based on this table, the decoder decodes the compressed data in the stream, restoring the original image. As there are already many introductory articles on the algorithmic details of Huffman coding, this article will not elaborate further.
Following the Canonical Huffman Coding algorithm, when constructing the Huffman coding table, a first-level table is used, which is for querying Huffman codes with lengths less than N bits (default is 8 bits); if there are codes exceeding N bits, the decoder allocates second-level tables to query these extended codes.
When allocating memory for the Huffman coding tables, the decoder reserves enough space to accommodate all primary and secondary tables at once, and the memory size is fixed:
// Memory needed for lookup tables of one Huffman tree group. Red, blue, alpha
// and distance alphabets are constant (256 for red, blue and alpha, 40 for
// distance) and lookup table sizes for them in worst case are 630 and 410
// respectively. Size of green alphabet depends on color cache size and is equal
// to 256 (green component values) + 24 (length prefix values)
// + color_cache_size (between 0 and 2048).
// All values computed for 8-bit first level lookup with Mark Adler's tool:
// https://github.com/madler/zlib/blob/v1.2.5/examples/enough.c
#define FIXED_TABLE_SIZE (630 * 3 + 410)
static const uint16_t kTableSize[12] = {
FIXED_TABLE_SIZE + 654,
FIXED_TABLE_SIZE + 656,
FIXED_TABLE_SIZE + 658,
FIXED_TABLE_SIZE + 662,
FIXED_TABLE_SIZE + 670,
FIXED_TABLE_SIZE + 686,
FIXED_TABLE_SIZE + 718,
FIXED_TABLE_SIZE + 782,
FIXED_TABLE_SIZE + 912,
FIXED_TABLE_SIZE + 1168,
FIXED_TABLE_SIZE + 1680,
FIXED_TABLE_SIZE + 2704
};
const int table_size = kTableSize[color_cache_bits];
huffman_tables = (HuffmanCode*)WebPSafeMalloc(num_htree_groups * table_size, sizeof(*huffman_tables));
From this code snippet, it can be seen that the table consists of five parts, corresponding to the lookup tables for the red, green, blue, alpha, and distance channels respectively. The size of the tables for the red, blue, and alpha channels are fixed at 630, the size of the table for the distance channel is fixed at 410, while the size of the green channel’s table depends on the size of the color cache. The sum of these sizes constitutes the total size of the entire table. Additionally, the author provides a tool from the zlib library, enough.c, which can calculate the maximum possible size of the lookup table given a specified code length and color cache size.
The problem is that the decoder assumes by default that the Huffman table data stored in the image is reasonable, and thus it pre-calculates the maximum memory length based on this assumption. The core premise here is that the encoded data must conform to the standards of Canonical Huffman Coding, meaning the Huffman tree should be a binary tree, with each leaf node corresponding to a prefix code and no unused hanging leaf nodes. However, since the Huffman table data comes from an untrusted source and may be arbitrarily constructed by an attacker, the decoder parses an incomplete binary tree without validating these data, which could allocate too many second-level tables, leading to the total memory usage exceeding the pre-allocated size and causing a heap buffer overflow.
Taking the green channel as an example, when the color cache size is 0, the maximum size of its Huffman table is 654. Using the enough
tool, we can obtain the possible structure of its Huffman table (referencing lifthrasiir’s expression):
Len Code range # Root entry Overhead #
--- ------------------------------------ --- ----------------- -------- ---
1 0 1 0xxxxxxx 0 128
9 10000000:0 .. 11110110:1 238 10000000-11110110 2^1 119
11110111:0 1 11110111 2^2 1
10 11110111:10 .. 11110111:11 2
11111000:00 .. 11111110:11 28 11111000-11111110 2^2 7
11111111:00 .. 11111111:10 3 11111111 2^7 1
11 11111111:110 1
12 11111111:1110 1
13 11111111:11110 1
15 11111111:1111100 .. 11111111:1111111 4
In this case, the memory size of the Huffman table is 256 (1st table) + 2*119 + 4*8 + 128 = 654, which precisely matches the maximum value hardcoded.
However, if we construct an incomplete Huffman tree, containing a large number of long codes, the decoder will allocate second-level tables without bounds checking:
Len Code range # Root entry Overhead #
--- ------------------------------------ --- ----------------- -------- ---
9 00000000:0 1 00000000 2^7 1
10 00000000:10 1
11 00000000:110 1
12 00000000:1110 1
13 00000000:11110 1
14 00000000:111110 1
15 00000000:1111110 .. 00000000:1111111 2
00000001:0000000 .. 00000010:1111111 256 00000001-00000010 2^7 2
00000011:0000000 .. 00000011:0001111 1 00000011 2^7 1
Under this circumstances, the memory size of the Huffman table should be 256 (1st table) + 128*4 = 768, which undoubtedly exceeds the hardcoded maximum size.
Subsequently, the decoder will call the BuildHuffmanTable
function to build the Huffman table in the initially allocated memory space, writing the reversed prefix codes into memory through the ReplicateValue
function. Since the number of allocated second-level tables exceeds expectations, out-of-bounds writing occurs, which may allow an attacker to execute arbitrary code.
// Fill in 2nd level tables and add pointers to root table.
for (len = root_bits + 1, step = 2; len <= MAX_ALLOWED_CODE_LENGTH;
++len, step <<= 1) {
// ... snip ...
for (; count[len] > 0; --count[len]) {
// ... snip ...
code.bits = (uint8_t)(len - root_bits);
code.value = (uint16_t)sorted[symbol++];
ReplicateValue(&table[key >> root_bits], step, table_size, code); // overflow here
key = GetNextKey(key, len);
}
}
Vulnerability Exploitation
How to construct a PoC and bypass checks?
It naturally occurred to us that, if we could directly construct a sufficiently large Huffman table, would it be possible to overflow the memory space allocated by the program?
The answer is no. Because the overflow length of a single Huffman table is limited and is not enough to cover the entire huffman_tables
. Moreover, each time after constructing a Huffman table, the program will check the integrity of the Huffman tree, and an incomplete tree will cause the decoding process to be interrupted:
// Check if tree is full.
if (num_nodes != 2 * offset[MAX_ALLOWED_CODE_LENGTH] - 1) {
return 0;
}
Therefore, a feasible method is to construct four normal Huffman tables corresponding to the green, red, blue and alpha channels, respectively, and ensure that they can reach their respective maximum allocation space under the condition of ensuring WebP image format compliance. Then, an incomplete Huffman tree is constructed in the distance channel, so that the cumulative size of all Huffman tables exceeds the predetermined memory capacity, resulting in memory overflow.
How to trigger a crash?
We found that although the PoC could trigger ASAN’s error report or cause a crash of the dwebp
tool with glibc, it does not affect mainstream browsers like Safari, Chrome, etc.
After some debugging, we discovered that the default size of the Huffman table is 2954 table entries, and the allocated space is 2954*4=11816 (0x2e28) bytes. However, in Chromium and WebKit, malloc(0x2e28) will eventually allocate 0x3000 bytes of memory space, while the overflow length of the sample is about 0x190 bytes, which means our overflow is not sufficient to cross the allocated memory boundary, therefore not triggering a crash.
Therefore, we need to make the size of the allocated Huffman table as close to 0x3000 as possible. How do we achieve this? In the vulnerability analysis section, we mentioned that the green channel space in kTableSize
is variable-length and its size is affected by the color cache. If we set the size of the color cache to 6, the size of the Huffman table becomes (630 * 3 + 410 + 718) * 4 = 12072 = 0x2f28 bytes, allowing us to overwrite the content of the adjacent heap chunks.
How to build primitives?
In order to achieve controllable and stable heap out-of-bounds writes, we need to construct a powerful exploitation primitive, which needs to achieve the effect that the attacker can write controllable data at any specified heap offset.
However, there are some difficulties in implementing both of these points in WebP vulnerabilities. On the one hand, the unit written by the Huffman table is HuffmanCode, in which only part of the data can be controlled by the attacker; on the other hand, HuffmanCode is not written into the Huffman table sequentially, and its index is calculated according to the rules of reverse Huffman coding, so it can not directly control the position of OOB writes.
How to control the data to write?
Let’s first focus on the deconstruction of the written data. The Huffman table is actually an array containing multiple HuffmanCode, where each HuffmanCode is structured as follows:
typedef struct {
uint8_t bits; // number of bits used for this symbol
uint16_t value; // symbol value or table offset
} HuffmanCode;
The memory layout of the structure is as follows:
| bits (1 byte) | padding (1 byte) | value (2 bytes) |
In this structure, the bits
field represents the code length of the current HuffmanCode, and the value
field represents the value. Each HuffmanCode occupies 4 bytes of memory. The range of bits
is limited by the prefix code length, and its value range is [1, 15]; while value
is the actual encoded data of the current HuffmanCode, which can be controlled by the attacker, and its range depends on the range of the encoded data, for example, for RGB color encoding, the range is [0, 255].
Since we are constructing the overflow sample in the distance channel, the number of coded symbols is 40, so we can write a 4-byte object, in which the 2 bytes of the high address is controllable, and the value range is [0,39].
How to control the position to write?
Reading the code, we find that HuffmanCode is written into memory in the ReplicateValue
function, while the position of the write is calculated in the GetNextKey
function:
// Returns reverse(reverse(key, len) + 1, len), where reverse(key, len) is the
// bit-wise reversal of the len least significant bits of key.
static WEBP_INLINE uint32_t GetNextKey(uint32_t key, int len) {
uint32_t step = 1 << (len - 1);
while (key & step) {
step >>= 1;
}
return step ? (key & (step - 1)) + step : key;
}
code.bits = (uint8_t)(len - root_bits);
code.value = (uint16_t)sorted[symbol++];
ReplicateValue(&table[key >> root_bits], step, table_size, code);
It can be seen that HuffmanCode is not written into the Huffman table sequentially; its index is obtained through reversed prefix code calculation. To control the index of the write, we calculated the index sequence of each HuffmanCode in the second-level Huffman table under a 15-bit prefix code length (encoding field is 2^(15-8)=128
):
0x0 0x40 0x20 0x60 0x10 0x50 0x30 0x70 0x8 0x48 0x28 0x68 0x18 0x58 0x38 0x78
0x4 0x44 0x24 0x64 0x14 0x54 0x34 0x74 0xc 0x4c 0x2c 0x6c 0x1c 0x5c 0x3c 0x7c
0x2 0x42 0x22 0x62 0x12 0x52 0x32 0x72 0xa 0x4a 0x2a 0x6a 0x1a 0x5a 0x3a 0x7a
0x6 0x46 0x26 0x66 0x16 0x56 0x36 0x76 0xe 0x4e 0x2e 0x6e 0x1e 0x5e 0x3e 0x7e
0x1 0x41 0x21 0x61 0x11 0x51 0x31 0x71 0x9 0x49 0x29 0x69 0x19 0x59 0x39 0x79
0x5 0x45 0x25 0x65 0x15 0x55 0x35 0x75 0xd 0x4d 0x2d 0x6d 0x1d 0x5d 0x3d 0x7d
0x3 0x43 0x23 0x63 0x13 0x53 0x33 0x73 0xb 0x4b 0x2b 0x6b 0x1b 0x5b 0x3b 0x7b
0x7 0x47 0x27 0x67 0x17 0x57 0x37 0x77 0xf 0x4f 0x2f 0x6f 0x1f 0x5f 0x3f 0x7f
Our idea is: to construct four 15-bit encodings in the overflowed Huffman table, so that its fourth HuffmanCode is written to the index at 0x60, enabling it to cover the data of the next heap chunk. Additionally, we can construct multiple 9-bit encodings (where the size of the second-level Huffman table is 2), allowing this index to be adjusted in increments of 2, thereby achieving controllable index. As shown in the following figure:
Therefore, we end up with the primitive effect that the attacker can write a partially controllable 4-byte data with a controllable offset in multiples of 8 bytes.