ggml-org/llama.cpp

GGUF endianness cannot be determined from GGUF itself

Open

#3,957 opened on Nov 5, 2023

Labels: breaking change, enhancement, good first issue


Description

As of the time of writing, the big-endian support that was added in https://github.com/ggerganov/llama.cpp/pull/3552 doesn't encode the endianness within the file itself:

https://github.com/ggerganov/llama.cpp/blob/3d48f42efcd05381221654376e9f6f69d76af739/gguf-py/gguf/gguf.py#L689-L698

This means that there is no way to distinguish a big-endian GGUF file from a little-endian file, which may cause some degree of consternation in the future if these files get shared around 😅
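
To make the ambiguity concrete, here is a minimal sketch using only Python's standard `struct` module. It is not taken from gguf.py; the value 3 is just a stand-in for any small integer field a reader would parse right after the magic. The same four bytes decode to two different, equally "valid" integers depending on the byte order the reader assumes:

```python
import struct

# Four bytes as a little-endian writer would emit them for the uint32 value 3.
# (3 is an illustrative value, not a claim about any particular GGUF field.)
raw = struct.pack("<I", 3)

# A reader that assumes little-endian recovers the intended value.
as_little = struct.unpack("<I", raw)[0]

# A reader that assumes big-endian gets a different integer from the same bytes.
as_big = struct.unpack(">I", raw)[0]

print(as_little)  # 3
print(as_big)     # 50331648 (0x03000000)
```

Nothing in the bytes themselves says which reading is correct, so a reader can only guess (for example, by rejecting implausibly large values).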

The cleanest solution would be to add the endianness to the header, but that would be a breaking change. Ideally it would live in the metadata, but reading the metadata itself depends on the endianness.

Given that, my suggestion would be to use FUGG (GGUF reversed) as the header magic for big-endian files, so that a little-endian executor won't attempt to read such a file at all unless it knows how to deal with it. The same goes the other way: a big-endian executor won't attempt to read a little-endian file.
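
A rough sketch of how a reader could act on this proposal, assuming the magic is compared as four raw bytes. The names `MAGIC_LE`, `MAGIC_BE`, and `detect_endianness` are hypothetical and do not exist in gguf.py; the uint32 that follows the magic in the usage example is only a placeholder field for illustration:

```python
import io
import struct

# Hypothetical constants for the proposal in this issue.
MAGIC_LE = b"GGUF"  # existing magic, treated here as "little-endian file"
MAGIC_BE = b"FUGG"  # proposed magic for big-endian files


def detect_endianness(stream):
    """Read the 4-byte magic and return a struct byte-order prefix.

    Returns "<" for little-endian and ">" for big-endian, and raises
    ValueError if the magic is neither of the two expected values.
    """
    magic = stream.read(4)
    if magic == MAGIC_LE:
        return "<"
    if magic == MAGIC_BE:
        return ">"
    raise ValueError(f"not a GGUF file (magic={magic!r})")


# Usage: a fake big-endian file with one placeholder uint32 after the magic.
fake_file = io.BytesIO(MAGIC_BE + struct.pack(">I", 3))
order = detect_endianness(fake_file)
value = struct.unpack(order + "I", fake_file.read(4))[0]
print(order, value)
```

Because the magic is compared byte-for-byte, the check itself does not depend on the host's endianness, and an older reader that only knows GGUF would reject a FUGG file outright instead of misparsing it.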
