Bencode format
Introduction
Bencode is a simple encoding developed for BitTorrent. It support 4 basic data types:
integer
string
lists
dictionary
The main advantages of bencode are its simplicity and that is unaffected by endianness, which is important for cross-platform applications. Another advantage is that their exists a bijection between values and their encoded form. As a consequence values can be compared in encoded form.
Encoding
Integers
Integers are encoded as: i<base ten ASCII representation>e
.
Leading zeros are not allowed except for the number zero.
Negative values are encoded by prefixing the number with a hyphen-minus.
Negative zero is not permitted.
The number 42 would thus be encoded as i42e
, 0 as i0e
, and -42 as i-42e
.
Strings
Strings are length-prefixed base ten followed by a colon and the string: <length>:<contents>
.
The length is encoded in base 10, like integers and must be positive (zero is allowed).
The contents are the bytes that make up the string.
If the content represents text the encoding must be UTF-8,
but the string datatype can also be used for a sequence of raw bytes.
For example ‘spam’ is encoded as 4:spam
.
Lists
A list of values is encoded as an ‘l’ followed by their encoded elements followed by an ‘e’: l<contents>e
.
Elements are in order and concatenated.
A list consisting of the string “spam” and the number 42 would be encoded as: l4:spami42ee
.
Dictonaries
Dictionaries are encoded as a ‘d’ followed by a list of alternating keys and their
corresponding values followed by an ‘e’: d<bencoded key><bencoded value>>e
.
Keys must be strings and appear in lexicographicly sorted order.
For example {'cow': 'moo', 'spam': 'eggs'}
is encoded as d3:cow3:moo4:spam4:eggse
and {'spam': ['a', 'b']}
. as d4:spaml1:a1:bee