The Musical Instrument Digital Interface (MIDI) protocol is an industry standard, defined in 1982, for representing musical information.
It is used in electronic instruments (keyboards, synthesizers, drum machines, sound cards) and in computer applications that produce sound, such as multimedia presentations and computer games.
MIDI does not transmit an audio signal or media. Instead, it transmits "event messages": the pitch and intensity of musical notes to play,
control signals for parameters such as volume, vibrato and panning, cues, and clock signals that set the tempo.
Information
A MIDI file (file extension: .mid) consists of a stream of 8-bit bytes. All 16-bit and 32-bit quantities are constructed by reading
in two or four 8-bit bytes, respectively. The bytes are joined together in big-endian order.
In this tutorial, we use the notation u1, u2, and u4 to mean an unsigned one-, two-, or four-byte quantity, respectively.
uN means a quantity stored in a variable number of bytes.
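As an illustration, here is a minimal Python sketch of reading u1, u2 and u4 quantities from a byte buffer. The helper names read_u1, read_u2 and read_u4 are hypothetical, not part of any MIDI library; the sketch relies on the standard struct module, where the ">" prefix selects big-endian order.

```python
import struct

def read_u1(data: bytes, offset: int) -> int:
    """Unsigned one-byte quantity (u1)."""
    return data[offset]

def read_u2(data: bytes, offset: int) -> int:
    """Unsigned two-byte big-endian quantity (u2).

    Equivalent to (data[offset] << 8) | data[offset + 1]."""
    return struct.unpack_from(">H", data, offset)[0]

def read_u4(data: bytes, offset: int) -> int:
    """Unsigned four-byte big-endian quantity (u4)."""
    return struct.unpack_from(">I", data, offset)[0]

# The bytes 00 00 00 06 read as a u4:
print(read_u4(bytes([0x00, 0x00, 0x00, 0x06]), 0))  # prints 6
```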
For Standard MIDI Files (SMF) the header length (u4) is always 6, represented by the following hex values:
00 00 00 06
It is the number of bytes used by the three fields that follow it: "MIDI format (u2)", "Number of tracks in the MIDI file (u2)" and "The speed of the music (time division) (u2)".
Thus 2 + 2 + 2 = 6 bytes.
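The header fields can be read with a short sketch like the one below. It assumes the standard header chunk layout: the 4-byte ASCII chunk ID "MThd", the u4 header length, then the three u2 fields. The MidiHeader type and parse_header function are hypothetical names used only for illustration.

```python
import struct
from typing import NamedTuple

class MidiHeader(NamedTuple):
    fmt: int        # MIDI format: 0, 1 or 2
    ntracks: int    # number of tracks in the MIDI file
    division: int   # raw time division word (u2)

def parse_header(data: bytes) -> MidiHeader:
    """Parse the header chunk at the start of a Standard MIDI File."""
    chunk_id, length = struct.unpack_from(">4sI", data, 0)
    if chunk_id != b"MThd":
        raise ValueError("not a Standard MIDI File: missing MThd")
    if length != 6:  # the tutorial states this is always 6 for SMF
        raise ValueError(f"unexpected header length {length}, expected 6")
    fmt, ntracks, division = struct.unpack_from(">HHH", data, 8)
    if fmt not in (0, 1, 2):
        raise ValueError(f"unknown MIDI format {fmt}")
    return MidiHeader(fmt, ntracks, division)

header = bytes.fromhex("4D546864 00000006 0001 0002 0080")
print(parse_header(header))  # MidiHeader(fmt=1, ntracks=2, division=128)
```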
Format 0 (hex 00 00): The MIDI file contains a single multi-channel track.
All the data is put into only one track (not to be confused with a channel) as one uninterrupted data stream
in the file, with all the bytes next to each other.
If you have a simple bass-line and a melody, in format 0 you would interleave the notes of the bass-line and the melody
in that single track.
Format 1 (hex 00 01): The MIDI file contains one or more simultaneous tracks (or MIDI outputs) of a sequence.
In format 1 the channel data can be compartmentalized into 1 or more tracks (up to 65535).
If you have a simple bass-line and a melody, in format 1, all the melody notes would go into one track, and all the bass
notes would go into another track. Also in format 1, you can imitate the format 0 interleaving style in any of the tracks.
Format 2 (hex 00 02): The MIDI file contains one or more sequentially independent single-track patterns.
A format 2 MIDI file is a kind of combination of the other two formats. It contains multiple tracks, but each track
represents a different sequence, and the sequences are not necessarily played simultaneously. This format is meant for saving
drum patterns or other multi-pattern music sequences.
The time division is used to decode the track event delta times into "real" time.
It has two formats:
metrical time
Determines the beats (or ticks) per quarter note.
Bits 0-14 represent the number of delta-time "beats" (or ticks) that make up a quarter note.
Bit 15 is always 0.
For example: 0x0080 means:
128 beats for a 1/4 note.
64 beats for a 1/8 note.
32 beats for a 1/16 note.
16 beats for a 1/32 note.
256 beats for a 1/2 note.
512 beats for a whole note.
0x0050 means:
80 beats for a 1/4 note.
40 beats for a 1/8 note.
20 beats for a 1/16 note.
10 beats for a 1/32 note.
160 beats for a 1/2 note.
320 beats for a whole note.
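The arithmetic in these examples can be checked with a small sketch. It assumes the division word has already been read as a u2 with bit 15 = 0, and it expresses note values as fractions of a whole note; ticks_per_note is a hypothetical helper name.

```python
from fractions import Fraction

def ticks_per_note(division: int, note: Fraction) -> int:
    """Number of delta-time ticks for a note value under metrical time.

    `note` is a fraction of a whole note, e.g. Fraction(1, 4) for a quarter note."""
    if division & 0x8000:
        raise ValueError("bit 15 is set: time-code-based division, not metrical")
    ticks_per_quarter = division & 0x7FFF            # bits 0-14
    ticks = Fraction(ticks_per_quarter * 4) * note   # a whole note = 4 quarter notes
    if ticks.denominator != 1:
        raise ValueError("note value is not a whole number of ticks")
    return ticks.numerator

for label, note in [("1/4", Fraction(1, 4)), ("1/8", Fraction(1, 8)), ("whole", Fraction(1))]:
    print(label, ticks_per_note(0x0080, note))
```

With 0x0080 this prints 128, 64 and 512 ticks, matching the first list; passing 0x0050 reproduces the second list.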
time-code-based time
Determines the SMPTE frame rate (frames per second) and the number of beats (or ticks) per frame.
Bits 0-7 represent the number of beats (or ticks) per frame.
Bits 8-15 (the upper byte) represent the SMPTE frame rate, stored as a negative number in two's complement form.
Allowed values: -24, -25, -29 or -30.
If the value is -24, it represents 24 frames per second.
If the value is -25, it represents 25 frames per second.
If the value is -29, it represents 29.97 frames per second (30 frames per second drop-frame).
If the value is -30, it represents 30 frames per second.
Because the value is negative, bit 15 is always 1.
For example: 0xE878 means:
Bits 15-8: 11101000 | bits 7-0: 01111000
Bits 0-7 (0x78) represent 120 beats (or ticks) per frame.
Bits 8-15 (0xE8 = -24) represent 24 frames per second SMPTE time.
The value 24 is calculated as follows:
Step 1: Invert the bits of the upper byte (bits 8-15).
11101000 (two's complement form)
00010111 (bits inverted)
Step 2: Add 1.
00010111 (bits inverted)
00000001 (add 1)
Step 3: The result is the magnitude of the negative value.
00011000 (represents the value 24)
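The same invert-and-add-1 procedure can be written as a short Python function. twos_complement_byte is a hypothetical name; Python's built-in int.from_bytes(..., signed=True) performs the same conversion and is used here only as a cross-check.

```python
def twos_complement_byte(byte: int) -> int:
    """Interpret an 8-bit value as a signed two's complement number."""
    if byte & 0x80:                        # bit 7 set: the value is negative
        magnitude = ((~byte) & 0xFF) + 1   # Step 1: invert the bits; Step 2: add 1
        return -magnitude                  # Step 3: negate the magnitude
    return byte                            # bit 7 clear: the value is already positive

upper = (0xE878 >> 8) & 0xFF               # bits 8-15 of the division word
print(f"{upper:08b}", twos_complement_byte(upper))         # 11101000 -24
print(int.from_bytes(bytes([upper]), "big", signed=True))  # -24
```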
For example: 0xE764 means:
Bits 15-8: 11100111 | bits 7-0: 01100100
Bits 0-7 (0x64) represent 100 beats (or ticks) per frame.
Bits 8-15 (0xE7 = -25) represent 25 frames per second SMPTE time.
For example: 0xE332 means:
Bits 15-8: 11100011 | bits 7-0: 00110010
Bits 0-7 (0x32) represent 50 beats (or ticks) per frame.
Bits 8-15 (0xE3 = -29) represent 29.97 frames per second (30 frames per second drop-frame) SMPTE time.
For example: 0xE232 means:
Bits 15-8: 11100010 | bits 7-0: 00110010
Bits 0-7 (0x32) represent 50 beats (or ticks) per frame.
Bits 8-15 (0xE2 = -30) represent 30 frames per second SMPTE time.
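Putting both formats together, a decoder for the time division word might look like the sketch below. decode_division and the dictionary keys it returns are hypothetical; the returned smpte_format is the frame-rate code (24, 25, 29 or 30), where 29 denotes 30 frames per second drop-frame (29.97 fps).

```python
def decode_division(division: int) -> dict:
    """Split a u2 time division word into its metrical or time-code fields."""
    if (division & 0x8000) == 0:
        # Metrical time: bit 15 = 0, bits 0-14 = ticks per quarter note
        return {"kind": "metrical", "ticks_per_quarter": division & 0x7FFF}
    # Time-code-based time: bits 8-15 hold a negative SMPTE code in two's complement
    upper = (division >> 8) & 0xFF
    smpte = int.from_bytes(bytes([upper]), "big", signed=True)
    if smpte not in (-24, -25, -29, -30):
        raise ValueError(f"invalid SMPTE format: {smpte}")
    return {
        "kind": "timecode",
        "smpte_format": -smpte,              # 24, 25, 29 (29.97 fps drop-frame) or 30
        "ticks_per_frame": division & 0xFF,  # bits 0-7
    }

print(decode_division(0xE878))  # {'kind': 'timecode', 'smpte_format': 24, 'ticks_per_frame': 120}
```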
The track header must have the value MTrk. It marks the start of a track chunk, where the actual song data (the track events) are stored.
The track header is represented by the following hex values:
4D 54 72 6B
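A sketch of locating the track data in a file follows. It assumes the standard chunk layout, in which each 4-byte chunk ID (such as "MThd" or "MTrk") is followed by a u4 length and then that many data bytes; iter_chunks and track_chunks are hypothetical helper names.

```python
import struct
from typing import Iterator, List, Tuple

def iter_chunks(data: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (chunk_id, chunk_data): a 4-byte ID, a u4 length, then the data."""
    offset = 0
    while offset + 8 <= len(data):
        chunk_id, length = struct.unpack_from(">4sI", data, offset)
        start = offset + 8
        end = start + length
        if end > len(data):
            raise ValueError(f"chunk {chunk_id!r} at offset {offset} is truncated")
        yield chunk_id, data[start:end]
        offset = end

def track_chunks(data: bytes) -> List[bytes]:
    """Return the event data of every MTrk chunk, in file order."""
    return [body for chunk_id, body in iter_chunks(data) if chunk_id == b"MTrk"]

# A tiny format 0 file whose only track holds an end-of-track meta-event (00 FF 2F 00)
data = bytes.fromhex("4D546864 00000006 0000 0001 0060 4D54726B 00000004 00FF2F00")
print([t.hex() for t in track_chunks(data)])  # ['00ff2f00']
```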
An event can be a message to play or stop a note, to change the instrument, etc.
All events start with a delta-time, even if the delta-time is zero (0x00).
A delta-time is stored in at least 1 byte and at most 4 bytes. Because the delta-time can be stored in a variable number of bytes,
bit 7 of each byte has a special use: if this bit is 0, the byte is the last byte of the series.
In all of the preceding bytes, bit 7 has the value 1.
For example:
Delta-time = 0xFF7F means:
Bits 15-8: 11111111 | bits 7-0: 01111111
Bit number 7 must be 0 because it is the last byte of the series.
Bit number 15 must be 1 because it is NOT the last byte of the series.
To determine the actual value that delta-time 0xFF7F represents, clear bits 7 and 15 and shift the most significant byte 1 bit to the right,
so that the two 7-bit groups join up. The two bytes then look like this:
Bits 15-8: 00111111 | bits 7-0: 11111111
This is the hex value 0x3FFF (16383 in decimal).
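The same procedure, generalised to up to 4 bytes, can be sketched as a decoder for variable-length quantities. read_vlq is a hypothetical name; it returns the decoded value together with the offset just past the last byte, so that a caller can continue reading the event that follows.

```python
from typing import Tuple

def read_vlq(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode a variable-length quantity (such as a delta-time) at data[offset].

    Returns (value, offset of the first byte after the quantity)."""
    value = 0
    for i in range(4):                        # a delta-time uses at most 4 bytes
        byte = data[offset + i]
        value = (value << 7) | (byte & 0x7F)  # drop bit 7, append the 7 value bits
        if (byte & 0x80) == 0:                # bit 7 = 0: this was the last byte
            return value, offset + i + 1
    raise ValueError("variable-length quantity is longer than 4 bytes")

value, next_offset = read_vlq(bytes([0xFF, 0x7F]))
print(hex(value), value, next_offset)  # 0x3fff 16383 2
```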
Because each byte carries 7 bits of the value, a delta-time between 0 and 127 fits in a single byte. The largest delta-time allowed is 0x0FFFFFFF, which
takes 4 bytes. Here are examples of delta-times as 4-byte values, and the variable-length quantities that they translate to:
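Going the other way, encoding a delta-time can be sketched as follows: the value is split into 7-bit groups, and bit 7 is set on every byte except the last. write_vlq is a hypothetical helper name.

```python
def write_vlq(value: int) -> bytes:
    """Encode a delta-time (0 to 0x0FFFFFFF) as a variable-length quantity."""
    if not 0 <= value <= 0x0FFFFFFF:
        raise ValueError("delta-time must be in the range 0 to 0x0FFFFFFF")
    groups = [value & 0x7F]                     # last byte: bit 7 = 0
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)    # preceding bytes: bit 7 = 1
        value >>= 7
    return bytes(reversed(groups))              # most significant group first

print(write_vlq(0x3FFF).hex())  # ff7f
```

For instance, write_vlq(0x3FFF) yields the bytes FF 7F from the example above, and write_vlq(0x0FFFFFFF) yields the 4 bytes FF FF FF 7F.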
len
Refers to the length portion of the meta-event syntax, that is, a number, stored as a variable-length quantity,
which specifies how many data bytes follow it in the meta-event.
text and data
Refers to however many bytes of (possibly text) data were just specified by the length.
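As an illustration of how len and text and data fit together, the sketch below parses one meta-event. It assumes the usual meta-event layout FF &lt;type&gt; &lt;len&gt; &lt;text and data&gt;, with len stored as a variable-length quantity; parse_meta_event is a hypothetical name, and the example bytes (type 0x03 with the text "Piano") are made up for illustration.

```python
from typing import Tuple

def parse_meta_event(data: bytes, offset: int) -> Tuple[int, bytes, int]:
    """Parse a meta-event starting at its 0xFF byte.

    Returns (meta_type, text_and_data, offset just past the event)."""
    if data[offset] != 0xFF:
        raise ValueError("not a meta-event: expected 0xFF")
    meta_type = data[offset + 1]
    # len: a variable-length quantity (7 value bits per byte, bit 7 = more bytes follow)
    length, pos = 0, offset + 2
    while True:
        byte = data[pos]
        pos += 1
        length = (length << 7) | (byte & 0x7F)
        if not (byte & 0x80):
            break
    payload = data[pos:pos + length]  # "text and data": exactly len bytes
    if len(payload) != length:
        raise ValueError("meta-event is truncated")
    return meta_type, payload, pos + length

print(parse_meta_event(bytes.fromhex("FF0305") + b"Piano", 0))  # (3, b'Piano', 8)
```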
nnnn
Refers to the channel number 0-15 (0 = channel 1 as musicians number it, 15 = channel 16).
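A sketch of extracting the channel, assuming (as in the standard channel messages) that nnnn is the low 4 bits of a status byte in the range 0x80-0xEF, for example 1001nnnn. channel_of is a hypothetical name, and the status byte 0x93 is an arbitrary example value.

```python
def channel_of(status: int) -> int:
    """Return the channel (0-15) held in the low 4 bits (nnnn) of a status byte."""
    if not 0x80 <= status <= 0xEF:
        raise ValueError("not a channel message status byte")
    return status & 0x0F

ch = channel_of(0x93)  # high 4 bits 1001, nnnn = 0011
print(ch, ch + 1)      # 3 4  (stored channel, then the musician's channel number)
```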