The .msg format, used by Microsoft Outlook for Windows, is an odd duck. At times it feels almost as if Microsoft is trying to lock users into the Windows ecosystem by ensuring that the files they archive can only be used on Windows. But the .msg format seems almost sane compared with the .olk format used by Microsoft Outlook for Macintosh. That really is a howling-at-the-moon crazy format.
So… Let’s examine .olk and see if we can work out what makes it tick.
OLK (Outlook for Mac Local Cache) is a proprietary binary file format used by Microsoft Outlook for Mac to store email messages, calendar events, contacts, and other mailbox items. Outlook for Windows keeps a mailbox in a single monolithic PST or OST file. Outlook for Mac, by contrast, splits each email into multiple component files spread across a directory structure.
Characteristics
- Fragmented Storage: A single email message is split across multiple files (header, body, attachments)
- Database-Backed: In OLK15, files work in conjunction with a SQLite database (Outlook.sqlite) that serves as an index
- Profile-Based: Data is stored per identity (OLK14) or per profile (OLK15) in its own folder
- Version-Specific: OLK14 (Outlook 2011) and OLK15 (Outlook 2016+) have different structures
- Binary Format: Files use little-endian binary encoding with tagged records
- Unicode Support: String data is typically stored as UTF-16-LE
Format Versions
| Version | Outlook Version | Notes |
|---|---|---|
| OLK14 | Outlook 2011 for Mac | Original Mac format |
| OLK15 | Outlook 2016, 2019, Office 365 for Mac | Enhanced structure, SQLite integration |
Directory Structure
OLK14 (Outlook 2011)
~/Documents/Microsoft User Data/Office 2011 Identities/Main Identity/
├── Data Records/
│ ├── Messages/
│ │ └── *.olk14message
│ ├── Message Sources/
│ │ └── *.olk14msgsource
│ ├── Message Attachments/
│ │ └── *.olk14msgattach
│ ├── Contacts/
│ │ └── *.olk14contact
│ ├── Events/
│ │ └── *.olk14event
│ ├── Tasks/
│ │ └── *.olk14task
│ ├── Notes/
│ │ └── *.olk14note
│ └── Categories/
│ └── *.olk14category
├── Folders/
│ └── *.olk14folder
├── Signatures/
│ └── *.olk14signature
├── Mail Accounts/
│ └── *.olk14mailaccount
├── Searches/
│ └── *.olk14search
└── Preferences/
└── *.olk14pref
OLK15 (Outlook 2016+)
~/Library/Group Containers/UBF8T346G9.Office/Outlook/Outlook 15 Profiles/Main Profile/
├── Data/
│ ├── Messages/
│ │ └── [0-255]/
│ │ └── *.olk15Message
│ ├── Message Sources/
│ │ └── [0-255]/
│ │ └── *.olk15MsgSource
│ ├── Message Attachments/
│ │ └── [0-255]/
│ │ └── *.olk15MsgAttach
│ ├── Contacts/
│ │ └── *.olk15Contact
│ ├── Events/
│ │ └── *.olk15Event
│ ├── Tasks/
│ │ └── *.olk15Task
│ ├── Notes/
│ │ └── *.olk15Note
│ └── Categories/
│ └── *.olk15Category
├── Outlook.sqlite (Primary database/index)
└── Outlook.sqlite-wal (Write-ahead log)
Note: OLK15 uses numbered subdirectories (0-255) to distribute files and prevent filesystem performance issues with large mailboxes.
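The Python sketch below shows what that bucketing means in practice. It looks for a given file name under each numbered subdirectory in turn. It assumes the buckets are plain decimal directory names from 0 to 255, as described above. find_in_buckets is a hypothetical helper, not part of any Outlook API.

```python
from pathlib import Path
from typing import Optional


def find_in_buckets(base: Path, filename: str) -> Optional[Path]:
    """Return base/<N>/filename for the first N in 0..255 where it exists.

    Assumption: bucket directories are named "0" through "255".
    """
    for bucket in range(256):  # 0-255 inclusive
        candidate = base / str(bucket) / filename
        if candidate.is_file():
            return candidate
    return None
```

Scanning 256 directories is cheap for a single lookup. For a whole mailbox, it is faster to walk the tree once and build a name-to-path dictionary.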
File Types Summary
Email Related Files
| Extension | Content | Description |
|---|---|---|
| .olk14message / .olk15Message | Header Metadata | To, From, Subject, Date, Message-ID, recipients. Does NOT contain body text. |
| .olk14msgsource / .olk15MsgSource | Body Content | Plain text, HTML, and/or RTF body content. May contain embedded MIME data. |
| .olk14msgattach / .olk15MsgAttach | Attachment Data | Individual attachment files with metadata (filename, content-type, encoding). |
Other Data Types
| Extension | Content |
|---|---|
| .olk14contact / .olk15Contact | Contact records (name, email, phone, address, organization) |
| .olk14event / .olk15Event | Calendar events (time, date, invitees, location, recurrence) |
| .olk14task / .olk15Task | Task/to-do items |
| .olk14note / .olk15Note | Sticky notes |
| .olk14folder / .olk15Folder | Folder structure definitions |
| .olk14category / .olk15Category | Category/label definitions |
| .olk14signature / .olk15Signature | Email signatures (HTML format) |
| .olk14mailaccount / .olk15MailAccount | Account configuration and credentials |
| .olk14search / .olk15Search | Saved search queries |
| .olk14pref / .olk15Pref | Application preferences |
Binary File Structure
All OLK binary files follow a similar tagged-record pattern. Data is stored in little-endian byte order.
General Binary Layout
+------------------+
| File Header | (Variable size, version-dependent)
+------------------+
| Record 1 |
+------------------+
| Record 2 |
+------------------+
| ... |
+------------------+
| Record N |
+------------------+
| (Optional) Body | (For MsgSource files: raw HTML/text content)
+------------------+
Record Structure
Each record follows a Tag-Length-Value (TLV) pattern:
+----------------+----------------+----------------+----------------+
| Tag (4 bytes) | Type (2 bytes) | Length (var) | Value (var) |
+----------------+----------------+----------------+----------------+
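- Tag: identifies the field (e.g., Subject, From, Date)
- Type: indicates the data type of the value (see Property Types)
- Length: present only for variable-length types, as a 4-byte prefix (a byte count for String8 and Binary, a character count for Unicode strings); fixed-size types have no length field
- Value: the actual data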
Property Types
| Type ID | Name | Description | Value Format |
|---|---|---|---|
| 0x0001 | Null | No value | (none) |
| 0x0002 | Int16 | 16-bit signed integer | 2 bytes |
| 0x0003 | Int32 | 32-bit signed integer | 4 bytes |
| 0x0004 | Float | 32-bit float | 4 bytes |
| 0x0005 | Double | 64-bit double | 8 bytes |
| 0x0006 | Currency | Currency value | 8 bytes |
| 0x0007 | AppTime | Application time | 8 bytes |
| 0x000A | Error | Error code | 4 bytes |
| 0x000B | Boolean | Boolean value | 2 bytes (0=false, 1=true) |
| 0x0014 | Int64 | 64-bit signed integer | 8 bytes |
| 0x001E | String8 | ASCII string | Length-prefixed bytes |
| 0x001F | Unicode | UTF-16-LE string | Length-prefixed bytes |
| 0x0040 | SysTime | Windows FILETIME | 8 bytes |
| 0x0048 | GUID | 128-bit GUID | 16 bytes |
| 0x0102 | Binary | Binary blob | Length-prefixed bytes |
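Here is a minimal Python sketch of reading the record header and the fixed-size types from this table, using struct with little-endian formats. It makes two assumptions. First, the 4-byte tag and 2-byte type are packed back to back with no padding. Second, Currency, AppTime and SysTime can be returned raw. In MAPI those three are, respectively, a 64-bit integer scaled by 10,000, an OLE automation date stored as a double, and FILETIME ticks. The function names are illustrative.

```python
import struct
from typing import Any, Tuple

# Little-endian struct formats for the fixed-size types in the table above.
# Currency, AppTime and SysTime are returned raw, not converted.
FIXED_FORMATS = {
    0x0002: "<h",  # Int16
    0x0003: "<i",  # Int32
    0x0004: "<f",  # Float
    0x0005: "<d",  # Double
    0x0006: "<q",  # Currency (MAPI: 64-bit integer scaled by 10,000)
    0x0007: "<d",  # AppTime (MAPI: OLE automation date as a double)
    0x000A: "<I",  # Error code
    0x000B: "<H",  # Boolean (2 bytes, per the table)
    0x0014: "<q",  # Int64
    0x0040: "<Q",  # SysTime (raw FILETIME ticks)
}


def read_record_header(data: bytes, offset: int) -> Tuple[int, int, int]:
    """Read the 4-byte tag and 2-byte type; return (tag, type, new_offset)."""
    tag, ptype = struct.unpack_from("<IH", data, offset)
    return tag, ptype, offset + 6


def read_fixed_value(data: bytes, offset: int, ptype: int) -> Tuple[Any, int]:
    """Decode one fixed-size value; return (value, bytes_consumed)."""
    if ptype == 0x0001:  # Null: no value bytes follow
        return None, 0
    if ptype == 0x0048:  # GUID: 16 raw bytes
        if offset + 16 > len(data):
            raise ValueError("GUID runs past end of buffer")
        return bytes(data[offset:offset + 16]), 16
    fmt = FIXED_FORMATS.get(ptype)
    if fmt is None:
        raise ValueError(f"not a fixed-size type: 0x{ptype:04X}")
    (value,) = struct.unpack_from(fmt, data, offset)  # struct.error if too short
    if ptype == 0x000B:
        value = value != 0
    return value, struct.calcsize(fmt)
```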
String Encoding
Strings in OLK15 files are typically:
- Stored as UTF-16-LE (little-endian)
- Prefixed with a 4-byte length field (a character count, not a byte count)
- Possibly terminated with a null character
+----------------+--------------------------------+
| Length (4 B) | UTF-16-LE String Data |
+----------------+--------------------------------+
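Reading one of these strings in Python might look like the sketch below. It assumes the count is in UTF-16 code units (two bytes each) and that any null terminator is included in that count. If a file stores the terminator outside the count, the consumed length would need adjusting. read_utf16_string is a hypothetical name.

```python
import struct
from typing import Tuple


def read_utf16_string(data: bytes, offset: int) -> Tuple[str, int]:
    """Read a 4-byte character count followed by UTF-16-LE text.

    Returns (text, bytes_consumed). Assumes the count is in UTF-16 code
    units and includes any NUL terminator, which is stripped from the text.
    """
    (char_count,) = struct.unpack_from("<I", data, offset)
    start = offset + 4
    end = start + char_count * 2
    if end > len(data):
        raise ValueError("string runs past end of buffer")
    text = data[start:end].decode("utf-16-le", errors="replace")
    return text.rstrip("\x00"), end - offset
```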
Field Tags (Property IDs)
These tags are based on MAPI property IDs and identify specific fields within records.
Message Header Fields (olk14message / olk15Message)
| Tag (Hex) | Field Name | Data Type | Description |
|---|---|---|---|
| 0x0017 | Importance | Int32 | 0=Low, 1=Normal, 2=High |
| 0x001A | MessageClass | Unicode | Message type (e.g., “IPM.Note”) |
| 0x0026 | Priority | Int32 | X-Priority value |
| 0x0036 | Sensitivity | Int32 | 0=Normal, 1=Personal, 2=Private, 3=Confidential |
| 0x0037 | Subject | Unicode | Email subject line |
| 0x0039 | SentTime | SysTime | When message was sent |
| 0x0042 | SentRepresentingName | Unicode | Display name of sender delegate |
| 0x0065 | SentRepresentingEmail | Unicode | Email of sender delegate |
| 0x0070 | ConversationTopic | Unicode | Conversation/thread topic |
| 0x0071 | ConversationIndex | Binary | Thread tracking blob |
| 0x0C1A | SenderName | Unicode | Display name of sender |
| 0x0C1F | SenderEmailAddress | Unicode | Email address of sender |
| 0x0E02 | DisplayBcc | Unicode | BCC recipients (display string) |
| 0x0E03 | DisplayCc | Unicode | CC recipients (display string) |
| 0x0E04 | DisplayTo | Unicode | To recipients (display string) |
| 0x0E06 | DeliveryTime | SysTime | When message was delivered |
| 0x0E07 | MessageFlags | Int32 | Bitfield of message status flags |
| 0x0E08 | MessageSize | Int32 | Total message size in bytes |
| 0x0E17 | MessageStatus | Int32 | Status flags |
| 0x0E1D | NormalizedSubject | Unicode | Subject without RE:/FW: prefixes |
| 0x1000 | Body | Unicode | Plain text body (may be in MsgSource) |
| 0x1009 | RtfCompressed | Binary | Compressed RTF body |
| 0x1013 | BodyHtml | Unicode | HTML body content |
| 0x1035 | InternetMessageId | String8 | RFC Message-ID header |
| 0x1042 | InReplyTo | String8 | In-Reply-To header value |
Recipient Fields
| Tag (Hex) | Field Name | Data Type | Description |
|---|---|---|---|
| 0x0C15 | RecipientType | Int32 | 1=To, 2=CC, 3=BCC |
| 0x3001 | DisplayName | Unicode | Recipient display name |
| 0x3003 | EmailAddress | Unicode | Recipient email address |
| 0x39FE | SmtpAddress | Unicode | SMTP email address |
Attachment Fields (olk14msgattach / olk15MsgAttach)
| Tag (Hex) | Field Name | Data Type | Description |
|---|---|---|---|
| 0x0E20 | AttachSize | Int32 | Size of attachment in bytes |
| 0x3701 | AttachDataBinary | Binary | Raw attachment data |
| 0x3703 | AttachExtension | Unicode | File extension |
| 0x3704 | AttachFilename | Unicode | Short filename |
| 0x3705 | AttachMethod | Int32 | How attachment is stored |
| 0x3707 | AttachLongFilename | Unicode | Full filename |
| 0x370E | AttachMimeTag | Unicode | MIME content-type |
| 0x3712 | AttachContentId | Unicode | Content-ID for inline attachments |
| 0x3716 | AttachContentDisposition | Unicode | “inline” or “attachment” |
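These IDs match MAPI, where a full 32-bit property tag packs the 16-bit property ID into the high word and the property type into the low word. Subject as a Unicode string, for example, is 0x0037001F. The record layout above does not say which of the two forms its 4-byte tag field holds, so the lookup sketch below accepts either. Treating any value above 0xFFFF as a full tag is an assumption, and the dictionary covers only a subset of the tables.

```python
from typing import Optional

# Subset of the property IDs listed in the tables above.
PROPERTY_NAMES = {
    0x0037: "Subject",
    0x0C1A: "SenderName",
    0x0C1F: "SenderEmailAddress",
    0x0E04: "DisplayTo",
    0x1035: "InternetMessageId",
    0x3707: "AttachLongFilename",
}


def mapi_prop_tag(prop_id: int, prop_type: int) -> int:
    """Pack a 16-bit property ID and a 16-bit type into a 32-bit MAPI tag."""
    return ((prop_id & 0xFFFF) << 16) | (prop_type & 0xFFFF)


def field_name(tag: int) -> Optional[str]:
    """Map a tag to a field name, accepting a bare ID or a full MAPI tag.

    Assumption: anything above 0xFFFF is a full tag whose high word is the ID.
    """
    prop_id = tag >> 16 if tag > 0xFFFF else tag
    return PROPERTY_NAMES.get(prop_id)
```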
Message Source File Structure (olk14msgsource / olk15MsgSource)
The MsgSource file contains the actual message body content. It may be structured in several ways:
Format 1: Raw MIME/RFC822
The file may contain a complete RFC822 message with MIME structure:
From: sender@example.com
To: recipient@example.com
Subject: Test
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_Part_123"

------=_Part_123
Content-Type: text/plain; charset="UTF-8"

Plain text body here.
------=_Part_123
Content-Type: text/html; charset="UTF-8"

<html><body>HTML body here.</body></html>
------=_Part_123--
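When a MsgSource file holds plain RFC 822/MIME text like this, there is no need to write a MIME parser by hand. The sketch below uses Python's standard email package, which splits the multipart structure and decodes each part. extract_bodies is an illustrative name. The sample message is the example above, with the blank lines that MIME requires between headers and bodies.

```python
import email
from email import policy

SAMPLE = """\
From: sender@example.com
To: recipient@example.com
Subject: Test
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_Part_123"

------=_Part_123
Content-Type: text/plain; charset="UTF-8"

Plain text body here.
------=_Part_123
Content-Type: text/html; charset="UTF-8"

<html><body>HTML body here.</body></html>
------=_Part_123--
"""


def extract_bodies(raw: str) -> dict:
    """Return the plain-text and HTML bodies of an RFC 822/MIME message."""
    # policy.default makes the parser return an EmailMessage with get_body()
    msg = email.message_from_string(raw, policy=policy.default)
    result = {"plain": None, "html": None}
    for kind in ("plain", "html"):
        part = msg.get_body(preferencelist=(kind,))
        if part is not None:
            result[kind] = part.get_content()  # decoded using the part's charset
    return result
```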
Format 2: Binary Header + UTF-16 Content
+------------------+
| Binary Header | (40+ bytes of metadata)
+------------------+
| UTF-16-LE HTML | (Starting with "<html" encoded as UTF-16-LE)
+------------------+
To find the HTML content, search for “<html” encoded as UTF-16-LE. That is the byte sequence 3C 00 68 00 74 00 6D 00 6C 00, where each ASCII character is followed by a zero byte.
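A Python version of that search might look like the sketch below. It decodes from the UTF-16-LE "<html" marker through the matching "</html>" marker, or to the end of the file if the closing tag is missing. The match is case-sensitive, so a body that begins "<HTML" would need a second search. The function name is hypothetical.

```python
from typing import Optional


def extract_utf16_html(data: bytes) -> Optional[str]:
    """Decode the UTF-16-LE '<html' ... '</html>' span, or None if absent."""
    start_marker = "<html".encode("utf-16-le")  # 3C 00 68 00 74 00 6D 00 6C 00
    end_marker = "</html>".encode("utf-16-le")
    start = data.find(start_marker)
    if start == -1:
        return None
    end = data.find(end_marker, start)
    end = len(data) if end == -1 else end + len(end_marker)
    chunk = data[start:end]
    if len(chunk) % 2:  # keep whole UTF-16 code units only
        chunk = chunk[:-1]
    return chunk.decode("utf-16-le", errors="replace")
```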
Format 3: Tagged Records + Body
Similar to message files, with tagged records followed by body content.
Attachment File Structure (olk14msgattach / olk15MsgAttach)
Header Signature
Attachment files begin with a 4-byte signature:
Signature: "Attc" (0x41 0x74 0x74 0x63)
Attribute Structure
Following the signature, the file contains MIME-like attributes:
| Attribute | Description |
|---|---|
| Content-Type | MIME type of the attachment |
| Name | Original filename |
| Content-Disposition | “inline” or “attachment” |
| Filename | Same as Name |
| Content-Transfer-Encoding | Usually “base64” |
Data Section
After the header attributes, the raw attachment data follows. If Content-Transfer-Encoding is “base64”, the data is Base64-encoded.
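The sketch below splits an attachment file into its attributes and payload in Python. It assumes the attributes are "Key: value" lines ending in CRLF (or a bare LF), and that a blank line separates them from the data, as the description above implies. parse_attc is a hypothetical name.

```python
import base64
from typing import Dict, Tuple


def parse_attc(data: bytes) -> Tuple[Dict[str, str], bytes]:
    """Split an 'Attc' attachment file into (attributes, decoded payload)."""
    if data[:4] != b"Attc":
        raise ValueError("missing 'Attc' signature")
    body = data[4:]
    # The first blank line ends the attribute block
    for sep in (b"\r\n\r\n", b"\n\n"):
        split_at = body.find(sep)
        if split_at != -1:
            header_blob, payload = body[:split_at], body[split_at + len(sep):]
            break
    else:
        raise ValueError("no blank line after attributes")
    headers: Dict[str, str] = {}
    for line in header_blob.decode("latin-1").splitlines():
        key, colon, value = line.partition(":")
        if colon:  # skip lines without a colon
            headers[key.strip().lower()] = value.strip()
    if headers.get("content-transfer-encoding", "").lower() == "base64":
        # Base64 text is usually wrapped; drop all whitespace before decoding
        payload = base64.b64decode(b"".join(payload.split()))
    return headers, payload
```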
File Association Rules
To reconstruct a complete email from OLK files, you must locate and merge associated files.
Association Methods
Method 1: Filename-Based (OLK14)
In OLK14, files for the same message share a common identifier in their filename:
Messages/x00_270429.olk14Message
Message Sources/x00_270429.olk14MsgSource
Message Attachments/x00_270429_1.olk14MsgAttach
Message Attachments/x00_270429_2.olk14MsgAttach
Pattern: {prefix}_{messageId}.{extension}
Attachment pattern: {prefix}_{messageId}_{attachmentIndex}.{extension}
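That pattern can be matched with a regular expression. The Python sketch below assumes that neither the prefix nor the message ID contains an underscore. olk14_key is an illustrative name. It returns the shared "prefix_messageId" key plus the attachment index, if there is one.

```python
import re
from typing import Optional, Tuple

# {prefix}_{messageId}[_{attachmentIndex}].{extension}
OLK14_NAME = re.compile(
    r"^(?P<prefix>[^_]+)_(?P<message>[^_.]+)(?:_(?P<index>\d+))?\.(?P<ext>olk14\w+)$",
    re.IGNORECASE,
)


def olk14_key(filename: str) -> Optional[Tuple[str, Optional[int]]]:
    """Return ('prefix_messageId', attachment index or None), or None."""
    m = OLK14_NAME.match(filename)
    if m is None:
        return None
    index = int(m["index"]) if m["index"] else None
    return f"{m['prefix']}_{m['message']}", index
```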
Method 2: Record ID / GUID (OLK15)
OLK15 files use GUIDs as filenames:
Messages/11/0B2132B7-999F-4114-AC6C-E93DE72CEF9A.olk15Message
Message Sources/11/0B2132B7-999F-4114-AC6C-E93DE72CEF9A.olk15MsgSource
Message Attachments/11/0B2132B7-999F-4114-AC6C-E93DE72CEF9A_1.olk15MsgAttach
Pattern: {GUID}.{extension}
Attachment pattern: {GUID}_{attachmentIndex}.{extension}
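For OLK15 names, a regular expression that insists on the 8-4-4-4-12 GUID layout avoids mistaking other files for messages. Parsing the match into a uuid.UUID also makes comparisons case-insensitive, in case upper- and lower-case spellings of the same GUID turn up. The function name is illustrative.

```python
import re
import uuid
from typing import Optional, Tuple

# {GUID}[_{attachmentIndex}].{extension}
OLK15_NAME = re.compile(
    r"^(?P<guid>[0-9A-F]{8}(?:-[0-9A-F]{4}){3}-[0-9A-F]{12})"
    r"(?:_(?P<index>\d+))?\.(?P<ext>olk15\w+)$",
    re.IGNORECASE,
)


def olk15_key(filename: str) -> Optional[Tuple[uuid.UUID, Optional[int]]]:
    """Return (GUID, attachment index or None), or None if the name doesn't fit."""
    m = OLK15_NAME.match(filename)
    if m is None:
        return None
    index = int(m["index"]) if m["index"] else None
    return uuid.UUID(m["guid"]), index
```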
Method 3: SQLite Database Lookup (OLK15 Preferred)
The most reliable method for OLK15 is to query the Outlook.sqlite database:
-- Find message source for a message
SELECT MessageSource.PathComponent
FROM Message
JOIN MessageSource ON Message.pk = MessageSource.Message
WHERE Message.PathComponent = 'GUID.olk15Message';
-- Find attachments for a message
SELECT Attachment.PathComponent
FROM Message
JOIN Attachment ON Message.pk = Attachment.Message
WHERE Message.PathComponent = 'GUID.olk15Message';
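The sketch below runs the attachment lookup from Python's built-in sqlite3 module. It relies on the Message and Attachment table and column names used in the queries above, so check those against the actual database before use. The database is opened read-only through an SQLite URI. The tree above shows a write-ahead log beside the database, so when Outlook may be running it is safer to copy Outlook.sqlite together with Outlook.sqlite-wal and query the copy.

```python
import sqlite3
from pathlib import Path
from typing import List


def attachment_paths(db_path: str, message_file: str) -> List[str]:
    """Return Attachment.PathComponent values for one message file name.

    Table and column names follow the queries above and are assumptions.
    """
    # as_uri() percent-encodes spaces ("Group Containers", "Outlook 15 Profiles");
    # mode=ro makes SQLite refuse any write.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        rows = con.execute(
            "SELECT Attachment.PathComponent "
            "FROM Message JOIN Attachment ON Message.pk = Attachment.Message "
            "WHERE Message.PathComponent = ? "
            "ORDER BY Attachment.pk",
            (message_file,),  # bound parameter, never string-formatted
        ).fetchall()
    finally:
        con.close()
    return [row[0] for row in rows]
```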
Association Pseudocode
FUNCTION findAssociatedFiles(filePath):
    baseName = getBaseName(filePath)          // Remove extension
    extension = getExtension(filePath)
    directory = getParentDirectory(filePath)
    rootDirectory = getDataRootDirectory(directory)
    // Determine version from extension
    IF extension CONTAINS "olk14":
        version = "OLK14"
    ELSE IF extension CONTAINS "olk15":
        version = "OLK15"
    ELSE:
        RETURN error("Unknown OLK version")
    // Extract message identifier
    IF version == "OLK14":
        // Pattern: prefix_messageId or prefix_messageId_attachIndex
        parts = split(baseName, "_")
        messageId = parts[0] + "_" + parts[1]
    ELSE:
        // Pattern: GUID or GUID_attachIndex
        IF baseName CONTAINS "_" AND isNumber(lastPart(baseName)):
            messageId = baseName UP TO last "_"
        ELSE:
            messageId = baseName
    result = {
        messageFile: NULL,
        sourceFile: NULL,
        attachments: []
    }
    // Locate message file
    messageDir = rootDirectory + "/Messages"
    IF version == "OLK15":
        // Check numbered subdirectories 0-255 (range end is exclusive)
        FOR subdir IN range(0, 256):
            candidate = messageDir + "/" + subdir + "/" + messageId + ".olk15Message"
            IF fileExists(candidate):
                result.messageFile = candidate
                BREAK
    ELSE:
        // Extension capitalisation varies (.olk14message vs .olk14Message);
        // match case-insensitively on case-sensitive file systems
        candidate = messageDir + "/" + messageId + ".olk14message"
        IF fileExists(candidate):
            result.messageFile = candidate
    // Locate message source file
    sourceDir = rootDirectory + "/Message Sources"
    IF version == "OLK15":
        FOR subdir IN range(0, 256):
            candidate = sourceDir + "/" + subdir + "/" + messageId + ".olk15MsgSource"
            IF fileExists(candidate):
                result.sourceFile = candidate
                BREAK
    ELSE:
        candidate = sourceDir + "/" + messageId + ".olk14msgsource"
        IF fileExists(candidate):
            result.sourceFile = candidate
    // Locate attachment files (indices start at 1; the search stops at the first gap)
    attachDir = rootDirectory + "/Message Attachments"
    attachIndex = 1
    WHILE TRUE:
        IF version == "OLK15":
            found = FALSE
            FOR subdir IN range(0, 256):
                candidate = attachDir + "/" + subdir + "/" +
                            messageId + "_" + attachIndex + ".olk15MsgAttach"
                IF fileExists(candidate):
                    result.attachments.APPEND(candidate)
                    found = TRUE
                    BREAK
            IF NOT found:
                BREAK
        ELSE:
            candidate = attachDir + "/" + messageId + "_" + attachIndex + ".olk14msgattach"
            IF fileExists(candidate):
                result.attachments.APPEND(candidate)
            ELSE:
                BREAK
        attachIndex = attachIndex + 1
    RETURN result
Parsing Pseudocode
Reading a Message File
FUNCTION parseMessageFile(filePath):
    data = readBinaryFile(filePath)
    result = {}
    offset = 0
    // Skip file header (size varies by version)
    version = detectVersion(data)
    IF version == "OLK15":
        offset = 128    // Larger header
    ELSE:
        offset = 64     // Standard header
    // Parse tagged records; each needs at least a 6-byte tag + type header
    WHILE offset + 6 <= length(data):
        tag = readUInt32LE(data, offset)
        offset = offset + 4
        type = readUInt16LE(data, offset)
        offset = offset + 2
        // Read value based on type
        value, bytesRead = readPropertyValue(data, offset, type)
        IF bytesRead < 0:
            // Unknown type: its length is unknown, so parsing cannot continue safely
            BREAK
        offset = offset + bytesRead
        // Map tag to field name
        fieldName = tagToFieldName(tag)
        IF fieldName != NULL:
            result[fieldName] = value
    RETURN result
FUNCTION readPropertyValue(data, offset, type):
    // Returns (value, bytesConsumed); bytesConsumed is -1 for unknown types
    SWITCH type:
        CASE 0x0001:  // Null
            RETURN NULL, 0
        CASE 0x0002:  // Int16
            RETURN readInt16LE(data, offset), 2
        CASE 0x0003:  // Int32
            RETURN readInt32LE(data, offset), 4
        CASE 0x0004:  // Float
            RETURN readFloat32LE(data, offset), 4
        CASE 0x0005:  // Double
            RETURN readFloat64LE(data, offset), 8
        CASE 0x0006, 0x0007:  // Currency, AppTime (8 bytes, kept raw)
            RETURN readBytes(data, offset, 8), 8
        CASE 0x000A:  // Error code
            RETURN readUInt32LE(data, offset), 4
        CASE 0x000B:  // Boolean
            RETURN readUInt16LE(data, offset) != 0, 2
        CASE 0x0014:  // Int64
            RETURN readInt64LE(data, offset), 8
        CASE 0x001E:  // ASCII String (length is a byte count)
            length = readUInt32LE(data, offset)
            stringData = readBytes(data, offset + 4, length)
            RETURN decodeASCII(stringData), 4 + length
        CASE 0x001F:  // Unicode String (length is a character count)
            charCount = readUInt32LE(data, offset)
            byteCount = charCount * 2
            stringData = readBytes(data, offset + 4, byteCount)
            RETURN decodeUTF16LE(stringData), 4 + byteCount
        CASE 0x0040:  // SysTime (FILETIME)
            filetime = readUInt64LE(data, offset)
            RETURN filetimeToDate(filetime), 8
        CASE 0x0048:  // GUID
            RETURN readBytes(data, offset, 16), 16
        CASE 0x0102:  // Binary
            length = readUInt32LE(data, offset)
            RETURN readBytes(data, offset + 4, length), 4 + length
        DEFAULT:
            RETURN NULL, -1
Reading a Message Source File
FUNCTION parseMessageSource(filePath):
    data = readBinaryFile(filePath)
    result = {
        plainText: NULL,
        html: NULL,
        rtf: NULL
    }
    // Try to find HTML content (UTF-16-LE encoded)
    htmlMarker = encodeUTF16LE("<html")
    htmlStart = findBytes(data, htmlMarker)
    IF htmlStart != -1:
        // Find end of HTML
        htmlEndMarker = encodeUTF16LE("</html>")
        htmlEnd = findBytes(data, htmlEndMarker, htmlStart)
        IF htmlEnd != -1:
            htmlEnd = htmlEnd + length(htmlEndMarker)
            htmlData = data[htmlStart : htmlEnd]
            result.html = decodeUTF16LE(htmlData)
        ELSE:
            // Read to end of file
            htmlData = data[htmlStart :]
            result.html = decodeUTF16LE(htmlData)
    ELSE:
        // Try plain text extraction
        // May be RFC822 format or plain UTF-8/ASCII
        textContent = tryDecodeAsText(data)
        IF textContent != NULL:
            IF looksLikeRFC822(textContent):
                result = parseMIMEMessage(textContent)
            ELSE:
                result.plainText = textContent
    RETURN result
Reading an Attachment File
FUNCTION parseAttachment(filePath):
    data = readBinaryFile(filePath)
    result = {
        filename: NULL,
        contentType: NULL,
        disposition: NULL,
        encoding: NULL,
        data: NULL
    }
    // Check for "Attc" signature
    signature = readBytes(data, 0, 4)
    IF signature != "Attc":
        RETURN error("Invalid attachment file")
    offset = 4
    // Parse header attributes
    WHILE offset < length(data):
        line = readLine(data, offset)
        IF line == "" OR line == NULL:
            // End of headers
            offset = offset + 2  // Skip blank line (CRLF)
            BREAK
        IF line CONTAINS ":":
            key, value = splitOnFirst(line, ":")
            key = trim(key)
            value = trim(value)
            SWITCH lowercase(key):
                CASE "content-type":
                    result.contentType = value
                CASE "name":
                    result.filename = value
                CASE "filename":
                    result.filename = value
                CASE "content-disposition":
                    result.disposition = value
                CASE "content-transfer-encoding":
                    result.encoding = value
        offset = offset + length(line) + 2  // +2 for CRLF
    // Read attachment data
    attachmentData = data[offset :]
    IF result.encoding != NULL AND lowercase(result.encoding) == "base64":
        // Base64 text is usually wrapped into CRLF-terminated lines
        result.data = base64Decode(removeWhitespace(attachmentData))
    ELSE:
        result.data = attachmentData
    RETURN result
Merging Associated Files
FUNCTION reconstructEmail(messageFilePath):
    // Find all associated files
    associated = findAssociatedFiles(messageFilePath)
    email = {
        headers: {},
        body: {
            plain: NULL,
            html: NULL,
            rtf: NULL
        },
        attachments: []
    }
    // Parse message header file
    IF associated.messageFile != NULL:
        headerData = parseMessageFile(associated.messageFile)
        email.headers = headerData
    // Parse message source (body content)
    IF associated.sourceFile != NULL:
        bodyData = parseMessageSource(associated.sourceFile)
        email.body.plain = bodyData.plainText
        email.body.html = bodyData.html
        email.body.rtf = bodyData.rtf
    // Parse attachments
    FOR attachPath IN associated.attachments:
        attachData = parseAttachment(attachPath)
        email.attachments.APPEND(attachData)
    RETURN email
Folder/Mailbox Enumeration
To process an entire mailbox:
FUNCTION enumerateMailbox(profilePath, version):
    messages = []
    IF version == "OLK15":
        messagesDir = profilePath + "/Data/Messages"
        // Iterate through numbered subdirectories 0-255 (range end is exclusive)
        FOR subdir IN range(0, 256):
            subdirPath = messagesDir + "/" + subdir
            IF directoryExists(subdirPath):
                FOR file IN listFiles(subdirPath):
                    IF lowercase(file) ENDS WITH ".olk15message":
                        messageInfo = {
                            path: subdirPath + "/" + file,
                            id: removeExtension(file)
                        }
                        messages.APPEND(messageInfo)
    ELSE:
        messagesDir = profilePath + "/Data Records/Messages"
        FOR file IN listFiles(messagesDir):
            // Case-insensitive: names appear as both .olk14message and .olk14Message
            IF lowercase(file) ENDS WITH ".olk14message":
                messageInfo = {
                    path: messagesDir + "/" + file,
                    id: removeExtension(file)
                }
                messages.APPEND(messageInfo)
    RETURN messages
SQLite Database Schema (OLK15)
The Outlook.sqlite database provides an index of all cached items. Key tables include:
Message Table
CREATE TABLE Message (
pk INTEGER PRIMARY KEY,
PathComponent TEXT, -- Filename (GUID.olk15Message)
RecordIdentifier TEXT, -- Internal record ID
FolderPath TEXT, -- Parent folder path
MessageClass TEXT, -- "IPM.Note", etc.
Subject TEXT,
SenderName TEXT,
SenderEmailAddress TEXT,
DateReceived REAL, -- Unix timestamp
DateSent REAL,
HasAttachments INTEGER,
IsRead INTEGER,
IsFlagged INTEGER,
Importance INTEGER,
...
);
MessageSource Table
CREATE TABLE MessageSource (
pk INTEGER PRIMARY KEY,
Message INTEGER, -- Foreign key to Message.pk
PathComponent TEXT, -- Filename (GUID.olk15MsgSource)
...
);
Attachment Table
CREATE TABLE Attachment (
pk INTEGER PRIMARY KEY,
Message INTEGER, -- Foreign key to Message.pk
PathComponent TEXT, -- Filename (GUID_N.olk15MsgAttach)
Filename TEXT,
ContentType TEXT,
FileSize INTEGER,
...
);
Folder Table
CREATE TABLE Folder (
pk INTEGER PRIMARY KEY,
PathComponent TEXT,
DisplayName TEXT,
ParentFolder INTEGER,
FolderType INTEGER,
...
);
Date/Time Handling
Windows FILETIME
Dates are stored as 64-bit Windows FILETIME values, which count 100-nanosecond intervals since January 1, 1601 UTC.
FUNCTION filetimeToUnixTimestamp(filetime):
    // FILETIME epoch: 1601-01-01
    // Unix epoch: 1970-01-01
    // Difference: 11644473600 seconds
    seconds = filetime / 10000000
    unixTimestamp = seconds - 11644473600
    RETURN unixTimestamp
FUNCTION unixTimestampToFiletime(unixTimestamp):
    seconds = unixTimestamp + 11644473600
    filetime = seconds * 10000000
    RETURN filetime
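In Python the conversion can be done directly with datetime. This sketch uses integer arithmetic on the tick count, so it keeps microsecond precision (the final 100 ns digit is truncated) and returns a timezone-aware UTC value. The function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
EPOCH_DIFF_TICKS = 11644473600 * 10_000_000  # 1601-01-01 to 1970-01-01 in 100 ns ticks


def filetime_to_datetime(filetime: int) -> datetime:
    """FILETIME ticks (100 ns since 1601-01-01 UTC) to an aware UTC datetime."""
    # Integer division keeps microsecond precision; the last 100 ns digit is dropped
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def unix_to_filetime(unix_seconds: int) -> int:
    """Whole Unix seconds to FILETIME ticks."""
    return unix_seconds * 10_000_000 + EPOCH_DIFF_TICKS
```

For example, unix_to_filetime(0) gives 116444736000000000, the FILETIME value of the Unix epoch.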
Implementation Recommendations
Opening a Single File
When a user attempts to open a single OLK file:
- Detect file type from extension
- Extract message identifier from filename
- Locate associated files using the association rules
- Offer to merge header + body + attachments
- Present unified view to user
Opening a Folder Structure
When opening an entire mailbox:
- Locate profile directory based on Outlook version
- Check for SQLite database (OLK15) for efficient indexing
- Enumerate message files in Messages directory
- Build index of messages with metadata
- Lazy-load body content and attachments on demand
Error Handling
- Handle missing associated files gracefully (show partial data)
- Validate file signatures before parsing
- Handle encoding variations (UTF-8, UTF-16-LE, ASCII)
- Gracefully handle corrupted or truncated files
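For encoding variations, the sketch below is a best-effort decoder along the lines of the tryDecodeAsText helper that the parsing pseudocode leaves undefined. The UTF-16-LE test is a heuristic assumption: mostly-ASCII UTF-16-LE text has a zero in most odd-numbered byte positions. Latin-1 is the last resort because it decodes any byte sequence. decode_text is a hypothetical name.

```python
from typing import Optional


def decode_text(data: bytes) -> Optional[str]:
    """Best-effort text decode: BOM, UTF-16-LE heuristic, UTF-8, then Latin-1."""
    if not data:
        return None
    if data.startswith(b"\xff\xfe"):  # UTF-16-LE byte order mark
        return data[2:].decode("utf-16-le", errors="replace")
    if data.startswith(b"\xef\xbb\xbf"):  # UTF-8 byte order mark
        return data[3:].decode("utf-8", errors="replace")
    # Heuristic (assumption): mostly-ASCII UTF-16-LE has zero high bytes
    odd = data[1::2]
    if len(data) >= 4 and odd.count(0) > 0.8 * len(odd):
        even_len = len(data) - len(data) % 2
        return data[:even_len].decode("utf-16-le", errors="replace")
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("latin-1")  # never fails: every byte maps to a character
```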
The solution
The solution to the problem of opening .msg files is to use MailRaider – available either here or on the Mac (or iOS) App Stores.