{"id":961,"date":"2025-12-19T20:55:44","date_gmt":"2025-12-19T20:55:44","guid":{"rendered":"https:\/\/www.45rpmsoftware.com\/blog\/?p=961"},"modified":"2025-12-19T20:55:44","modified_gmt":"2025-12-19T20:55:44","slug":"what-has-microsoft-been-smoking","status":"publish","type":"post","link":"https:\/\/www.45rpmsoftware.com\/blog\/?p=961","title":{"rendered":"What has Microsoft been smoking?"},"content":{"rendered":"\n<p>The .msg format, used by Microsoft Outlook for Windows, is an odd duck.  It feels at times almost as if Microsoft is trying to lock users into the Windows ecosystem by ensuring that the files they archive can only be used on the Windows operating system.  But the .msg file format seems almost sane, at least when compared to the .olk format as used by Microsoft Outlook for Macintosh &#8211; that really is a howling at the moon crazy format. <\/p>\n\n\n\n<!--more-->\n\n\n\n<p>So\u2026 Let&#8217;s examine .olk and see if we can work out what makes it tick.<\/p>\n\n\n\n<p>OLK (Outlook for Mac Local Cache) is a proprietary binary file format used by Microsoft Outlook for Mac to store email messages, calendar events, contacts, and other mailbox items. 
Unlike Windows Outlook which uses monolithic PST\/OST files, Outlook for Mac fragments each email into multiple component files stored across a directory structure.<\/p>\n\n\n\n<p><strong>Characteristics<\/strong><\/p>\n\n\n\n<ul><li><strong>Fragmented Storage<\/strong>: A single email message is split across multiple files (header, body, attachments)<\/li><li><strong>Database-Backed<\/strong>: Files work in conjunction with a SQLite database (<code>Outlook.sqlite<\/code>) that serves as an index<\/li><li><strong>Profile-Based<\/strong>: Data is stored per-profile in identity folders<\/li><li><strong>Version-Specific<\/strong>: OLK14 (Outlook 2011) and OLK15 (Outlook 2016+) have different structures<\/li><li><strong>Binary Format<\/strong>: Files use little-endian binary encoding with tagged records<\/li><li><strong>Unicode Support<\/strong>: String data is typically stored as UTF-16-LE<\/li><\/ul>\n\n\n\n<h1>Format Versions<\/h1>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>Version<\/strong><\/td><td><strong>Outlook Version<\/strong><\/td><td><strong>Notes<\/strong><\/td><\/tr><tr><td>OLK14<\/td><td>Outlook 2011 for Mac<\/td><td>Original Mac format<\/td><\/tr><tr><td>OLK15<\/td><td>Outlook 2016, 2019, Office 365 for Mac<\/td><td>Enhanced structure, SQLite integration<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h1>Directory Structure<\/h1>\n\n\n\n<p><strong>OLK14 (Outlook 2011)<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>~\/Documents\/Microsoft User Data\/Office 2011 Identities\/Main Identity\/\n\u251c\u2500\u2500 Data Records\/\n\u2502   \u251c\u2500\u2500 Messages\/\n\u2502   \u2502   \u2514\u2500\u2500 *.olk14message\n\u2502   \u251c\u2500\u2500 Message Sources\/\n\u2502   \u2502   \u2514\u2500\u2500 *.olk14msgsource\n\u2502   \u251c\u2500\u2500 Message Attachments\/\n\u2502   \u2502   \u2514\u2500\u2500 *.olk14msgattach\n\u2502   \u251c\u2500\u2500 Contacts\/\n\u2502   \u2502   \u2514\u2500\u2500 *.olk14contact\n\u2502  
 \u251c\u2500\u2500 Events\/\n\u2502   \u2502   \u2514\u2500\u2500 *.olk14event\n\u2502   \u251c\u2500\u2500 Tasks\/\n\u2502   \u2502   \u2514\u2500\u2500 *.olk14task\n\u2502   \u251c\u2500\u2500 Notes\/\n\u2502   \u2502   \u2514\u2500\u2500 *.olk14note\n\u2502   \u2514\u2500\u2500 Categories\/\n\u2502       \u2514\u2500\u2500 *.olk14category\n\u251c\u2500\u2500 Folders\/\n\u2502   \u2514\u2500\u2500 *.olk14folder\n\u251c\u2500\u2500 Signatures\/\n\u2502   \u2514\u2500\u2500 *.olk14signature\n\u251c\u2500\u2500 Mail Accounts\/\n\u2502   \u2514\u2500\u2500 *.olk14mailaccount\n\u251c\u2500\u2500 Searches\/\n\u2502   \u2514\u2500\u2500 *.olk14search\n\u2514\u2500\u2500 Preferences\/\n    \u2514\u2500\u2500 *.olk14pref<\/code><\/pre>\n\n\n\n<p><strong>OLK15 (Outlook 2016+)<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>~\/Library\/Group Containers\/UBF8T346G9.Office\/Outlook\/Outlook 15 Profiles\/Main Profile\/\n\u251c\u2500\u2500 Data\/\n\u2502   \u251c\u2500\u2500 Messages\/\n\u2502   \u2502   \u2514\u2500\u2500 &#91;0-255]\/\n\u2502   \u2502       \u2514\u2500\u2500 *.olk15Message\n\u2502   \u251c\u2500\u2500 Message Sources\/\n\u2502   \u2502   \u2514\u2500\u2500 &#91;0-255]\/\n\u2502   \u2502       \u2514\u2500\u2500 *.olk15MsgSource\n\u2502   \u251c\u2500\u2500 Message Attachments\/\n\u2502   \u2502   \u2514\u2500\u2500 &#91;0-255]\/\n\u2502   \u2502       \u2514\u2500\u2500 *.olk15MsgAttach\n\u2502   \u251c\u2500\u2500 Contacts\/\n\u2502   \u2502   \u2514\u2500\u2500 *.olk15Contact\n\u2502   \u251c\u2500\u2500 Events\/\n\u2502   \u2502   \u2514\u2500\u2500 *.olk15Event\n\u2502   \u251c\u2500\u2500 Tasks\/\n\u2502   \u2502   \u2514\u2500\u2500 *.olk15Task\n\u2502   \u251c\u2500\u2500 Notes\/\n\u2502   \u2502   \u2514\u2500\u2500 *.olk15Note\n\u2502   \u2514\u2500\u2500 Categories\/\n\u2502       \u2514\u2500\u2500 *.olk15Category\n\u251c\u2500\u2500 Outlook.sqlite                 (Primary database\/index)\n\u2514\u2500\u2500 Outlook.sqlite-wal           
  (Write-ahead log)<\/code><\/pre>\n\n\n\n<p><strong>Note<\/strong>: OLK15 uses numbered subdirectories (0-255) to distribute files and prevent filesystem performance issues with large mailboxes.<\/p>\n\n\n\n<h1>File Types Summary<\/h1>\n\n\n\n<p><strong>Email Related Files<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Extension<\/th><th>Content<\/th><th>Description<\/th><\/tr><\/thead><tbody><tr><td><code>.olk14message<\/code><br><code>.olk15Message<\/code><\/td><td>Header Metadata<\/td><td>To, From, Subject, Date, Message-ID, recipients. Does NOT contain body text.<\/td><\/tr><tr><td><code>.olk14msgsource<\/code>\u00a0<br><code>.olk15MsgSource<\/code><\/td><td>Body Content<\/td><td>Plain text, HTML, and\/or RTF body content. May contain embedded MIME data.<\/td><\/tr><tr><td><code>.olk14msgattach<\/code><br><code>.olk15MsgAttach<\/code><\/td><td>Attachment Data<\/td><td>Individual attachment files with metadata (filename, content-type, encoding).<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p id=\"toc_9\"><strong>Other Data Types<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Extension<\/th><th>Content<\/th><\/tr><\/thead><tbody><tr><td><code>.olk14contact<\/code>\u00a0<br><code>.olk15Contact<\/code><\/td><td>Contact records (name, email, phone, address, organization)<\/td><\/tr><tr><td><code>.olk14event<\/code>\u00a0<br><code>.olk15Event<\/code><\/td><td>Calendar events (time, date, invitees, location, recurrence)<\/td><\/tr><tr><td><code>.olk14task<\/code>\u00a0<br><code>.olk15Task<\/code><\/td><td>Task\/to-do items<\/td><\/tr><tr><td><code>.olk14note<\/code>\u00a0<br><code>.olk15Note<\/code><\/td><td>Sticky notes<\/td><\/tr><tr><td><code>.olk14folder<\/code>\u00a0<br><code>.olk15Folder<\/code><\/td><td>Folder structure definitions<\/td><\/tr><tr><td><code>.olk14category<\/code>\u00a0<br><code>.olk15Category<\/code><\/td><td>Category\/label 
definitions<\/td><\/tr><tr><td><code>.olk14signature<\/code>\u00a0<br><code>.olk15Signature<\/code><\/td><td>Email signatures (HTML format)<\/td><\/tr><tr><td><code>.olk14mailaccount<\/code>\u00a0<br><code>.olk15MailAccount<\/code><\/td><td>Account configuration and credentials<\/td><\/tr><tr><td><code>.olk14search<\/code>\u00a0<br><code>.olk15Search<\/code><\/td><td>Saved search queries<\/td><\/tr><tr><td><code>.olk14pref<\/code>\u00a0<br><code>.olk15Pref<\/code><\/td><td>Application preferences<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h1>Binary File Structure<\/h1>\n\n\n\n<p>All OLK binary files follow a similar tagged-record pattern. Data is stored in little-endian byte order.<\/p>\n\n\n\n<p id=\"toc_11\"><strong>General Binary Layout<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>+------------------+\n| File Header      |  (Variable size, version-dependent)\n+------------------+\n| Record 1         |\n+------------------+\n| Record 2         |\n+------------------+\n| ...              
|\n+------------------+\n| Record N         |\n+------------------+\n| (Optional) Body  |  (For MsgSource files: raw HTML\/text content)\n+------------------+<\/code><\/pre>\n\n\n\n<p id=\"toc_12\"><strong>Record Structure<\/strong><\/p>\n\n\n\n<p>Each record follows a Tag-Length-Value (TLV) pattern, extended with a type field:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>+----------------+----------------+----------------+----------------+\n| Tag (4 bytes)  | Type (2 bytes) | Length (var)   | Value (var)    |\n+----------------+----------------+----------------+----------------+<\/code><\/pre>\n\n\n\n<ul><li><strong>Tag<\/strong>: Identifies the field (e.g., Subject, From, Date)<\/li><li><strong>Type<\/strong>: Indicates the data type of the value<\/li><li><strong>Length<\/strong>: Size of the value; present only for variable-length types (strings and binary), and counted in characters rather than bytes for Unicode strings<\/li><li><strong>Value<\/strong>: The actual data<\/li><\/ul>\n\n\n\n<p id=\"toc_13\"><strong>Property Types<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Type ID<\/th><th>Name<\/th><th>Description<\/th><th>Value Format<\/th><\/tr><\/thead><tbody><tr><td>0x0001<\/td><td>Null<\/td><td>No value<\/td><td>(none)<\/td><\/tr><tr><td>0x0002<\/td><td>Int16<\/td><td>16-bit signed integer<\/td><td>2 bytes<\/td><\/tr><tr><td>0x0003<\/td><td>Int32<\/td><td>32-bit signed integer<\/td><td>4 bytes<\/td><\/tr><tr><td>0x0004<\/td><td>Float<\/td><td>32-bit float<\/td><td>4 bytes<\/td><\/tr><tr><td>0x0005<\/td><td>Double<\/td><td>64-bit double<\/td><td>8 bytes<\/td><\/tr><tr><td>0x0006<\/td><td>Currency<\/td><td>Currency value<\/td><td>8 bytes<\/td><\/tr><tr><td>0x0007<\/td><td>AppTime<\/td><td>Application time<\/td><td>8 bytes<\/td><\/tr><tr><td>0x000A<\/td><td>Error<\/td><td>Error code<\/td><td>4 bytes<\/td><\/tr><tr><td>0x000B<\/td><td>Boolean<\/td><td>Boolean value<\/td><td>2 bytes (0=false, 1=true)<\/td><\/tr><tr><td>0x0014<\/td><td>Int64<\/td><td>64-bit signed integer<\/td><td>8 bytes<\/td><\/tr><tr><td>0x001E<\/td><td>String8<\/td><td>ASCII 
string<\/td><td>Length-prefixed bytes<\/td><\/tr><tr><td>0x001F<\/td><td>Unicode<\/td><td>UTF-16-LE string<\/td><td>Length-prefixed bytes<\/td><\/tr><tr><td>0x0040<\/td><td>SysTime<\/td><td>Windows FILETIME<\/td><td>8 bytes<\/td><\/tr><tr><td>0x0048<\/td><td>GUID<\/td><td>128-bit GUID<\/td><td>16 bytes<\/td><\/tr><tr><td>0x0102<\/td><td>Binary<\/td><td>Binary blob<\/td><td>Length-prefixed bytes<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p id=\"toc_14\"><strong>String Encoding<\/strong><\/p>\n\n\n\n<p>Strings in OLK15 files are typically:<\/p>\n\n\n\n<ul><li>Stored as UTF-16-LE (Little Endian)<\/li><li>Prefixed with a 4-byte length field (character count, not byte count)<\/li><li>Optionally followed by a null terminator<\/li><\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>+----------------+--------------------------------+\n| Length (4 B)   | UTF-16-LE String Data          |\n+----------------+--------------------------------+<\/code><\/pre>\n\n\n\n<h1 id=\"toc_15\">Field Tags (Property IDs)<\/h1>\n\n\n\n<p>These tags are based on MAPI property IDs and identify specific fields within records.<\/p>\n\n\n\n<p id=\"toc_16\"><strong>Message Header Fields (olk14message \/ olk15Message)<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Tag (Hex)<\/th><th>Field Name<\/th><th>Data Type<\/th><th>Description<\/th><\/tr><\/thead><tbody><tr><td>0x0017<\/td><td>Importance<\/td><td>Int32<\/td><td>0=Low, 1=Normal, 2=High<\/td><\/tr><tr><td>0x001A<\/td><td>MessageClass<\/td><td>Unicode<\/td><td>Message type (e.g., &#8220;IPM.Note&#8221;)<\/td><\/tr><tr><td>0x0026<\/td><td>Priority<\/td><td>Int32<\/td><td>X-Priority value<\/td><\/tr><tr><td>0x0036<\/td><td>Sensitivity<\/td><td>Int32<\/td><td>0=Normal, 1=Personal, 2=Private, 3=Confidential<\/td><\/tr><tr><td>0x0037<\/td><td>Subject<\/td><td>Unicode<\/td><td>Email subject line<\/td><\/tr><tr><td>0x0039<\/td><td>SentTime<\/td><td>SysTime<\/td><td>When message was 
sent<\/td><\/tr><tr><td>0x0042<\/td><td>SentRepresentingName<\/td><td>Unicode<\/td><td>Display name of sender delegate<\/td><\/tr><tr><td>0x0065<\/td><td>SentRepresentingEmail<\/td><td>Unicode<\/td><td>Email of sender delegate<\/td><\/tr><tr><td>0x0070<\/td><td>ConversationTopic<\/td><td>Unicode<\/td><td>Conversation\/thread topic<\/td><\/tr><tr><td>0x0071<\/td><td>ConversationIndex<\/td><td>Binary<\/td><td>Thread tracking blob<\/td><\/tr><tr><td>0x0C1A<\/td><td>SenderName<\/td><td>Unicode<\/td><td>Display name of sender<\/td><\/tr><tr><td>0x0C1F<\/td><td>SenderEmailAddress<\/td><td>Unicode<\/td><td>Email address of sender<\/td><\/tr><tr><td>0x0E02<\/td><td>DisplayBcc<\/td><td>Unicode<\/td><td>BCC recipients (display string)<\/td><\/tr><tr><td>0x0E03<\/td><td>DisplayCc<\/td><td>Unicode<\/td><td>CC recipients (display string)<\/td><\/tr><tr><td>0x0E04<\/td><td>DisplayTo<\/td><td>Unicode<\/td><td>To recipients (display string)<\/td><\/tr><tr><td>0x0E06<\/td><td>DeliveryTime<\/td><td>SysTime<\/td><td>When message was delivered<\/td><\/tr><tr><td>0x0E07<\/td><td>MessageFlags<\/td><td>Int32<\/td><td>Bitfield of message status flags<\/td><\/tr><tr><td>0x0E08<\/td><td>MessageSize<\/td><td>Int32<\/td><td>Total message size in bytes<\/td><\/tr><tr><td>0x0E17<\/td><td>MessageStatus<\/td><td>Int32<\/td><td>Status flags<\/td><\/tr><tr><td>0x0E1D<\/td><td>NormalizedSubject<\/td><td>Unicode<\/td><td>Subject without RE:\/FW: prefixes<\/td><\/tr><tr><td>0x1000<\/td><td>Body<\/td><td>Unicode<\/td><td>Plain text body (may be in MsgSource)<\/td><\/tr><tr><td>0x1009<\/td><td>RtfCompressed<\/td><td>Binary<\/td><td>Compressed RTF body<\/td><\/tr><tr><td>0x1013<\/td><td>BodyHtml<\/td><td>Unicode<\/td><td>HTML body content<\/td><\/tr><tr><td>0x1035<\/td><td>InternetMessageId<\/td><td>String8<\/td><td>RFC Message-ID header<\/td><\/tr><tr><td>0x1042<\/td><td>InReplyTo<\/td><td>String8<\/td><td>In-Reply-To header value<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p 
id=\"toc_17\"><strong>Recipient Fields<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Tag (Hex)<\/th><th>Field Name<\/th><th>Data Type<\/th><th>Description<\/th><\/tr><\/thead><tbody><tr><td>0x0C15<\/td><td>RecipientType<\/td><td>Int32<\/td><td>1=To, 2=CC, 3=BCC<\/td><\/tr><tr><td>0x3001<\/td><td>DisplayName<\/td><td>Unicode<\/td><td>Recipient display name<\/td><\/tr><tr><td>0x3003<\/td><td>EmailAddress<\/td><td>Unicode<\/td><td>Recipient email address<\/td><\/tr><tr><td>0x39FE<\/td><td>SmtpAddress<\/td><td>Unicode<\/td><td>SMTP email address<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p id=\"toc_18\"><strong>Attachment Fields (olk14msgattach \/ olk15MsgAttach)<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Tag (Hex)<\/th><th>Field Name<\/th><th>Data Type<\/th><th>Description<\/th><\/tr><\/thead><tbody><tr><td>0x0E20<\/td><td>AttachSize<\/td><td>Int32<\/td><td>Size of attachment in bytes<\/td><\/tr><tr><td>0x3701<\/td><td>AttachDataBinary<\/td><td>Binary<\/td><td>Raw attachment data<\/td><\/tr><tr><td>0x3703<\/td><td>AttachExtension<\/td><td>Unicode<\/td><td>File extension<\/td><\/tr><tr><td>0x3704<\/td><td>AttachFilename<\/td><td>Unicode<\/td><td>Short filename<\/td><\/tr><tr><td>0x3705<\/td><td>AttachMethod<\/td><td>Int32<\/td><td>How attachment is stored<\/td><\/tr><tr><td>0x3707<\/td><td>AttachLongFilename<\/td><td>Unicode<\/td><td>Full filename<\/td><\/tr><tr><td>0x370E<\/td><td>AttachMimeTag<\/td><td>Unicode<\/td><td>MIME content-type<\/td><\/tr><tr><td>0x3712<\/td><td>AttachContentId<\/td><td>Unicode<\/td><td>Content-ID for inline attachments<\/td><\/tr><tr><td>0x3716<\/td><td>AttachContentDisposition<\/td><td>Unicode<\/td><td>&#8220;inline&#8221; or &#8220;attachment&#8221;<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h1 id=\"toc_19\">Message Source File Structure (olk14msgsource \/ olk15MsgSource)<\/h1>\n\n\n\n<p>The MsgSource file contains the actual message body content. 
It may be structured in several ways:<\/p>\n\n\n\n<p id=\"toc_20\"><strong>Format 1: Raw MIME\/RFC822<\/strong><\/p>\n\n\n\n<p>The file may contain a complete RFC822 message with MIME structure:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>From: sender@example.com\nTo: recipient@example.com\nSubject: Test\nContent-Type: multipart\/alternative; boundary=\"----=_Part_123\"\n\n------=_Part_123\nContent-Type: text\/plain; charset=\"UTF-8\"\n\nPlain text body here.\n\n------=_Part_123\nContent-Type: text\/html; charset=\"UTF-8\"\n\n&lt;html&gt;&lt;body&gt;HTML body here.&lt;\/body&gt;&lt;\/html&gt;\n\n------=_Part_123--<\/code><\/pre>\n\n\n\n<p id=\"toc_21\"><strong>Format 2: Binary Header + UTF-16 Content<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>+------------------+\n| Binary Header    |  (40+ bytes of metadata)\n+------------------+\n| UTF-16-LE HTML   |  (Starting with \"&lt;html\" encoded as UTF-16-LE)\n+------------------+<\/code><\/pre>\n\n\n\n<p>To find the HTML content, search for&nbsp;<code>&lt;html<\/code>&nbsp;encoded as UTF-16-LE, which is the byte sequence&nbsp;<code>3C 00 68 00 74 00 6D 00 6C 00<\/code>.<\/p>\n\n\n\n<p id=\"toc_22\"><strong>Format 3: Tagged Records + Body<\/strong><\/p>\n\n\n\n<p>Similar to message files, with tagged records followed by body content.<\/p>\n\n\n\n<h1 id=\"toc_23\">Attachment File Structure (olk14msgattach \/ olk15MsgAttach)<\/h1>\n\n\n\n<p id=\"toc_24\"><strong>Header Signature<\/strong><\/p>\n\n\n\n<p>Attachment files begin with a 4-byte signature:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Signature: \"Attc\" (0x41 0x74 0x74 0x63)<\/code><\/pre>\n\n\n\n<p id=\"toc_25\"><strong>Attribute Structure<\/strong><\/p>\n\n\n\n<p>Following the signature, the file contains MIME-like attributes:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Attribute<\/th><th>Description<\/th><\/tr><\/thead><tbody><tr><td>Content-Type<\/td><td>MIME type of the 
attachment<\/td><\/tr><tr><td>Name<\/td><td>Original filename<\/td><\/tr><tr><td>Content-Disposition<\/td><td>&#8220;inline&#8221; or &#8220;attachment&#8221;<\/td><\/tr><tr><td>Filename<\/td><td>Same as Name<\/td><\/tr><tr><td>Content-Transfer-Encoding<\/td><td>Usually &#8220;base64&#8221;<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p id=\"toc_26\"><strong>Data Section<\/strong><\/p>\n\n\n\n<p>After the header attributes, the raw attachment data follows. If Content-Transfer-Encoding is &#8220;base64&#8221;, the data is Base64-encoded.<\/p>\n\n\n\n<h1 id=\"toc_27\">File Association Rules<\/h1>\n\n\n\n<p>To reconstruct a complete email from OLK files, you must locate and merge associated files.<\/p>\n\n\n\n<h2 id=\"toc_28\">Association Methods<\/h2>\n\n\n\n<h4 id=\"toc_29\">Method 1: Filename-Based (OLK14)<\/h4>\n\n\n\n<p>In OLK14, files for the same message share a common identifier in their filename:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Messages\/x00_270429.olk14Message\nMessage Sources\/x00_270429.olk14MsgSource\nMessage Attachments\/x00_270429_1.olk14MsgAttach\nMessage Attachments\/x00_270429_2.olk14MsgAttach<\/code><\/pre>\n\n\n\n<p><strong>Pattern<\/strong>:&nbsp;<code>{prefix}_{messageId}.{extension}<\/code>&nbsp;<strong>Attachment Pattern<\/strong>:&nbsp;<code>{prefix}_{messageId}_{attachmentIndex}.{extension}<\/code><\/p>\n\n\n\n<h4 id=\"toc_30\">Method 2: Record ID \/ GUID (OLK15)<\/h4>\n\n\n\n<p>OLK15 files use GUIDs as filenames:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Messages\/11\/0B2132B7-999F-4114-AC6C-E93DE72CEF9A.olk15Message\nMessage Sources\/11\/0B2132B7-999F-4114-AC6C-E93DE72CEF9A.olk15MsgSource\nMessage Attachments\/11\/0B2132B7-999F-4114-AC6C-E93DE72CEF9A_1.olk15MsgAttach<\/code><\/pre>\n\n\n\n<p><strong>Pattern<\/strong>:&nbsp;<code>{GUID}.{extension}<\/code>&nbsp;<strong>Attachment Pattern<\/strong>:&nbsp;<code>{GUID}_{attachmentIndex}.{extension}<\/code><\/p>\n\n\n\n<h4 id=\"toc_31\">Method 3: SQLite Database Lookup 
(OLK15 Preferred)<\/h4>\n\n\n\n<p>The most reliable method for OLK15 is to query the&nbsp;<code>Outlook.sqlite<\/code>&nbsp;database:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>-- Find message source for a message\nSELECT MessageSource.PathComponent \nFROM Message\nJOIN MessageSource ON Message.pk = MessageSource.Message\nWHERE Message.PathComponent = 'GUID.olk15Message';\n\n-- Find attachments for a message\nSELECT Attachment.PathComponent\nFROM Message\nJOIN Attachment ON Message.pk = Attachment.Message\nWHERE Message.PathComponent = 'GUID.olk15Message';<\/code><\/pre>\n\n\n\n<h2 id=\"toc_32\">Association Pseudocode<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>FUNCTION findAssociatedFiles(filePath):\n    baseName = getBaseName(filePath)         \/\/ Remove extension\n    extension = getExtension(filePath)\n    directory = getParentDirectory(filePath)\n    rootDirectory = getDataRootDirectory(directory)\n    \n    \/\/ Determine version from extension\n    IF extension CONTAINS \"olk14\":\n        version = \"OLK14\"\n    ELSE IF extension CONTAINS \"olk15\":\n        version = \"OLK15\"\n    ELSE:\n        RETURN error(\"Unknown OLK version\")\n    \n    \/\/ Extract message identifier\n    IF version == \"OLK14\":\n        \/\/ Pattern: prefix_messageId or prefix_messageId_attachIndex\n        parts = split(baseName, \"_\")\n        messageId = parts&#91;0] + \"_\" + parts&#91;1]\n    ELSE:\n        \/\/ Pattern: GUID or GUID_attachIndex\n        IF baseName CONTAINS \"_\" AND isNumber(lastPart(baseName)):\n            messageId = baseName UP TO last \"_\"\n        ELSE:\n            messageId = baseName\n    \n    result = {\n        messageFile: NULL,\n        sourceFile: NULL,\n        attachments: &#91;]\n    }\n    \n    \/\/ Locate message file\n    messageDir = rootDirectory + \"\/Messages\"\n    IF version == \"OLK15\":\n        \/\/ Check subdirectories 0 through 255 (range is inclusive)\n        FOR subdir IN range(0, 255):\n            candidate = messageDir + \"\/\" + subdir + 
\"\/\" + messageId + \".olk15Message\"\n            IF fileExists(candidate):\n                result.messageFile = candidate\n                BREAK\n    ELSE:\n        candidate = messageDir + \"\/\" + messageId + \".olk14message\"\n        IF fileExists(candidate):\n            result.messageFile = candidate\n    \n    \/\/ Locate message source file\n    sourceDir = rootDirectory + \"\/Message Sources\"\n    IF version == \"OLK15\":\n        FOR subdir IN range(0, 255):\n            candidate = sourceDir + \"\/\" + subdir + \"\/\" + messageId + \".olk15MsgSource\"\n            IF fileExists(candidate):\n                result.sourceFile = candidate\n                BREAK\n    ELSE:\n        candidate = sourceDir + \"\/\" + messageId + \".olk14msgsource\"\n        IF fileExists(candidate):\n            result.sourceFile = candidate\n    \n    \/\/ Locate attachment files\n    attachDir = rootDirectory + \"\/Message Attachments\"\n    attachIndex = 1\n    WHILE TRUE:\n        IF version == \"OLK15\":\n            found = FALSE\n            FOR subdir IN range(0, 255):\n                candidate = attachDir + \"\/\" + subdir + \"\/\" + \n                           messageId + \"_\" + attachIndex + \".olk15MsgAttach\"\n                IF fileExists(candidate):\n                    result.attachments.APPEND(candidate)\n                    found = TRUE\n                    BREAK\n            IF NOT found:\n                BREAK\n        ELSE:\n            candidate = attachDir + \"\/\" + messageId + \"_\" + attachIndex + \".olk14msgattach\"\n            IF fileExists(candidate):\n                result.attachments.APPEND(candidate)\n            ELSE:\n                BREAK\n        attachIndex = attachIndex + 1\n    \n    RETURN result<\/code><\/pre>\n\n\n\n<h1 id=\"toc_33\">Parsing Pseudocode<\/h1>\n\n\n\n<h2 id=\"toc_34\">Reading a Message File<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>FUNCTION parseMessageFile(filePath):\n    data = readBinaryFile(filePath)\n 
   result = {}\n    offset = 0\n    \n    \/\/ Skip file header (size varies by version)\n    version = detectVersion(data)\n    IF version == \"OLK15\":\n        offset = 128    \/\/ Larger header\n    ELSE:\n        offset = 64     \/\/ Standard header\n    \n    \/\/ Parse tagged records\n    WHILE offset &lt; length(data) - 8:\n        tag = readUInt32LE(data, offset)\n        offset = offset + 4\n        \n        type = readUInt16LE(data, offset)\n        offset = offset + 2\n        \n        \/\/ Read value based on type\n        value, bytesRead = readPropertyValue(data, offset, type)\n        offset = offset + bytesRead\n        \n        \/\/ Map tag to field name\n        fieldName = tagToFieldName(tag)\n        IF fieldName != NULL:\n            result&#91;fieldName] = value\n    \n    RETURN result\n\nFUNCTION readPropertyValue(data, offset, type):\n    SWITCH type:\n        CASE 0x0002:    \/\/ Int16\n            RETURN readInt16LE(data, offset), 2\n        \n        CASE 0x0003:    \/\/ Int32\n            RETURN readInt32LE(data, offset), 4\n        \n        CASE 0x000B:    \/\/ Boolean\n            RETURN readUInt16LE(data, offset) != 0, 2\n        \n        CASE 0x0014:    \/\/ Int64\n            RETURN readInt64LE(data, offset), 8\n        \n        CASE 0x001E:    \/\/ ASCII String\n            length = readUInt32LE(data, offset)\n            stringData = readBytes(data, offset + 4, length)\n            RETURN decodeASCII(stringData), 4 + length\n        \n        CASE 0x001F:    \/\/ Unicode String\n            charCount = readUInt32LE(data, offset)\n            byteCount = charCount * 2\n            stringData = readBytes(data, offset + 4, byteCount)\n            RETURN decodeUTF16LE(stringData), 4 + byteCount\n        \n        CASE 0x0040:    \/\/ SysTime (FILETIME)\n            filetime = readUInt64LE(data, offset)\n            RETURN filetimeToDate(filetime), 8\n        \n        CASE 0x0102:    \/\/ Binary\n            length = 
readUInt32LE(data, offset)\n            RETURN readBytes(data, offset + 4, length), 4 + length\n        \n        DEFAULT:    \/\/ Unknown type: its size is not known, so later records may be misread\n            RETURN NULL, 0<\/code><\/pre>\n\n\n\n<h2 id=\"toc_35\">Reading a Message Source File<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>FUNCTION parseMessageSource(filePath):\n    data = readBinaryFile(filePath)\n    result = {\n        plainText: NULL,\n        html: NULL,\n        rtf: NULL\n    }\n    \n    \/\/ Try to find HTML content (UTF-16-LE encoded)\n    htmlMarker = encodeUTF16LE(\"&lt;html\")\n    htmlStart = findBytes(data, htmlMarker)\n    \n    IF htmlStart != -1:\n        \/\/ Find end of HTML\n        htmlEndMarker = encodeUTF16LE(\"&lt;\/html&gt;\")\n        htmlEnd = findBytes(data, htmlEndMarker, htmlStart)\n        \n        IF htmlEnd != -1:\n            htmlEnd = htmlEnd + length(htmlEndMarker)\n            htmlData = data&#91;htmlStart : htmlEnd]\n            result.html = decodeUTF16LE(htmlData)\n        ELSE:\n            \/\/ Read to end of file\n            htmlData = data&#91;htmlStart :]\n            result.html = decodeUTF16LE(htmlData)\n    ELSE:\n        \/\/ Try plain text extraction\n        \/\/ May be RFC822 format or plain UTF-8\/ASCII\n        textContent = tryDecodeAsText(data)\n        IF textContent != NULL:\n            IF looksLikeRFC822(textContent):\n                result = parseMIMEMessage(textContent)\n            ELSE:\n                result.plainText = textContent\n    \n    RETURN result<\/code><\/pre>\n\n\n\n<h2 id=\"toc_36\">Reading an Attachment File<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>FUNCTION parseAttachment(filePath):\n    data = readBinaryFile(filePath)\n    result = {\n        filename: NULL,\n        contentType: NULL,\n        disposition: NULL,\n        encoding: NULL,    \/\/ Content-Transfer-Encoding, set while parsing headers\n        data: NULL\n    }\n    \n    \/\/ Check for \"Attc\" signature\n    signature = readBytes(data, 0, 4)\n    IF signature != \"Attc\":\n        RETURN error(\"Invalid attachment file\")\n    \n    offset = 4\n    
\n    \/\/ Parse header attributes\n    WHILE offset &lt; length(data):\n        line = readLine(data, offset)\n        IF line == \"\" OR line == NULL:\n            \/\/ End of headers\n            offset = offset + 2    \/\/ Skip blank line\n            BREAK\n        \n        IF line CONTAINS \":\":\n            key, value = splitOnFirst(line, \":\")\n            key = trim(key)\n            value = trim(value)\n            \n            SWITCH lowercase(key):\n                CASE \"content-type\":\n                    result.contentType = value\n                CASE \"name\":\n                    result.filename = value\n                CASE \"filename\":\n                    result.filename = value\n                CASE \"content-disposition\":\n                    result.disposition = value\n                CASE \"content-transfer-encoding\":\n                    result.encoding = value\n        \n        offset = offset + length(line) + 2    \/\/ +2 for CRLF\n    \n    \/\/ Read attachment data\n    attachmentData = data&#91;offset :]\n    \n    IF result.encoding == \"base64\":\n        result.data = base64Decode(attachmentData)\n    ELSE:\n        result.data = attachmentData\n    \n    RETURN result<\/code><\/pre>\n\n\n\n<h1 id=\"toc_37\">Merging Associated Files<\/h1>\n\n\n\n<pre class=\"wp-block-code\"><code>FUNCTION reconstructEmail(messageFilePath):\n    \/\/ Find all associated files\n    associated = findAssociatedFiles(messageFilePath)\n    \n    email = {\n        headers: {},\n        body: {\n            plain: NULL,\n            html: NULL,\n            rtf: NULL\n        },\n        attachments: &#91;]\n    }\n    \n    \/\/ Parse message header file\n    IF associated.messageFile != NULL:\n        headerData = parseMessageFile(associated.messageFile)\n        email.headers = headerData\n    \n    \/\/ Parse message source (body content)\n    IF associated.sourceFile != NULL:\n        bodyData = parseMessageSource(associated.sourceFile)\n    
    email.body.plain = bodyData.plainText\n        email.body.html = bodyData.html\n        email.body.rtf = bodyData.rtf\n    \n    \/\/ Parse attachments\n    FOR attachPath IN associated.attachments:\n        attachData = parseAttachment(attachPath)\n        email.attachments.APPEND(attachData)\n    \n    RETURN email<\/code><\/pre>\n\n\n\n<h1 id=\"toc_38\">Folder\/Mailbox Enumeration<\/h1>\n\n\n\n<p>To process an entire mailbox:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>FUNCTION enumerateMailbox(profilePath, version):\n    messages = &#91;]\n    \n    IF version == \"OLK15\":\n        messagesDir = profilePath + \"\/Data\/Messages\"\n        \n        \/\/ Iterate through numbered subdirectories\n        FOR subdir IN range(0, 255):\n            subdirPath = messagesDir + \"\/\" + subdir\n            IF directoryExists(subdirPath):\n                FOR file IN listFiles(subdirPath):\n                    IF file ENDS WITH \".olk15Message\":\n                        messageInfo = {\n                            path: subdirPath + \"\/\" + file,\n                            id: removeExtension(file)\n                        }\n                        messages.APPEND(messageInfo)\n    ELSE:\n        messagesDir = profilePath + \"\/Data Records\/Messages\"\n        FOR file IN listFiles(messagesDir):\n            IF file ENDS WITH \".olk14message\":\n                messageInfo = {\n                    path: messagesDir + \"\/\" + file,\n                    id: removeExtension(file)\n                }\n                messages.APPEND(messageInfo)\n    \n    RETURN messages<\/code><\/pre>\n\n\n\n<h1 id=\"toc_39\">SQLite Database Schema (OLK15)<\/h1>\n\n\n\n<p>The&nbsp;<code>Outlook.sqlite<\/code>&nbsp;database provides an index of all cached items. 
Key tables include:<\/p>\n\n\n\n<p id=\"toc_40\"><strong>Message Table<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE TABLE Message (\n    pk INTEGER PRIMARY KEY,\n    PathComponent TEXT,           -- Filename (GUID.olk15Message)\n    RecordIdentifier TEXT,        -- Internal record ID\n    FolderPath TEXT,              -- Parent folder path\n    MessageClass TEXT,            -- \"IPM.Note\", etc.\n    Subject TEXT,\n    SenderName TEXT,\n    SenderEmailAddress TEXT,\n    DateReceived REAL,            -- Unix timestamp\n    DateSent REAL,\n    HasAttachments INTEGER,\n    IsRead INTEGER,\n    IsFlagged INTEGER,\n    Importance INTEGER,\n    ...\n);<\/code><\/pre>\n\n\n\n<p id=\"toc_41\"><strong>MessageSource Table<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE TABLE MessageSource (\n    pk INTEGER PRIMARY KEY,\n    Message INTEGER,              -- Foreign key to Message.pk\n    PathComponent TEXT,           -- Filename (GUID.olk15MsgSource)\n    ...\n);<\/code><\/pre>\n\n\n\n<p id=\"toc_42\"><strong>Attachment Table<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE TABLE Attachment (\n    pk INTEGER PRIMARY KEY,\n    Message INTEGER,              -- Foreign key to Message.pk\n    PathComponent TEXT,           -- Filename (GUID_N.olk15MsgAttach)\n    Filename TEXT,\n    ContentType TEXT,\n    FileSize INTEGER,\n    ...\n);<\/code><\/pre>\n\n\n\n<p id=\"toc_43\"><strong>Folder Table<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE TABLE Folder (\n    pk INTEGER PRIMARY KEY,\n    PathComponent TEXT,\n    DisplayName TEXT,\n    ParentFolder INTEGER,\n    FolderType INTEGER,\n    ...\n);<\/code><\/pre>\n\n\n\n<h1 id=\"toc_44\">Date\/Time Handling<\/h1>\n\n\n\n<p id=\"toc_45\"><strong>Windows FILETIME<\/strong><\/p>\n\n\n\n<p>Dates in the binary files (the SysTime property type) are stored as 64-bit Windows FILETIME values, which count 100-nanosecond intervals since January 1, 1601 UTC.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>FUNCTION 
filetimeToUnixTimestamp(filetime):\n    \/\/ FILETIME epoch: 1601-01-01\n    \/\/ Unix epoch: 1970-01-01\n    \/\/ Difference: 11644473600 seconds\n    \n    seconds = filetime \/ 10000000\n    unixTimestamp = seconds - 11644473600\n    RETURN unixTimestamp\n\nFUNCTION unixTimestampToFiletime(unixTimestamp):\n    seconds = unixTimestamp + 11644473600\n    filetime = seconds * 10000000\n    RETURN filetime<\/code><\/pre>\n\n\n\n<h1 id=\"toc_46\">Implementation Recommendations<\/h1>\n\n\n\n<p id=\"toc_47\"><strong>Opening a Single File<\/strong><\/p>\n\n\n\n<p>When a user attempts to open a single OLK file:<\/p>\n\n\n\n<ol><li><strong>Detect file type<\/strong>&nbsp;from extension<\/li><li><strong>Extract message identifier<\/strong>&nbsp;from filename<\/li><li><strong>Locate associated files<\/strong>&nbsp;using the association rules<\/li><li><strong>Offer to merge<\/strong>&nbsp;header + body + attachments<\/li><li><strong>Present unified view<\/strong>&nbsp;to user<\/li><\/ol>\n\n\n\n<p id=\"toc_48\"><strong>Opening a Folder Structure<\/strong><\/p>\n\n\n\n<p>When opening an entire mailbox:<\/p>\n\n\n\n<ol><li><strong>Locate profile directory<\/strong>&nbsp;based on Outlook version<\/li><li><strong>Check for SQLite database<\/strong>&nbsp;(OLK15) for efficient indexing<\/li><li><strong>Enumerate message files<\/strong>&nbsp;in Messages directory<\/li><li><strong>Build index<\/strong>&nbsp;of messages with metadata<\/li><li><strong>Lazy-load<\/strong>&nbsp;body content and attachments on demand<\/li><\/ol>\n\n\n\n<p id=\"toc_49\"><strong>Error Handling<\/strong><\/p>\n\n\n\n<ul><li>Handle missing associated files gracefully (show partial data)<\/li><li>Validate file signatures before parsing<\/li><li>Handle encoding variations (UTF-8, UTF-16-LE, ASCII)<\/li><li>Gracefully handle corrupted or truncated files<\/li><\/ul>\n\n\n\n<h1 id=\"toc_50\">References<\/h1>\n\n\n\n<ul><li><a href=\"https:\/\/github.com\/hshore29\/pyolk\">pyolk<\/a>: Python parser for Outlook OLK 
binary caches<\/li><li><a href=\"https:\/\/github.com\/glymphie\/UBF8T346G9Parser\">UBF8T346G9Parser<\/a>: Parser for Outlook 2016 Mac storage<\/li><li>MAPI Property Tags: Microsoft documentation on MAPI property identifiers<\/li><li>MS-OXMSG: Microsoft Office Outlook Message File Format specification<\/li><\/ul>\n\n\n\n<h1>The solution<\/h1>\n\n\n\n<p>The solution to the problem of opening .msg files is to use MailRaider &#8211; available either <a href=\"https:\/\/www.45rpmsoftware.com\/Software\/MailRaider\/page.html\">here<\/a> or on the Mac (or iOS) App Stores.<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The .msg format, used by Microsoft Outlook for Windows, is an odd duck. It feels at times almost as if Microsoft is trying to lock users into the Windows ecosystem by ensuring that the files they archive can only be used on the Windows operating system. But the .msg file format seems almost sane, at &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.45rpmsoftware.com\/blog\/?p=961\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;What has Microsoft been 
smoking?&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[28,5],"tags":[],"_links":{"self":[{"href":"https:\/\/www.45rpmsoftware.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/961"}],"collection":[{"href":"https:\/\/www.45rpmsoftware.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.45rpmsoftware.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.45rpmsoftware.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.45rpmsoftware.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=961"}],"version-history":[{"count":1,"href":"https:\/\/www.45rpmsoftware.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/961\/revisions"}],"predecessor-version":[{"id":962,"href":"https:\/\/www.45rpmsoftware.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/961\/revisions\/962"}],"wp:attachment":[{"href":"https:\/\/www.45rpmsoftware.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=961"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.45rpmsoftware.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=961"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.45rpmsoftware.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=961"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}