Crate mail_parser

§mail-parser

mail-parser is an e-mail parsing library written in Rust that fully conforms to the Internet Message Format standard (RFC 5322), the Multipurpose Internet Mail Extensions (MIME; RFC 2045 - 2049) as well as many other internet messaging RFCs.

It also supports decoding messages in 41 different character sets including obsolete formats such as UTF-7. All Unicode (UTF-*) and single-byte character sets are handled internally by the library while support for legacy multi-byte encodings of Chinese and Japanese languages such as BIG5 or ISO-2022-JP is provided by the optional dependency encoding_rs.
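To illustrate why single-byte character sets are cheap to handle without an external dependency, here is a minimal sketch of two such decoders. The function names `decode_latin1` and `decode_latin9` are hypothetical and are not part of the mail-parser API; the library's internal decoders may be structured quite differently.

```rust
/// Decodes ISO-8859-1 (Latin-1): each byte value is already the Unicode
/// code point of the same number, so no lookup table is needed.
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

/// Decodes ISO-8859-15 (Latin-9), which matches Latin-1 except for eight
/// byte values that were reassigned (for example 0xA4 became the euro sign).
fn decode_latin9(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|&b| match b {
            0xA4 => '\u{20AC}', // euro sign
            0xA6 => '\u{0160}', // S with caron
            0xA8 => '\u{0161}', // s with caron
            0xB4 => '\u{017D}', // Z with caron
            0xB8 => '\u{017E}', // z with caron
            0xBC => '\u{0152}', // OE ligature
            0xBD => '\u{0153}', // oe ligature
            0xBE => '\u{0178}', // Y with diaeresis
            other => char::from(other),
        })
        .collect()
}
```

Multi-byte encodings such as ISO-2022-JP need stateful decoding and large mapping tables, which is a plausible reason for delegating them to a dedicated crate.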

In general, this library abides by Postel's law, also known as the Robustness Principle, which states that an implementation should be conservative in its sending behavior and liberal in its receiving behavior. This means that mail-parser makes a best effort to parse non-conformant e-mail messages as long as they do not deviate too much from the standard.
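As a rough illustration of what "liberal in its receiving behavior" can mean in practice, the sketch below splits a header line while tolerating a few common deviations. The helper `split_header_lenient` is hypothetical and only demonstrates the idea; it is not how mail-parser itself is implemented.

```rust
/// Splits one header line into (name, value), tolerating common deviations:
/// a bare LF (or no line ending at all) instead of CRLF, and stray
/// whitespace around the colon. Returns None if there is no usable name.
fn split_header_lenient(line: &str) -> Option<(&str, &str)> {
    // Accept CRLF, bare LF, or a missing terminator.
    let line = line.trim_end_matches(|c: char| c == '\r' || c == '\n');
    // The first colon separates the field name from the value.
    let (name, value) = line.split_once(':')?;
    let name = name.trim();
    if name.is_empty() {
        return None;
    }
    Some((name, value.trim()))
}
```

A strict parser would reject `Subject :Hello` because of the space before the colon; the lenient version accepts it and still recovers the intended name and value.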

Unlike other e-mail parsing libraries that return nested representations of the different MIME parts in a message, this library conforms to RFC 8621, Section 4.1.4 and provides a more human-friendly representation of the message contents consisting of just text body parts, HTML body parts and attachments. Additionally, inline HTML body parts are converted to plain text (and vice versa) automatically when the alternative version is missing.
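The difference between a nested and a flat representation can be sketched as follows. This is a deliberately simplified illustration: the `Part` and `FlatMessage` types are hypothetical, and the traversal ignores the distinctions that RFC 8621 makes between multipart/alternative, multipart/related and inline versus attached parts.

```rust
/// A hypothetical, minimal MIME tree used only for this illustration.
enum Part {
    Text(String),
    Html(String),
    Binary(Vec<u8>),
    Multipart(Vec<Part>),
}

/// A flat view of a message: three lists instead of a tree.
#[derive(Default, Debug, PartialEq)]
struct FlatMessage {
    text_body: Vec<String>,
    html_body: Vec<String>,
    attachments: Vec<Vec<u8>>,
}

/// Walks the tree depth-first and sorts every leaf into one of the lists.
fn flatten(part: &Part, out: &mut FlatMessage) {
    match part {
        Part::Text(text) => out.text_body.push(text.clone()),
        Part::Html(html) => out.html_body.push(html.clone()),
        Part::Binary(data) => out.attachments.push(data.clone()),
        Part::Multipart(children) => {
            for child in children {
                flatten(child, out);
            }
        }
    }
}
```

With a flat view, a caller that only wants "the text of the message" reads the first entry of `text_body` instead of writing its own recursive traversal.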

Performance and memory safety were two important factors while designing mail-parser:

  • Zero-copy: Practically all strings returned by this library are Cow<str> references to the input raw message.
  • High performance Base64 decoding based on Chromium's decoder (the fastest non-SIMD decoder).
  • Fast parsing of message header fields, character set names and HTML entities using perfect hashing.
  • Written in 100% safe Rust with no external dependencies.
  • Every function in the library has been fuzzed and thoroughly tested with MIRI.
  • Battle-tested with millions of real-world e-mail messages dating from 1995 until today.
  • Used in production environments worldwide by Stalwart Mail Server.
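The zero-copy point in the list above relies on `std::borrow::Cow`. The sketch below shows the general pattern under simplified assumptions: the function `unfold` is hypothetical, and it treats any CR or LF in a header value as folding to be removed, which is looser than the RFC 5322 rule.

```rust
use std::borrow::Cow;

/// Returns a header value with line folding removed.
/// If the value contains no CR or LF, the input slice is returned as-is
/// (no allocation); otherwise a cleaned-up copy is allocated.
fn unfold(raw: &str) -> Cow<'_, str> {
    if raw.contains(|c: char| c == '\r' || c == '\n') {
        // Slow path: build an owned string without the line breaks.
        Cow::Owned(raw.chars().filter(|c| *c != '\r' && *c != '\n').collect())
    } else {
        // Fast path: borrow directly from the raw message.
        Cow::Borrowed(raw)
    }
}
```

Callers can treat both variants as a `&str` through deref, so the allocation only happens for the values that actually need rewriting.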

§Usage Example

    use mail_parser::*;

    let input = br#"From: Art Vandelay <art@vandelay.com> (Vandelay Industries)
To: "Colleagues": "James Smythe" <james@vandelay.com>; Friends:
    jane@example.com, =?UTF-8?Q?John_Sm=C3=AEth?= <john@example.com>;
Date: Sat, 20 Nov 2021 14:22:01 -0800
Subject: Why not both importing AND exporting? =?utf-8?b?4pi6?=
Content-Type: multipart/mixed; boundary="festivus";

--festivus
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: base64

PGh0bWw+PHA+SSB3YXMgdGhpbmtpbmcgYWJvdXQgcXVpdHRpbmcgdGhlICZsZHF1bztle
HBvcnRpbmcmcmRxdW87IHRvIGZvY3VzIGp1c3Qgb24gdGhlICZsZHF1bztpbXBvcnRpbm
cmcmRxdW87LDwvcD48cD5idXQgdGhlbiBJIHRob3VnaHQsIHdoeSBub3QgZG8gYm90aD8
gJiN4MjYzQTs8L3A+PC9odG1sPg==
--festivus
Content-Type: message/rfc822

From: "Cosmo Kramer" <kramer@kramerica.com>
Subject: Exporting my book about coffee tables
Content-Type: multipart/mixed; boundary="giddyup";

--giddyup
Content-Type: text/plain; charset="utf-16"
Content-Transfer-Encoding: quoted-printable

=FF=FE=0C!5=D8"=DD5=D8)=DD5=D8-=DD =005=D8*=DD5=D8"=DD =005=D8"=
=DD5=D85=DD5=D8-=DD5=D8,=DD5=D8/=DD5=D81=DD =005=D8*=DD5=D86=DD =
=005=D8=1F=DD5=D8,=DD5=D8,=DD5=D8(=DD =005=D8-=DD5=D8)=DD5=D8"=
=DD5=D8=1E=DD5=D80=DD5=D8"=DD!=00
--giddyup
Content-Type: image/gif; name*1="about "; name*0="Book ";
              name*2*=utf-8''%e2%98%95 tables.gif
Content-Transfer-Encoding: Base64
Content-Disposition: attachment

R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
--giddyup--
--festivus--
"#;

    let message = MessageParser::default().parse(input).unwrap();

    // Parses addresses (including comments), lists and groups
    assert_eq!(
        message.from().unwrap().first().unwrap(),
        &Addr::new(
            "Art Vandelay (Vandelay Industries)".into(),
            "art@vandelay.com"
        )
    );

    assert_eq!(
        message.to().unwrap().as_group().unwrap(),
        &[
            Group::new(
                "Colleagues",
                vec![Addr::new("James Smythe".into(), "james@vandelay.com")]
            ),
            Group::new(
                "Friends",
                vec![
                    Addr::new(None, "jane@example.com"),
                    Addr::new("John Smîth".into(), "john@example.com"),
                ]
            )
        ]
    );

    assert_eq!(
        message.date().unwrap().to_rfc3339(),
        "2021-11-20T14:22:01-08:00"
    );

    // RFC2047 support for encoded text in message headers
    assert_eq!(
        message.subject().unwrap(),
        "Why not both importing AND exporting? ☺"
    );

    // HTML and text body parts are returned conforming to RFC8621, Section 4.1.4
    assert_eq!(
        message.body_html(0).unwrap(),
        concat!(
            "<html><p>I was thinking about quitting the &ldquo;exporting&rdquo; to ",
            "focus just on the &ldquo;importing&rdquo;,</p><p>but then I thought,",
            " why not do both? &#x263A;</p></html>"
        )
    );

    // HTML parts are converted to plain text (and vice versa) when missing
    assert_eq!(
        message.body_text(0).unwrap(),
        concat!(
            "I was thinking about quitting the “exporting” to focus just on the",
            " “importing”,\nbut then I thought, why not do both? ☺\n"
        )
    );

    // Supports nested messages as well as multipart/digest
    let nested_message = message
        .attachment(0)
        .unwrap()
        .message()
        .unwrap();

    assert_eq!(
        nested_message.subject().unwrap(),
        "Exporting my book about coffee tables"
    );

    // Handles UTF-* as well as many legacy encodings
    assert_eq!(
        nested_message.body_text(0).unwrap(),
        "ℌ𝔢𝔩𝔭 𝔪𝔢 𝔢𝔵𝔭𝔬𝔯𝔱 𝔪𝔶 𝔟𝔬𝔬𝔨 𝔭𝔩𝔢𝔞𝔰𝔢!"
    );
    assert_eq!(
        nested_message.body_html(0).unwrap(),
        "<html><body>ℌ𝔢𝔩𝔭 𝔪𝔢 𝔢𝔵𝔭𝔬𝔯𝔱 𝔪𝔶 𝔟𝔬𝔬𝔨 𝔭𝔩𝔢𝔞𝔰𝔢!</body></html>"
    );

    let nested_attachment = nested_message.attachment(0).unwrap();

    assert_eq!(nested_attachment.len(), 42);

    // Full RFC2231 support for continuations and character sets
    assert_eq!(
        nested_attachment.attachment_name().unwrap(),
        "Book about ☕ tables.gif"
    );

    // Integrates with Serde
    println!("{}", serde_json::to_string_pretty(&message).unwrap());

More examples are available in the examples directory. Please note that this library does not support building e-mail messages; that functionality is provided separately by the mail-builder crate.
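For readers curious about what decoding a header such as `=?UTF-8?Q?John_Sm=C3=AEth?=` involves, here is a minimal standalone sketch of RFC 2047 "Q" decoding. It is an illustration only: the functions `decode_q_utf8` and `decode_encoded_word` are hypothetical, handle only UTF-8 with the Q encoding, and do not reflect how mail-parser implements this internally.

```rust
/// Decodes the payload of a "Q"-encoded word whose charset is UTF-8.
/// `_` stands for a space, `=XX` is a hex-escaped byte, and every other
/// byte is taken literally. Returns None on a malformed escape or if the
/// decoded bytes are not valid UTF-8.
fn decode_q_utf8(payload: &str) -> Option<String> {
    let bytes = payload.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'_' => {
                out.push(b' ');
                i += 1;
            }
            b'=' => {
                // Exactly two hex digits must follow the '='.
                let hex = payload.get(i + 1..i + 3)?;
                if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
                    return None;
                }
                out.push(u8::from_str_radix(hex, 16).ok()?);
                i += 3;
            }
            other => {
                out.push(other);
                i += 1;
            }
        }
    }
    String::from_utf8(out).ok()
}

/// Decodes a single encoded word of the form `=?charset?encoding?payload?=`.
/// Only the combination UTF-8 + Q is handled in this sketch.
fn decode_encoded_word(word: &str) -> Option<String> {
    let inner = word.strip_prefix("=?")?.strip_suffix("?=")?;
    let mut fields = inner.splitn(3, '?');
    let charset = fields.next()?;
    let encoding = fields.next()?;
    let payload = fields.next()?;
    if charset.eq_ignore_ascii_case("utf-8") && encoding.eq_ignore_ascii_case("q") {
        decode_q_utf8(payload)
    } else {
        None
    }
}
```

Calling `decode_encoded_word("=?UTF-8?Q?John_Sm=C3=AEth?=")` yields the display name used in the `To` header of the usage example. A complete decoder would also handle the "B" (Base64) encoding, other character sets, and adjacent encoded words.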

§Testing, Fuzzing & Benchmarking

To run the testsuite:

 $ cargo test --all-features

or, to run the testsuite with MIRI:

 $ cargo +nightly miri test --all-features

To fuzz the library with cargo-fuzz:

 $ cargo +nightly fuzz run mail_parser

and, to run the benchmarks:

 $ cargo +nightly bench --all-features

§Conformed RFCs

§Supported Character Sets

  • UTF-8
  • UTF-16, UTF-16BE, UTF-16LE
  • UTF-7
  • US-ASCII
  • ISO-8859-1
  • ISO-8859-2
  • ISO-8859-3
  • ISO-8859-4
  • ISO-8859-5
  • ISO-8859-6
  • ISO-8859-7
  • ISO-8859-8
  • ISO-8859-9
  • ISO-8859-10
  • ISO-8859-13
  • ISO-8859-14
  • ISO-8859-15
  • ISO-8859-16
  • CP1250
  • CP1251
  • CP1252
  • CP1253
  • CP1254
  • CP1255
  • CP1256
  • CP1257
  • CP1258
  • KOI8-R
  • KOI8-U
  • MACINTOSH
  • IBM850
  • TIS-620

Supported character sets via the optional dependency encoding_rs:

  • SHIFT_JIS
  • BIG5
  • EUC-JP
  • EUC-KR
  • GB18030
  • GBK
  • ISO-2022-JP
  • WINDOWS-874
  • IBM-866

§License

Licensed under either of

  • Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
  • MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

Copyright (C) 2020, Stalwart Labs LLC

Modules§

core
decoders
mailbox
parsers

Structs§

Addr
An RFC5322 or RFC2369 internet address.
Attribute
ContentType
An RFC2045 Content-Type or RFC2183 Content-Disposition MIME header field.
DateTime
An RFC5322 datetime.
Group
An RFC5322 address group.
Header
A message header.
Message
An RFC5322/RFC822 message.
MessageParser
RFC5322/RFC822 message parser.
MessagePart
MIME Message Part
Received

Enums§

Address
Encoding
MIME Part encoding type
Greeting
HeaderForm
Header form
HeaderName
A header field
HeaderValue
Parsed header value.
Host
PartType
A text, binary or nested e-mail MIME message part.
Protocol
TlsVersion

Traits§

GetHeader
MimeHeaders
MIME Header field access trait

Type Aliases§

MessagePartId
Unique ID representing a MIME part within a message.