University of Iowa College of Liberal Arts & Sciences


Mathematical Sciences Computer Support Group

How to Hack MIMEDefang to Filter Backscatter


Overview

The general idea is that a header containing a hash keyed with a secret string is added to all mail going out from the server. Bounce messages are then checked for that header. Bounce messages that do not contain a valid header are identified as backscatter.

This system is not foolproof. First, there has to be some way to identify a bounce message. The envelope From address and the Subject: header can be used, but bounced email carries many, many different subjects, in many different languages. Second, for this method to work, the bounce message must contain the headers from the original message, and bounce messages from autoresponders, for example, typically do not.

So, this technique will not catch all backscatter, but it can catch most of it. I don't have any hard statistics, but my gut feeling is that my implementation catches between 70% and 90% of all backscatter.
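The round trip described above can be sketched in a few lines of standalone Perl. This is only an illustration of the idea, not the filter code itself: the header name, the secret, and the helper names make_tag and classify_bounce are all made up for the example.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_base64);

# Hypothetical values, for illustration only
my $secret      = 'Some unguessable text unique to your shop';
my $header_name = 'X-Example-Backscatter-Tag';

# Hash the sender address together with the secret
sub make_tag {
    my ($sender) = @_;
    return md5_base64($sender . $secret);
}

# Inbound bounce: decide from the tag found in the quoted original headers
sub classify_bounce {
    my ($orig_sender, $found_tag) = @_;
    return 'backscatter' unless defined $found_tag;                # no tag at all
    return 'backscatter' if $found_tag ne make_tag($orig_sender);  # tag is wrong
    return 'genuine';
}

# Outbound: this is the header that would be added to the message
my $sender = 'alice@example.com';
my $tag    = make_tag($sender);
print "$header_name: $tag\n";

print classify_bounce($sender, $tag),    "\n";   # genuine
print classify_bounce($sender, undef),   "\n";   # backscatter
print classify_bounce($sender, 'bogus'), "\n";   # backscatter
```

Because the secret never leaves the server, someone forging mail from alice@example.com cannot produce a matching tag, so bounces of forged mail fall into the second or third case.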

My solution is implemented with MIMEDefang, mainly because I was already using MIMEDefang and I had some experience with it. I made all the hacks in mimedefang-filter. I run SpamAssassin from MIMEDefang, so I decided to mark backscatter in a way that backscatter messages could be filtered separately or just filtered along with other marked spam.

The Details

Global Variables

First of all, I created several global variables at the top of mimedefang-filter to hold data used in the code. (I use the convention of prepending _MDLOCAL_ to any new variables and subroutines I create in mimedefang-filter, so that they're easily found in a text search.)

# Our backscatter spam score
$_MDLOCAL_Backscatter_Score = "06.000";

# Our header to filter backscatter
$_MDLOCAL_Header = "X-some-identifiable-header-name";

# Secret string used for our header hash value
$_MDLOCAL_Secret = "Some unguessable text unique to your shop";

# Envelope From pattern to detect backscatter
$_MDLOCAL_bs_sender_pattern = ".*(administra(d|t)or|m(|ailer-)daemon|mail (delivery (|sub)system)|postmaster).*\@.*";

# Subject pattern to detect backscatter
$_MDLOCAL_bs_subject_pattern = "(automatically rejected mail|.*benachrichtung|delivery (|delay(|ed) |fail(ed|ure) |status )(|noti(ce|fication))|failed mail|failure noti(ce|fication)|.*gb2312.q..d3.ca.bc.fe.cd.b6.b5.dd.b3.ac.ca.b1.b4.ed.ce.f3|.+iso-(2022-jp.+gyrcjwehpcvrjsglase8rexdthsoqg|8859-1.+no(n_remis|tifica(c|t)))|.*mail could not be delivered|.*(mail|message) (|delivery )(delay(|ed)|fail(ed|ure)|rejected)|notifica(ci.+de.+estado.+de.+entrega| sullo stato del recapito|ci.+unicode-1-1-utf-7.+apm)|.*(returned|undeliver(able|ed)) mail|.+unicode-1-1-utf-7.+(ku1p4xk2yuu|tycqenk2yag|uloqxnlayuu)qgnfl|.+utf-8.+(wiadomo.+9b.+87_od.+systemu pocztowego|zmfpbhvyzsbub3rpy2xvvijpljnor6.+lm57lvlnkv6hvvik|zpr.+a1vu_nelze_doru|zur.+bcckgeschickte_mail)|warning: could not send message).*";

I determined a good value of $_MDLOCAL_bs_subject_pattern by looking at a representative sample of backscatter that comes into our server.
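When tuning these patterns it helps to run them against sample values outside of MIMEDefang. The sketch below uses the sender pattern as given and a deliberately abbreviated subject pattern (only a few of the English alternatives), and combines them with the same logic the filter uses: the sender must be empty, "<>", or bounce-like, and the subject must be bounce-like. The helper name looks_like_bounce and the sample values are assumptions for the example.

```perl
use strict;
use warnings;

# Envelope From pattern, as listed above
my $sender_pattern = qr/.*(administra(d|t)or|m(|ailer-)daemon|mail (delivery (|sub)system)|postmaster).*\@.*/i;

# Abbreviated Subject pattern: a small subset of the full list
my $subject_pattern = qr/(delivery (|delay(|ed) |fail(ed|ure) |status )(|noti(ce|fication))|failure noti(ce|fication)|.*(mail|message) (|delivery )(delay(|ed)|fail(ed|ure)|rejected)|.*(returned|undeliver(able|ed)) mail).*/i;

# Returns 1 if the sender/subject pair looks like a bounce, else 0
sub looks_like_bounce {
    my ($sender, $subject) = @_;
    my $sender_ok = !$sender
                 || $sender eq '<>'
                 || $sender =~ /\A(?:$sender_pattern)\z/;
    return 0 unless $sender_ok;
    return $subject =~ /\A(?:$subject_pattern)\z/ ? 1 : 0;
}

my @samples = (
    [ '<>',                               'Delivery Status Notification (Failure)' ],
    [ '<MAILER-DAEMON@mail.example.net>', 'Undeliverable mail: hello'              ],
    [ '<alice@example.com>',              'Delivery failure'                       ],
    [ '<>',                               'Lunch on Friday?'                       ],
);
for my $s (@samples) {
    printf "%d  %s  %s\n", looks_like_bounce(@$s), @$s;
}
```

The first two samples are classified as bounces; the third fails on the sender and the fourth on the subject.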

New Subroutines

I wrote some new subroutines that I would need:

_MDLOCAL_trusted_relay is a locally-defined subroutine that tests the incoming relay's IP address and hostname to see whether the relay is trusted (our mail server or another server within the larger organization). It's basically just a big if statement, which shouldn't be difficult to implement.
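As a rough idea of what such a subroutine might look like, here is a minimal sketch. Every address and hostname in it is a placeholder from the documentation ranges, not a value from the real filter; a site would substitute its own servers.

```perl
use strict;
use warnings;

# Hypothetical sketch: the addresses and the example.edu domain below are
# placeholders, not values from the original filter.
sub _MDLOCAL_trusted_relay {
    my ($relay_addr, $relay_hostname) = @_;
    $relay_addr     = '' unless defined $relay_addr;
    $relay_hostname = '' unless defined $relay_hostname;

    # The mail server itself
    return 1 if $relay_addr eq '127.0.0.1';
    # Other mail servers in the larger organization, by address
    return 1 if $relay_addr =~ /^192\.0\.2\.(?:10|11)$/;
    # ... or by hostname
    return 1 if $relay_hostname =~ /^mx\d*\.example\.edu$/i;
    # Anything else is untrusted
    return 0;
}

print _MDLOCAL_trusted_relay('192.0.2.10',  'mx1.example.edu'), "\n";   # 1
print _MDLOCAL_trusted_relay('203.0.113.5', 'mail.example.net'), "\n";  # 0
```

Matching on the IP address is generally the safer test, since the hostname comes from reverse DNS, which is controlled by whoever owns the address block.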

_MDLOCAL_is_outbound is a subroutine that uses the relay's IP address and hostname and the sendmail macros to determine whether the user has authenticated or whether the message is from a locally managed machine.
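A minimal sketch of such a test follows, assuming the same calling convention used later in filter_end (relay address, relay hostname, and a reference to the sendmail macros hash). The network and domain shown for "locally managed machines" are placeholders.

```perl
use strict;
use warnings;

# Hypothetical sketch: 198.51.100.x and example.edu stand in for the
# machines a site actually manages.
sub _MDLOCAL_is_outbound {
    my ($relay_addr, $relay_hostname, $macros) = @_;
    $relay_addr     = '' unless defined $relay_addr;
    $relay_hostname = '' unless defined $relay_hostname;

    # Authenticated submission: sendmail defines the auth_type macro
    return 1 if defined $macros->{'auth_type'};
    # Mail generated on the server itself
    return 1 if $relay_addr eq '127.0.0.1';
    # Locally managed machines, by address or hostname
    return 1 if $relay_addr =~ /^198\.51\.100\.\d{1,3}$/;
    return 1 if $relay_hostname =~ /\.lab\.example\.edu$/i;
    return 0;
}

print _MDLOCAL_is_outbound('203.0.113.9', 'dsl.example.net', { auth_type => 'PLAIN' }), "\n";  # 1
print _MDLOCAL_is_outbound('203.0.113.9', 'dsl.example.net', {}), "\n";                       # 0
```

Note that the auth_type macro is only visible to the filter if sendmail is configured to pass it to the milter.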

_MDLOCAL_backscatter_hash extracts the bare address from the sender string (stripping the angle brackets and anything else around it), appends the secret string, and returns a Base64-encoded MD5 hash of the result:

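use Digest::MD5 qw(md5_base64);

sub _MDLOCAL_backscatter_hash {
  my($sender_addr_str) = @_;

  # Extract the bare address.  The pattern is deliberately not anchored
  # with a leading ".*": a greedy ".*" would swallow all but the last
  # character of the local part.
  my($sender_addr)
    = ($sender_addr_str =~ /([A-Za-z0-9_\-\.]+\@[A-Za-z0-9_\-\.]+)/);
  # Fall back to an empty string if no address was found
  $sender_addr = '' unless defined($sender_addr);
  return md5_base64($sender_addr . $_MDLOCAL_Secret);
}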
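One detail worth checking when extracting an address with a regular expression is greediness. A pattern that begins with a greedy ".*" lets that ".*" consume as much as it can, so only the last character of the local part is left for the capture group. The standalone sketch below compares the two forms; the helper names and the secret are made up for the example.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_base64);

my $addr_chars = qr/[A-Za-z0-9_.\-]+/;
my $secret     = 'example secret';   # placeholder

# Anchored, with a leading greedy ".*"
sub extract_greedy {
    my ($s) = @_;
    my ($addr) = $s =~ /^.*($addr_chars\@$addr_chars)/;
    return $addr;
}

# Unanchored: the match starts at the leftmost possible position
sub extract_leftmost {
    my ($s) = @_;
    my ($addr) = $s =~ /($addr_chars\@$addr_chars)/;
    return $addr;
}

for my $from ('<alice@example.com>', '<eve@example.com>') {
    printf "%-22s greedy=%-16s leftmost=%s\n",
        $from, extract_greedy($from), extract_leftmost($from);
}

# With the greedy form, two different senders in the same domain whose
# local parts end in the same letter produce the same hash.
my $same = md5_base64(extract_greedy('<alice@example.com>') . $secret)
        eq md5_base64(extract_greedy('<eve@example.com>')   . $secret);
print "same hash under greedy extraction: ", ($same ? "yes" : "no"), "\n";
```

Both senders reduce to e@example.com under the greedy pattern, while the unanchored pattern keeps the full addresses, so the unanchored form binds the hash to the whole address.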

_MDLOCAL_is_backscatter checks the body of the message for my special header and, if found, uses the sender address from the original message to determine whether the header is genuine:

sub _MDLOCAL_is_backscatter {
  # Initialize every variable explicitly; "my(...) = 0" sets only the first
  my($next_is_from, $next_is_bstag) = (0, 0);
  my($line, $original_from, $original_hash, $test_hash) = ('', '', '', '');

  # Do the Sender and Subject headers look like a bounce?
  if (   (   $Sender
          && ($Sender ne "<>")
          && (! ($Sender =~ /^$_MDLOCAL_bs_sender_pattern$/i)))
      || (! ($Subject =~ /^$_MDLOCAL_bs_subject_pattern$/i))) {
    # The message does not look like a bounce
    return 0;
  }
  # If we can't open the message for some reason, do no harm
  if (! (open(MSG, "<./INPUTMSG"))) {
    return 0;
  }
  # Scan the message for our custom header and for the sender
  # Note that MIMEDefang parses the message such that each header is on two
  # lines
  while (($line = <MSG>) && (! $original_hash)) {
    if ($line =~ /^From:/) {
      $next_is_from = 1;
    }
    elsif ($line =~ /^$_MDLOCAL_Header:/) {
      $next_is_bstag = 1;
    }
    elsif ($next_is_from) {
      $original_from = $line;
      $next_is_from = 0;
    }
    elsif ($next_is_bstag) {
      $original_hash = $line;
      $next_is_bstag = 0;
    }
  }
  close(MSG);
  # If we haven't found the custom header, it's backscatter
  if (! $original_hash) {
    return 1;
  }
  # Check the hash
  # chomp removes only a trailing newline; chop would remove any last character
  chomp($original_hash);
  chomp($original_from);
  $test_hash = _MDLOCAL_backscatter_hash($original_from);
  # If the hashes don't match, it's backscatter
  if ($test_hash ne $original_hash) {
    return 1;
  }
  # Default to "not backscatter"
  return 0;
}
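The scanning loop can be exercised outside MIMEDefang by reading from an in-memory string instead of ./INPUTMSG. The sketch below is a simplified variant of the loop, not the subroutine itself: the header name, the sample text (laid out with each header name and value on separate lines, as the comment in the loop describes), and the helper name scan_for_tag are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical header name and sample text
my $header = 'X-Example-Backscatter-Tag';
my $sample = join "\n",
    'Subject:', 'Undeliverable mail',
    'From:',    '<alice@example.com>',
    "$header:", 'q1w2e3r4t5y6u7i8o9p0aa',
    '';

# Returns (original From value, tag value); either may be undef
sub scan_for_tag {
    my ($text) = @_;
    my ($next_is_from, $next_is_tag) = (0, 0);
    my ($from, $tag);

    # Read the string line by line through an in-memory filehandle
    open(my $fh, '<', \$text) or return;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        if    ($line =~ /^From:/)        { $next_is_from = 1 }
        elsif ($line =~ /^\Q$header\E:/) { $next_is_tag  = 1 }
        elsif ($next_is_from)            { $from = $line; $next_is_from = 0 }
        elsif ($next_is_tag)             { $tag  = $line; $next_is_tag  = 0; last }
    }
    close($fh);
    return ($from, $tag);
}

my ($from, $tag) = scan_for_tag($sample);
print "from=$from tag=$tag\n";
```

A message without the tag header yields an undefined tag, which corresponds to the "haven't found the custom header" branch that classifies the bounce as backscatter.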

Trusted Mail Pass-Through

I removed code from filter_recipient that returned ACCEPT_AND_NO_MORE_FILTERING for trusted mail: outbound mail from authenticated users or local relays (our mail server or machines we manage), and mail from trusted relays (mail servers within the larger organization). My method requires adding a special header to all outbound mail and checking the body of incoming mail for the headers from the original message. But MIMEDefang calls filter_relay, filter_helo and filter_recipient before the message body has been received, so they cannot be used in my implementation, and furthermore, they must not accept mail and end filtering.

My hack to filter_recipient meant that all email would now pass through the entire MIMEDefang filter, so I had to make sure that trusted mail would not be subjected to MIMEDefang filtering. I added the following if statement to the beginning of the body of the filter_begin subroutine:

    if (   defined($SendmailMacros{'auth_type'})
        || _MDLOCAL_trusted_relay($RelayAddr, $RelayHostname)) {
      #md_syslog('warning', "filter_begin accepting from $Sender, $RelayAddr, $RelayHostname");
      # Note:  Cannot call action_accept from inside filter_begin
      return;
    }

And I added this if statement to the beginning of filter and filter_multipart:

    if (   defined($SendmailMacros{'auth_type'})
        || _MDLOCAL_trusted_relay($RelayAddr, $RelayHostname)) {
      #md_syslog('warning', "filter_multipart accepting from $Sender, $RelayAddr, $RelayHostname");
      return action_accept();
    }
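Since the same condition is repeated in several callbacks, one option is to wrap it in a small helper so the copies cannot drift apart. The sketch below is only a suggestion: _MDLOCAL_is_trusted_mail is a hypothetical name, and the stand-in globals and the stub for _MDLOCAL_trusted_relay exist only so the example runs on its own (inside mimedefang-filter they are already provided).

```perl
use strict;
use warnings;

# Stand-ins for the MIMEDefang globals; not needed inside mimedefang-filter
our (%SendmailMacros, $RelayAddr, $RelayHostname);

# Stub with a placeholder address; the real subroutine is site-specific
sub _MDLOCAL_trusted_relay {
    my ($addr, $hostname) = @_;
    return (defined $addr && $addr eq '192.0.2.10') ? 1 : 0;
}

# Hypothetical helper wrapping the repeated condition
sub _MDLOCAL_is_trusted_mail {
    return 1 if defined($SendmailMacros{'auth_type'});
    return _MDLOCAL_trusted_relay($RelayAddr, $RelayHostname) ? 1 : 0;
}

# Example: an authenticated sender on an untrusted relay
%SendmailMacros = (auth_type => 'PLAIN');
($RelayAddr, $RelayHostname) = ('203.0.113.9', 'dsl.example.net');
print _MDLOCAL_is_trusted_mail(), "\n";   # 1
```

Each callback would then begin with a single test of _MDLOCAL_is_trusted_mail() followed by the return that is appropriate for that callback.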

Tagging Outgoing Messages

I coded the guts of my implementation inside filter_end. First, after the return if message_rejected(); statement, I inserted code to check for outbound messages and add the special header:

    if (_MDLOCAL_is_outbound($RelayAddr, $RelayHostname, \%SendmailMacros)) {
      my($header_hash) = _MDLOCAL_backscatter_hash($Sender);
      md_syslog('warning', "Adding header $_MDLOCAL_Header: $Sender hashed to $header_hash, $MessageID");
      action_delete_all_headers($_MDLOCAL_Header);
      action_add_header($_MDLOCAL_Header, $header_hash);
      # Note:  Cannot call action_accept from inside filter_end
      return;
    }

Trusted Mail Pass-Through, Again

Next, immediately after the code that adds my special tag, I inserted an if statement to end filtering if the message is trusted:

    if (   defined($SendmailMacros{'auth_type'})
        || _MDLOCAL_trusted_relay($RelayAddr, $RelayHostname)) {
      #md_syslog('warning', "filter_end accepting from $Sender, $RelayAddr, $RelayHostname, $MessageID");
      # Note:  Cannot call action_accept from inside filter_end 
      return;
    }

Checking Headers

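Finally, immediately following the trusted mail pass-through, I added another if statement to mark backscatter. Note that $_MDLOCAL_spam_flag is not among the globals listed above and must be defined elsewhere in mimedefang-filter (presumably it holds the Subject: prefix used to mark spam):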

    if (_MDLOCAL_is_backscatter()) {
      md_syslog('warning', "Backscatter:  $Sender, $Subject");
      action_delete_all_headers("Subject");
      action_add_header("Subject",
                          $_MDLOCAL_spam_flag
                        . $_MDLOCAL_Backscatter_Score
                        . ' (backscatter) '
                        . $Subject);
      md_graphdefang_log('backscatter', "6.00", $RelayAddr);
      md_graphdefang_log('mail_in');
      return;
    }
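To make the effect on the Subject: header concrete, here is a small standalone sketch of the string that gets built. The value of the spam flag is an assumption ("[SPAM] " is just a placeholder), as is the helper name mark_subject; the score is the one defined in the globals.

```perl
use strict;
use warnings;

my $spam_flag = '[SPAM] ';   # hypothetical prefix; use whatever marks spam locally
my $score     = '06.000';    # the backscatter score from the globals

# Build the rewritten subject the same way the filter_end code does
sub mark_subject {
    my ($subject) = @_;
    return $spam_flag . $score . ' (backscatter) ' . $subject;
}

my $marked = mark_subject('Undeliverable mail');
print "$marked\n";
# prints "[SPAM] 06.000 (backscatter) Undeliverable mail"

# A downstream filter can single out backscatter, or treat it like other spam
print "backscatter\n" if $marked =~ /\(backscatter\)/;
print "spam\n"        if index($marked, $spam_flag) == 0;
```

Keeping both the generic flag and the "(backscatter)" marker in the subject is what allows backscatter to be filtered separately or together with other marked spam.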

That's the hack. It seems to work reasonably well, but it's not foolproof.