NAME

 SP Filter 0.59 (finally considered beta...)


DESCRIPTION

 1) fetch several publicly available ip-based access-lists into the
    local cache-directory, create diffs and clean out old copies.
    supports LWP, wget or rsync; *.bz2 files are handled transparently.
 2) read entries from the local cache into memory (a hash), convert
    cidr-netmasks into octets and dedupe/consolidate in a single pass.
    sort the entries and write the file in the format of your preferred
    mta, optionally preserving/reimporting existing lines (magic_update).
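 The following rough Python sketch illustrates step 2. It assumes that "octets" means the partial dotted prefixes that most access maps accept. Under that assumption a /22 becomes four /24 prefixes such as 10.0.1, and a /16 stays 192.168. The helpers cidr_to_octets() and consolidate() are hypothetical. For clarity the sketch uses two passes, so it is not spfilter's single-pass Perl code.

```python
import ipaddress

def cidr_to_octets(cidr):
    """Expand an IPv4 CIDR block into octet-aligned prefixes (hypothetical helper).

    The prefix length is rounded up to the next octet boundary, so a /22
    becomes four /24 prefixes such as "10.0.1", while a /16 stays "192.168".
    """
    net = ipaddress.ip_network(cidr, strict=False)
    boundary = -(-net.prefixlen // 8) * 8  # ceil to a multiple of 8
    return [
        ".".join(str(sub.network_address).split(".")[: boundary // 8])
        for sub in net.subnets(new_prefix=boundary)
    ]

def consolidate(entries):
    """Dedupe entries and drop any prefix already covered by a shorter one."""
    keys = set()
    for entry in entries:
        keys.update(cidr_to_octets(entry))

    def covered(prefix):
        parts = prefix.split(".")
        return any(".".join(parts[:i]) in keys for i in range(1, len(parts)))

    kept = [k for k in keys if not covered(k)]
    # numeric sort, octet by octet ("10.0.2" sorts before "10.0.10")
    return sorted(kept, key=lambda p: [int(x) for x in p.split(".")])

print(consolidate(["10.0.0.0/22", "10.0.1.5", "192.168.0.0/16", "192.168.5.0/24"]))
# prints ['10.0.0', '10.0.1', '10.0.2', '10.0.3', '192.168']
```

 The single address 10.0.1.5 and the /24 192.168.5 disappear because shorter prefixes already cover them.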


OPTIONS

 ./spfilter.pl -verbose -format=sendmail,postfix,... SOURCE ...
 ./spfilter.pl [ -verbose ] [ -debug ] [-format=format,... ]
        [ -cachedir=./cache ] [ -outdir=(./outdir|outfile|STDOUT) ]
        [ -workdir=workdir] [ -pubdir=./publish ] [ -user=spfilter ]
        [ -xmlconf=./spfilter-local.xml ]
        [ -keyring=(NULL|spfilter-keyring.gpg) ]
        [ -tiehash=(NULL|/tmp/tiehash.gdbm|/tmp/tiehash.db) ]
        [ -email=responsible#contact.dom ]
        [ -zone=localhost,127.0.0.1,43200 ]
        SOURCE [SOURCE2] ...
        only the first character after '-' is significant; the
        single-char arguments from previous versions still work.
        keep using the short form in scripts, as the long argument
        names (wording) may change at any time.
 -c, -cachedir=: directory for cached sources
        default './cache', directory must exist
        old cached sources will be purged after successful fetch
 -d, -debug: boolean, for testing and linting, may be given multiple times
 -e, -email=: string passed in the HTTP_USER_AGENT, max. 48 chars
        default: list sources as specified by -source or @ARGV
 -f, -format=: output format(s), as named in the xml-config
        default: octets (tab-delimited, no quotes)
        for mta's: courier, exim, postfix, rblsmtpd, qmail_uce, sendmail
        for dnsbl: rbldnsd, tinydns, bind and generic 'reverse'
        for benchmarking: queryperf (from bind/contrib)
        multiple formats may be specified, separated by commas
        use 'cdb', 'gdbm' or 'db' to compile the output into a DB_File
        NOTE: contents of -format may be used as part of USER_AGENT
 -h, -help: boolean, display built-in manpage (this document)
 -k, -keyring=: use the named keyring to verify spfilter-config.xml
        default 'spfilter-keyring.gpg', in './' or '/usr/local/etc'
        the Makefile will generate the keyring on 'make keyring'
        specify 'NULL' for keyring to disable gpg-functionality
 -l, -log: log to syslog, named file or email (not implemented)
 -o, -outdir=:  directory or filename with optional path
        default './outdir' if it exists, otherwise the current workdir '.'
        specify '-outdir=STDOUT' for use in pipes (with a single format)
 -p, -pubdir=: directory to publish *.bz2 for redistribution
        three subdirectories required: ./input, ./diff and ./output
        example1: http://spfilter.openrbl.org/data/
        example2: http://mirror.openrbl.org/spfilter/
        note: primary sites should run spfilter once per day,
        between 00:30 and 01:30 UTC (GMT)
 -q, -quiet: decrease verbosity (see also -verbose and -debug)
 -s, -source=: input sources (legacy, just list them on commandline)
        default: import all from set DEFAULT
        set DEFAULT equivalent to: -source=SPEWS,SPAMSITE,PDL
        for relays add DSBL and/or RSL: -source=DEFAULT,RELAYS
        set RELAYS equivalent to: -source=DSBL,DSBL_MULTIHOP,RSL
        please refer to spfilter-config.xml, preset_section
        NOTE: contents of -source may be used as part of USER_AGENT
 -t, -tiehash=: tie working hash to in-memory or file-db
        default: none; tying trades memory against cpu and disk
        '-tiehash=NULL': use an in-memory db, reduces memory usage by 50%
        '-tiehash=/tmp/spfilter-tiehash.$$.gdbm': reduces memory by ~70%
        '-tiehash=/tmp/spfilter-tiehash.$$.db': use whichever works better
 -u, -user=: drop root-privileges for external programs ($CFG{user})
        default setuid user 'nobody' if run by root (uid 0)
        WARNING: this setuid is not safe, start spfilter with:
                su nobody -c "perl ./spfilter.pl -format=sendmail"
 -v, -verbose: verbose output
        increase level of verbosity with multiple -vv (see also -d)
 -w, -workdir=: chdir into this directory on startup
        default none (no chdir will be done by spfilter)
        setting simplifies usage from cron and in pipes (with STDOUT)
 -z, -zone=: dnsbl-in-a-box (with -f bind and/or tinydns)
        default -zone=localhost,127.0.0.1,43200 should work everywhere
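 The following Python sketch shows one way the three -zone fields could be turned into tinydns-data records. It assumes two things: the second field is the address returned for listed entries, and prefixes shorter than four octets are served as wildcards. parse_zone(), reverse_name() and tinydns_line() are hypothetical names, not spfilter functions. The sketch does not cover bind output.

```python
def parse_zone(arg="localhost,127.0.0.1,43200"):
    """Split a -zone argument into (name, addr, ttl)."""
    name, addr, ttl = arg.split(",")
    return name, addr, int(ttl)

def reverse_name(prefix, zone="localhost"):
    """Reverse an octet prefix into a name below the zone.

    "1.2.3.4" -> "4.3.2.1.localhost"; shorter prefixes such as "10.0.1"
    get a "*." wildcard so every address inside them matches (assumption).
    """
    octets = prefix.split(".")
    name = ".".join(reversed(octets)) + "." + zone
    return name if len(octets) == 4 else "*." + name

def tinydns_line(prefix, zone="localhost", addr="127.0.0.1", ttl=43200):
    """Build a tinydns-data '+fqdn:ip:ttl' record for one listed prefix."""
    return "+%s:%s:%d" % (reverse_name(prefix, zone), addr, ttl)

name, addr, ttl = parse_zone("localhost,127.0.0.1,43200")
print(tinydns_line("10.0.1", name, addr, ttl))  # prints +*.1.0.10.localhost:127.0.0.1:43200
```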


BUGS

 - timestamps are sometimes based on the .yymmdd-extensions and
        sometimes on the file modification-time (for If-Modified-Since).
 report bugs using the tracker at http://sourceforge.net/projects/spfilter/
 code still considered alpha, back up your files as always


FILES

 ./spfilter-config.xml: definition of sources and formats
        self-updating copy kept in the directory ./cache
 ./spfilter-keyring.gpg: verify embedded signatures with gpg
 files are internally signed with gpg, verify with 'make verify' 
 files will be searched in '.' and in /usr/local/etc
 files must be owned by root or the user running spfilter
        and must not be group- or world-writable.
        same for files reused from the cache-directory (-cachedir)
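 The ownership rule can be expressed as a small check. The Python sketch below accepts a file when two conditions hold. First, root or the effective user must own it. Second, neither the group-write bit nor the world-write bit may be set. is_trusted() is a hypothetical name, and the sketch works on Unix only because it calls os.geteuid().

```python
import os
import stat

def is_trusted(path):
    """True if path is owned by root or the effective user and is
    neither group- nor world-writable (hypothetical helper)."""
    st = os.stat(path)
    if st.st_uid not in (0, os.geteuid()):
        return False
    return not (st.st_mode & (stat.S_IWGRP | stat.S_IWOTH))
```

 For example, a config file at mode 0644 passes, while the same file at mode 0664 or 0602 is rejected.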


Author/License

 spfilter(at)gmx.net. The QPL licence applies.


Prerequisites

 Perl5 with LWP::UserAgent (libwww-perl) or wget in $PATH,
        XML::Simple (included in the tarball) and bunzip; rsync is
        recommended, and diff is optional for primary sites.
        an mta with support for ip-based access-lists, or a nameserver.


Installation

 - check if you have all the necessary executables in $ENV{PATH}
   by running: `which -a perl rsync wget bunzip`
 - primary (publishing) sites also need gpgv, diff and bzip2
 - make sure to have the perl-module XML::Simple installed
   (available at CPAN) or use the one included in ./XML/Simple.pm
 - fetch the Makefile into an empty directory and run 'make all';
   this will create the two subdirectories ./cache and ./outdir,
   fetch the public-key, generate the keyring and finally also
   fetch and verify both spfilter-config.xml and spfilter.pl.
 - run `./spfilter.pl -vd TEST_LIST`, or simply 'make test'
 - ./spfilter-config.xml: review it for your own safety (it's signed)
 - ./spfilter-local.xml: used only with '-x ./spfilter-local.xml'
 - enable output-format for your mta or dnsbl with argument -f
 - it's recommended to let spfilter write into the default
   ./outdir and to set a symlink from the location your application
   (mta, nameserver etc) expects.
 optional:
        - use argument -s to specify your own set of input-sources
        - magic_update => 1 preserves existing lines even across updates
        - count the existing lines and run twice to check the 'magic'
 - !!! new code: use -v and check the daily output from cron !!!


Configure Input

 input sources have already been defined in spfilter-config.xml:
        some of them are:
        [SPEWS|SPEWS2] SPAMSITE PERMBLOCK PDL RSL [KOREA|CHINA|KRCN]
        (too many to list here; check out spfilter-config.xml,
        also at http://spfilter.openrbl.org/code/xml-view.php)
 default setting equivalent to:
 
        ./spfilter.pl SPEWS SPAMSITE PDL
        ./spfilter.pl -s SPEWS,SPAMSITE,PDL
        ./spfilter.pl DEFAULT
        ./spfilter.pl
 several sets of sources have been defined in spfilter-config.xml,
 they will be recursively expanded to the regular sources.
 see http://spfilter.openrbl.org/code/xml-view.php#PRESET_SECTION
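 The following Python sketch shows the recursive expansion, assuming presets are simple name lists. The PRESETS table only copies the two sets shown under -source; the real ones come from the preset_section of spfilter-config.xml. expand_sources() is a hypothetical name. It keeps first-seen order, drops duplicates and refuses cyclic presets. An empty list means DEFAULT, as with a bare ./spfilter.pl.

```python
# Hypothetical preset table copied from the -source description above;
# the authoritative sets live in spfilter-config.xml (preset_section).
PRESETS = {
    "DEFAULT": ["SPEWS", "SPAMSITE", "PDL"],
    "RELAYS": ["DSBL", "DSBL_MULTIHOP", "RSL"],
}

def expand_sources(names, presets=PRESETS):
    """Expand preset names recursively into regular source names.

    An empty list means DEFAULT. Duplicates are dropped (first one wins)
    and a preset that (indirectly) includes itself raises ValueError.
    """
    result = []

    def walk(items, path):
        for name in items:
            if name in presets:
                if name in path:
                    raise ValueError("preset cycle at " + name)
                walk(presets[name], path | {name})
            elif name not in result:
                result.append(name)

    walk(list(names) or ["DEFAULT"], frozenset())
    return result

print(expand_sources(["DEFAULT", "RELAYS"]))
# prints ['SPEWS', 'SPAMSITE', 'PDL', 'DSBL', 'DSBL_MULTIHOP', 'RSL']
```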
 IMPORTANT: you also need to check incoming mail against a realtime dnsbl
 for open relays and proxies. The persistent ones will end up on DSBL and
 Wirehub's great PERMBLOCK, but (unfortunately) not in realtime.
 If you don't use relays.osirusoft.com, dnsbl.njabl.org etc., enable DSBL
 and update daily.
 please always use rsync:// for DSBL and WIREHUB, don't waste bandwidth.
 
 For 'complete' protection also use bl.spamcop.net (via dns) and consider
 enabling KOREA TAIWAN HONGKONG, depending on your location.
 keys for %SRC in spfilter-config.xml: (only 'url' mandatory)
 type: /^(addr|cidr|range|reverse|axfr|host)/, to be documented
 interval (number): reuse cached files for up to interval days (default 3)
 tag (string): prepend this instead of the name, may be set to 'NULL'
        SBL hack: if the tag ends with '=', no space is inserted after it
 prepend (string): append string after text (default none)
        ! deprecated, will be removed, legacy support only !
        use the key "tag" instead, spfilter-config.xml is already updated
 append (string): append (optional) string and ip after $text
 url (string): http, ftp, rsync or file resource:
        - url's ending with *.bz2 will be decompressed transparently
        - use relative or absolute path for local sources
        - macro {YYMMDD} expands to the UTC (GMT) datestamp
        - macro {FILENAME} expands to the contents of that key
 conflict (string): warn if this source is already defined
        - only partially implemented, never trust a dumb machine ;)
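 The two url macros could be expanded roughly as in the following Python sketch. expand_url() and the example.org urls are hypothetical. The sketch assumes that {FILENAME} falls back to the source name, as the 'filename' entry under %SRC describes.

```python
from datetime import datetime, timezone

def expand_url(url, src, now=None):
    """Expand {YYMMDD} (UTC date) and {FILENAME} in a source url.

    {FILENAME} uses the source's 'filename' key and falls back to its
    'name', mirroring the %SRC defaults.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    filename = src.get("filename") or src.get("name", "")
    return url.replace("{YYMMDD}", now.strftime("%y%m%d")).replace("{FILENAME}", filename)

print(expand_url("http://example.org/{FILENAME}.{YYMMDD}.bz2",
                 {"name": "SPEWS"}, datetime(2002, 11, 1, tzinfo=timezone.utc)))
# prints http://example.org/SPEWS.021101.bz2
```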


Configure Output

 predefined output-formats in spfilter-config.xml:
        octets, courier, exim, postfix, qmail_uce, rblsmtpd, sendmail
        reverse, rbldnsd, tinydns, bind
 default output format equivalent to '-format=octets'
 specify multiple formats separated by comma or whitespace
 keys for %FMT in spfilter-config.xml: (all optional)
 default (boolean): 0=disabled, 1=enabled (default 0)
 type (string): 'addr', 'cidr/nn', 'range', 'config', 'rbldns',
        'axfr/cname', 'axfr/txt' or 'axfr/a' (default 'octet')
 linestart (string): prepended to the beginning of each line (default none)
 separator (string): inserted between $addr and $text (default "\t")
 lineend (string): appended after $text (default none)
 magic_update (boolean): preserve manually inserted lines
        silently ignored if the output is sent to '-outdir=STDOUT' or a DB
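 The following Python sketch shows how linestart, separator and lineend combine into one output line. format_line() is a hypothetical name. The sketch takes $text as already built and does not model magic_update.

```python
def format_line(addr, text, fmt):
    """Join linestart + addr + separator + text + lineend.

    fmt is a dict of %FMT keys; missing keys take the documented
    defaults (no linestart, a tab as separator, no lineend).
    """
    return (fmt.get("linestart", "") + addr + fmt.get("separator", "\t")
            + text + fmt.get("lineend", ""))

# default 'octets' style: tab-delimited, no quotes
print(format_line("10.0.1", "SPEWS", {}).replace("\t", "<TAB>"))  # prints 10.0.1<TAB>SPEWS
```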


GPG keys and signatures

 - spfilter will use only its own keyring (spfilter-keyring.gpg)
        and will accept any good signature from the keys listed there.
        don't add other public-keys unless you know what you are doing!
 - spfilter-config.xml and spfilter.pl contain embedded gpg-signatures;
        verify manually with 'gpg --verify' as usual if you have the
        public-key in your trusted keyring (which is deprecated!).
        use 'make verify' instead, no need to mess up existing keyrings.
 - build the pubkey: 'make pubkey' will fetch the pubkey from a keyserver
        and build the gpg-keyring for spfilter in a (hopefully) safe way.
        'make verify' additionally checks the embedded gpg-signatures.
 - public key for spfilter@openrbl.org available at:
        http://search.keyserver.net:11371/pks/lookup?op=vindex&template=netensearch&search=spfilter
        http://pgp.mit.edu:11371/pks/lookup?search=spfilter&op=index&fingerprint=on
 - if you prefer to build the keyring manually: (all in one line)
        gpg --no-default-keyring --keyring ./spfilter-keyring.gpg \
                --import ./doc/spfilter-pubkey.asc
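 The following Python sketch calls gpgv with the spfilter keyring. verify_signed() is a hypothetical wrapper, not what 'make verify' actually runs. It relies on gpgv's --keyring option and on gpgv exiting with status 0 for a good signature. The keyring path contains a slash so that gpgv does not look for it in the GnuPG home directory.

```python
import shutil
import subprocess

def verify_signed(path, keyring="./spfilter-keyring.gpg"):
    """Check the embedded (clearsigned) signature of path with gpgv.

    Returns True only if gpgv exits with status 0 (good signature).
    """
    gpgv = shutil.which("gpgv")
    if gpgv is None:
        raise FileNotFoundError("gpgv not found in PATH")
    proc = subprocess.run([gpgv, "--keyring", keyring, path], capture_output=True)
    return proc.returncode == 0
```

 An unsigned file, or a file signed by a key that is missing from spfilter-keyring.gpg, makes the function return False.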

- - - ERRATA: old instructions told you to import spfilter's public key.
this is no longer needed, as the detached signature of the *.tgz
tarball has been discontinued for security reasons. the old, deprecated steps were:
$ gpg --import ./spfilter/doc/spfilter-pubkey.asc
$ gpg --verify ./spfilter-$VERSION.tgz.asc ./spfilter-$VERSION.tgz


Global Hashes (as reference, not complete, use the code)

 %CFG:
        workdir "."     # opt_w
        debug   0       # opt_d
        verbose 0       # opt_v
        interval
        sources "ONE,TWO,THREE,..."     # opt_s and/or @ARGV
        formats "one,two,three,..."     # opt_f
        email   $sources        # show in HTTP_USER_AGENT
        tempfile        ""      # tie temporary hash, opt_t
        xmlfile ""      # opt_x (additional local config.xml)
        cachedir        "./cache"       # opt_c
        outdir  "./outdir"      # opt_o
        pubdir  "./publish"     # opt_p
        exec_user       "nobody"        # should use 'spfilter' if available
        exec_uid        -1      # uid from exec_user
        exec_path       "/bin:/usr/bin:/usr/local/bin"  # should be safe
        exec_http       "wget ..."      # alternative to LWP::UserAgent
        exec_rsync      "rsync ..."     # recommended
        pack_ext        '(bz2|gz)'      # gz not tested
        exec_bunzip     "bunzip ..."    # strongly recommended
        exec_gunzip     "gunzip ..."    # you tell me if it works ;)
        exec_bzip       "bzip2 ..."     # for republishing on primary sites
        exec_diff       "diff ..."      # used if available
        exec_patch      "patch ..."     # not implemented yet
        exec_gpgv       "gpgv ..."      # strongly recommended
        keyring 'spfilter-keyring.gpg'  # $opt_k (use NULL to disable)
        zone_name       "localhost"     # $opt_z (first field)
        zone_addr       "127.0.0.1"     # $opt_z (second field)
        zone_ttl        "43200"         # $opt_z (third field)
        program         "spfilter"
        version         "0.00"
        date            "YYMMDD"
        useragent       "$program/0.00"
        magic
        yymmdd          "YYMMDD"        # always uses UTC (~GMT)
        count_cached    0++
        count_notmodi   0++
        count_fetched   0++
 %SRC:
        name            # auto-generated, don't mess with it
        url_primary     # experimental, for use by redistributing sites only
        url             # multiple tried in order, {FILENAME} and {YYMMDD} expanded
        interval        # interval in days between updates
        type
        alias           # experimental, use the same file in ./cache
        filename        # explicitly set filename in cache, defaults to $name
        minsize 1       # min kb, reject anything below 513 bytes (rounded)
        maxsize 2000    # max kb, protect somewhat against dos
        conflict        # only one single value handled
        regexp_include  # perl-regexp, will be enclosed in =~/.../
        regexp_exclude  # perl-regexp, will be enclosed in !~/.../
        option          # experimental axfrexpand, notext, html2text
        tag             # prepend to each line of output, defaults to $name
        prepend         # DEPRECATED hack, comes just before $append ;)
        append          # construct url, $addr appended to string
        cache_status    # -1: 304 Not Modified, 0: 404 Error, 1: 200 OK
        cache_fname     # name of cached file
        cache_ifmod     # contains If-Modified-Since date for HTTP
        cache_fetched   # name of fetched file
 %FMT:
        name    # the key itself, don't set or change
        type    "txt"
        publish 0
        magic_update
        include ""      # include content verbatim in output
        notation        "octet"
        linestart
        separator       "\t"
        lineend
        secondline      # print additional lines, for bind and tinydns
        secondlinestart
        option  # [bindhack|tinydnshack|tcpserverhack]
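 The following Python sketch shows how the %SRC size limits and regexp filters could work. Both functions are hypothetical. Python's re stands in for Perl regexps, whose syntax differs in places. Sizes are rounded to kb with round(), which happens to reject anything below 513 bytes for minsize 1.

```python
import re

def accept_source(data, src):
    """Check a fetched file against minsize/maxsize (in kb, defaults 1 and 2000).

    round() keeps 512 bytes at 0 kb, so minsize 1 rejects anything
    below 513 bytes, in line with the %SRC comment.
    """
    kb = round(len(data) / 1024)
    return src.get("minsize", 1) <= kb <= src.get("maxsize", 2000)

def filter_lines(lines, src):
    """Yield lines matching regexp_include and not matching regexp_exclude."""
    include = src.get("regexp_include")
    exclude = src.get("regexp_exclude")
    for line in lines:
        if include and not re.search(include, line):
            continue
        if exclude and re.search(exclude, line):
            continue
        yield line
```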


Windows (ActiveState Perl)

 - spfilter runs with ActiveState Perl, which already includes all modules
        http://aspn.activestate.com/ASPN/Downloads/ActivePerl/Source
        http://downloads.activestate.com/ActivePerl/Windows/5.6/ActivePerl-5.6.1.633-MSWin32-x86.msi
        spfilter-bunzip-cmd.zip contains bunzip2.exe and a cmd-sample
 - spfilter has been reported to run on recent versions of cygwin;
        more documentation is welcome (check out ./docs)
        http://cygutils.netpedia.net/
        http://webmaster.indiana.edu/perl56/pod/perlcygwin.html
        http://search.cpan.org/author/COOPERCL/XML-Parser-2.31/
        (or http://search.cpan.org/author/MSERGEANT/XML-SAX-0.11/)
        http://search.cpan.org/dist/XML-Simple/ (or ./XML/Simple.pm)
        Note: there are reports that after 'perl Makefile.PL' the variables
        PERL, FULLPERL and PERL_CORE may all be assigned '0' (zero),
        and 'make' then fails badly. (W2K, 2002-11-01)
 - WARNING: console application only, no colors and mouse support !
        mailservers with fewer than a few thousand mails per day are
        better off using traditional dnsbl-queries.
        Windows 2000 with a fixed ip and bind9 may still be used as a
        dnsbl-server for zones generated by spfilter.
        nameserver, consulting and support available for serious projects


ToDo List

 - consistent handling of keywords OK (WHITELIST) and FREEMAIL (MXCHECK)
 - aggregate input, create index hash from textual description
 - aggregate output, optionally into cidr or range
 - modularize the code, split into input.pl, output.pl and spfilter.pm
 - curses-based selection for sources and formats (contribute!)
 - update from daily diffs, uses only 2..10% (see /data/input/diff)
 - documentation (may be submitted via sourceforge project home)
 (suggestions, patches and working code always welcome)


History & ChangeLog


SEE ALSO

 homepage: http://spfilter.openrbl.org/
 mirror: http://mirror.openrbl.org/spfilter/code/