Stupid Gentoo Tricks

What initially attracted me to Gentoo is its sometimes-elegant portage system, which is Gentoo’s version of a package manager — one of the things that distinguishes Linux flavors from one another.

Portage suffers from a sort of chicken-egg conundrum in that portage and all of its files and dependencies are themselves managed by portage, which means that upgrading libraries that everything relies upon can quickly lead to a system where portage becomes inoperative.

Recently, I managed to mangle “wget” by deinstalling a library it relied upon.  This is difficult to recover from since wget is essential to portage’s ability to install packages … such as wget and the libraries it requires.  I also discovered that an ftp client is not installed by default, which is surprising, but effectively ruled out just copying a working wget from another system.

As it turns out, the default Gentoo installation does include busybox, which is theoretically less functional, but will do the trick.  For those unfamiliar with busybox, it’s essentially one binary that contains (and is capable of replacing) a number of tiny command line functions, from cp and cat to rm and xargs.  If you have it installed, typing “busybox” will tell you exactly what command line tools it can replace.

Therefore, it’s possible to [re]install wget by telling portage to use busybox’s wget instead of wget itself:


FETCHCOMMAND="/bin/busybox wget \${URI} -P \${DISTDIR}" emerge wget


The physical to the virtual

If you’re like me, you have at least 40 computers lying around, some of which have software installed that’s useful just often enough to justify their continued existence — inevitably, right after you decide the hardware would be useful for another purpose and wipe out the disk.

In this day and age of virtual computing, it seems like it should be trivial to make a drive image, drop it on a network drive, and then mount that as a virtual image.  I don’t know of any drive imaging software that’s directly compatible with virtual machines, or vice versa, so there’s a conversion step involved, where the image from the drive imaging software becomes a drive image for the virtual machine.

Conversion is probably the wrong word to use here, since the process seems to involve running the drive imaging software inside the virtual machine to restore the image.  For virtual machines, I used VMware, and for the drive image software, I used Ghost.

Getting Ghost to save its images to a network share is a matter of selecting the right options when creating its boot disk, most notably the network driver, which has to match the hardware of the machine you’re imaging.  Note that there appears to be a 2 gigabyte limit for files (probably because Ghost is essentially running from DOS), so Ghost will need to create “spanning” files, a whole series of 2 gigabyte files that together comprise the image.
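As a rough sanity check on how many span files to expect on the network share, the arithmetic can be sketched as follows. This assumes the cap is exactly 2 GiB per file (the real DOS limit may be a byte or so less) and that the image is as large as the disk; a compressed image, or one that only holds used sectors, will need fewer files. The function name is hypothetical.

```python
def span_file_count(image_bytes, span_limit_bytes=2 * 1024**3):
    """Upper bound on the number of span files for an image of image_bytes.

    span_limit_bytes is an assumed per-file cap (2 GiB by default).
    """
    if image_bytes <= 0:
        return 0
    # Integer ceiling division, so a partially filled last file still counts.
    return -(-image_bytes // span_limit_bytes)


# An 80 GiB disk imaged without compression
print(span_file_count(80 * 1024**3))  # 40
```

Multiplying the result by the cap also gives a worst-case figure for the free space the share needs.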

If all goes well here, create another boot disk for Ghost to restore the image.  This time, you’ll want to get the right network drivers for VMware, which emulates “AMD PCNet II” hardware.  You can get the right drivers here.

Booting a virtual machine from there, I noticed that Ghost.exe wasn’t actually on the boot disk (makes sense, since it’s over 1 megabyte by itself) so I copied it to the network drive made accessible to the boot disk.  Before running it, I noticed that I had to manually run “mouse” (the mouse driver) to be able to use my mouse for its DOS-based GUI.

Following the menus to select an image, I got to the point where a dialog box came up to select the image, filling in the A: drive as the default location, and the machine locked up.  I don’t mean just the virtual machine either, but the host machine itself.  Trying various combinations of A: drive access, removal, and virtualization was no help; Ghost would consistently lock up at this point.

The problem appears to be the dialog itself, and perhaps whatever it’s doing to the floppy drive.  Selecting the file from the command line did the trick:

ghost.exe -clone,mode=restore,src=first_1.gho,dst=1

Note that this takes a long time.  It took about 12 hours to recover an 80 gigabyte hard drive into a virtual machine.  Your mileage may vary.

After all this, once the virtual machine comes up, the almost inevitable result for an XP system will be a blue screen of death — easily remedied by booting the virtual machine from the XP installation CD, and running the recovery process (by pretending to install up until the point where it detects a prior installation of XP, then pressing R.)


X10 and Compact Fluorescents

X10 is an unfortunately-named industry standard for controlling devices (e.g., lights) via low-voltage signals over power lines.  First created in 1975, it has the advantage of having relatively cheap hardware, and although there have been attempts to create a more modern and capable replacement, they have been hampered by high costs — an X10 switch might cost $10 if one shops around, and a Lonworks or Insteon switch might cost $100 or more.  Until the day comes when each outlet in my home is also its own web server, I think I’ll settle for the balance of control and cheapness that X10 provides.

X10 works by adding a 5 volt signal to the powerline’s power, which is then picked up by an X10 device that listens to the embedded command (e.g., turn on, turn off, dim, and so on.)  The problem with this is that various other things plugged into the powerline tend to absorb or attenuate this 5V signal, so by the time it reaches the device which should be listening, it can be too weak to reliably accomplish anything.

The traditional solution is to locate devices that are weakening the X10 signal, and place them behind a filter.  This works pretty well, although it can be tedious to locate the devices responsible, and it’s possible to require a lot of filters for acceptable performance.  An alternative is the installation of a repeater, which listens for the signal and echoes it back to the powerline, presumably closer to whatever device you’re trying to control.  It’s not cheap, and if the signal is weak enough, either the repeater won’t hear the original signal, or the device won’t hear the repeater.  I’ll come back to this problem in a moment.

Compact fluorescents are little fluorescent bulbs that screw in place of regular bulbs, and consume considerably less power than incandescent lights.  However, they aren’t compatible with X10 switches designed for incandescent bulbs.  You can either use X10 switches designed for fluorescent loads (also known as “non-dimming,” “appliance,” or “relay”) or you can buy dimmable compact fluorescent bulbs, which have the advantage that you can dim them … somewhat.  (As they dim, they flicker and go out where an incandescent bulb would continue to dim through yellow and red; compact fluorescent bulbs simply cannot be dimmed as well or through the same range.)

However, these bulbs have the side effect of attenuating the X10 signal, and built-in lights aren’t candidates for simple plug-in filters.

On the plus side, there’s a device that can overcome this — not by filtering each source of attenuation, but by boosting the X10 signal itself.  It’s called an XTB, or X10 Transmit Booster, and it’s a clever little device that sits between the source of your X10 signals and the power line, intercepting the 5 volt X10 signal and putting out about a 20 volt X10 signal.

It works really, really well.

The XTB kit and components

The company — or more accurately, guy — who produces these hasn’t the wads of cash for UL approval, so they’re sold in kit form.  The kit itself is beautifully put together, with excellent instructions.  The trickiest part was a surface-mount op-amp.

When it arrived, I’ll admit to being eager enough to whang the thing together in about half an hour, with the caveat that soldering components is almost second nature to me, and that I probably should have read the part about mounting the LED a little more carefully.  On the plus side, it worked flawlessly.  For those not adept with an iron, it’s possible to have it assembled for you.

If you’re running X10 and have any kind of signal issues, I’d recommend this before I’d recommend bothering with filters.


Backing Up Open Files on Windows with Rsync (and BackupPC)

Update: Versions of the files below may be downloaded here.  This post is probably still useful as documentation.

This isn’t specific to BackupPC by any means, but I’ll preface this with a brief explanation:  BackupPC is a “set it and forget it” backup system driven from the server, that allows you to back up the entire network of *nix and Windows PCs.  It doesn’t require any software on the systems it backs up at all, since it relies upon rsync and smbclient, and optionally ssh.

For *nix, this works beautifully.  For Windows, this also works beautifully, except that “open files” can’t be backed up at all.  This problem isn’t unique to BackupPC; any attempt to back up or copy these files will fail, so most commercial backup systems have special “open file” clients to cope with it.

The official Windows solution for XP and later is something called a “volume shadow copy.”  It’s probably far more complex than it possibly needs to be, but essentially, it creates a pseudo-volume for any actual volume, with the difference being that you can actually back up files on it.  So, this can be handily used for rsync in order to make full backups, including every single file…  in theory, anyway.

My goals in getting this working:

  1. The solution should work with off-the-shelf components (i.e., no binaries or code)
  2. Installation and footprint should be minimal
  3. It should “just work” — if it’s too delicate, it’s not all that useful as a backup solution

It took quite a bit of trial-and-error, so I’ll skip what didn’t work, and get straight to what actually does work.  There are a few required components:

  1. winexe, a *nix program for remotely executing commands on Windows systems
  2. vshadow, a Windows program that creates and manages shadow copies
  3. dosdev, a Windows program that maps drive letters to volumes
  4. cwrsync, a Windows version of rsync (the “server” isn’t necessary)

Once all the pieces were assembled, I created a C:\BackupPC directory on the Windows box with all the necessary files.  (This directory is hard-coded in a lot of the files.)  Note that rsync does not need to be installed as a service; it gets loaded on the fly.  Here’s a listing of that directory:

Directory of C:\BackupPC
08/08/2008  07:11 PM                65 backuppc.cmd
08/10/2008  12:56 PM             1,928 cwrsync.cmd
07/22/2008  04:30 PM         1,082,368 cygcrypto-0.9.8.dll
04/11/2008  07:03 AM           999,424 cygiconv-2.dll
04/11/2008  07:03 AM            31,744 cygintl-3.dll
04/11/2008  07:03 AM            20,480 cygminires.dll
07/22/2008  04:30 PM         1,872,884 cygwin1.dll
04/11/2008  07:03 AM            66,048 cygz.dll
09/28/2004  02:07 PM             6,656 dosdev.exe
08/11/2008  11:08 PM             1,000 pre-cmd.vbs
08/11/2008  11:05 PM                44 pre-exec.cmd
07/22/2008  02:26 PM           348,160 rsync.exe
08/11/2008  10:12 PM               161 rsyncd.conf
08/11/2008  10:12 PM                22 rsyncd.secrets
08/11/2008  11:26 PM             1,177 sleep.vbs
06/08/2005  03:17 PM           294,912 vshadow.exe
08/11/2008  10:09 PM               581 vsrsync.cmd
08/11/2008  11:33 PM               308 vss-setvar.cmd

So, here’s how it works.  Before each backup, BackupPC has an option to call a local script first, waiting for that script to finish.  Here’s the execution chain:

  1. preusercmd.sh launches “pre-exec.cmd” on the Windows box
  2. pre-exec.cmd launches “pre-cmd.vbs”
  3. pre-cmd.vbs cleans up some files, launches “sleep.vbs” in the background (more on this later) and then launches “backuppc.cmd” in the background, and waits for the pid file to appear that signals that rsyncd has been launched
  4. backuppc.cmd launches vshadow, and tells it to execute vsrsync.cmd
  5. vsrsync.cmd maps the shadow volume to B: and launches rsyncd against the shadow copy on B:.  It sits and waits here, leaving vshadow and rsync open while the backup (the rsync transfer) runs

Once the backup is completed, another local script is run — here’s its execution chain:

  1. postusercmd.sh puts a file called “wake.up” in the C:\BackupPC directory
  2. sleep.vbs wakes up, sees this file, reads rsyncd.pid, and kills the rsyncd process
  3. vsrsync.cmd now continues, since the rsync process is dead.  It unmaps the B: drive.  Once this script completes, vshadow automatically deletes the shadow volume.
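Both chains hinge on the same trick: one process drops a flag file (rsyncd.pid or wake.up) and another polls until it appears. The VBScript listings further down poll forever; purely as an illustration of the pattern, here is a minimal Python sketch that adds a timeout so a failed launch cannot hang the backup indefinitely. The function name, the timeout, and the polling interval are assumptions for this sketch and are not part of the original scripts.

```python
import os
import time


def wait_for_file(path, timeout_s=300.0, poll_s=1.0):
    """Poll until path exists.

    Returns True once the file appears, or False if timeout_s elapses first.
    (Hypothetical helper: the original scripts loop with no timeout.)
    """
    deadline = time.monotonic() + timeout_s
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
    return True
```

A caller would treat a False return as "the shadow copy or rsyncd never came up" and fail the backup rather than wait.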

Sure, it seems simple, but a lot of work went into that, since there are a lot of nuances to sort out.  Here are the file listings:

preusercmd.sh

#!/bin/bash
# Launch the shadow copy and rsync daemon on the Windows box named in $1
WINEXE=/usr/bin/winexe
UNAME="Administrator"
PASSWD="admin.password"   # not PWD: the shell uses PWD for the working directory
WRKGRP="WORKGROUP"
BOX=$1
"$WINEXE" --interactive=0 -U "$UNAME" -W "$WRKGRP" --password="$PASSWD" "//$BOX" 'cmd /c c:\backuppc\pre-exec.cmd'
sleep 5
echo "Rsync and shadow copy loaded"
kill $$
# The script needs to be killed, otherwise, winexe waits for input

pre-exec.cmd

@echo off
cd /d C:\BackupPC
cscript pre-cmd.vbs

pre-cmd.vbs

Const Flag = "C:\BackupPC\rsyncd.pid"
'
' Pid file shouldn't be there already
'
If DoesFileExist(Flag)=0 Then
   Set fso = CreateObject("Scripting.FileSystemObject")
   Set aFile = fso.GetFile(Flag)
   aFile.Delete
End If
'
' Nor should "wake.up"
'
If DoesFileExist("C:\BackupPC\wake.up")=0 Then
   Set fso = CreateObject("Scripting.FileSystemObject")
   Set aFile = fso.GetFile("C:\BackupPC\wake.up")
   aFile.Delete
End If
'
Set objShell = CreateObject("WScript.Shell")
objShell.Exec "cscript C:\BackupPC\sleep.vbs"
'
Set objShell = CreateObject("WScript.Shell")
objShell.Exec "C:\BackupPC\backuppc.cmd > C:\BackupPC\file.out"
'
' Just sleep until the file "rsyncd.pid" appears
'
While DoesFileExist(Flag)
   wscript.sleep 10000
Wend
'
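' functions
'
' DoesFileExist returns 0 if the file exists and -1 (true) if it does not,
' so "While DoesFileExist(x)" keeps looping until x appears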
'
function DoesFileExist(FilePath)
Dim fso
	Set fso = CreateObject("Scripting.FileSystemObject")
	if not fso.FileExists(FilePath) then
		DoesFileExist = -1
	else
		DoesFileExist = 0
	end if
	Set fso = Nothing
end function

sleep.vbs

Const Rsync = "C:\BackupPC\rsyncd.pid"
Const Flag = "C:\BackupPC\wake.up"
'
' Just sleep until the file "rsyncd.pid" appears
'
While DoesFileExist(Rsync)
   wscript.sleep 10000
Wend
'
' Now sleep until the file "wake.up" appears
'
While DoesFileExist(Flag)
   wscript.sleep 10000
Wend
'
Set fso = CreateObject("Scripting.FileSystemObject")
Set aFile = fso.GetFile(Flag)
aFile.Delete
'
' It's time to kill Rsync
'
Set fso = CreateObject("Scripting.FileSystemObject")
Set aReadFile = fso.OpenTextFile(Rsync, 1)
strContents = aReadFile.ReadLine
aReadFile.Close
'
Set objShell = CreateObject("WScript.Shell")
objShell.Run "taskkill /f /pid " & strContents, 0, true
'
' Wait for Rsync to let go
'
wscript.sleep 5000
'
' Delete PID file
'
If DoesFileExist(Rsync)=0 Then
   Set objShell = CreateObject("WScript.Shell")
   objShell.Run "cmd /c del C:\BackupPC\rsyncd.pid", 0, true
End If
'
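' functions
'
' DoesFileExist returns 0 if the file exists and -1 (true) if it does not,
' so "While DoesFileExist(x)" keeps looping until x appears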
'
function DoesFileExist(FilePath)
Dim fso
	Set fso = CreateObject("Scripting.FileSystemObject")
	if not fso.FileExists(FilePath) then
		DoesFileExist = -1
	else
		DoesFileExist = 0
	end if
	Set fso = Nothing
end function

backuppc.cmd

cd \backuppc
vshadow -script=vss-setvar.cmd -exec=vsrsync.cmd c:

vsrsync.cmd

REM @ECHO OFF
call vss-setvar.cmd
cd \BackupPC
SET CWRSYNCHOME=\BACKUPPC
SET CYGWIN=nontsec
SET CWOLDPATH=%PATH%
SET PATH=\BACKUPPC;%PATH%
dosdev B: %SHADOW_DEVICE_1%
REM Go into daemon mode, we'll kill it once we're done
rsync -v -v --daemon --config=rsyncd.conf --no-detach --log-file=diagnostic.txt
dosdev -r -d B:

rsyncd.conf

use chroot = false
strict modes = false
pid file = rsyncd.pid
[C]
path = /cygdrive/B/
auth users = Administrator
secrets file = rsyncd.secrets

postusercmd.sh

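#!/bin/bash
# Signal the Windows box named in $1 to stop rsyncd and release the shadow copy
WINEXE=/usr/bin/winexe
UNAME="Administrator"
PASSWD="admin.password"   # not PWD: the shell uses PWD for the working directory
WRKGRP="WORKGROUP"
BOX=$1
# Creating wake.up is the signal sleep.vbs waits for; winexe's output is captured and discarded
OUTPUT=$("$WINEXE" -U "$UNAME" -W "$WRKGRP" --password="$PASSWD" "//$BOX" 'cmd /c echo 1 > c:\backuppc\wake.up')
echo "Rsync and shadow copy unloaded"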

Windows Mobile and ActiveSync

ActiveSync is a wonderful thing — unless something goes wrong.  In the maddening manner of most Microsoft error messages, you get the functional equivalent of “something went wrong” with no other detail — and worse, an error message that appears to be telling you something:

“Critical communications services have failed to start.  Try resetting the mobile device, and then connect again.”

This might lead you to conclude that the issue is on your mobile device, when in fact it means nothing of the sort.  It appears to actually mean “ActiveSync didn’t receive any communications from the device,” which is just as likely, if not more likely, to be a problem on the PC side.

ActiveSync communicates on these ports:

TCP from Mobile Device to PC:  990, 999, 5678, 5721, 26675
UDP from PC to Mobile Device:  5679

So you can start troubleshooting by making sure these are open and available on the PC.  (If you have a “personal firewall” on the PC, start there.)
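As a first troubleshooting step, it can help to see whether anything on the PC is accepting connections on the TCP ports listed above. The following is a minimal sketch, run on the PC itself; the function names are hypothetical. It only shows whether a local listener exists on each port. It does not prove that a firewall lets the device through, and it does not cover the UDP port.

```python
import socket

# TCP ports the post lists for device-to-PC traffic
ACTIVESYNC_TCP_PORTS = (990, 999, 5678, 5721, 26675)


def tcp_port_in_use(port, host="127.0.0.1", timeout_s=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout_s)
        # connect_ex returns 0 on success and an error code otherwise
        return s.connect_ex((host, port)) == 0


def check_ports(ports=ACTIVESYNC_TCP_PORTS):
    """Map each port to True (a listener answered) or False."""
    return {port: tcp_port_in_use(port) for port in ports}
```

Calling check_ports() while ActiveSync is running and the device is docked gives a quick picture of which listeners are actually up.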

In my case, I finally traced it to Winsock2 corruption, as explained here.  Something I’d installed or deinstalled apparently managed to leave a wake of destruction.

The solution was to open a command window and execute “netsh winsock reset”, then reboot.


Replacing Google Browser Sync with Weave

Google Browser Sync was one of the handiest things for people who use Firefox 2.0.  For those who are unfamiliar, it synchronizes bookmarks, passwords, history and persistent cookies across installations of Firefox, using Google’s own servers.  Google announced that they are dropping support for Google Browser Sync effective in 2008.

Weave picks up where Google Browser Sync left off, and then some, effective today.  Unfortunately, Mozilla has closed registration on its own synchronization server, which appears to be one of the few ways to install the extension.  However, you don’t actually need to use their server to store and synchronize your browser data; you can use your own WebDAV server.  Doing so requires a little finagling, but it’s well worth it.

On a side note, if you don’t have your own WebDAV server, you can get one from GoDaddy.  They have an “online file folder” which fits the bill nicely.  The 50MB edition should be enough for most people; my probably-normal use of Firefox puts a little over 4MB on a WebDAV server.

Since you can’t register right now, the first thing to do is to acquire the file weave-0.2.4.xpi.  This is the extension itself; if you download it within Firefox, you can install it directly.  Alternatively, save it to a file, and from within Firefox, File->Open will allow you to install the xpi file.

For it to be useful, you’ll need your own WebDAV server with https installed.  I assume you either have one set up, or can set it up yourself — note that if you use a self-signed certificate, be sure to browse there first, and make sure you create an exception so that you can utilize the server with Weave.  (An exception is a way of loading a cert into Firefox so that it can trust a site that’s not chained to a root certificate.  You can set one up at Tools->Options->Advanced->Encryption tab->View Certificates->Server tab->Add Exception.)

On the server, create a directory called user/[username] where [username] is a valid WebDAV account.  This is the directory where everything will be placed, so make sure it’s writable from the WebDAV account.  (Test this with any WebDAV client if you’re not sure, like cadaver.)

Once you’ve installed the xpi and restarted Firefox, Weave will come up with a screen where you can create an account.  Hit [cancel], since there’s no way to specify your own server at this point.  It will also try to navigate a browser window to services.mozilla.com, which may or may not work, depending on how their servers are holding up.

There should now be a Weave submenu under the Tools menu.  Tools->Weave->Preferences->Advanced tab will take you to “Server Location,” where you can fill in the URL of your WebDAV server.  Change the server location, and hit [OK] to close the preferences.  (Until you do so, it won’t pay any attention to the location of your new server.)

Tools->Weave->Sign In will now take you to the registration window, but it will be using your WebDAV server.  Select “Set Up Another Computer” (even if it’s your first one.)  Weave will look on your WebDAV server for api/register/regopen, but if it’s not there, it will assume everything’s fine, and let you enter a username, password, and encryption passphrase.  The password is your WebDAV password, and the “passphrase” can be anything, as long as it’s consistent across machines.

That’s it! It will take a while to synchronize initially, so some patience is warranted at this point.

In addition to synchronizing everything Google Browser Sync did, it also adds the ability to synchronize tabs, which is just nifty.


Free Antivirus Software

For a while, I worked for an antivirus software company, which leads me to understand malware and viruses better than most. Antivirus software is the rare category of software that I’ll not just pay for, but keep paying for, because it continually costs money to maintain. Somebody has to keep virus signatures and detection methods up to date, or the software quickly decays and becomes worthless. That’s unlike some software I’ve been using for more than ten years without changing: once it works, it works.

Open source anti-virus software is a particularly rare category of software because of the resources it takes (not to mention the possibility that virus authors will examine the anti-virus code to help their creations elude detection.) With this in mind, anti-virus software tends to be heavily commercial, heavily advertised, and it’s difficult to find the free solutions that are out there. Luckily, there are a few good ones, which work well for those of us who would prefer to avoid spending money we don’t have to, especially on systems that aren’t at much risk.

MoonSecure is the first one on my list. It’s a real-time scanner built on the ClamAV engine, which is quite a good Unix scanning engine in its own right.  ClamAV has Windows binaries and a periodic scanning engine as well, but lacks the “real-time” scanning component, which is probably more appropriately termed “scanning on the fly.”  By all appearances, it’s entirely non-commercial, and it’s the only one in this list that installs cleanly on a server version of Windows without complaint.

ClamAV deserves its own mention, because it has its own Windows binaries and also installs on a server and is non-commercial in nature.  Also, according to the MoonSecure people, MoonSecure is developing their own engine.

Grisoft’s AVG is next on the list, free for non-commercial use, and supported by its more-capable, non-free counterparts.  It does a decent job on a desktop; it won’t install on a server.  You have to get a license key and have a valid email address.

Avast! is similar in many ways, being free, and supported by its professional non-free counterpart.  Once a year or so, you have to get a new license number — which is free.  Despite its weird annual license renewal and terrifically loud “virus database updated,” I do have a particular fondness for this one.

Avira’s Antivir fits neatly into the same category.  However, it also has a daily pop-up that’s quite irritating.  It’s not too hard to disable, frankly, but it might be violating their license.   I’d pick something else for this reason alone, but otherwise, not too bad.

PC Tools Antivirus is also in the same category, and pretty decent all around.  Having used it the least, I don’t have a lot to say about it, but I include it here for completeness.


Spam Prevention, or, the sorry state of Email

Email spam is universally loathed. It’s difficult to prevent entirely, not only because spammers have a wealth of techniques at their disposal, but because so many legitimate mailers are misconfigured or routinely behave like spammers. The best approaches to combating spam involve multiple techniques to combat various spam techniques. I’ll outline what works, and what doesn’t, and hopefully provide some insight into how spammers work, and some of the more sleazy techniques I’ve encountered.

There’s a spectrum of spam, from the terrifically illegitimate to the “legitimate,” where a semi-reputable company adds you to their mailing list because of something you ordered (perhaps you left a default box checked that said “I want to receive marketing material via email.”) On the illegitimate side are usually commercial operations dedicated to spam, often using zombie farms of compromised machines to send out vast volumes. They often use sophisticated techniques to avoid content filters (like sending vast amounts of legitimate-sounding gibberish.) Eliminating the maximum amount of spam requires a multiple layered approach. I’ve outlined this for mail administrators:

Layer 1: Blacklists and Server Verification
Spam blacklists are simply wonderful at eliminating most spam from bot-farms and sleazy operators. A blacklist is queried with a DNS lookup, so you can verify that an IP address is not listed before you accept email from it. False positives are nearly zero for the good lists; every now and then somebody like AOL makes it onto a blacklist, but with the best blacklists this doesn’t happen. A review of our mail servers’ statistics shows that sbl-xbl.spamhaus.org is solely responsible for rejecting over 95% of attempts to spam our server. This capability is provided by milter-dnsbl.
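To make the DNS mechanics concrete: by convention, a DNS blacklist is queried by reversing the octets of the connecting IPv4 address, appending the list's zone, and looking up an A record. An answer means the address is listed; "no such name" means it is not. The sketch below is illustrative only and is not how milter-dnsbl is implemented; the function names are hypothetical, and it ignores the list-specific meaning of the returned address.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a blacklist zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def dnsbl_lookup(ip, zone):
    """Return the A record the list answers with, or None if there is none.

    A resolver failure also returns None here, so this sketch fails open.
    """
    try:
        return socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return None
```

For example, dnsbl_query_name("192.0.2.99", "dnsbl.example") produces "99.2.0.192.dnsbl.example". Each list documents what its return addresses mean, so a real filter should check the answer rather than treat any reply as a listing.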

Server verification covers the other 5%. In a nutshell, this verifies that the IP address that a system provides during the MTA phase of negotiations is legitimate. Over time, we’ve encountered a few mailers that, for whatever reason, have run afoul of this filter, either from misconfiguration, or from perversely sending email from an unresolvable address. I can’t think of a legitimate reason why anybody would feel the need to use unresolvable addresses to send mail; in cases where I’ve pursued this, it’s generally been the fault of a bumbling administrator or IT department. Every time I’m tempted to relax this requirement, I look at the volumes of spam eliminated and think, hey, if you can’t configure your own mailer properly, maybe nobody should accept mail from you. This capability is provided by spamilter.

Layer 2: Greylists
After making it past the blacklist, the next thing encountered by a would-be mailer is the greylist. To put it succinctly, a greylist is a way of telling certain mailers, “try again later.” Legitimate mailers will do exactly that, while a lot of spam farms give up confusedly. For others, it gives them enough time to be placed on a blacklist before they make the next attempt. A greylist works by tracking the IP (and often, origin email) of the mailer that is contacting you. Next time that same mailer contacts you, if enough time has elapsed, it’s allowed through.
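The core bookkeeping is small enough to sketch. The following is a minimal in-memory illustration, not milter-greylist's actual implementation: it keys on the connecting IP and the sender address, remembers when each pair was first seen, and only accepts once a minimum delay has passed. The class name, the 300 second default, and the injectable clock are assumptions made for the example.

```python
import time


class Greylist:
    """Toy greylist: temp-fail a (client IP, sender) pair until it has waited long enough."""

    def __init__(self, min_delay_s=300.0, clock=time.time):
        self.min_delay_s = min_delay_s
        self.clock = clock          # injectable so the logic can be tested
        self.first_seen = {}        # (ip, sender) -> time of first attempt

    def check(self, client_ip, sender):
        """Return "accept" or "tempfail" for this delivery attempt."""
        key = (client_ip, sender.lower())
        now = self.clock()
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay_s:
            return "accept"
        return "tempfail"           # the mailer should answer with a 4xx "try again later"
```

A production greylist also expires old entries and consults a whitelist before tracking anything; both are left out here for brevity.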

The tricky part about greylists is coping with the behavior of some mailers, particularly big ones. Those that adhere to SPF are easy; most greylists will happily let SPF-compliant mailers right through. For the rest, most greylist implementations have a “whitelist” of mailers that respond poorly to the technique, either by sending from a different IP address every time (and therefore never satisfying the waiting period) or because of known issues where mailers may get confused or not retry for a very long time.

Another side effect is that legitimate mail can, and will, be delayed. A particularly effective technique is to greylist all email from origins not within your country — in my case, skipping the greylist for US-origin addresses interferes with as little mail as possible — and most of the spam comes from non-US computers. This capability is provided by milter-greylist.

Layer 3: Content Filtering
Hopefully, most spam is eliminated before we get this far, because no matter how sophisticated content filtering gets, it can be problematic to consistently separate spam from (for example) messages from a family member who spells poorly and has questions for you about Viagara.

So the first thing to do is make another run through the blacklists. While this may seem redundant, the reason for this is that it will pick up blacklisted IP addresses that are relaying through somewhere else. A common spam technique is to create an email forwarding address for you on a service like bigfoot (I see a lot of these) and then spam that address, which merrily forwards all the spam to you, thus effectively skipping the blacklist, unless you scan through all the headers, too. This capability is provided by spamilter.
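Scanning the headers mostly means pulling the relay addresses out of the Received lines so that each one can be checked against the blacklists. Here is a minimal sketch using Python's standard email parser. It is an illustration, not spamilter's code; the function name is hypothetical, and it only picks up IPv4 addresses written in square brackets, which is a common but not universal format.

```python
import re
from email import message_from_string

# IPv4 addresses as they commonly appear in Received headers: [192.0.2.10]
_BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def relay_ips(raw_message):
    """Return the bracketed IPv4 addresses found in Received headers, in order, without repeats."""
    msg = message_from_string(raw_message)
    ips = []
    for received in msg.get_all("Received", []):
        for ip in _BRACKETED_IPV4.findall(received):
            if ip not in ips:
                ips.append(ip)
    return ips
```

Keep in mind that Received lines below the first trusted hop can be forged by the sender, so a hit on one of those is a hint rather than proof.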

The next thing to do is eliminate the obvious: mangled email. While spammers make an effort to make their mail look legitimate, invalid or multiple headers can result from spam being relayed through security holes in web sites. Spammers generally can’t see the results, nor do they care. In a related way, it’s a common technique for spammers to add multiple headers of the same type, violating most specifications but often bypassing content filters that expect mail to be in mail format, or by pumping through headers designed to exploit loopholes in clients or to overload mail servers. Since people using legitimate mail clients aren’t capable of producing broken mail, getting rid of broken mail causes no harm. This capability is provided by mimedefang.
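One concrete check of this kind is spotting headers that appear more than once when the message format standard (RFC 5322) allows them at most once. The sketch below shows the idea with Python's standard email parser. It is not mimedefang's implementation (mimedefang filters are written in Perl), and the function name is hypothetical.

```python
from collections import Counter
from email import message_from_string

# Header fields that RFC 5322 allows at most once per message
SINGLE_INSTANCE_HEADERS = (
    "date", "from", "sender", "reply-to", "to", "cc", "bcc",
    "message-id", "in-reply-to", "references", "subject",
)


def duplicated_headers(raw_message):
    """Return a sorted list of single-instance headers that occur more than once."""
    msg = message_from_string(raw_message)
    counts = Counter(name.lower() for name in msg.keys())
    return sorted(h for h in SINGLE_INSTANCE_HEADERS if counts[h] > 1)
```

A filter could reject outright on a non-empty result, or simply add to the message's spam score.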

The next capability is filtering the content itself using a number of heuristic techniques that have been tuned over time, using capabilities provided by Spamassassin. Spamassassin does quite a good job, although sophisticated spammers will regularly test their spam content against its rules. Therefore, a good practice is to update its rules regularly using sa-update.

It’s also worth eliminating virus spam at this point. clamav provides this capability handily. As with spamassassin, it’s most effective when updated regularly.

Layer 4: Sieve Rules
At this point, there is still a potential for false positives, and some things are going to slip through. Therefore, content filters normally just flag email. Sieve rules are a hierarchy of rules that determine how to treat email. So legitimate mail can be saved from the junk filter, and persistent spammers can be shuttled over to the Junk folder. These are normally in the hands of end users, but general rules can be effective site-wide.

Layer 5: Don’t REPLY
This is true on a number of levels, the first being that a mailer should summarily reject all mail that’s not to legitimate users, rather than accept it, and attempt to bounce it back. There’s a whole class of spam known as “bounce spam,” where the “reply to” address is the actual victim, and the spammer sends email to a legitimate mailer and an invalid address. The mailer happily forwards it “back,” which actually sends the spam to the victim. There’s no benefit to ever automatically emailing the reply to address from the mailer level, either to inform an end user that they’ve typoed an email address (rejection serves that purpose adequately) or to inform an end user that they’ve sent a virus — the reply address is almost never the originator.

This also extends to the end user. For legitimate businesses, replying is usually an effective way to be removed from their mailing list — if you recognize the domain and have done business with them, there’s little risk. More sleazy operators, however, take the opportunity to add your legitimate email address to hundreds of other lists, even while nominally removing you from the list you’re presumably unsubscribing from. Your legitimate email address can now be sold to other spammers.

In a similar vein, it’s often a bad idea to click where it says “click here to be removed” for the same reasons. A particularly sleazy form of this actually takes you to a page covered with ads, and the unsubscribe box (filled in) in the middle. The spammer has now made money, because you’re a unique visitor to whom those ads have been displayed — even more if one catches your eye and you click on it.

Level 6: Report Spam
Reporting spam has a number of benefits, the biggest being an overall reduction in spam. SpamCop is probably the best way to report spam: it sends email directly to the administrators of the systems involved, which are either misconfigured (open proxies or relays) or have a customer who is the spammer. SpamCop does an excellent job of analyzing email headers and finding out who’s really responsible. Note that spammers will often include legitimate URLs in their spam, so it’s best to pay close attention to who the reports are being sent to and why.


Misdirected email and email disclaimers

Like many people who have been active on the Internet since AOL was a standalone service, I’ve accumulated a number of email addresses over the years, many of which I still use. Some are short and easy to remember, and at least a few of them are routinely given out by people who think they are their own.

The worst offender was a ski resort, who kept giving out my email address as their own — perhaps they even used it as their “reply to” address, since people were particularly stubborn in their insistence that they had the right address. I had a lot of conversations like these:

“I’m sorry, I’m not affiliated with any ski resort, you’ll have to phone or mail the resort to get the correct address.”

“But this is the address they gave me. Do you have parking for an RV?”

“Well, on the street, but I’m not sure what good this will do you, since I’m probably a few hundred miles away from where you want to be. As I mentioned, I have nothing to do with the resort, and I do not know how to get in touch with them.”

“Oh good. How far is the street from the slopes?”

Perhaps they just appealed to a particularly obtuse clientele, but they kept doing it. So I asked somebody who emailed me for the number of the resort, and I called them to let them know their mistake. “No, that’s our email address,” I was told. I couldn’t convince them otherwise. Eventually I resorted to just giving out reservation confirmations, and they finally stopped.

“Is it too late to reserve rooms for eight people for this weekend?”

“No, you’re all set. Your confirmation number is 6893-261#-3472@.9653!7160321796. Please have this ready when you arrive.”

I guess having irate people show up is a lot more effective than politely asking them to knock it off. A lot of people give one of my email addresses out as their own when asked for an email address. I’m not sure if they just don’t know their own, or they just don’t think it matters, but I’ve been signed up by proxy for an appalling number of things:

  • Bank accounts (complete with “here’s your password to bank online”)
  • Home loans (complete with “update your payment address”)
  • Retail sites of all kinds, a handful with active “buy it now” credit cards
  • Medical records
  • Insurance records
  • Porn memberships (with recurring payments and a changeable password)
  • Job sites (complete with “update your resume/profile”)
  • Social networking sites (as above)
  • Dating sites (even more fun, as above)

As the mood takes me, I might locate the phone number of the person whose account it is, and notify them of their mistake (reactions have ranged from confusion to threatening to sue me.) Sometimes I’ll just change the password and forget about it (there are probably a few poor schmucks still paying for porn that they don’t have access to and can’t cancel.) Sometimes I’ll update their profile in amusing ways. Although the thought has occurred to me to drain a few bank accounts, these are people who strike me as most genuinely confused and in need of an explanation — and I’m not really that much of a bastard.

I also get signed up for a lot of mailing lists, which can be fairly obnoxious. If a mailing list has a simple way to unsubscribe, I will. Better yet are mailing lists that ask for confirmation: I don’t confirm, and that’s the end of it. Some mailing lists are particularly obnoxious, with no way to unsubscribe, or even worse, the only way to unsubscribe is to enter a lot of personal information on a separate web site (which, if it doesn’t match whatever information the idiot gave them when they provided your email address, won’t let you unsubscribe) or points to a site that doesn’t exist or resolve, etc. Since I don’t want to be on the mailing list, I’ll complain directly to their ISP. I’ve had a few car dealerships disconnected from the Internet by their ISPs, who are usually pretty cooperative.

Note to email list administrators: always confirm email addresses, and have a simple way to unsubscribe, or you’re a spammer.
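
For list administrators wondering what "confirm" means in practice, here is a rough sketch of one way to do it: mail the address a token, and only subscribe it when that token comes back. Everything here (the secret key, the function names, the in-memory set of subscribers) is hypothetical and only meant to show the shape of the idea.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real list would keep this out of the code.
SECRET_KEY = b"change-me"


def confirmation_token(address: str) -> str:
    """Token to mail to the address; only its real owner ever sees it."""
    normalized = address.strip().lower().encode("utf-8")
    return hmac.new(SECRET_KEY, normalized, hashlib.sha256).hexdigest()


def confirm(address: str, token: str, subscribers: set) -> bool:
    """Subscribe only if the token mailed to the address comes back."""
    expected = confirmation_token(address).encode("utf-8")
    if hmac.compare_digest(expected, token.encode("utf-8")):
        subscribers.add(address.strip().lower())
        return True
    return False


def unsubscribe(address: str, subscribers: set) -> None:
    """One step, no questions asked, no personal details required."""
    subscribers.discard(address.strip().lower())
```

If somebody types in an address that isn't theirs, the token lands in the real owner's inbox, the owner ignores it, and nothing further is ever sent.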

I also get emails directly from misguided individuals. It’s remarkable the amount of personal detail that people will include to an email address they’ve never sent anything to before. I usually reply to let them know I’m not who they think they’re contacting. Occasionally, they argue (which is bizarre to me, but some people get ideas stuck in their heads. “Dot! Stop fooling around!”) and occasionally, they’re just weird — some ask for unrelated computer help (which I provide, to the extent that I can help via email) and one lady told me that she was a “married Christian woman” and that it was improper for her to talk to a strange man. (This, of course, implies to me that she desperately wants to, and either is unhappy with her husband or her repressive brand of Christianity — and she actually does keep writing — go figure.)

High on the obnoxiousness scale are the business emails I get, usually with tons of insider information, and a standard disclaimer telling me what I can and can’t do, my duties if I’m not the intended recipient, etc. I’m not a lawyer, and this isn’t legal advice by any means, but I don’t think I’m bound by any of this crap. If you send me an email, it’s mine. I’ll do what I want with it. If you’re incompetent enough to send me insider or confidential information from your company, I’m going to feel free to post it on the Internet if I damned well feel like it, and you can stick your disclaimer wherever you like.

We don’t have a contractual relationship, and your email was unsolicited. You can’t create one using your disclaimer; I don’t agree to your terms. Any of your terms. If I feel like sending you back an email informing you of your mistake, I might do that. Doing so does not mean I agree to your disclaimers, nor does it obligate me to send you another email informing you of your future mistakes when you do it again and again.

If we were to have a contractual relationship, I could see the value of a disclaimer, to, say, remind me of a confidentiality contract we mutually signed. But unsolicited email is precisely that; just as you can’t send me junk in the mail and obligate me to do anything with it, you can’t via email, either.


Outlook, Mail Archives, and Duplicates

Exchange and Outlook are dismal examples of code, but the fact remains that they are ubiquitous. Nobody has managed to create a mail/calendar/contacts/task application with wider adoption, and it has enough inertia that well designed applications have little chance to make inroads, which means a lot of people are stuck with it. For those of us who prefer elegant, well designed applications, putting up with their quirks is maddening.

Outlook, for example, has a hard-to-explain 2 gigabyte limit on mail archives, and mail archives are arguably one of the niftier features that Outlook offers. Early versions of Outlook don’t know any better, and simply corrupt your mail archives. Later versions of Outlook know better, and warn you not to exceed the limit. While some noise has been made about Outlook finally removing the 2 gigabyte limit, that’s not quite true: it’s only been removed for Exchange-style mailboxes, and is still there, for example, for IMAP mailboxes.

For those of us with lots of mail and the need to archive it (I receive a lot of technical documents, some very large, via email) using Outlook’s built-in “archives” isn’t really an option, so I used the simple expedient of setting up an archive IMAP server, where the size wouldn’t be an issue. While this works reasonably well going forward, Outlook puked enough while trying to move messages from its proprietary formats to IMAP that I was left with a vast number of duplicates.

On a sufficiently large mailbox, this is a bigger problem than it sounds like, especially since the duplicates were created with different message IDs, and in many cases the white space or envelopes are different, while the messages are clearly identical. Maddening, but it largely means that any automated duplicate removal will have to happen through IMAP, not through the filesystem.
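
For the curious, the kind of comparison this calls for can be sketched in a few lines of Python: ignore the message ID and envelope entirely, and fingerprint only the sender, subject, date, and whitespace-normalized body. This is a sketch under assumptions, not the method used by any of the tools mentioned here. The function names are made up, and it expects you to have already fetched (uid, raw message) pairs from the IMAP server by whatever means.

```python
import hashlib
import re
from email import message_from_bytes, policy


def fingerprint(raw: bytes) -> str:
    """Hash only what survives a re-import: sender, subject, date, body text."""
    msg = message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain", "html"))
    text = body.get_content() if body is not None else ""
    parts = [
        str(msg.get("From", "")),
        str(msg.get("Subject", "")),
        str(msg.get("Date", "")),
        text,
    ]
    # Collapse runs of whitespace so re-wrapped copies hash identically.
    normalized = "\n".join(re.sub(r"\s+", " ", p).strip() for p in parts)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(messages):
    """Given (uid, raw_bytes) pairs, return the uids of every copy after the first."""
    seen = {}
    dupes = []
    for uid, raw in messages:
        key = fingerprint(raw)
        if key in seen:
            dupes.append(uid)
        else:
            seen[key] = uid
    return dupes
```

The returned uids are only candidates; on a mailbox you care about, it would be wise to eyeball a sample before deleting anything.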

While it seems that a tool to locate and eliminate duplicate IMAP emails would be simple to find, it appears that such a beast simply does not exist, except for the trivial case in which the message IDs are identical. At the IMAP level, there are a decent number of tools here:

http://www.athensfbc.com/imap_tools/

These work admirably, for the most part. For the rest, I used this Thunderbird Add-on, which took care of the fringe cases. The only problem, of course, is that on a really large email folder, Thunderbird starts to complain endlessly about script timeouts. However, you shouldn’t really need to do this regularly.
