Making a ps2pdf print queue

Updated 2/05/2003 (modified filter for compatibility with RH 7.2)
Updated 1/19/2001 (modified filter for compatibility with NSF requirements)

Introduction
Since the NSF now requires that documents be submitted in pdf form, we've all been looking at various ways of producing pdf documents. In the following pages, I'll describe one of the options that may have been overlooked: creating a network print queue that produces pdf files. This approach was inspired by Adobe's pdfwriter, and will work for Windows computers and (with a little caution) with Macs.

This method won't be a good option for everyone, since it requires some work to set up, and assumes that you have a handy Unix computer available. But if you already have a Unix computer in your department, and you like to tinker, you might want to consider it.

How does it work?
Here's what your users see: The Windows user clicks on "print" in a Windows application (any application at all), and sees a new printer called 'ps2pdf' (or whatever you choose to call it). He/she chooses this printer and clicks "OK". A few minutes later, the user receives an e-mail message. This message will contain either a PDF attachment (if the file is small) or a URL pointing to a PDF file in a holding area (for larger files).

So what happened? When the user printed the job, it went to a network print queue on a Unix computer. This print queue invoked a filter which converted the input into a PDF file, then mailed the PDF (or a link) back to the user. The conversion was done by Ghostscript, through the 'ps2pdf' script that comes as part of the Ghostscript distribution. The filter determined the user's userid from the information sent over with the print job, and constructed a mailing address of the form "userid@mailhost".
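
The two core steps of the filter described above (running ps2pdf on the incoming job and building the "userid@mailhost" address) can be sketched in a few lines. This is only an illustration written in Python, not the filter itself: the function names are made up for this example, and it assumes that ps2pdf is on the PATH and that the userid has already been extracted from the print job's control information.

```python
import subprocess


def build_address(userid, mailhost):
    """Form the recipient address "userid@mailhost" from the job's userid."""
    userid = userid.strip()
    # Refuse anything that could not be the local part of a simple address.
    if not userid or "@" in userid or any(c.isspace() for c in userid):
        raise ValueError("unusable userid: %r" % userid)
    return "%s@%s" % (userid, mailhost)


def ps2pdf_command(pdf_path):
    """Command line that reads PostScript on stdin ("-") and writes pdf_path."""
    return ["ps2pdf", "-", pdf_path]


def convert(postscript_bytes, pdf_path):
    """Run Ghostscript's ps2pdf script on the job data.

    Raises subprocess.CalledProcessError if the conversion fails.
    """
    subprocess.run(ps2pdf_command(pdf_path), input=postscript_bytes, check=True)
```

For example, build_address("mst3k", "virginia.edu") gives "mst3k@virginia.edu". A real filter would also have to handle failed conversions, for instance by mailing an error report instead of a PDF.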

The Windows computer can either send the job to the Unix computer via LPD, or through a Microsoft print queue. To use LPD, you'll need to install either ACITS (for Win95 computers) or the "tcp/ip printing" service that comes with NT. If you use Microsoft printing, you won't need to install anything on the Windows computers, but you'll need to install Samba on the Unix computer.

Why do it this way?
Here are the strong points of this method:

The weak points are:

For comparison, here are some of the other options for converting documents to pdf under Windows:

Method: Adobe Acrobat installed locally
Comments: Acrobat costs money. Even if you can get the cost down to $40-$50 per seat, it adds up quickly. You may not find it cost-effective to spend that much money on a package like Acrobat that does much more than is actually required (converting documents to pdf) and will seldom be used. Also, Acrobat can't be centrally managed. When a new version comes out, it must be installed on each desktop individually.

Method: Adobe Acrobat installed on secretarial computers
Comments: The above arguments still apply, although we've reduced the number of desktops, but now we're asking our faculty members (who typically are an independent-minded lot) to take their documents to a secretary for conversion. Experience has shown that many of the faculty feel more comfortable having the tools on their own computer, making them independent of others.

Method: Ghostscript installed locally
Comments: Ghostscript is available for Windows, and it's free. The Windows version even comes with a "ps2pdf" batch file to convert postscript documents to pdf. But there are training issues. In order to use the ps2pdf batch file, the user must first save a postscript version of his/her document. This can be done by printing to a postscript file (assuming that a postscript printer driver is installed), but it requires several steps.

From this, you can see that the network print queue approach could have some appeal.

Requirements for Creating a ps2pdf Queue
Minimal requirements are:

In addition, it's useful to have some of the following:

How do I configure the Unix computer?
There are a lot of answers to this. I'll consider several special cases below. First of all, though, there are a few basic things you'll always need: a new printcap entry, a new print filter, and a cron job to do some cleanup. Here are the ones I use.
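
To give an idea of what the cleanup piece involves, a job of this kind could look roughly like the following Python sketch. It is an illustration only, not the actual cron job: the function name, the 7-day limit, and the idea of judging age by modification time are all assumptions.

```python
import os
import time


def remove_old_pdfs(directory, max_age_days=7, now=None):
    """Delete *.pdf files in directory older than max_age_days.

    Returns the names of the files that were removed, in sorted order.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    removed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Only touch regular files that look like PDFs; leave the rest alone.
        if not name.lower().endswith(".pdf") or not os.path.isfile(path):
            continue
        if os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(name)
    return removed
```

Run once a day from cron against the holding area, something like this keeps uncollected files from piling up.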

How do I configure the rest of the system?
Here are a few examples showing various ways you can set up the rest of the system, depending on what hardware and software you have available.

  1. The simplest case
    Here's the simplest scenario I've come up with. To make it work, you'll need to create the new print queue on the "print" server and install the appropriate tcp/ip printing system and printer drivers on each client machine. Any printer driver that generates plain ol' postscript will probably do, but I use the Apple LaserWriter driver. (I miss the good old days when the list of drivers included "Generic Postscript".) Note that you can get around the need to install the printer driver on each machine by using Samba or talking to the print server through an NT server, and having the client machine grab the driver automatically when the local print queue is installed. I'll talk more about this later.

    In this configuration, jobs are submitted to the print server via LPD, then converted to PDF. Once the file is created, the pdfmail filter checks the size of the file and decides whether to mail it or drop it off for later pickup. The pickup area can either be a local directory on the print server, in which case the print server needs to be running an http or ftp server to allow the user to pick up the file, or it can be a directory on an existing http or ftp server, in which case the file will need to be transferred to the other server by NFS, scp, or some other method.

    Remember that users need to have accounts on the mail server under the same user names they use when logging into their workstations. This matters because the userid sent over with the print job is used to address the outgoing mail message. If the userid doesn't exist on the mail server, the mail will bounce back to the print server's administrator. Even worse, if the userid exists on the mail server but is assigned to someone else, the wrong user will receive the mail. Consider, for example, the case of a user named Mike, whose official userid is mst3k but who likes to log into his Windows 95 computer under the name "mike". When he prints a job to the ps2pdf print queue, the resulting mail message will be sent off to "mike" on our chosen mail server (say "virginia.edu"). But the user with the e-mail alias mike@virginia.edu may be someone completely different from mst3k.

    We could improve the script by requiring that the userid exist in the UVA whois (or ldap) database. This still wouldn't preclude name collisions in cases where an official userid looks like a proper name, though. For example, we have a user named Blaine E. Norum, whose userid is 'ben'. If Ben Johnson logged in as 'ben', and submitted a print job to the ps2pdf queue, the filter would find that 'ben' is a valid UVA userid, and the resulting pdf would be mailed to Blaine Norum.

    Checking against the whois database would also disallow users who don't have official UVA userids, but who have local mail accounts. For example, in Physics we create userids of the form 'phy_xx', 'rec_xx' and 'reu_xx' for visiting researchers, RECET students (grade school teachers who take classes during the summer) and REU students (undergraduates who participate in research during the summer), respectively. These people can send or receive mail through our departmental mail server, but they don't appear in the whois database.

  2. How we do it in Physics
    As another example, here's how we do things in the Physics Department. The most obvious difference from the previous example is the addition of an NT server to the mix. We do it this way, rather than printing directly to the print server, for several reasons. First of all, there's inertia. Our workstations were configured to print through the NT server long ago because, at the time, there was no such thing as ACITS, and Samba servers couldn't authenticate against an NT domain. Printing through Samba would have required a separate login which, to top it all, would have sent the password in plain text. That's no longer the case. Our current print server can authenticate against our NT domain with no problem.

    Even so, there's still a good reason to keep an NT server in the mix. The print queues on our NT server are configured to supply printer drivers for Win95 and WinNT workstations. (For information about how to do this, look at this Knowledge Base article. It's not straightforward, and the instructions in the article aren't exactly correct, but you can get it to work with a little trial and error.) This makes it much easier for users to connect to network print queues. The current version of Samba can supply printer drivers for Win95, but not for NT. This is planned for the next Samba release, which should be out in a few months. At that time, we'll probably start configuring machines to print directly to the print server.

    Note that we don't have a separate web or ftp server for large PDF files. The print server itself is running Apache, and we simply have the pdfmail script move the PDF file into a directory that's available through http.
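
The mail-or-drop-off decision that the pdfmail filter makes in both examples above can be sketched as follows. This is a hedged illustration in Python, not the pdfmail script: the 1,000,000-byte cutoff, the subject line, the message wording and the function names are assumptions, and the pickup directory is assumed to be one that a web server already publishes under base_url.

```python
import os
import shutil
from email.message import EmailMessage

SIZE_LIMIT = 1000000  # bytes; assumed cutoff between "mail it" and "link to it"


def choose_delivery(size_bytes, limit=SIZE_LIMIT):
    """Return "attach" for small files and "link" for large ones."""
    return "attach" if size_bytes <= limit else "link"


def build_message(recipient, sender, pdf_path, pickup_dir, base_url,
                  limit=SIZE_LIMIT):
    """Build the mail for one finished job.

    Small PDFs are attached. Large PDFs are moved into pickup_dir and the
    message carries a URL instead.
    """
    msg = EmailMessage()
    msg["To"] = recipient
    msg["From"] = sender
    msg["Subject"] = "Your PDF file"
    name = os.path.basename(pdf_path)
    if choose_delivery(os.path.getsize(pdf_path), limit) == "attach":
        msg.set_content("Your PDF file is attached.\n")
        with open(pdf_path, "rb") as f:
            msg.add_attachment(f.read(), maintype="application",
                               subtype="pdf", filename=name)
    else:
        # Drop the file off where the web server can see it.
        shutil.move(pdf_path, os.path.join(pickup_dir, name))
        url = "%s/%s" % (base_url.rstrip("/"), name)
        msg.set_content("Your PDF file is too large to mail.\n"
                        "You can pick it up at:\n%s\n" % url)
    return msg
```

The returned message could then be handed to the local mailer, for example with smtplib. If the pickup area lives on a different server, the shutil.move step would be replaced by whatever transfer method (NFS, scp, and so on) connects the two machines.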

Will it work for Macs?
Yes, you can use this with Macs if you either (a) install Netatalk+asun on the print server or (b) install "printing services for macintosh" on an intermediate NT server (see the Physics Department example, above). You'll have to be extra careful about user names, though. When the Mac sends the print job, it sends the "owner" of the Mac as the associated userid. If you wanted to use this with Macs, you'd have to make sure that each Mac using the service was configured to have a valid mail id listed as the Mac's owner. I haven't tried it here, but we do have Macs sending regular print jobs to the print server with no problems. Our print accounting logs show the Mac's owner as the userid of the person who printed the job.
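
Whether the userid comes from a Windows login or from a Mac's owner field, the filter can only trust it as far as it can check it. The whois-style check discussed earlier, extended to allow the departmental 'phy_', 'rec_' and 'reu_' accounts, might be sketched like this in Python. It is an assumption-laden illustration: the set of official userids stands in for a real whois or ldap lookup, the function name is invented, and userids are assumed to be lowercase.

```python
# Prefixes of the local (non-official) accounts mentioned in the examples.
LOCAL_PREFIXES = ("phy_", "rec_", "reu_")


def is_deliverable(userid, official_userids, local_prefixes=LOCAL_PREFIXES):
    """Return True if mail for this userid should be attempted.

    official_userids is a set standing in for a whois/ldap query.
    """
    userid = userid.strip().lower()
    if not userid:
        return False
    # str.startswith accepts a tuple and matches any of its members.
    return userid in official_userids or userid.startswith(local_prefixes)
```

A check like this would catch a Mac whose owner is set to something that isn't a mail id at all, but it does nothing about collisions: a job submitted under the name 'ben' still passes, and still goes to whoever officially owns that userid.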

What other stupid LPD tricks do you know?
LPD is an underappreciated protocol. Every user knows how to print a file, and LPD print queues can be made to do almost anything with what they're sent. Here are a few examples, submitted for your consideration: