Hacking Thy Fearful Symmetry


Pod::Manual Starring in the Timeless Classic "It's a Wonderful PDF"

created: December 27, 2010

(Apologies for the truly terrible blog entry title. The Holiday season's TV programming is somewhat corrupting me.)

Pod::Manual was born a little bit more than three years ago (three years? Egad, time does fly...), and because of a severe lack of tuits, has kind of lingered in alpha-land ever since. But now, thanks to the Holidays and a long vacation stretch, I had the opportunity to return to the project and do terrible things to it. The code is even more alpha than it was before, and it's now in a post-hack shambles, but at least it has been moosified and (or so I hope) pushed in the right direction.

Good sir, it's been 3 years. We could use a recap.

Oh yes, right. Pod::Manual was born of my wish to print out nice manuals from the POD of big projects like Moose or Catalyst. To do that, two big steps are involved: gathering the raw material and transmuting it into something that can be printed.

Step 1: Gather the manuscripts

Perl's POD system serves us well for the one module = one manpage model, but for bigger distributions having quite a few documentation pieces, it has its weaknesses. To wit:

No order or hierarchy between documents
This is no problem for distributions that have a mere handful of modules, but a first glance at, say, Moose is more daunting. Which documentation should I read first? Which one is important for a user? For someone who wants to play with the guts of the system? Usually, the problem is solved by having a Foo::Manual or Foo::TableOfContent pod document thrown in the mix, but it'd be nice to have a way to physically gather the different documents into a big one (or several big ones aimed at different audiences).
Repetitive sections
While it makes sense to have the VERSION, AUTHOR and COPYRIGHT sections present in the documentation of every module of a distribution, they sure become great paper-wasters when they are printed out. Wouldn't it be great, for a paper version of the PODs, to just skip over sections that we don't care about (say, the VERSION that is printed on the cover page anyway)? And for sections that we *do* care about, but that don't change between PODs of the same distribution, maybe it would make sense to print them only once, in an appendix?
No table of contents
Unfortunately, paper doesn't offer full-text search. As a consequence, if a POD results in more than 40 printed pages, I usually find myself craving a table of contents, with page number references. And, likewise, I want my pages numbered, with titles and stuff in the headers and footers.
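
To make the "repetitive sections" wish a bit more concrete, here is a minimal sketch of the idea in plain Perl. It is not how Pod::Manual does it internally: the pod_sections and arrange functions, and the naive regex-based splitting on =head1, are made up for the illustration.

```perl
use strict;
use warnings;

# Split a POD string into [ title, body ] pairs, one per =head1 section.
# (Naive on purpose: a real tool would use a proper POD parser.)
sub pod_sections {
    my ($pod) = @_;
    my @sections;
    while ( $pod =~ /^=head1[ \t]+(.*?)[ \t]*\n(.*?)(?=^=head1|^=cut|\z)/msg ) {
        push @sections, [ $1, $2 ];
    }
    return @sections;
}

# Drop the ignored sections, and keep only the first copy of the
# sections that are destined for the appendix.
sub arrange {
    my (%arg) = @_;
    my %ignore   = map { $_ => 1 } @{ $arg{ignore}   || [] };
    my %appendix = map { $_ => 1 } @{ $arg{appendix} || [] };

    my ( @chapters, @appendix, %seen );
    for my $doc ( @{ $arg{pods} } ) {
        my @kept;
        for my $section ( pod_sections($doc) ) {
            my $title = $section->[0];
            next if $ignore{$title};
            if ( $appendix{$title} ) {
                push @appendix, $section unless $seen{$title}++;
                next;
            }
            push @kept, $section;
        }
        push @chapters, \@kept;
    }
    return ( \@chapters, \@appendix );
}

my @pods = (
    "=head1 NAME\n\nFoo - does foo\n\n=head1 VERSION\n\n1.0\n\n=head1 COPYRIGHT\n\nSame text\n\n=cut\n",
    "=head1 NAME\n\nFoo::Bar - does bar\n\n=head1 VERSION\n\n1.0\n\n=head1 COPYRIGHT\n\nSame text\n\n=cut\n",
);

my ( $chapters, $appendix ) = arrange(
    pods     => \@pods,
    ignore   => ['VERSION'],
    appendix => ['COPYRIGHT'],
);

for my $i ( 0 .. $#$chapters ) {
    my @titles = map { $_->[0] } @{ $chapters->[$i] };
    print "chapter ", $i + 1, ": @titles\n";
}
print "appendix: ", join( ', ', map { $_->[0] } @$appendix ), "\n";
```

Each document keeps its NAME section, the VERSION sections vanish, and a single COPYRIGHT section ends up in the appendix list.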

Building the manual

And that's where Pod::Manual fits in. Let's say that I want to build a manual for Dist::Zilla. Then I could do:

package Pod::Manual::DistZilla;

use Moose;

extends 'Pod::Manual';

use Module::Pluggable search_path => ['Dist::Zilla::Plugin'];

my $manual = __PACKAGE__->master;

$manual->title('Dist::Zilla');

$manual->ignore( ['VERSION'] );

$manual->move_one_to_appendix( ['COPYRIGHT AND LICENSE'] );

$manual->add_module( [ qw/
    Dist::Zilla
    Dist::Zilla::Tutorial 
/, ]);

$manual->ignore( [] );

$manual->move_one_to_appendix( [] );

$manual->add_module( [ $manual->plugins ] );

$manual;

Let's go over that again, in a more detailed fashion.

What I did was to first create the class Pod::Manual::DistZilla that inherits from Pod::Manual. (Mind you, creating a subclass is not strictly necessary, but it helps with re-use and, as we'll see in a few lines, allows for a nifty trick with Module::Pluggable.)

package Pod::Manual::DistZilla;

use Moose;

extends 'Pod::Manual';

use Module::Pluggable search_path => ['Dist::Zilla::Plugin'];

Then, because we're using a class, I'm grabbing the singleton instance for it.

my $manual = __PACKAGE__->master;

I assign the title of the manual.

$manual->title('Dist::Zilla');

I don't want to see the VERSION sections.

$manual->ignore( ['VERSION'] );

And I want to have one instance of the COPYRIGHT AND LICENSE section punted to the appendix (with the rest being ignored).

$manual->move_one_to_appendix( ['COPYRIGHT AND LICENSE'] );

That done, I can now add the main Dist::Zilla modules I want to see in the manual.

$manual->add_module( [ qw/
    Dist::Zilla
    Dist::Zilla::Tutorial 
/ ] );

Because I can't resist being a clever monkey, I used Module::Pluggable to throw in all the Dist::Zilla::Plugin::* modules that I have installed on my machine. But since those modules might be coming from distributions other than the core Dist::Zilla, I don't want to ignore the VERSION or COPYRIGHT sections for them.

$manual->ignore( [] );
$manual->move_one_to_appendix( [] );

$manual->add_module( [ $manual->plugins ] );

Finally, we drop a $manual; as a last piece of cleverness. As $manual evaluates to true, our code can be used as a normal module:

use Pod::Manual::DistZilla;

my $manual = Pod::Manual::DistZilla->master;

print $manual->as_docbook;

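or used with do if we don't want to install the module or play with lib. (The explicit ./ matters on Perl 5.26 and later, where the current directory is no longer in @INC and do won't find a bare relative path.)

my $manual = do './path/to/DistZilla.pm';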

print $manual->as_docbook;

(That's going to be useful later on to make the command-line interface as supple as possible.)
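
For the curious, the do trick can be tried on its own, without Pod::Manual installed. In the minimal sketch below, My::Manual is a hypothetical stand-in class, and the definition file is written to a temporary location just so the example is self-contained.

```perl
use strict;
use warnings;
use File::Temp qw/ tempfile /;

# Write a tiny "manual definition" file whose last statement is the object.
# My::Manual is a hypothetical stand-in, not Pod::Manual.
my ( $fh, $path ) = tempfile( SUFFIX => '.pl', UNLINK => 1 );
print {$fh} <<'END_DEFINITION';
package My::Manual;

sub new   { my ( $class, %arg ) = @_; return bless {%arg}, $class }
sub title { return $_[0]{title} }

# last statement: its value becomes the return value of 'do'
My::Manual->new( title => 'Dist::Zilla' );
END_DEFINITION
close $fh or die "can't close $path: $!";

# 'do' runs the file and hands back the value of its last statement
my $manual = do $path
    or die "couldn't load $path: ", ( $@ || $! ), "\n";

print ref($manual), ' - ', $manual->title, "\n";
```

The same file could also be loaded with use or require if it lived in @INC, since an object is a true value and therefore satisfies the "module must return true" rule.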

Step 2: Warm up the printing press

At the core: DocBook

Now we have our manual object, but we still have to output it in some useful format. Pod::Manual uses DocBook as its base format, and we can get to it by doing:

print $manual->as_docbook;

Or, if we have a CSS file that we want to associate with the resulting DocBook:

print $manual->as_docbook( css => '/path/to/file.css' );

Beyond DocBook

DocBook is nice as a starting point, but let's not forget that our end-goal is to have something printable, like a pdf file.

The tricky part with that, though, is that not only are the roads leading to a pdf file numerous, but they also usually rely on third-party software (XSLT stylesheets and transformation engines, LaTeX, etc.) that might or might not be present on any given machine. Trying to implement a single transformation method would probably doom Pod::Manual to work only on my own system. So I decided to take the plugin approach instead. Manual formatters are roles that can be slapped on Pod::Manual classes, and should provide an as_format and/or a save_as_format method (where "format" is the name of the output format, as in save_as_pdf). That way, users who want to generate the manual can pick the roles they need to get there. To be nice, we can even provide a little command-line utility script that does that for us:

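#!/usr/bin/env perl

use strict;
use warnings;

use File::Spec;
use Getopt::Long;

GetOptions(
    'formatter=s' => \my $formatters,
    'as=s'        => \my $format,
    'output=s'    => \my $output_file,
) or die "invalid options\n";

my $source = shift
    or die "usage: $0 --formatter=F1,F2 --as=FORMAT --output=FILE SOURCE\n";

die "--as and --output are required\n"
    unless defined $format and defined $output_file;

my $manual;

if ( -e $source ) {
    # a file: its last statement returns the manual object.
    # An absolute path keeps 'do' from searching @INC for it
    my $path = File::Spec->rel2abs($source);
    $manual = do $path
        or die "couldn't load '$source': ", ( $@ || $! ), "\n";
}
else {
    # otherwise, assume it's the name of an installed module
    eval "use $source; 1" or die $@;

    $manual = $source->master;
}

for my $f ( split ',', $formatters // '' ) {
    my $fclass = "Pod::Manual::Formatter::$f";
    eval "use $fclass; 1" or die $@;

    # apply the formatter role to the manual instance
    $fclass->meta->apply( $manual );
}

my $method = 'save_as_' . $format;

print "creating $output_file...\n";

$manual->$method( filename => $output_file );

print "done\n";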
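
The heart of that script is the application of a role to an already-built object. Here is a minimal sketch of that mechanism in isolation. My::Manual and My::Formatter::Text are hypothetical toy classes invented for the example (they are not part of Pod::Manual), and Moose needs to be installed.

```perl
use strict;
use warnings;

# Toy stand-in for a manual class (hypothetical, not Pod::Manual itself)
package My::Manual {
    use Moose;

    has title => ( is => 'rw', default => 'untitled' );

    sub as_docbook {
        my $self = shift;
        return '<book><title>' . $self->title . '</title></book>';
    }
}

# Hypothetical formatter role: builds on as_docbook, provides as_text
package My::Formatter::Text {
    use Moose::Role;

    requires 'as_docbook';

    sub as_text {
        my $self = shift;
        ( my $text = $self->as_docbook ) =~ s/<[^>]+>//g;    # strip the tags
        return $text;
    }
}

package main;

my $manual = My::Manual->new( title => 'Dist::Zilla' );

# Same call as in the script: the role is applied to this one object only
My::Formatter::Text->meta->apply($manual);

print $manual->as_text, "\n";
```

Because the role is applied to the instance and not to the class, other My::Manual objects are left untouched, which is what lets each run of the script pick its own set of formatters.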

A first PDF formatter using Prince

For a first way to get to the golden pdf format, I went the easy way and used the Prince XML to PDF translator. While Prince is not free, they do offer a free version of it for non-commercial uses, which adds a little icon on the first page -- something I can quite live with. Its main appeal -- besides the gorgeous output it produces -- is the direct DocBook to pdf translation. Only a CSS stylesheet is required and the little translation engine does all the magic.

So what I did was to encapsulate the work in Pod::Manual::Formatter::PDFPrince:

package Pod::Manual::Formatter::PDFPrince;

use Moose::Role;

use Carp;
use File::ShareDir qw/ dist_file /;

sub save_as_pdf {
    my ( $self, %arg ) = @_;

    my $docbook = $self->as_docbook( 
        css => dist_file( 'Pod-Manual', 'prince.css' )
    );

    open my $db_fh, '>', 'manual.docbook'
        or croak "can't open file 'manual.docbook' for writing: $!";

    print $db_fh $docbook;
    close $db_fh;

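    # system returns 0 on success; anything else means prince failed or is missing
    system( 'prince', 'manual.docbook', '-o', $arg{filename} ) == 0
        or croak "prince failed (status $?)";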
}

1;

And now, provided that prince is installed on our machine, we can generate our first pdf with the script described above:

$ podmanual --formatter=PDFPrince \
            --as=pdf              \
            --output=dist-zilla-prince.pdf examples/dist-zilla.pl 
creating dist-zilla-prince.pdf...
done

The resulting pdf is here. The format still has to be tweaked to be truly pretty, but... we have a table of contents! We have pages that look like pages from a real book! Woohoo!

PDF the good ol' Knuth fashioned way

For those who prefer good old LaTeX processing, I'm also working on upgrading the powerful, but slightly eldritch Pod::Manual::Docbook2LaTeX into XML::XSS::Stylesheet::Docbook2LaTeX. That's a topic for another blog entry, but once I'm done, we'll be able to get LaTeX output by doing

$ podmanual --formatter=LaTeX --as=latex \
    --output=dist-zilla.latex examples/dist-zilla.pl 

and use it to generate the pdf via

$ podmanual --formatter=LaTeX,PDFLaTeX --as=pdf \
    --output=dist-zilla.pdf examples/dist-zilla.pl 

What Lies Ahead

A heck of a lot of work lies ahead. The hardest part is juggling the different formats. In some cases (cough LaTeX cough), I have to get reacquainted with them, as it's been a very long while since I last used them. And then there's fiddling with the code so that it doesn't look too much like the logic wasteland it currently is. Oh yes, and there's that documentation thing I should also do at some point.

Buuut I wanted to share the work of the last few days, just to let the people that have been looking around for a fresher copy of the Moose and Catalyst manuals I'd generated the first time around know that there's reason to hope that a new version of those should be available in a not-so-distant future. :-)
