Fun in POD-land
Fun in POD-land
I love to tinker with tools, utilities, tweaks, anything that can be used to make grease the production chain into stupendously slick efficiency. So it stands to reason that I’d be drawn to documentation, its format, its processors, the tools to display and search it. Heck, I’ve even been known to actually read it, now and then.
Documentation, in the Perl sphere, means POD. It’s a fairly simple markup format, but with just enough twists to make things… interesting.
As POD is as old as Perl, there is plenty of modules out there to parse it, and to generate output in pretty much all the usual formats. There are none, however, that does exactly what I want.
What do I want? I want simplicity. I want extensibility. I want trivially simple manipulations. And I think I want a peppermint tea. Don’t move, I’ll be back in 5 minutes.
Aaah, that’s better. Where was I? Oh yes, wants. To be more pragmatic, there are two use cases I’m pursuing.
The first one is the transformation of POD documents into any other format. I did a first foray into that when I played with PDF documents and Pod::Manual. That project is still on the backburner, and these days I’m also eyeing exporting Perl distribution documentation as Dash/Zeal docsets.
The second one is POD extension/manipulation in the context of distribution building. “But there are Pod::Elemental and Pod::Weaver for that!”, I hear you say. And you are right. But I have a confession to make:
Pod::Elemental
and Pod::Weaver
scare the everlasting bejeesus outta me.
Although fear can keeps me away only for so long. Underneath the initial
gasp, I think that what disatisfy me is that each POD solution that I found
on CPAN that gives me DOM manipulating powers is a special snowflake of
a parser. Meaning that I have to learn about its DOM structure, its
node descending rules, all that jazz. Meh. It’s all hard work. Why couldn’t it
be simpler? Like, couldn’t it be just like a jQuery-enabled page, where
getting a section could be as simple as $('head1.synopsis')
and moving it elsewhere be one $some_section.insert_after($some_other_section)
?
If you found yourself nodding at that last paragraph, keep reading. If, on the other hand, you felt the cold finger of dread run down your spine, you might want to go and get your comfort blanket before soldiering on…
It’s all too complicated. Let’s use XML!
To be honest, that’s not something I was expecting to say of my own volition. But, really, this is the type of stuff XML was born to do. Considering how well-understood XML/HTML is, it kinda makes sense to use it as the core format. And with that type of document, we don’t have to invent a way to move around in the DOM tree — there are already XPath and CSS selectors that are there for that. And lookee, lookee, there’s Web::Query that would give us access to CSS and XPath selectors (for XPath selectors, just wait a few days for its next release to be out), and nifty jQuery-like DOM-manipulating methods.
In a nutshell, that’s what I wanted to try: create a prototype of a pipeline that would slurp in some POD, allow to easily muck with it, and spit it out as whatever is desired.
The prototype, Pod::Knit, exists. It’s still extremely alpha, but it’s already at a point where an owner tour might be in order. Here, follow me…
POD comes in…
First thing on the agenda: read some POD and convert it to some XML document. Now, writing this part myself, considering how many POD parsers are out there, would be silly. Well, sillier than the rest of my plan, that is. So I went shopping on CPAN.
At first, I found Pod::POM. Its HTML output takes some liberties with the formatting attributes, so I wrote a quick generic XML view module.
… And only then realized the parser isn’t easily extended to accept new POD elements. Dammit.
So I switched to Pod::Simple and Pod::Simple::DumpAsXML, making the POD-to-XML journey look like:
use Pod::Simple::DumpAsXML;
my $parser = Pod::Simple::DumpAsXML->new;
$parser->output_string( my $xml );
$parser->parse_string_document( $source_code );
print $xml;
So far, so good.
Close the doors, we’re altering the patient
Now the fun part comes: modifying the document.
As I want extensibility and modularity, I went for a plugin approach, where
the POD document (presented as a thinly wrapped Web::Query
object)
would be passed through all the plugins in different stages.
And to make it real, I crafted a set of plugins that would exercise the basic manipulations I’d expect the framework to support:
---
plugins:
# create the '=head1 NAME' section from the package/#ABSTRACT lines
- Abstract
# add an AUTHORS section
- Authors:
authors:
- Yanick Champoux
# grok '=method' elements
- Methods
# sort the POD elements in the given order
- Sort:
order:
- NAME
- SYNOPSIS
- DESCRIPTION
- METHODS
- '*'
- AUTHORS
Let’s now see the different processing stages, and how those plugins implement them.
Stage 1: POD parser configuration
First, when the Pod::Simple
parser is created, each plugin
is given the chance to tweak it. For the moment, this is
mostly to give them the opportunity to declare new POD elements.
For example, the ‘Methods’ plugin has
package Pod::Knit::Plugin::Methods;
use Moose;
with 'Pod::Knit::Plugin';
sub setup_parser {
my( $self, $parser ) = @_;
$parser->accept_directive_as_processed( 'method' );
}
Stage 2: Preprocessing, aka putting those Russian Dolls together
The second stage is the “preprocessing” stage, where plugins take the
raw output of Pod::Simple::DumpAsXML
and groom it into
the desired base structure. In most of the cases, that will be
turning the raw flat list of elements given by Pod::Simple
into a structure form.
For example, the raw head elements look like
<head1>DESCRIPTION</head1>
<para>Blah blah<para>
<verbatimformatted>$foo->bar</verbatimformatted>
<para>More blah</para>
<head1>OTHER SECTION</head1>
...
but what we want is
<head1>
<title>DESCRIPTION</title>
<para>Blah blah<para>
<verbatimformatted>$foo->bar</verbatimformatted>
<para>More blah</para>
</head1>
...
There’s an implicit plugin, HeadsToSections
, that take care of that.
And in our example, the plugin ‘Methods’ does the same thing for
=method
elements, slurping in the relevant following elements:
sub preprocess {
my( $self, $doc ) = @_;
$doc->find( 'method' )->each(sub{
$_->html(
'<title>'. $_->html . '</title>'
);
my $done = 0;
my $method = $_;
$_->find( './following::*' )->each(sub{
return if $done;
my $tagname = $_->tagname;
return if $done = !grep { $tagname eq $_ }
qw/ para verbatimformatted /;
$_->detach;
$method->append($_);
});
});
}
Stage 3: Do your thing
Finally, the stage where we can expect the document to be in the proper format, and where the plugins can go wild.
Things can be inserted. Based just on configuration items:
package Pod::Knit::Plugin::Authors;
use Moose;
use Web::Query;
with 'Pod::Knit::Plugin';
has "authors" => (
isa => 'ArrayRef',
is => 'ro',
lazy => 1,
default => sub {
my $self = shift;
[];
},
);
sub transform {
my( $self, $doc ) = @_;
my $section = wq( '<over-text>' );
for ( @{ $self->authors } ) {
$section->append(
'<item-text>' . $_ . '</item-text>'
);
}
# section() will return the existing
# section with that title, or create
# a new one if it doesn't exist yet
$doc->section( 'authors' )->append(
$section
);
}
Or by looking at the source code or whatever
Pod::Knit
makes accessible to the plugins.
package Pod::Knit::Plugin::Abstract;
use Moose;
use Web::Query;
with 'Pod::Knit::Plugin';
sub transform {
my( $self, $doc ) = @_;
my( $package, $abstract ) =
$self->source_code =~ /^s*packages+(S+);s*^s*#s*ABSTRACT:s*(.*?)s*$/m
or return;
$doc->section( 'name' )->append(
join '',
'<para>',
join( ' - ', $package, $abstract ),
'</para>'
);
}
Things can also be modified.
package Pod::Knit::Plugin::Methods;
...
sub transform {
my( $self, $doc ) = @_;
my $section = $doc->section( 'methods' );
$doc->find( 'method' )->each(sub{
$_->detach;
$_->tagname( 'head2' );
$section->append($_);
});
}
Or reordered.
package Pod::Knit::Plugin::Sort;
use Moose;
with 'Pod::Knit::Plugin';
has "order" => (
isa => 'ArrayRef',
is => 'ro',
lazy => 1,
default => sub {
[]
},
);
sub transform {
my( $self, $doc ) = @_;
my $i = 1;
my %rank = map { uc($_) => $i++ } @{ $self->order };
$rank{'*'} ||= $i; # not given? all last
my %sections;
$doc->find('head1')->each(sub{
$_->detach;
my $title = uc $_->find('title')->first->text =~ s/^s+|s+$//gr;
$sections{$title} = $_;
});
for my $s ( sort { ($rank{$a}||$rank{'*'}) <=> ($rank{$b}||$rank{'*'}) } keys %sections ) {
$doc->append( $sections{$s} );
}
}
1;
Basically, anything goes.
Delicious sausages come out
With the transformed document being XML with a specific schema, we’re now free to use whatever transformation engine we want. To make the prototype go full circle, I had to re-translate that XML into POD. And to do that, I resorted to good ol’ insane XML::XSS:
package Pod::Knit::Output::POD;
use Moose::Role;
use XML::XSS;
sub as_pod {
my $self = shift;
my $xss = XML::XSS->new;
$xss->set( 'document' => {
pre => "=pod\n\n",
post => "=cut\n\n",
});
$xss->set( "head$_" => {
pre => "=head$_ ",
}) for 1..4;
$xss->set( 'title' => {
pre => '',
post => "\n\n",
});
$xss->set( 'verbatimformatted' => {
pre => '',
content => sub {
my( $self, $node ) = @_;
my $output = $self->render( $node->childNodes );
$output =~ s/^/ /mgr;
},
post => "\n\n",
});
$xss->set( 'item-text' => {
pre => "=item ",
post => "\n\n",
});
$xss->set( 'over-text' => {
pre => "=over\n\n",
});
$xss->set( '#text' => {
filter => sub {
s/^s+|s+$//mgr;
}
} );
$xss->set( 'para' => {
content => sub {
my( $self, $node ) = @_;
my $output = $self->render( $node->childNodes );
$output =~ s/^s+|s+$//g;
return $output . "\n\n";
},
} );
$xss->render( $self->as_xml );
}
1;
Mind you, it’s not nowhere near complete. But it’s enough to make Pod::Knit
take
package Foo::Bar;
# ABSTRACT: Do things
=head1 SYNOPSIS
blah blah
blah
=method one
Do things
$self->one
and some stuff
=method two
Do other things.
Not used often.
=head1 DESCRIPTION
Blah
=head2 subtitle
More blah
=over
=item foo
something
=item bar
=back
and end up with
=pod
=head1 name
Foo::Bar - Do things
=head1 SYNOPSIS
blah blah
blah
=head1 DESCRIPTION
Blah
=head2 subtitle
More blah
=over
=item foo
something
=item bar
=head1 methods
=head2 one
Do things
$self->one
and some stuff
=head2 two
Do other things.
Not used often.
=head1 authors
=over
=item Yanick Champoux
=cut