From: Tom Lane <[email protected]>
To: [email protected]
Subject: Re: Switching to XML
Date: Sat, 09 Dec 2006 13:32:41 -0500
Message-ID: <[email protected]>
In-Reply-To: <1165655250.2621.10.camel@josh>
References: <[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<1165655250.2621.10.camel@josh>

>> The French team also uses Docbook XML and they can generate a PDF in 30
>> minutes... it takes us DAYS because of the SGML.

Has anyone looked into actually fixing the performance problem?

oprofile results for jade trying to produce tex output from our docs are
suggestive of a localized performance issue:

samples  %        symbol name
2082917  98.5829  OpenJade_DSSSL::PairNodeListObj::nodeListFirst(OpenJade_DSSSL::EvalContext&, OpenJade_DSSSL::Interpreter&)
9713      0.4597  OpenJade_DSSSL::PairNodeListObj::nodeListRest(OpenJade_DSSSL::EvalContext&, OpenJade_DSSSL::Interpreter&)
5019      0.2375  OpenJade_DSSSL::AppendSosofoObj::traceSubObjects(Collector&) const
3571      0.1690  Collector::collect()
1938      0.0917  OpenJade_DSSSL::FlowObj::traceSubObjects(Collector&) const

I attached to the process with gdb and found it nested four thousand (!)
call levels deep in OpenJade_DSSSL::PairNodeListObj::nodeListFirst and
OpenJade_DSSSL::PairNodeListObj::nodeListRest calls.  Meanwhile, looking
at the output-so-far-emitted makes me think it was working on a fairly
large <programlisting> example.  The last little bit is:

{asis}\def\InputWhitespaceTreatment%
{preserve}}\Seq%
{}\Seq%
{}~~~~\endSeq{}/*
\Seq%
{}~~~~\endSeq{}~*~testlibpq2.c
\Seq%
{}~~~~\endSeq{}~*~~~~~~Test~of~the~asynchronous~notification~interface
\Seq%
{}~~~~\endSeq{}~*
\Seq%
{}~~~~\endSeq{}~*~Start~this~program,~then~from~psql~in~another~window~do
\Seq%
{}~~~~\endSeq{}~*~~~NOTIFY~TBL2;
\Seq%
{}~~~~\endSeq{}~*~Repeat~four~times~to~get~this~program~to~exit.
\Seq%
{}~~~~\endSeq{}~*
\Seq%
{}~~~~\endSeq{}~*~Or,~if~you~want~to~get~fancy,~try~this:
\Seq%
{}~~~~\endSeq{}~*~populate~a~database~with~th

What it looks like to me is that there is some bit of stupidity that is
producing a deeply nested list representation of a <programlisting>
section, probably one list level per character in the text, making the
runtime O(N^2) or worse in the length of the <programlisting>.  (The
particular example it's stuck on here is about 10K characters.)
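
To illustrate the kind of blowup suspected here, below is a toy model, not
OpenJade's actual code. The class and function names (Empty, Single, Pair,
build_per_char, traverse) are hypothetical, and the assumption is that the
list is built by wrapping one more pair level around it for each character.
Under that assumption every "first" and "rest" call has to walk down the
whole nesting, so traversing N characters costs on the order of N^2 calls:

```python
class Stats:
    """Counts first()/rest() calls so the total cost can be compared."""
    def __init__(self):
        self.calls = 0


class Empty:
    def first(self, stats):
        stats.calls += 1
        return None

    def rest(self, stats):
        stats.calls += 1
        return self


class Single:
    """A node list holding exactly one item."""
    def __init__(self, item):
        self.item = item

    def first(self, stats):
        stats.calls += 1
        return self.item

    def rest(self, stats):
        stats.calls += 1
        return Empty()


class Pair:
    """Concatenation of two node lists (hypothetical stand-in for a pair list)."""
    def __init__(self, head, tail):
        self.head = head
        self.tail = tail

    def first(self, stats):
        stats.calls += 1
        item = self.head.first(stats)      # descends one nesting level
        return item if item is not None else self.tail.first(stats)

    def rest(self, stats):
        stats.calls += 1
        if isinstance(self.head, Empty):
            return self.tail.rest(stats)
        return Pair(self.head.rest(stats), self.tail)  # nesting is preserved


def build_per_char(text):
    """Assumed bad pattern: one extra Pair level per character."""
    lst = Empty()
    for ch in text:
        lst = Single(ch) if isinstance(lst, Empty) else Pair(lst, Single(ch))
    return lst


def traverse(lst):
    """Walk the list with first/rest; return the text and the call count."""
    stats = Stats()
    out = []
    while True:
        item = lst.first(stats)
        if item is None:
            break
        out.append(item)
        lst = lst.rest(stats)
    return "".join(out), stats.calls


if __name__ == "__main__":
    # Lengths are kept small so the recursion stays within Python's default limit.
    for n in (100, 200, 400):
        _, calls = traverse(build_per_char("x" * n))
        print(n, calls)
```

In this sketch, doubling the input length roughly quadruples the call count,
and the recursion depth of first() equals the number of characters, which is
consistent with a stack thousands of levels deep on a 10K-character listing.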

Since jade does not go into this kind of spiral when producing html
output from the same sources, I suggest that it's not jade's fault,
but rather crummy coding in the sgml-to-tex conversion scripts it's
using.  I don't know enough about those to know where to look, but maybe
someone here does?

			regards, tom lane