2012-03-28

Update: I'd like to revive (and revise) this question a bit because there have been some recent developments, and furthermore I would be happy to encourage some up-to-date discussion.

I'm thinking (from a professional point of view) about fully automatic generation of newspapers from data.

More precisely, the system under consideration would take as input an 'attributed' data stream of articles (subject classification, headers, author info, etc., text, images) plus some hints on how things should be laid out, but only at the level of "lead story", "short message", "weather report".

As output, a complete newspaper would be generated automatically without further user interaction (with a focus on print, not online; i.e. PDF rather than HTML).
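To make the input side concrete, here is a minimal Python sketch of what such an attributed article stream might look like. All names (`Article`, `LayoutHint`, `order_for_layout`) are illustrative assumptions of mine, not an existing format or API; the only "layout" step shown is a trivial pre-ordering, not page layout itself.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class LayoutHint(Enum):
    # The coarse hints mentioned above; the enum itself is hypothetical.
    LEAD_STORY = "lead story"
    SHORT_MESSAGE = "short message"
    WEATHER_REPORT = "weather report"


@dataclass
class Article:
    subject: str                  # subject classification
    header: str
    author: Optional[str]
    text: str
    images: List[str] = field(default_factory=list)  # image file references
    hint: LayoutHint = LayoutHint.SHORT_MESSAGE


def order_for_layout(stream: List[Article]) -> List[Article]:
    """Stable sort: lead stories first, everything else in stream order."""
    return sorted(stream, key=lambda a: a.hint is not LayoutHint.LEAD_STORY)


if __name__ == "__main__":
    stream = [
        Article("politics", "Budget passed", "A. Author", "Some text."),
        Article("politics", "Election", None, "More text.",
                hint=LayoutHint.LEAD_STORY),
        Article("weather", "Rain", None, "Wet.",
                hint=LayoutHint.WEATHER_REPORT),
    ]
    for article in order_for_layout(stream):
        print(article.hint.value, "-", article.header)
```

Everything beyond this point, i.e. turning the ordered stream into pages, is the part I am asking about.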

Note that I'm not looking for help on how to do this with LaTeX. There won't be technical difficulties with page and article layout using my system DocScape. I'm asking (myself) about the basic algorithm for "geometrically" generating the page layout based on the given content stream. There has to be some 'artificial intelligence' in there to make the newspaper look good from a professional newspaper editor's point of view as well.
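To illustrate what I mean by "geometric" layout, here is a deliberately naive baseline in Python: a greedy "skyline" placement of rectangular blocks on a column grid. This is only a sketch under my own assumptions (each article already reduced to a block with a fixed width in columns and height in rows; the names `Block` and `place_blocks` are hypothetical). It is not how DocScape or any production system works, and it ignores everything an editor cares about (holes, visual balance, article hierarchy).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Block:
    name: str
    cols: int   # width in grid columns
    rows: int   # height in grid rows


def place_blocks(blocks: List[Block], page_cols: int, page_rows: int
                 ) -> Tuple[List[Tuple[str, int, int]], List[str]]:
    """Greedy skyline placement; returns (placed, overflow).

    Each placed entry is (name, column, row) of the block's top-left corner.
    Blocks that do not fit on the page are reported in overflow.
    """
    skyline = [0] * page_cols            # first free row in each column
    placed: List[Tuple[str, int, int]] = []
    overflow: List[str] = []
    for b in blocks:
        best: Optional[Tuple[int, int]] = None   # (row, column)
        for col in range(page_cols - b.cols + 1):
            # The block rests on the tallest column it spans.
            row = max(skyline[col:col + b.cols])
            fits = row + b.rows <= page_rows
            if fits and (best is None or row < best[0]):
                best = (row, col)        # highest position, leftmost on ties
        if best is None:
            overflow.append(b.name)
            continue
        row, col = best
        for c in range(col, col + b.cols):
            skyline[c] = row + b.rows
        placed.append((b.name, col, row))
    return placed, overflow


if __name__ == "__main__":
    blocks = [Block("lead", 3, 4), Block("short1", 1, 3),
              Block("short2", 1, 3), Block("weather", 2, 2)]
    print(place_blocks(blocks, page_cols=4, page_rows=10))
```

Even on small inputs a heuristic like this leaves unfilled gaps in the grid, which is exactly why I suspect a usable system needs something smarter than plain rectangle packing.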

Of course, any production-quality system would yield a valid answer, including those based on TeX ;-)

Googling yields some interesting references, but it's hard to tell which of them would really lead to an effective implementation. I'm not talking about an academic exercise here but about a real system which would be used by a publisher to produce hundreds of newspapers each week.

There are further interesting references in the area of floorplanning for VLSI layout, but these lack consideration for specific needs of newspapers, of course ;-)

Now my questions a bit more precisely:

Does a system like the one described above actually exist (it doesn't have to be based on TeX)? I'd be interested in pointers to concrete systems as well as publications about them.

Are there publishers who really use a system like this for making newspapers (online would be interesting as well)?

Has anyone here ever worked with such a system and would care to describe how it's used?

What are the most interesting "scientific" publications on this subject which I should consider when designing such a system myself?

I have seen the question Automatic newspaper creation in LaTeX, but it has a slightly different focus than mine (which LaTeX tools to use), and unfortunately the discussion there wasn't very intense and yielded no pointers that would help me.

Some Literature

Here I'll add a review of literature I've collected on the subject. Note that I have not read all of it, so if I have misrepresented something, please comment.

Schoon, Benjamin Durant
Fishpaper : automatic personalized newspaper layout

Thesis (B.S.)-Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1994.

More of a historical account than a real contribution to this subject. "automatic personalized newspaper layout" here doesn't include automatically finding a good page layout. The page layout is given by a fixed template, though the system supposedly can account for different text lengths or image sizes of article content, or display alternative content when some element is missing.

It is historically interesting because it dates from the dawn of the WWW. The browser Mosaic is explicitly mentioned as a device for electronically presenting news items, but in a time before HTML 2.0, the possibilities for screen formatting were apparently limited. TeX is also explicitly mentioned, as a somewhat competing product to the Fishpaper software presented there, which produces PostScript files from a given stream of news content and given page layout templates.

The State of the Art?

Since I started thinking about this subject, ads and blog posts about systems for making newspapers keep popping up ;-)

Without a connection to one specific vendor, I'd just mention two examples which seem to represent the state of the art for systems which make newspaper layout easy:

A tool named "publishing cloud" seems to be a good representative of a large range of almost equivalent editing systems (easy to find with Google) based on an easy-to-use web-based layout editor which is, however, template-based, with a mostly manual page layout process. These tools automate several stages of the publishing process, offering import filters for content (mostly to get content from web pages or newswire systems) and export to PDF or digital printing services, but not the part I'm interested in here, namely the process of arranging content on the document pages.

A recent blog post reports on an effort to produce a specific digitally printed edition of The Guardian for a cafe. There is a lot of talk about an "experimental, algorithmic newspaper" there, but the tool which was employed, named ARTHR, seems to fall exactly into the category described above. It seems the main new development was a specific connection between the Guardian's CMS and the input filter of the ARTHR tool, but laying out the newspaper was still a manual process (taking one hour for one person, they say, which seems reasonable considering the rather uninspired, newsletter-like layout scheme).

I would be interested in any hint that one of the systems in this area offers "real" automatic page layout for a non-trivial newspaper design, not just a really easy-to-use web frontend to do it manually.

Last but not least, I should mention that we have implemented a newsletter-generating system for a news agency which generates different types of newsletters fully automatically every day and every week:

EPD Wochenspiegel, a weekly news compilation which exists in a multitude of local and thematic variants.

EPD Medien, another weekly newsletter with a specific theme (media) and a slightly different layout.

EPD Zentralausgabe, a daily newsletter, again existing in multiple local variants.

Here, everything is fully automatic: Only the compilation of articles has to be selected in the wire service application. But the layout is not what I would consider "Newspaper Layout", so these examples represent the state of the art we can currently produce, but do not answer my question.
