A copy of this post is sent to
soc-coordination@lists.alioth.debian.org
as well as to
debian-bootstrap@lists.mister-muffin.de.
Diary
July 2
playing around with syntactic dependency graphs and how to use them to flatten dependencies
July 4
make work with dose 3.0.2
add linux-amd64 to source architectures
remove printing in build_compile_rounds
catch Not_found exception and print warning
use the whole installation set in crosseverything.ml instead of flattened dependencies
detect infinite loop and quit in crosseverything.ml
use globbing in _tags file
use wildcards and patsubst in makefile
July 5
throw a warning if there exist binary packages without source packages
add string_of_list and string_of_pkglist and adapt print_pkg_list and print_pkg_list_full to use them
fix and extend flatten_deps - now also tested with Debian Sid
July 6
do not exclude the crosscompiled packages from being compiled in crosseverything.ml
clean up basebuildsystem.ml, remove old code, use BootstrapCommon code
clean up basenocycles.ml, remove unused code and commented out code
add option to print statistics about the generated dependency graph
implement most_needed_fast_wrong as well as most_needed_slow_correct and make both available through the menu
July 7
allow to investigate all scc, not only the full graph and the scc containing the investigated package
handle Not_found in src_list_from_bin_list with warning message
handle the event of the whole archive actually being buildable
replace raise Failure with failwith
handle incorrectly typed package names
add first version of reduced_dist.ml to create a self-contained mini distribution out of a big one
July 8
add script to quickly check for binary packages without source package
make Debian Sid default in makefile
add *.d.byte files to .gitignore
README is helpful now
more pattern matching and recursiveness everywhere
July 9
fix termination condition of reduced_dist.ml
have precise as default ubuntu distribution
do not allow to investigate an already installable package
July 10
milestone: show all cycles in a graph
add copyright info (LGPL3+)
July 11
advice to use dose tools in README
July 16
write apt_pkg based python filter script replacing grep-dctrl
July 17
use Depsolver.listcheck more often
add dist_graph.ml
refactor dependency graph code into its own module
July 18
improve package selection for reduced_dist.ml
improve performance of cycle enumeration code
July 20
implement buildprofile support into dose3
July 22
let dist_graph.ml use commandline arguments
July 23
allow dose3 to generate source package lists without Build-{Depends|Conflicts}-Indep
July 29
implement crosscompile support into dose3
Results
Readme
There is not yet a writeup on how everything works and how all the pieces of
the code work together but the current README file provides a short
introduction on how to use the tools.
build and runtime dependencies
compile instructions
execution examples for each program
step by step guide how to analyze the dependency situation
explanation of general commandline options
A detailed writeup about the inner workings of everything will be part of a
final documentation stage.
License
All my code is now released under the terms of the LGPL either version 3, or
(at your option) any later version. A special linking exception is made to the
license which can be read at the top of the provided COPYING file. The
exception is necessary because Ocaml links statically, which means that without
that exception, the conditions of distribution would basically equal GPL3+.
reduced_dist.ml
Especially the Debian archive is huge and one might want to work on a reduced
selection of packages first. Having a smaller selection of the archive would be
significantly faster and would also not add thousands of packages that are not
important for an extended base system.
I call a reduced distribution a set of source packages A and a set of binary
packages B which fulfill the following three properties:
all source packages A must be buildable with only binary packages B being
available
all binary packages B except for architecture:all packages must be buildable
from source packages A
The set of binary packages B and source packages A can be retrieved using the
reduced_dist program. It allows to either build the most minimal reduced
distribution or one that includes a certain package selection.
To filter out the package control stanzas for a reduced distribution from a
full distribution, I originally used a call to grep-dctrl but later replaced
that by a custom python script called filter-packages.py. This script uses
python-apt to filter Packages and Sources files for a certain package
selection.
dist_graph.ml
It soon became obvious that there were not many independent dependency cycle
situation but just one big scc that would contain 96% of the packages that are
involved in build dependency cycles. Therefor it made sense to write a program
that does not iteratively build the dependency graph starting from a single
package, but which builds a dependency graph for a whole archive.
Cycles
I can now enumerate all cycles in the dependency graph. I covered the
theoretical part in another blog post and wrote an email about the
achievement to the list. Both resources contain more links to the respective
sourcecode.
The dependency graph generated for Debian Sid has 39486 vertices. It has only
one central scc with 1027 vertices and only eight other scc with 2 to 7
vertices. All the other source and binary packages in the dependency graph for
the archive are degenerate components of length one.
Obtaining the attached result took 4 hours on my machine (Core i5 @ 2.53GHz).
1.5 h of that were needed to build the dependency graph, the other 2.5 hours
were needed to run johnson's algorithm on the result. Memory consumption of the
program was at about 700 MB.
It is to my joy that apparently the runtime of the cycle finding algorithm for
a whole Debian Sid repository as well as the memory requirements are within
orders of magnitude that are justifiable when being run on off-the-shelf
hardware. It must also be noted that nothing is optimized for performance yet.
A list of all cycles in Debian Sid up to length 4 can be retrieved from this
email. This cycle analysis assumes that only essential packages,
build-essential and dependencies and debhelper are available. Debhelper is not
an essential or build-essential package but 79% of the archive build-depends on
it.
The most interesting cycles are probably those of length 2 that need packages
that they build themselves. Noticeable examples for these situations are vala,
python, mlton, fpc, sbcl and ghc. Languages seem love to need themselves to be
built.
Buildprofiles
There is a long discussion of how to encode staged build dependency
information in source packages. While the initial idea was to use
Build-Depends-StageN fields, this solution would duplicate large parts of the
Build-Depends field, which leads to bitrot as well as it is inflexible to
possible other build "profiles". To remedy the situation it was proposed to use
field names like Build-Depends[stage1 embedded] but this would also duplicate
information and would break with the rfc822 format of package description
files. A document maintained by Guillem Jover gives even more ideas and
details.
Internally, Patrick and me decided for another idea of Guillem Jover to
annotate staged build dependencies. The format reads like:
So each build profile would follow a dependency in <> "brackets" an have a
similar format as architecture options.
Patrick has a patch for dpkg that implements this functionality while I
patched dose3.
Dropping Build-Depends-Indep and Build-Conflicts-Indep
When representing the dependencies of a source package, dose3 concatenates its
Build-Depends and Build-Depends-Indep dependency information.
So up to now, a source package could only be compiled, if it manages to compile
all of its binary packages including architecture:all packages.
But when bootstrapping a new architecture, it should be sufficient to only
build the architecture dependent packages and therefor to only build the
build-arch target in debian/rules and not the build-indep target.
Only considering the Build-Depends field and dismissing the Build-Depends-Indep
field, reduced the main scc from 1027 vertices to 979 vertices. The amount of
cycles up to length four reduced from 276 to 206. Especially the cycles
containing gtk-doc-tools, doxygen, debiandoc-sgml and texlive-latex-base got
much less.
Patrick managed to add a Build-Depends-Indep field to four packages so far
which reduced the scc further by 14 vertices down to 965 vertices.
So besides staged build dependencies and cross building there is now a third
method that can be applied to break dependency cycles: add Build-Depends-Indep
information to them or update existing information.
I submitted a list of packages that have a binary-indep and/or a
build-indep target in their debian/rules to the list.
I also submitted a patch for dose3 to be able to specify to ignore
Build-Depends-Indep and Build-Conflicts-Indep information.
Dose3 crossbuilding
So far I only looked at dependency situations in the native case. While the
native case contains a huge scc of about 1000 packages, the dependency
situation will be much nicer when cross building. But dose3 was so far not able
to simulate cross building of source packages. I wrote a patch that
implements this functionality and will allow me to write programs that help
analyze the cross-situation as well.
Debconf Presentation
Wookey was giving a talk at debconf12 for which I was supplying him with
slides. The slides in their final version can be downloaded here
Future
Patrick maintains a list of "weak" build dependencies. Those are
dependencies that are very likely to be droppable in either a staged build or
using Build-Depends-Indep. I must make use of this list to make it easier to
find packages that can easily be removed of their dependencies.
I will have to implement support for resolving the main scc using staged build
dependencies. Since it is unlikely that Patrick will be fast enough in
supplying me with modified packages, I will need to create myself a database of
dummy packages.
Another open task is to allow to analyze the crossbuilding dependency
situation.
What I'm currently more or less waiting on is the inclusion of my patches into
dose3 as well as a decision on the buildprofile format. More people need to
discuss about it until it can be included into tools as well as policy.
Every maintainer of a package can help making bootstrapping easier by making
sure that as many dependencies as possible are part of the Build-Depends-Indep
field.