Friday was my last day at Solarflare Communications, which I joined
in 2006 (originally as Level 5 Networks).
When I started, Level 5 Networks (L5N) had a 2-port gigabit Ethernet
controller called EF1 and a user-level network stack called
EtherFabric which ran on Linux. It had nearly completed the Falcon
project to develop a 10-gigabit controller (also supporting a 2-port
gigabit mode) and was porting EtherFabric to Windows. Later that
year, however, L5N merged with Solarflare Communications (SF), which
was developing a 10GBASE-T PHY (for 10-gigabit Ethernet over
twisted-pair cables). The investors and management saw the
opportunity to create an integrated 10GBASE-T LAN-on-motherboard
chip that would be attractive to OEMs and so had the potential for
mass sales. EtherFabric was put on the back-burner, but regular net
drivers for Linux, Windows and other operating systems became more
important.
In a change from my previous jobs, my initial role at L5N was to
maintain and extend the in-house test automation software, Runbench,
written in Python. I worked with Dickon Reed who had written most
of Runbench up to then. One of my first major tasks was to extend
it to cover DUTs running Windows. Aside from changing Runbench
itself, I wrote a simple SSH server to be installed on DUTs, using
the Twisted.Python framework.
(Cygwin's port of OpenSSH wasn't suitable for running a lot of
native Windows commands.) I rewrote the test-installation script
for the Windows driver itself, converting it to Python (with a C
extension to get around the WOW64 nonsense). Windows does not allow
loading a driver just once (as insmod does in Linux), except by using
the kernel debugger. I therefore added a time check in the driver so
that the test-installation script could make it fail early when
reloaded after a reboot, which meant that a crasher bug in the probe
path would usually be recoverable.
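
The time check can be pictured roughly as follows. This is only a
sketch in plain user-space C, not the actual driver code: the
structure, the function name and the idea of the script storing a
deadline are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical settings written by the test-installation script. */
struct probe_guard {
	time_t not_after;	/* latest time a probe may run; 0 = no limit */
};

/*
 * Decide whether the probe path should run at time 'now'.
 * If the driver crashes the machine during probe, the reboot takes
 * long enough that the deadline has passed, so the reloaded driver
 * fails early instead of crashing again.
 */
bool probe_allowed(const struct probe_guard *guard, time_t now)
{
	assert(guard != NULL);
	if (guard->not_after == 0)
		return true;		/* normal installs are not limited */
	return now <= guard->not_after;
}
```

A test installation would set the deadline to a short time after the
install, while a production install would leave it at zero.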
In 2007 I persuaded management in Cambridge to loan several older
servers to DebConf. I took them there and back by car, and they were
used to convert video for live streams and recordings. (This was
repeated in 2009.)
I was also asked to work on other small development tasks, including
some cleanup on the Linux net driver (sfc). In mid-2007, when SF
decided it was time to get sfc upstream (in-tree), I joined Steve
Hodgson and Robert Stonehouse in working on that.
Out-of-tree (separately distributed) drivers for Linux normally need
a large number of preprocessor conditions for compatibility with
Linux's kernel module API across a range of supported versions. They may also
implement some features with a driver-specific interface that should
be replaced with a standard interface (possibly including the work
to define that standard). They may not comply with the kernel
coding style or unwritten conventions for kernel code. All of these
problems were present and had to be dealt with. As part of this
work, I extended unifdef
so that it could be used to remove all of the backward-compatibility
and not-for-upstream code without introducing extra blank lines. I
could then export the out-of-tree driver from CVS(!) into a kernel
source tree automatically and turn it into a patch or patch series
that would stand a chance of being acceptable upstream.
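
To give an idea of the kind of conditions involved, here is a small
sketch. The symbols DEMO_USE_KCOMPAT and DEMO_NOT_UPSTREAM and both
functions are made up for this example and are not taken from the
real driver.

```c
#include <assert.h>

/*
 * DEMO_USE_KCOMPAT (hypothetical): set when building for older kernels.
 * DEMO_NOT_UPSTREAM (hypothetical): set for features kept out of tree.
 */

/* Returns 1 if the standard interface is compiled in, 0 for the compat path. */
int demo_uses_standard_api(void)
{
#ifdef DEMO_USE_KCOMPAT
	/* Backward-compatibility path for kernels lacking the standard API. */
	return 0;
#else
	return 1;
#endif
}

/* Returns the ring size chosen at build time. */
int demo_ring_size(void)
{
	int size = 512;
#ifdef DEMO_NOT_UPSTREAM
	/* Driver-specific tuning that would not be accepted upstream. */
	size = 1024;
#endif
	assert(size > 0);
	return size;
}
```

Running unifdef on a file like this with both symbols undefined
(-UDEMO_USE_KCOMPAT -UDEMO_NOT_UPSTREAM) would leave only the
unconditional lines and the #else branch, which is the form wanted
for an upstream submission.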
Our first few attempts got little response, but eventually we newbies
got the message that the driver was simply too big for a single
submission and that putting patches on a web site because they're too
big for the list was not a solution. In the next week, I removed
about half of the code (on a git branch), resulting in a relatively
lean driver that could be sent to the netdev list. And finally the
9th (I think) submission was accepted.
Following this, I continued to work on sfc but was also brought into
the Siena project. Siena (SFL9021) was the LAN-on-motherboard chip
that had been planned following the merger. It combined a dual-port
controller based on Falcon with a 10GBASE-T PHY and a management
controller (MC) to support Lights-Out Management (LOM). (Since
10GBASE-T has not taken off, it is now mostly sold as the SFC9020
variant in which the PHY is disabled.) I worked on some of the test
framework and test cases for validating the controller design in
software simulation and FPGA. In the course of this I learned to
read and (somewhat) understand the chip design written in Verilog,
but only made a single trivial fix to it.
In August 2009, the first Siena ASICs came back from the fab. There
were a few ASIC-specific bugs but all of them were quickly worked
around in firmware, so it would soon be ready for production. But
there was much work still to be done on the driver and firmware.
The firmware had until then been running in a smaller block of RAM
with no peripherals to manage, while sfc had earlier been modified
just enough to configure a sim or FPGA for running a userland test
application. sfc now had many special cases scattered around it,
and would tell the MC firmware to peek and poke registers that were
no longer directly accessible from the host.
The software and firmware developers had agreed that, in order to
support LOM, the MC would be responsible for managing the ports and
all peripherals, and drivers would send higher-level requests to the
MC. As a bonus, this removed the need for the driver to know the
details of each new board, so that it would not be necessary to
backport sfc into distribution kernels very often. Over the next
few months, the firmware team did an excellent job of defining and
implementing those operations, including writing the driver-side
functions to invoke them. Meanwhile, Steve and I concentrated on
refactoring sfc so most of the Falcon/Siena differences could be
abstracted through a few structures with function pointers.
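
The general pattern looks something like the sketch below. All the
names here (demo_nic, demo_nic_type and so on) and the counters are
invented for illustration; the real structures in sfc are larger and
differ in detail.

```c
#include <assert.h>
#include <stddef.h>

struct demo_nic;

/* Per-chip operations; common code only ever calls through these. */
struct demo_nic_type {
	const char *name;
	int (*probe_port)(struct demo_nic *nic);
	int (*reconfigure_mac)(struct demo_nic *nic);
};

struct demo_nic {
	const struct demo_nic_type *type;
	int direct_register_writes;	/* stands in for host register access */
	int mc_requests;		/* stands in for requests sent to the MC */
};

/* Falcon-style: the driver manages the port hardware itself. */
int demo_falcon_probe_port(struct demo_nic *nic)
{
	nic->direct_register_writes++;
	return 0;
}

int demo_falcon_reconfigure_mac(struct demo_nic *nic)
{
	nic->direct_register_writes++;
	return 0;
}

/* Siena-style: the driver asks the management controller to do it. */
int demo_siena_probe_port(struct demo_nic *nic)
{
	nic->mc_requests++;
	return 0;
}

int demo_siena_reconfigure_mac(struct demo_nic *nic)
{
	nic->mc_requests++;
	return 0;
}

const struct demo_nic_type demo_falcon_type = {
	.name = "falcon",
	.probe_port = demo_falcon_probe_port,
	.reconfigure_mac = demo_falcon_reconfigure_mac,
};

const struct demo_nic_type demo_siena_type = {
	.name = "siena",
	.probe_port = demo_siena_probe_port,
	.reconfigure_mac = demo_siena_reconfigure_mac,
};

/* Common code: no chip-specific special cases are needed here. */
int demo_bring_up(struct demo_nic *nic)
{
	int rc;

	assert(nic != NULL && nic->type != NULL);
	rc = nic->type->probe_port(nic);
	if (rc)
		return rc;
	return nic->type->reconfigure_mac(nic);
}
```

The point of this arrangement is that supporting another chip means
adding one more table of functions, rather than another set of special
cases spread through the common code.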
The refactoring and new code for Siena were completed and submitted
upstream in several large patch series in October and November. In
fact, there were so many patches that Solarflare appeared in LWN's
table of who wrote Linux 2.6.33 (as did I, thanks also to another
patch series adding firmware metadata to many drivers). Sadly we had
missed Linux 2.6.32, which was a longterm stable branch and used in
many distributions, but I was able to get this version backported
into all the major distributions over the next year.
(To be continued.)