The next major release of FreeBSD, version 7, is one of the
most significant so far, with the largest number of new technologies
and improvements since the introduction of 5.0. Since constantly
searching the mailing lists for important changes can be a bit
tedious, I've created this (frequently updated) page to list
some of the more interesting new things in one place.
FreeBSD 7.0 has been released!
I've now started the continuation of this project:
What's cooking for FreeBSD 8.
Also useful are the quarterly Status Reports:
2007 / Q3
2007 / Q2
2007 / Q1
2006 / Q4
If you're interested in how FreeBSD gets developed, you're encouraged to
read the mailing lists
and developer blogs.
Network stack improvements and cleanup
Even though this document mentions only a few people, the effort
to improve the network stack and its performance has been carried out by
many.
New sendfile() implementation, improved sosend()
Status: Committed to -CURRENT
Will appear in 7.0: sure
Author: Andre Oppermann, Robert Watson
Homepage: http://people.freebsd.org/~andre/,
announcement message
While working on TSO support, Andre Oppermann found
several ways to optimize the kernel's internal networking support. The new
sendfile() implementation sends larger chunks of data at once
and improves performance by up to 5x when used with TSO and other new
enhancements. Improvements to
sosend() and related functions reduced the
CPU consumption on the sending side of network interfaces by a factor
of almost three. Note that these are microbenchmarks, and performance
improvements in real usage still need to be quantified.
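Applications reach this kernel code path through the sendfile() system call. The sketch below is only an illustration of the calling side, written in Python; it relies on the standard library's socket.sendfile(), which uses the system call where the platform provides it. The helper names send_file_over_socket and demo are made up for this example, and nothing here measures the speedups mentioned above.

```python
import socket
import tempfile


def send_file_over_socket(sock, path):
    """Send the whole file at `path` over a connected stream socket.

    Returns the number of bytes sent.
    """
    with open(path, "rb") as f:
        # socket.sendfile() uses os.sendfile() when the platform offers it
        # and falls back to ordinary send() calls otherwise.
        return sock.sendfile(f)


def demo(size=1024):
    """Round-trip `size` bytes through a local socket pair (illustration only)."""
    payload = b"x" * size
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(payload)
        tmp.flush()
        left, right = socket.socketpair()
        with left, right:
            sent = send_file_over_socket(left, tmp.name)
            received = right.recv(size, socket.MSG_WAITALL)
    return sent, received == payload
```

The payload is kept small so that it fits in the socket buffer without a concurrent reader; a real server would send to a connected TCP socket instead of a local pair.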
TSO and LRO support
Status: Committed or ready for -CURRENT
Will appear in 7.0: sure
Author: Andre Oppermann and Andrew Gallatin
Homepage: http://people.freebsd.org/~andre/
The ongoing effort to improve FreeBSD's network performance
(especially after the hit taken during the transition to SMP) has
resulted in the new ability to support TSO (TCP segmentation offload)
and LRO (large receive offload)
hardware on gigabit and faster cards. Some of the drivers
that support TSO include: em, bce, cxgb, ixgbe, msk, mxge, nxge, nfe, re
(or in plain words: Intel, Broadcom, NVidia, Realtek and other
cards, gigabit or better). LRO support is currently in mxge.
TCP socket buffers auto-sizing
Status: Partially committed to -CURRENT
Will appear in 7.0: sure
Author: Andre Oppermann
Homepage: http://people.freebsd.org/~andre/
FreeBSD has a default 32K send socket buffer. This supports a maximum
transfer rate of only slightly more than 2Mbit/s on a transcontinental
link with a 100ms RTT, or just above 1Mbit/s at 200ms. With TCP send
buffer auto-sizing and its default values, it supports 20Mbit/s at 100ms
and 10Mbit/s at 200ms. Both the read and write buffers are auto-sized.
While support for send buffer auto-sizing is committed, patches
for the receiving side are still being tested.
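The figures above follow from simple arithmetic: a sender can have at most one socket buffer of unacknowledged data in flight per round trip, so throughput is bounded by buffer size divided by RTT. A minimal sketch of that calculation, assuming "32K" means 32 × 1024 bytes (the function names are made up for this illustration):

```python
def max_throughput_bps(buffer_bytes, rtt_seconds):
    """Upper bound on TCP throughput in bit/s for a given send buffer and RTT."""
    return buffer_bytes * 8 / rtt_seconds


def buffer_needed_bytes(target_bps, rtt_seconds):
    """Send buffer size in bytes needed to sustain target_bps at a given RTT."""
    return target_bps * rtt_seconds / 8


if __name__ == "__main__":
    for rtt_ms in (100, 200):
        mbit = max_throughput_bps(32 * 1024, rtt_ms / 1000) / 1e6
        print(f"32K buffer, {rtt_ms} ms RTT: {mbit:.2f} Mbit/s")
    need = buffer_needed_bytes(20e6, 0.100)
    print(f"20 Mbit/s at 100 ms needs about {need / 1024:.0f} KiB of buffer")
```

Run the other way round, the same formula shows that sustaining 20Mbit/s at 100ms needs roughly 250,000 bytes of send buffer, which is why a fixed 32K buffer cannot get there.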
Rapid Spanning Tree Protocol (802.1w)
Status: Committed to -CURRENT
Will appear in 7.0: sure
Author: Andrew Thompson
Homepage: http://people.freebsd.org/~thompsa/
RSTP provides faster spanning tree convergence. The protocol will exchange
information with neighboring switches to quickly transition to forwarding
without creating loops. The code defaults to RSTP mode but downgrades
any port connected to a legacy STP network, so it is fully backward compatible.
SCTP (Stream Control Transmission Protocol)
Status: Committed to -CURRENT
Will appear in 7.0: sure
Authors: Randall Stewart, George Neville-Neil
Homepage: http://www.sctp.org/
FreeBSD's SCTP stack is the reference implementation of the protocol.
Like TCP, SCTP provides a reliable transport service, ensuring that
data is transported across the network without error and in sequence.
Like TCP, SCTP is a session-oriented mechanism, meaning that a
relationship is created between the endpoints of an SCTP association
prior to data being transmitted, and this relationship is maintained
until all data transmission has been successfully completed.
Unlike TCP, SCTP provides a number of functions that are critical for
telephony signaling transport, and at the same time can potentially
benefit other applications needing transport with additional
performance and reliability.
Link aggregation / trunking
Status: Committed to -CURRENT
Will appear in 7.0: sure
Author: Reyk Floeter (from OpenBSD)
Manpage: lagg(4)
OpenBSD's trunk(4) was imported into FreeBSD, as lagg(4), in time to ship
in FreeBSD 7.0. The lagg interface aggregates multiple network interfaces into
one virtual interface for the purpose of providing fault tolerance
and high-speed links. The driver currently supports the aggregation protocols
failover (the default),
fec, lacp, loadbalance, roundrobin, and none.
Improvements to kernel facilities
PMC performance monitoring
Status: Available in -CURRENT, partially available in RELENG_6
Will appear in 7.0: sure
Author: Joseph Koshy
Homepage: http://people.freebsd.org/~jkoshy/projects/perf-measurement
This project implements a kernel module (hwpmc(4)), an application programming
interface (pmc(3)) and a few simple applications (pmcstat(8) and pmccontrol(8))
for measuring system performance using event monitoring hardware in modern CPUs.
Some parts (hwpmc, libpmc, pmcstat) were developed even before
RELENG_6 was branched. New development goals for 7.x include support for
callgraphs and a GUI front end.
Interrupt filtering
Status: Mostly committed to -CURRENT
Will appear in 7.0: sure
Author: Paolo Pisati
Homepage: wiki page
Interrupt filtering is a new method of handling interrupts in FreeBSD
that retains backward compatibility with the previous models (FAST and
ITHREAD) while improving on them in some respects. With interrupt
filtering, the interrupt handler is divided into two parts: the filter
(which checks whether the interrupt actually belongs to its device) and a
private per-handler ithread (which is scheduled if some blocking
work has to be done). The main benefits of this work are:
Feedback from filters (the operating system finally knows the state
of an event and can react accordingly).
Lower latency and overhead for shared interrupt lines.
Previous experiments with interrupt filtering showed an increase
in performance over the plain ithread model in some cases.
A general shrinking of the machine-dependent code: part of the
interrupt handling code was turned into machine-independent
code.
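The filter/ithread split can be pictured with a toy model of one shared interrupt line. This is a sketch in Python rather than kernel code, and every name in it (FilterResult, Handler, dispatch, make_filter) is invented for the illustration; it only mirrors the idea that each filter reports back whether the interrupt was its own and whether blocking work remains.

```python
from enum import Enum, auto


class FilterResult(Enum):
    STRAY = auto()            # the interrupt did not come from this device
    HANDLED = auto()          # the filter serviced the interrupt completely
    SCHEDULE_THREAD = auto()  # blocking work remains for the handler's ithread


class Handler:
    def __init__(self, name, filter_fn):
        self.name = name
        self.filter_fn = filter_fn


def dispatch(handlers, line_state):
    """Run every filter attached to one shared interrupt line.

    Returns (claimed, to_schedule): the handlers whose device raised the
    interrupt, and the subset whose private ithread must be scheduled.
    """
    claimed, to_schedule = [], []
    for handler in handlers:
        result = handler.filter_fn(line_state)
        if result is FilterResult.STRAY:
            continue  # not our device: nothing is scheduled for this handler
        claimed.append(handler.name)
        if result is FilterResult.SCHEDULE_THREAD:
            to_schedule.append(handler.name)
    return claimed, to_schedule


def make_filter(device, needs_thread):
    """Build a toy filter: `line_state` is the set of devices asserting the line."""
    def filter_fn(line_state):
        if device not in line_state:
            return FilterResult.STRAY
        return FilterResult.SCHEDULE_THREAD if needs_thread else FilterResult.HANDLED
    return filter_fn
```

In this model a handler that shares the line but did not raise the interrupt costs only one cheap filter call, which is the intuition behind the lower overhead on shared lines.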
Linuxulator for Linux 2.6
Status: Committed to -CURRENT
Will appear in 7.0: sure
Authors: Alexander Leidinger, Roman Divacky
Homepage: blog post,
cvs commit note
FreeBSD includes support for natively executing Linux binaries. This is
done via runtime translation of Linux syscalls to BSD syscalls, with no
performance penalty. The facility is colloquially called the "linuxulator".
The linuxulator in -CURRENT has been updated to run binaries made for Linux
2.6.16 (though the default for 7.0 will still be 2.4), and the official
Linux environment will be Fedora Core 5.
New scheduler: ULE 2.0 / 3.0
Status: Committed to -CURRENT
Will appear in 7.0: sure
Author: Jeff Roberson
Homepage: CVS file reference,
commit message,
description
The original SCHED_ULE was under-performing and buggy, so it got reworked.
The new scheduler replaces, and has the same name as, SCHED_ULE, but is of a
somewhat different architecture. It replaces the double queue mechanism with
circular queues, and fixes a lot of other things, but it's still an O(1)
scheduler with per-CPU queues.
During SCHED_ULE 2 development there was a brief period where there was a
third (or fourth, depending on how you count) scheduler, named SCHED_SMP,
forked from SCHED_ULE 2 and heavily optimized for configurations with
a large number of CPUs (8+).
This SCHED_SMP has been renamed and committed as SCHED_ULE. While the new
scheduler will really shine for multi-CPU machines, it's now also recommended
for single processor systems as it has much better interactive performance
(mixing of processes with different requirements for IO vs CPU time). ULE
will not be enabled by default for 7.0 but it's an officially recommended
performance optimization.
Improved accounting file format
Status: Committed to -CURRENT
Will appear in 7.0: sure
Author: Diomidis Spinellis
Manpage: acct(5)
The accounting record format has been revised to store time values with
microsecond precision. This allows the recording of meaningful values
for short-running commands on modern fast processors. The adoption of
the IEEE 754 float format for storing time and usage values greatly
increases their range and precision, and also simplifies the processing
of accounting records by third party tools. The new record format and
the tools lastcomm(1) and sa(8) maintain backwards compatibility with
the original accounting format.
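To get a feel for what storing times as IEEE 754 floats means, the sketch below rounds values to single precision and back. It assumes the records use 32-bit floats holding microseconds; the helper names are invented here and the actual record layout is not reproduced.

```python
import struct


def as_float32(value):
    """Round `value` to the nearest IEEE 754 single-precision number."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def relative_error(value):
    """Relative error introduced by storing `value` as a 32-bit float."""
    return abs(as_float32(value) - value) / value


if __name__ == "__main__":
    # A 250 microsecond command is stored exactly.
    print(as_float32(250.0))
    # One hour plus one microsecond: the absolute error grows with the value,
    # but the relative error stays within 2**-24.
    print(relative_error(3_600_000_001.0))
```

Under that assumption a short command's run time is recorded exactly, and long-running totals keep about seven significant decimal digits regardless of magnitude.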
Storage subsystems' improvements
ZFS
Status: Committed to -CURRENT
Will appear in 7.0: sure
Author: Pawel Jakub Dawidek
Homepage: announcement message,
commit announcement message
Sun's ZFS is in the process of being ported to FreeBSD, with the
intention of offering most (or all) features found in the original
implementation. It's integrated with FreeBSD's existing features like
UFS and GEOM, thus offering the possibility of creating FreeBSD UFS
file systems on ZFS volumes, and using GEOM providers to host ZFS
file systems.
ZFS is an advanced file system (actually, a combination of file
system and volume manager) with many interesting features built in:
snapshots, copy-on-write, dynamic striping and RAID-Z (ZFS's RAID5-like
scheme), up to 128-bit file system size (limited to 64 bits in practice,
even in Solaris, since there's no 128-bit integer type in the standard
C language), and
globally optimal I/O sorting and aggregation. It's marked
EXPERIMENTAL in 7.0-RELEASE.
ZFS is still experimental on FreeBSD, and it's recommended that
users get familiar with the FreeBSD
ZFS documentation before using it. For a more light-hearted
introduction, see this
presentation by Pawel.
tmpfs
Status: Committed to -CURRENT
Will appear in 7.0: sure
Authors: Julio M. Merino Vidal, Rohit Jalan, Howard Su, Glen Leeder
Homepage: TMPFS page on FreeBSD wiki,
TMPFS at NetBSD
TMPFS is a memory file system designed to efficiently allocate (and
deallocate) memory used for the file system itself, as contrasted to the
"usual" way of creating memory file systems by creating memory storage
devices ("RAM drives"). It's marked EXPERIMENTAL for 7.0-RELEASE.
gjournal
Status: Committed to -CURRENT
Will appear in 7.0: sure
Author: Pawel Jakub Dawidek
Homepage: http://bsdblogs.droso.org/pjd,
announcement message
Gjournal is a GEOM storage class that provides data journaling
facilities to any providers (and consumers) the user needs. As a
special case it has support in UFS file system code, and in this
combination it makes UFS a journaled file system. In itself,
gjournal consumes two devices (one for the data, one for the
journal) and provides one. Since it takes special care to work
well with disk drive hardware caches, it can also speed up
and add reliability to many other setups, such as GELI and GBDE
encrypted device providers.
I'm proud to say current gjournal is a continuation of my idea
implemented for Google's Summer of Code 2005.
gvirstor
Status: Committed to -CURRENT
Will appear in 7.0: sure
Author: Ivan Voras
Homepage: http://wikitest.freebsd.org/gvirstor
Gvirstor is a GEOM storage class that provides a
storage device of arbitrary size in "overcommit" mode (i.e. larger
than physically available storage). Providers can be
added to the virstor device on-line (while used, e.g. mounted),
and removed if unused and at the end of the list of components.
This work was created by me, with Pawel Jakub Dawidek as mentor
and sponsored by Google in Summer of Code 2006.
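The "overcommit" idea can be sketched as a toy model in which physical chunks are handed out only when a virtual chunk is first written. This is not how the GEOM class is implemented; the class name ToyVirstor, the chunk granularity and the first-fit policy are all assumptions made for the illustration.

```python
import errno


class ToyVirstor:
    """Toy overcommitted device: large virtual size, chunks allocated lazily."""

    def __init__(self, virtual_chunks, component_chunks):
        self.virtual_chunks = virtual_chunks
        self.components = list(component_chunks)  # physical chunks per component
        self.used = [0] * len(self.components)    # chunks allocated per component
        self.map = {}  # virtual chunk -> (component index, physical chunk)

    def add_component(self, chunks):
        """Add more physical storage while the device is in use."""
        self.components.append(chunks)
        self.used.append(0)

    def write(self, vchunk):
        """Return the physical location backing `vchunk`, allocating on first write."""
        if not 0 <= vchunk < self.virtual_chunks:
            raise IndexError("virtual chunk out of range")
        if vchunk in self.map:
            return self.map[vchunk]
        for index, size in enumerate(self.components):
            if self.used[index] < size:
                self.map[vchunk] = (index, self.used[index])
                self.used[index] += 1
                return self.map[vchunk]
        raise OSError(errno.ENOSPC, "no free physical chunks")
```

A device created as ToyVirstor(1000, [2]) advertises 1000 chunks but can back only two of them until add_component() supplies more space.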
gmultipath
Status: Committed to -CURRENT
Will appear in 7.0: sure
Author: Matt Jacob
Homepage: CVS message
Gmultipath allows failover between multiple
devices that represent the same storage device.
This is an active/passive{/passive...}
arrangement that has no intrinsic internal knowledge of whether devices
it is given are truly multipath devices. As such, this is a simplistic
approach, but still a useful one.
The first of N identical devices (and N *may* be 1!) becomes the active
path until a BIO request is failed with EIO or ENXIO. When this occurs,
the active disk is ripped away and the next in a list is picked to
(retry and) continue with.
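The failover rule described above can be sketched in a few lines. This is a toy model, not the GEOM code: paths are plain Python callables, the class name ToyMultipath is made up, and a failed request is signalled by raising OSError with EIO or ENXIO.

```python
import errno


class ToyMultipath:
    """Active/passive failover over a list of paths to the same device."""

    FAILOVER_ERRORS = {errno.EIO, errno.ENXIO}

    def __init__(self, paths):
        if not paths:
            raise ValueError("at least one path is required")
        self.paths = list(paths)  # the first entry is the active path

    def submit(self, request):
        """Send `request` down the active path, failing over on EIO/ENXIO."""
        while self.paths:
            active = self.paths[0]
            try:
                return active(request)
            except OSError as exc:
                if exc.errno not in self.FAILOVER_ERRORS:
                    raise  # other errors are passed through unchanged
                self.paths.pop(0)  # drop the failed path, retry on the next
        raise OSError(errno.ENXIO, "all paths have failed")
```

As in the description, the model has no idea whether the paths really lead to the same device; it simply trusts the list it was given.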
New platforms
New platform: ARM architecture
Status: Committed to -CURRENT, MFC-ed to RELENG_6
Will appear in 7.0: sure
Authors: Olivier Houchard, Warner Losh & more
Homepage: http://www.freebsd.org/platforms/arm.html,
http://bsdimp.blogspot.com/
Support for the ARM embedded architecture has been under development since
6.0, enabling a FreeBSD presence in the embedded markets.
The support has been MFC-ed to 6.x and is available in 6.2-RELEASE.
It's still under development and will likely support more boards in the
future.
New platform: sun4v (Niagara / T1)
Status: Committed to -CURRENT
Will appear in 7.0: probably
Authors: Kip Macy, John Birrell & more
Homepage: CVS announcement
There's still a long way to go before Sun's Niagara/sun4v platform is fully supported,
but progress is slowly being made. Niagara offers advanced features such as
eight cores and 32 threads per CPU, and hardware public key cryptography
acceleration. Unfortunately, this architecture is not supported out-of-the-box
in 7.0.
Security features
Security event auditing
Status: Committed to -CURRENT, MFC-ed to RELENG_6
Will appear in 7.0: sure
Authors: Robert Watson & more
Homepage: http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/audit.html
Event auditing allows the reliable, fine-grained, and configurable
logging of a variety of security-relevant system events, including logins,
configuration changes, and file and network access. These log records
can be invaluable for live system monitoring, intrusion detection, and
postmortem analysis. FreeBSD implements Sun's published BSM API and file
format, and is interoperable with both Sun's Solaris and Apple's
Mac OS X audit implementations.
The audit framework was MFC-ed to RELENG_6 and is available in 6.2-RELEASE.
New privilege separation capabilities
Status: Committed to -CURRENT
Will appear in 7.0: sure
Author: Robert Watson
Homepage: list announcement
This is a framework that can be used together with MAC
to create policies similar to RBAC (as seen in Solaris & others),
which allow the root privilege to be separated into several
fine-grained capabilities such as "can access the network" or "can bypass
file system quotas". This is a work in progress, and no shipped
policy modules directly implement all of the functionality yet.
Multimedia features
Hi-def audio
Status: Mostly committed to -CURRENT
Will appear in 7.0: sure
Author: Ariff Abdullah
Homepage: http://people.freebsd.org/~ariff/HDA/
A new driver, snd_hda, has been developed to
support professional sound equipment and new hardware on the
market. HDA hardware is capable of delivering 192 kHz/32 bit
quality for two channels and 96 kHz/32 bit for up to eight channels.
Latency has been reduced in many cases.
Related to this, new drivers for envy24(ht) sound hardware
have been committed to -CURRENT, and multichannel audio support is due
to be finished soon.
Userland enhancements
jemalloc
Status: Committed to -CURRENT
Will appear in 7.0: sure
Author: Jason Evans
Homepage: http://people.freebsd.org/~jasone/jemalloc/
The currently used malloc() library, called phkmalloc
after its creator, Poul-Henning Kamp, is almost a decade old in
its present implementation. It was designed for a time when memory
was scarce, the priorities considered in memory allocation were
different, and multithreading was still an academic idea. Even so, it's
one of the more popular malloc() implementations, used in all BSDs
and even some historical Linux distributions.
Because of its inefficiency when used in multithreaded applications
running on multiprocessor systems, a new userland memory allocator was
created, named jemalloc after Jason Evans, its creator. The
improvements in computer speed and memory availability mean that
compared to phkmalloc, which only needed to be conservative in memory
usage, jemalloc needed to be more sophisticated and account for
low-level properties such as CPU cache locality and parallel execution.
The result is an allocator which is optimized for multithreading,
using multiple allocation arenas to help concurrency. On single
processor systems there's only one arena, while on multi-processor
or multi-core systems there are four times as many arenas as there
are processors. Allocations are divided into broad classes based on
their size and those classes are further subdivided. Benchmarks show
that jemalloc does significantly better in multithreaded
applications (like MySQL) and for applications that make many small
allocations.
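The arena rule quoted above is easy to state as code. The sketch below is a toy in Python: arena_count follows the text (one arena on a single processor, four per processor otherwise), while pick_arena, which spreads threads over arenas by a simple modulo, is purely an assumption for illustration and not a description of how jemalloc assigns threads.

```python
def arena_count(ncpus):
    """Number of arenas for a machine with `ncpus` processors, per the text above."""
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    return 1 if ncpus == 1 else 4 * ncpus


def pick_arena(thread_id, ncpus):
    """Hypothetical thread-to-arena assignment: spread threads evenly."""
    return thread_id % arena_count(ncpus)


if __name__ == "__main__":
    for ncpus in (1, 2, 8):
        print(ncpus, "CPU(s):", arena_count(ncpus), "arena(s)")
```

Having several times more arenas than processors makes it less likely that two threads running at the same moment contend for the same arena lock.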
Bits & pieces
Authors: many
Here are some additional changes for 7.0 that are not so
glamorous or are smaller in scope:
Lots of performance improvements on SMP machines (see
MySQL read performance,
MySQL write performance and
BIND performance
graphs.)
Significantly increased scalability on SMP machines, mainly from
extraordinary work done by David Xu (the libthr threading library),
Jeff Roberson (scheduler, flock locking), Attilio Rao (improved
kernel locking performance) and Robert Watson (file descriptor locking,
unix sockets locking and more).
Significantly increased network scalability, resulting mostly from the
switch to direct dispatch of the network stack from netisr. This is
especially helpful for 10 Gbit/s NICs and was mainly done by
Robert Watson and Kip Macy.
The GIANT lock has been pushed further back, and almost all kernel
subsystems are now finely locked (e.g. VM, VFS, Net).
Some of the recent improvements are the locking of the CAM subsystem
and many SCSI drivers (by Scott Long); similar
locking work has been done on the NFS client and the Firewire
implementation.
iSCSI initiator (iSCSI target is available in ports)
SATA support
Read-only access to XFS file systems
Added support for MSI/MSI-X extensions to PCI
Support for Apple (Mac) hardware is being worked on
pf firewall updated to 4.1
X.Org 7.2 - things like beryl now work if you have the right
drivers
gcc 4.2
Implemented symbol versioning for many base OS libraries
libthr becomes the default threading library
Things that didn't make it
Despite plans and best efforts, some things won't make it into FreeBSD 7.0-RELEASE.
These are:
SCHED_CORE - Doesn't perform as well as SCHED_ULE2
DTrace - MFC-ed into 7.1.
Superpages - MFC-ed into 7.2.
Of course, this much new technology will need much testing before it's
ready for use. You can help by installing a snapshot of -CURRENT and
running it on as close to your regular load as possible. Disable debugging
features (which are enabled by default during development) before
benchmarking.