[pcap-ng-format] Hone introduction and PCAP-NG additions
Carpenter, Brandon J
brandon.carpenter at pnnl.gov
Tue Jun 26 12:54:46 PDT 2012
Hello,
Richard asked that I provide some information on the additions PNNL made
to the PCAP-NG format to support Hone data. Below is a quick
introduction and explanation of the changes. More information can be
found on our github page [https://github.com/HoneProject/Linux-Sensor].
Hone is a tool for performing packet-process correlation. It does so by
intercepting process and socket creation and destruction events,
correlating the two, and then bridging the gap in the network stack to
associate the packet to the socket, thereby expressing the full
relationship from process to packet: a process creates a socket that
sends and receives packets. There is currently an open source Linux
kernel module available on github, including a fuller description and
instructions for building and using the tool.
When developing the tool, we wanted to use a dump format that was well
supported and that would allow Hone captures to be used within Wireshark
(and other analysis tools) with minimal fuss. The original PCAP dump
format could have worked, but would have required much hacking.
PCAP-NG, on the other hand, was perfect and could easily store our
additional data blocks without breaking older/unmodified versions of
Wireshark (or other tools with support for PCAP-NG).
Hone adds some additional options to existing block types as well as two
additional block types. These are additions and only extend the
original PCAP-NG specification; they do not change any existing blocks
or options, but only add additional information. They are described
completely on the github wiki
[https://github.com/HoneProject/Linux-Sensor/wiki/Augmented-PCAP-Next-Generation-Dump-File-Format].
The first addition is a GUID option in the section header block that
provides a method of uniquely identifying the capture source. In
hindsight, I think it would be better to make it a general ID field of
variable length, possibly with the first byte indicating the ID type
(similar to the epb_hash option), followed by the appropriate data.
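As a rough illustration of the type-tagged ID idea, here is a minimal
Python sketch. It uses the standard PCAP-NG option layout (16-bit code,
16-bit value length, value padded to a 32-bit boundary). The option code
and the type tags are placeholders invented for this example; they are
not taken from the PCAP-NG or Hone specifications.

```python
import struct
import uuid

# Hypothetical values, not taken from the PCAP-NG or Hone specifications.
OPT_SHB_SOURCE_ID = 0x7FFF   # placeholder option code
ID_TYPE_GUID = 0             # placeholder type tag: 16-byte GUID
ID_TYPE_TEXT = 1             # placeholder type tag: UTF-8 string


def encode_option(code, value, byteorder="<"):
    """Encode one option: code, value length, value, zero padding to 32 bits."""
    padding = (-len(value)) % 4
    header = struct.pack(byteorder + "HH", code, len(value))
    return header + value + b"\x00" * padding


def encode_source_id(id_type, data, byteorder="<"):
    """Prefix the ID data with a one-byte type tag, then wrap it as an option."""
    return encode_option(OPT_SHB_SOURCE_ID, bytes([id_type]) + data, byteorder)


def decode_source_id(option, byteorder="<"):
    """Return (id_type, data) from an option produced by encode_source_id."""
    _code, length = struct.unpack_from(byteorder + "HH", option, 0)
    value = option[4:4 + length]  # the length field excludes the padding
    return value[0], value[1:]


guid_option = encode_source_id(ID_TYPE_GUID, uuid.UUID(int=1).bytes)
text_option = encode_source_id(ID_TYPE_TEXT, "host-a".encode("utf-8"))
```

Because the length field covers only the type byte and the data, a
reader that does not know a given type tag can still skip the option.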
The second addition is a process event block used to describe processes
running on a system. The process ID (PID) and a timestamp are required
and options include the event (exec, fork, or exit), the full path of
the executable, the arguments, the PID of the parent, and user IDs and
names. While working on support for these additions in Wireshark, I
realized that the timestamps, which should work like timestamps on
enhanced packet blocks, need an interface to supply the offset
(if_tsoffset) and resolution (if_tsresol). I am assuming that the
offset and resolution values would be the same for all interfaces in a
section and that using the first interface should be sufficient. Does
that sound reasonable? The option of directly associating the process
with an interface seems all wrong. Perhaps there should be global
timestamp options in the section header block which can be overridden by
interfaces for packet blocks? Or perhaps my assumption should become
the spec for this block type? The Linux sensor does not capture packets
off the interface, but uses the netfilter framework instead; so a
pseudo-interface is the first and only interface description block in
the section. But that might not be the case for the Windows sensor.
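To make the timestamp question concrete, the following Python sketch
converts a 64-bit block timestamp (stored as two 32-bit halves) to
seconds using if_tsresol and if_tsoffset, taking both values from the
first interface in the section as assumed above. The dictionary
representation of an interface and the function names are made up for
this example.

```python
from fractions import Fraction

DEFAULT_TSRESOL = 6  # PCAP-NG default resolution: 10^-6 seconds


def ticks_per_second(if_tsresol=DEFAULT_TSRESOL):
    """Decode if_tsresol: MSB set means a power of 2, otherwise a power of 10."""
    if if_tsresol & 0x80:
        return 2 ** (if_tsresol & 0x7F)
    return 10 ** if_tsresol


def section_timebase(interfaces):
    """Return (if_tsresol, if_tsoffset) from the first interface in a section.

    `interfaces` is a hypothetical list of dicts, one per interface
    description block, holding only the options that were present.
    """
    first = interfaces[0] if interfaces else {}
    return first.get("if_tsresol", DEFAULT_TSRESOL), first.get("if_tsoffset", 0)


def block_timestamp_to_seconds(ts_high, ts_low, if_tsresol=DEFAULT_TSRESOL,
                               if_tsoffset=0):
    """Combine the two 32-bit halves and apply resolution and offset."""
    ticks = (ts_high << 32) | ts_low
    return Fraction(ticks, ticks_per_second(if_tsresol)) + if_tsoffset


# A process event timestamp interpreted with the first interface's settings.
resol, offset = section_timebase([{"if_tsresol": 9, "if_tsoffset": 10}])
seconds = block_timestamp_to_seconds(0, 500_000_000, resol, offset)
```

If the section header block carried global timestamp options instead,
only section_timebase would need to change.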
The next addition is the connection event block. A connection record is
generated whenever a socket is created or closed, and an identifying
integer, unique over the lifetime of the socket, is stored along with
the creating process's PID and a timestamp. An optional event describes
whether the socket is being created or destroyed. The same timestamp
problem exists with this block type as for the process event block.
Finally, two options are added to the enhanced packet block indicating
the connection and/or process with which the packet is associated. If
only the connection ID is given (as is the case with the Linux sensor),
the connection event records can be searched to find the process ID.
Other sensors, such as the Windows version, may already know the process
when the packet event block is written and can include it, obviating the
need to search for it. It is also possible to exclude connection event
blocks altogether to save space.
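The lookup described above can be sketched in a few lines of Python.
The record dictionaries and their keys ("kind", "conn_id", "pid") are
invented for this example and do not reflect the on-disk layout. The
sketch also assumes that a connection event appears in the file before
the packets that refer to it.

```python
def correlate(records):
    """Yield (packet, pid) pairs for records given in file order.

    A packet that already carries a process ID keeps it; otherwise the
    most recent connection event with the same connection ID supplies it.
    The pid is None when neither source is available.
    """
    pid_by_conn = {}
    for rec in records:
        if rec["kind"] == "connection":
            # Later events overwrite earlier ones, so a connection ID that
            # is reused after a socket closes maps to its newest owner.
            pid_by_conn[rec["conn_id"]] = rec["pid"]
        elif rec["kind"] == "packet":
            pid = rec.get("pid")
            if pid is None:
                pid = pid_by_conn.get(rec.get("conn_id"))
            yield rec, pid


capture = [
    {"kind": "connection", "conn_id": 7, "pid": 1234},
    {"kind": "packet", "conn_id": 7},               # Linux style: ID only
    {"kind": "packet", "conn_id": 9, "pid": 4321},  # process already known
    {"kind": "packet", "conn_id": 8},               # no matching record
]
pids = [pid for _packet, pid in correlate(capture)]
```

Only one small mapping is held during analysis, which is the work the
sensor avoids doing at capture time.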
So what are the benefits of these additions? By capturing and storing
the process information gathered during the execution and use of a
process/socket, the need to guess what application was responsible for a
packet is eliminated. By including the connection/socket record,
packets can be correlated to processes during analysis. This allows for
faster capture and eliminates the need to keep lookup records in memory
during capture, freeing more memory for use by other applications. It
also exposes applications which may be opening sockets without sending
or receiving anything. Connection records may also be used to provide
connection lifetimes for connectionless protocols, such as UDP.
Correlating Hone data from two machines communicating on the network
exposes a complete end-to-end view: host1 <-> process <-> connection <->
packet <-> connection <-> process <-> host2.
I look forward to working with you all on improving the PCAP-NG
specification.
Thank you,
Brandon Carpenter