[pcap-ng-format] Hone introduction and PCAP-NG additions

Tue Jun 26 12:54:46 PDT 2012

Hello,

Richard asked that I provide some information on the additions PNNL made 
to the PCAP-NG format to support Hone data.  Below is a quick 
introduction and explanation of the changes.  More information can be 
found on our github page [https://github.com/HoneProject/Linux-Sensor].

Hone is a tool for performing packet-process correlation.  It does so by 
intercepting process and socket creation and destruction events, 
correlating the two, and then bridging the gap in the network stack to 
associate the packet to the socket, thereby expressing the full 
relationship from process to packet: a process creates a socket that 
sends and receives packets.  An open source Linux kernel module is 
currently available on github, along with a fuller description and 
instructions for building and using the tool.

When developing the tool, we wanted to use a dump format that was well 
supported and that would allow Hone captures to be used within Wireshark 
(and other analysis tools) with minimal fuss.  The original PCAP dump 
format could have worked, but would have required much hacking. 
PCAP-NG, on the other hand, was perfect and could easily store our 
additional data blocks without breaking older/unmodified versions of 
Wireshark (or other tools with support for PCAP-NG).

Hone adds some options to existing block types as well as two new block 
types.  These additions only extend the original PCAP-NG specification; 
they do not change any existing blocks or options.  They are described 
completely on the github wiki 
[https://github.com/HoneProject/Linux-Sensor/wiki/Augmented-PCAP-Next-Generation-Dump-File-Format].

The first addition is a GUID option in the section header block that 
provides a method of uniquely identifying the capture source.  In 
hindsight, I think it would be better to make it a general ID field of 
variable length, possibly with the first byte indicating the ID type 
(similar to the epb_hash option), followed by the appropriate data.
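
To make the variable-length ID idea concrete, here is a minimal sketch 
in Python of how such an option could be packed and unpacked.  It uses 
the usual PCAP-NG option layout (16-bit code, 16-bit length, value 
padded to 32 bits).  The option code and the ID type value are 
placeholders I made up for illustration; they are not taken from the 
specification or from the Hone wiki.

```python
import struct

# Placeholder values for illustration only; neither comes from the
# PCAP-NG specification or from the Hone wiki.
OPT_SOURCE_ID = 0x8001   # hypothetical option code
ID_TYPE_GUID = 0x01      # hypothetical tag meaning "16-byte GUID follows"


def encode_id_option(id_type: int, id_data: bytes,
                     code: int = OPT_SOURCE_ID) -> bytes:
    """Pack one option: code, length, type byte + ID data, zero padding."""
    value = bytes([id_type]) + id_data
    # The length field counts the value only; padding to a 32-bit
    # boundary is added afterwards and is not included in the length.
    padding = b"\x00" * (-len(value) % 4)
    # Little-endian is assumed here; a real writer would follow the
    # byte order of the enclosing section.
    return struct.pack("<HH", code, len(value)) + value + padding


def decode_id_option(buf: bytes):
    """Return (code, id_type, id_data) from one encoded option."""
    code, length = struct.unpack_from("<HH", buf, 0)
    value = buf[4:4 + length]
    return code, value[0], value[1:]
```

A reader that does not recognize the type byte can still skip the 
option using the length field, which is the main attraction of this 
layout.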

The second addition is a process event block used to describe processes 
running on a system.  The process ID (PID) and a timestamp are required 
and options include the event (exec, fork, or exit), the full path of 
the executable, the arguments, the PID of the parent, and user IDs and 
names.  While working on support for these additions in Wireshark, I 
realized that the timestamps, which should work like timestamps on 
enhanced packet blocks, need an interface description to supply the 
offset (if_tsoffset) and resolution (if_tsresol).  I am assuming that the 
offset and resolution values would be the same for all interfaces in a 
section and that using the first interface should be sufficient.  Does 
that sound reasonable?  The option of directly associating the process 
with an interface seems all wrong. Perhaps there should be global 
timestamp options in the section header block which can be overridden by 
interfaces for packet blocks?  Or perhaps my assumption should become 
the spec for this block type?  The Linux sensor does not capture packets 
off the interface, but uses the netfilter framework instead; so a 
pseudo-interface is the first and only interface description block in 
the section.  But that might not be the case for the Windows sensor.
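
For reference, the conversion in question can be sketched as follows 
in Python.  It assumes the process event timestamp is split into high 
and low 32-bit halves like the one on enhanced packet blocks, and that 
the if_tsresol and if_tsoffset values are taken from the first 
interface description block in the section (my assumption above, not 
something the spec says).  The function names are only illustrative.

```python
from fractions import Fraction


def ticks_per_second(if_tsresol: int = 6) -> int:
    """Interpret the one-byte if_tsresol option (default 6: microseconds)."""
    if if_tsresol & 0x80:
        # High bit set: remaining 7 bits are a negative power of 2.
        return 2 ** (if_tsresol & 0x7F)
    # High bit clear: the value is a negative power of 10.
    return 10 ** if_tsresol


def event_time(ts_high: int, ts_low: int,
               if_tsresol: int = 6, if_tsoffset: int = 0) -> Fraction:
    """Return the event time in seconds as an exact fraction.

    if_tsresol and if_tsoffset are assumed to come from the first
    interface description block of the section.
    """
    ticks = (ts_high << 32) | ts_low
    # if_tsoffset is a whole number of seconds added to every timestamp.
    return Fraction(ticks, ticks_per_second(if_tsresol)) + if_tsoffset
```

With the defaults, event_time(0, 1500000) is 3/2 seconds.  If two 
interfaces in a section carried different values, the same block would 
decode to different times depending on which interface was chosen, 
which is why the question matters.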

The next addition is the connection event block.  A connection record is 
generated whenever a socket is created or closed and an identifying 
integer, unique over the lifetime of the socket, is stored along with 
the creating process's PID and a timestamp.  An optional event describes 
whether the socket is being created or destroyed.  The same timestamp 
problem exists with this block type as for the process event block.

Finally, two options are added to the enhanced packet block indicating 
the connection and/or process with which the packet is associated.  If 
only the connection ID is given (as is the case with the Linux sensor), 
the connection event records can be searched to find the process ID. 
Other sensors, such as the Windows version, may already know the process 
when the enhanced packet block is written and can include it, obviating 
the need to search for it.  It is also possible to exclude connection 
event blocks altogether to save space.
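
The lookup described above might look roughly like this in Python.  The 
records are plain dictionaries and the key names (connection_id, pid) 
are hypothetical stand-ins for already-parsed blocks, not names from 
the Hone format.  The sketch also assumes a connection ID is not reused 
within the capture; if IDs can be reused after a socket closes, the 
timestamps would have to be consulted as well.

```python
from typing import Dict, Iterable, Optional


def build_connection_index(connection_events: Iterable[dict]) -> Dict[int, int]:
    """Map each connection ID to the PID of the process that created it."""
    index: Dict[int, int] = {}
    for event in connection_events:
        # Keep the first PID seen for an ID (assumes IDs are not reused).
        index.setdefault(event["connection_id"], event["pid"])
    return index


def packet_pid(packet: dict, index: Dict[int, int]) -> Optional[int]:
    """Return the PID for a packet, or None if it cannot be determined."""
    # A sensor that already knows the process stores it on the packet.
    if packet.get("pid") is not None:
        return packet["pid"]
    # Otherwise fall back to the connection event records.
    return index.get(packet.get("connection_id"))
```

Building the index once and reusing it keeps the per-packet cost to a 
single dictionary lookup during analysis.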

So what are the benefits of these additions?  By capturing and storing 
the process information gathered during the execution and use of a 
process/socket, the need to guess what application was responsible for a 
packet is eliminated.  By including the connection/socket record, 
packets can be correlated to processes during analysis rather than 
during capture.  This allows for faster capture and eliminates the need 
to keep lookup records in memory while capturing, freeing more memory 
for use by other applications.  The connection records also expose 
applications that open sockets without sending or receiving anything, 
and they may be used to provide connection lifetimes for connectionless 
protocols, such as UDP.  Correlating Hone data from two machines 
communicating on the network exposes a complete end-to-end view: 
host1 <-> process <-> connection <-> packet <-> connection <-> process 
<-> host2.

I look forward to working with you all on improving the PCAP-NG 
specification.

Thank you,

Brandon Carpenter