Inspired by this StackOverflow question, “How do I sniff on a port for log messages using python?”, I decided to figure out how to capture and process packets in Python. It turns out to be quite easy once you work out the kinks, although the kinks were a pain to track down.
Sample output:

    $ sudo python capture.py
    10:27:44.016601 hello ('127.0.0.1', 61129)
    10:27:44.016614 hello ('127.0.0.1', 61129)
    10:27:54.019731 hello ('127.0.0.1', 61137)
    10:27:54.019741 hello ('127.0.0.1', 61137)

Note: It may look as if each packet is printed twice here, but what you’re really seeing is the same packet going out and then coming back in. The timestamps give it away. Also, if you have the OS X version of tcpdump, the -k ID option will print something like this: (lo0, out).
Basic, high-level usage:

    def print_tcp_data_from_filter(**kwargs):
        for timestamp, data in tcp_data_from_filter(**kwargs):
            print "{} {}".format(timestring(timestamp), data)

    # Only show packets containing actual data, i.e. no protocol-only
    # packets, coming from my server on port 9988.
    filter = 'tcp src port 9988 and (tcp[tcpflags] & tcp-push != 0)'
    print_tcp_data_from_filter(filter=filter)

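The timestring helper called above is not shown in this post. A minimal sketch of what it might look like, assuming the capture timestamps are floating-point seconds since the Unix epoch. The optional tz parameter is an addition for illustration, so the output can be pinned to a time zone, and this version is written for Python 3:

```python
from datetime import datetime, timezone


def timestring(timestamp, tz=None):
    """Format epoch seconds as HH:MM:SS.microseconds.

    Uses local time unless a tzinfo is passed (an assumption made for
    this sketch; the post's own helper may differ).
    """
    return datetime.fromtimestamp(timestamp, tz).strftime("%H:%M:%S.%f")


# Half a second after the epoch, rendered in UTC.
print(timestring(0.5, timezone.utc))
```

With tz left as None the result is in local time, which matches the look of the sample output at the top of the post.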
And then the actual generator:

    def tcp_data_from_filter(filter="", interface=None):
        # interface notes:
        #   iptap and pktap alone act like ",any" is appended
        #   'any' is a synonym for 'pktap,any'
        #   pktap and iptap do not work with promiscuous mode
        #   iptap seems to take no more than 23 characters
        #   pktap only takes 8 interfaces
        #   pcap.findalldevs() will return a list of interfaces
        # Using iptap makes coding easier since pcap will only
        # return the IP portion of the packet
        if not interface:
            interface="iptap"
        if DEBUG: print 'Capturing on interface(s):',interface
        # You must set timeout_ms. Not sure why the default doesn't work.
        pc = pcap.pcap(name=interface,       # default: None
                       snaplen=256 * 1024,   # default: 64k, but tcpdump uses 256k
                       timeout_ms=500)       # default: 500, but tcpdump uses 1000
        pc.setfilter(filter)
        for capture in pc:
            if not capture:
                continue
            timestamp, packet_data = capture
            if DEBUG: hex_dump_packet(packet_data)
            tcp_data = tcp_data_from_packet_data(packet_data)
            if tcp_data is not None:
                yield timestamp, tcp_data

Depending on the interface you capture from, you may receive Ethernet, Loopback, or IP packets. This means that a little deduction must be done to determine how to decode them. We figure this out once, and save the decoding function for all future packets. If you never need promiscuous mode[1], however, you can use the “iptap” interface and you will only ever receive IP packets.

    def determine_packet_function(packet_data):
        type_functions = [get_tcp_from_ethernet,
                          get_tcp_from_loopback,
                          get_tcp_from_ip]
        for fn in type_functions:
            if fn(packet_data) is not None:
                if DEBUG: print 'Packet type:', fn.__name__.split('_')[-1]
                return fn
        return None

Each of the decode functions returns the TCP data, or None if the packet does not parse as that type. Here’s the one for Ethernet packets:

    def get_tcp_from_ethernet(data):
        packet = dpkt.ethernet.Ethernet(data)
        if isinstance(packet.data, dpkt.ip.IP):
            return packet.data.data.data
        return None

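For comparison, roughly the same extraction can be done for a bare IPv4 packet (what iptap delivers) using only the standard library. This is a sketch under stated assumptions, not the get_tcp_from_ip from this script: the name tcp_payload_from_ip is made up, it handles IPv4 only, and it returns None for anything that is too short or is not TCP, following the data-or-None convention.

```python
import struct


def tcp_payload_from_ip(data):
    """Return the TCP payload of a raw IPv4 packet, or None if it does not parse."""
    data = bytearray(data)                  # indexing yields ints
    if len(data) < 20 or data[0] >> 4 != 4:
        return None                         # too short, or not IPv4
    ip_header_len = (data[0] & 0x0F) * 4    # IHL is counted in 32-bit words
    if ip_header_len < 20 or data[9] != 6:  # protocol 6 is TCP
        return None
    total_len = struct.unpack_from("!H", data, 2)[0]
    if total_len > len(data) or len(data) < ip_header_len + 20:
        return None                         # truncated capture
    # The upper four bits of TCP byte 12 hold the header length in words.
    tcp_header_len = (data[ip_header_len + 12] >> 4) * 4
    payload_start = ip_header_len + tcp_header_len
    if tcp_header_len < 20 or payload_start > total_len:
        return None
    return bytes(data[payload_start:total_len])


# A UDP packet (protocol 17) is rejected rather than misread as TCP.
udp_like = bytes([0x45, 0, 0, 28, 0, 0, 0, 0, 64, 17]) + bytes(18)
print(tcp_payload_from_ip(udp_like))
```

Unlike the dpkt version, this checks the protocol field explicitly, so a non-TCP packet cannot slip through as data.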
As a bonus, here are a couple of functions that print out binary data similar to the hexdump -C command, but without the trailing ASCII.

    def print_hex_string_nicely(hex_string):
        index = 0
        result = ''
        while hex_string:
            result += '{:08x}: '.format(index)
            index += 16
            line, hex_string = hex_string[:32], hex_string[32:]
            while line:
                two_bytes, line = line[:4], line[4:]
                if two_bytes:
                    result += two_bytes + ' '
            result = result[:-1] + '\n'
        print result

    def hex_dump_packet(packet_data):
        print_hex_string_nicely(binascii.hexlify(packet_data))

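Under Python 3, binascii.hexlify returns bytes rather than str, so the string concatenation above would fail there. A sketch of an equivalent formatter for Python 3 follows. The name format_hex_dump is hypothetical; it returns the text instead of printing it and leaves off the trailing blank line:

```python
import binascii


def format_hex_dump(packet_data):
    """Format bytes as offset-prefixed rows of 16 bytes in 2-byte groups."""
    hex_string = binascii.hexlify(packet_data).decode("ascii")
    lines = []
    for start in range(0, len(hex_string), 32):      # 32 hex digits = 16 bytes
        chunk = hex_string[start:start + 32]
        groups = [chunk[i:i + 4] for i in range(0, len(chunk), 4)]
        # start counts hex digits, so the byte offset is half of it.
        lines.append("{:08x}: {}".format(start // 2, " ".join(groups)))
    return "\n".join(lines)


print(format_hex_dump(b"hello ('127.0.0.1', 61867)"))
```

Returning a string keeps the formatting separate from the printing, which also makes the function easy to check against a known dump.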
With debugging turned on, the output is much more verbose:

    Capturing on interface(s): iptap
    00000000: 4500 004e 0f7a 4000 4006 0000 7f00 0001
    00000010: 7f00 0001 2704 f1ab 5a99 b667 c79a 4159
    00000020: 8018 31d7 fe42 0000 0101 080a 4969 b251
    00000030: 4969 b251 6865 6c6c 6f20 2827 3132 372e
    00000040: 302e 302e 3127 2c20 3631 3836 3729

    Packet type: ip
    10:45:44.428427 hello ('127.0.0.1', 61867)
    00000000: 4500 004e 0f7a 4000 4006 2d2e 7f00 0001
    00000010: 7f00 0001 2704 f1ab 5a99 b667 c79a 4159
    00000020: 8018 31d7 fe42 0000 0101 080a 4969 b251
    00000030: 4969 b251 6865 6c6c 6f20 2827 3132 372e
    00000040: 302e 302e 3127 2c20 3631 3836 3729

    10:45:44.428447 hello ('127.0.0.1', 61867)

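The two dumps can be decoded by hand to see what the filter matched. The sketch below (Python 3, standard library only; describe and ip_checksum are names made up for this example) pulls out the TCP ports, the PSH flag and the payload, and recomputes the IPv4 header checksum. Both packets come from port 9988 with PSH set. The only difference between them is the checksum field: it is zero in the first dump and 0x2d2e in the second, and 0x2d2e is the value the header should carry. That is consistent with the first being the outgoing copy, captured before the checksum was filled in, though that reading is an inference rather than something the capture reports.

```python
import struct

# First packet from the debug output, copied from the hex dump.
FIRST = bytes.fromhex(
    "4500 004e 0f7a 4000 4006 0000 7f00 0001"
    "7f00 0001 2704 f1ab 5a99 b667 c79a 4159"
    "8018 31d7 fe42 0000 0101 080a 4969 b251"
    "4969 b251 6865 6c6c 6f20 2827 3132 372e"
    "302e 302e 3127 2c20 3631 3836 3729"
)
# The second dump is identical except for bytes 10-11 (the IPv4 checksum).
SECOND = FIRST[:10] + bytes.fromhex("2d2e") + FIRST[12:]


def ip_checksum(header):
    """Ones'-complement checksum of an IPv4 header, checksum field zeroed."""
    header = bytearray(header)
    header[10:12] = b"\x00\x00"
    total = sum(struct.unpack("!%dH" % (len(header) // 2), bytes(header)))
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def describe(packet):
    """Pick a few IPv4/TCP fields out of a raw IP packet."""
    ihl = (packet[0] & 0x0F) * 4             # IP header length in bytes
    src_port, dst_port = struct.unpack_from("!HH", packet, ihl)
    tcp_header_len = (packet[ihl + 12] >> 4) * 4
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "psh": bool(packet[ihl + 13] & 0x08),  # what tcp-push tests for
        "stored_checksum": struct.unpack_from("!H", packet, 10)[0],
        "computed_checksum": ip_checksum(packet[:ihl]),
        "payload": packet[ihl + tcp_header_len:],
    }


for packet in (FIRST, SECOND):
    print(describe(packet))
```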
[1] Ethernet interfaces normally filter out any network traffic not destined for the current machine. Setting the interface to promiscuous mode tells it to pass through all packets it receives.