As I mentioned before, I do a fair amount of work in the Net/Sec space as well. While most of the time in Net/Sec we think of active attacks, Pen Testing and red team activities, there's also passive recon, which can provide some good information on our targets. There are a lot of excellent tools out there which can do the job for us and gather information. However, I feel that you should know why you're looking at something, not just what you're looking at.
While I don't do a lot of DevOps myself, I work alongside application developers who are great at what they do. What (some of them) aren't that great at is understanding the network components and connectivity. If you're programming a network-connected application, using packet captures will help you not only understand what's being sent across the wire, but also troubleshoot and debug your application from a network point of view.
System administrators can benefit from packet captures as well, both from an information perspective and from a troubleshooting perspective. Let's say you're in charge of a server and want to know what is being sent out of the system for security purposes. You can use packet captures to examine what your server is sending out. From there, you can track down any offending application and deal with it. Packet captures are also useful in service troubleshooting: when something's not working, they give you more information to combine with service debugs and host-based firewall rules.
Let's start with something simple. Take a look at the following (extremely basic) network diagram:
It's hard to get more basic than this. Here we have a webserver (10.20.30.40) directly connected (no switch) to a firewall (10.20.30.254). On the other side of the firewall we have another network where the firewall has 1.2.3.254 and a client using 1.2.3.4 as their IP address. Although not shown in the diagram, we're using a /24 (255.255.255.0) subnet mask for both networks and our webserver is listening on the standard TCP/80 port. We're also not concerned with DNS resolution for now. Our objective is to have the client connect through the firewall to the webserver to load a page with a funny cat picture. 'Cause why not?
Let's get our microscopes out and really dig into this. Yes, this is pretty long but it's very (!) important to understand. Here's what should happen:
Well... That was a mouthful. Let's look at it again but this time with happy little pictures to make things a little easier to digest. :) The client will be in red, the firewall in blue and the server in green.
Keep in mind that this happens every time a packet needs to cross a routing device (with the exception of ARP/ND cache).
Below are actual packet captures to illustrate what happened. Just note that the clocks on the three machines are not synchronized, so the timestamps don't line up between captures and some packets may appear to arrive before they were sent:
Client sends an ARP request for 1.2.3.254 - Take note of the destination MAC address being the Layer-2 broadcast address:
11:44:11.990523 0a:00:27:00:00:00 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 1.2.3.254 tell 1.2.3.4, length 28
0x0000: 0001 0800 0604 0001 0a00 2700 0000 0102 ..........'.....
0x0010: 0304 0000 0000 0000 0102 03fe ............
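If you'd like to see how those hex bytes map onto the ARP fields, here's a minimal Python sketch that decodes the 28-byte ARP payload exactly as tcpdump printed it above. The function name parse_arp is my own invention for illustration, and it assumes Ethernet/IPv4 ARP only (6-byte hardware addresses, 4-byte protocol addresses):

```python
import struct

def parse_arp(payload_hex: str) -> dict:
    """Decode an Ethernet/IPv4 ARP payload given as the hex printed by tcpdump -X."""
    # Only the first 28 bytes are ARP; anything after that is Ethernet padding.
    raw = bytes.fromhex(payload_hex.replace(" ", ""))[:28]
    htype, ptype, hlen, plen, oper = struct.unpack("!HHBBH", raw[:8])
    sha, spa, tha, tpa = raw[8:14], raw[14:18], raw[18:24], raw[24:28]

    def mac(b: bytes) -> str:
        return ":".join(f"{x:02x}" for x in b)

    def ip(b: bytes) -> str:
        return ".".join(str(x) for x in b)

    return {
        "hardware_type": htype,          # 1 = Ethernet
        "protocol_type": hex(ptype),     # 0x800 = IPv4
        "operation": {1: "request", 2: "reply"}.get(oper, oper),
        "sender_mac": mac(sha), "sender_ip": ip(spa),
        "target_mac": mac(tha), "target_ip": ip(tpa),
    }

# Payload bytes copied from the client's ARP request above.
request = parse_arp(
    "0001 0800 0604 0001 0a00 2700 0000 0102 0304 0000 0000 0000 0102 03fe"
)
for field, value in request.items():
    print(f"{field:14} {value}")
```

The target MAC is all zeros in a request because that's the value the client is asking for. Feed the function the reply's bytes and you'll see the operation change and the firewall's MAC show up as the sender.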
The firewall sees this ARP request and then replies to it. Again, pay attention to the Layer-2 address as the firewall will only reply to the requestor:
11:44:10.573901 0a:00:27:00:00:00 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 1.2.3.254 tell 1.2.3.4, length 46
0x0000: 0001 0800 0604 0001 0a00 2700 0000 0102 ..........'.....
0x0010: 0304 0000 0000 0000 0102 03fe 0000 0000 ................
0x0020: 0000 0000 0000 0000 0000 0000 0000 ..............
11:44:10.573915 08:00:27:09:fb:93 > 0a:00:27:00:00:00, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 1.2.3.254 is-at 08:00:27:09:fb:93, length 28
0x0000: 0001 0800 0604 0002 0800 2709 fb93 0102 ..........'.....
0x0010: 03fe 0a00 2700 0000 0102 0304 ....'.......
The client receives the ARP reply:
11:44:11.990707 08:00:27:09:fb:93 > 0a:00:27:00:00:00, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Reply 1.2.3.254 is-at 08:00:27:09:fb:93, length 46
0x0000: 0001 0800 0604 0002 0800 2709 fb93 0102 ..........'.....
0x0010: 03fe 0a00 2700 0000 0102 0304 0000 0000 ....'...........
0x0020: 0000 0000 0000 0000 0000 0000 0000 ..............
Once received, the client sends the first SYN packet. Notice the destination MAC address is that of the firewall:
11:44:11.990713 0a:00:27:00:00:00 > 08:00:27:09:fb:93, ethertype IPv4 (0x0800), length 74: (tos 0x10, ttl 64, id 34414, offset 0, flags [DF], proto TCP (6), length 60)
1.2.3.4.39308 > 10.20.30.40.80: Flags [S], cksum 0x8dda (correct), seq 407892468, win 29200, options [mss 1460,sackOK,TS val 1669018 ecr 0,nop,wscale 7], length 0
0x0000: 4510 003c 866e 4000 4006 87fc 0102 0304 E..<.n@.@.......
0x0010: 0a14 1e28 998c 0050 184f f1f4 0000 0000 ...(...P.O......
0x0020: a002 7210 8dda 0000 0204 05b4 0402 080a ..r.............
0x0030: 0019 779a 0000 0000 0103 0307 ..w.........
The firewall receives this packet from the client:
11:44:10.574019 0a:00:27:00:00:00 > 08:00:27:09:fb:93, ethertype IPv4 (0x0800), length 74: (tos 0x10, ttl 64, id 34414, offset 0, flags [DF], proto TCP (6), length 60)
1.2.3.4.39308 > 10.20.30.40.80: Flags [S], cksum 0x8dda (correct), seq 407892468, win 29200, options [mss 1460,sackOK,TS val 1669018 ecr 0,nop,wscale 7], length 0
0x0000: 4510 003c 866e 4000 4006 87fc 0102 0304 E..<.n@.@.......
0x0010: 0a14 1e28 998c 0050 184f f1f4 0000 0000 ...(...P.O......
0x0020: a002 7210 8dda 0000 0204 05b4 0402 080a ..r.............
0x0030: 0019 779a 0000 0000 0103 0307 ..w.........
After the firewall checks its routing table, it knows that it has to send the packet out the other interface. Since the destination is on a directly connected network, the firewall sends out another ARP request:
11:43:54.534937 08:00:27:ea:2e:72 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 10.20.30.254 tell 10.20.30.40, length 46
0x0000: 0001 0800 0604 0001 0800 27ea 2e72 0a14 ..........'..r..
0x0010: 1e28 0000 0000 0000 0a14 1efe 0000 0000 .(..............
0x0020: 0000 0000 0000 0000 0000 0000 0000 ..............
The server receives it:
11:43:54.365615 08:00:27:4e:2d:4e > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 10.20.30.40 tell 10.20.30.254, length 46
0x0000: 0001 0800 0604 0001 0800 274e 2d4e 0a14 ..........'N-N..
0x0010: 1efe 0000 0000 0000 0a14 1e28 0000 0000 ...........(....
0x0020: 0000 0000 0000 0000 0000 0000 0000 ..............
And the reply is sent from the server:
11:43:54.365971 08:00:27:4e:2d:4e > 08:00:27:ea:2e:72, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Reply 10.20.30.254 is-at 08:00:27:4e:2d:4e, length 46
0x0000: 0001 0800 0604 0002 0800 274e 2d4e 0a14 ..........'N-N..
0x0010: 1efe 0800 27ea 2e72 0a14 1e28 0000 0000 ....'..r...(....
0x0020: 0000 0000 0000 0000 0000 0000 0000 ..............
And the firewall receives it:
11:43:54.534971 08:00:27:4e:2d:4e > 08:00:27:ea:2e:72, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 10.20.30.254 is-at 08:00:27:4e:2d:4e, length 28
0x0000: 0001 0800 0604 0002 0800 274e 2d4e 0a14 ..........'N-N..
0x0010: 1efe 0800 27ea 2e72 0a14 1e28 ....'..r...(
The firewall then takes the original SYN packet and sends it to the server:
11:44:10.574026 08:00:27:4e:2d:4e > 08:00:27:ea:2e:72, ethertype IPv4 (0x0800), length 74: (tos 0x10, ttl 63, id 34414, offset 0, flags [DF], proto TCP (6), length 60)
1.2.3.4.39308 > 10.20.30.40.80: Flags [S], cksum 0x8dda (correct), seq 407892468, win 29200, options [mss 1460,sackOK,TS val 1669018 ecr 0,nop,wscale 7], length 0
0x0000: 4510 003c 866e 4000 3f06 88fc 0102 0304 E..<.n@.?.......
0x0010: 0a14 1e28 998c 0050 184f f1f4 0000 0000 ...(...P.O......
0x0020: a002 7210 8dda 0000 0204 05b4 0402 080a ..r.............
0x0030: 0019 779a 0000 0000 0103 0307 ..w.........
And the server receives it:
11:44:10.405495 08:00:27:4e:2d:4e > 08:00:27:ea:2e:72, ethertype IPv4 (0x0800), length 74: (tos 0x10, ttl 63, id 34414, offset 0, flags [DF], proto TCP (6), length 60)
1.2.3.4.39308 > 10.20.30.40.80: Flags [S], cksum 0x8dda (correct), seq 407892468, win 29200, options [mss 1460,sackOK,TS val 1669018 ecr 0,nop,wscale 7], length 0
0x0000: 4510 003c 866e 4000 3f06 88fc 0102 0304 E..<.n@.?.......
0x0010: 0a14 1e28 998c 0050 184f f1f4 0000 0000 ...(...P.O......
0x0020: a002 7210 8dda 0000 0204 05b4 0402 080a ..r.............
0x0030: 0019 779a 0000 0000 0103 0307 ..w.........
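Compare the SYN on the client side with the SYN on the server side and you'll spot two differences in the IP header: the TTL dropped from 64 to 63 and the header checksum changed from 0x87fc to 0x88fc. Here's a small Python sketch (the function name is mine, and it only handles the 20-byte header without IP options that we see in these captures) which recomputes the checksum from the captured bytes:

```python
def ipv4_header_checksum(header: bytes) -> int:
    """Ones'-complement checksum of an IPv4 header, checksum field treated as zero."""
    data = header[:10] + b"\x00\x00" + header[12:]
    total = sum(int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    while total >> 16:                      # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# First 20 bytes of the SYN as captured on the client and on the server.
at_client = bytes.fromhex("4510003c866e4000400687fc010203040a141e28")
at_server = bytes.fromhex("4510003c866e40003f0688fc010203040a141e28")

for name, hdr in (("client side", at_client), ("server side", at_server)):
    ttl = hdr[8]
    stored = int.from_bytes(hdr[10:12], "big")
    computed = ipv4_header_checksum(hdr)
    print(f"{name}: ttl={ttl} stored={stored:#06x} computed={computed:#06x}")
```

Since the firewall changed the TTL, it had to write a new checksum as well. Everything else in the header (and the entire TCP segment) is untouched, which is why the TCP checksum 0x8dda is the same on both sides.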
Since the port is open and there is no firewalling involved, the server sends a SYN/ACK packet. Take a look at the ACK number, which is the client's Initial Sequence Number (ISN) plus one: 407892468+1 (407892469):
11:44:10.405548 08:00:27:ea:2e:72 > 08:00:27:4e:2d:4e, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.20.30.40.80 > 1.2.3.4.39308: Flags [S.], cksum 0x2c70 (incorrect -> 0x88ca), seq 74455689, ack 407892469, win 28960, options [mss 1460,sackOK,TS val 2206557040 ecr 1669018,nop,wscale 7], length 0
0x0000: 4500 003c 0000 4000 4006 0e7b 0a14 1e28 E..<..@.@..{...(
0x0010: 0102 0304 0050 998c 0470 1a89 184f f1f5 .....P...p...O..
0x0020: a012 7120 2c70 0000 0204 05b4 0402 080a ..q.,p..........
0x0030: 8385 6370 0019 779a 0103 0307 ..cp..w.....
The firewall receives this packet on its interface and routes it back toward the client on the other interface:
11:44:10.574245 08:00:27:ea:2e:72 > 08:00:27:4e:2d:4e, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.20.30.40.80 > 1.2.3.4.39308: Flags [S.], cksum 0x88ca (correct), seq 74455689, ack 407892469, win 28960, options [mss 1460,sackOK,TS val 2206557040 ecr 1669018,nop,wscale 7], length 0
0x0000: 4500 003c 0000 4000 4006 0e7b 0a14 1e28 E..<..@.@..{...(
0x0010: 0102 0304 0050 998c 0470 1a89 184f f1f5 .....P...p...O..
0x0020: a012 7120 88ca 0000 0204 05b4 0402 080a ..q.............
0x0030: 8385 6370 0019 779a 0103 0307 ..cp..w.....
And:
11:44:10.574249 08:00:27:09:fb:93 > 0a:00:27:00:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.20.30.40.80 > 1.2.3.4.39308: Flags [S.], cksum 0x88ca (correct), seq 74455689, ack 407892469, win 28960, options [mss 1460,sackOK,TS val 2206557040 ecr 1669018,nop,wscale 7], length 0
0x0000: 4500 003c 0000 4000 3f06 0f7b 0a14 1e28 E..<..@.?..{...(
0x0010: 0102 0304 0050 998c 0470 1a89 184f f1f5 .....P...p...O..
0x0020: a012 7120 88ca 0000 0204 05b4 0402 080a ..q.............
0x0030: 8385 6370 0019 779a 0103 0307 ..cp..w.....
The client receives it and then ACKs the server's SYN:
11:44:11.991079 08:00:27:09:fb:93 > 0a:00:27:00:00:00, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.20.30.40.80 > 1.2.3.4.39308: Flags [S.], cksum 0x88ca (correct), seq 74455689, ack 407892469, win 28960, options [mss 1460,sackOK,TS val 2206557040 ecr 1669018,nop,wscale 7], length 0
0x0000: 4500 003c 0000 4000 3f06 0f7b 0a14 1e28 E..<..@.?..{...(
0x0010: 0102 0304 0050 998c 0470 1a89 184f f1f5 .....P...p...O..
0x0020: a012 7120 88ca 0000 0204 05b4 0402 080a ..q.............
0x0030: 8385 6370 0019 779a 0103 0307 ..cp..w.....
11:44:11.991149 0a:00:27:00:00:00 > 08:00:27:09:fb:93, ethertype IPv4 (0x0800), length 66: (tos 0x10, ttl 64, id 34415, offset 0, flags [DF], proto TCP (6), length 52)
1.2.3.4.39308 > 10.20.30.40.80: Flags [.], cksum 0x27d2 (correct), seq 1, ack 1, win 229, options [nop,nop,TS val 1669018 ecr 2206557040], length 0
0x0000: 4510 0034 866f 4000 4006 8803 0102 0304 E..4.o@.@.......
0x0010: 0a14 1e28 998c 0050 184f f1f5 0470 1a8a ...(...P.O...p..
0x0020: 8010 00e5 27d2 0000 0101 080a 0019 779a ....'.........w.
0x0030: 8385 6370 ..cp
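tcpdump shows "seq 1, ack 1" on that last ACK because it prints sequence numbers relative to each side's ISN once it has seen the SYNs. The absolute values are still sitting in the hex. Here's a rough Python sketch (check_handshake is a name I made up) that takes the raw values from the three packets above and checks that they line up the way a three-way handshake should:

```python
MOD = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around

def check_handshake(syn_seq, synack_seq, synack_ack, ack_seq, ack_ack) -> bool:
    """True when the SYN, SYN/ACK and ACK numbers are consistent with each other."""
    return (
        synack_ack == (syn_seq + 1) % MOD       # server acknowledges the client's SYN
        and ack_seq == (syn_seq + 1) % MOD      # client's SYN used up one sequence number
        and ack_ack == (synack_seq + 1) % MOD   # client acknowledges the server's SYN
    )

# Values read straight out of the hex dumps above.
syn_seq = 0x184FF1F4                             # SYN:     184f f1f4
synack_seq, synack_ack = 0x04701A89, 0x184FF1F5  # SYN/ACK: 0470 1a89 / 184f f1f5
ack_seq, ack_ack = 0x184FF1F5, 0x04701A8A        # ACK:     184f f1f5 / 0470 1a8a

print("client ISN:", syn_seq)
print("server ISN:", synack_seq)
print("handshake consistent:",
      check_handshake(syn_seq, synack_seq, synack_ack, ack_seq, ack_ack))
print("relative seq/ack on final ACK:",
      (ack_seq - syn_seq) % MOD, (ack_ack - synack_seq) % MOD)
```

If you'd rather see the absolute numbers in the tcpdump output itself, the -S option turns off the relative numbering.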
Now that we have the three-way TCP handshake completed, we can send data. For this testing, I just typed the word "Hello". See if you can find it in the ASCII output:
11:44:13.251291 0a:00:27:00:00:00 > 08:00:27:09:fb:93, ethertype IPv4 (0x0800), length 73: (tos 0x10, ttl 64, id 34416, offset 0, flags [DF], proto TCP (6), length 59)
1.2.3.4.39308 > 10.20.30.40.80: Flags [P.], cksum 0xf8a8 (correct), seq 1:8, ack 1, win 229, options [nop,nop,TS val 1669333 ecr 2206557040], length 7: HTTP
0x0000: 4510 003b 8670 4000 4006 87fb 0102 0304 E..;.p@.@.......
0x0010: 0a14 1e28 998c 0050 184f f1f5 0470 1a8a ...(...P.O...p..
0x0020: 8018 00e5 f8a8 0000 0101 080a 0019 78d5 ..............x.
0x0030: 8385 6370 4865 6c6c 6f0d 0a ..cpHello..
--I'm going to skip ahead and gloss over the firewall portion now since we've demonstrated how it works--
The server receives the packet and then acknowledges it:
11:44:11.666867 08:00:27:4e:2d:4e > 08:00:27:ea:2e:72, ethertype IPv4 (0x0800), length 73: (tos 0x10, ttl 63, id 34416, offset 0, flags [DF], proto TCP (6), length 59)
1.2.3.4.39308 > 10.20.30.40.80: Flags [P.], cksum 0xf8a8 (correct), seq 1:8, ack 1, win 229, options [nop,nop,TS val 1669333 ecr 2206557040], length 7: HTTP
0x0000: 4510 003b 8670 4000 3f06 88fb 0102 0304 E..;.p@.?.......
0x0010: 0a14 1e28 998c 0050 184f f1f5 0470 1a8a ...(...P.O...p..
0x0020: 8018 00e5 f8a8 0000 0101 080a 0019 78d5 ..............x.
0x0030: 8385 6370 4865 6c6c 6f0d 0a ..cpHello..
11:44:11.666943 08:00:27:ea:2e:72 > 08:00:27:4e:2d:4e, ethertype IPv4 (0x0800), length 66: (tos 0x0, ttl 64, id 53412, offset 0, flags [DF], proto TCP (6), length 52)
10.20.30.40.80 > 1.2.3.4.39308: Flags [.], cksum 0x2c68 (incorrect -> 0x21a5), seq 1, ack 8, win 227, options [nop,nop,TS val 2206558301 ecr 1669333], length 0
0x0000: 4500 0034 d0a4 4000 4006 3dde 0a14 1e28 E..4..@.@.=....(
0x0010: 0102 0304 0050 998c 0470 1a8a 184f f1fc .....P...p...O..
0x0020: 8010 00e3 2c68 0000 0101 080a 8385 685d ....,h........h]
0x0030: 0019 78d5 ..x.
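Notice that the ACK number (8) matches the end of the sequence range sent in the "Hello" packet (seq 1:8). TCP is a reliable protocol: the ACK number is the next byte the receiver expects to see. If the ACK value were 4 instead, the client would know that only the first three bytes arrived, that the remaining four were not received, and that they need to be retransmitted.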
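To make the arithmetic concrete, here's a tiny Python sketch using the relative numbers tcpdump printed. The helper name unacked_bytes is hypothetical and it ignores sequence number wraparound to keep things short:

```python
def unacked_bytes(seq_start: int, payload_len: int, ack: int) -> int:
    """Bytes of this segment that the receiver has not acknowledged yet."""
    seq_end = seq_start + payload_len          # tcpdump prints the pair as "seq 1:8"
    return max(0, seq_end - max(ack, seq_start))

payload = b"Hello\r\n"                         # what went across the wire: 7 bytes
print("payload length:", len(payload))
print("ack 8 ->", unacked_bytes(1, len(payload), 8), "bytes outstanding")
print("ack 4 ->", unacked_bytes(1, len(payload), 4), "bytes outstanding")
```

The carriage return and line feed count as data too, which is why typing five letters produced a length of 7.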
To close the connection, our client sends a FIN packet which is then FIN/ACK'd by the server. The following three packets are only shown from the client side to keep this a bit shorter:
11:44:14.415770 0a:00:27:00:00:00 > 08:00:27:09:fb:93, ethertype IPv4 (0x0800), length 66: (tos 0x10, ttl 64, id 34417, offset 0, flags [DF], proto TCP (6), length 52)
1.2.3.4.39308 > 10.20.30.40.80: Flags [F.], cksum 0x207f (correct), seq 8, ack 1, win 229, options [nop,nop,TS val 1669624 ecr 2206558301], length 0
0x0000: 4510 0034 8671 4000 4006 8801 0102 0304 E..4.q@.@.......
0x0010: 0a14 1e28 998c 0050 184f f1fc 0470 1a8a ...(...P.O...p..
0x0020: 8011 00e5 207f 0000 0101 080a 0019 79f8 ..............y.
0x0030: 8385 685d ..h]
11:44:14.417944 08:00:27:09:fb:93 > 0a:00:27:00:00:00, ethertype IPv4 (0x0800), length 66: (tos 0x0, ttl 63, id 53413, offset 0, flags [DF], proto TCP (6), length 52)
10.20.30.40.80 > 1.2.3.4.39308: Flags [F.], cksum 0x1bf3 (correct), seq 1, ack 9, win 227, options [nop,nop,TS val 2206559466 ecr 1669624], length 0
0x0000: 4500 0034 d0a5 4000 3f06 3edd 0a14 1e28 E..4..@.?.>....(
0x0010: 0102 0304 0050 998c 0470 1a8a 184f f1fd .....P...p...O..
0x0020: 8011 00e3 1bf3 0000 0101 080a 8385 6cea ..............l.
0x0030: 0019 79f8 ..y.
11:44:14.418039 0a:00:27:00:00:00 > 08:00:27:09:fb:93, ethertype IPv4 (0x0800), length 66: (tos 0x10, ttl 64, id 34418, offset 0, flags [DF], proto TCP (6), length 52)
1.2.3.4.39308 > 10.20.30.40.80: Flags [.], cksum 0x1bf0 (correct), seq 9, ack 2, win 229, options [nop,nop,TS val 1669625 ecr 2206559466], length 0
0x0000: 4510 0034 8672 4000 4006 8800 0102 0304 E..4.r@.@.......
0x0010: 0a14 1e28 998c 0050 184f f1fd 0470 1a8b ...(...P.O...p..
0x0020: 8010 00e5 1bf0 0000 0101 080a 0019 79f9 ..............y.
0x0030: 8385 6cea ..l.
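Every flag tcpdump printed in these captures ([S], [S.], [.], [P.] and [F.]) comes from a single 16-bit word in the TCP header, which you can find as the first four hex digits on each 0x0020 line. The sketch below decodes that word in Python. The names FLAG_BITS and decode_flags are mine, and I've left out the less common ECN-related bits:

```python
FLAG_BITS = [
    (0x01, "FIN"), (0x02, "SYN"), (0x04, "RST"),
    (0x08, "PSH"), (0x10, "ACK"), (0x20, "URG"),
]

def decode_flags(word_hex: str):
    """Split the data-offset/flags word into (TCP header length in bytes, flag names)."""
    word = int(word_hex, 16)
    header_len = (word >> 12) * 4              # data offset is counted in 32-bit words
    flags = [name for bit, name in FLAG_BITS if word & bit]
    return header_len, flags

# Words copied from the 0x0020 line of the packets shown in this section.
for label, word in [("SYN", "a002"), ("SYN/ACK", "a012"), ("ACK", "8010"),
                    ("Hello", "8018"), ("FIN", "8011")]:
    header_len, flags = decode_flags(word)
    print(f"{label:8} 0x{word}  header={header_len} bytes  flags={'+'.join(flags)}")
```

Notice that the SYN and SYN/ACK have 40-byte TCP headers while everything after has 32 bytes. That's because the MSS, SACK-permitted and window scale options are only sent during the handshake.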
When It Doesn't Work
This is why we're here. If it doesn't work, we need to figure out why. There are a lot of parts in this simple example but where do we start? We start at the first point we are able to capture packets. In this example, let's say we control all three items in the equation: The client, the firewall and the server. From a networking perspective, the first thing we need to look at is the ARP request from the client. Assuming the network interface is eth0 on the client, here's what I would run:
tcpdump -nn -vvv -e -s 0 -X -c 100 -i eth0 arp
Let's break this down a bit:
- tcpdump - This is the command to run.
- -nn - Do not resolve hostnames or service names. This will change alice.http to 1.2.3.4.80
- -vvv - Be as verbose as possible.
- -e - Print the Ethernet headers in the packet
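- -s 0 - Set the snaplength of the capture so that whole packets are captured instead of being truncated. On older tcpdump versions this means 65535 bytes; current versions treat 0 as the default of 262144 bytes.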
- -X - Print hexadecimal output on the left and ASCII output on the right.
- -c 100 - Capture 100 matching packets and then stop.
- -i eth0 - Capture on the interface named eth0
- arp - This is our first filter we've encountered. This tells tcpdump to only capture packets which are ARP packets and ignore any other packets going across this interface. Since "arp" will match both the request (who-has ... tell) and the reply (is-at), both of which we're interested in to start with, this is the filter we want to use.
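If you find yourself typing the same options over and over, you can wrap them up. Here's a minimal Python sketch that assembles the command line described above. The function build_tcpdump_cmd is hypothetical, the defaults are simply the options from this document, and it only builds and prints the command (running it would need root and a real interface):

```python
import shlex

def build_tcpdump_cmd(interface: str, bpf_filter: str, count: int = 100) -> list:
    """Return the tcpdump argument list used throughout this document."""
    return [
        "tcpdump",
        "-nn",               # no hostname or service name resolution
        "-vvv",              # as verbose as possible
        "-e",                # print the Ethernet headers
        "-s", "0",           # do not truncate packets
        "-X",                # hex on the left, ASCII on the right
        "-c", str(count),    # stop after this many matching packets
        "-i", interface,     # interface to capture on
        *bpf_filter.split(), # the filter expression goes last
    ]

cmd = build_tcpdump_cmd("eth0", "arp")
print(shlex.join(cmd))
```

From here, only the interface and the filter change from one troubleshooting step to the next.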
Now that we have our filter in place, what are we looking for? We're looking for the output of "what should happen" and compare it with what we know. What should happen in this situation?
- The client should send out an ARP who-has packet.
- The firewall should send out an ARP reply (is-at) packet.
If the client doesn't send out an ARP who-has packet, what could some of the causes be for this?
- Maybe the client has the MAC address in its ARP cache. So let's check that with "arp -na | grep 1.2.3.254" on the client machine. If there is a matching (and correct) entry, then there's no reason for the client to ARP for the gateway.
- If there's no ARP cache entry, then there's a problem with the configuration on the machine. Check the routes with "netstat -nr" to ensure that the client has both a valid routing table and that there is a matching route for our destination (or, at least, a default gateway).
- If the routing is in place and there is no ARP cache on the machine, then there is a breakdown before the NIC is involved and we know that our first point of investigation is the client machine itself.
BAM! Packet captures have now just told you where the (first) problem is with your network and packets didn't even leave the machine. How's that for you? :)
Okay, but what if the ARP request goes out the NIC and it's still not working? Glad you asked. Take a look and see if you get an ARP reply. If there is no ARP reply, then there's an issue with either the intermediary connection between the client and the firewall (in this case, a cable) or there's an issue with the firewall. So let's look there next...
We'd log on to the firewall at this point and run the same command with the same "arp" filter. We're going to have one of two outcomes (you'll hear this a lot throughout the document):
- We see the ARP request come in or
- We won't see the ARP request come in.
That's it. If we don't see the ARP request come in but we see it leaving the client, then the most likely culprit is the intermediary device. But if we see the ARP request come in to the firewall, what should we be looking for next? The ARP reply (is-at) going from the firewall back to the client. Again, with our packet capture running, we're going to have one of two outcomes:
- The firewall responds to the ARP request or
- It doesn't.
If it doesn't reply to the ARP request, there's likely a configuration issue on the firewall. BAM! Just found another potential issue and investigation point. If it does reply but we don't see that on the client, the issue is with any intermediary device. BAM! Problem solved (so far).
Tangent Number One
I feel now is a good time to break off into a non-technical note about troubleshooting and getting things fixed. A lot (!) of times working in Net/Eng I've come across people who will be very stubborn when accepting that a device they control could be problematic. How many times have you heard this:
"It can't be my device since it's configured properly. It has to be an issue with the firewall so you need to fix it."
I'm guessing you have heard this more times than you care to count. Even more so when it turns out that their device was the issue. The problem is that people want to pass the buck so that they either don't have to do the work, don't seem like they're the problem, or both.
The issue with this mentality is that nothing will ever get solved. I'll be brutally honest here - If I've screwed up, I will accept the fault and move on with the learning experience. If I'm wrong, that's okay. If I have a professional disagreement with a colleague, we'll still go for coffee afterwards. But one thing I won't do, is pawn off work to other people until I've proven that my device isn't the issue. Here's my troubleshooting mantra:
"I assume the problem is mine and it's my job to prove it's not my problem."
By doing this, I accomplish a few things:
- I will validate whether or not this is a legitimate networking issue (using packet captures) and fix it if it is. At this point, you've identified the (first) issue and can start working on fixing the issue.
- I look good in front of customers. Seriously, this is a valid point. :)
- If it's not a networking issue, I have concrete proof that the issue is (currently) not one of my devices I control. Maybe it will be later in the flow, but right now, it's not.
Look at that last point again. I can take my packet captures, send them out over e-mail and nobody can refute that the packets are passing.
All four fields of technology I mentioned at the start of this document can (and should) adopt the above mantra and take ownership of their part of the troubleshooting process. Things will get fixed faster and less time is wasted when dealing with pure conjecture.
Back to it...
Now, where were we... Right. We have ARP established between the client and the firewall - Time to move on to our TCP SYN packet. Our client will need to send this packet to the firewall so let's make sure that the packet leaves properly and arrives at the firewall correctly. As per usual, we're going to run a packet capture but this time, just a little different:
tcpdump -nn -vvv -e -s 0 -X -c 100 -i eth0 host 10.20.30.40 and port 80
You'll see that we now have new filters at the end. The "host 10.20.30.40" tells tcpdump to only capture packets where the IPv4 address is 10.20.30.40 for either the source or destination. We've also stated that we want "port 80" which means that the packet has to have a Layer-4 port (source or destination) of 80. Notice that we used the operator "and" between the two. This logical AND means that both conditions must match for a packet to be captured and displayed.
The same rules as the ARP condition apply. However, now we're looking for a TCP packet matching our source and destination with the SYN flag set. We need to compare, again, what should happen with what we're actually seeing (or not seeing). If we don't see the TCP packet leave even though ARP has resolved properly, there's a problem with the client itself. Maybe there's a host-based firewall preventing our traffic from leaving. Assuming that the packet does, in fact, leave, we need to focus our attention on the next item in our chain, in this case, the firewall.
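If the source-or-destination part of the filter logic is hard to picture, here's a rough Python model of what "host 10.20.30.40 and port 80" accepts. This is only an illustration of the matching rules, not how tcpdump implements filters internally, and the Packet type and matches function are names I made up:

```python
from typing import NamedTuple

class Packet(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

def matches(pkt: Packet, host: str = "10.20.30.40", port: int = 80) -> bool:
    """Model of the filter 'host <host> and port <port>'."""
    host_ok = host in (pkt.src_ip, pkt.dst_ip)       # either direction matches
    port_ok = port in (pkt.src_port, pkt.dst_port)   # either direction matches
    return host_ok and port_ok                       # both conditions are required

syn = Packet("1.2.3.4", 39308, "10.20.30.40", 80)
syn_ack = Packet("10.20.30.40", 80, "1.2.3.4", 39308)
other_port = Packet("1.2.3.4", 39308, "10.20.30.40", 443)
other_host = Packet("1.2.3.4", 39308, "10.20.30.41", 80)

for name, pkt in [("SYN", syn), ("SYN/ACK", syn_ack),
                  ("TCP/443 to our server", other_port),
                  ("TCP/80 to another host", other_host)]:
    print(f"{name:24} captured={matches(pkt)}")
```

The useful takeaway is that one filter catches both directions of the conversation, so the same capture will show you the SYN going out and the SYN/ACK coming back.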
We can use the exact same packet capture syntax we used above to verify that the TCP packet arrives. Just like everything else, we're going to have one of two outcomes:
- The TCP packet arrives or
- The TCP packet doesn't arrive.
So if it doesn't arrive at the firewall but leaves the client, the issue is with an intermediary device. How can we be so sure of this? Because we've verified with a packet capture that the packet has left the client but not arrived at the firewall. Something in between is the issue.
Now, if the packet arrives, we're actually going to skip ahead just a little bit by running the same packet capture on the other interface on the firewall (in our example, eth1). We expect to see the packet leaving properly out this interface and, again, we have two possible outcomes:
- The TCP packet leaves properly or
- The TCP packet doesn't leave properly.
If it leaves properly, we're golden (for now). But if it doesn't, there are a few more things we need to check since we're now dealing with a routing device:
- Check for destination routes on the firewall. Using "netstat -nr" you will see where the packet is going to go. If the network is directly connected, your gateway may just be 0.0.0.0 but will list the interface it's going to.
- Check for any sort of packet filtering (firewalling) on the device. It is possible that there is a firewall rule preventing the traffic from passing. Take a look at your logs.
- If the routing and firewalling are correct, take a look at the ARP table on the firewall to see if 10.20.30.40 is listed there and, if it is, make sure the MAC address lines up with the actual MAC address of the server.
I'm not going to repeat the same steps as before but if traffic isn't passing properly on our firewall and the firewall rules allow the traffic to pass, start back at the ARP troubleshooting and work your way back here. As I've mentioned before, every time a packet goes somewhere, the routing steps are checked which will be followed by ARP and finally, pushing the packet to the next hop. With that being said, let's take a look at our last hop with this direction.
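The route-then-ARP decision described above can be sketched in a few lines of Python. This is a simplified model under my own assumptions: the function next_hop is hypothetical, the firewall's outside interface is assumed to be eth0, and it only knows about connected networks plus one default gateway (a real routing table does a longest-prefix match across all routes):

```python
import ipaddress

def next_hop(dst: str, connected: dict, default_gateway=None):
    """Return (interface, IP address to ARP for), or None when there is no route."""
    addr = ipaddress.ip_address(dst)
    for ifname, cidr in connected.items():
        if addr in ipaddress.ip_network(cidr):
            return ifname, dst              # directly connected: ARP for the destination
    if default_gateway is not None:
        return default_gateway              # otherwise: ARP for the gateway
    return None                             # no route, so the packet never leaves

# The client only knows its own /24 and its default gateway.
client = next_hop("10.20.30.40", {"eth0": "1.2.3.0/24"}, ("eth0", "1.2.3.254"))
# The firewall is directly connected to both networks.
firewall = next_hop("10.20.30.40", {"eth0": "1.2.3.0/24", "eth1": "10.20.30.0/24"})

print("client   ->", client)
print("firewall ->", firewall)
```

This is why the client ARPs for 1.2.3.254 while the firewall ARPs for 10.20.30.40 itself, even though both are handling a packet with the same destination IP.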
The packet should now be leaving eth1 of the firewall and we can do our packet capture on the web server itself. Again, all the same rules apply as before. The server checks its routing, finds that the packet is destined for itself and, instead of routing it, processes it. We expect that TCP/80 is in a LISTEN state (check with: netstat -nap | grep LIST | grep -v unix for all Layer-4 sockets in a LISTEN state and, if you want, pipe it through "grep 80" to get any ports with "80" in their port) and that there are no firewall rules on the web server blocking the traffic. If this is the case and everything works, the socket state will change from LISTEN to SYN_RECV (SYN-RECEIVED) and the server will reply with a SYN/ACK packet. But what if it doesn't...?
Check the log files (/var/log/apache2/error_log for example) of the application, take a look at the firewall rules on the host, and check the routing on the host to make sure that it's not multi-homed with the return packet leaving out the incorrect interface.
Once the SYN/ACK is sent and if you're still running your packet captures on the firewall, you should see this SYN/ACK being sent back across both interfaces. If you've stopped your packet captures on the firewall, start them back up (simultaneously would be best) with the filters we spoke about earlier (host 10.20.30.40 and port 80) to capture the return traffic.
Where are we now?
All of this we've just gone through has outlined one of the most basic connections I could write about... We have three boxes (A client, a routing device and a server) all directly connected with a cable (no switches), no Network Address Translation (NAT) and no firewall policies blocking anything. But let's be honest - There is nobody running anything remotely as basic as this. In the real world, we have switches, routers, NAT, PAT, GRE, dynamic routing, route redistribution, IPSec, NAC, wireless and on and on and on...
You may be thinking to yourself: "Gr@ve... When I look at my network or my customer's network, it has so many parts to it! How can I use packet captures to troubleshoot something so massive?" Glad you asked. Sometimes, not always, when you encounter large, foreboding networks, they're also messy. A mixture of static routes pushed into dynamic routing protocols, firewalls improperly configured, VSS partially set up and whatnot. Regardless of the size or complexity of the network, here is the flow chart for you. Remember: There are only ever two outcomes and with this in mind, everything is less daunting.
If you take the most complex, messy or otherwise daunting network, just take your troubleshooting hop-by-hop and you'll start to narrow down where the issues are. When dealing with unicast traffic, you've always got one source going to one destination. Sure, your packets may flow through load balancers or LACP bonded interfaces but when you boil it down, it comes back to one source and one destination.
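The hop-by-hop approach boils down to a very small loop. Here's a sketch in Python, where first_break is a hypothetical helper and the capture point names are just labels I picked for our three-box example. You feed it what you observed at each capture point, in path order, and it tells you where to dig:

```python
def first_break(observations) -> str:
    """observations: ordered list of (capture point, was the packet seen there?)."""
    previous = None
    for point, seen in observations:
        if not seen:
            if previous is None:
                return f"packet never showed up at {point}: investigate the sender"
            return f"packet lost between {previous} and {point}"
        previous = point
    return "packet seen at every capture point"

path = [
    ("client eth0", True),
    ("firewall eth0", True),
    ("firewall eth1", False),   # the SYN came in but never went out
    ("server eth0", False),
]
print(first_break(path))
```

It doesn't matter if the list has four entries or forty. You stop at the first "not seen" and that's where your next packet capture (and your next conversation) needs to happen.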
Where to go From Here
I strongly suggest getting a few Virtual Machines put together (VirtualBox anyone?) and lab out some basic servers. From here, start running packet captures so you can understand what's happening on the wire. Then, break part of the server configuration to see how that affects the network traffic.
One thing I suggest to people is to start by learning one thing really well and then move on to another subject. In our case, ICMP (IP/1) Echo Request and Echo Reply (ping) is probably the simplest to learn. Learn about ICMP Requests and Replies then move on to another ICMP code like Destination Host Unreachable. What does this code represent? In what situations would you see this? How is it different than ICMP Destination Network Unreachable? Once you're comfortable with that, move on to setting up an Apache web server and start learning HTTP. What do the packets look like when everything works? What's the difference between a 403 and a 404? How can you produce these codes and what does it mean for troubleshooting? Now you're in the land of TCP (IP/6) so there are more things to learn at this layer as well - What is MSS? What different TCP flags are there and why would you see an RST instead of a FIN for example?
The other thing people sometimes have issues with is understanding your filters. You need to build your filters according to the situation. If you're troubleshooting a web server which is behind a Static NAT, you'll have to change your "host" filter on the inside from what you're using on the outside. I think we can take a look at that for the next episode.
.plan
https://tcpdump101.com
Twitter:
https://twitter.com/Grave_Rose
PayPal:
https://www.paypal.me/tcpdump101
Reddit:
https://www.reddit.com/r/tcpdump101