Wednesday, 2 May 2012

WCCP Redirect ACLs and Masks


This article is about WCCP redirect ACLs and masks, and how they relate to TCAM usage on Cisco switches. It's important to understand if you're running WCCP, as you want forwarding done in hardware, which runs at wire speed, and not in software, which causes considerable CPU usage and potentially performance issues.

This is quite a difficult subject to explain and I'm not entirely sure I've done it that well here. The info has been pulled in from a variety of sources and I'm also not entirely sure it's correct, as a few bits don't quite tie together. It's been re-written several times and I'm still not entirely happy; however, here is the info, warts 'n' all.

A very basic recap on WCCP.

WCCP redirects traffic as it passes through a switch or router, which acts as the WCCP server. This is for things like proxy servers or WAN optimisers, which are the WCCP clients. The server has redirect ACLs that specify what traffic will be sent to the WCCP client device. On Cisco routers/switches these ACLs are not stateful, so you have to capture traffic flows going in both directions.
In the example setup the proxy server is the WCCP client and the switch is the WCCP server.

For example to grab HTTP from LAN to WAN you would have:

ip wccp 100 redirect-acl HTTP_LAN_TO_WAN
ip access-list extended HTTP_LAN_TO_WAN
 permit tcp 10.0.0.0 0.0.0.255 any eq 80


Then to grab the return traffic:

ip wccp 200 redirect-acl HTTP_WAN_TO_LAN
ip access-list extended HTTP_WAN_TO_LAN
 permit tcp any eq 80 10.0.0.0 0.0.0.255
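Because the ACLs are not stateful, the return entry is simply the forward entry with source and destination swapped. A minimal Python sketch of that mirroring follows; `redirect_acl_pair` is a hypothetical helper name, and it only builds the text of the two permit lines (it does not talk to a switch).

```python
import ipaddress


def redirect_acl_pair(lan: str, port: int) -> tuple[str, str]:
    """Build the forward and return permit lines for one TCP port.

    lan is a prefix such as "10.0.0.0/24"; the wildcard mask used in the
    ACL is the network's host mask.
    """
    network = ipaddress.IPv4Network(lan)
    subnet = f"{network.network_address} {network.hostmask}"
    lan_to_wan = f"permit tcp {subnet} any eq {port}"
    wan_to_lan = f"permit tcp any eq {port} {subnet}"
    return lan_to_wan, wan_to_lan


for line in redirect_acl_pair("10.0.0.0/24", 80):
    print(line)
# permit tcp 10.0.0.0 0.0.0.255 any eq 80
# permit tcp any eq 80 10.0.0.0 0.0.0.255
```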


These are then configured on interfaces to capture traffic. Cisco supports both ingress and egress redirection; however, the switches will only do hardware forwarding for ingress WCCP redirection.

int gi0/1
 description LAN
 ip wccp 100 redirect in


int gi0/2
 description WAN
 ip wccp 200 redirect in


With this configuration alone nothing will happen. You need to add a WCCP client and tell it to communicate with the WCCP server, i.e. configure WCCP on the proxy server and tell it to talk to the switch. The two will then start chatting and negotiate certain parameters.


Once that is done the WCCP server will start redirecting traffic. If no WCCP clients are active then the server will just forward traffic as per normal. If one or more WCCP clients are active then the switch will load balance traffic between them depending on configuration.

TCAM

TCAM stands for Ternary Content-Addressable Memory. It is used for hardware forwarding: packets are compared against the TCAM table, which tells the switch or router how to forward them. If an entry isn't found in the TCAM table then the packet must be routed in software, which is not desirable.
Ternary means there are three values: 0, 1 and don't care. "Don't care" is represented by an x in this doc, and just to really confuse you I'll use 0x to prefix any hex values.
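To make the three-valued matching concrete, here is a small Python sketch of how a ternary pattern could be compared against a value. It is an illustration of the idea only, not how the hardware is implemented; `ternary_match` is a made-up name, and any bits of the value above the width of the pattern are ignored.

```python
def ternary_match(pattern: str, value: int) -> bool:
    """Return True if value matches a pattern made of '0', '1' and 'x'.

    The rightmost symbol is compared with the least significant bit.
    Spaces and dots in the pattern are ignored.
    """
    symbols = [c for c in pattern if c in "01x"]
    for position, symbol in enumerate(reversed(symbols)):
        bit = (value >> position) & 1
        if symbol != "x" and int(symbol) != bit:
            return False
    return True


print(ternary_match("1x0", 0b100))  # True
print(ternary_match("1x0", 0b110))  # True, the middle bit is "don't care"
print(ternary_match("1x0", 0b101))  # False, the last bit must be 0
```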


Redirect ACLs

The redirection ACL tells the WCCP server what traffic to intercept and divert to the WCCP client(s); any traffic not matching is passed as normal. As this ACL is likely to be applied on an interface seeing a lot of traffic (probably all transit traffic for the network), you want it to run entirely in hardware and be as fast as possible. There are a couple of rules with regards to TCAM usage and this ACL:

  • Each permit statement in the ACL requires at least one TCAM entry.
  • Each load balanced path requires at least one TCAM entry.
  • The number of load balanced paths can be calculated with the number of bits in the assignment mask (see below).
  • In all cases except where the mask is 0x0, deny statements use fewer TCAM entries than permit statements. This is because traffic matching a deny is not redirected and so doesn't need to be load balanced, so a deny statement only takes up 1 TCAM entry.




The Mask.

Cisco switches only support hardware forwarding for WCCP mask-based assignment, not the hash method. The mask is a hexadecimal value that does several things:
  • Restricts how many WCCP clients can be part of the load balancing arrangement.
  • Affects the TCAM usage by WCCP.
  • Defines what IP addresses are load balanced to which WCCP clients.
The last point is critical for WAN optimisers, which work in pairs by forming shared byte caches. If you have a farm of WAN optimisers, e.g. in a data centre, then you want each remote site to always speak to the same member of the farm to avoid having to maintain multiple shared caches, i.e. all hosts within a certain subnet should be load balanced to the same WCCP client.

The masks are written in hex and usually configured in hex, but I found to make sense of them it's best to convert them to binary. Also convert the IP addresses to binary and think of the mask being applied bit-by-bit. 

The mask is configured on the WCCP client, e.g. the WAN Optimiser or Proxy Server, which then informs the server during the WCCP session negotiation.

How the Switch Uses the Mask

On Cisco switches all combinations of bits in the mask are used to create different values. These values are applied to the IP addresses in the redirect ACL to create entries in the forwarding table (TCAM), which the switch uses to forward the traffic to WCCP clients.

For example a mask of 0x10 in binary is represented as 0001 0000.
The available combinations of bits are: 0000 0000 and 0001 0000
Because it's a ternary mask we are only interested in the specific bit used in the original mask; the other zeros all become "don't care" values, so the two masks the forwarding table will end up using are:
xxx1 xxxx
xxx0 xxxx

These masks would be applied against the ACL and used to create the TCAM forwarding paths for the traffic. Any IP address with a 1 in the 5th bit from the right would match the first mask and any with a 0 in that position would match the second. If you configure this mask then look at the WCCP session it appears as below:

Switch#show ip wccp 100 detail
WCCP Client information:
   WCCP Client ID:    192.168.0.100
   Protocol Version:   2.0
   State:     Usable
   Redirection:    L2
   Packet Return:    L2
   Packets Redirected:   0
   Connect Time:     00:01:07
   Assignment:     MASK

   Mask SrcAddr DstAddr SrcPort DstPort
   ---- ------- ------- ------- -------
   0000: 0x00000010 0x00000000 0x0000 0x0000

   Value SrcAddr DstAddr SrcPort DstPort CE-IP
   ----- ------- ------- ------- ------- -----
   0000: 0x00000000 0x00000000 0x0000 0x0000 0xC0A80064 (192.168.0.100)
   0001: 0x00000010 0x00000000 0x0000 0x0000 0xC0A80064 (192.168.0.100)
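The "all combinations of bits" step can be sketched in Python. `mask_values` is a hypothetical helper, not anything from IOS; it walks through every value that can be built from the bits set in the mask, in ascending order, which lines up with the Value rows in the output above.

```python
def mask_values(mask: int) -> list[int]:
    """Every value that can be formed from the bits set in mask."""
    values = []
    value = 0
    while True:
        values.append(value)
        if value == mask:
            break
        # Step to the next combination that only uses bits from the mask.
        value = (value - mask) & mask
    return values


print([hex(v) for v in mask_values(0x10)])  # ['0x0', '0x10']
print([hex(v) for v in mask_values(0x3)])   # ['0x0', '0x1', '0x2', '0x3']
```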



Mask Load Balancing.

The number of bits set in the mask determines how many devices you can load balance traffic between.

A mask of 0x0 does not allow load balancing and will give a single path only (useful if you only have a single WCCP client and are short on TCAM).

A mask of 0x1 allows for load balancing between 2 WCCP clients only. The binary mask values can be either 0 or 1.

A mask of 0x3 allows for up to 4 WCCP clients as it's made up from 2 bits and available mask values can be 00, 01, 10 and 11.


The default mask is 0x1741. In binary that is 0001 0111 0100 0001, so 6 bits are set, which allows for 2^6 = 64 forwarding paths. I have no idea why Cisco chose this number; even their own WAAS troubleshooting guide recommends you don't use it. Because it has a bit set in the rightmost position it will alternate on every single IP address, and if they wanted a 6 bit mask then 0011 1111 (0x3F) would make more sense. Possibly there is some mathematical significance I haven't seen, possibly it works best with their hardware, possibly it was made up at random or possibly this entire article is wrong and I don't understand the masks at all. Take your pick.
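The path counts quoted in this section can be checked with a couple of lines of Python. This is just the arithmetic (2 to the power of the number of bits set); `forwarding_paths` is a name made up for the example.

```python
def forwarding_paths(mask: int) -> int:
    """Number of distinct mask values, i.e. 2 ** (number of bits set)."""
    return 2 ** bin(mask).count("1")


for mask in (0x0, 0x1, 0x3, 0x3F, 0x1741):
    print(f"{mask:#06x} -> {forwarding_paths(mask)} paths")
# 0x0000 -> 1 paths
# 0x0001 -> 2 paths
# 0x0003 -> 4 paths
# 0x003f -> 64 paths
# 0x1741 -> 64 paths
```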

Mask IP Address Matching.

The simplest example is a mask of 0x1. As these masks are used against IP addresses the value would be converted to 32 bits and represented in TCAM as xxxxxxxx.xxxxxxxx.xxxxxxxx.xxxxxxx1


The ACL is "permit tcp 10.0.0.0 0.0.0.255 any eq http". With a mask of 0x1 it would produce two forwarding paths which will match IP traffic as follows:
Path 1 - mask 0 - 10.0.0.2, 10.0.0.4, 10.0.0.6, 10.0.0.8...
Path 2 - mask 1 - 10.0.0.1, 10.0.0.3, 10.0.0.5, 10.0.0.7...


With two WCCP clients, one would receive HTTP traffic from hosts with IPs matching path 1 and the second would receive traffic from hosts matching path 2.


With a mask of 0x10, the binary value is 10000 (in TCAM this would be xxxxxxxx.xxxxxxxx.xxxxxxxx.xxx1xxxx). This will load balance clients in "chunks" of 16 addresses.

If the ACL is "permit tcp 10.0.0.0 0.0.0.255 any eq http" then this will create two groups and distribute traffic as follows:
Mask 0 - 10.0.0.0 to 10.0.0.15, 10.0.0.32 to 10.0.0.47, 10.0.0.64 to 10.0.0.79.....
Mask 1 - 10.0.0.16 to 10.0.0.31, 10.0.0.48 to 10.0.0.63, 10.0.0.80 to 10.0.0.95......

If there were two active WCCP clients then you'd see the traffic distributed as above.
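The grouping above can be reproduced by ANDing each address with the mask. The following Python sketch uses the standard `ipaddress` module; the function names are my own and it walks every address in the network, so it is only sensible for small ranges like a /24.

```python
import ipaddress


def mask_value_for(ip: str, mask: int) -> int:
    """Mask value that an IPv4 address falls into."""
    return int(ipaddress.IPv4Address(ip)) & mask


def ranges_for_value(network: str, mask: int, value: int) -> list[tuple[str, str]]:
    """Contiguous address ranges in network whose masked bits equal value."""
    ranges = []
    start = previous = None
    for address in ipaddress.IPv4Network(network):
        if int(address) & mask == value:
            if start is None:
                start = address
            previous = address
        elif start is not None:
            ranges.append((str(start), str(previous)))
            start = None
    if start is not None:
        ranges.append((str(start), str(previous)))
    return ranges


print(hex(mask_value_for("10.0.0.20", 0x10)))            # 0x10
print(ranges_for_value("10.0.0.0/24", 0x10, 0x00)[:2])
# [('10.0.0.0', '10.0.0.15'), ('10.0.0.32', '10.0.0.47')]
```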

For large solutions you may want to distribute using a different pattern. With WAN optimisers you want the same optimisers to speak to each other rather than have a branch office device communicate with several different data centre devices; otherwise it would either have to maintain several copies of the byte caching tables or the optimiser cluster would end up forwarding traffic internally to keep the same device peerings. For a system where you want to split subnets on a /21 boundary and have up to 4 WCCP clients in your farm, you'd choose a mask as follows:

/21 in binary would look like this: 11111111.11111111.11111xxx.xxxxxxxx
The WCCP mask could be xxxxxxxx.xxxxxxxx.xxxx1xxx.xxxxxxxx
But this would only allow for 2 possible mask values, so only 2 WCCP clients.
To allow 4 WCCP clients you need 2 bits, the mask becomes xxxxxxxx.xxxxxxxx.xxx11xxx.xxxxxxxx
In hex that is shown as 0x1800

This would give four available combinations/masks of:
xxxxxxxx.xxxxxxxx.xxx00xxx.xxxxxxxx shortened to mask 00
xxxxxxxx.xxxxxxxx.xxx01xxx.xxxxxxxx shortened to mask 01
xxxxxxxx.xxxxxxxx.xxx10xxx.xxxxxxxx shortened to mask 10
xxxxxxxx.xxxxxxxx.xxx11xxx.xxxxxxxx shortened to mask 11


With an ACL of "permit tcp 10.0.0.0 0.255.255.255 any eq http" the split would be:

00 - 10.0.0.0 - 10.0.7.255, 10.0.32.0 - 10.0.39.255...
01 - 10.0.8.0 - 10.0.15.255, 10.0.40.0 - 10.0.47.255...
10 - 10.0.16.0 - 10.0.23.255, 10.0.48.0 - 10.0.55.255...
11 - 10.0.24.0 - 10.0.31.255, 10.0.56.0 - 10.0.63.255...
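The same reasoning can be written as a short Python sketch: take enough bits for the number of clients and place them immediately above the host part of the prefix. `mask_for` and `bucket_label` are hypothetical helpers for illustration, and they assume you want the mask bits to sit directly on the prefix boundary as in the /21 example.

```python
import ipaddress


def mask_for(prefix_len: int, clients: int) -> int:
    """Mask with just enough bits for clients, sitting on the prefix boundary."""
    bits = (clients - 1).bit_length()  # 2 clients -> 1 bit, 4 clients -> 2 bits
    return ((1 << bits) - 1) << (32 - prefix_len)


def bucket_label(ip: str, prefix_len: int, clients: int) -> str:
    """Shortened mask value (e.g. '01') that an address falls into."""
    bits = (clients - 1).bit_length()
    masked = int(ipaddress.IPv4Address(ip)) & mask_for(prefix_len, clients)
    return format(masked >> (32 - prefix_len), f"0{bits}b")


print(hex(mask_for(21, 2)))              # 0x800
print(hex(mask_for(21, 4)))              # 0x1800
print(bucket_label("10.0.40.1", 21, 4))  # 01
print(bucket_label("10.0.56.0", 21, 4))  # 11
```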


Weighted Load Balancing.

I've said above that each available forwarding path equals a single WCCP client. This is not necessarily the case, as you can weight WCCP clients. Consider a case with two WAN optimisers (A and B) of different specifications where A can process twice as much traffic as B. In that event you would want at least 3 forwarding paths, 2 of them pointing to A and 1 to B, so your mask needs to use at least 2 bits. This is another area I'm a bit hazy on: I would think you'd need a multiple of 3 to make this work properly, but the number of forwarding paths is always a power of two...
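One way to think about the mismatch between a 2:1 weighting and a power-of-two number of paths is to hand out the paths as proportionally as possible and accept that the split is approximate. The Python sketch below does that with a largest-remainder split. It is purely my own illustration of the arithmetic and an assumption, not a description of how any WCCP client or Cisco device actually assigns weighted paths.

```python
def share_paths(paths: int, weights: dict[str, int]) -> dict[str, int]:
    """Split a number of forwarding paths between clients by weight.

    Each client gets the whole-number part of its proportional share and
    any leftover paths go to the clients with the largest remainders.
    """
    total = sum(weights.values())
    shares = {name: paths * w // total for name, w in weights.items()}
    remainders = {name: paths * w % total for name, w in weights.items()}
    leftover = paths - sum(shares.values())
    for name in sorted(weights, key=lambda n: remainders[n], reverse=True)[:leftover]:
        shares[name] += 1
    return shares


print(share_paths(4, {"A": 2, "B": 1}))   # {'A': 3, 'B': 1}
print(share_paths(64, {"A": 2, "B": 1}))  # {'A': 43, 'B': 21}
```

With only 4 paths the split is 3:1 rather than 2:1; with 64 paths it gets much closer, at the cost of more TCAM entries.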

TCAM Usage

The equation for working out TCAM usage is defined as:

2^<mask bits> * <acl entries>

To include all entries in the ACL, the full definition would be:
( 2^<mask bits> * <number of permit statements in redirect ACL> ) + <number of deny statements in redirect ACL>
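As a quick way to try numbers, the formula translates directly into Python. `wccp_tcam_entries` is a name invented for this sketch, and it takes the mask itself and counts the set bits.

```python
def wccp_tcam_entries(mask: int, permits: int, denies: int = 0) -> int:
    """(2 ** mask bits * permit statements) + deny statements."""
    mask_bits = bin(mask).count("1")
    return (2 ** mask_bits) * permits + denies


print(wccp_tcam_entries(0x0, permits=2, denies=1))  # 3
print(wccp_tcam_entries(0x3, permits=2, denies=1))  # 9
print(wccp_tcam_entries(0x1741, permits=1))         # 64
```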


On a 3750 the WCCP TCAM is shared with the ACL TCAM. You have to run the routing SDM template to support WCCP and it supports a maximum of 1024 entries. The default mask has 6 bits set, so each permit statement uses 2^6 = 64 entries, which means you can have up to 1024/64 = 16 permit entries in the redirect ACL and no other ACLs on the switch.

If you wanted to capture HTTP and HTTPS traffic, split the network by /24 and allow for 8 forwarding paths in your farm then your ACL may be:

permit tcp 10.0.0.0 0.255.255.255 any eq 80
permit tcp 10.0.0.0 0.255.255.255 any eq 443

And your mask may be 0x700 (xxxxxxxx.xxxxxxxx.xxxxx111.xxxxxxxx), which puts the three mask bits directly above the /24 boundary.


This would result in 8 forwarding paths, each being created for both of the ACL entries, a total usage of 16 TCAM entries. If you are matching traffic in both directions it's a total of 32 TCAM entries used for WCCP.


8 comments:

  1. Amazing article. I just got done with a very long TAC call after upgrading my 6513. Upon reboot my CPU utilization was through the roof. The TCAM was being overrun by the WAAS redirection. I wish I would have found this article a number of hours ago - would have saved me hours. All my WAAS boxes have that default of 1741 on them. As soon as I changed it to the recommended f00 (why that's the current rec value I have no idea) a show FM summary shows all my links in an active hardware state.

    I still think I should change the mask on the WAAS units to a 1 since I only have 1 WAAS box per site.

    1741 as the default.... that's some messed up Cisco logic there.

    Replies
    1. Glad it was of help. If you only have a single device and don't think you'll ever need more then you could even set the mask to 0x0. A mask of 0xF00 will balance alternate /24 subnets across up to 16 devices, and each "permit" statement in the redirect ACL will create 16 TCAM entries.

      I still can't work out the logic behind 1741, Alaska was discovered in 1741 but that seems a bit of a tenuous link!

  2. Excellent post, thanks a lot for that.

    One thing though - you say "And your mask may be 0x70", however, your binary conversion looks to be 0x700.

    Replies
    1. Ah good spot, I'll fix that in the article, thanks!

  3. Excellent document. It took me more than three days to understand how the WCCP mask relates to TCAM utilization, until I finally found this document. Thanks for your help.

    In the document, for the /21 boundary with up to 4 WCCP clients, you have given the hex mask value as 0x1F00. I think this is wrong, since 0x1F00 would be a 5 bit mask (x.x.00011111.x). Correct me if I am wrong.

    Replies
    1. Quite correct, that should be 0x1800; you can see why I found it confusing! That 5 bit mask would allow 2^5 values and result in quite a lot of TCAM usage!

  4. This comment has been removed by the author.

  5. In the late 90s Cisco resolved all the well-known web servers on the internet and tried to find the ideal distribution of buckets for those websites; that is how the 1741 mask value was derived. It is a really bad choice to use in a production environment to accelerate traffic.
