SSG5 suddenly stopped outgoing sessions
mun24 last edited by
We have changed firewall from NS-5GT to SSG5.
Within 15 days, this firewall has suddenly stopped outgoing sessions twice. Each time I had to reset the firewall to make it work again. The session count on this firewall is never more than 300.
I never had this problem on NS-5GT.
How can I troubleshoot this problem?
rowlandg last edited by
We had an issue on 6.0.2 and 6.0.3 where the box would either stop matching policies or stop NATing, and put the traffic in the global cleanup rule even though the same devices had been working for the last hour. Only a reboot or an upgrade to 6.0.4 fixes this.
Another customer had a buffer issue: multiple 10k, 20k or 50k files went through OK, but with 60k - 100k files it would work for 30 minutes and then just give up. We checked and adjusted the MSS and MTU settings and this made no difference. We upgraded on Wednesday and the customer has not had any issues since, and they have put through loads.
In 6.1 you can limit sessions per policy, so you could use this with some QoS.
Hmm, I had noticed that the default route differs depending on whether you set it during initial configuration or after logging in, via Network->Routing->Destination. The “old” one appears when you do it with initial configuration, the “new” one when you set the route via WebUI after logging in. That’s why it is there. That article referred to Netscreens, not the Juniper SSG5, and to ScreenOS 4.0, not 6.0… I don’t know what to think about that now.
Anyway, even after taking down the src-ip session limit and replacing test-manageable switch with their old one the problem hasn’t appeared anymore.
But there’s still a problem with the SSG-140, and that’s probably not the session issue - some 400 sessions out of 48 000 possible can’t make the Juniper slow, I think. If it appears again and I can’t find a solution, I’ll have to start a thread about it.
JuniperGuy last edited by
I think ahfaris is referring to a “set route 0.0.0.0/0 gateway x.x.x.x” statement, or the lack thereof. I did see the “set interface ethernet0/0 gateway E.F.G.209” statement (I had a hard time digging it up). It is a legacy command from ScreenOS 3.0 and was dropped when 4.0 came along.
Issues with this command are:
Packets don’t get routed out of the untrust interface properly
Packets get routed out the default route instead of a specific route
Here is a KB article explaining that you should use a route statement:
Yea, it certainly does exist! If it doesn’t, then there would be no internet availability for that company at all. Actually, internet from inside out, because from outside in it may work well even without that routing.
ahfaris last edited by
See the route for 0.0.0.0/0 at Network -> Routing -> Destination. It must exist and point to your router or default gateway.
Well, that screening option to limit source-IP-based sessions helped to keep the firewall running yesterday, so that the session table didn’t grow too large. I raised the limit from 50 to 100 because I saw that even TCP port 80 was limited many times.
But yea, the alarm log is completely full (2047 logs) with announcements that the src-ip session limit was reached. But no complaints so far about a bad internet connection, bad Skype quality or something like that…
Hi greg1c, big thanks for the tips! In the trust zone I set the source IP-based session limit to 50 sessions and we’ll see what happens. Also in the trust zone I lowered the UDP flood setting to 500 PPS, but I’m not quite sure how much this usually is when UDP traffic is normal. About the number of sessions - it already has 8064 sessions available with 6.0.0r3.0 (with 5.4.0 it has 4064, as far as I know). Did you mean that it may have some 16000 sessions with the appropriate license?
greg1c last edited by
You could limit the number of sessions from one IP using the screening options (best option). You could limit bandwidth for that protocol (although this will do nothing to sessions). You could set the UDP flood threshold lower, to automatically block this UDP traffic once it reaches a certain PPS rate. You could get the extended license and double the sessions of your SSG 5 to 8,192.
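To show what a per-source-IP session limit does in principle, here is a minimal Python sketch. It is a toy model, not ScreenOS code: the class name `SessionLimiter`, its methods and the IP addresses are all hypothetical, and the real screening option may behave differently in detail.

```python
from collections import Counter


class SessionLimiter:
    """Toy model of a per-source-IP session limit (not ScreenOS code)."""

    def __init__(self, limit):
        self.limit = limit          # max concurrent sessions per source IP
        self.active = Counter()     # source IP -> current session count
        self.dropped = Counter()    # source IP -> sessions refused so far

    def open_session(self, src_ip):
        """Return True if the session is admitted, False if the limit is hit."""
        if self.active[src_ip] >= self.limit:
            self.dropped[src_ip] += 1
            return False
        self.active[src_ip] += 1
        return True

    def close_session(self, src_ip):
        """Release one session slot, e.g. after a timeout or TCP teardown."""
        if self.active[src_ip] > 0:
            self.active[src_ip] -= 1


limiter = SessionLimiter(limit=50)
# One chatty host tries to open 272 sessions; a quiet host opens 10.
chatty = sum(limiter.open_session("192.168.1.10") for _ in range(272))
quiet = sum(limiter.open_session("192.168.1.20") for _ in range(10))
print(chatty, quiet, limiter.dropped["192.168.1.10"])  # 50 10 222
```

The point of the sketch is that one noisy host can only ever occupy its own quota, so the rest of the table stays available to everyone else.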
Thanks muppet for letting me know about Wireshark! My problem suddenly stopped, so I didn’t try it on that computer, only on mine.
Anyway, after some 10 days of the Juniper working, nothing bad happened. Then I restarted it, and at the moment it has worked for 9 days and there were never more than 200 sessions in that time, usually around 100. I can’t understand it. The only thing I did on the one computer I mentioned above was to manually run Spybot Search & Destroy, which was scheduled to run on Fridays. It found nothing to remove except one thing: some notifications about security problems had been removed by group policy or something - nothing special. And I’ll remind you that before this, outgoing traffic slowed down more than once a week, not just after Fridays, when Spybot perhaps couldn’t automatically download new definitions and hung somehow.
At the moment, on Saturday night, it has run for 9 days 9 hours and has only 16 sessions.
But… Yesterday I replaced a Zyxel Zywall 5 with Juniper SSG5 in another company and things were worse! Within hours the session table was so full that outgoing traffic just didn’t exist anymore. When I pinged some public IP from inner network I didn’t get anything, but when I pinged from SSG5, it pinged with 100% success.
I ran the session analysis with Tim’s good tool and it also turned out that more than 99% was UDP and that the traffic came from 6 machines. Disabling the default Trust->Untrust policy for some seconds cleared everything, and then the number of sessions started going up pretty fast again (about 10 sessions per second) and then slowed down a bit. Hours later (half an hour ago) I ran “clear session src-ip <ip>” for each of the “best” six machines (when the SSG5 had 1700 sessions and the best of them 461!) and it looks like it is going to start again, but not that fast.
I searched about Skype and Juniper but didn’t find anything special. Only that there was exactly the same case somewhere in a hotel, when a client opened his computer with Skype windows open and his MAC was immediately blocked by the hotel’s firewall because that Skype UDP traffic looked like an attack from inside.
Could it be that Skype, as an example of a P2P program, takes down the Juniper SSG5? Because while Skype keeps running, anyone else can use this company’s internet resource to relay Skype traffic (by punching holes in the firewall, as it does)?
Sorry you’re right - I didn’t read the full thread and I will bash myself on the head accordingly.
The next step to troubleshoot this if I was in your shoes would be to install Wireshark on the machine in question and leave it logging in the background for a couple of hours. Clear the sessions on the firewall (clear session) and see if they reappear.
If they do, have a look at your Wireshark capture and see if you can figure out what the traffic is! If it doesn’t appear in Wireshark, then make sure you
a) don’t have a rootkit installed
b) really have the right machine (check the ARP table with “get arp” to make sure there aren’t multiple machines with the same IP)
Hope this helps more than my last boneheaded post
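The check in b) above can be sketched in a few lines of Python. This is only an illustration: the function name `find_duplicate_ips` and the addresses are hypothetical, and the (IP, MAC) pairs are assumed to have been copied by hand from ARP listings taken at different times, not parsed from real “get arp” output.

```python
from collections import defaultdict


def find_duplicate_ips(arp_entries):
    """Return {ip: sorted MACs} for every IP seen with more than one MAC."""
    macs_by_ip = defaultdict(set)
    for ip, mac in arp_entries:
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Hypothetical entries collected from ARP listings taken at different times.
entries = [
    ("192.168.1.10", "00:11:22:33:44:55"),
    ("192.168.1.11", "00:11:22:33:44:66"),
    ("192.168.1.10", "00:AA:BB:CC:DD:EE"),
]
print(find_duplicate_ips(entries))
# {'192.168.1.10': ['00:11:22:33:44:55', '00:aa:bb:cc:dd:ee']}
```

An IP that shows up with two different MAC addresses suggests that two machines are sharing it, or that the address was reassigned between samples.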
Thanks, but I’ve done that already with Tim’s Analyzer. I scanned the computer which had the most connections - no spyware or viruses, or at least they weren’t detected. The user of that machine is a quiet elderly woman. I didn’t find any torrent clients installed on that machine. And the best thing is that she has Skype installed, but never uses it! And still 272 sessions, yea… And she shuts the computer down after work every day. Really odd. There is a scheduled task by which Spybot Search & Destroy updates and scans the machine every Friday, but I couldn’t establish that it was the source of the UDP sessions.
If the sessions aren’t timing out, they must be in use.
ssh to the console of the device and do a “get session” - You can then examine in detail what the ports are.
Most probably someone running Skype or Bittorrent.
Well, I found that when I disabled the (default) policy from trust to untrust, which allows all traffic from inside out, and re-enabled it after 10 seconds, all the stale sessions were cleared. The number went down from 979 to 114 first, then to 160 after some minutes and after that to 93. Very interesting - but still no clue why sessions are being held if the initiating computer is not in the private network anymore…
That machine with 272 sessions yesterday was back today (laptop) and after half of the workday there were exactly 272 sessions still.
Well, that information was really interesting! Thank you very much greg1c! I saw that one computer which is not in the network at the moment has 272 connections, and another one which is in the network has 190, and that number is changing (sometimes more, then fewer again). 99% of all those 900 sessions were UDP. People use PPTP there, but this is just TCP 1723 and GRE - so RRAS’s stale records or something like that cannot be blamed? If the default UDP timeout is 1 minute, then all these (I think) stale sessions should end after one minute! I saw that all of the top 5 source IP addresses were in the private subnet and 4 of them could not be reached at all - why do these sessions hang, then? Perhaps tomorrow, when people are actively working, I can get some better results with NSSA.
greg1c last edited by
Sessions are active until they are closed or until the protocol times out. If you have not changed any of the timers on protocols, then most TCP timers are set to 30 minutes and most UDP timers to 1 minute. You can get a list of sessions with the “get session” command. If you go to http://performanceclassifieds.net/NSSA.zip you can download the session analyzer program written by Tim Eberhard (Thanks Tim!).
The SSG-5 supports 4096 sessions and the extended version supports 8192 sessions.
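If you only want a quick tally of who owns the sessions, a few lines of Python can do it. This is a rough sketch, not a replacement for NSSA. It assumes each flow appears in the saved session listing as `<src-ip>/<port>-><dst-ip>/<port>,<protocol-number>`; that shape, the function name `tally_sessions` and the sample addresses are assumptions, so check them against your own output first. If your listing shows both directions of every session, filter out the return direction before counting.

```python
import re
from collections import Counter

# Assumed shape of the address part of a session line:
#   <src-ip>/<src-port>-><dst-ip>/<dst-port>,<protocol-number>
FLOW_RE = re.compile(
    r"(\d{1,3}(?:\.\d{1,3}){3})/(\d+)->(\d{1,3}(?:\.\d{1,3}){3})/(\d+),(\d+)"
)
PROTO_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP", 47: "GRE"}


def tally_sessions(text):
    """Count flows per source IP and per protocol; each match counts once."""
    by_src = Counter()
    by_proto = Counter()
    for match in FLOW_RE.finditer(text):
        src_ip, _, _, _, proto = match.groups()
        by_src[src_ip] += 1
        by_proto[PROTO_NAMES.get(int(proto), proto)] += 1
    return by_src, by_proto


# Hypothetical sample in the assumed format.
sample = """\
192.168.1.10/1025->203.0.113.5/33434,17
192.168.1.10/1026->198.51.100.7/40000,17
192.168.1.20/3345->203.0.113.9/80,6
"""
by_src, by_proto = tally_sessions(sample)
print(by_src.most_common(5))  # top talkers first
print(by_proto)
```

Sorting by source IP shows straight away whether a handful of machines hold most of the table, and the protocol split shows whether it is mostly UDP.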
Hmm, interesting idea, but I haven’t defined and port-forwarded any Skype ports anywhere…
Ah, probably you have defined a huge time-out on Skype, or worse, a never time-out! I’ve seen an NS500 (250000 sessions) come down on such a thing! UDP sessions will *only* end on time-out. TCP on the other hand has a teardown sequence that the firewall recognizes and closes the session on.
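A back-of-the-envelope calculation shows why the timeout matters so much for UDP. The sketch below uses Little's law (average concurrent sessions = arrival rate × lifetime) and assumes, as a simplification, that each UDP session is a short burst that then sits idle until the timeout expires. The function names are hypothetical; the numbers are the ones quoted in this thread.

```python
def steady_state_sessions(new_per_second, timeout_seconds):
    """Little's law: average concurrent sessions = arrival rate * lifetime."""
    return new_per_second * timeout_seconds


def max_rate_before_full(table_size, timeout_seconds):
    """Arrival rate at which the table would sit at capacity on average."""
    return table_size / timeout_seconds


# 10 new sessions/s is the rate reported in the thread; 60 s is the quoted
# default UDP timeout; 4064 is the table size quoted for 5.4.0r8.0.
print(steady_state_sessions(10, 60))             # 600
print(steady_state_sessions(10, 30 * 60))        # 18000
print(round(max_rate_before_full(4064, 60), 1))  # 67.7
```

With the 1 minute default, 10 new UDP sessions per second levels off at around 600 entries. The same traffic with a 30 minute timeout would need 18000 entries, far more than the table holds.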
I noticed that the number of sessions grows over time:
Up time: 0 day 12:38:29, System time: 2008-01-29 23:40:39
Up time: 0 day 22:00:29, System time: 2008-01-30 09:02:37
Up time: 1 day 00:29:44, System time: 2008-01-30 11:31:53
Up time: 1 day 03:06:54, System time: 2008-01-30 14:09:03
Up time: 1 day 07:13:12, System time: 2008-01-30 18:15:20
Up time: 1 day 09:01:10, System time: 2008-01-30 20:03:17
Up time: 1 day 22:25:58, System time: 2008-01-31 09:28:05
Up time: 2 days 00:59:59, System time: 2008-01-31 12:02:05
Up time: 2 days 03:13:30, System time: 2008-01-31 14:15:36
Up time: 2 days 10:40:56, System time: 2008-01-31 21:43:01
Up time: 2 days 22:02:53, System time: 2008-02-01 09:04:57
Up time: 0 day 02:49:17, System time: 2008-02-01 12:09:49
Up time: 0 day 07:12:17, System time: 2008-02-01 16:32:48
Up time: 0 day 08:48:27, System time: 2008-02-01 18:08:57
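Samples in this format are easy to process with a script. As a small sketch (the function names `parse_line` and `find_reboots` are hypothetical), the Python below parses the “Up time / System time” lines and reports where the uptime went backwards, which marks a restart. It uses three of the lines from the listing above.

```python
import re
from datetime import datetime, timedelta

LINE_RE = re.compile(
    r"Up time: (\d+) days? (\d+):(\d+):(\d+), System time: (\S+ \S+)"
)


def parse_line(line):
    """Return (uptime, system_time) for one 'Up time: ...' line."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    days, hours, minutes, seconds = (int(g) for g in m.groups()[:4])
    uptime = timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
    system_time = datetime.strptime(m.group(5), "%Y-%m-%d %H:%M:%S")
    return uptime, system_time


def find_reboots(lines):
    """Yield the system time of each sample whose uptime went backwards."""
    previous = None
    for line in lines:
        uptime, system_time = parse_line(line)
        if previous is not None and uptime < previous:
            yield system_time
        previous = uptime


samples = [
    "Up time: 2 days 10:40:56, System time: 2008-01-31 21:43:01",
    "Up time: 2 days 22:02:53, System time: 2008-02-01 09:04:57",
    "Up time: 0 day 02:49:17, System time: 2008-02-01 12:09:49",
]
print(list(find_reboots(samples)))
# [datetime.datetime(2008, 2, 1, 12, 9, 49)]
```

If a session count is logged next to each sample, the same loop could be extended to show how fast the table fills between restarts.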
It is 5.4.0r8.0 at the moment with maximum 4064 sessions.
Could it be that some half/full duplex mismatch at 10/100 causes this? I really don’t know, because I don’t have that experience.
Is there any way to check out all the sessions that are not completed and are currently active? I think I have to do that because I can’t see anything strange from some specific computer in the outgoing traffic log.
But there are random connections to anywhere in the world - I realized that Skype causes that kind of UDP traffic, and many of the users run Skype.
Sadly, with unmanaged switches you are at the mercy of it just “working”, with no real ability to troubleshoot it or verify that it’s healthy. That is a very large unknown here…
I would say that I have never come across what you are describing on a Netscreen. I would monitor memory/CPU/sessions and attempt to rule out the firewall itself. If possible, try to swap the dumb switch out for either a known good dumb switch or a worthwhile managed switch.
Alan is dead on with pinpointing the physical duplex settings, that will get you again and again and will be an issue.
Good luck in troubleshooting this…