IDP crashing

  • administrators

    It’s fine with the default policy, but something in the custom policy is causing it to crash. It’s in passive mode, so my thought was to put a sniffer on another SPAN port and capture that traffic. But so much traffic is coming through that, unless I kill the sniffer right when the IDP dies, it’s impossible to tell which packet caused the crash.

    Does anyone have any thoughts? Juniper already has all the logs from the unit, but they need the data on the wire from the moment it crashes to figure out what’s causing it.
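One way to kill the sniffer right when the IDP dies is a small watchdog on the sniffer host. The sketch below is a minimal version that makes several assumptions. The IDP's management address is assumed to stop answering ping when the unit locks up. The sniffer is assumed to be a process whose PID you know. `idp_alive`, `watch`, `stop_sniffer`, the address, the PID and the thresholds are all hypothetical. The `ping -c 1 -W 2` flags are the Linux iputils ones. SIGINT is used instead of SIGKILL so that a tcpdump-style capture tool can flush its current file before it exits.

```python
import os
import signal
import subprocess
import time

def idp_alive(addr: str) -> bool:
    """Send one ICMP echo via the system ping (Linux iputils: -c count, -W timeout in s)."""
    result = subprocess.run(["ping", "-c", "1", "-W", "2", addr],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0

def watch(probe, on_death, failures_needed=3, interval=5.0, sleep=time.sleep):
    """Call probe() every `interval` seconds. After `failures_needed` consecutive
    failures, call on_death() once and return the number of polls made."""
    misses = 0
    polls = 0
    while True:
        polls += 1
        misses = 0 if probe() else misses + 1
        if misses >= failures_needed:
            on_death()
            return polls
        sleep(interval)

def stop_sniffer(pid: int) -> None:
    # SIGINT, not SIGKILL, so the capture tool can close its last file cleanly.
    os.kill(pid, signal.SIGINT)
    print(time.strftime("%Y-%m-%d %H:%M:%S"), "IDP unreachable, sniffer stopped")

# Hypothetical usage (placeholder address and PID):
# watch(lambda: idp_alive("192.0.2.10"), lambda: stop_sniffer(12345))
```

Requiring a few consecutive misses avoids stopping the capture on a single dropped ping. The cost is that detection lags the crash by roughly `failures_needed * interval` seconds. The sniffer therefore has to keep at least that much traffic buffered.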

  • Our specific situation at this moment points to a bug triggered by traffic from Riverbed WAN accelerator hardware. This equipment appears to use GRE and TCP compression, and when that specific traffic runs through the IDP, the IDP hits a kernel panic.

    We are working closely with JTAC to resolve the issue.

  • I have heard there is a problem with sessions: if you exceed the box’s session capacity, the box just crashes. It doesn’t fail over, and the NICs don’t fail open. The only way around it was to buy a bigger box or to put an SSG in front to restrict the number of sessions.

  • The problems we run into are on 4.1. User ttnurmi suggested a workaround that was in the 4.0 release notes.

    In the meantime, Juniper has suggested that we disable “Application Identification” from NSM on the specific IDP device. This appears to be a new feature in 4.1, so maybe the code is not yet mature.

    We also got a nice suggestion for capturing traffic continuously:

    There is an easy way to capture traffic continuously and rotate the capture files.

    Here is the procedure to capture traffic that is passing through the eth2 & eth3 interface pair:

    scio const set sc_pcap_outbound_pkts 1

    mkdir /var/idp/eth2

    cd /var/idp/eth2

    nohup /usr/idp/device/utils/dumpLoop -i eth2 -s 100000000 -k 10 &

    This will capture at most 100 MB x 10 files = 1 GB of traffic on eth2.
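A rotating buffer only helps if it still holds the packets from just before the crash when someone stops it. The retained window is the total buffer size divided by the traffic rate. The sketch below assumes `-s` is the per-file size in bytes and `-k` is the number of files kept, which is what the 1 GB figure implies. The 100 Mbit/s rate and the function name are hypothetical.

```python
def capture_window_seconds(file_size_bytes: int, file_count: int,
                           rate_bits_per_sec: float) -> float:
    """Seconds of traffic that a rotating capture of file_count files can hold.
    Ignores pcap per-packet header overhead, so the real window is slightly shorter."""
    total_bits = file_size_bytes * file_count * 8
    return total_bits / rate_bits_per_sec

# -s 100000000 -k 10 at a hypothetical sustained 100 Mbit/s on the SPAN port
window = capture_window_seconds(100_000_000, 10, 100e6)
print(f"{window:.0f} s")  # 80 s
```

At 1 Gbit/s the same buffer holds only about 8 seconds. On a busy SPAN port, the crash therefore has to be detected and the capture stopped almost immediately.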

  • Are we talking about releases in the 4.1 code or the 4.0 code? I can’t tell from the thread.

  • Hi
    The document is the Known Issues update for release 4.0; the r3 date was 01-03-2008. After I added that ignore rule to the rulebase, there were no more crashes. We were then running release 4.0r4. Version 4.1.x may work the same way, so try adding that ignore rule!

  • We’ve been having crash issues with our IDP running 4.1r2: the console locks up and everything. It may have had to do with timing issues between the IDP and NSM. Where in the 4.1r2 release notes do you see a solution discussed? Is 4.1r3 available?

  • Hi

    The 4.1r2 release notes contain a solution for this, which also helped with our problem.

    You have to add an ignore rule to the IDP rulebase:
    source: any
    service: tcp 1521

    So try that, and also try updating the IDP detector engine.

  • I can confirm crashes too, with the latest 4.1r3 software and up-to-date detector engines. Same behavior: a complete lockup with no access to the console (apparently a kernel crash).

    Signal 15: in which mode are you operating?

  • The latest updates to the release notes indicate that there are two problems that can cause a crash on the IDP units, but no software updates have been released for them. Do these cover the problems reported here?

  • Hi

    Our version is 4.0r4, and we now think the cause might be packet reassembly, which causes some kind of overflow in the sensor. But we only saw that today. Do you mean the kernel file or the ksymoops file?

  • Global Moderator

    When even the console dies, it must be something running in kernel mode. What version are you running? If 4.1r2, did you try r1 or 4.0? What does the messages file show? Did you search the disk for a core file?

    Sounds like a hard one to debug.
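For the core-file check, a shell `find / -name 'core*'` is the usual approach. Below is an equivalent Python sketch that also filters to recent files. The thread doesn't say where the IDP writes crash dumps, or whether it writes them at all, so the `core*` name pattern, the seven-day default and the root directory you pass in are assumptions.

```python
import fnmatch
import os
import time

def find_core_files(root: str, max_age_days: float = 7.0, pattern: str = "core*"):
    """Return (path, mtime) for files under root whose name matches pattern
    and that were modified within the last max_age_days days, newest first."""
    cutoff = time.time() - max_age_days * 86400
    hits = []
    # onerror swallows unreadable directories instead of aborting the walk
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda e: None):
        for name in filenames:
            if not fnmatch.fnmatch(name, pattern):
                continue
            path = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # file vanished or is unreadable
            if mtime >= cutoff:
                hits.append((path, mtime))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

For example, `find_core_files("/var")` lists candidate dumps under `/var`. The pattern is broad and will also match unrelated files such as `core.py`, so check the hits by hand.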

  • Hi, I have a similar problem with an IDP 1100F device. We also replaced the device, and that didn’t help. Did you find a solution for this?

  • Has Juniper had you do a stack trace on the IDP processes to see what the heck is causing them to bomb?

  • administrators

    Both units are still crashing, and now several times a day. We cannot figure this out, and neither can Juniper support.

  • Any new info about the policy which might cause the crash?

  • administrators

    We think it’s a software problem because we tried two different units with the same policy. My thought was also to have a script that monitors when the IDP dies and then kills the sniffer.

    What we’re doing now is disabling half the rules to figure out which half contains the problem rule, and then repeating this until we pin it down to a single rule. It will take a couple of weeks, though.
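The halving approach is a binary search over the rule set. The sketch below assumes that exactly one rule triggers the crash. `crashes(rules)` is a hypothetical stand-in for "push only these rules as the policy and wait long enough to see whether the unit dies". If the crash needs a combination of rules from both halves, neither half will crash on its own, and plain bisection will not find it.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")

def find_bad_rule(rules: Sequence[T],
                  crashes: Callable[[Sequence[T]], bool]) -> Tuple[T, int]:
    """Binary-search for the single rule whose presence makes crashes() true.
    Returns (rule, rounds_used). With an intermittent crash, a 'no crash'
    round is only evidence, not proof, so allow each round enough time."""
    candidates = list(rules)
    rounds = 0
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        rounds += 1
        # Enable only the first half; if the unit still crashes, the culprit is in it.
        candidates = half if crashes(half) else candidates[len(half):]
    return candidates[0], rounds
```

A policy of n rules needs about log2(n) rounds. For example, 1,000 rules narrow to one in at most 10 rounds. The total time is then dominated by how long each round must run before "no crash" can be trusted.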

  • The way I would troubleshoot this would be to have a box logging all traffic from the same SPAN as your IDP sensor, and run a cron script (or use your monitoring software, is my fav :-D) to detect when your sensor dies, notify you, and let you check the dumps to see what traffic is causing the crash (if that is the problem). Off the top of my head, it sounds like a hardware issue if the box is completely dead; a software problem should only cause the IDP processes to die, methinks.
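Once the monitor has logged a crash time, the next step is to pick which rotated capture files to open. The sketch below assumes each file's modification time marks when its last packet was written. Under that assumption, the file covering the crash is the first one modified at or after the crash time. The function name and the `before` context count are hypothetical.

```python
import os

def dumps_around(directory: str, crash_time: float, before: int = 1):
    """Return capture files likely to contain the traffic at crash_time.

    Files are ordered by mtime; file i roughly spans (mtime[i-1], mtime[i]].
    The first file with mtime >= crash_time covers the crash; if none does,
    the newest file is used. `before` earlier files are added as context."""
    paths = [os.path.join(directory, name) for name in os.listdir(directory)]
    files = sorted((p for p in paths if os.path.isfile(p)), key=os.path.getmtime)
    if not files:
        return []
    idx = next((i for i, p in enumerate(files) if os.path.getmtime(p) >= crash_time),
               len(files) - 1)
    return files[max(0, idx - before): idx + 1]
```

The sniffer box's clock and the monitor's clock need to agree for this to work. A few seconds of skew can point at the wrong file, so raise `before` if the timestamps are uncertain.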

  • administrators

    The box completely locks up.  No ping, no console, nothing.

  • What exactly happens when it crashes? Do the IDP processes die?