Infinite loop when updating Roster causes the client to freeze/disconnect #102


  • Defect
  • Fixed
Closed
Assigned to _ForgeUser65911
  • coaleyed created this issue Feb 12, 2010

    What steps will reproduce the problem?
    1. Change in raid composition
    2. Leaving raids
    3.

    What is the expected output? What do you see instead?
    Game to stay working.  Hour glass wait window cursor, then poof...WoW closes.

    What version of the product are you using?
    337-360  Has happened in all versions.

    Do you have an error log of what happened?
    No, program closes instantly.

    Please provide any additional information below.
    Doesn't happen all the time; still trying to figure out what exactly is causing it.

    PC Information:
    Windows 7 64-bit
    Intel i7 920
    Rampage II Extreme MB
    6 GB memory
    Raptor 300gb WD HD
    295 GTX Nvidia Graphics

  • coaleyed added the tags New Defect Feb 12, 2010
  • _ForgeUser65911 posted a comment Feb 12, 2010

    Mmm, so an infinite loop during raid member additions / deletions? I will glance over the code to see if there is something obvious.


    Edited Feb 12, 2010
  • _ForgeUser65911 removed a tag New Feb 12, 2010
  • _ForgeUser65911 added a tag Accepted Feb 12, 2010
  • _ForgeUser178300 posted a comment Feb 13, 2010

    This has been happening to me too. It never occurred to me that Grid was causing it, but reading over this bug made me look for clues. It also happens to me when leaving a raid, and it occurred today when I converted my 5-man party to a raid.

  • _ForgeUser65911 posted a comment Feb 14, 2010

    I am not seeing anything directly but perhaps jerry can comment on this: many statuses listen for "Grid_UnitChanged", "Grid_UnitJoined", "Grid_UnitLeft" at which point they call self:UpdateIndicators. This iterates through all the frames calling:

    indicator:Update(parent, unit)
    

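    This in turn iterates through all the statuses on an indicator to figure out the current one. So the work multiplies with the number of statuses: every listening status triggers a pass over every frame, and each pass walks the statuses again.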

    Instead these probably need to be ripped out of statuses. They already cause an update of all the frames associated with a unit.

    The only use for them in statuses should probably be for housekeeping duty: remove / update / delete cached items (if any). The Aura ones are a good example.
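
    A minimal sketch of the housekeeping-only pattern described above. The message bus (`RegisterMessage`, `SendMessage`) and the `AuraStatus` table are hypothetical stand-ins so the example runs in plain Lua; they are not Grid2's actual API. The point is only that the handler drops a cached entry and does not refresh any frames.

    ```lua
    -- Hypothetical stand-in for the addon's message bus (assumption, not Grid2's real API).
    local listeners = {}
    local function RegisterMessage(name, fn)
    	listeners[name] = listeners[name] or {}
    	table.insert(listeners[name], fn)
    end
    local function SendMessage(name, ...)
    	for _, fn in ipairs(listeners[name] or {}) do
    		fn(...)
    	end
    end

    -- Hypothetical status that caches one value per unit (for example, an aura state).
    local AuraStatus = { cache = {} }

    -- Housekeeping only: drop the cached entry, do not touch any frames or indicators.
    RegisterMessage("Grid_UnitLeft", function(unit, guid)
    	AuraStatus.cache[unit] = nil
    end)

    AuraStatus.cache["raid1"] = "Rejuvenation"
    SendMessage("Grid_UnitLeft", "raid1", "guid-1")
    print(AuraStatus.cache["raid1"]) -- nil
    ```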


    Edited Feb 15, 2010
  • coaleyed posted a comment Feb 14, 2010

    It seems to happen in one particular situation when the game creates a raid from a party. I mean when wow does it automatically. Like say a battleground.

    I've also had it happen, although randomly, during the entrance to an arena after the party has been converted to a raid (done so marks will stick in arenas).


    Edited Feb 14, 2010
  • _ForgeUser193590 posted a comment Feb 15, 2010

    I have a *small* lag while converting from party to raid, but I think this is normal..
    But I have this kind of freeze often during questing / flying with grid2 being hidden. Sometimes in a heroic or a raid out of combat, but never had a single freeze / hang in a raid IN COMBAT though... So I think this doesn't have anything to do with it right? ^^

  • _ForgeUser1990418 posted a comment Feb 15, 2010

    I've noticed severe lag and client lockups (where the WoW client goes to 100% CPU usage and I end up killing it after 5 minutes with no update). This happens when leaving large raids, particularly Wintergrasp, or (less often) when enough people leave the raid that Grid2 decides to turn it from a 25-man raid to a 10-man.

    I'm pretty sure there's a gremlin in there that's causing it to spin though I've not had a chance to investigate. I'm not sure how I'd investigate this either, maybe pepper the code with printfs and repeatedly join / leave wintergrasp. I always view pets so maybe that's part of the problem?

  • _ForgeUser65911 posted a comment Feb 16, 2010

    I updated Grid2 and the plugins to only use "Grid_UnitChanged", "Grid_UnitJoined", "Grid_UnitLeft" for updating a status cache (if any).

    The calling code now also emits a Grid_UnitUpdate at the end, so that after all that maintenance is complete a single UpdateIndicators gets fired on the frames. This removes the ordering dependency that prevented updates before and made it look like watching Grid_UnitChanged and then updating indicators was necessary. I do not think it solves the issue; however, I am looking into RaidDebuffs, which now cleans its cache correctly. It probably needs to react to "Grid_UnitChanged" and "Grid_UnitJoined" as well. Tomorrow I will go through them all and make sure there are no other cached statuses that do not keep their cache correct.
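
    A rough sketch of that ordering, assuming one Grid_UnitUpdate per roster pass. The bus helpers (`on`, `send`), `process_roster_change`, and the counter are hypothetical names used only for illustration; only the Grid_* message names come from the description above.

    ```lua
    -- Hypothetical message bus (assumption, not the addon's real one).
    local handlers = {}
    local function on(name, fn)
    	handlers[name] = handlers[name] or {}
    	handlers[name][#handlers[name] + 1] = fn
    end
    local function send(name, ...)
    	for _, fn in ipairs(handlers[name] or {}) do
    		fn(...)
    	end
    end

    local cache = { raid1 = true, raid2 = true }
    local indicator_updates = 0

    -- Statuses listen to Grid_UnitLeft for cache maintenance only.
    on("Grid_UnitLeft", function(unit) cache[unit] = nil end)
    -- Frames refresh their indicators on the single trailing message.
    on("Grid_UnitUpdate", function() indicator_updates = indicator_updates + 1 end)

    local function process_roster_change(left_units)
    	for _, unit in ipairs(left_units) do
    		send("Grid_UnitLeft", unit)
    	end
    	send("Grid_UnitUpdate") -- emitted once, after all maintenance is done
    end

    process_roster_change({ "raid1", "raid2" })
    print(indicator_updates) -- 1
    ```

    Two units leaving produce two cache clean-ups but only one indicator refresh.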

    Jerry, you probably want to sanity check this.

  • _ForgeUser117147 posted a comment Feb 16, 2010

    I'll look into it next week.

    While there is possibly some waste of processing power when several statuses are listening to Grid_Unit<Something>, I'm not sure it's easily avoidable, nor that the waste is significant. But this issue deserves careful study.

  • _ForgeUser65911 posted a comment Feb 16, 2010

    Judging from what changed in the code, it started shortly after the multiple frames / unit stuff went in. So slightly before the 337 reported above.

  • coaleyed posted a comment Feb 18, 2010

    Had to reinstall basic Grid so I would be able to do arenas; so far I've lost 5 games thinking that the problem was resolved. I just turn Grid2 off now when I do the games.

    When it happens in arenas, the load screen completes, and I see myself for about 2 seconds. Then poof, hourglass... WoW disappears.

  • _ForgeUser1843524 posted a comment Feb 20, 2010

    I've had this exact problem on a mac.  Usually isn't an issue in raids, but leaving or joining groups causes the hang.  I can give a dump from a debugger, but it probably won't be too much use.

  • _ForgeUser65911 posted a comment Feb 20, 2010

    Another thing I noticed is that GridRoster does not seem to be clearing the guids. So IterateRoster has massive duplication after a while.

  • _ForgeUser492667 posted a comment Feb 22, 2010

    My client just freezes when leaving raids, people joining, etc. Not *always*, but extremely often.

    Been trying Grid2 for a few weeks and it just started happening within the last week, not sure which version #

  • _ForgeUser65911 posted a comment Feb 22, 2010

    So I have been dealing with potential issues and it is happening less frequently. I had a recipe for a hard crash by inviting someone to party, quitting, then reinviting. Now I can not get that to crash, however I do see an error in my recount now. If you are experiencing this do you also use recount?

  • _ForgeUser178300 posted a comment Feb 22, 2010

    Using Skada here.

  • _ForgeUser110308 posted a comment Feb 22, 2010

    For anything it might be worth, I'm also using Skada and have been having this sporadic issue.

  • _ForgeUser65911 posted a comment Feb 23, 2010

    All right, just a random coincidence then.

  • _ForgeUser632017 posted a comment Feb 25, 2010

    Layout switching does seem to occasionally crash WoW. I crashed about 9 to 12 times doing 60-80 arena games with Grid2; as soon as I set the Grid2 Arena Layout to none, the crashes stopped.

  • _ForgeUser110308 posted a comment Feb 25, 2010

    I just had a thought after a lockup from this issue. Could this possibly occur only when something happens to someone else using grid2?

    I noticed that it happened tonight as soon as one of our healers who is also using grid2 left the raid. Thinking back, this isn't the first time it happened when he specifically left the raid.

    Again, just a thought to toss out there in hopes of this getting resolved quickly.


    Edited Feb 25, 2010
  • _ForgeUser492667 posted a comment Feb 26, 2010

    I don't know if this is related but I noticed this behavior tonight:

    I had a raid set to 25 man. We had 24 players in groups 1-5, and one offline player in group 6. When I invited one more player (26th total player in raid), my WoW "froze" up for about 5 seconds. It also "froze" up for about 5 seconds when I then removed the offline player in group 6.

    Perhaps there is something funky going on when you have more players than the current raid difficulty setting? Or maybe it has something to do with having players in groups that aren't being displayed by the current layout. No errors were thrown, but something definitely wasn't right.

  • Torvalds45 posted a comment Mar 1, 2010

    Have not looked at that code but if nothing really changed in Grid2 perhaps it is a library like Incoming Heals? I don't have this problem much in 5 mans on my healers. But when I'm on my warrior I get it quite a bit. Which means to me someone else is using Grid2 or perhaps an Incoming Heals library?

  • _ForgeUser117147 edited title Mar 3, 2010
  • _ForgeUser110308 posted a comment Mar 3, 2010

    Just making a note here, but I just had this happen for the first time mid-fight in combat. Before, I'd never had it happen while I was in combat, but always out of combat.

    Is this anywhere close to having a fix?

  • _ForgeUser117147 posted a comment Mar 4, 2010

    This bug is non-obvious. I see no reason or explanation for an infinite loop in the roster code. Triggering the bug sadly kills the client, so debugging it as it occurs is not possible.

    As of right now, we need more information on the cause of this issue. Pointing out the changeset that started this behaviour would be helpful; finding out that a particular status causes this would be helpful too.

    Without more information, trying to correct this bug feels like searching for a needle in a haystack.

  • _ForgeUser1990418 posted a comment Mar 4, 2010

    Do you know at all which parts of the code are looping?

  • _ForgeUser117147 posted a comment Mar 4, 2010

    No. Because it seems to happen when the roster changes, that kind of restricts the candidates, but unless some internal data structures are completely screwed, none of them should loop forever.

  • _ForgeUser1167297 posted a comment Mar 5, 2010

    Ok, maybe this helps. I use PitBull4 for groups and Grid2 for raids/battlegrounds. As long as PitBull's group layout is activated for 5-man raids I can enter the arena, and the PitBull and Grid2 frames are visible together. If I disable 5-man raid in PitBull and leave Grid2 alone, WoW crashes right on entering the arena. I had the same behaviour with Shadowed Unit Frames.

    German client, Win7-64, tested on group with 2 ppl.

    Keep up good work guys!

  • _ForgeUser117147 posted a comment Mar 5, 2010

    People with some coding experience who are willing to help debug this issue might want to try this:

    http://wow-jerry-stuff.googlecode.com/files/g2check-1.0.zip

    This small addon will load after Grid2 and hook a few functions to try to check the internal state of this addon before and after the call. If this state becomes corrupted, I hope it will detect and report the faulty piece of code.

    Note that this addon might reduce your framerate, as the check is quite expensive. The checking code itself is somewhat straightforward, but I'm not sure if it's going to find the problem.
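
    The general technique (hook a function, check state before and after the call) can be sketched as follows. This is not the g2check code: the invariant, the table layout, and every function name here (`check_state`, `hook_with_check`, `add_unit`, `bad_remove`) are assumptions made for illustration.

    ```lua
    local unpack = table.unpack or unpack -- Lua 5.1 (as used by WoW) has a global unpack

    -- Hypothetical roster tables; the real addon's internals may differ.
    local roster_guids = {} -- unit -> guid
    local roster_units = {} -- guid -> unit

    -- Assumed invariant: the two tables are exact inverses of each other.
    local function check_state()
    	for unit, guid in pairs(roster_guids) do
    		if roster_units[guid] ~= unit then
    			return false, "guid " .. guid .. " does not map back to " .. unit
    		end
    	end
    	for guid, unit in pairs(roster_units) do
    		if roster_guids[unit] ~= guid then
    			return false, "unit " .. unit .. " does not map back to " .. guid
    		end
    	end
    	return true
    end

    -- Wrap fn so the invariant is checked before and after every call.
    local function hook_with_check(name, fn, report)
    	return function(...)
    		local ok_before = check_state()
    		local results = { fn(...) }
    		local ok_after, why = check_state()
    		if ok_before and not ok_after then
    			report(name .. " corrupted the roster: " .. why)
    		end
    		return unpack(results)
    	end
    end

    local function add_unit(unit, guid)
    	roster_guids[unit] = guid
    	roster_units[guid] = unit
    end
    local function bad_remove(unit)
    	roster_guids[unit] = nil -- deliberately forgets roster_units
    end

    local reports = {}
    local function report(msg) reports[#reports + 1] = msg end
    add_unit = hook_with_check("add_unit", add_unit, report)
    bad_remove = hook_with_check("bad_remove", bad_remove, report)

    add_unit("raid1", "guid-1")
    bad_remove("raid1")
    print(#reports) -- 1
    ```

    Running the full check around every hooked call is what makes this approach expensive, which matches the framerate warning above.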


    Edited Mar 5, 2010
  • _ForgeUser65911 posted a comment Mar 6, 2010

    I added some print statements in GridRoster around Grid_UnitLeft, Grid_UnitJoined, Grid_UnitChanged to see what is going on. Grid_UnitLeft is very spammy. After a 5 man heroic with some pets the final events after party disbanded were 2 PARTY_MEMBERS_CHANGED. The first generated 15 Grid_UnitLeft events, the next one a couple less. This is after 4 previous leave events also generated their own batches (they got worse the more people left).

    I checked in a fix for this (using wipe instead of nilling the iterated pairs). Not sure if it explains the crash though. At least in 5 man the roster lists seem more accurate now.
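
    A small sketch of the "iterate, then wipe" approach mentioned above. WoW's Lua environment provides a global `wipe(t)`; the fallback defined here exists only so the example runs in plain Lua, and the `sent` list is a hypothetical stand-in for sending "Grid_UnitLeft".

    ```lua
    -- Use WoW's global wipe if present; otherwise a plain-Lua fallback (assumption for this sketch).
    local wipe = wipe or function(t)
    	for k in pairs(t) do
    		t[k] = nil
    	end
    	return t
    end

    local units_to_remove = { raid1 = "guid-1", raid2 = "guid-2" }
    local sent = {}

    -- Iterate without modifying the table...
    for unit, guid in pairs(units_to_remove) do
    	sent[#sent + 1] = unit -- stand-in for sending "Grid_UnitLeft"
    end
    -- ...then clear everything in one call.
    wipe(units_to_remove)

    print(next(units_to_remove)) -- nil
    ```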

  • _ForgeUser117147 posted a comment Mar 8, 2010

    Looking at changeset 417, I saw that units_to_remove was not cleaned up because of a bug :

    for unit, guid in next, units_to_remove do
    	roster_names[unit] = nil
    	roster_realms[unit] = nil
    	roster_guids[unit] = nil
    	roster_units[guid] = nil
    	self:SendMessage("Grid_UnitLeft", unit, guid)
    	units_to_remove[guid] = nil -- *BUG IS HERE* the key is "unit", not "guid"
    end
    

    The effect of this was that every unit that had ever appeared in the raid would trigger "Grid_UnitLeft" each time RAID_ROSTER_CHANGE was triggered and the unit was absent.

    There can be at most 89 units in units_to_remove, and Grid_UnitLeft processing is minimal, so I'm not sure this bug fully explains the client freeze, which implies that a lot more work is being done.

    Another issue from this bug, I think, is that the loop above clears up roster_units[guid], which means that valid GUIDs can be removed from the roster_units table. I fail to see how that would create an infinite loop, though.

    I suggest reverting 417 and simply fixing the invalid line so that units_to_remove is cleared. But I'd prefer to wait a little to see if the ticket is fixed by 417 first.
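
    A minimal sketch of the one-line fix suggested above, using a simplified model of the loop: `units_to_remove` is keyed by unit, so the entry is cleared with the unit key. The `messages` counter and `flush_removed` are hypothetical stand-ins for the real message sending and roster code.

    ```lua
    local units_to_remove = { raid1 = "guid-1", raid2 = "guid-2", raid3 = "guid-3" }
    local roster_guids = { raid1 = "guid-1", raid2 = "guid-2", raid3 = "guid-3" }
    local messages = 0

    local function flush_removed()
    	for unit, guid in next, units_to_remove do
    		roster_guids[unit] = nil
    		messages = messages + 1     -- stand-in for self:SendMessage("Grid_UnitLeft", unit, guid)
    		units_to_remove[unit] = nil -- key is the unit; clearing an existing field is allowed during traversal
    	end
    end

    flush_removed() -- first roster event: three units leave
    flush_removed() -- second roster event: nothing left to report
    print(messages) -- 3
    ```

    As a side note, the Lua reference manual says the behaviour of `next` is undefined if a non-existent field is assigned during traversal, which is what the original `units_to_remove[guid] = nil` line did. Whether that played any part in the freeze is not established here.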

  • _ForgeUser65911 posted a comment Mar 8, 2010

    That's pretty funny; in all the times I stared at it, it always seemed so reasonable and correct! Even after I figured out that block was misbehaving.

    The problem is that when leaving a full raid (WG was almost a guaranteed crash), each person leaving triggers this, so if 10 (counting pets) leave at once that's 445. 30 would be 1335. If it does cause a significant slowdown (5+ seconds) then that last scenario is highly likely, as about that many leave a WG raid all at once.

    If you survive that then later on at the end of a random you have 4 people plus pets leaving at the end of loot. So 4-8 leaves for about 356 - 712 messages firing.

    Granted the processing is now trivial (purely cache management) but that is still a fair amount of function calls, table lookups and nilling. That certainly would explain why the change to make them only cache management messages reduced the problem though.

    I think we should leave the wipe calls in. They are more efficient than iterating and nilling.

    I am not seeing the issue with clearing roster_units[guid]. However, only StatusHeals uses functions based on guid, so it would be hard to notice errors.

  • _ForgeUser65911 posted a comment Mar 9, 2010

    One more thought. There are two possibly related bugs (assuming this was not the cause). On reload buff / debuff indicators do not update right away (probably just an initial condition problem with checking too soon). Occasionally, when people get moved around some indicators do not update or update incorrectly. So for instance they may show a buff as missing when it is there. Casting a spell on the unit refreshes it.

    I have not seen the latter since the fix but not enough time has passed either as it seems fairly rare.

  • _ForgeUser65911 posted a comment Mar 10, 2010

    Looks like this fixed the problem. Anyone crash during party / raid disbanding lately?

  • _ForgeUser65911 removed a tag Accepted Mar 10, 2010
  • _ForgeUser65911 added a tag Waiting Mar 10, 2010
  • professionaltart posted a comment Mar 10, 2010

    I haven't been experiencing the crashes lately, tho' I've been running less random heroics, too.

  • professionaltart removed a tag Waiting Mar 10, 2010
  • professionaltart added a tag Replied Mar 10, 2010
  • coaleyed posted a comment Mar 11, 2010

    I believe you have fixed it my friend. I've tried several things. I have had one 1-2 second hang, but could have been anything.

  • _ForgeUser65911 posted a comment Mar 12, 2010

    All right, I think the only outstanding issue then is what jerry noticed: "the loop above clears up roster_units[guid]". This one probably explains why res sometimes fails to find the res target. Moving that to another issue.

  • _ForgeUser65911 unassigned issue from _ForgeUser117147 Mar 12, 2010
  • _ForgeUser65911 self-assigned this issue Mar 12, 2010
  • _ForgeUser65911 removed a tag Replied Mar 12, 2010
  • _ForgeUser65911 added a tag Fixed Mar 12, 2010
  • _ForgeUser65911 closed issue Mar 12, 2010
