Closed Bug 634343 Opened 13 years ago Closed 13 years ago

Run a mapreduce job to find crash reports for frankeninstalls

Categories

(Socorro :: General, task)

task
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: christian, Assigned: aphadke)

Attachments

(7 files)

In bug 633869 we noticed the Firefox dlls are mismatched. These are invalid installs and people are crashing on startup. We need to see how pervasive the problem is and if similar mismatches have been seen before.

We would like a map-reduce job to search through crashes and count how many crash reports are from installs like this (add app/dll versions I guess). I'll get more specifics shortly.
Probably the simplest reduction of this is to provide a list of modules that we ship, and find any crash reports where those modules do not have the exact same version numbers. (Modulo bug 634282.)
We ship some dlls with different version numbers for various reasons: the nss dlls contain the nss version, nspr has its own version, as does sqlite. In bug 633869 we saw firefox.exe version 1.9.2.4038 (3.6.14 build 1) and brwsrcmp.dll 1.9.2.4055 (3.6.14 build 2), so maybe we can start with those two. Or restrict it to dlls whose versions start with "1.9.2." for 3.6 and "1.9.1." for 3.5 if we're checking those, too (if we're doing this exercise it's probably worth knowing about 3.5).
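A minimal sketch of the check described above, assuming each crash report's module list has already been reduced to a filename-to-version mapping (that input shape, the function names, and the sample NSS version are illustrative assumptions, not the Socorro schema). It keeps only modules whose version starts with the Gecko branch prefix and flags the report when those versions disagree:

```python
def mozilla_versions(modules, prefix="1.9.2."):
    """Keep only modules whose version string starts with the branch prefix."""
    return {
        name.lower(): version
        for name, version in modules.items()
        if version and version.startswith(prefix)
    }


def is_frankeninstall(modules, prefix="1.9.2."):
    """True when the branch-versioned modules in one report disagree."""
    versions = set(mozilla_versions(modules, prefix).values())
    return len(versions) > 1


# firefox.exe / brwsrcmp.dll versions are the ones seen in bug 633869.
report = {
    "firefox.exe": "1.9.2.4038",
    "brwsrcmp.dll": "1.9.2.4055",
    "nss3.dll": "3.12.8.0",  # hypothetical version; dropped by the prefix filter
}
print(is_frankeninstall(report))
```

The prefix filter is what keeps independently versioned libraries such as nss, nspr, and sqlite from being reported as false mismatches.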
Blocks: 633869
from bug 634351 comment 0:

bug 633869 seems to be caused by people having Firefox components from two
separate builds. It would help us to know if this is a new problem or something
that happens regularly. Is it possible to construct queries of this kind?

Not sure what the best evidence might be, but let's go with what we see in bug
633869: search for Firefox 3.6.x crashes where the module version of
firefox.exe does not match brwsrcmp.dll.  For example, in bug 633869 we're
seeing firefox.exe version 1.9.2.4038 (3.6.14 build 1) and brwsrcmp.dll
1.9.2.4055 (3.6.14 build 2).

1) Of people with Firefox 3.6.14, what percentage of crashes show these
specific versions, what percent show both as 1.9.2.4038, and what percent show
both as 1.9.2.4055?

2) Of Firefox 3.6.x in general, what percentage of crashes have different
module versions?

3) Can we get a count by Firefox version (or buildID) of how many crashes show
mismatched modules. We're looking to see if this is an ongoing persistent issue
or if it got worse at some point. Maybe percentage would be better than count
since the vast majority of 3.6 users will be using 3.6.13 and that will swamp
any numerical results.

I'm assuming you'd run this query over a small time range for sanity's sake. A
couple of days or a week would be plenty of data. Maybe even just one day if
the query takes a long time to run.
Assignee: nobody → aphadke
data for 1. (comment #4)

date: 2011-02-13 to 2011-02-14
total firefox_windows_crash	639942

format:
DLL_1 && DLL_2 count\n
firefox.exe|1.9.2.4038	brwsrcmp.dll|1.9.2.4055	107
firefox.exe|1.9.2.4038	brwsrcmp.dll|1.9.2.4038	3360
firefox.exe|1.9.2.4055	brwsrcmp.dll|1.9.2.4055	4457
So about 57% upgraded to build2, and 2.4% failure rate on the upgrades. I'll guess that hugely overstates the number of people with broken builds because as a start-up crash people will try/crash multiple times before giving up -- call it a "2.4% unhappiness rate". Probably way more than we want to ship with.
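One plausible reading of where those two percentages come from, as a small sketch: the 57% is taken as build 2 crashes over all matched-pair crashes, and the 2.4% as mixed-version crashes relative to build 2 crashes. That interpretation is an assumption; the counts themselves are the ones in the job output above.

```python
mixed = 107    # firefox.exe 1.9.2.4038 with brwsrcmp.dll 1.9.2.4055
build1 = 3360  # both modules at 1.9.2.4038
build2 = 4457  # both modules at 1.9.2.4055

# Share of matched-pair crashes that come from build 2.
upgraded_share = build2 / (build1 + build2)

# Mixed-version crashes relative to crashes from fully upgraded installs.
failure_rate = mixed / build2

print(f"upgraded: {upgraded_share:.1%}, mixed vs build2: {failure_rate:.1%}")
```

These are ratios of crash reports, not of users, which is why the comment treats the figure as an upper bound on the number of broken installs.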
aphadke, I think this needs to be a map/reduce thing since it involves looking at a large sample of module lists. the only way I have of getting at that data now is to do a screen scrape of a collection of reports from crash-stats.

it would be nice if we could run a number of jobs that run analysis over all the module lists of all crash reports. this would be one report, dbaron's module correlations would be another, malware analysis another, and stuff like bug 634097 (Compare beta 10 hardware acceleration usage to beta 11 hardware acceleration usage) still another example.
I ran a sample of 1000 reports from 3.6.14 build crashes on feb 14.

here are rough counts of the different combinations of firefox.exe and brwsrcmp.dll that I came up with.

 391 firefox.exe 1.9.2.4055 brwsrcmp.dll 1.9.2.4055 
 278 firefox.exe 1.9.2.4038 brwsrcmp.dll 1.9.2.4038 

 255     

  27 firefox.exe 1.9.2.4038                            ( brwsrcmp.dll missing?)
  20 firefox.exe 1.9.2.4038 brwsrcmp.dll 1.9.2.4055     mismatch
  15 firefox.exe 1.9.2.4055                            ( brwsrcmp.dll missing?)
  10 firefox.exe 1.9.2.3989 brwsrcmp.dll 1.9.2.4038     mismatch
   5 FIREFOX.EXE 1.9.2.4038 brwsrcmp.dll 1.9.2.4038     mismatch
   1 firefox.exe 1.9.2.4021 brwsrcmp.dll 1.9.2.4021 

raw data with links to reports are attached.
chofmann - comment #5 is actually a MR job that goes through the entire dataset to get the version mismatches.

I am currently running another MR job for item 2 in comment #4. Will update the ticket once it's done.

I agree on your suggestion of doing analysis over all module list of all crash reports. Can you file a separate bug and assign it to me or lemme know the bug-id so I can take a look at it.
(In reply to comment #8)
>  255     

255 Firefox crashes have neither firefox.exe nor brwsrcmp.dll?

>   27 firefox.exe 1.9.2.4038                 ( brwsrcmp.dll missing?)
>   15 firefox.exe 1.9.2.4055                 ( brwsrcmp.dll missing?)

Probably startup crashes before component loading. Inconclusive.

>   10 firefox.exe 1.9.2.3989 brwsrcmp.dll 1.9.2.4038     mismatch

Pretty sure .3989 is the stock 3.6.13 release, in which case it's an old problem? Might still jibe with bug 466778 making it tons worse though.

>    5 FIREFOX.EXE 1.9.2.4038 brwsrcmp.dll 1.9.2.4038     mismatch

Those match, don't know why the case changed on the filename.

>    1 firefox.exe 1.9.2.4021 brwsrcmp.dll 1.9.2.4021

Was that a nightly? .4021 is 17 days before our "build1" candidate.
So we already had frankenbuilds in the upgrade from 3.6.13 to 3.6.14 build1
bp-08e0fd1d-fc6e-4c04-8152-b0a222110214 -- the same mixed set of modules as the ones in bug 633869 (except for an earlier build).

The signature is js_DestroyScriptsToGC -- the new topcrash that caused us to back out bug 599610. Were we misled, and all of those were frankenbuilds too?
(In reply to comment #10)
> >   10 firefox.exe 1.9.2.3989 brwsrcmp.dll 1.9.2.4038     mismatch
> 
> Pretty sure .3989 is the stock 3.6.13 release, in which case it's an old
> problem? Might still jibe with bug 466778 making it tons worse though.

Awesome: all but one of those 10 are the js_DestroyScriptsToGC crash (bug 631105) that prompted us to back-out bug 599610 and do build2.
(In reply to comment #9)
> 
> I agree on your suggestion of doing analysis over all module list of all crash
> reports. Can you file a separate bug and assign it to me or lemme know the
> bug-id so I can take a look at it.

Bug 634498
full days crash analysis for feature 2. in comment 4
date: 20110211

Firefox:3.6.13	firefox.exe:null	brwsrcmp.dll:null	1773803
Firefox:3.6.3	firefox.exe:null	brwsrcmp.dll:null	49191
Firefox:3.6.8	firefox.exe:null	brwsrcmp.dll:null	41788
Firefox:3.6.10	firefox.exe:null	brwsrcmp.dll:null	39043
Firefox:3.6	firefox.exe:null	brwsrcmp.dll:null	39009
Firefox:3.6.12	firefox.exe:null	brwsrcmp.dll:null	38315
Firefox:3.6.6	firefox.exe:null	brwsrcmp.dll:null	21417
Firefox:3.6.4	firefox.exe:null	brwsrcmp.dll:null	12540
Firefox:3.6.2	firefox.exe:null	brwsrcmp.dll:null	8028
Firefox:3.6.11	firefox.exe:null	brwsrcmp.dll:null	7943
Firefox:3.6.9	firefox.exe:null	brwsrcmp.dll:null	7529
Firefox:3.6.7	firefox.exe:null	brwsrcmp.dll:null	4423
Firefox:3.6.14	firefox.exe:null	brwsrcmp.dll:null	2268
Firefox:3.6.15pre	firefox.exe:null	brwsrcmp.dll:null	128
Firefox:3.6.14	firefox.exe:1.9.2.4038	brwsrcmp.dll:1.9.2.4055	90
Firefox:3.6b4	firefox.exe:null	brwsrcmp.dll:null	73
Firefox:3.6b5	firefox.exe:null	brwsrcmp.dll:null	68
Firefox:3.6b1	firefox.exe:null	brwsrcmp.dll:null	49
Firefox:3.6.13	firefox.exe:1.9.2.3951	brwsrcmp.dll:1.9.2.3989	48
Firefox:3.6b2	firefox.exe:null	brwsrcmp.dll:null	45
Firefox:3.6.13	firefox.exe:1.9.2.3667	brwsrcmp.dll:1.9.2.3989	29
Firefox:3.6b3	firefox.exe:null	brwsrcmp.dll:null	25
Firefox:3.6.14pre	firefox.exe:null	brwsrcmp.dll:null	18
Firefox:3.6.3plugin1	firefox.exe:null	brwsrcmp.dll:null	14
Firefox:3.6.13	firefox.exe:null	brwsrcmp.dll:1.9.2.3989	13
Firefox:3.6.13	firefox.exe:1.9.2.3743	brwsrcmp.dll:1.9.2.3989	13
Firefox:3.6.14	firefox.exe:1.9.2.3615	brwsrcmp.dll:1.9.2.4038	11
Firefox:3.6.10	firefox.exe:null	brwsrcmp.dll:1.9.2.3909	11
Firefox:3.6a1pre	firefox.exe:null	brwsrcmp.dll:null	5
Firefox:3.6a1	firefox.exe:null	brwsrcmp.dll:null	5
Firefox:3.6.8	firefox.exe:1.9.2.3743	brwsrcmp.dll:1.9.2.3855	5
Firefox:3.6.13	firefox.exe:1.9.2.3855	brwsrcmp.dll:1.9.2.3989	5
Firefox:3.6.13	firefox.exe:1.9.2.3727	brwsrcmp.dll:1.9.2.3989	5
Firefox:3.6.12	firefox.exe:1.9.2.3667	brwsrcmp.dll:1.9.2.3951	5
Firefox:3.6	firefox.exe:1.9.2.3667	brwsrcmp.dll:1.9.2.3989	4
Firefox:3.6.6	firefox.exe:1.9.2.3743	brwsrcmp.dll:1.9.2.3828	4
Firefox:3.6.12	firefox.exe:1.9.2.3937	brwsrcmp.dll:1.9.2.3951	4
Firefox:3.6.10	firefox.exe:1.9.2.3743	brwsrcmp.dll:1.9.2.3909	4
Firefox:3.6.13	firefox.exe:1.9.2.3909	brwsrcmp.dll:1.9.2.3989	3
Firefox:3.6.6pre	firefox.exe:null	brwsrcmp.dll:null	2
Firefox:3.6.4	firefox.exe:1.9.2.3743	brwsrcmp.dll:1.9.2.3814	2
Firefox:3.6.13pre	firefox.exe:null	brwsrcmp.dll:null	2
Firefox:3.6.13	firefox.exe:1.9.2.3989	brwsrcmp.dll:	2
Firefox:3.6.12	firefox.exe:null	brwsrcmp.dll:1.9.2.3951	2
Firefox:3.6.12	firefox.exe:1.9.2.3743	brwsrcmp.dll:1.9.2.3951	2
Firefox:3.6.10pre	firefox.exe:null	brwsrcmp.dll:null	2
Firefox:3.6.10	firefox.exe:1.9.2.3667	brwsrcmp.dll:1.9.2.3909	2
Firefox:3.6.8	firefox.exe:null	brwsrcmp.dll:1.9.2.3855	1
Firefox:3.6.8	firefox.exe:1.9.2.3855	brwsrcmp.dll:1.9.2.3909	1
Firefox:3.6.80	firefox.exe:null	brwsrcmp.dll:null	1
Firefox:3.6.3	firefox.exe:null	brwsrcmp.dll:1.9.2.3743	1
Firefox:3.6.3	firefox.exe:1.9.2.3743	brwsrcmp.dll:1.9.2.3909	1
Firefox:3.6.3	firefox.exe:1.9.2.3667	brwsrcmp.dll:1.9.2.3743	1
Firefox:3.6.13	firefox.exe:null	brwsrcmp.dll:1.9.0.3071	1
Firefox:3.6.13	firefox.exe:1.9.2.3989	brwsrcmp.dll:1.9.2.3667	1
Firefox:3.6.13	firefox.exe:1.9.2.3989	brwsrcmp.dll:1.9.1.3642	1
Firefox:3.6.13	firefox.exe:1.9.2.3814	brwsrcmp.dll:1.9.2.3989	1
Firefox:3.6.12pre	firefox.exe:1.9.2.3926	brwsrcmp.dll:1.9.2.3933	1
Firefox:3.6.12	firefox.exe:1.9.2.3989	brwsrcmp.dll:1.9.2.3951	1
Firefox:3.6.12	firefox.exe:1.9.2.3951	brwsrcmp.dll:1.9.2.3989	1
Firefox:3.6.11pre	firefox.exe:null	brwsrcmp.dll:null	1
Firefox:3.6.10	firefox.exe:1.9.0.3831	brwsrcmp.dll:1.9.2.3909	1
total_firefox_crash	2193305
full days crash analysis for feature 2. in comment 4
date: 20110211 (only restricted to firefox 4.0b11)

Firefox:4.0b11	firefox.exe:null	brwsrcmp.dll:null	50501
Firefox:4.0b11pre	firefox.exe:null	brwsrcmp.dll:null	164
total_firefox_crash	50665
full days crash analysis for feature 2. in comment 4
date: 20110215 (only restricted to firefox 4.0b11)

Firefox:4.0b11	firefox.exe:null	browsercomps.dll:null	27232
Firefox:4.0b11pre	firefox.exe:null	browsercomps.dll:null	38
Firefox:4.0b11	firefox.exe:2.0.0.4038	browsercomps.dll:2.0.0.4051	3
Firefox:4.0b11	firefox.exe:2.0.0.4027	browsercomps.dll:2.0.0.4051	2
Firefox:4.0b11	firefox.exe:2.0.0.4051	browsercomps.dll:	1
total_firefox_crash	59540
date: 20110215 (only restricted to firefox 4.0b11)

Firefox:4.0b11	firefox.exe:2.0.0.4051	browsercomps.dll:2.0.0.4051	32771
Firefox:4.0b11	firefox.exe:2.0.0.4050	browsercomps.dll:2.0.0.4050	6
Firefox:4.0b11	firefox.exe:2.0.0.4038	browsercomps.dll:2.0.0.4051	3
Firefox:4.0b11	firefox.exe:2.0.0.4027	browsercomps.dll:2.0.0.4051	2
Firefox:4.0b11	firefox.exe:2.0.0.4051	browsercomps.dll:	1
total_firefox_crash    32783
So frankenbuilds still happen in FF4, but mostly gone. Not like 3.6 at all.

Are the firefox.exe:null crashes plugin-container.exe crashes? Maybe, but that wouldn't explain these counts for releases older than 3.6.4, which predate out-of-process plugins:

Firefox:3.6.3    firefox.exe:null    brwsrcmp.dll:null    49191
Firefox:3.6.2    firefox.exe:null    brwsrcmp.dll:null    8028
Firefox:3.6    firefox.exe:null    brwsrcmp.dll:null    39009
Stripping out the "firefox.exe:null" lines from comment 14 on the theory they were mostly plugin crashes (ignoring the evidence of comment 19, but in any case I don't know what to do with them) and then sorting by release I get

Firefox:3.6    firefox.exe:1.9.2.3667    brwsrcmp.dll:1.9.2.3989    4
Firefox:3.6.3    firefox.exe:1.9.2.3667    brwsrcmp.dll:1.9.2.3743    1
Firefox:3.6.3    firefox.exe:1.9.2.3743    brwsrcmp.dll:1.9.2.3909    1
Firefox:3.6.4    firefox.exe:1.9.2.3743    brwsrcmp.dll:1.9.2.3814    2
Firefox:3.6.6    firefox.exe:1.9.2.3743    brwsrcmp.dll:1.9.2.3828    4
Firefox:3.6.8    firefox.exe:1.9.2.3743    brwsrcmp.dll:1.9.2.3855    5
Firefox:3.6.8    firefox.exe:1.9.2.3855    brwsrcmp.dll:1.9.2.3909    1
Firefox:3.6.10    firefox.exe:1.9.0.3831    brwsrcmp.dll:1.9.2.3909    1
Firefox:3.6.10    firefox.exe:1.9.2.3667    brwsrcmp.dll:1.9.2.3909    2
Firefox:3.6.10    firefox.exe:1.9.2.3743    brwsrcmp.dll:1.9.2.3909    4
Firefox:3.6.12    firefox.exe:1.9.2.3667    brwsrcmp.dll:1.9.2.3951    5
Firefox:3.6.12    firefox.exe:1.9.2.3743    brwsrcmp.dll:1.9.2.3951    2
Firefox:3.6.12    firefox.exe:1.9.2.3937    brwsrcmp.dll:1.9.2.3951    4
Firefox:3.6.12    firefox.exe:1.9.2.3951    brwsrcmp.dll:1.9.2.3989    1
Firefox:3.6.12    firefox.exe:1.9.2.3989    brwsrcmp.dll:1.9.2.3951    1
Firefox:3.6.12pre    firefox.exe:1.9.2.3926    brwsrcmp.dll:1.9.2.3933    1
Firefox:3.6.13    firefox.exe:1.9.2.3667    brwsrcmp.dll:1.9.2.3989    29
Firefox:3.6.13    firefox.exe:1.9.2.3727    brwsrcmp.dll:1.9.2.3989    5
Firefox:3.6.13    firefox.exe:1.9.2.3743    brwsrcmp.dll:1.9.2.3989    13
Firefox:3.6.13    firefox.exe:1.9.2.3814    brwsrcmp.dll:1.9.2.3989    1
Firefox:3.6.13    firefox.exe:1.9.2.3855    brwsrcmp.dll:1.9.2.3989    5
Firefox:3.6.13    firefox.exe:1.9.2.3909    brwsrcmp.dll:1.9.2.3989    3
Firefox:3.6.13    firefox.exe:1.9.2.3951    brwsrcmp.dll:1.9.2.3989    48
Firefox:3.6.13    firefox.exe:1.9.2.3989    brwsrcmp.dll:    2
Firefox:3.6.13    firefox.exe:1.9.2.3989    brwsrcmp.dll:1.9.1.3642    1
Firefox:3.6.13    firefox.exe:1.9.2.3989    brwsrcmp.dll:1.9.2.3667    1
Firefox:3.6.14    firefox.exe:1.9.2.3615    brwsrcmp.dll:1.9.2.4038    11
Firefox:3.6.14    firefox.exe:1.9.2.4038    brwsrcmp.dll:1.9.2.4055    90

3.6.10 - 3
3.6.12 - 14
3.6.13 - 108
3.6.14 - 101

I think 3.6.14 crashes are unthrottled now. If so that 3.6.13 number is more like 1080 crashes from frankenbuilds. But the number of 3.6.13 users is more than 300 times 3.6.14 beta users, not just 10 times. Appears to be a serious uptick in frankenbuilds.

But maybe not. That set of 11 3.6.14 crashes with a 3.6 beta(!!) firefox.exe and a 3.6.14 component is an odd combination. Does that happen to a lot of people or is it one guy crashing 11 times before giving up? Probably the latter. Maybe we're not getting any more frankenbuilds than we always do, but the results in this case were a little more noticeable in a crash spike.
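The strip-and-tally step described in this comment can be sketched as follows. It assumes the job output is whitespace-separated with `name:value` fields exactly as pasted in this bug; the function names are made up for illustration. Rows with `firefox.exe:null` are dropped, and the remaining mismatched rows are summed per release. Summing the three 3.6.10 rows listed above this way gives 7.

```python
from collections import Counter


def parse_line(line):
    """Split one output row into (release, exe_version, dll_version, count)."""
    fields = line.split()
    release = fields[0].split(":", 1)[1]
    exe = fields[1].split(":", 1)[1]
    dll = fields[2].split(":", 1)[1]  # may be empty when the version is missing
    return release, exe, dll, int(fields[3])


def mismatches_by_release(lines):
    """Sum crash counts per release, skipping rows without a firefox.exe version."""
    totals = Counter()
    for line in lines:
        release, exe, dll, count = parse_line(line)
        if exe == "null":
            continue
        if exe != dll:
            totals[release] += count
    return totals


sample = [
    "Firefox:3.6.10\tfirefox.exe:null\tbrwsrcmp.dll:null\t39043",
    "Firefox:3.6.10\tfirefox.exe:1.9.0.3831\tbrwsrcmp.dll:1.9.2.3909\t1",
    "Firefox:3.6.10\tfirefox.exe:1.9.2.3667\tbrwsrcmp.dll:1.9.2.3909\t2",
    "Firefox:3.6.10\tfirefox.exe:1.9.2.3743\tbrwsrcmp.dll:1.9.2.3909\t4",
    "Firefox:3.6.14\tfirefox.exe:1.9.2.3615\tbrwsrcmp.dll:1.9.2.4038\t11",
    "Firefox:3.6.14\tfirefox.exe:1.9.2.4038\tbrwsrcmp.dll:1.9.2.4055\t90",
]
print(dict(mismatches_by_release(sample)))
```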
(In reply to comment #20)

> 
> I think 3.6.14 crashes are unthrottled now. 

Just confirming:  yes, as per bug 632171.
> Firefox:3.6.14    firefox.exe:1.9.2.3615    brwsrcmp.dll:1.9.2.4038    11
> Firefox:3.6.14    firefox.exe:1.9.2.4038    brwsrcmp.dll:1.9.2.4055    90

I'm not seeing any firefox-1.9.2.3989/brwsrcmp-1.9.2.4038 in this dataset but we did see them on earlier days in chofmann's sample (comment 12). Maybe it's a self-limiting problem as people give up, and not really new.
> 3.6.10 - 3

Missed a row, there were 7 frankenbuild crashes in 3.6.10 in the dataset.
1) We would like to run a similar job to the above, but we want to get a count of what groups of dlls are mismatched and their versions (to see if there are more than just the exe and brwsrcmp.dll). I'd like the report to look something like

[count] [FF version] [firefox.exe vers] [mismatched dll#1 vers] [mismatched dll#2 vers]

For example:

580   Firefox:3.6.10   firefox.exe:1.9.2.999   dll#1:1.9.2.888   dll#2:1.9.2.888 

2) I'd like a report to see if the level of frankenbuilds is the same over the 3.6.13 and the 3.6.14 beta periods. The beta period for 3.6.13 was 2010-12-01 through 2010-12-09.

3) If #2 shows the beta levels are the same, I'd like a report to see what level to expect for release. I'd like the report to query 3.6.13 from 2010-12-09 to now. Bonus points to track it over time so we can graph what the crash curve looks like.

Please let me know if more information is needed for these. 

This is very high priority as this data will help us determine if we go out with what we have now for 3.6.14 or if we rebuild / go a different direction.
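As a rough illustration of the report shape requested in item 1, the sketch below groups crashes by Firefox version, firefox.exe version, and the set of dlls whose version differs from firefox.exe. The input shape (a list of `(ff_version, modules)` pairs) and the function names are assumptions for illustration only, and the module lists are assumed to contain only dlls we ship.

```python
from collections import Counter


def report_key(ff_version, modules):
    """Build the grouping key for one crash, or None if nothing is mismatched."""
    exe_version = modules.get("firefox.exe")
    if exe_version is None:
        return None
    mismatched = tuple(sorted(
        f"{name}:{version}"
        for name, version in modules.items()
        if name != "firefox.exe" and version != exe_version
    ))
    if not mismatched:
        return None
    return (f"Firefox:{ff_version}", f"firefox.exe:{exe_version}") + mismatched


def build_report(crashes):
    """Return tab-separated rows: count, Firefox version, exe, mismatched dlls."""
    counts = Counter()
    for ff_version, modules in crashes:
        key = report_key(ff_version, modules)
        if key is not None:
            counts[key] += 1
    return ["\t".join((str(count),) + key) for key, count in counts.most_common()]


# Placeholder versions taken from the example row in this comment.
crashes = [
    ("3.6.10", {"firefox.exe": "1.9.2.999", "xul.dll": "1.9.2.888", "xpcom.dll": "1.9.2.888"}),
    ("3.6.10", {"firefox.exe": "1.9.2.999", "xul.dll": "1.9.2.888", "xpcom.dll": "1.9.2.888"}),
    ("3.6.10", {"firefox.exe": "1.9.2.999", "xul.dll": "1.9.2.999"}),
]
for row in build_report(crashes):
    print(row)
```

Sorting the mismatched dll entries before building the key keeps two crashes with the same modules in a different order from landing in separate groups.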
we should probably also look at early stages of 3.6.13 and other release deployment. the highest pct. of the problem would most likely occur when release upgrades happen, so looking at what's happening with 3.6.13 now isn't of as much value as looking at the week after December 9 when most of the updates were happening.
Yep, that's why in #3 I want it tracked over time. Do you think we need to do it for #2 as well?
sample one day report for feature 1) in comment #24:
http://people.mozilla.com/~aphadke/top.100.txt

legneato - thoughts?
(In reply to comment #26)
> Yep, that's why in #3 I want it tracked over time. Do you think we need to do
> it for #2 as well?

we collided and I didn't read your comment closely. yeah, the plan in comment 24 sounds good. one suggestion is to output the data with date and adu's to help correlate frequency or mismatches per 100 users or some other similar metric.

date adu's count firefox_version dll_mismatches, ...

           580 Fx:3.6.10 firefox.exe:1.9.2.999 dll#1:1.9.2.888  dll#2:1.9.2.888
(In reply to comment #27)
> sample one day report for feature 1) in comment #24:
> http://people.mozilla.com/~aphadke/top.100.txt
> 
> legneato - thoughts?

Looks great! A couple of things:

* I would like to get the FF exe version in there so I can compare the mismatch without having to cross-reference with the main Firefox version

* We should probably filter out any dll version that isn't 1.9.* (is this what you asked me about via IRC?)

* In the tsv's it'd be nice if the dlls were prepended to their versions for easier sorting (xul.dll:1.9.2.999). Not a big deal as we can do that in post-processing
Woo, that looks good!

Were we going to add the firefox.exe version as its own column after the Firefox:x.y.z version?

Also, would it be too much time / stress to run that query for the past year? Is that too much? If so, can we do the last 6 months? 3? Not sure what the sweet spot for time vs data is.
(17:25) < LegNeato> aphadke: Sorry, should have been clearer. Only want records where there is at least one mismatched dll with a version matching 1.9.*
(17:26) < LegNeato> (and only want the mismatched dlls and the firefox.exe in that case)
Looks good, let's run it on 3 months of data.
I managed to bring down the hadoop cluster yesterday while running a single job over 3 months of data. The job has since been modified to process 1 week at a time across the 3 months, then combine and print the results. The job is running; results should be available in the next 2-3 hours.
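The week-at-a-time approach can be sketched like this. It is not the actual job: `run_job` is a hypothetical callable standing in for one MapReduce run over a date range, assumed to return a `Counter` of mismatch groups.

```python
from collections import Counter
from datetime import date, timedelta


def week_ranges(start, end):
    """Yield inclusive (first_day, last_day) chunks of at most 7 days."""
    current = start
    while current <= end:
        last = min(current + timedelta(days=6), end)
        yield current, last
        current = last + timedelta(days=1)


def run_in_chunks(start, end, run_job):
    """Call run_job once per chunk and sum the Counters it returns."""
    combined = Counter()
    for first_day, last_day in week_ranges(start, end):
        combined.update(run_job(first_day, last_day))
    return combined


for first_day, last_day in week_ranges(date(2010, 12, 1), date(2010, 12, 9)):
    print(first_day, last_day)
```

Because the per-group counts are additive, summing the weekly results gives the same totals as one long run while keeping each individual job small.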
mismatch dll data for 8 weeks at http://people.mozilla.com/~aphadke/nov_dec_jan_dll_mismatch.txt

Secondary cluster will be up and running soon, this will allow us to run MR jobs on a much wider time-range..
Ok, that's enough data for his, thanks!

Can we get query #2 run? It has a lot less data / a more specific time range
that's enough data for *this* that is. I'm not sure having another month of data will tell us anything more.
Data for 2010-12-01 to 2010-12-09, firefox 3.6.13 and firefox 3.6.14 pre build (see comment #24, 2)
http://people.mozilla.com/~aphadke/mismatchdll.20101201.20101209.txt
This takes the data from comment 37 and strips out the lines that don't have any Firefox .dlls in them. Makes it easier to focus on the various mismatched firefox groupings. Interesting that sometimes firefox.exe is newer than the dlls, not 100% older as I'd expect if the firefox process was locked.

I think my favorite is the 3.7.a1pre build with a reasonable-sounding "1.9.3.3568" xpcom.dll and a Firefox 3.5 firefox.exe (1.9.1.3593).
Another way to slice the data in comment 37. Again stripping out lines that only have non-Firefox dlls, then combining the crash counts for each Firefox version with mismatched firefox dlls.  The second column is the number of different dll version groupings for that version of Firefox.

The second column slightly overcounts the number of groupings because I did not coalesce lines whose only difference is a non-Firefox .dll. You can see these in attachment 513356 [details] which was the raw data for this one. It's not too big an effect.
Considering that 3.5.x has only 10-15% of the users that 3.6.x does, the counts make 3.5 look incredibly infested with frankenbuilds. But remember that 3.5 doesn't have OOPP, while in 3.6.4+ plugin crashes won't have a firefox.exe and will be excluded from the data set.

To make more sense of it we'd have to add ADU and crash-per-user columns.
Takes the comment 41 data and strips out the lines with no Firefox .dlls, similar to comment 41 / attachment 513356 [details]
Daniel - wrt comment #43, I assume we are looking for 3.6.13 and 3.6.14pre-build ADUs for 2010-12-01 through 2010-12-09?
for crash-per-user, in addition to the above constraints, we want avg. # of crashes/user?

btw, the ADUs reside at a completely different data source, so I'll have to do some manual data marshaling out here....
Daniel, bug 525390 should make frankenbuilds of Firefox 3.6 much less likely, which I am certain is a major factor in why there are fewer frankenbuilds with Firefox 3.6 than with Firefox 3.5.
(In reply to comment #43)
> Considering that 3.5.x has only 10-15% of the users that 3.6.x does the counts
> make 3.5 looks incredibly infested with frankenbuilds. But remember that 3.5
> doesn't have OOPP, while in 3.6.4+ plugin crashes won't have a firefox.exe and
> will be excluded from the data set.
Unless I am mistaken, the majority of frankenfox crashes are startup crashes well before OOPP comes into play.
> Takes the comment 41 data and strips out the lines with no Firefox .dlls,

Comment 40 data, I mean. It shows 39 mixed-dll crashes in 3.6.13 during its week of beta.

The nov-jan data shows 37 mixed-dll crashes in 3.6.14 during the last six days of Jan when it was available on the beta channel. Comfortingly similar, but that comfort could go out the window if socorro throttling was set differently.
(In reply to comment #45)
> Daniel - wrt comment #43, I assume we are looking for [...]
> so I'll have to do some manual data marshaling out here....

My comment was not a request, just an opinion. If Christian thinks we need that additional data he will ask for it in a clear manner. Thanks for volunteering though!

(In reply to comment #47)
> Unless I am mistaken, the majority of frankenfox crashes are startup crashes
> well before OOPP comes into play.

That seemed to be the case in bug 633869 and bug 631105, but I don't think that's generally true. Some of these combinations are so old that they must be stable for these users. The user just happened to crash from some other cause and left traces of their frankenfox for us to find. What we're measuring is "people who crash who happen to have a frankenfox", but we didn't capture data on whether they were startup crashes or not.

We're making guesses about the likelihood of frankenfox creation because JS changes in 3.6.14 made this an unstable, unusable combination. From the data I'm starting to think we didn't do anything to make frankenfoxes more likely, but since the effects are worse (guaranteed startup crash) we're noticing it a lot more this time around. If frankenfoxes are really common then we're screwed. If they're rare enough we can plow ahead with the release and hope the affected people will figure out that they should download a fresh copy.
I suspect that at the very least some of the mismatched crashes are due to updating, ending up with mismatched dll's (which can be due to updating from a very old build), and then crashing on startup.

Having Uptime included in the reports would tell us how many of these crashes are startup crashes. I'd appreciate this data, though it can wait if it interferes with getting 3.6.14 out the door.

After comparing the data for 3.5.x and 3.6.x I filed bug 635161, which should make mismatches even less likely.
I'm re-running one of my scans and here is some preliminary data about uptime

count last_crash uptime

   1 \N     0  firefox.exe 1.9.2.4038 brwsrcmp.dll 1.9.2.4055 
   1 72470  6  firefox.exe 1.9.2.4038 brwsrcmp.dll 1.9.2.4055 
   1 63262 19  firefox.exe 1.9.2.3855 brwsrcmp.dll 1.9.2.4038 
   1 2      0  firefox.exe 1.9.2.3989 brwsrcmp.dll 1.9.2.4038 
   1 12     0  firefox.exe 1.9.2.4038 brwsrcmp.dll 1.9.2.4055 
   1 11     0  firefox.exe 1.9.2.4038 brwsrcmp.dll 1.9.2.4055 
   1 10     0  firefox.exe 1.9.2.4038 brwsrcmp.dll 1.9.2.4055 

They are all pretty close to startup, but it's also interesting that if the time since the last crash is long, it takes longer to hit the crash. If it looks like a retry, the crash is immediate.
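A small sketch of how the rows above could be bucketed. It assumes both columns are in seconds and uses 60 seconds as the cutoff for "startup crash" and for "looks like a retry"; both cutoffs are arbitrary assumptions for illustration, and `\N` is treated as "no previous crash recorded".

```python
STARTUP_SECONDS = 60  # assumed cutoff for calling a crash a startup crash
RETRY_SECONDS = 60    # assumed cutoff for treating a crash as an immediate retry

# (last_crash, uptime) pairs from the table above; None stands for \N.
rows = [
    (None, 0), (72470, 6), (63262, 19), (2, 0), (12, 0), (11, 0), (10, 0),
]


def classify(last_crash, uptime):
    """Return (is_startup_crash, is_retry) for one crash report."""
    startup = uptime < STARTUP_SECONDS
    retry = last_crash is not None and last_crash < RETRY_SECONDS
    return startup, retry


startup_count = sum(1 for last_crash, uptime in rows if classify(last_crash, uptime)[0])
retry_uptimes = [uptime for last_crash, uptime in rows if classify(last_crash, uptime)[1]]
print(startup_count, retry_uptimes)
```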
(In reply to comment #51)
>...
> They are all pretty close to startup, but its also interesting that if the time
> since last crash is a long time, it takes longer to hit the crash.  If it looks
> like a retry the crash is immediate.
The longer times are likely due to the work that is done such as extension checks after a version change and the shorter times are likely due to the same install trying to start again.
(In reply to comment #37)
> mismatch dll data for 8 weeks at
> http://people.mozilla.com/~aphadke/nov_dec_jan_dll_mismatch.txt
> 
> Secondary cluster will be up and running soon, this will allow us to run MR
> jobs on a much wider time-range..
The following entry seems incorrect since the executable and the dll's are all 1.9.1

4	Firefox:3.6.13	firefox.exe:1.9.1.3951	xpcom.dll:1.9.1.3685	xul.dll:1.9.1.3685
I reproduced the js_Enumerate startup crash by taking a 3.6.14 build2 and then copying firefox.exe, xpcom.dll, and xul.dll from a build1. Also got a js_PurgeCachedNativeEnumerators crash

bp-3627645d-56a6-47ad-bce0-2b79e2110222
bp-2ef01596-4480-4b86-af60-152262110222
Oops, first one should be bp-3627645d-56a5-47ad-bce0-2b79e2110222

Also reproduced the js_DestroyScriptsToGC crash with a frankenfox 3.6.14-build1 plus firefox.exe, xpcom.dll, and xul.dll from 3.6.13

bp-7ebddbde-100f-4994-914e-67b992110222
bp-5f011c57-ff8f-4d34-b020-02c2d2110222
bp-0c5ef72e-9bcd-4a11-a9f2-d82592110222
bp-e5375cbe-4a05-4143-b1d6-4122e2110222
bp-abe417f3-fe51-45ec-a9f7-245a62110222

case closed: crash spikes bug 631105 and bug 633869 are caused by frankenfoxes.
Sweet!
do you guys need anything from my end or can we close this bug?
https://bugzilla.mozilla.org/show_bug.cgi?id=633869

I'm assuming it can be closed since that one is fixed.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Nope, I think we still want data from #3 in comment 24. I owe a proper description I think
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Ok, we are getting data for #3 (in a google spreadsheet), great! Now, we need to run it every day to see how 3.6.14 is tracking so we can turn off the updates if we need to.

* Does 4:00 PM PST every day sound ok? Or is early in the morning preferred? 
* How resource intensive is the query on the cluster?
* How long does the query take? If we started it at 4:00 pm would it be done at 5:00 pm so that we can take action?

Please note that we won't need to operate on anything other than 3.6.14 crashes as the 3.6.13 data should be 100% the same.

This is very important data that impacts millions of users so it should be treated as a priority.
* Does 4:00 PM PST every day sound ok? Or is early in the morning preferred? 
A: we have brought the # of map tasks down to 1. Given the critical nature of the bug, I think we should be fine running it @ 4pm, but I leave the final decision to dre and xstevens.

* How resource intensive is the query on the cluster?
A: Not too much as we are only doing it for a day and for a specific build. 

* How long does the query take? If we started it at 4:00 pm would it be done at
5:00 pm so that we can take action?
A: It takes roughly 10 minutes for the query to run to completion.
I'm fine with this.  dre/x can comment on the timing issue.
#of frankenstein builds for fx 3.6.14:
3870	20110301-20110302
Out of those 3.6.14 frankenfox crashes, what were the versions of the firefox.exe, xpcom.dll, and xul.dll files?
The report only calculates the aggregate. If needed, I can modify the current process to output the mismatched DLLs, similar to: https://bug634343.bugzilla.mozilla.org/attachment.cgi?id=513361
It would be helpful since some number of those would be the earlier firefox.exe version which doesn't cause a crash and will hopefully be fixed by bug 635161.
#of frankenstein builds for fx 3.6.14 (removed crashes with non Mozilla dlls):
2541    20110301-20110302
I didn't remove the non Mozilla dlls from the entries that also included Mozilla dlls.
Counts comparing the firefox.exe version with the common dll version, excluding the questionable crashes (e.g. tbb-firefox.exe, incorrect dll filename case, and AccessibleMarshal.dll, where only one file can be registered for all installations). Only one crash had multiple dll versions. Of the remaining 2538 crashes, only 13 (around 0.5%) had a firefox.exe newer than the dlls, which should be improved by fixing bug 635161.

count  firefox.exe-ver  dll-ver1    dll-ver2
2503    1.9.2.3989     1.9.2.4066
12      1.9.2.4066     1.9.2.3989
6       1.9.2.3606     1.9.2.4066
3       1.9.2.3667     1.9.2.4066
3       1.9.2.3743     1.9.2.4066
2       1.9.2.3855     1.9.2.4055
1       1.9.2.3615     1.9.2.4055
1       1.9.2.3615     1.9.2.4066
1       1.9.2.3855     1.9.2.4066
1       1.9.2.3909     1.9.2.4066
1       1.9.2.4038     1.9.2.4055
1       1.9.2.4038     1.9.2.4066
1       1.9.2.4055     1.9.2.4066   1.9.2.3909
1       1.9.2.4055     1.9.2.4066
1       1.9.2.4066     1.9.2.3846
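
The "13 of 2538" figure can be rechecked from the table. This is only a minimal sketch, not the actual report code: it assumes the rows are transcribed by hand, that versions compare numerically field by field, and that only the first dll version of the one multi-version row is compared. The names ROWS, parse_version and newer_exe_share are made up for illustration.

```python
# Rows transcribed from the table: (count, firefox.exe version, first dll version).
ROWS = [
    (2503, "1.9.2.3989", "1.9.2.4066"),
    (12, "1.9.2.4066", "1.9.2.3989"),
    (6, "1.9.2.3606", "1.9.2.4066"),
    (3, "1.9.2.3667", "1.9.2.4066"),
    (3, "1.9.2.3743", "1.9.2.4066"),
    (2, "1.9.2.3855", "1.9.2.4055"),
    (1, "1.9.2.3615", "1.9.2.4055"),
    (1, "1.9.2.3615", "1.9.2.4066"),
    (1, "1.9.2.3855", "1.9.2.4066"),
    (1, "1.9.2.3909", "1.9.2.4066"),
    (1, "1.9.2.4038", "1.9.2.4055"),
    (1, "1.9.2.4038", "1.9.2.4066"),
    (1, "1.9.2.4055", "1.9.2.4066"),  # the row that also lists 1.9.2.3909
    (1, "1.9.2.4055", "1.9.2.4066"),
    (1, "1.9.2.4066", "1.9.2.3846"),
]


def parse_version(text):
    """Turn '1.9.2.4066' into (1, 9, 2, 4066) so comparison is numeric."""
    return tuple(int(part) for part in text.split("."))


def newer_exe_share(rows):
    """Return (crashes with firefox.exe newer than the dll, total, fraction)."""
    total = sum(count for count, _exe, _dll in rows)
    newer = sum(
        count
        for count, exe, dll in rows
        if parse_version(exe) > parse_version(dll)
    )
    return newer, total, newer / total


newer, total, share = newer_exe_share(ROWS)
print(f"{newer} of {total} ({share:.1%})")  # prints "13 of 2538 (0.5%)"
```

Comparing parsed tuples rather than raw strings avoids mis-ordering builds whose last field has a different number of digits.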
Verified that a 3.6.14 build (updated from 3.6.13) running a 3.6.13 profile, with firefox.exe at version 1.9.2.3989 and all other files up to date, does not crash. This covers the common case for builds with mismatched dlls.
aphadke, could you generate a report for mismatched dlls (version 2.0.0.x) for Firefox Beta 12? I'd like to get an idea if the changes to the updater on trunk have affected the number of frankenbuilds.
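
For reference, the per-crash check being asked for could look roughly like the sketch below. It is not the actual mapreduce job: it assumes each crash report has already been reduced to a list of (filename, version) pairs, and it treats any module whose version starts with the given prefix as one we ship. The function name mismatched_versions and the sample module lists are invented for illustration.

```python
def mismatched_versions(modules, prefix):
    """Return the distinct versions starting with `prefix`, sorted numerically,
    if a crash report carries more than one of them; otherwise return []."""
    versions = {ver for _name, ver in modules if ver.startswith(prefix)}
    if len(versions) <= 1:
        return []
    return sorted(versions, key=lambda v: tuple(int(p) for p in v.split(".")))


# Invented sample data: two shipped modules from different builds, plus
# modules whose versions do not match the prefix and are therefore ignored.
franken = [
    ("firefox.exe", "2.0.0.4051"),
    ("xul.dll", "2.0.0.4070"),
    ("nspr4.dll", "4.8.7.0"),
    ("thirdparty.dll", "1.2.3.4"),
]
consistent = [
    ("firefox.exe", "2.0.0.4070"),
    ("xul.dll", "2.0.0.4070"),
    ("thirdparty.dll", "1.2.3.4"),
]

print(mismatched_versions(franken, "2.0.0."))     # ['2.0.0.4051', '2.0.0.4070']
print(mismatched_versions(consistent, "2.0.0."))  # []
```

Filtering on the version prefix is what keeps dlls with their own version scheme from being counted as mismatches.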
rstrong -
date: 03/07 to 03/08 
firefox version: 4.0b12
dll: 2.0.0.x

report at: http://people.mozilla.com/~aphadke/frankenstein/firefox.4.0b12.20110307.20110308.sort.txt

let me know if you would like to run it for a different date range.
(In reply to comment #74)
> rstrong -
> date: 03/07 to 03/08 
> firefox version: 4.0b12
> dll: 2.0.0.x
> 
> report at:
> http://people.mozilla.com/~aphadke/frankenstein/firefox.4.0b12.20110307.20110308.sort.txt
> 
> let me know if you would like to run it for a different date range.
Could I get the same report from February 23rd onward?
Will be running this at 7 pm PST, once the load on Socorro starts tapering.
aphadke, Thanks! Additional reports for beta 12 won't be needed in case you set up a job.
#of frankenstein builds for fx 4.0b12 and 4.0b12pre (removed crashes with non Mozilla dlls):
285    20110222.20110308

Only 1 had a firefox.exe version greater than the dll version
count  Firefox ver       firefox.exe ver   dll ver
243    Firefox:4.0b12      2.0.0.4051     2.0.0.4070
19     Firefox:4.0b12      2.0.0.4038     2.0.0.4070
6      Firefox:4.0b12      2.0.0.3960     2.0.0.4070
5      Firefox:4.0b12pre   2.0.0.4060     2.0.0.4068
3      Firefox:4.0b12      2.0.0.3882     2.0.0.4070
2      Firefox:4.0b12      2.0.0.4000     2.0.0.4070
1      Firefox:4.0b12      2.0.0.3869     2.0.0.4070
1      Firefox:4.0b12      2.0.0.4027     2.0.0.4070
1      Firefox:4.0b12pre   2.0.0.4028     2.0.0.4069
1      Firefox:4.0b12pre   2.0.0.4063     2.0.0.4060
1      Firefox:4.0b12pre   2.0.0.4066     2.0.0.4067
1      Firefox:4.0b12pre   2.0.0.4068     2.0.0.4069
1      Firefox:4.0b12pre   2.0.0.4069     2.0.0.4070
Should we close this bug?
Closing for now.
Status: REOPENED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Blocks: 671348
Component: Socorro → General
Product: Webtools → Socorro