In a Socorro meeting in September in Washington DC, we started a discussion about how
Socorro could be offered as a service. This posting is a sketch of
my vision for such a service.
Would the developers with apps in the Mozilla Marketplace benefit from stats and aggregate reports from Socorro?
Socorro processes and reports on binary crashes from the Firefox
platform itself. Is that useful information to the Web developer? Well,
they might want it to just see what Firefox problems they're triggering
so that they may recode around our problems.
It might
be more valuable for app developers to have Javascript stack information
about their unhandled exceptions. Socorro doesn't do this right now.
An unhandled Javascript exception is not something that brings down
Firefox and triggers a breakpad crash submission. So the first step in
getting a version of Socorro to help Web developers is to create and
install a hook into our Firefox platform to submit an unhandled
Javascript exception with stack information to a waiting instance of
Socorro.
Since Socorro is so modular, we could yank
out the breakpad binary crash handling and replace it with some
Javascript stack analysis code. That implies that these crashes are
submitted to a Socorro instance substantially different than the one
that we use. I'll save the speculation as to what Javascript stack
processing would look like for a future posting.
In
this case, offering Socorro as a service means just offering a minimally
configured Socorro (no Hadoop/HBase, file system storage only) in,
perhaps, a VM. Otherwise, it's up to the web developer to host their
own Socorro. If offering Socorro to Web developers means reporting on
Javascript unhandled exceptions, then modifications will be required for
both the Firefox platform and Socorro.
Does Mozilla
itself have interest in a Javascript-aware Socorro? If Mozilla is
dogfooding its own platform and writing Marketplace apps (or the default
apps such as the phone app), then I could see a Javascript-aware Socorro
as a valuable tool in tracking our own app problems.
Assuming that the regular binary crash information is of interest to app developers, how could Socorro be offered as a service?
Our instance of Socorro shouldn't be burdened with all the aggregation
and overhead of maintaining stats on potentially thousands of WebApps.
I'm thinking about expanding Socorro in a different dimension:
multiple cooperating instances of Socorro.
Imagine
this: an app developer has his own instance of Socorro that only
receives crashes associated with his own app. This instance of Socorro
is either hosted by us on a VM, or hosted on some external system
arranged by the app developer himself.
Whoa Nelly, can we share binary crash dumps with external developers?
This is a critical question that is likely answered with a
resounding "no". The binary contains potentially sensitive information
about the user. Privacy concerns would prevent us from sending the
binary to third parties. Therefore the rest of this missive should be
considered just a self-indulgent speculation whose implementation
will likely never see the light of day.
How does the privately owned Socorro get crashes?
Our Socorro will continue to receive all the crashes from our
products. If Socorro senses that a crash happened in a specific WebApp
during our own save-the-crash phase of Socorro processing, we just
forward the crash on to the registered URL for that WebApp's private
instance of Socorro. Once the crash is thrown over the wall, our Socorro
continues with our normal crash processing. It is up to the WebApp
developer to use his private Socorro to the benefit of his own app.
Where does the private Socorro live?
It really doesn't matter. We could offer the hosting. They could host
it on Amazon or their own servers. If they want a full blown HBase
enabled Socorro, they can host on their own cluster. If they only need a
tiny instance, a fully functional Socorro can be configured sans HBase
and run on a VM.
How does our Socorro know about the private Socorro so that it can forward crashes?
This can be implemented in several ways. If a WebApp has a GUID and
that GUID is included in the Breakpad metadata submitted with the crash,
we could look up a registered crash submission URL and use it for
forwarding the crash in exactly the same manner that our products submit
crashes to us.
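
The registry approach could be sketched roughly as follows. This is only an illustration: the metadata key "WebAppGUID", the registry contents, and the function name are all assumptions of mine, not anything that exists in Breakpad or Socorro today.

```python
# Hypothetical registry mapping a WebApp GUID to the crash submission URL
# of that app's private Socorro. In practice this would live in a database.
REGISTRY = {
    "a1b2c3d4-0000-4000-8000-000000000001": "https://crashes.example.com/submit",
}


def lookup_forwarding_url(raw_crash, registry=REGISTRY):
    """Return the registered submission URL for the crash's WebApp, or None."""
    # "WebAppGUID" is an assumed metadata key, not a real Breakpad field.
    guid = raw_crash.get("WebAppGUID")
    if guid is None:
        return None  # an ordinary crash with no WebApp attached
    return registry.get(guid)  # None when the app never registered a URL
```

A crash without a GUID, or with a GUID nobody registered, simply yields None and is never forwarded.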
An alternative that I find more
attractive is having the private Socorro crash submission URL included
in the breakpad crash submission metadata. If the key is present in the
metadata, then our master Socorro would just submit the crash directly
to that URL. That way there are no lookups and we're not having to
maintain a registry. It would be up to the app developer to ensure the
URL is up to date and correct. A failed submission is lost.
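
Here is a minimal sketch of that registry-free variant, assuming a metadata key I've called "PrivateSocorroURL" (a made-up name) and a hand-rolled multipart POST from the Python standard library. The dump field name "upload_file_minidump" is, as far as I know, the conventional Breakpad one, but treat it as an assumption too. Note that a failure is logged and dropped, never retried.

```python
import logging
import urllib.request
import uuid

logger = logging.getLogger("forwarder")
URL_KEY = "PrivateSocorroURL"  # assumed metadata key, not an existing field


def encode_multipart(fields, dump, boundary=None):
    """Build a multipart/form-data body from metadata fields and one dump."""
    boundary = boundary or uuid.uuid4().hex
    delimiter = ("--" + boundary).encode("ascii")
    chunks = []
    for name, value in fields.items():
        chunks += [
            delimiter,
            ('Content-Disposition: form-data; name="%s"' % name).encode("utf-8"),
            b"",
            str(value).encode("utf-8"),
        ]
    chunks += [
        delimiter,
        b'Content-Disposition: form-data; name="upload_file_minidump"; filename="dump"',
        b"Content-Type: application/octet-stream",
        b"",
        dump,
        delimiter + b"--",
        b"",
    ]
    return b"\r\n".join(chunks), "multipart/form-data; boundary=" + boundary


def http_post(url, body, content_type, timeout=10):
    """POST the encoded crash; raises on network or HTTP errors."""
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def forward_crash(raw_crash, dump, post=http_post):
    """Forward the crash if its metadata names a private Socorro. Never raises."""
    url = raw_crash.get(URL_KEY)
    if not url:
        return False  # nothing to forward
    # Drop the URL key so the receiving instance does not forward it again.
    fields = {k: v for k, v in raw_crash.items() if k != URL_KEY}
    body, content_type = encode_multipart(fields, dump)
    try:
        post(url, body, content_type)
    except Exception:
        # A failed submission is lost: log it and move on.
        logger.warning("forwarding to %s failed", url, exc_info=True)
        return False
    return True
```

The post argument is only there so the network call can be swapped out in tests. Swallowing every exception is deliberate: a dead private Socorro must never hold up our own crash processing.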
The
changes to Socorro to enable this crash forwarding scheme are minimal.
The Socorro CrashStorage hierarchy already has the ability to do an
HTTP submit of crashes (this is used in the testing app, SubmitterApp).
There would be only minor modifications required, likely handled
smoothly by inheritance. Including that class in the CrashMoverApp
configuration is all that is needed.
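
To give a feel for how small the change might be, here is a rough sketch of the inheritance idea. None of these classes are Socorro's real ones: HTTPPostCrashStorage stands in for the existing HTTP-submitting storage class, FanOutCrashStorage stands in for whatever the CrashMoverApp configuration uses to write to several destinations, and the "PrivateSocorroURL" key and the method signature are my assumptions.

```python
class HTTPPostCrashStorage:
    """Stand-in for an HTTP-submitting crash storage class (hypothetical)."""

    def __init__(self, url, post):
        self.url = url
        self.post = post  # callable(url, raw_crash, dumps) doing the HTTP POST

    def save_raw_crash(self, raw_crash, dumps, crash_id):
        self.post(self.url, raw_crash, dumps)


class ForwardingCrashStorage(HTTPPostCrashStorage):
    """Submits to the URL named in the crash metadata instead of a fixed one."""

    url_key = "PrivateSocorroURL"  # assumed metadata key

    def __init__(self, post):
        super().__init__(url=None, post=post)

    def save_raw_crash(self, raw_crash, dumps, crash_id):
        url = raw_crash.get(self.url_key)
        if url:  # crashes without the key are simply not forwarded
            self.post(url, raw_crash, dumps)


class FanOutCrashStorage:
    """Stand-in for a destination that writes to several storage classes."""

    def __init__(self, *stores):
        self.stores = stores

    def save_raw_crash(self, raw_crash, dumps, crash_id):
        # Every crash goes to every store: our normal storage first,
        # then the forwarder, which decides on its own whether to act.
        for store in self.stores:
            store.save_raw_crash(raw_crash, dumps, crash_id)
```

Only the save method is overridden. Everything else, including how the POST itself is performed, is inherited from the existing class.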
So why would the WebApp developer want our Firefox binary crash reports?
It might also be possible to hook into the Javascript
unhandled exception code to do the crash report submission without
involving breakpad at all. In such a case, the submission could bypass
our master Socorro entirely and go directly to the private
Socorro.
This also leads to the idea that the app
developer could skip Socorro entirely and roll their own crash reporting
system that just receives crashes using the Socorro-style submission
technique.