You mention internal code repositories and, when possible, open source. Is there a good reason the default shouldn't be open source, with internal repositories used only where compelling reasons exist?
I'd expect something like "it deals with PII" or "it deals with classified data/systems" would be compelling reasons, but anything related to public data should default to open.
As an example, why aren't the data sets and code used to generate the BLS jobs report sitting in a public repository somewhere?
My personal list would also include "we'd like to work with organizations with good models that are their own intellectual property" and "if we disclose our models we'll be making it easier for people to game them."
But if I were trying to answer this more specifically, I'd ask folks who've done a lot of work on this problem. (For instance, Jordan Kasper is going to be talking about open source software in government on Friday at DEFCON: https://hackertracker.app/event/?conf=DEFCON33&event=61546)
Interesting. It might be useful to abstract out the model in some way - and there's a big difference between "we tell you which off-the-shelf model we use" and "we give you unlimited access to it". It's not that uncommon for open source software to call proprietary APIs.
As to disclosure leading to people gaming the system - if it matters, that may fall under one of the exceptions. But something like the public comment analysis could let people check their comments before submitting and ensure that they're interpreted as intended. (Though there's also a big win from a structured response.)
If you mean that having the exact model BLS uses to generate the jobs report out in public could somehow lead to manipulating it, that seems interesting. I'd expect that putting all of the analysis in a public repository would allow independent verification and give people insight into how the model changes (e.g., it could serve to counter any "they're presenting bad data for political reasons" claims).