SAN FRANCISCO — An informal working group of frontier AI developers and outside researchers has, over the past four months, converged on a draft set of baseline disclosure requirements for the capability and safety evaluations that the largest model developers run on their frontier systems before release.
The disclosure framework, which the working group expects to publish in the coming weeks, is not binding regulation, but it is a meaningful step towards comparable disclosure across the industry. Several of the largest developers have signalled that they would adopt the baseline as published.
What the disclosures cover
The baseline disclosures cover three categories. The first is the methodology under which the developers' evaluations are run: the prompts used, the scoring framework, and the categories of capabilities under review. The second is the headline outcomes of the evaluations across a defined set of benchmark categories. The third is each developer's own assessment of how those outcomes compare with the prior models it has released.
The framework deliberately stops short of requiring disclosure of underlying training data, model architectures, or other elements that the developers consider competitively sensitive. The working group's view is that the disclosure baseline can be useful without crossing those lines.
Where the working group has converged
The working group has converged most firmly on the methodology disclosures. The participating developers have accepted the argument that methodology is the foundation of any meaningful comparison. The group has had more difficulty defining the benchmark categories.
Those categories are technically contested, and the group has been resolving the disagreements item by item. The current draft includes a defined list of capability benchmarks and a separate defined list of safety benchmarks; both lists are intended to evolve as the underlying research matures.
The third-party evaluation question
Third-party evaluation has been the most actively debated part of the disclosure framework. The current draft permits, but does not require, third-party evaluation. The working group's view is that third-party evaluation is desirable but not yet operationally mature enough to be a baseline requirement.
Some outside researchers have criticised that position as too cautious. The working group's response is that mandating third-party evaluation before the supporting infrastructure is ready would produce evaluations that lack the rigour the framework is meant to ensure.
The regulatory backdrop
The regulatory backdrop is complicated. Several jurisdictions have signalled, with varying degrees of formality, that they expect to require some version of evaluation disclosure for frontier models in the next several years. The working group's framework is, in part, an attempt to define a baseline that those eventual regulatory frameworks could draw on.
Whether regulators accept the working group's baseline as a starting point or construct frameworks of their own is the question the developers are watching most carefully.