Prioritizing Safety above All: An Examination
The UK's AI Safety Institute (AISI) is making significant strides in addressing the critical issue of AI safety. With proposed powers to block the release of models or products deemed too unsafe, such as voice cloning systems that could enable widespread fraud, the AISI would be able to take a proactive approach to mitigating potential risks [1].
However, evaluations of foundation models like GPT-4 tell us very little about the overall safety of a product built on them [1]. Existing evaluation methods, such as red teaming and benchmarking, also have technical and practical limitations [1]. To address these shortcomings, independent public interest research into AI safety is needed as a counterweight to industry-driven research [1].
The AISI's "safety case" approach is particularly valuable. It systematically combines specific safety claims, supporting evidence, and logical argumentation to ensure AI systems meet defined safety standards [2]. This approach is crucial for assessing whether powerful open-weight AI models meet appropriate safety thresholds before release [2].
The effectiveness of the UK's approach lies in its structured, evidence-based framework, which aims to bring clarity and rigor to safety evaluations and to address deficiencies in current industry practices [2]. AI Safety Institutes around the world, however, have different focuses: the UK emphasizes frontier AI safety at the system level, while other nations, such as Singapore, concentrate on application-specific outcomes [2]. This diversity underscores the importance of international collaboration, and the International AI Safety Network is seen as a promising mechanism for coordinating expertise and harmonizing evaluation standards across borders [2].
Recent reporting, however, suggests that voluntary agreements are fraying: three of the four major foundation model developers failed to provide the requested pre-release access to the UK's AISI for their latest cutting-edge models [1]. Moreover, determining whether an AI system is safe should involve assessing its impacts in the specific environment in which it is deployed [1].
To be effective, the UK AISI needs to be integrated into a regulatory structure with empowered and well-resourced sectoral regulators capable of context-specific testing of AI products [1]. Comprehensive legislation will be necessary both to provide the statutory powers described above and to fix other gaps in the UK's regulatory framework [1].
The UK AISI will need new capabilities underpinned by legislation, including powers to compel companies to provide access to AI models, training data, and accompanying documentation [1]. Established safety-driven regulatory systems typically cost more than £100 million a year to run effectively, and the skills and compute demanded by effective AI regulation may drive this figure even higher [1].
The European Union's AI Office has also begun to set itself up with a mandate to evaluate 'frontier models' [1]. The limits of the voluntary regime extend beyond access: they also shape the design of evaluations, with current practices better suited to the interests of companies than to those of the public or regulators [1]. Compounding these challenges, 'AI safety' is not an established term, and there is little agreement on which risks it covers [1].
In summary, while the UK's AI Safety Institute offers a promising, structured, and evidence-driven model for AI safety evaluation, the broader AI safety landscape still struggles with gaps in effective risk management and evaluation. The role of national institutes and international coordination in improving AI governance cannot be overstated [1][2]. AI safety should mean keeping people and society safe from the full range of risks and harms that AI systems cause [1].
Ultimately, the AISI's emphasis on systematic evaluation, combining safety claims, supporting evidence, and logical argument, offers a template for ensuring technology meets defined safety standards [2]. But given the limitations of existing evaluation methods such as red teaming and benchmarking, independent public interest research remains essential as a counterweight to industry-driven work [1].