ARTIFICIAL INTELLIGENCE ACT: TECHNICAL ASSISTANCE FOR AI SAFETY

Contract Value:
EUR 9M
Notice Type:
Contract Notice
Published Date:
10 July 2025
Closing Date:
25 August 2025
Location(s):
BE Belgium (Belgique-België)
Description:
The AI Office is seeking third-party contractors to provide technical assistance for monitoring compliance with the Artificial Intelligence Act, focusing on the assessment of systemic risks posed by General-Purpose AI models through various specialized lots.

Regulation (EU) 2024/1689 (Artificial Intelligence Act), which entered into force on 1 August 2024 and certain rules of which become applicable on 2 August 2025, establishes a comprehensive legal framework governing Artificial Intelligence. Of particular relevance to this procurement, it establishes responsibilities for providers of General-Purpose Artificial Intelligence (GPAI) models, aiming to ensure transparency, safety, and accountability in the deployment and use of AI technologies.

Articles 89 and 92 of the AI Act respectively allow the AI Office to monitor compliance with the AI Act and to perform evaluations of GPAI models in order to assess compliance and investigate systemic risks at Union level. Article 93 of the AI Act allows the Commission to request providers to take appropriate measures to comply with their obligations, to implement mitigation measures, or to restrict the availability of a model.

In order to carry out these enforcement tasks in relation to the AI Act, the AI Office is seeking the support of suitable third-party contractors to provide technical assistance for monitoring compliance, in particular in its assessment of risks posed by GPAI models at Union level.

This tender has been split into six lots. Five of these lots are dedicated to specific systemic risks, namely (i) CBRN risks, (ii) cyber offence risks, (iii) loss of control risks, (iv) harmful manipulation risks and (v) sociotechnical risks. These lots share a similar set of tasks and services:
• The organisation of multiple risk modelling workshops together with the AI Office, producing a risk model, risk scenarios, and corresponding thresholds, informing further work on evaluations.
• The development of new evaluation tools that remain private to the Commission.
• The onboarding of existing, publicly available evaluations into the developed risk framework.
• The creation of a reference procedure and reporting template for conducting risk assessments with respect to the developed evaluations.
• Ad hoc, on-demand services for complex evaluations, including for example red-teaming and uplift studies.
• A risk monitoring service in the form of regular briefings on new developments (e.g. models, risk sources, elicitation techniques, mitigations, incidents) to complement existing AI Office monitoring actions.
The scope of these tasks varies by lot.
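As a purely illustrative sketch of what a risk model with scenarios and thresholds might look like in code (all class names, scenario descriptions, and threshold values below are hypothetical and not part of the tender specification):

```python
from dataclasses import dataclass, field

@dataclass
class RiskScenario:
    # A concrete pathway by which a GPAI model could contribute to harm.
    name: str
    description: str
    # Capability indicators (e.g. benchmark scores) linked to this scenario.
    indicators: list[str] = field(default_factory=list)

@dataclass
class RiskModel:
    # One systemic-risk category with its scenarios and the score
    # thresholds that map evaluation results to risk levels.
    category: str
    scenarios: list[RiskScenario] = field(default_factory=list)
    # Risk level label -> minimum evaluation score that triggers it.
    thresholds: dict[str, float] = field(default_factory=dict)

    def risk_level(self, score: float) -> str:
        # Return the highest risk level whose threshold the score meets.
        level = "negligible"
        for label, cutoff in sorted(self.thresholds.items(), key=lambda kv: kv[1]):
            if score >= cutoff:
                level = label
        return level

model = RiskModel(
    category="cyber offence",
    scenarios=[RiskScenario("exploit generation",
                            "Model writes working exploits for known vulnerabilities")],
    thresholds={"elevated": 0.4, "high": 0.7, "unacceptable": 0.9},
)
print(model.risk_level(0.75))  # high
```

The highest threshold here would correspond to the level of unacceptable systemic risk at Union level referred to in the lot descriptions; in practice, thresholds would be set through the risk modelling workshops rather than fixed numerically in advance.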

Furthermore, there is one lot that is cross-cutting across the various risks:
• Lot 6: agentic evaluation interface, procuring software, cloud infrastructure support, and methodology to evaluate GPAI models on diverse types of benchmarks, focusing on agentic interaction and complex agentic benchmarks.


LOT-0001
CBRN Risk Modelling and Evaluation.
The objective of this lot is to support the AI Office in its enforcement of the AI Act in relation to General-Purpose AI Models (GPAI) and General-Purpose AI Models with systemic risk (GPAISR), specifically where such models may pose risks of enabling chemical, biological, radiological, and nuclear attacks or accidents. This includes significantly lowering the barriers to entry for malicious actors, or increasing the potential impact achieved, in the design, development, acquisition, and use of such weapons. The AI Office seeks to improve its capacity in identifying, monitoring, and assessing these risks, including the capacity to measure relevant capabilities of GPAI models and assess the effectiveness of mitigations aimed at preventing malicious actors from gaining access to these capabilities.
This lot aims to allow for flexible and close collaboration in developing risk models, prioritising risk scenarios, setting risk levels including a level of unacceptable systemic risk at Union level, identifying key model capabilities that could be linked to risk levels, and tailoring both new and existing risk measurement instruments to the context of the AI Office. Flexible collaboration between the AI Office and the contractor will be needed to ensure robust risk assessment as risks and our understanding of them evolve.


LOT-0002
Cyber Offence Risk Modelling and Evaluation.
The objective of this lot is to support the AI Office in its enforcement of the AI Act in relation to General-Purpose AI Models (GPAI) and General-Purpose AI Models with systemic risk (GPAISR), specifically where such models may pose risks related to offensive cyber capabilities that could enable large-scale or sophisticated cyber-attacks, including on critical systems (e.g. critical infrastructure). This includes automated vulnerability discovery, exploit generation, operational use, and attack scaling. The AI Office seeks to improve its capacity in identifying, monitoring, and assessing these risks, including the capacity to measure relevant capabilities of GPAI models and assess the effectiveness of mitigations aimed at preventing malicious actors from gaining access to these capabilities.
This lot aims to allow for flexible and close collaboration in developing risk models, prioritising risk scenarios, setting risk levels including a level of unacceptable systemic risk at Union level, identifying key model capabilities that could be linked to risk levels, and tailoring both new and existing risk measurement instruments to the context of the AI Office. Flexible collaboration between the AI Office and the contractor will be needed to ensure robust risk assessment as risks and our understanding of them evolve.


LOT-0003
Loss of Control Risk Modelling and Evaluation.
The objective of this lot is to support the AI Office in its enforcement of the AI Act in relation to General-Purpose AI Models (GPAI) and General-Purpose AI Models with systemic risk (GPAISR), specifically with regards to risks related to the inability to oversee and control autonomous GPAISRs that may result in large-scale safety or security threats. This includes model capabilities and propensities related to autonomy, alignment with human intent or values, self-reasoning, self-replication, self-improvement, evading human oversight, deception, or resistance to goal modification. It further includes model capabilities of conducting autonomous AI research and development that could lead to the unpredictable emergence of GPAISRs without adequate risk mitigations.
The AI Office seeks to improve its capacity in identifying, monitoring, and assessing these risks, including the capacity to measure relevant capabilities and propensities of GPAI models and assess the effectiveness of mitigations.
This lot aims to allow for flexible and close collaboration in developing risk models, prioritising risk scenarios, setting risk levels including a level of unacceptable systemic risk at Union level, identifying key model capabilities that could be linked to risk levels, and tailoring both new and existing risk measurement instruments to the context of the AI Office. Flexible collaboration between the AI Office and the contractor will be needed to ensure robust risk assessment as risks and our understanding of them evolve.


LOT-0004
Harmful Manipulation Risk Modelling and Evaluation.
The objective of this lot is to support the AI Office in its enforcement of the AI Act in relation to General-Purpose AI Models (GPAI) and General-Purpose AI Models with systemic risk (GPAISR), specifically where such models may pose risks of enabling the targeted distortion of the behaviour of persons, in particular through multi-turn interactions, that causes them to take a decision that they would not have otherwise taken, in a manner that causes, or is reasonably likely to cause, significant harm on a large scale. This includes the capability to manipulate through multi-turn interaction and the propensity of models to manipulate, including manipulation of high-stakes decision makers, large-scale fraud, or exploitation of people based on protected characteristics. As a guide, a risk of harmful manipulation exists if it cannot, beyond reasonable doubt, be ruled out that a GPAISR, when integrated into an AI system, enables the AI system, irrespective of the intention of the AI system provider or deployer, to deploy subliminal, purposefully manipulative, or deceptive techniques as outlined in the Commission Guidelines on prohibited artificial intelligence practices established by Regulation (EU) 2024/1689 (AI Act).
The AI Office seeks to improve its capacity in identifying, monitoring, and assessing these risks, including the capacity to measure relevant capabilities of GPAI models and assess the effectiveness of mitigations aimed at preventing malicious actors from gaining access to these capabilities.
This lot aims to allow for flexible and close collaboration in developing risk models, prioritising risk scenarios, setting risk levels including a level of unacceptable systemic risk at Union level, identifying key model capabilities that could be linked to risk levels, and tailoring both new and existing risk measurement instruments to the context of the AI Office. Flexible collaboration between the AI Office and the contractor will be needed to ensure robust risk assessment as risks and our understanding of them evolve.


LOT-0005
Sociotechnical Risk Modelling and Evaluation.
The objective of this lot is to support the AI Office in its enforcement of the AI Act in relation to General-Purpose AI Models (GPAI) and General-Purpose AI Models with systemic risk (GPAISR), specifically where such models may pose large-scale sociotechnical risks, including those risks stemming from harmful bias or discrimination, or from the endangering of fundamental human rights such as freedom of expression or health protection—irrespective of the downstream systems and contexts in which these models are deployed.
The AI Office aims to enhance its ability to identify, monitor, and evaluate these risks, including the capacity to measure relevant capabilities and propensities of GPAI models and assess the effectiveness of mitigations. This includes the monitoring of indicator variables and the creation of early warning indicators for tracking cumulative risks that materialise through reach, well-meaning behaviour, and adoption rather than through novel capabilities or malicious actors.
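As a minimal, purely hypothetical sketch of such an early warning indicator (the variable name, threshold, and window size below are illustrative assumptions, not tender requirements), one could track a rolling mean of an observed variable and flag when it crosses an alert threshold:

```python
from collections import deque

class EarlyWarningIndicator:
    # Tracks a rolling mean of an observed indicator variable
    # (e.g. adoption rate, incident counts) and raises an alert
    # when the mean reaches a configured threshold.
    def __init__(self, name: str, threshold: float, window: int = 12):
        self.name = name
        self.threshold = threshold
        self.values: deque = deque(maxlen=window)  # oldest values drop off

    def observe(self, value: float) -> None:
        self.values.append(value)

    def alert(self) -> bool:
        # No alert until at least one observation has been recorded.
        if not self.values:
            return False
        return sum(self.values) / len(self.values) >= self.threshold

ind = EarlyWarningIndicator("gpai_adoption_rate", threshold=0.5, window=3)
for v in (0.2, 0.4, 0.9):
    ind.observe(v)
print(ind.alert())  # mean = 0.5 -> True
```

A real monitoring service would of course draw on richer statistics and multiple correlated indicators; the point of the sketch is only that cumulative risks are tracked as trends in observed variables rather than as single capability scores.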
This lot aims to allow for flexible and close collaboration in developing risk models, prioritising risk scenarios, setting risk levels including a level of unacceptable systemic risk at Union level, identifying key model capabilities that could be linked to risk levels, and tailoring both new and existing risk measurement instruments to the context of the AI Office. Flexible collaboration between the AI Office and the contractor will be needed to ensure robust risk assessment as risks and our understanding of them evolve.


LOT-0006
Agentic Evaluation Interface.
The objective of this lot is to supply the AI Office with a programmatic interface for the evaluation of General-Purpose Artificial Intelligence (GPAI) models through agentic interaction patterns, i.e. the evaluation of GPAI models for use in tasks requiring multiple decisions and interaction with digital environments such as a browser, command line, or a full operating system.
The interface is taken to mean the software, methodology, configuration, cloud orchestration and other digital technical components that would allow the AI Office to execute evaluations of GPAI models and their agentic capabilities, without further dependency on the contractor, supplementing and working in tandem with AI Office digital infrastructure.
The AI Office is seeking such an interface in the form of a GPAI model evaluation workflow or ‘harness’, and the necessary components thereof, that allows the AI Office and its technical experts to rapidly integrate new GPAI models and benchmarks into the evaluation pipeline across all digital modalities (text, image, video, audio) and the full spectrum of digital tasks. This harness should support relevant, state-of-the-art elicitation methods and agent scaffolding, including fallback mechanisms that connect text-only models to multi-modal benchmarks.
It is expected that a significant part of the service will be dedicated to supporting the AI Office in the deployment on Commission infrastructure of existing or developed components of the evaluation interface.
Given the fast pace of development in AI, contractors should prepare for significant flexibility in execution and should structure the project to facilitate close communication, creating short feedback cycles and ensuring alignment on design and needs.
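The shape of such a harness might, purely as an illustration, look like the following: a model interface, a benchmark interface, an evaluation loop, and a text-description fallback for benchmarks whose modalities the model does not support. All names and interfaces here are hypothetical assumptions, not the procured design:

```python
from abc import ABC, abstractmethod

class Model(ABC):
    # Minimal model interface: which modalities it accepts, and a step
    # function mapping an observation to the agent's next action.
    modalities: set = {"text"}

    @abstractmethod
    def step(self, observation: dict) -> str: ...

class Benchmark(ABC):
    # A benchmark yields observations (possibly multi-modal) and scores
    # the trajectory of actions the model produced.
    required_modalities: set = {"text"}

    @abstractmethod
    def reset(self) -> dict: ...
    @abstractmethod
    def act(self, action: str) -> tuple: ...  # (next observation, done)
    @abstractmethod
    def score(self) -> float: ...

def describe(observation: dict) -> dict:
    # Fallback: replace non-text content with text placeholders so a
    # text-only model can still attempt a multi-modal benchmark.
    return {k: v if isinstance(v, str) else f"[{k}: non-text content]"
            for k, v in observation.items()}

def evaluate(model: Model, benchmark: Benchmark, max_steps: int = 50) -> float:
    # Run one agentic episode: observe, act, repeat until done.
    needs_fallback = not benchmark.required_modalities <= model.modalities
    obs, done = benchmark.reset(), False
    for _ in range(max_steps):
        if done:
            break
        action = model.step(describe(obs) if needs_fallback else obs)
        obs, done = benchmark.act(action)
    return benchmark.score()
```

New models and benchmarks then plug in by implementing the two abstract interfaces, which is the kind of rapid integration the lot describes; the actual harness would additionally handle cloud orchestration, logging, and elicitation configuration.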

The Buyer:
European Commission, DG CNECT - Communications Networks, Content and Technology
CPV Code(s):
79419000 - Evaluation consultancy services