Over the last year, I spent a lot of time thinking about what the field of AI could learn from the pharmaceutical industry. As a physician who has worked in clinical trials, I am intimately familiar with the processes there and learned to deeply appreciate them over time. I also spent a few months upskilling in AI, from learning how transformers work and attending conferences to project managing ML runs and cyber security workshops.
Here, I juxtapose AI labs and Pharma, showcase some of my favorite examples of discrepancies, and discuss more rigorous assessments of AI models inspired by clinical trial frameworks.
Thanks to LightSpeed Grants for providing some funding to investigate this topic and to the many individuals with whom I have discussed these thoughts over the past year.
AI Labs vs. Pharma
“The industry produces products that can provide great benefits and can pose severe risks to individuals. Extensive technical expertise is needed to develop products, and there are high costs ($100M+) involved in development.”
This statement applies to both AI labs and pharmaceutical companies.
Moreover, regulators are starting to gravitate toward a “mostly unregulated” versus “seriously regulated” dichotomy in AI, much like the one already in place for drug regulation. Using indicators such as application context and invested computational resources (~10^26 FLOPs) to draw that line resembles how drugs are distinguished from dietary supplements in regulation.
In the tables below, I juxtapose R&D processes and stakeholders. Early phases of research and testing look quite similar. Later in the R&D process, clear differences become apparent, such as the commonly discussed difference in path to approval/deployment/licensing, which is far less systematized for AI (at this point) and often lacks complete workstreams.
There are many differences beyond that. One core difference is that the phases of drug development are built to rigorously test the safety of candidate drugs. Safety is defined as preventing a system or product from impacting its environment in an undesirable or harmful way, typically to protect human lives, the natural environment, or assets. The drug development process is not built to identify and tackle threat models where someone would use a drug detrimentally against others, i.e., security risks. Security aims to prevent often-adversarial agents or conditions from impacting a system in an undesirable or harmful way, e.g., product weaponization, autonomous replication, security breaches, or broad societal risks. Indeed, a 2010 National Research Council report concluded that predicting traits such as virulence or pathogenicity to better deal with the risks from bioengineered potential pandemic pathogens—with the degree of certainty necessary for regulatory purposes—would be impossible in the foreseeable future.
While safety of AI systems is clearly a concern, most experts are even more nervous about its security implications, i.e., how this tool can be abused. The AI industry can learn extensively about adverse event detection and management from drug development, but when it comes to protecting groups or even whole societies from technology abuse, the AI industry needs to look elsewhere for guidance.
If anything, this means that AI should go beyond what is expected in drug development, right?
Despite clear differences, many authors have written about various lessons that could be drawn from the FDA in particular regarding regulation and licensing in the AI industry.
Four lessons that relate to pharma companies themselves (as opposed to drug regulators) and that I rarely see discussed are:
- Budget
- Scientific approach
- Scope
- Ecosystem checks and balances
Key areas of improvement for AI
Domains | Drug Development | AI |
---|---|---|
Track record | Drug development and other industries have spent the better part of a century developing methodologies to evaluate, test, and assure the quality and safety of their products. Industry best practices have made huge leaps over the years. However, these standards were not built in a day, and the lack of proper measures had severe consequences for both companies and patients. | AI is a nascent field with little to no track record in safety and security-related efforts. Given the limited adoption from related fields (and minimal attention from academia), we should be very concerned if AI labs are not meeting the societal expectations we have for other high-risk sectors. |
Budget | Today, approximately 50-90% of total R&D costs in drug development are invested in quality and safety measures (including overhead for good manufacturing practices, animal experiments, clinical trials, etc.). | AI companies appear to spend only low single-digit percentages on assurance measures (and increasingly less in relative terms due to rising training costs). This represents an inversion of what we see in mature industries such as pharma or aviation. |
Ecosystem Checks and Balances | Drug development occurs within an ecosystem and is far from being run by a single entity. This enables numerous checks and balances. | AI companies are mostly autonomous and have few checks and balances. |
Scope | Testing depends on scoping and must be repeated if the scope or product changes. You must be very careful when applying a drug outside the distribution of what you tested. | Application of AI does not seem to recognize being “out of scope.” Safety results are often overly generalized. |
Scientific Approach | Drug evaluations are based on quantifying acceptable risks and obtaining statistically reliable information about real-world risk. | AI labs currently pursue a more qualitative approach to risk assessment without proper hypothesis testing. |
Budget: Putting Money Where Your Quality & Safety Is
This was one of the most compelling insights for me. In short, 50-90% of total drug development costs go to safety testing and quality assurance. Rigorous testing, including abandoning unsafe or inefficacious products, is what makes drug development so expensive—not the engineering challenges of manufacturing the drug itself. With hundreds of millions spent on them, safety evaluations are probably the biggest single R&D cost factor in a drug development program, followed by effectiveness evaluation, failed candidates, and process development/manufacturing.
To provide some figures: R&D expenditure per approved drug ranges from under $1 billion to over $2 billion. Failed candidates account for a third of these costs. Process development and manufacturing consume approximately 15% of the R&D budget from pre-clinical trials to approval. The bulk (~60%) of expenses for successful candidates arise during clinical testing, with the combination of Phase 3 and 4 trials being the most costly. Frankly, it’s difficult to disambiguate the costs, as virtually everything revolves around the central dogma of producing a safe and reliable drug.
For a $1 billion drug approval:
- $330M (33%): Failed candidates abandoned mostly for efficacy or safety reasons
- $100M (10%): Process development and manufacturing costs
  - Stringent GMP quality and testing criteria account for roughly ⅔ of this (~$66M, i.e., ~6% of the total), with actual manufacturing making up the remaining ⅓ (estimate from expert experience)
  - Small-scale initial GMP batch (up to Phase 1): $3-10M
  - Medium/large-scale GMP batch (Phase 2 to Approval): $25-50M
- $225M (22.5%): Preclinical development efforts, including many iterations of
  - in-vitro work at less than $0.5M,
  - animal experiments ($0.5-5M), and
  - small-scale initial non-GMP drug batches at $0.5-1.5M
- $335M (33.5%): Clinical trials
- Costs vary dramatically, but roughly $4M, $13M, $20M, and $20M per trial for Phases 1 through 4, respectively, are commonly cited.
- Average number of trials: Phase 1 (1.7), Phase 2 (2.0), Phase 3 (2.8), Phase 4 (3.2)
- Tallying these up gets you to average clinical trial costs of ~$155M (see the cost sketch after this list)
- Capital opportunity costs make up 25-50% of costs in each category due to the decade-long development timeline:
- Preclinical phase: ~31 months
- Clinical phase: 5.9-7.2 years (non-oncology), 13.1 years (oncology)
- Individual trial durations: Phase 1 (1.6 years), Phase 2 (2.9 years), Phase 3 (3.8 years)
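As a quick cross-check, here is a minimal back-of-envelope sketch in Python that tallies the figures quoted in the list above. All inputs are the cited estimates; attributing the gap between direct clinical trial costs (~$155M) and the $335M line item to capital/opportunity costs is my reading of the list, not a sourced figure.

```python
# Back-of-envelope tally of the clinical trial figures cited above (all values in $M).
avg_cost_per_trial = {"Phase 1": 4, "Phase 2": 13, "Phase 3": 20, "Phase 4": 20}
avg_trials_per_program = {"Phase 1": 1.7, "Phase 2": 2.0, "Phase 3": 2.8, "Phase 4": 3.2}

direct_clinical = sum(avg_cost_per_trial[p] * avg_trials_per_program[p] for p in avg_cost_per_trial)
print(f"Direct clinical trial costs: ~${direct_clinical:.0f}M")  # ~$153M, i.e. the ~$155M cited above

# Rough $1B program breakdown from the list above (the clinical line item exceeds the
# direct trial costs; the gap plausibly reflects the capital/opportunity costs noted above).
breakdown_musd = {
    "Failed candidates": 330,
    "Process development & manufacturing": 100,
    "Preclinical development": 225,
    "Clinical trials": 335,
}
print(f"Program total: ~${sum(breakdown_musd.values())}M of a ~$1B approval")
```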
Looking at drug development costs, safety and quality assurance account for over 50% of total R&D expenditure. When analyzing clinical trials, animal studies in preclinical phases, failed candidates, and GMP overhead, the ratio of development to safety costs ranges from 1:1 to 1:10, depending on the therapeutic area and definitions used. Novel drug designs require additional mandatory safety measures due to their uncharted nature, meaning safety and quality assurance can comprise 50-90% of total costs.
In aviation, my brief investigation of the R&D cost breakdown for passenger airplanes shows total costs about ten times higher than in drug development, around $10+ billion per new airplane. Safety and quality assurance are integrated into every R&D step, and the testing/certification process alone spans multiple years, consuming hundreds of millions of dollars. The ratio appears closer to 1:1 between testing/quality assurance and actual manufacturing costs, though this warrants deeper investigation. (One helpful intuition pump: a new passenger airplane costs in the low $100M range, i.e., 1-5% of the development costs.)
AI developers appear to be at the opposite extreme, with an estimated >95% of total deployment costs going to what’s essentially manufacturing—training the model. For example, ChatGPT’s estimated $100M cost included only a trivial amount for safety measures. Industry insiders confirm that safety testing typically receives only low single-digit percentages of investment.
Given AI’s elevated stakes, a minimum ratio of 1:10 (assurance measures to training costs) seems appropriate for AI models—and possibly more. However, even this suggested ratio falls short of recommendations from concerned AI experts, who call for at least one-third of R&D budgets to be allocated to safety and ethical use. It’s also below some AI labs’ proclaimed safety budgets, such as OpenAI’s former superalignment team at 20%.
Ecosystem of checks and balances: Multi-party development where stakeholders have skin in the game
Drug development has many mechanisms to involve independent parties in the process and provide them with leverage. Many production and manufacturing services are provided by external vendors, contract research organizations oversee testing, pharmacists handle the product, and physicians recruit (and later treat) patients. Every single one of them is accountable for their work and expected to ensure that best practices are followed. If not, they are legally liable, can lose their license to operate, and can even end up in jail.
In the following table I juxtapose different categories of stakeholders in drug development and AI. In terms of getting things done, having to synchronize among more parties is certainly a challenge. But splitting responsibility across actors and holding each of them accountable by law seems like a substantially more resilient way of advancing a field. Incentives can be better aligned against recklessness and fraud.
A world where AI labs not only need to find a data center with enough power and GPUs but also have to engage deeply with multiple committees, independent boards, sensitive-application specialists, etc., would be far more appealing to me.
Table: Stakeholders in Drug Development vs. High-Risk AI Development
The following compares the “actors” in drug development with those in high-risk AI model development. Note that while plausible equivalents exist for each drug development actor in high-risk AI development, most have not yet been established, though early efforts are underway. The establishment of independent boards, certifications, and public reporting systems has previously been identified as crucial for AI governance.
| Domain | Category | Drug Development Implementation | High-Risk AI Model Development Implementation |
|---|---|---|---|
| Regulatory body | International Guidelines Development | ICH / OECD / WHO Good X Practice (GxP, x = variable) guidelines with local implementation, e.g., Good Clinical Practice (GCP) – for the design and conduct of clinical trials; Good Laboratory Practice (GLP) – for the design and conduct of nonclinical (animal) studies; Good Manufacturing Practice (GMP) – for the manufacture of the drug product (“GMP material” means a drug produced in compliance with GMP standards) | E.g., NIST AI Risk Management Framework and ISO/IEC FDIS 23894 |
| | Local Law and Regulatory Guidance | Compliance with these standards is required by FDA regulations for US submissions (e.g., 21CFR210 for GMP, 21CFR58 for GLP, several parts of 21CFR for GCP), and often by other countries | EU AI Act, guidelines on ethical AI, and similar laws |
| | Regulatory Agencies | FDA, EMA, … | Unclear |
| Independent bodies | Independent advisory committees | E.g., expert groups providing evaluations for regulators | Not a standard |
| | Independent Ethics committees | Institutional review boards (IRBs) | Not a standard |
| | Independent Safety committees | Safety databases and data and safety monitoring boards (DSMBs) | Not a standard |
| | Independent Auditors | Review by independent external professionals as well as federal agencies | Not a standard |
| | Public engagement | Patient organizations on respective boards | Not a standard |
| | Non-Profits | Public advocacy groups and non-profit watchdog groups | GovAI and think tanks such as RAND |
| Reporting | Public databases | Clinicaltrials.gov, other national trial registrations | Not a standard |
| | Publication of results in peer-reviewed journals | Medical journals (mandatory publication strategy) | Not a standard; arXiv (voluntary) |
| Company | Sponsor | Pharmaceutical company (often the Sponsor) | AI Lab |
| Providers | Operational Services | Contract research organizations (CROs) for project management | Not a standard |
| | Subject Matter Experts | Consultants for diseases, regulation, applications, etc. | E.g., consultants for chemical/biological dual use and other harmful applications |
| | Manufacturing | Contract Development and Manufacturing Organization (CDMO) | Data centers / cloud providers / energy providers |
| | Early product evaluation | Animal study providers, specialized labs | Organizations like METR |
| | Training and Certification organizations | Medical/pharmaceutical licenses, GxP training, audits, etc. | Not a standard |
| | Logistics / Distribution Providers | Supply chain / pharmacies | Cloud providers running models (e.g., legal access to AI models available only via certified providers) |
| | Advanced product evaluation | Study sites | Not a standard. Potentially: companies with specially certified cybersecurity systems that can participate in controlled real-world evaluation runs |
| | Locally responsible | Physician principal investigator (PI) | Not a standard. Potentially: AI Officers in companies |
| | Study subjects | Patients | Customers / IT systems / data structures / etc. |
| Real-world Application | Real-world testing | Phase 4 studies | Public beta of ChatGPT and other LLMs |
| | Real-world monitoring | E.g., FDA’s Adverse Event Reporting System (FAERS) | Not a standard; OECD proposal for an incident reporting database |
| | Customers | Patients | Customers |
| | Bystanders | Access control via physicians; for drug development there is little to no history of harms to non-customers | Strong suspicion of potential high-stakes complications for bystanders |
Scope: Implications of Different Customers and Ongoing Modification
Approved drugs always have an intended target population. Prescribing outside this population enters a legal gray zone and is approached with extreme caution. Similarly, drug modifications (and even generic drugs replicating the substance) require comprehensive new studies to prove that a) past evidence remains relevant and b) new evidence demonstrates safety in a similar target population. This creates an inherent tension between deploying the latest innovations to all potential patients worldwide and maintaining safety and security standards for each target group and version.
AI developers appear to be inadequately scoping their safety and security assessments, deploying to customers and contexts far removed from their testing scenarios, without adequate tracking of consequences.
While generative AI models’ safety testing is considered unusually complex due to their broad capabilities, this shouldn’t lead to abandoning safety testing. Instead, it suggests that capabilities are too broad and need to be scoped to allow for conclusive safety data collection. The solution is to narrow down use cases for generative AI models. Many drug candidates similarly show wide-ranging potential benefits across different diseases or even in healthy individuals. However, due to the cost of establishing product safety confidence, companies can only market them to targeted populations where convincing evidence for safety (and effectiveness) has been collected.
Whether for society at large, companies, or specific individuals, different use cases require different types and amounts of conclusive and reliable evidence to determine product safety, either through company testing or regulatory oversight. This necessitates a structured approach and external validation of experimental planning, execution, and analysis.
Scientific Testing: AI Safety Studies for Assessing Dangerous Capabilities Reliably – Scale Matters
In drug development, all stakeholders share a singular obsession: do no harm to humans. This applies both to test subjects and the millions who will eventually receive the drug. Massive resources are invested in ensuring safety, with every aspect of development, testing, and rollout designed to meet predefined conditions.
There are various ways to quantify acceptable risks, and risk tolerance always depends on context and potential positive effects. For example, the acceptable risk profile for a potentially life-saving drug in a terminally ill patient differs significantly from that of a vaccine against a mild virus in healthy individuals.
The most severe adverse events in drug development are termed “serious adverse events” (SAEs), encompassing seven specific event types (see here for the definition). For most drugs, even single-digit occurrences of these events can halt an entire development program or restrict the drug from certain patient populations. Serious adverse event rates between 1:100 and 1:10,000 are often considered unacceptable.
This stringency exists because rolling out drugs across an entire population could potentially cause thousands of deaths or other SAEs—outcomes deemed absolutely unacceptable in most cases. Preventing such situations requires rigorous evidence collection, monitoring, and quality assurance.
The upper bound of harm envisioned in drug development appears to be the lower bound of what AI experts consider catastrophic risks. Beyond immediate individual harms, AI experts acknowledge potential catastrophic societal risks, ranging from “thousands of deaths or hundreds of billions of dollars in damage” (Anthropic’s RSP) to extinction-level events.
Current primary threat models focus on:
- Misuse by bad actors using AI to cause harm via cyberattacks, biological or nuclear weapons, or social unrest
- Similar threats from autonomous, uncontrollable AI agents
Current frameworks for assessing frontier AI model risks typically involve teams of 3-10 domain experts attempting to elicit specific capabilities, benchmarked against fair comparators like search engines or human performance (as seen in biological evaluations reports and various system cards).
While individual teams test various prompts and paths, a single team’s inability to elicit dangerous capabilities with given resources doesn’t guarantee that other equally capable teams wouldn’t succeed—if only by chance. Capability elicitation in AI models remains a nascent field, and ongoing demonstrations of jailbreaks and other attacks by amateur individuals highlight the importance of scale in testing.
If we want to claim “individuals of type X cannot elicit this capability given Y resources,” we need experiments designed around acceptable event rates (e.g., 1:1000) and power thresholds (e.g., 90% power), repeated multiple times by independent teams.
With sufficient repetition (guided by sample size calculations), failing to detect any team eliciting a dangerous capability could support statements like: “we are 95% confident that fewer than 1:1000 teams of X individuals with resources Y will be able to elicit these capabilities from the model.” While this doesn’t completely protect against catastrophic risks, it provides substantially more reliable data than assessments from AI lab employees who have obvious conflicts of interest.
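To make “sufficient repetition (guided by sample size calculations)” concrete, here is a minimal sketch of the underlying zero-failure arithmetic. It assumes independent attempts with a common per-attempt elicitation probability; the function names and parameters are illustrative, not part of any existing evaluation framework.

```python
import math

def attempts_for_upper_bound(rate: float, confidence: float = 0.95) -> int:
    """Attempts needed so that zero observed elicitations support the claim
    'the true per-attempt elicitation rate is below `rate`' at `confidence`."""
    # P(zero elicitations in n attempts | true rate >= rate) <= (1 - rate)^n,
    # so require (1 - rate)^n <= 1 - confidence.
    return math.ceil(math.log(1 - confidence) / math.log(1 - rate))

def attempts_for_power(rate: float, power: float = 0.90) -> int:
    """Attempts needed to observe at least one elicitation with probability
    `power`, if the true per-attempt rate is `rate`."""
    return math.ceil(math.log(1 - power) / math.log(1 - rate))

print(attempts_for_upper_bound(1 / 1_000))   # ~2,995 attempts (the "rule of three" shortcut gives ~3,000)
print(attempts_for_power(1 / 1_000))         # ~2,302 attempts for 90% detection power
```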
AI labs should assess and implement mitigations for frontier models to obtain statistically reliable data on whether individuals or teams at different expertise and resource levels can misuse their models. Similar evidence is needed for dangerous autonomy risks, though this discussion focuses on misuse contexts.
For understanding different threat actors, RAND’s recent report on securing AI model weights defines five operational capacity (OC) levels from OC1: Amateur level (~$1,000) to OC5: State actor level ($1 billion budget). These resource levels are also relevant for dangerous capability assessments.
To demonstrate reliable hypothesis testing in misuse scenarios, let’s consider an illustrative global distribution of actors across operational capacity levels:
- OC1 (Amateur): 100,000 actors
- OC2: 10,000 actors
- OC3: 1,000 actors
- OC4: 100 actors
- OC5 (State-level): 2 actors
Using the rule of three in statistics, to reliably rule out unwanted capability elicitation at a given OC level, you need approximately 3x as many test instances as there are actors at that level. For example, 300,000 OC1 attempts would allow you to state with 95% confidence that fewer than 1:100,000 amateur coders with ≤$1k resources can elicit dangerous capabilities.
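A minimal sketch applying the rule of three to the actor counts above (assuming independent attempts; the populations are the hypothetical figures listed, not empirical estimates):

```python
# Hypothetical global actor counts per operational capacity (OC) level, from the list above.
actors_per_level = {"OC1": 100_000, "OC2": 10_000, "OC3": 1_000, "OC4": 100, "OC5": 2}

for level, n_actors in actors_per_level.items():
    # Rule of three: zero elicitations in ~3/p attempts support a 95% upper bound
    # of p on the per-actor elicitation rate; here we target p = 1 / n_actors.
    attempts = 3 * n_actors
    print(f"{level}: ~{attempts:,} attempts to bound the elicitation rate below 1 in {n_actors:,}")
```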
The elicitation of dangerous capabilities could be measured through aggregate scores across domains or individual metrics, with careful consideration given to appropriate primary endpoints.
Importantly, this scale of testing is achievable: Lakera demonstrated this by creating “Gandalf,” an online jailbreaking game challenging users to devise clever prompts of increasing difficulty. Their community has invested an aggregate of 25 years across over 1 million sessions attempting to jailbreak their system.
In their responsible scaling policy, Anthropic states that actors able to elicit dangerous capabilities while spending “1% of the total training cost” would be concerning enough to trigger substantial additional security mitigations. One percent of total training costs is roughly equivalent to OC3, benchmarked at ~$1 million. If we wanted to be 95% confident that fewer than 1:1,000 of these groups could do so, we would need ~3,000 trials, very similar to the minimum number of participants in vaccine development, where data to detect adverse events at a rate of 1:1,000 must be provided. This would be equivalent to running a Phase 3 clinical trial, but likely at 100x the cost (approximately $30M per trial vs. $3B). Importantly, one might argue that while 1,000 such actors exist globally, not all of them will decide to misuse the model or spend that much money on it. One could also assume that the outlined experiment of 3,000 trials, with an average of $1M per trial, would cover level OC4 (actors with $10M in funding) as well. As in drug development, multiple such studies might be necessary, in particular for general AI models: general models have many threat models, making it exceedingly hard to test them all. Scoping down a model’s capabilities, e.g., by training it purely on code or purely on biology, would remove the need for multiple testing paths.
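A rough cost sketch of the OC3 study just described, using only the assumptions already stated (roughly $1M per attempt, a 1:1,000 target rate at 95% confidence, and a ~$30M Phase 3 trial as the comparator):

```python
target_rate = 1 / 1_000          # want 95% confidence that fewer than 1:1,000 groups succeed
attempts_needed = 3 / target_rate            # rule of three: ~3,000 elicitation attempts
cost_per_attempt = 1_000_000                 # ~1% of training cost, the OC3 benchmark (~$1M)
study_cost = attempts_needed * cost_per_attempt

phase3_trial_cost = 30_000_000               # ballpark Phase 3 trial cost quoted above
print(f"Attempts needed: ~{attempts_needed:,.0f}")
print(f"Estimated study cost: ~${study_cost/1e9:.0f}B (~{study_cost/phase3_trial_cost:.0f}x a ~$30M Phase 3 trial)")
```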
Misuse by OC5 actors is likely out of scope for safety testing of models, as these actors’ funding (≥ $1B) is sufficient to create frontier models themselves. Moreover, studies at the OC3 and OC4 levels would provide indicators for crucial mitigations that would also apply to even better-resourced actors, and safety buffers could serve as a proxy for what is assumed to be possible for state actors (e.g., shifting thresholds toward the more conservative end).
The proposals above necessitate a substantial investment in testing infrastructure. Similar to how pharmaceutical companies interact with contract research organizations and quality-assured study sites that recruit patients, frontier AI labs would likely need to partner with companies that provide a platform for engaging quality-controlled teams and partners securely, with very clear monitoring and reporting requirements.
Appendix
Comparison of Drug and AI model development process
This table lists all the essential steps of the drug development process and maps the respective steps and terminology from the AI model development space. In the early stages, substantial overlap exists, whereas at later stages there is no AI equivalent of clinical testing.
Drug Development | Milestones | AI Model Development | |||
Discovery | Target population and disease literature review; patient sampling; expert consultation | Goal: General Artificial intelligence | Research | ||
In-silico / in-vitro prototyping | Identification of product. Correspondence with regulators on how to best assess and determine safety/efficacy | Prototyping (test runs, hypothesis testing, early algorithm evaluations) | |||
Early Manufacturing | Generating initial drug substance via process development run | Generating initial model weights via iterative training runs (with evaluations at predefined intervals) | Training (Test runs) |
Formulation & Fill Finish to arrive at initial Drug Product | User interface attached to model | ||||
Early drug evaluation and iterative drug improvement in controlled environments | Preclinical (multiple in-vitro, animals), Phase-0 trials | Ethical approval | Safety Evaluations | ||
Toxicology | Start of assessment of characteristics, safety, and efficacy in a highly controlled, low-stakes environment | Safety Evaluations of dangerous capabilities (controllability, autonomous replication, lying, etc.) via standard tests and red-teaming |
Pharmacology (Pharmacodynamics & Kinetics) | Interpretability runs, incl. capability testing | ||||
Multiple model organisms | Fairness assessment | ||||
Stability testing | Various temperatures, timepoints and analysis methods | Accuracy, Robustness and jailbreaking testing | |||
Consulting with external experts | Evaluation of generated data by SMEs | Proof of concept established – additional investments in upscaling | Evaluations done by external experts | ||
Production Manufacturing run | Clinical Batch | Model trained on additional data from evaluation runs, now meeting all the requirements | Model Fine-Tuning |
Extensive Real World Drug Evaluation | Clinical Trials in controlled environments | Regulatory and Ethics approval (for each study individually) | No controlled real world evaluation as part of the AI Model development process | ||
Phase 1 (~2x) Assess the drug’s safety and dosage in a small group of healthy volunteers. This helps determine the safe dosage range and identifies potential side effects. | Start testing characterization, safety, and efficacy in a highly controlled, high-stakes environment |
Phase 2 (~2x) Evaluate the drug’s efficacy and further assess its safety in a larger group of patients who have the condition that the drug aims to treat. | |||||
Phase 3 (~3x) Test the drug in an even larger group of patients to confirm its effectiveness, compare it to commonly used treatments, and monitor side effects in a more diverse population. | |||||
Audits and inspections by sponsor and regulators | Audits throughout the development process proportional to the criticality of the materials or services provided | All necessary data on safety and efficacy collected and ready for evaluation by regulators |
Approval & Post Market Drug Monitoring | Application | Gradual public Rollout | |||
Phase 4 (~3x) | Access to the public | Beta testing programs | |||
Full market release | Public release | ||||
Adverse Event Reporting | |||||
Withdrawal | Access controls | ||||
Updated Versions | New application process with new data collected | Deployment of updated model | Updates |
Overview of safety principles in clinical trials
Description | Documents |
---|---|
Preparation Research Prior to Real-World Testing: Literature, animal, and later human data indicating concerning effects and their frequency are consolidated in continuously updated reports that define plausibly related adverse events to monitor. Optimized Experimental Designs: Employ strategies like sequential testing, simulations, and sample size calculations to minimize harm, ensure resource efficiency, and detect adverse events at specific frequencies (e.g., 1 adverse event per 10,000,000 user interactions). Optimized Detection Environments: Specialized sites employing personnel with mandatory training for optimal detection, response, and reporting of adverse events. | International guidelines for various preclinical safety testing. Example (mandatory) preclinical safety brochure and package insert (consumer) (physician) Pfizer / BioNTech Comirnaty. FDA and EMA guidance on statistical principles for clinical trials based on ICH E9 guideline. See section 4. of Good Clinical Practice guidelines and FDA guidance on physician responsibilities. |
Detection Dedicated Experiments: Clinical trials, particularly Phase 1, 3, and 4 studies, are large-scale experiments deliberately designed to detect adverse events. Continuous Adverse Event Evaluations: Extensive structured data collection during development and after market access. Automated signal detection is commonly implemented through pharmacovigilance. Recording All Negative Events: Ensure a holistic view by recording any unexpected, undesired, or harmful occurrences, regardless of their causal relationship. Maintaining Reliable Data: Employ auditable safety databases, uphold good documentation practices, and implement various forms of monitoring. | International guidelines for safety in human testing (see section E1 and E2A-F) and FDA guidance for post market evaluation “Pharmacovigilance is the science and activities relating to the detection, assessment, understanding and prevention of adverse effects or any other medicine/vaccine related problem.” WHO definition “An adverse event (AE) can therefore be any unfavourable and unintended sign (including an abnormal laboratory finding, for example), symptom, or disease temporally associated with the use of a medicinal product, whether or not considered related to the medicinal product.” ICH E2A Good Documentation Practice (GDocP) as outlined by section 4.9 of GCP. |
Management Categorization and Prioritization of adverse events: – Relatedness: Causal relationship between treatment and effect – Severity: Intensity of affected individuals experience – Expectedness: Consistency with anticipated adverse events – Outcome: Seriousness of consequences – Clusters: hierarchical mapping of events into clusters Prespecified Event Handling: Type-specific prespecified procedures and notifications. Unbiased Assessment: Use independent monitors/safety boards and multiparty decision-making to eliminate conflicts of interest. | International Clinical Safety Data Management: Definitions and Standards for reporting, and the FDA implementation MedDRA provides mapping of over 70,000 medical terms. Safety event procedures physicians and pharmaceutical companies (GCP Section 4.11, 5.16 , 5.17) FDA on Establishment and Operation of Data Monitoring Committees. |
Documents in a typical clinical trial application
- The minimal set of documents will include
- Study protocol
- Introduction (incl Study Rationale, Background, Risk/Benefit Assessment)
- Objectives and Endpoints
- Study Design (incl. Scientific rational, end of study definition, stopping criteria)
- Study Object (incl. Inclusion / exclusion criteria)
- AI model interventions and management (incl. Data infrastructure, accountability, preparation, etc)
- Discontinuation of Study
- Study Assessments and Procedures (incl. benchmarks, assessments, interpretability work, definition of adverse events, independent monitoring etc.)
- Statistical considerations (incl. Statistical Hypotheses, Sample Size Determination, endpoint analysis etc)
- Administrative Matters (incl Ethics, data privacy, informed consent, records, etc)
- Investigator’s Brochure (similar to Model Cards in AI)
- Informed Consent
- Vendor Evaluation Plan
- Monitoring Plan
- Project Management Plan (incl comms, escalation)
- Vendor Management Plan
- Trial Master File Plan (includes Investigator Site Files)
- Safety Management Plan
- Risk Assessment and Categorisation
- Data Management Plan
- Statistical Analysis Plan