OWASP LLM AI Cybersecurity & Governance Checklist

Welcome to Resilient Cyber!

Cybersecurity leaders across the industry have been scrambling to keep pace with their organizations' rapid exploration, adoption and use of Large Language Models (LLMs) and Generative AI (GenAI). Companies such as OpenAI, Anthropic, Google and others have seen rapid growth in the use of their GenAI and LLM offerings.

Open source alternatives have also seen significant growth, with AI communities such as Hugging Face becoming widely used and offering models, datasets and applications to the technology community.

As the use of GenAI and LLMs has evolved, so has the guidance from industry-leading organizations such as OWASP, OpenSSF, CISA and others. OWASP in particular has provided key resources, such as the OWASP AI Exchange, the AI Security and Privacy Guide and the LLM Top 10. Recently the OWASP LLM Top 10 project released its “LLM AI Cybersecurity & Governance Checklist”, which we will be taking a look at in this article.

Overview

The publication starts off by clarifying the distinction between broader AI and ML on one hand and Generative AI and LLMs on the other. Generative AI is defined as a type of machine learning that focuses on creating new data, while LLMs are a type of AI model used to process and generate human-like text.

LLMs make predictions based on the input provided to them, and their outputs are human-like in nature. We see this with leading tools such as ChatGPT, which already boasts over 180 million users and saw over 1.6 billion site visits in January 2024 alone.

The LLM Top 10 project produced the checklist to help cybersecurity leaders and practitioners keep pace with the rapidly evolving space and protect against the risks of rapid, insecure AI adoption and use, particularly of GenAI and LLM offerings.

The topic of AI can be incredibly vast, so the checklist is aimed at helping leaders quickly identify some of the key risks associated with GenAI and LLMs and equipping them with associated mitigations and key considerations to limit risk to their respective organizations.

The group also stresses that the checklist isn’t exhaustive and will continue to evolve as the use of GenAI and LLMs, and the tools themselves, evolve and mature.

There are various LLM Threat Categories, which are captured in the image below: 

Source: OWASP LLM AI Cybersecurity & Governance Checklist

Organizations need to determine their LLM strategy. This is essentially how organizations will handle the unique risks associated with GenAI and LLMs and implement organizational governance and security controls to mitigate these risks. The publication lays out a practical six-step approach for organizations to develop their LLM strategy, as depicted below:

There are also various LLM deployment types, each with its own unique considerations, as captured below. They range from public API access and licensed models all the way to custom models:

With those key considerations out of the way, let's walk through the various checklist areas identified and some key takeaways from each of them. 

Adversarial Risk

This area involves both competitors and attackers and is focused on not just the attack landscape but the business landscape. This includes understanding how competitors are using AI to drive business outcomes, as well as updating internal processes and policies such as Incident Response Plans (IRPs) to account for GenAI attacks and incidents.

In fact, NIST has an entire project dedicated to Adversarial Machine Learning and has published a comprehensive draft document titled “A Taxonomy of Adversarial Machine Learning”, which provides a robust set of potential techniques used to compromise ML systems and environments.

Threat Modeling

Threat Modeling is a security technique that continues to gain traction in the broader push for Secure-by-Design systems being advocated for by CISA and others. This involves thinking through how attackers can use LLMs and GenAI to accelerate exploitation, assessing the business's ability to detect malicious LLM use, and examining whether the organization can safeguard connections to LLM and GenAI platforms from internal systems and environments.

AI Asset Inventory

The old adage “you can’t protect what you don’t know you have” applies in the world of GenAI and LLMs as well. This area of the checklist involves having an AI asset inventory for both internally developed solutions and external tools and platforms. It means understanding not just the tools and services being used by the organization but “ownership” too, in terms of who will be accountable for their use.

There are also recommendations to include AI components in SBOMs and to catalog AI data sources and their respective sensitivity. In addition to having an inventory of existing tools in use, there should also be a process to securely onboard and offboard future tools and services from the organizational inventory.
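To make the inventory idea concrete, here is a minimal sketch in Python. It is an illustration only and not something the checklist prescribes: the class names, the fields (kind, owner, data_sensitivity) and the onboard/offboard methods are all my own assumptions about what such a record might hold.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class AIAsset:
    """One entry in a hypothetical AI asset inventory."""
    name: str
    kind: str              # e.g. "internal-model", "external-service", "dataset"
    owner: str             # person or team accountable for the asset's use
    data_sensitivity: str  # e.g. "public", "internal", "confidential"
    onboarded: date
    offboarded: Optional[date] = None


class AIAssetInventory:
    """Tracks AI assets from onboarding through offboarding."""

    def __init__(self) -> None:
        self._assets: Dict[str, AIAsset] = {}

    def onboard(self, asset: AIAsset) -> None:
        # Refuse entries with no accountable owner.
        if not asset.owner:
            raise ValueError(f"{asset.name}: every asset needs an accountable owner")
        self._assets[asset.name] = asset

    def offboard(self, name: str, when: date) -> None:
        # Keep the record for audit purposes, but mark it as retired.
        self._assets[name].offboarded = when

    def active(self) -> List[AIAsset]:
        return [a for a in self._assets.values() if a.offboarded is None]
```

Rejecting entries without an owner is one simple way to enforce the “ownership” point above at the moment a tool is onboarded, rather than discovering the gap later.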

One of the leading SBOM formats is CycloneDX by OWASP, which added support for “ML-BOMs” in its 1.5 release in 2023.
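For a sense of what an ML-BOM might look like, the sketch below builds a very small CycloneDX-style document as a Python dictionary. The field names and the "machine-learning-model" and "data" component types reflect my reading of the CycloneDX 1.5 JSON format, and the model and dataset names are made up, so treat the output as illustrative and validate anything real against the official schema.

```python
import json


def build_ml_bom(model_name: str, model_version: str, dataset_name: str) -> dict:
    """Build a minimal CycloneDX-style ML-BOM as a plain dictionary."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": [
            # The model itself, listed as a component of the system.
            {
                "type": "machine-learning-model",
                "name": model_name,
                "version": model_version,
            },
            # A dataset the model depends on.
            {"type": "data", "name": dataset_name},
        ],
    }


# Hypothetical model and dataset names, for illustration only.
bom = build_ml_bom("sentiment-classifier", "1.2.0", "customer-reviews-2023")
print(json.dumps(bom, indent=2))
```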

AI Security and Privacy Training

It’s often quipped that “humans are the weakest link”. However, that doesn’t need to be the case if an organization properly integrates AI security and privacy training into its GenAI and LLM adoption journey. This involves helping staff understand existing GenAI/LLM initiatives, as well as the broader technology, how it functions, and key security considerations, such as data leakage.

Additionally, it is key to establish a culture of trust and transparency so staff feel comfortable sharing which GenAI and LLM tools and services are being used, and how. This trust and transparency across the organization will be a key part of avoiding “shadow AI” usage. Otherwise people will continue to use these platforms and simply not bring them to the attention of IT and Security teams for fear of consequences or punishment.

Establish Business Cases

This one may be surprising, but much like with cloud before it, most organizations don’t actually establish coherent strategic business cases for using new innovative technologies, including GenAI and LLM. It is easy to get caught in the hype and feel you need to join the race or get left behind. But without a sound business case, the organization risks poor outcomes, increased risks and opaque goals. 

Governance

Without governance, accountability and clear objectives are nearly impossible. This area of the checklist involves establishing an AI RACI chart for the organization’s AI efforts, documenting and assigning who will be responsible for risks and governance, and establishing organization-wide AI policies and processes.
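A RACI chart assigns each activity roles of Responsible, Accountable, Consulted and Informed. As a small illustration, the sketch below represents a chart as a dictionary and checks two common RACI conventions: exactly one Accountable party per activity and at least one Responsible party. The function name and the rules enforced are my own assumptions, not requirements taken from the checklist.

```python
from typing import Dict, List

RACI_ROLES = {"R", "A", "C", "I"}


def validate_raci(chart: Dict[str, Dict[str, str]]) -> List[str]:
    """Return a list of problems found; an empty list means the chart passed."""
    problems = []
    for activity, assignments in chart.items():
        # Flag any role letter that is not part of RACI.
        unknown = set(assignments.values()) - RACI_ROLES
        if unknown:
            problems.append(f"{activity}: unknown roles {sorted(unknown)}")
        # Convention: exactly one party is accountable for each activity.
        accountable = [who for who, role in assignments.items() if role == "A"]
        if len(accountable) != 1:
            problems.append(
                f"{activity}: expected exactly one Accountable, found {len(accountable)}"
            )
        # Convention: someone has to actually do the work.
        if "R" not in assignments.values():
            problems.append(f"{activity}: no Responsible party assigned")
    return problems
```

Running a check like this whenever the chart changes is one lightweight way to catch activities that have quietly ended up with no accountable owner.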

Legal

While obviously requiring input from legal experts beyond the cyber-domain, the legal implications of AI aren’t to be underestimated. They are quickly evolving and could impact the organization financially and reputationally.

This area involves an extensive list of activities, such as product warranties involving AI, AI EULAs, ownership rights for code developed with AI tools, IP risks and contract indemnification provisions, just to name a few. To put it succinctly, be sure to engage your legal team or experts to determine the various legal-focused activities the organization should be undertaking as part of its adoption and use of GenAI and LLMs.

Regulatory

To build on the legal discussion, regulations are also rapidly evolving, such as the EU’s AI Act, with others undoubtedly soon to follow. Organizations should determine their country, state and other government-specific AI compliance requirements, address consent around the use of AI for specific purposes such as employee monitoring, and clearly understand how their AI vendors store, delete and regulate the use of data.

For those interested, I cover the EU AI Act in an article with Acceleration Economy titled “Decoding the EU AI Act: A Look at the Global Implications for AI Security and Compliance”.

Using or Implementing LLM Solutions

Using LLM solutions requires specific risk considerations and controls. The checklist calls out items such as access control, training pipeline security, mapping data workflows and understanding existing or potential vulnerabilities in LLMs and their supply chains. Additionally, organizations should request third-party audits, penetration testing and even code reviews for suppliers, both initially and on an ongoing basis.

Testing, Evaluation, Verification and Validation (TEVV)

The TEVV process is specifically recommended by NIST in its AI Risk Management Framework (AI RMF). It involves establishing continuous testing, evaluation, verification and validation throughout AI model lifecycles, as well as providing executive metrics on AI model functionality, security and reliability.
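One way to picture continuous evaluation is a small regression suite that runs against the model on every change and reports a single pass rate that can feed a metrics dashboard. The sketch below is a deliberately simple illustration. The evaluate and gate functions, the substring check and the 0.9 threshold are my own assumptions and are not drawn from NIST or the checklist.

```python
from typing import Callable, List, Tuple


def evaluate(model: Callable[[str], str], cases: List[Tuple[str, str]]) -> float:
    """Return the fraction of cases whose output contains the expected text."""
    if not cases:
        raise ValueError("at least one test case is required")
    passed = sum(
        1 for prompt, expected in cases if expected.lower() in model(prompt).lower()
    )
    return passed / len(cases)


def gate(pass_rate: float, threshold: float = 0.9) -> bool:
    """Decide whether a model version may proceed, given its pass rate."""
    return pass_rate >= threshold
```

In practice the model argument would wrap a real inference call, and the cases would cover functionality, security and reliability checks rather than simple substring matches.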

Model Cards and Risk Cards

To ethically deploy LLMs, the checklist calls for the use of model and risk cards, which help users understand and trust AI systems and openly address potentially negative consequences such as bias and privacy concerns. These cards can include items such as model details, architecture, training data methodologies and performance metrics. There is also emphasis on accounting for responsible AI considerations and concerns around fairness and transparency.
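As a rough sketch of what a model card could capture, the Python dataclass below holds the items mentioned above plus a list of known risks, and renders them as plain text. The field names and layout are my own assumptions. Real model card templates are considerably richer.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ModelCard:
    """A pared-down, hypothetical model card."""
    name: str
    architecture: str
    training_data: str
    metrics: Dict[str, float] = field(default_factory=dict)
    known_risks: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the card as human-readable plain text."""
        lines = [
            f"Model: {self.name}",
            f"Architecture: {self.architecture}",
            f"Training data: {self.training_data}",
            "Performance metrics:",
        ]
        lines += [f"  {key}: {value}" for key, value in sorted(self.metrics.items())]
        lines.append("Known risks and limitations:")
        lines += [f"  - {risk}" for risk in self.known_risks]
        return "\n".join(lines)
```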

RAG: LLM Optimizations

Retrieval-Augmented Generation (RAG) is a way to optimize the capabilities of LLMs when it comes to retrieving relevant data from specific sources. It sits alongside other optimization approaches, such as fine-tuning pre-trained models or re-training existing models on new data to improve performance. The checklist recommends implementing RAG to maximize the value and effectiveness of LLMs for organizational purposes.
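To show the basic shape of RAG, here is a toy sketch that retrieves the most relevant document for a question and places it in the prompt. It uses simple word-count cosine similarity in place of the embedding models and vector stores that real systems typically rely on, and the function names and prompt wording are my own assumptions.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: List[str], k: int = 1) -> List[str]:
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True
    )
    return ranked[:k]


def build_prompt(query: str, documents: List[str], k: int = 1) -> str:
    """Augment the question with retrieved context before it goes to an LLM."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The point of the pattern is that the model answers from organizational data supplied at query time, so that data can be updated without re-training the model.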

AI Red Teaming 

Lastly, the checklist calls out the use of AI red teaming, which is emulating adversarial attacks on AI systems to identify vulnerabilities and validate existing controls and defenses. It does emphasize that red teaming alone isn’t a complete solution for securing GenAI and LLMs but should be part of a comprehensive approach to secure GenAI and LLM adoption.

That said, it is worth noting that organizations need to clearly understand the requirements for, and their ability to, red team the services and systems of external GenAI and LLM vendors, to avoid violating policies or even finding themselves in legal trouble.

AI Red Teaming and Penetration Testing are also called for in other sources, such as those by NIST discussed above, as well as in the EU AI guidance.
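At its simplest, an automated slice of red teaming can look like the sketch below: a list of adversarial prompts is sent to the system under test, and any response that leaks a planted marker is recorded as a finding. Everything here is hypothetical, including the canary value, the deliberately vulnerable stub_model and the attack strings. A real exercise would target an actual system, with the vendor permissions noted above.

```python
from typing import Callable, Dict, List

CANARY = "CANARY-7f3a"  # hypothetical marker planted in the hidden system prompt


def stub_model(prompt: str) -> str:
    """Stand-in for a real LLM call, deliberately vulnerable to one attack."""
    if "ignore previous instructions" in prompt.lower():
        return f"My hidden instructions contain {CANARY}"
    return "I can't help with that."


def red_team(model: Callable[[str], str], attacks: List[str]) -> List[Dict[str, str]]:
    """Run each attack prompt and collect the ones that leak the canary."""
    findings = []
    for attack in attacks:
        output = model(attack)
        if CANARY in output:
            findings.append({"attack": attack, "output": output})
    return findings
```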

Conclusion

While not exhaustive of all potential GenAI and LLM threats and risk considerations, the OWASP LLM AI Cybersecurity & Governance Checklist is a concise and quick resource for organizations and security leaders. It can aid practitioners in identifying key threats and ensuring the organization has fundamental security controls in place to help secure and enable the business as it matures in its approach to adopting GenAI and LLM tools, services and products.