The third edition of our newsletter on Large Language Models is here.
Today, we look at
- an introduction to LLMs by Andrej Karpathy;
- two posts on practical aspects of using LLMs; and
- the regulation of AI by the EU.
Overview of GPT in 45 minutes
State of GPT (2023-05) – a great talk by Andrej Karpathy from OpenAI. It covers all the main topics around LLMs: their training, finetuning, prompting strategies, reflection, tools/plugins, and more.
Practical aspects
AI in Sequoia companies (2023-06)
- The post summarizes how 33 Sequoia companies (including Zoom, Midjourney, Airbnb, and Notion) use LLMs.
- 65% of them have an LLM-based application in production.
- 94% use a 3rd-party LLM through an API, and 15% train their own models. The authors expect these two approaches to converge, which can mean, for example, using one's own model for embeddings and a 3rd-party API for summarization.
- OpenAI’s GPT was the favored API, used by over 90% of the companies. Anthropic’s API was used by 15%, with usage growing over the last month.
- Nearly 90% of companies believe that retrieval mechanisms, such as vector databases, are a key part of the stack and address problems like data freshness and hallucinations. We fully agree with this.
- Companies want to customize LLMs. The easiest but still very effective approach is to retrieve the relevant information by other means (e.g., using embeddings) and then provide it to the LLM as context in the query; see the sketch after this list. Training a custom model is hard, but the authors expect to see it more often as the tools improve.
- It is still early. Output quality and data privacy are major roadblocks to full adoption. Fewer than 10% of the companies were looking at tools to monitor LLM output or A/B test prompts. Everybody the authors spoke with said AI is moving too quickly for them to have high confidence in the end-state stack.
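To make the context-injection approach from the list above concrete, here is a minimal sketch. It assumes OpenAI’s 2023-era Python SDK (openai<1.0, whose client interface has since changed) and the text-embedding-ada-002 and gpt-3.5-turbo models; in production, the in-memory similarity search would typically be replaced by a vector database.

```python
import numpy as np
import openai  # 2023-era SDK (openai<1.0); the client interface has since changed

def embed(text: str) -> np.ndarray:
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

def complete(prompt: str) -> str:
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp["choices"][0]["message"]["content"]

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def answer(question: str, documents: list[str], top_k: int = 3) -> str:
    # 1. Index: embed every document (a vector database would store these).
    doc_vectors = [embed(d) for d in documents]
    # 2. Retrieve: keep the documents most similar to the question.
    q_vec = embed(question)
    ranked = sorted(zip(documents, doc_vectors),
                    key=lambda pair: cosine(q_vec, pair[1]), reverse=True)
    context = "\n\n".join(doc for doc, _ in ranked[:top_k])
    # 3. Infer: pass the retrieved context to the LLM as part of the query.
    prompt = ("Answer the question using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    return complete(prompt)
```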
Emerging Architectures for LLM Applications (2023-06)
Matt Bornstein and Rajko Radovanovic from Andreessen Horowitz wrote a great post discussing the emerging common architecture of LLM-based systems.
- The common workflow consists of three steps: indexing of information via embeddings, retrieval of the most relevant information, and inference over it.
- This workflow is, for example, the backbone of both LangChain and LlamaIndex, and the Sequoia post above shows that companies already use it widely; a sketch of the workflow in LangChain follows this list.
- The authors discuss each step in more detail but also address aspects like operational tooling, hosting, etc.
- Agents are incredibly exciting, but they do not work yet. We agree with this.
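As an illustration of how such frameworks package the three steps, here is a rough sketch using LangChain’s mid-2023 API (the library changes quickly, so treat the exact imports and class names as a snapshot rather than a reference; the file name and question are made up). The vector store handles indexing and retrieval, and the chain wires the retrieved chunks into the LLM call.

```python
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# 1. Index: load documents, split them into chunks, and embed the chunks.
docs = TextLoader("company_docs.txt").load()
chunks = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)
store = FAISS.from_documents(chunks, OpenAIEmbeddings())

# 2 + 3. Retrieve & infer: the chain fetches the most relevant chunks
# and passes them to the LLM together with the question.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo"),
    retriever=store.as_retriever(),
)
print(qa.run("What does our refund policy say about late cancellations?"))
```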
AI regulation
The European Parliament passed its version of the EU AI Act (AIA) on June 14. We are not lawyers, so for a legal analysis you will need to look elsewhere, but from our perspective, the most important points of the bill can be summarized as follows.
Applications are classified into four levels with an increasing degree of regulation:
- Minimal risk (most AI systems in use: AI in video games, spam filters, … ): No special regulation.
- Limited risk (chatbots): Users must be aware they are interacting with AI.
- High risk (AI used in planes, cars, medical devices, toys, education, law enforcement, loan scoring, recommenders when used by platforms with 45M+ users, etc.): See below.
- Unacceptable risk (social scoring by governments, predictive policing, manipulation of children, etc.): Banned.
High-risk applications & foundation models:
- Foundation models, including LLMs such as GPT, are not automatically considered high-risk (according to Time magazine, this is the result of OpenAI’s lobbying), but in many respects, they come close.
- They will need to be assessed before being put on the market.
- Mitigation of risks must be part of their design and development process, and detailed documentation about this must be provided to customers.
- Quality management of the training data is required, and all copyrighted data used for training must be disclosed (though we could not find anything about compensating its authors).
- An interesting point is that all high-risk systems and foundation models need to be registered in an EU-wide, publicly accessible database. All use of these systems by governments must be recorded as well.
- In sum, there is a push for greater transparency and caution, rejecting the “let’s release it and see what happens” approach.
- Update: Stanford’s Center for Research on Foundation Models (CRFM) published a great article evaluating how well the most widely used LLMs comply with the draft AIA. GPT-4 got 25 points out of 48, PaLM 2 got 27, and Claude got 7.
The final version will result from negotiations between the Parliament, the Commission, and the Council (the member states), a trialogue in EU jargon. We can expect it to enter into force around the end of this year. Obviously, a lot can change before then, but as with GDPR before it, the act is already having an effect. And, also like GDPR, it is very likely to influence AI development around the world through the so-called Brussels effect.
For more info, see, for example, AP, CNN, OpenAI’s own AIA White Paper, and, obviously, the act itself.
Please subscribe and stay tuned for the next issue of Geneea’s AI Spotlight newsletter!