When ChatGPT reached the one million user mark within 5 days and took off faster than any other technology in history, the world started to pay attention to artificial intelligence and AI applications.
And so it continued apace. Since then, many different terms have been buzzing around: from ChatGPT and Nvidia H100 chips to Ollama, LangChain, and Explainable AI. But what exactly is meant by each of them?
That's exactly what you'll find in this article: a structured overview of the technology ecosystem around generative AI and LLMs.
Let's dive in!
Table of Contents
1 What makes generative AI work – at its core
2 Scaling AI: Infrastructure and Compute Power
3 The Social Layer of AI: Explainability, Fairness and Governance
4 Emerging Abilities: When AI Starts to Interact and Act
Final Thoughts
Where Can You Continue Learning?
1 What makes generative AI work – at its core
New terms and tools in the field of artificial intelligence seem to emerge almost daily. At the core of it all are the foundation models, frameworks and the infrastructure required to run generative AI in the first place.
Foundation Models
Do you know the Swiss Army Knife? Foundation models are like such a multifunctional knife: you can perform many different tasks with just one tool.
Foundation models are large AI models that have been pre-trained on huge amounts of data (text, code, images, etc.). What's special about these models is that they can not only solve a single task but can also be used flexibly for many different applications. They can write texts, correct code, generate images and even compose music. And they are the basis for many generative AI applications.
The following three aspects are key to understanding foundation models:
- Pre-trained
These models were trained on huge data sets. This means that the model has 'read' an enormous amount of text or other data. This phase is very costly and time-consuming.
- Multitask-capable
These foundation models can solve many tasks. If we look at GPT-4o, you can use it for everyday knowledge questions, text improvements and code generation.
- Transferable
Through fine-tuning or Retrieval Augmented Generation (RAG), we can adapt such foundation models to specific domains or specialise them for specific application areas. I've written about RAG and fine-tuning in detail in How to Make Your LLM More Accurate with RAG & Fine-Tuning. The core of it is that you have two options to make your LLM more accurate: with RAG, the model stays the same, but you improve the input by providing the model with additional sources. For example, the model can access past support tickets or legal texts during a query, while the model parameters and weights remain unchanged. With fine-tuning, you retrain the pre-trained model with additional sources, and the model stores this knowledge permanently.
To get a feel for the amount of data we're talking about, let's look at FineWeb. FineWeb is a massive dataset developed by Hugging Face to support the pre-training phase of LLMs. The dataset was created from 96 Common Crawl snapshots and comprises 15 trillion tokens, which take up about 44 terabytes of storage space.
Most foundation models are based on the Transformer architecture. In this article, I won't go into this in more detail, since it's about the high-level components around AI. The most important thing to understand is that these models can look at the entire context of a sentence at the same time, for example, rather than just reading word by word from left to right. The foundational paper introducing this architecture was Attention Is All You Need (2017).
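To make this a little more concrete, here is a minimal sketch of scaled dot-product self-attention, the core operation of the Transformer, in plain PyTorch. The dimensions are toy values chosen purely for illustration:

```python
import torch
import torch.nn.functional as F

# Toy input: 1 sequence of 4 tokens, each an 8-dimensional embedding.
x = torch.randn(1, 4, 8)

# In a real Transformer these are learned projection matrices.
W_q, W_k, W_v = (torch.randn(8, 8) for _ in range(3))

Q, K, V = x @ W_q, x @ W_k, x @ W_v

# Every token attends to every other token at once - this is why the
# model can see the whole context of a sentence simultaneously.
scores = Q @ K.transpose(-2, -1) / (K.shape[-1] ** 0.5)
weights = F.softmax(scores, dim=-1)   # attention weights, each row sums to 1
output = weights @ V                  # context-aware token representations
print(output.shape)                   # torch.Size([1, 4, 8])
```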
All major players in the AI field have released foundation models, each with different strengths, use cases, and licensing conditions (open-source or closed-source).
GPT-4 from OpenAI, Claude from Anthropic and Gemini from Google, for example, are powerful but closed models. This means that neither the model weights nor the training data are accessible to the public.
There are also high-performing open-source models from Meta, such as LLaMA 2 and LLaMA 3, as well as from Mistral and DeepSeek.
A great resource for comparing these models is the LLM Arena on Hugging Face. It provides an overview of various language models, ranks them and allows for direct comparisons of their performance.
Multimodal models
If we look at the GPT-3 model, it could only process pure text. Multimodal models now go one step further: they can process and generate not only text, but also images, audio and video. In other words, they can process and generate several types of data at the same time.
What does this mean in concrete terms?
Multimodal models process different types of input (e.g. an image and a question about it) and combine this information to provide more intelligent answers. For example, with the Gemini 1.5 model you can upload a photo with different ingredients and ask which ingredients you see on this plate.
How does this work technically?
Multimodal models understand not only language but also visual or auditory information. Like pure text models, multimodal models are usually based on the Transformer architecture. However, an important difference is that not only words are processed as 'tokens', but also images as so-called patches. These are small image sections that are converted into vectors and can then be processed by the model.
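As a rough illustration, here is how an image can be cut into patches and projected into token-like vectors. The sizes are arbitrary toy values, and real vision models (e.g. ViT-style encoders) add positional information and much more on top:

```python
import torch

# Toy "image": 3 colour channels, 32x32 pixels.
image = torch.randn(3, 32, 32)
patch_size = 8

# Cut the image into non-overlapping 8x8 patches ...
patches = image.unfold(1, patch_size, patch_size).unfold(2, patch_size, patch_size)
patches = patches.reshape(3, -1, patch_size, patch_size).permute(1, 0, 2, 3)

# ... and flatten each patch into one vector, just like a text token.
tokens = patches.reshape(patches.shape[0], -1)  # 16 patches, 192 values each

# A learned linear projection maps each patch vector into the model's
# embedding space, where it is processed like any other token.
projection = torch.nn.Linear(3 * patch_size * patch_size, 256)
patch_embeddings = projection(tokens)
print(patch_embeddings.shape)  # torch.Size([16, 256])
```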
Let's look at some examples:
- GPT-4 Vision
This model from OpenAI can process text and images. It recognises content in images and combines it with language.
- Gemini 1.5
Google's model can process text, images, audio and video. It is particularly strong at retaining context across modalities.
- Claude 3
Anthropic's model can process text and images and is very good at visual reasoning. It is good at recognising diagrams, graphics and handwriting.
Other examples are Flamingo from DeepMind, Kosmos-2 from Microsoft or Grok from Elon Musk's xAI, which is integrated into X (formerly Twitter).
GPU & Compute Providers
Training generative AI models requires massive computing capacity, especially for pre-training, but also for inference: the subsequent application of the model to new inputs.
Imagine a musician practising for months to prepare for a concert: that's what pre-training is like. During pre-training, a model such as GPT-4, Claude 3, LLaMA 3 or DeepSeek-VL learns from trillions of tokens that come from texts, code, images and other sources. These data volumes are processed with GPUs (Graphics Processing Units) or TPUs (Tensor Processing Units). This is necessary because this hardware enables parallel computing (compared to CPUs). Many companies rent computing power in the cloud (e.g. via AWS, Google Cloud, Azure) instead of operating their own servers.
When a pre-trained model is adapted to specific tasks with fine-tuning, this in turn requires a lot of computing power. This is one of the major differences compared to customising the model with RAG. One way to make fine-tuning more resource-efficient is low-rank adaptation (LoRA): here, small parts of the model are specifically retrained instead of retraining the entire model with new data.
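As a sketch of what this looks like in practice, here is a minimal LoRA setup with Hugging Face's peft library. The model name and hyperparameters are illustrative assumptions (and gated models require access approval), not a tested recipe:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative base model; any causal LM from the Hub works the same way.
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")

config = LoraConfig(
    r=8,                                  # rank of the small adapter matrices
    lora_alpha=16,                        # scaling factor for the adapter updates
    target_modules=["q_proj", "v_proj"],  # only the attention projections get adapters
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, config)
# Only the small adapter matrices are trainable - the base weights stay frozen.
model.print_trainable_parameters()
```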
If we stick with the music example, inference is the moment when the actual live concert takes place, and it has to be played again and again. This example also makes it clear that inference requires resources, too. Inference is the process of applying an AI model to a new input (e.g. when you ask ChatGPT a question) to generate an answer or a prediction.
Some examples:
Specialised hardware components that are optimised for parallel computing are used for this. For example, NVIDIA's A100 and H100 GPUs are standard in many data centres. AMD's Instinct MI300X, for example, is also catching up as a high-performance alternative. Google TPUs are also used for certain workloads, especially within the Google ecosystem.
ML Frameworks & Libraries
Just like in programming languages or web development, there are frameworks for AI tasks. For example, they provide ready-made functions for building neural networks without the need to program everything from scratch. Or they make training more efficient by parallelising calculations and making efficient use of GPUs.
The most important ML frameworks for generative AI:
- PyTorch was developed by Meta and is open source. It is very flexible and popular in research & open source.
- TensorFlow was developed by Google and is very powerful for large AI models. It supports distributed training and is often used in cloud environments.
- Keras is a part of TensorFlow and is mainly used by beginners and for prototype development.
- JAX is also from Google and was specially developed for high-performance AI computations. It is often used for advanced research and Google DeepMind projects. For example, it is used for the latest Google AI models such as Gemini and Flamingo.
PyTorch and TensorFlow can easily be combined with other tools such as Hugging Face Transformers or ONNX Runtime.
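To give a feel for what such a framework provides, here is a minimal PyTorch snippet: the layers, activation functions and automatic gradient computation all come ready-made, with toy dimensions chosen for illustration:

```python
import torch
import torch.nn as nn

# A tiny feed-forward network - no manual implementation of layers needed.
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 1),
)

x = torch.randn(4, 16)             # a batch of 4 example inputs
loss = model(x).pow(2).mean()      # dummy loss, just to demonstrate training
loss.backward()                    # gradients are computed automatically
print(model[0].weight.grad.shape)  # torch.Size([32, 16])
```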
AI Application Frameworks
These frameworks allow us to integrate foundation models into specific applications. They simplify access to the foundation models, the management of prompts and the efficient handling of AI-supported workflows.
Three tools, as examples:
- LangChain enables the orchestration of LLMs for applications such as chatbots, document processing and automated analyses. It supports access to APIs, databases and external storage. And it can be connected to vector databases, which I explain in the next section, to perform contextual queries.
Let's look at an example: a company wants to build an internal AI assistant that searches through documents. With LangChain, it can connect GPT-4 to the internal database, and users can search company documents using natural language.
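A minimal sketch of this pattern with LangChain could look as follows. The model name and the toy document are assumptions, an OpenAI API key is assumed to be set, and a real setup would retrieve the context from a vector database instead of hard-coding it:

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o-mini")

# In a real application this context would come from a retriever / vector store.
context = "Travel policy: employees may book economy class for flights under 6 hours."

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer only based on the provided company documents."),
    ("human", "Documents:\n{context}\n\nQuestion: {question}"),
])

chain = prompt | llm  # the filled prompt is piped into the model
answer = chain.invoke({"context": context, "question": "Can I fly business class to Paris?"})
print(answer.content)
```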
- LlamaIndex was specifically designed to make large amounts of unstructured data efficiently accessible to LLMs and is therefore important for Retrieval Augmented Generation (RAG). Since LLMs only have a limited knowledge base derived from their training data, RAG allows them to retrieve additional information before generating an answer. And this is where LlamaIndex comes into play: it can be used to convert unstructured data, e.g. from PDFs, websites or databases, into searchable indices.
Let's look at a concrete example:
A lawyer needs a legal AI assistant to search laws. LlamaIndex organises thousands of legal texts and can therefore provide precise answers quickly.
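A minimal sketch of that workflow with llama-index might look like this. The "legal_texts" folder is a placeholder, and the default setup assumes an OpenAI API key for embeddings and answering:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load the raw files (e.g. PDFs of statutes) and build a searchable vector index.
documents = SimpleDirectoryReader("./legal_texts").load_data()
index = VectorStoreIndex.from_documents(documents)

# A query retrieves the most relevant passages and passes them to the LLM.
query_engine = index.as_query_engine()
response = query_engine.query("What is the statutory notice period for a lease?")
print(response)
```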
- Ollama makes it possible to run large language models on your own laptop or server without having to rely on the cloud. No API access is required, as the models run directly on the machine.
For example, you can run a model such as Mistral, LLaMA 3 or DeepSeek locally on your machine.
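A minimal sketch with the official ollama Python client; it assumes the Ollama server is running locally and that the model has been downloaded once beforehand (e.g. via `ollama pull llama3`):

```python
import ollama  # talks to the local Ollama server, no cloud API involved

response = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Explain embeddings in one sentence."}],
)
print(response["message"]["content"])
```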
Databases & Vector Stores
In traditional data processing, relational databases (SQL databases) store structured data in tables, while NoSQL databases such as MongoDB or Cassandra are used to store unstructured or semi-structured data.
With LLMs, however, we now also need a way to store and search semantic information.
This requires vector databases: a foundation model doesn't process input as text, but converts it into numerical vectors, so-called embeddings. Vector databases make it possible to perform fast similarity search and memory management for embeddings and thus provide relevant contextual information.
How does this work, for example, with Retrieval Augmented Generation?
- Each text (e.g. a paragraph from a PDF) is translated into a vector.
- You pass a query to the model as a prompt. For example, you ask a question. This question is now also translated into a vector.
- The database then calculates which stored vectors are closest to the query vector.
- These top results are made available to the LLM before it answers, and the model then uses this additional information for its answer.
Examples of such vector stores are Pinecone, FAISS, Weaviate, Milvus, and Qdrant.
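To make the similarity step concrete, here is a toy sketch of the core idea in plain numpy. Real embeddings have hundreds or thousands of dimensions, and vector databases use specialised indices instead of brute-force comparison:

```python
import numpy as np

# Toy 4-dimensional "embeddings"; in reality these come from an embedding model.
documents = {
    "Support ticket about login errors": np.array([0.9, 0.1, 0.0, 0.2]),
    "Invoice for cloud services":        np.array([0.1, 0.8, 0.3, 0.0]),
    "Guide for resetting passwords":     np.array([0.7, 0.0, 0.2, 0.4]),
}
query = np.array([0.85, 0.05, 0.05, 0.25])  # embedding of "How do I fix a login problem?"

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank documents by closeness to the query; the top hits would be handed
# to the LLM as additional context before it answers.
ranked = sorted(documents, key=lambda d: cosine_similarity(documents[d], query), reverse=True)
print(ranked[0])  # -> "Support ticket about login errors"
```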
Programming Languages
Generative AI development also needs a programming language.
Of course, Python is probably the first choice for almost all AI applications. Python has established itself as the main language for AI & ML and is one of the most popular and widely used languages. It is versatile and offers a large AI ecosystem with all the previously mentioned frameworks such as TensorFlow, PyTorch, LangChain or LlamaIndex.
Why isn't Python used for everything?
Python isn't very fast. But thanks to CUDA backends, TensorFlow and PyTorch are still very performant. However, if performance is really critical, Rust, C++ or Go are more likely to be used.
Another language that must be mentioned is Rust: this language is used when it comes to fast, secure and memory-efficient AI infrastructure, for example for efficient vector-search databases or high-performance network communication. It is mainly used in the infrastructure and deployment area.
Julia is a language that is close to Python but much faster, which makes it ideal for numerical calculations and tensor operations.
TypeScript or JavaScript are not directly relevant for AI model development but are often used in the front end of LLM applications (e.g., with React or Next.js).

2 Scaling AI: Infrastructure and Compute Power
Apart from the core components, we also need ways to scale and train the models.
Containers & Orchestration
Not only traditional applications, but also AI applications need to be deployed and scaled. I wrote about containerisation in detail in the article Why Data Scientists Should Care about Containers – and Stand Out with This Knowledge. At its core, the point is that with containers, we can run an AI model (or any other application) on any server and it works the same. This allows us to provide consistent, portable and scalable AI workloads.
Docker is the standard for containerisation, and generative AI is no different. We can use it to develop AI applications as isolated, repeatable units. Docker is used to deploy LLMs in the cloud or on edge devices. Edge means that the AI doesn't run in the cloud, but locally on your device. The Docker images contain everything you need: Python, ML frameworks such as PyTorch, CUDA for GPUs and AI APIs.
Let's look at an example: a developer trains a model locally with PyTorch and packages it in a Docker container. This allows it to be easily deployed to AWS or Google Cloud.
Kubernetes is there to manage and scale container workloads. It can manage GPUs as resources, which makes it possible to run several models efficiently on a cluster and to scale automatically when demand is high.
Kubeflow is less well-known outside of the AI world. It allows ML models to be orchestrated as a workflow, from data processing to deployment. It is specifically designed for machine learning in production environments and supports automatic model training & hyperparameter tuning.
Chip manufacturers & AI hardware
The immense computing power that is required has to be produced, and this is done by chip manufacturers. Powerful hardware reduces training times and improves model inference.
There are now also some models that have been trained with fewer parameters or fewer resources for the same performance. When DeepSeek was released at the end of January, it was seriously questioned how many resources are actually necessary. It is becoming increasingly clear that huge models and extremely expensive hardware are not always necessary.
Probably the best-known chip manufacturer in the field of AI is Nvidia, one of the most valuable companies in the world. With its specialised A100 and H100 GPUs, the company has become the de facto standard for training and inference of large AI models. In addition to Nvidia, there are other important players such as AMD with its Instinct MI300X series, Google, Amazon and Cerebras.
API Providers for Foundation Models
Foundation models are pre-trained models. We use APIs so that we can access them as quickly as possible without having to host them ourselves. API providers offer quick access to the models, such as the OpenAI API, Hugging Face Inference Endpoints or the Google Gemini API. To do this, you send a text via an API and receive the response back. However, APIs such as the OpenAI API are subject to a fee.
The best-known provider is OpenAI, whose API provides access to GPT-3.5, GPT-4, DALL-E for image generation and Whisper for speech-to-text. Anthropic also offers a strong alternative with Claude 2 and 3. Google provides access to multimodal models such as Gemini 1.5 via the Gemini API.
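A typical call looks roughly like this, using OpenAI's official Python client; the model name is illustrative, and the API key is assumed to be set as an environment variable:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# You send text over the API and receive the model's response back.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarise what a foundation model is in one sentence."}],
)
print(response.choices[0].message.content)
```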
Hugging Face is a central hub for open-source models: the Inference Endpoints allow us to use models such as Mistral 7B, Mixtral or Meta models directly, for example.
Another exciting provider is Cohere, which offers Command R+, a model built specifically for Retrieval Augmented Generation (RAG), including powerful embedding APIs.
Serverless AI architectures
Serverless computing doesn't mean that there is no server, but that you don't need your own server. You only define what is to be executed, not how or where. The cloud environment then automatically starts an instance, executes the code and shuts the instance down again. AWS Lambda functions, for example, are well-known here.
Something similar is also available specifically for AI. Serverless AI reduces the administrative effort and scales automatically. This is ideal, for example, for AI tasks that are used irregularly.
Let's look at an example: a chatbot on a website that answers questions from customers doesn't have to run all the time. However, when a visitor comes to the website and asks a question, it must have resources. It is, therefore, only called up when needed.
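A handler for such a setup could look roughly like the following sketch (an AWS Lambda-style function calling a model through Amazon Bedrock's Converse API); the model ID and the event format are assumptions:

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def handler(event, context):
    # This function only runs when a request arrives - no permanently running server.
    question = json.loads(event["body"])["question"]
    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        messages=[{"role": "user", "content": [{"text": question}]}],
    )
    answer = response["output"]["message"]["content"][0]["text"]
    return {"statusCode": 200, "body": json.dumps({"answer": answer})}
```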
Serverless AI can save costs and reduce complexity. However, it isn't suitable for continuous, latency-critical tasks.
Examples: AWS Bedrock, Azure OpenAI Service, Google Cloud Vertex AI
3 The Social Layer of AI: Explainability, Fairness and Governance
With great power and capability comes responsibility. The more we integrate AI into our everyday applications, the more important it becomes to engage with the principles of Responsible AI.
So… generative AI raises many questions:
- Does the model explain how it arrives at its answers?
-> A question of transparency
- Are certain groups favoured?
-> A question of fairness
- How is it ensured that the model isn't misused?
-> A question of security
- Who is liable for errors?
-> A question of accountability
- Who controls how and where AI is used?
-> A question of governance
- Which data available on the web (e.g. images from artists) may be used?
-> A question of copyright and data ethics
While we have comprehensive regulations for many areas of the physical world, such as noise control, light pollution, vehicles, buildings, and alcohol sales, comparable regulatory efforts in the IT sector are still rare and often avoided.
I'm not making a generalisation or a value judgment about whether this is good or bad. Less regulation can accelerate innovation: new technologies reach the market faster. At the same time, there is a risk that important aspects such as ethical responsibility, bias detection or the energy consumption of large models will receive too little attention.
With the AI Act, the EU is focusing on a regulated approach that is intended to create clear framework conditions, but this, in turn, can reduce the speed of innovation. The USA tends to pursue a market-driven, liberal approach with voluntary guidelines. This promotes rapid development but often leaves ethical and social issues in the background.
Let's look at three core concepts:
Explainability
Many large LLMs such as GPT-4 or Claude 3 are considered so-called black boxes: they provide impressive answers, but we do not know exactly how they arrive at these results. The more we entrust to them, especially in sensitive areas such as education, medicine or justice, the more important it becomes to understand their decision-making processes.
Tools such as LIME, SHAP or attention maps are ways of mitigating this problem. They analyse model decisions and present them visually. In addition, model cards (standardised documentation) help to make the capabilities, training data, limitations and potential risks of a model transparent.
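As a small illustration of the underlying idea, here is SHAP applied to a classic scikit-learn classifier. Explaining a full LLM is much harder, but the principle of attributing a prediction to its inputs is the same:

```python
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# The explainer attributes each prediction to the individual input features.
explainer = shap.Explainer(model.predict, X.iloc[:100])  # model-agnostic explainer
shap_values = explainer(X.iloc[:5])
print(shap_values.values.shape)  # (5, 30): one contribution per feature and sample
```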
Fairness
If a model has been trained on data that contains biases or skewed representations, it will also inherit those biases and distortions. This can lead to certain population groups being systematically disadvantaged or stereotyped. There are methods for detecting bias, as well as clear standards for how training data should be selected and tested.
Governance
Finally, the question of governance arises: who actually decides how AI may be used? Who checks whether a model is being operated responsibly?
4 Emerging Abilities: When AI Starts to Interact and Act
This is about the new capabilities that go beyond the classic prompt-response model. AI is becoming more active, more dynamic and more autonomous.
Let's look at a concrete example:
A classic LLM like GPT-3 follows the typical process: you ask a question like 'Please show me how to create a button with rounded corners using HTML & CSS', and the model provides you with the appropriate code, together with a brief explanation. The model returns a pure text output without actively executing anything or thinking any further.

AI agents go much further. They not only analyse the prompt but also develop plans independently, access external tools or APIs and can complete tasks in several steps.
A simple example:
Instead of just writing the template for an e-mail, an agent can monitor a data source and independently send an e-mail as soon as a certain event occurs. For example, an e-mail could go out when a sales target has been exceeded.
AI agents
AI agents are an application logic built on top of foundation models. They orchestrate decisions and execute steps independently. Agents such as AutoGPT carry out multi-step tasks independently: they think in loops and try to improve and achieve a goal step by step.
Some examples:
- Your AI agent analyses new market reports daily, summarises them, stores them in a database, and notifies the user in case of deviations.
- An agent initiates a job application process: it scans submitted profiles and matches them with job offers.
- In an e-commerce shop, the agent monitors inventory levels and customer demand. If a product is running low, it automatically reorders it, including price comparisons between suppliers.
What typically makes up an AI agent?
An AI agent consists of several specialised components, making it possible to plan, execute, and learn tasks autonomously (a minimal loop sketch follows the component list below):
- Large Language Model
The LLM is the core, or thinking engine. Typical models include GPT-4, Claude 3, Gemini 1.5, or Mistral 7B.
- Planning unit
The planner transforms a higher-level goal into a concrete plan or sequence of steps, often based on techniques like Chain-of-Thought or ReAct.
- Tool access
This component enables the agent to use external tools, for example a browser for extended search, a Python environment for code execution, or access to APIs and databases.
- Memory
This component stores information about previous interactions, intermediate results, or contextual data. This is necessary so that the agent can act consistently across several steps.
- Executor
This component executes the planned steps in the correct order, monitors progress, and replans in case of errors.
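To tie these components together, here is the deliberately simplified agent loop mentioned above; `llm_plan` and the tool function are hypothetical stand-ins for a real LLM call and real tool integrations:

```python
def llm_plan(goal: str, memory: list):
    """Stand-in for an LLM call that returns the next step or 'done'."""
    return "done" if memory else ("search", "current EUR/USD exchange rate")

TOOLS = {"search": lambda query: f"(stub search result for: {query})"}

def run_agent(goal: str, max_steps: int = 5) -> list:
    memory = []  # memory: intermediate results shared across steps
    for _ in range(max_steps):
        step = llm_plan(goal, memory)          # planning unit decides the next action
        if step == "done":                     # executor stops when the goal is reached
            break
        tool_name, tool_input = step
        result = TOOLS[tool_name](tool_input)  # tool access performs the action
        memory.append((tool_name, tool_input, result))
    return memory

print(run_agent("Report the current EUR/USD exchange rate"))
```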
There are also tools like Make or n8n (low-code / no-code automation platforms) that let you implement "agent-like" logic. They execute workflows with conditions, triggers, and actions: for example, an automated reply is formulated when a new e-mail arrives in the inbox. And there are lots of templates for such use cases.

Reinforcement Learning
With reinforcement learning, models are made more "human-friendly". In this training method, the model learns through rewards. This is especially important for tasks where there is no clear "right" or "wrong", but rather gradual quality.
An example of this is when you use ChatGPT, receive two different responses and are asked to rate which one you prefer.
The reward can come either from human feedback (Reinforcement Learning from Human Feedback, RLHF) or from another model (Reinforcement Learning from AI Feedback, RLAIF). In RLHF, a human rates several responses from a model, allowing the LLM to learn what "good" responses look like and better align with human expectations. In RLAIF, a model provides the feedback instead; the reward doesn't have to be binary (e.g., good vs. bad) but can be differentiated and context-dependent (e.g., a variable reward scale from -1 to +3). This is especially useful where there are many possible "good" responses, but some match the user's intent much better.
On my Substack, I regularly write summaries about published articles in the fields of Tech, Python, Data Science, Machine Learning and AI. If you're interested, take a look or subscribe.
Final Thoughts
It would probably be possible to write an entire book about generative AI right now, not just a single article. Artificial intelligence has been researched and applied for many years. But we are currently at a moment in which an explosion of tools, applications, and frameworks is taking place: AI, and especially generative AI, has truly arrived in our everyday lives. Let's see where this takes us, and end with a quote from Alan Kay:
The best way to predict the future is to invent it.