Chat With YouTube Videos Using LlamaIndex

In this article, we will use LlamaIndex to chat with YouTube videos.

LlamaIndex is a data framework for Large Language Model (LLM)-based applications that lets users ingest data from various sources such as APIs, databases, PDFs, and more via flexible data connectors.

This data is indexed into intermediate representations, which are numerical vectors or embeddings that capture its semantic meaning.

(Image source: DeepLearning.AI)

LlamaIndex implements Retrieval-Augmented Generation (RAG): it combines large language models with a vector index built during the indexing stage, creating a searchable knowledge base specific to the user’s domain.

LlamaIndex provides tools for both beginners and advanced users: its high-level API lets beginners ingest and query their data in just a few lines of code, while lower-level APIs allow advanced users to customize individual modules.

Introduction to LlamaIndex

  • LlamaIndex is a data framework designed for LLM-based applications.
  • It enables the ingestion, structuring, and accessing of private or domain-specific data.
  • The framework is available in Python and TypeScript.

Purpose and Benefits of LlamaIndex

  • LlamaIndex addresses the limitation of LLMs not being trained on specific, private data.
  • It connects to various data sources and integrates this data with existing LLM datasets.
  • The concept of Retrieval-Augmented Generation (RAG) is utilized for querying and generating insights from data.

Features of LlamaIndex

  • Data connectors for ingesting data from diverse sources like APIs, PDFs, and SQL databases.
  • Data indexes for structuring data in formats easily consumed by LLMs.
  • Query engines for knowledge-augmented output and chat engines for conversational interactions.
  • Data agents, which are LLM-powered tools for various functions and integrations.
  • Application integrations for incorporating LlamaIndex into broader ecosystems.

Now let’s get started with our project.

Overview

This is a simple overview of what we are going to do.

  1. First, we load/fetch the data (YouTube video transcripts) using LangChain’s YoutubeLoader.
  2. Next, we save the data to a file (some cleaning and formatting can also be done at this stage).
  3. Loading Documents: First, you need to load the documents you want to interact with. This can be done using the SimpleDirectoryReader class in LlamaIndex, where you provide the path to the folder containing your documents. This class automatically determines the appropriate loader for different types of documents (e.g., text, Word documents, PDFs).
  4. Creating a Vector Store: Next, you divide your documents into smaller chunks and compute embeddings for each chunk. These embeddings are numerical representations of the text in each chunk. The VectorStoreIndex class in LlamaIndex is used for this purpose; it stores both the embeddings and the chunks.
  5. Setting Up a Query Engine: To interact with your documents, you set up a query engine using the embeddings and chunks stored in the vector store. In LlamaIndex, this is done by calling the as_query_engine method on the index. If you want the chatbot to remember previous interactions, you can use as_chat_engine instead (a sketch appears later in this article).
  6. Querying and Generating Responses: To chat with the documents, you pass a question to the query engine, which computes an embedding for the question and performs a semantic search over your knowledge base. The search returns relevant chunks from the documents, which are then used as context for a Large Language Model (LLM) such as GPT-3.5 or Google’s PaLM. The LLM generates a response based on the question and the provided context.
In summary, the process involves loading and preparing your documents, creating a vector store for embeddings and chunks, setting up a query engine, and using an LLM to generate responses based on the semantic search results from your document chunks.

Setting Up the Environment

Before diving into LlamaIndex, it’s crucial to set up the environment correctly. This involves installing the necessary Python packages:

!pip install langchain
!pip install pytube
!pip install youtube-transcript-api

These commands install LangChain for loading documents, pytube (used by the loader to fetch video metadata), and youtube-transcript-api for accessing video transcripts.

Loading YouTube Video Data

The next step involves loading YouTube video data. We start by using the YoutubeLoader from the LangChain library, which allows for the extraction of content from YouTube URLs:

from langchain.document_loaders import YoutubeLoader

# Load the transcript (and, with add_video_info=True, the video metadata)
loader = YoutubeLoader.from_youtube_url("https://www.youtube.com/watch?v=LrtHhKhUVfo", add_video_info=True)
docs = loader.load()

# Save the loaded documents to a text file
with open('docs.txt', 'w') as file:
    for doc in docs:
        file.write(str(doc) + '\n')

This code retrieves the transcript of the specified YouTube video and saves it into a text file, docs.txt.
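The overview mentioned that some cleaning and formatting can be done at this stage. As one option (a minimal sketch, not required), you could write only the transcript text instead of the full Document representation, which also serializes metadata:

# Optional cleanup: keep only the caption text, dropping the metadata wrapper
with open('docs.txt', 'w') as file:
    for doc in docs:
        file.write(doc.page_content.strip() + '\n\n')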

I am using this video from the In-Depth Story YouTube channel as the data source.

Integrating LlamaIndex

Now comes the core part of our process, integrating LlamaIndex:

!pip install llama_index

from llama_index import SimpleDirectoryReader

documents = SimpleDirectoryReader(input_files=["./docs.txt"]).load_data()

This code snippet installs the LlamaIndex package and uses the SimpleDirectoryReader to load the YouTube video data we previously saved.
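As a quick optional sanity check, you can inspect what was loaded:

# Optional: verify the transcript was loaded correctly
print(f"Loaded {len(documents)} document(s)")
print(documents[0].text[:200])  # preview the first 200 characters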

Processing and Indexing the Data

Once the data is loaded, it’s time to process and index it for querying:

import os

from llama_index import Document, VectorStoreIndex, ServiceContext
from llama_index.llms import OpenAI

# Combining the loaded documents into a single Document object
document = Document(text="\n\n".join([doc.text for doc in documents]))

# Setting up the service context and index
os.environ['OPENAI_API_KEY'] = "your-openai-api-key"
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local:BAAI/bge-small-en-v1.5")
index = VectorStoreIndex.from_documents([document], service_context=service_context)

# Creating a query engine
query_engine = index.as_query_engine()

This code combines our loaded data into a single Document object, sets up a service context with OpenAI’s GPT-3.5 model as the LLM, and builds an index ready for querying.

Here we are using the local:BAAI/bge-small-en-v1.5 model to create embeddings instead of the default OpenAI embeddings. Because this model runs locally, it saves us the embedding cost.

You can find more about embeddings in the LlamaIndex documentation. It is also possible to use an open-source LLM, but that requires more memory to run; you can read more in the LlamaIndex documentation on LLMs.
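As noted in the overview, a chat engine can be used instead of a query engine when you want the conversation to remember previous turns. Here is a minimal sketch, assuming the index built above (the questions are just illustrative):

# A chat engine keeps conversation history, unlike a one-shot query engine
chat_engine = index.as_chat_engine()
print(str(chat_engine.chat("Who was Mahendra?")))

# The follow-up question can rely on the earlier turn for context
print(str(chat_engine.chat("What industries did he establish?")))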

Interacting with the Data

Finally, we can query the indexed data to interact with the YouTube video content:

response = query_engine.query("What did Mahendra do?")
print(str(response))

This query-and-response mechanism enables users to ask questions directly about the YouTube video’s content and receive AI-generated answers grounded in the indexed data.

Mahendra played a significant role in establishing Nepal as a member country of the United Nations. He also built strong relations with powerful nations such as the United States, Russia, Japan, and Britain. During his tenure, he established various industries, including the Janakpur Churot Factory, Himal Cement Factory, and Nepal Oil Corporation. Additionally, he implemented reforms in the education sector, including the establishment of Tribhuvan University and several engineering campuses. Mahendra also initiated the construction of important infrastructure projects such as the Mahendra Highway, Araniko Highway, and Prithivi Highway. However, it is worth noting that some of his actions, such as dissolving the multiparty democracy system and propagandizing radios, were controversial and faced criticism.
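If you are curious which transcript chunks were retrieved as context for this answer, the response object exposes them. A small optional sketch:

# Optional: inspect the retrieved chunks and their similarity scores
for source in response.source_nodes:
    print(source.score)                   # similarity score of the chunk
    print(source.node.get_text()[:150])   # preview of the chunk text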

Extracting YouTube Channel Data

Additionally, the notebook provides a way to extract data from an entire YouTube channel:

import requests
import xml.etree.ElementTree as ET

# Fetching YouTube channel data
URL = "https://www.youtube.com/feeds/videos.xml?channel_id=UCBJycsmduvYEL83R_U4JriQ"
response = requests.get(URL)
xml_data = response.content

# Parsing the XML data
root = ET.fromstring(xml_data)
namespaces = {"atom": "http://www.w3.org/2005/Atom", "media": "http://search.yahoo.com/mrss/"}
youtube_links = [link.get("href") for link in root.findall(".//atom:link[@rel='alternate']", namespaces)][1:]

This snippet fetches and parses the XML feed of a YouTube channel, extracting the links to all of its videos. I adapted this link-parsing code from this Medium article.
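The same Atom feed also carries each video’s title. As a small optional extension (using the namespaces defined above), you could pair titles with links:

# Optional: pair each video's title with its link from the same feed
entries = root.findall(".//atom:entry", namespaces)
for entry in entries[:5]:
    title = entry.find("atom:title", namespaces).text
    link = entry.find("atom:link[@rel='alternate']", namespaces).get("href")
    print(title, "->", link)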

from langchain.document_loaders import YoutubeLoader

all_docs = []
for link in youtube_links:
    # Retrieve captions
    loader = YoutubeLoader.from_youtube_url(link)
    docs = loader.load()
    all_docs.extend(docs)

Here I used MKBHD’s YouTube videos as the data source and chatted with them.
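The indexing steps are the same as before. A minimal sketch for building a channel-wide index, assuming all_docs from the loop above and the service_context defined earlier (the question is just an example):

from llama_index import Document, VectorStoreIndex

# Combine every video's transcript into a single LlamaIndex Document
channel_document = Document(text="\n\n".join(doc.page_content for doc in all_docs))

# Reuse the earlier service context to build and query the channel index
channel_index = VectorStoreIndex.from_documents([channel_document], service_context=service_context)
channel_engine = channel_index.as_query_engine()
print(channel_engine.query("Which phone did MKBHD review most recently?"))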


Here’s my Google Colab file: Google Colab File

Conclusion

In the same way, we can use LlamaIndex and LLMs to chat with any of our private data.