How to Chat with PDFs, Websites, and More using EmbedChain

by Tobias Abdon

“Chat with X” is all the rage this year. Chatting with a PDF or website is often a much better user experience than reading through the whole thing to find what you need.

There are a lot of different tools you can use to build a “chat with X” feature, and they vary widely in features and complexity.

One stands out as both easy to use and powerful: EmbedChain. This framework handles the complexity of chunking, embedding, and storing content from sources such as websites, docs, PDFs, and databases. It then makes it easy to chat with those sources via LLMs like GPT-4.
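To give you a sense of what that looks like, the core pattern is just a few lines of Python. This is a minimal sketch, assuming the packages from Step 1 below are installed and OPENAI_API_KEY is set in your environment (the URL is just a placeholder):

from embedchain import App

app = App()                                    # default configuration uses OpenAI, so OPENAI_API_KEY must be set
app.add("https://www.example.com")             # chunk, embed, and store the page
print(app.query("What is this page about?"))   # answer questions from the stored content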

Instead of building a web app that demonstrates EmbedChain, I thought it’d be fun to build a little Python CLI. By the end of this guide, you will have a functioning CLI tool that interacts with any given webpage and fetches answers to your questions.

Prerequisites

Before diving into the setup, ensure you have the following:

  • An API key from OpenAI. If you haven't obtained one yet, you can sign up on the OpenAI platform.

Step 1: Setting Up the Project

To begin, you'll need to set up a virtual environment and install the required packages. Let's break down the purpose of each command.

  1. Setting up a Virtual Environment: A virtual environment isolates your project's Python packages from the rest of your system, preventing version conflicts between projects.
  2. Activating the Virtual Environment: This step ensures that packages are installed within the environment rather than system-wide.
  3. Installing Required Packages: embedchain and langchain are the core packages for our application. The embedchain[dataloaders] extra installs the optional data loaders needed for additional source types such as PDFs, docx files, and YouTube videos.

To set up the project:

# Create a virtual environment
python3.10 -m venv venv

# Activate the virtual environment
# For Linux/Mac:
source venv/bin/activate
# For Windows:
venv\Scripts\activate

# Install the necessary packages
pip install embedchain langchain

# Upgrade embedchain with the dataloaders extension
pip install --upgrade "embedchain[dataloaders]"
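If you want to confirm the installation before writing any real code, a quick import check does the job. This is just an optional sanity check (the file name is arbitrary):

# verify_install.py — run with: python verify_install.py
from embedchain import App  # the import fails here if the packages didn't install correctly

print("embedchain imported successfully")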

Step 2: Create the main.py File

Next, you'll need to set up the main codebase for our CLI tool.

  1. In your working directory, create a file named main.py.
  2. Copy and paste the following code into main.py.
import os
from embedchain import App

# Fail fast if the key isn't set; embedchain reads it from the environment
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]

def main():
    # Ask for the URL to chat with
    url = input("Enter the URL to chat with (e.g., LinkedIn profile): ")

    app = App()
    app.add(url)

    # Engage in a conversation with the provided URL.
    while True:
        question = input("Ask your question (or type 'exit' to quit): ")
        if question.lower() == 'exit':
            break
        response = app.query(question)
        print(f"Response: {response}")

if __name__ == "__main__":
    main()

Code Summary:

  • The code starts by importing the necessary modules and reading your OpenAI API key from the environment; that line fails immediately if the key isn't set.
  • We define a main() function that serves as the core of our CLI tool.
    • We prompt the user to input a URL.
    • We initialize the App class from embedchain and add the provided URL to it. (add() can ingest more than web pages; see the sketch after this summary.)
    • We then enter a loop where the user can ask questions related to the URL, and our application fetches and prints the answers. The loop continues until the user types 'exit'.
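EmbedChain's add() isn't limited to web pages. With the dataloaders extra installed, it can ingest PDFs, YouTube videos, docx files, and more, either auto-detecting the source type or taking an explicit data_type hint. The snippet below is a sketch based on the EmbedChain documentation at the time of writing; the URLs are placeholders, and the exact set of supported data types depends on the version you installed:

from embedchain import App

app = App()

# A web page: the data type is auto-detected from the URL
app.add("https://www.example.com/blog/post")

# A PDF and a YouTube video, with explicit data_type hints
app.add("https://www.example.com/whitepaper.pdf", data_type="pdf_file")
app.add("https://www.youtube.com/watch?v=VIDEO_ID", data_type="youtube_video")

print(app.query("Summarize the whitepaper in two sentences."))

Swap the placeholders for real sources and the question loop from main.py works unchanged.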

Step 3: Test the Application Out

With our codebase set up, it's time to test the application. You'll need your OpenAI API key and a website URL.

# set your key as an environment variable
$ export OPENAI_API_KEY={your_key}

# start the CLI app
$ python main.py

# add a link and chat 
Enter the URL to chat with (e.g., LinkedIn profile): https://www.linkedin.com/in/johndoe/
Ask your question (or type 'exit' to quit): Where does John Doe currently work?
Response: John Doe currently works at TechCorp Inc.
Ask your question (or type 'exit' to quit): exit

And that's it! You now have a fully functional CLI tool powered by EmbedChain, enabling you to converse with any webpage. Happy chatting!