Building Flask APIs for Retrieval Augmented Generation (RAG), LangChain, and GPT-4

by Tobias Abdon

In this article you’ll learn how to build a Retrieval Augmented Generation (RAG) app using Flask and LangChain.

You’ll learn how to do this using Python and Flask. There will be two API endpoints: one for uploading a source file, which will be processed and saved in a Pinecone vector DB, and a second for performing chat operations. Users of the API will be able to upload a doc, have it processed into embeddings and stored in a vector DB, and then start chatting with it.

Let’s get started.

Get Your OpenAI Key

We will use OpenAI’s Embeddings API and GPT-4 API, which requires an OpenAI API key. If you don’t have an account, create one on the OpenAI platform, then open the API keys page to generate a key.

I recommend creating a new API key for this project, and then deleting it when you are done.


Copy the key somewhere safe for later use.

Note: in a production environment, it’s best practice to load keys from environment variables rather than storing them in code as we do in this tutorial.

Configure Pinecone DB

We will store the text embeddings in a Pinecone vector database. Go to the Pinecone website and create a free account. If prompted, make sure to select Python as your programming language.

When prompted, click Create Index.

On the Create a new Index screen, fill in these details:

  • Name: embed-project
  • Dimensions: 1536
  • Metric: cosine
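The 1536 dimension matches the output size of OpenAI’s text-embedding-ada-002, the model that LangChain’s OpenAIEmbeddings uses by default. The cosine metric scores two embedding vectors by the angle between them. A hand-rolled sketch in pure Python, just to build intuition (Pinecone computes this for you at scale):

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (||a|| * ||b||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same way score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # 0.0
```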


Then, click on the API Keys section in the left nav and copy your API key and environment values for later use.

Setting up a Flask Project

At a terminal, follow these steps in a working directory for this project.

This project requires that you have Python 3.10 or higher. If you don’t have that version, please go install it now.

# setup a virtual environment
$ python3.10 -m venv venv

# activate the virtual environment
$ source venv/bin/activate  # On Windows use `venv\Scripts\activate`

# install requirements
$ pip install Flask openai pinecone-client langchain pypdf tiktoken

# create the app file (`flask run` looks for app.py by default)
$ touch app.py

Define Constants

There are several constants, such as the OpenAI and Pinecone keys, that we’ll need throughout the project. We’ll set up a Config class to hold them.

Open app.py for editing and add the code below.

class Config:
    OPENAI_KEY = 'your_openai_key'
    PINECONE_API_KEY = 'your_pinecone_api_key'
    PINECONE_API_ENV = 'your_pinecone_environment'
    UPLOAD_FOLDER = 'media'
    ALLOWED_EXTENSIONS = {'pdf'}  # Allow only PDFs


Be sure to manage secret keys (like API keys) securely, preferably through environment variables, especially in a production environment. Error handling and data validation should also be robust, providing helpful error messages while ensuring data integrity and security.
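Following that advice, here is a variant of the Config class that reads the keys from environment variables instead of hard-coding them (a sketch; the environment variable names are my own choice):

```python
import os

class Config:
    # Fall back to an empty string so a missing variable is easy to spot
    OPENAI_KEY = os.environ.get('OPENAI_KEY', '')
    PINECONE_API_KEY = os.environ.get('PINECONE_API_KEY', '')
    PINECONE_API_ENV = os.environ.get('PINECONE_API_ENV', '')
    UPLOAD_FOLDER = 'media'
    ALLOWED_EXTENSIONS = {'pdf'}  # Allow only PDFs
```

With this version you would run `export OPENAI_KEY=...` (and the two Pinecone variables) in your shell before starting Flask.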

Create Endpoints

Now it’s time to build our APIs. To recap, we’re going to build two APIs as follows:

  • /embeddings - This API will be able to ingest PDF documents and process them. That process involves chunking the file, creating embeddings, and uploading to Pinecone.
  • /chat - This API can accept incoming chat requests, create embeddings of it, query Pinecone for similarity matches, and then send to GPT-4 for reply generation.

Let’s go ahead and build these now. Open app.py for editing and create the two endpoints.

Endpoint 1: /embeddings

import os
import json
from flask import Flask, request, jsonify
from werkzeug.utils import secure_filename

import pinecone
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Pinecone
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

app = Flask(__name__)
app.config.from_object(Config)  # the Config class defined earlier

# make sure the upload folder exists
os.makedirs(app.config['UPLOAD_FOLDER'], exist_ok=True)

# initialize the Pinecone client
pinecone.init(
    api_key=app.config['PINECONE_API_KEY'],
    environment=app.config['PINECONE_API_ENV'],
)

def allowed_file(filename):
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in app.config['ALLOWED_EXTENSIONS']

@app.route('/embeddings', methods=['POST'])
def create_embeddings():
    # Check if file exists and is allowed
    if 'file' not in request.files:
        return jsonify(message='No file part'), 400
    file = request.files['file']
    if file.filename == '':
        return jsonify(message='No selected file'), 400
    if file and allowed_file(file.filename):
        # save the upload into the upload folder
        filename = secure_filename(file.filename)
        file_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
        file.save(file_path)

        # load the file
        loader = PyPDFLoader(file_path)
        data = loader.load()

        # split into chunks
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=0)
        texts = text_splitter.split_documents(data)

        # set up the embeddings object
        openai_key = app.config['OPENAI_KEY']
        embeddings = OpenAIEmbeddings(openai_api_key=openai_key)

        # embed the chunks and upload them to our Pinecone index
        index_name = "embed-project"  # replace with your index name
        Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name=index_name)

        return jsonify(message='File uploaded')
    return jsonify(message='Allowed file types are ' + ', '.join(app.config['ALLOWED_EXTENSIONS'])), 400
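The splitter tries to break the document on natural boundaries while keeping each chunk under `chunk_size` characters. As a rough illustration of the underlying idea, here is naive fixed-size chunking with overlap (not LangChain’s actual algorithm, which is recursive and separator-aware):

```python
def chunk_text(text, chunk_size=2000, chunk_overlap=0):
    # Step forward by chunk_size - chunk_overlap so consecutive
    # chunks share chunk_overlap characters of context.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("a" * 5000, chunk_size=2000, chunk_overlap=0)
print([len(c) for c in chunks])  # [2000, 2000, 1000]
```

A non-zero `chunk_overlap` helps preserve context that would otherwise be cut mid-sentence at a chunk boundary.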

Endpoint 2: /chat

# ... (previous code)

@app.route('/chat', methods=['POST'])
def create_chat():
    payload = request.get_json()

    if 'message' not in payload:
        return jsonify(message='No message provided'), 400

    # set up the embeddings object and connect to the existing index
    openai_key = app.config['OPENAI_KEY']
    embeddings = OpenAIEmbeddings(openai_api_key=openai_key)
    index_name = "embed-project"  # replace with your index name

    docsearch = Pinecone.from_existing_index(index_name=index_name, embedding=embeddings)

    # set up the LLM and a question-answering chain
    llm = OpenAI(temperature=0, openai_api_key=openai_key)
    chain = load_qa_chain(llm, chain_type="stuff")

    # find the chunks most similar to the query, then generate a reply
    query = payload["message"]
    docs = docsearch.similarity_search(query)
    response = chain.run(input_documents=docs, question=query)

    return jsonify(message=response)

if __name__ == '__main__':
    app.run(debug=True)

Testing the APIs

Now we’re ready to test this solution. Complete the following steps to do so.

Open a terminal and navigate to your project.

# activate the virtual environment
$ source venv/bin/activate  # On Windows use `venv\Scripts\activate`

# run flask
$ flask run

Next, upload a PDF to the /embeddings endpoint using the following cURL command. To make it easy, you can copy the file to the project directory.

$ curl -X POST -F "file=@path_to_your_file/your_file.pdf" http://localhost:5000/embeddings

{"message":"File uploaded"}


  • -X POST: This specifies that you want to make a POST request.
  • -F "file=@path_to_your_file/your_file.pdf": This formulates a POST request with form data, where file is the name of the field and @path_to_your_file/your_file.pdf is the file you want to upload. Replace path_to_your_file/your_file.pdf with the actual path and name of the file you want to upload.
  • http://localhost:5000/embeddings: This is the URL to which you want to send the request. Make sure to replace localhost and 5000 with your actual server name and port if different.

Next, submit a prompt to chat with your doc using the following cURL command. Change the message to be relevant to the document you’re working with.

curl -X POST -H "Content-Type: application/json" -d '{"message":"Your message here"}' http://localhost:5000/chat


  • -X POST: Specifies that a POST request should be used.
  • -H "Content-Type: application/json": Sets the Content-Type header to application/json, indicating that you're sending JSON data.
  • -d '{"message":"Your message here"}': The -d flag sends the specified data in the POST request. Ensure that your JSON is correctly formatted and properly escaped if needed.
  • http://localhost:5000/chat: This is the URL of the /chat endpoint on your Flask app. Replace localhost and 5000 with your actual server name and port if different.
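For context on what happens after the request arrives: the "stuff" chain type used in /chat works by stuffing every retrieved chunk into a single prompt alongside the user’s question. A minimal sketch of that idea (the prompt wording here is my own, not LangChain’s actual template):

```python
def build_stuff_prompt(docs, question):
    # Concatenate all retrieved chunks, then append the user's question.
    context = "\n\n".join(docs)
    return (
        "Use the context below to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_stuff_prompt(["Chunk one.", "Chunk two."], "What does the doc say?")
print(prompt)
```

Because everything is packed into one prompt, "stuff" is the simplest chain type, but it only works when the retrieved chunks fit within the model’s context window.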