Building Flask APIs for Retrieval Augmented Generation (RAG), LangChain, and GPT-4
by Tobias Abdon
In this article you’ll learn how to build a Retrieval Augmented Generation (RAG) app using Flask and LangChain.
The app will expose two API endpoints: one for uploading a source file, which gets processed and saved to a Pinecone vector DB, and a second for performing chat operations. Users of the API will be able to upload a doc, have it processed into embeddings and stored in the vector DB, and then start chatting with it.
Let’s get started.
Get Your OpenAI Key
We will use OpenAI’s Embeddings API and GPT-4 API, which require an OpenAI API key. If you don’t have an account, go create one at openai.com. Then go to https://platform.openai.com/account/api-keys to generate a key.
I recommend creating a new API key for this project, and then deleting it when you are done.
Copy the key somewhere safe for later use.
Note: in a production environment it’s best practice to load keys from environment variables, rather than storing them in code as we’ll do later in this tutorial for simplicity.
Configure Pinecone DB
We will store the text embeddings in a Pinecone vector database. Go to pinecone.io and create a free account. If prompted, select Python as your programming language.
From the dashboard, click Create Index.
On the Create a new Index screen, fill in these details:
- Name: embed-project
- Dimensions: 1536
- Metric: cosine
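If you’d rather script this step than click through the console, here is a minimal sketch using the pinecone-client library (this assumes the v2-style client that matches the pinecone.init calls used later in this article; fill in your own key and environment from the next step):

import pinecone

# Create the index with the same settings as above
pinecone.init(api_key='your_pinecone_api_key', environment='your_pinecone_environment')
pinecone.create_index('embed-project', dimension=1536, metric='cosine')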
Then, open the API Keys section in the left nav and note your API key and environment name; you’ll need both for the Flask app’s config.
Setting up a Flask Project
At a terminal, follow these steps in a working directory for this project.
This project requires that you have Python 3.10 or higher. If you don’t have that version, please go install it now.
# setup a virtual environment
$ python3.10 -m venv venv
# activate the virtual environment
$ source venv/bin/activate  # On Windows use `venv\Scripts\activate`
# install requirements
$ pip install Flask openai pinecone-client langchain pypdf tiktoken
# create the app and config files
$ touch app.py config.py
Define Constants
There are several constants, such as the OpenAI and Pinecone API keys, that we’ll need throughout our project. We’ll set up a Config class to hold them.
Open the config.py file for editing and add the code below.
class Config:
    OPENAI_KEY = 'your_openai_key'
    PINECONE_API_KEY = 'your_pinecone_api_key'
    PINECONE_API_ENV = 'your_pinecone_environment'
    UPLOAD_FOLDER = 'media'
    ALLOWED_EXTENSIONS = {'pdf'}  # Allow only PDFs
Note: manage secret keys (like API keys) securely, preferably through environment variables, especially in a production environment. Error handling and data validation should also be robust, providing helpful error messages and ensuring data integrity and security.
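For example, here is a minimal sketch of the same Config class reading its secrets from environment variables instead (the environment variable names are my own choice, not something Flask mandates):

import os

class Config:
    OPENAI_KEY = os.environ.get('OPENAI_KEY')
    PINECONE_API_KEY = os.environ.get('PINECONE_API_KEY')
    PINECONE_API_ENV = os.environ.get('PINECONE_API_ENV')
    UPLOAD_FOLDER = 'media'
    ALLOWED_EXTENSIONS = {'pdf'}  # Allow only PDFs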
Create Endpoints
Now it’s time to build our APIs. To recap, we’re going to build two endpoints as follows:
- /embeddings - Ingests PDF documents and processes them: the file is chunked, embeddings are created, and the results are uploaded to Pinecone.
- /chat - Accepts incoming chat requests, creates an embedding of the message, queries Pinecone for similarity matches, and sends the matches to GPT-4 for reply generation.
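Concretely, the traffic will look roughly like this (illustrative values; the chat reply depends on your document and the model):

POST /embeddings with multipart form data (field name: file) -> {"message": "File uploaded"}
POST /chat with JSON body {"message": "What is this document about?"} -> {"message": "..."}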
Let’s go ahead and build these now. Open the app.py file for editing and create the two endpoints.
Endpoint 1: /embeddings
import os
from flask import Flask, request, jsonify
from werkzeug.utils import secure_filename
import pinecone
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Pinecone
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

app = Flask(__name__)
app.config.from_object('config.Config')

# Make sure the upload folder exists before saving into it
os.makedirs(app.config['UPLOAD_FOLDER'], exist_ok=True)

def allowed_file(filename):
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in app.config['ALLOWED_EXTENSIONS']

@app.route('/embeddings', methods=['POST'])
def create_embeddings():
    # Check if file exists and is allowed
    if 'file' not in request.files:
        return jsonify(message='No file part'), 400
    file = request.files['file']
    if file.filename == '':
        return jsonify(message='No selected file'), 400
    if file and allowed_file(file.filename):
        filename = secure_filename(file.filename)
        file_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
        file.save(file_path)

        # load the file
        loader = PyPDFLoader(file_path)
        data = loader.load()

        # split into chunks
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=0)
        texts = text_splitter.split_documents(data)

        # set up the embeddings object
        openai_key = app.config['OPENAI_KEY']
        embeddings = OpenAIEmbeddings(openai_api_key=openai_key)

        # initialize and upload embeddings to Pinecone
        pinecone.init(
            api_key=app.config['PINECONE_API_KEY'],
            environment=app.config['PINECONE_API_ENV']
        )
        index_name = "embed-project"  # replace with your index name

        # upload to our pinecone index
        Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name=index_name)

        return jsonify(message='File uploaded')
    return jsonify(message='Allowed file types are ' + ', '.join(app.config['ALLOWED_EXTENSIONS'])), 400
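To sanity-check that the upload actually wrote vectors, you can query the index stats from a Python shell (again assuming the v2-style pinecone-client used above):

import pinecone

pinecone.init(api_key='your_pinecone_api_key', environment='your_pinecone_environment')
index = pinecone.Index('embed-project')
# total_vector_count should be greater than zero after a successful upload
print(index.describe_index_stats())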
Endpoint 2: /chat
# ... (previous code)

@app.route('/chat', methods=['POST'])
def create_chat():
    payload = request.get_json()
    if 'message' not in payload:
        return jsonify(message='No message provided'), 400

    openai_key = app.config['OPENAI_KEY']
    embeddings = OpenAIEmbeddings(openai_api_key=openai_key)

    pinecone.init(
        api_key=app.config['PINECONE_API_KEY'],
        environment=app.config['PINECONE_API_ENV']
    )
    index_name = "embed-project"

    # connect to the index we populated in /embeddings
    docsearch = Pinecone.from_existing_index(index_name=index_name, embedding=embeddings)

    llm = OpenAI(temperature=0, openai_api_key=openai_key)
    # "stuff" chain: stuffs all retrieved chunks into a single prompt
    chain = load_qa_chain(llm, chain_type="stuff")

    query = payload["message"]
    docs = docsearch.similarity_search(query)
    response = chain.run(input_documents=docs, question=query)

    return jsonify(message=response)

if __name__ == '__main__':
    app.run(debug=True)
Testing the APIs
Now we’re ready to test this solution. Complete the following steps to do so.
Open a terminal and navigate to your project.
# activate the virtual environment
$ source venv/bin/activate  # On Windows use `venv\Scripts\activate`
# run flask
$ flask run
Next, upload a PDF to the /embeddings endpoint using the following cURL command. To make it easy, you can copy the file to the project directory.
$ curl -X POST -F "file=@path_to_your_file/your_file.pdf" http://localhost:5000/embeddings
{"message":"File uploaded"}
Explanation:
- -X POST: This specifies that you want to make a POST request.
- -F "file=@path_to_your_file/your_file.pdf": This formulates a POST request with form data, where file is the name of the field and @path_to_your_file/your_file.pdf is the file you want to upload. Replace path_to_your_file/your_file.pdf with the actual path and name of the file you want to upload.
- http://localhost:5000/embeddings: This is the URL to which you want to send the request. Make sure to replace localhost and 5000 with your actual server name and port if different.
Next, submit a prompt to chat with your doc using the following cURL command. Change the message to be relevant to the document you’re working with.
curl -X POST -H "Content-Type: application/json" -d '{"message":"Your message here"}' http://localhost:5000/chat
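If everything is wired up correctly, you should get back a JSON reply along these lines (the exact text depends on your document and the model):
{"message":"The document describes ..."}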
Explanation:
- -X POST: Specifies that a POST request should be used.
- -H "Content-Type: application/json": Sets the Content-Type header to application/json, indicating that you're sending JSON data.
- -d '{"message":"Your message here"}': The -d flag sends the specified data in the POST request. Ensure that your JSON is correctly formatted and properly escaped if needed.
- http://localhost:5000/chat: This is the URL of the /chat endpoint on your Flask app. Replace localhost and 5000 with your actual server name and port if different.