How to Deploy a Stable Diffusion API Endpoint Model using OctoML

by Tobias Abdon


Generative AI image generators, like those from Stability AI, have captured the world’s imagination. With a simple prompt, you can create incredible images for a variety of use cases.

If you’re an application developer and want to add this kind of functionality to an app, you have a couple of options. You could use a web service API like those from Stability AI. Or, if you want more control, flexibility, and lower cost, you can run your own model. This tutorial is for anyone wanting to do the latter.

In this tutorial you’ll learn how to deploy your own Stable Diffusion model endpoint using OctoML.

What is OctoML?

OctoML is a Seattle-based startup that makes it easier to deploy and optimize machine learning (ML) models. Their web console lets you deploy models like Llama 2, Stable Diffusion, and many others in minutes. After the model is deployed, you are given a URL that you can use to add AI functionality to any application. OctoML manages the model infrastructure, optimization, and scaling so you can focus on building an awesome application.

Let’s get started!

Step 1 - Create an Account & Token

First, you’ll need an OctoML account and a token. Go to the signup page and create an account.

Then, go to the settings page, enter ‘demo’ in the token name field, and click Generate.


A pop-up will open that will let you copy your access token. Copy that token somewhere secure on your computer.
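If you later use the token from a script, one way to keep it out of your source code is to read it from an environment variable. The sketch below assumes you have exported the token under the name OCTOML_TOKEN; that variable name is made up for this example and is not something OctoML defines.

```python
import os


def load_token(env_var: str = "OCTOML_TOKEN") -> str:
    """Read the access token from an environment variable.

    The default variable name is a hypothetical choice for this tutorial.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your access token")
    return token
```

Reading the token at runtime means it never has to be committed alongside your application code.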


Step 2 - Find and Deploy the Stable Diffusion Model

Next, we will find and deploy the Stable Diffusion model. In this tutorial we will use Automatic1111’s web UI example, which is described as “A browser interface based on Gradio library for Stable Diffusion.” You can check out their repo for more details.

Go to the page and locate the ‘Automatic1111 Web UI’ example project. Click on the Deploy button.


Next, configure the settings. Give the endpoint a friendly name. The Medium hardware plan will be selected by default. The min and max replicas section controls autoscaling; I left it at the defaults. Click on Deploy.


You’ll be redirected to the endpoint’s detail page. You’ll have to wait for the instance to start up.

Step 3 - Access the Model Endpoint

On the model endpoint page, take note of the Status (e.g. Running) and the endpoint URL. When the Status is Running, copy the URL and open it in a new browser tab.


You will be prompted for a username and password. Enter as follows:

Username: (leave blank)

Password: Copy in the token you created earlier in this tutorial
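A username and password prompt like this one usually means the endpoint is protected with HTTP Basic authentication. If that is the case here (an assumption, since the tutorial only shows the browser prompt), a script can send the same credentials in an Authorization header. The sketch below builds such a request with Python's standard library and does not send it; the function names are illustrative.

```python
import base64
import urllib.request


def basic_auth_header(token: str, username: str = "") -> str:
    """Build a Basic auth header value from a blank username and the token as password."""
    raw = f"{username}:{token}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_request(endpoint_url: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for the endpoint URL."""
    request = urllib.request.Request(endpoint_url)
    request.add_header("Authorization", basic_auth_header(token))
    return request
```

To actually fetch the page, you would pass the returned request to urllib.request.urlopen, using your own endpoint URL and token.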


You should now have access to the web UI. If the page is blank or doesn’t load, your model instance may not be ready yet. Wait a moment and refresh the page.
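If you are scripting against the endpoint rather than using a browser, the same "refresh until it loads" idea can be written as a small polling loop. This is a generic sketch: is_ready is a placeholder for whatever check you choose (for example, a request to the endpoint URL that returns without error), and the attempt count and delay are arbitrary defaults.

```python
import time


def wait_until_ready(is_ready, attempts=10, delay_seconds=15.0, sleep=time.sleep):
    """Call is_ready() up to `attempts` times, pausing between tries.

    Returns True as soon as is_ready() returns a truthy value,
    or False if every attempt fails.
    """
    for attempt in range(attempts):
        if is_ready():
            return True
        # No need to sleep after the final attempt.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

The sleep function is passed in as a parameter so the loop can be exercised quickly in tests without real waiting.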


Step 4 - Enter a Prompt

Now you can generate images! Simply enter a prompt in the input box, click Generate, and wait for the results.


You can adjust the settings to change the outputs.
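The web UI is meant for interactive use. To generate images from your own application, note that the AUTOMATIC1111 web UI can also expose an HTTP API, including a /sdapi/v1/txt2img route, when it is started with its --api option. This tutorial does not confirm whether the OctoML example enables that option, or that the endpoint accepts Basic authentication for API calls, so treat the following as a sketch under both assumptions. It only builds the request and decodes a response; the parameter values are examples.

```python
import base64
import json
import urllib.request


def build_txt2img_request(base_url, token, prompt, steps=20, width=512, height=512):
    """Prepare (but do not send) a POST request to the web UI's txt2img route.

    Assumes the API is enabled and that the endpoint uses Basic auth
    with a blank username and the access token as the password.
    """
    payload = {"prompt": prompt, "steps": steps, "width": width, "height": height}
    credentials = base64.b64encode(f":{token}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + "/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )


def decode_images(response_body):
    """Decode the base64-encoded images listed in a txt2img JSON response."""
    return [base64.b64decode(image) for image in json.loads(response_body)["images"]]
```

To use it, you would send the request with urllib.request.urlopen, read the response body, and write each item returned by decode_images to a file.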

Step 5 - Shut Down the Instance

At the time of this writing, OctoML provided $10 in credits. To save these for other projects, you can now shut down your instance.

Go to the endpoint page, and click on the Edit endpoint button.


Then, click the Delete Endpoint button.



In this tutorial you learned how to create an OctoML account, generate a token, and deploy a Stable Diffusion model as an endpoint that you can open in a browser and build on in your own applications. Remember to manage your resources wisely and shut down instances when they're not in use to conserve credits.

If this tutorial has sparked your interest and you're eager for more insights into AI deployment and machine learning, consider joining the community for more informative content.