Mastering ChatGPT: From Zero to Hero (3/3)

Vikas K Solegaonkar
4 min read · May 30, 2023

As machines push their way into the domain of creative work, it seems the days of the human workforce are numbered. This prediction has caused fear and panic across the industry. This series of blogs will help you build your skills — not just to retain your job, but to help you rule the new era of Generative LLMs.

Topics covered

This blog starts with a detailed theoretical introduction, then jumps into practical implementation and code examples. We will cover the following topics in the blog series.

  1. Introduction to ChatGPT, LLM, and Prompt Engineering
  2. Using the OpenAI API in your apps. Host your own Chatbot on AWS
  3. Host your own LLM on AWS, Amazon Bedrock

I am sure you are excited to continue the journey to the next step.

Hosting an LLM on AWS

ChatGPT and OpenAI are great. However, we don’t have to depend on them. We can deploy our own LLM on AWS. It is not as difficult as it sounds. In fact, it is quite simple when we work with SageMaker Studio.

Hugging Face is a large repository of open-source models that we can use in our applications. Each model carries its own license terms, and some are quite stringent. So make sure you understand them well before making a commercial product out of them.

Once the infrastructure is set up, the code is quite simple:

from transformers import T5ForConditionalGeneration, AutoTokenizer
import torch

def model_fn(model_dir):
    # Load the model in 8-bit precision so it fits on the GPU,
    # letting accelerate spread the layers across available devices
    model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xxl",
        load_in_8bit=True, device_map="auto", cache_dir="/tmp/model_cache/")
    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xxl")

    return model, tokenizer

def predict_fn(data, model_and_tokenizer):
    model, tokenizer = model_and_tokenizer
    # Pull the prompt out of the request; the remaining keys
    # are passed on to generate() as generation parameters
    text = data.pop("inputs", data)
    input_ids = tokenizer(text, return_tensors="pt").input_ids.to("cuda")
    outputs = model.generate(input_ids, **data)

    return tokenizer.decode(outputs[0], skip_special_tokens=True)
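To see how `predict_fn` consumes a request, it helps to look at the payload convention: a dict whose `inputs` key holds the prompt, while every other key is forwarded to `model.generate()` as a generation parameter. The prompt text and parameter below are illustrative examples, not values from the article:

```python
# A hypothetical request payload, shaped as predict_fn would receive it
data = {
    "inputs": "Translate to German: Hello",
    "max_new_tokens": 50,  # a generation parameter for model.generate()
}

# predict_fn pops the prompt out of the payload...
text = data.pop("inputs", data)

# ...so the leftover keys are exactly the keyword arguments for generate()
print(text)  # Translate to German: Hello
print(data)  # {'max_new_tokens': 50}
```

This is why the script can accept arbitrary generation settings from the client without listing them explicitly.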

We can deploy the inference script with a single command.
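As a sketch of what that deployment looks like with the SageMaker Python SDK: the `HuggingFaceModel` class wraps the inference script and its `deploy()` method provisions the endpoint. The S3 path, role ARN, instance type, and framework versions below are placeholder assumptions to adapt to your own account, not values from the article:

```python
from sagemaker.huggingface import HuggingFaceModel

# Hypothetical values: point model_data at a tarball containing the
# inference script, and use a role and instance type from your account
huggingface_model = HuggingFaceModel(
    model_data="s3://my-bucket/model.tar.gz",  # archive with inference.py
    role="arn:aws:iam::111122223333:role/SageMakerRole",
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
)

# The single deploy command: provisions a GPU instance and serves the model
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
)

# The endpoint can then be invoked with the payload shape predict_fn expects
print(predictor.predict({"inputs": "Translate to German: Hello"}))
```

Behind that one call, SageMaker builds the container, downloads the model, and exposes an HTTPS endpoint that routes requests through `model_fn` and `predict_fn`.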