How to Avoid Parser Errors for LLM-Based Applications?

Bowen Yang
10 min read · May 29, 2024

Prologue

As the capabilities of large language models (LLMs) continue to expand, their applications have become increasingly diverse and prevalent. If you’ve ever developed applications based on LLMs, you might have experienced frequent system crashes. These crashes often occur because LLMs fail to adhere to specified output format instructions, leading to pipeline disruptions. In this post, I will share some key insights on how to prevent such issues.

Typical parser error caused by LLM misformatting

In this post, you will learn about:
1. Why do parser errors occur?
2. How can you avoid parser errors?
3. A simple example of building a crash-free LLM application

Most Common Reason For LLM System Crashes

From what I’ve seen, the most common reason an LLM system crashes is a parser error.

When developing applications with LLMs, just like with traditional machine learning models, we basically treat the LLM as a black box. You give it some prompts, and it spits out text that you use for your app. Because LLMs always output text, you often need to build string parsers to really make use of that text. You’d have to tell the LLM exactly how you want the information to come out, and then you’d need to create a parser to break down that block of text into usable pieces.

Take an example where you want the LLM to pull out info from someone’s LinkedIn page, such as their name, education, and job history. You will need to first instruct the LLM to output that info in a specific format. In this case, you can tell the LLM to use the following JSON format:

{"name":<name>, "education":<education>, "job history":<job history>}

and then create a JSON parser that reads the LLM's output and parses it into a dictionary.

Despite LLMs showing impressive performance across many tasks, their ability to follow instructions is still imperfect, and they will sometimes ignore the guidelines you provide. Because parsers are static and require the output to follow a fixed format, the system will crash whenever the LLM deviates from the format specified in the prompt. For example, if you expect the LLM to output a JSON string like

{"Key1":<value1>, "Key2":<value2>}

an output like:

{'Key1':<value1>, 'Key2':<value2>}

will cause a standard JSON parser to fail, since JSON requires double quotes around strings.
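
To make this concrete, here is a minimal sketch using Python's standard json module; the second call is exactly the kind of failure that takes down an unguarded pipeline:

import json

valid_output = '{"Key1": "value1", "Key2": "value2"}'
invalid_output = "{'Key1': 'value1', 'Key2': 'value2'}"  # single quotes, not valid JSON

print(json.loads(valid_output))   # {'Key1': 'value1', 'Key2': 'value2'}

try:
    json.loads(invalid_output)
except json.JSONDecodeError as e:
    # Without a guard like this, the whole pipeline crashes here.
    print(f"Parser error: {e}")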

How to Avoid Parser Errors

Simplify the Output Format

The single most useful principle for developing LLM-based applications is that LLMs perform better when the task is simpler. When it comes to parsing errors, asking an LLM to format its output into a specific structure is itself a complex, error-prone task, so the ideal situation is to avoid parsing entirely.

However, there are cases where parsing is necessary. When you're generating structured data such as code, LLMs can include extra, unwanted information, and a parser that cleans this up ensures your final output is tidy and usable. Similarly, in chain-of-thought reasoning, where the LLM lays out its thought process but you only need the final result, a parser extracts just the essential bits and keeps everything streamlined. In these cases, it is important to simplify the expected output format as much as possible.

For example, if you’re asking an LLM to reason step by step about whether a described animal is a cat, a dog, or a turtle, a format like

{
"Reasoning":<Intermediate Reasoning Step>,
"Result": <cat, dog, or turtle>,
}

will lead to significantly fewer parsing errors than

{
"Reasoning":<Intermediate Reasoning Step>,
"cat": boolean,
"dog": boolean,
"turtle": boolean
}
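
To see why, here is a minimal sketch, using Python's standard json module and the field names from the two formats above, of the downstream code each format requires:

import json

def parse_simple(output: str) -> str:
    # One field to check: the result must be one of three known labels.
    data = json.loads(output)
    result = data["Result"].strip().lower()
    if result not in {"cat", "dog", "turtle"}:
        raise ValueError(f"Unexpected result: {result}")
    return result

def parse_boolean(output: str) -> str:
    # Three fields to check, plus a consistency rule the LLM can easily violate:
    # exactly one of the flags must be true.
    data = json.loads(output)
    flags = {animal: data[animal] for animal in ("cat", "dog", "turtle")}
    true_animals = [animal for animal, flag in flags.items() if flag]
    if len(true_animals) != 1:
        raise ValueError(f"Expected exactly one true flag, got: {true_animals}")
    return true_animals[0]

The first parser only has to validate a single field, while the second also has to enforce a consistency rule (exactly one flag set to true) that the model can easily violate.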

Choose a Good Output Format

Even though LLMs can adapt to various output formats, there are two particularly effective ones that I recommend. First, you might use JSON, as shown in the animal example earlier. This format is straightforward because you can use a standard JSON parser to handle the output. However, keep in mind that even small mistakes in JSON can cause parsing errors.

Alternatively, you can use an HTML tag-like format, which is popular in research. In this method, you instruct the LLM to wrap different types of content in specific tags. For example, for the animal scenario, instead of JSON, you could have the LLM put the reasoning between [Reasoning] and [/Reasoning] tags, and the result within [Result] and [/Result] tags like this:

[Reasoning]<Reasoning>[/Reasoning]
[Result]<cat, dog, or turtle>[/Result]

You can then easily capture the targeted information with a regular expression such as

r"\[Result\](.*?)\[/Result\]"

This tag-based method helps separate content clearly and is particularly robust against parsing errors.
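
Here is a minimal sketch of that extraction with Python's re module (the sample output string is illustrative):

import re

llm_output = (
    "[Reasoning]The description mentions a wagging tail, which points to a dog.[/Reasoning]\n"
    "[Result]dog[/Result]"
)

# re.DOTALL lets the captured reasoning span multiple lines.
reasoning_match = re.search(r"\[Reasoning\](.*?)\[/Reasoning\]", llm_output, re.DOTALL)
result_match = re.search(r"\[Result\](.*?)\[/Result\]", llm_output, re.DOTALL)

reasoning = reasoning_match.group(1).strip() if reasoning_match else None
result = result_match.group(1).strip() if result_match else None
print(result)  # dog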

Design Good Prompts

As LLMs showcase their emergent ability of in-context learning, the art of crafting effective prompts becomes crucial. This skill greatly improves the chances that LLMs will produce outputs in the formats we specify. I’ve developed a prompt template that has proven very useful in directing the behavior of these models:

"You are an expert on <Task>."
"You will be provided with <Inputs Description>."
"Your job is to <Detailed Description of Your Task>."
"You should output a JSON/String in the following format:"
"<JSONL example or TAG example>."
"Here are a few shot examples:"
"Few-shot examples will have system prompts encapsulated in delimiters. ##### This delimiter ##### should never appear inside your output."
"Here are some few-shot examples:"
"<Few-shot Examples>"
"<Input Resources>"

The lines about the output format and the delimiters are the parts that teach the LLM how to follow the specified output format. For a more in-depth understanding, please refer to the detailed example in the last section.
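
As a concrete sketch, here is one way to fill this template in plain Python for the animal example; the wording and the helper values are illustrative, not part of the original template:

PROMPT_TEMPLATE = (
    "You are an expert on identifying animals. "
    "You will be provided with a text description of an animal. "
    "Your job is to analyze the description step by step and decide whether it is a cat, a dog, or a turtle. "
    "You should output a JSON in the following format: "
    '{{"Reasoning":"<Intermediate Reasoning Step>","Result":"<cat, dog, or turtle>"}} '
    "Few-shot examples will have system prompts encapsulated in delimiters. "
    "##### This delimiter ##### should never appear inside your output. "
    "Here are some few-shot examples: "
    "{examples} "
    "Here is the input description: {description}"
)

# Illustrative values; a real application would build these programmatically.
examples = (
    "#####description: a loyal companion with a wagging tail#####\n"
    '{"Reasoning":"A wagging tail points to a dog.","Result":"dog"}'
)
description = "This small, agile creature loves napping in sunbeams."

print(PROMPT_TEMPLATE.format(examples=examples, description=description))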

Using LangChain

Although it’s possible to develop an LLM application using only the APIs provided by your chosen model, I strongly recommend using LangChain as your development framework. LangChain simplifies the development process and adds a level of standardization, which is crucial for scaling production efficiently. It offers a wide range of prebuilt tools and incorporates built-in workflows, making it easier to manage and enhance your application over time.

In the case of avoiding parser errors, LangChain provides a robust workflow that helps enforce the LLM's output format. You can refer to the example section for more detail.

Retry Mechanics

Another practical approach to prevent system crashes is to implement a retry mechanism. You could set up a simple try-catch block outside your function and keep retrying until the operation succeeds. Alternatively, you can utilize LangChain’s built-in Fallback mechanism, which prompts the model to retry automatically.

However, it’s important to remember that API calls to LLMs can be costly. Therefore, you should always set a limit on the number of retries. Generally, allowing just one retry should suffice, especially if you’re adhering to the guidelines I mentioned earlier.
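
Here is a minimal sketch of the first option, a plain retry loop with a hard limit; run_pipeline is a stand-in for whatever chain or API call you are wrapping:

def invoke_with_retry(run_pipeline, inputs: dict, max_retries: int = 1):
    """Call the pipeline, retrying on failure up to max_retries extra times."""
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return run_pipeline(inputs)
        except Exception as error:  # in practice, catch your parser's specific exception
            last_error = error
            print(f"Attempt {attempt + 1} failed: {error}")
    # Give up after the limit instead of retrying forever: LLM API calls cost money.
    raise last_error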

Example

In this section, I’ll show you a straightforward example of how to build a basic LLM application, ensuring it’s free from parsing errors.

We’ll be using LangChain along with the OpenAI API for this tutorial (Again, LangChain is not required but highly recommended). I’m going to skip some of the more detailed steps in the coding process since they’re not the focus here. If you’re curious about those details, you can check out the official LangChain documentation for more information.

We’ll use the example I mentioned earlier about describing animals. Our goal is to have the LLM predict which animal is being described based on the input.

Complete example can be found at:

Setup Environment

Before we dive into the tutorial, let’s start by installing the necessary packages:

pip install langchain
pip install langchain_openai
pip install python-dotenv

In this tutorial, I’m using OpenAI’s GPT-3.5-turbo for the LLM service. Feel free to choose any model you prefer; it won’t significantly alter the structure of this setup. If you’re new to this, you can learn how to create an OpenAI key here. Once your OpenAI key is ready, you should store it in a .env file like so:

OPENAI_API_KEY=<Your OpenAI Key>

To set up your environment, use the following code:

from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain.output_parsers import PydanticOutputParser

load_dotenv()  # reads OPENAI_API_KEY from the .env file

This setup will help ensure that your application can access the OpenAI API smoothly using the key stored in your .env file.

Create the Prompt

After setting up the environment, let’s construct the LLM pipeline that receives a text description and predicts the corresponding animal. This task is straightforward, so we don’t need a complex parser. However, for educational purposes, I’ll guide the model to show its reasoning step-by-step before providing the final answer. You can delve deeper into this approach by exploring the “Chain of Thought” paper.

Given that our application is limited to identifying “Cat”, “Dog”, and “Turtle”, a parser becomes necessary to extract specific information from the LLM’s output. As previously discussed, the initial step involves teaching the LLM the expected output format.

LangChain offers various methods to facilitate this setup. I have found that using Pydantic with LangChain’s built-in Pydantic parser is the most straightforward. Here’s how you can define your desired output schema:

from langchain_core.pydantic_v1 import BaseModel, Field

class AnimalOutput(BaseModel):
    """Defines the output format for animal description analysis."""
    reasoning: str = Field(..., description="Step-by-step analysis of the input animal description")
    result: str = Field(..., description="Final answer, which should be 'Dog', 'Cat', or 'Turtle'.")

Next, create a Pydantic Parser to parse this schema from the output, and use LangChain’s Prompt Template to instruct the model on the desired output format:

from langchain.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate

parser = PydanticOutputParser(pydantic_object=AnimalOutput)
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "You are an expert in identifying animals. "
            "You will be provided with a description of an animal. "
            "Your job is to analyze the description step by step and predict whether the animal is a cat, a dog, or a turtle. "
            "Remember, concepts from both tags should be included. "
            "You will be outputting a JSON in the following format: "
            "{output} "
            "{format_instructions} "
            "Remember to use double quotes \" for key values. "
            "Here is the input description: {description}"
        )
    ]
).partial(format_instructions=parser.get_format_instructions())

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7)
outputformat = """{"reasoning":"<content>","result":"<content>"}"""

runnable = prompt | llm | parser
test_description = "This four-legged furry companion is known for its loyalty and wagging tail."
print(runnable.invoke({"output": outputformat, "description": test_description}))

When we invoke the chain with the above prompt, LangChain substitutes each placeholder (such as "{output}" and "{description}") with the corresponding value from the dictionary passed to invoke (here, {"output": outputformat, "description": test_description}). The .partial call at the end fills the "{format_instructions}" placeholder with the parser's own formatting instructions, which again tells the model how to format its output correctly.

Enhance the Prompt with Few-Shot Examples

This example demonstrates a task that an LLM can already perform reliably with basic instructions. However, for more complex scenarios, few-shot prompting becomes essential to maintain model performance. One important tip for few-shot prompts is to ensure they meticulously follow the exact output format, including the use of single and double quotes. Therefore, I often create a helper function to generate examples precisely as instructed:

def build_few_shot_examples(description: str, reasoning: str, result: str) -> str:
    # Wrap the example input in '#####' delimiters, then show the expected
    # output in exactly the JSON format the parser expects.
    header = "#####description: " + description + "#####\n\n"
    details = f'{{"reasoning":"{reasoning}","result":"{result}"}}'
    return header + details

The “#####” markers are delimiters that separate inputs from outputs. Because we may provide multiple few-shot examples, we clarify the meaning of the delimiter in the main prompt:

load_dotenv()  # Load your OpenAI API environment variables

parser = PydanticOutputParser(pydantic_object=AnimalOutput)
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "You are an expert in identifying animals. "
            "You will be provided with a description of an animal. "
            "Your job is to analyze the description step-by-step and determine if the animal is a cat, a dog, or a turtle. "
            "Remember, concepts from both tags should be included. "
            "You will be outputting a dictionary in the following format: "
            "{output} "
            "{format_instructions} "
            "Few-shot examples will have system prompts encapsulated in the delimiter '#####', and your output should not include the '#####' delimiters. "
            "Here are the few-shot examples: "
            "{examples} "
            "Remember to use double quotes \" for key values. "
            "Here is the input description: {description}"
        )
    ]
).partial(format_instructions=parser.get_format_instructions())

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7)

# Build the output template
outputformat = """{"reasoning":"<content>","result":"<content>"}"""

# Build a few-shot example
example = build_few_shot_examples(
    description="This four-legged furry companion is known for its loyalty and wagging tail.",
    reasoning="The description mentions a four-legged furry companion known for loyalty and a wagging tail. Cats and turtles are typically not associated with wagging tails, hence the animal is most likely a dog.",
    result="dog",
)

# The actual LLM chain
runnable = prompt | llm | parser
test_description = "This small, agile creature is known for its grace, independent spirit, and penchant for napping in sunbeams."
print(runnable.invoke({"output": outputformat, "description": test_description, "examples": example}))

This structured approach ensures the model uses the provided examples to generate accurate and consistent responses.

Retry Mechanism

You can implement a retry mechanism using LangChain Fallback, which activates when the initial pipeline execution is unsuccessful. Here’s how you might set it up:

runnable = prompt | llm | parser
self_correct_enhance_chain = runnable.with_fallbacks([exception_to_messages | runnable], exception_key="exception")
self_correct_enhance_chain.invoke(...)

The `exception_to_messages` function is where you define custom responses that trigger upon encountering an exception. Here is an example of such a function:

def exception_to_messages(inputs: dict) -> dict:
    exception = str(inputs["exception"])
    # Incorporate the error into the original input to inform the model of its previous mistake.
    messages = ChatPromptTemplate.from_messages([
        "The last call raised an exception:",
        exception,
        "Please learn from this and try again.",
    ])
    inputs["last_output"] = messages
    return inputs

This mechanism ensures the system attempts to correct its previous errors by re-evaluating the input through the pipeline after modifying the context based on the exception encountered.

Conclusion

In conclusion, developing robust LLM-based applications requires careful attention to output formatting and error handling. By simplifying the output format, choosing appropriate data structures, and designing effective prompts, you can significantly reduce parser errors and system crashes. Utilizing tools like LangChain can streamline the development process and provide a standardized workflow for managing LLM outputs. Implementing retry mechanisms further enhances the reliability of your application by allowing it to recover from occasional format deviations. By following these guidelines, you can create more resilient and efficient LLM applications, ensuring smoother performance and a better user experience.

Thank you for reading!
