January 25th, 2024

Nicolas Camara imageNicolas Camara

How Mendable leverages Langsmith to debug Tools & Actions

Langsmith is an easy way to debug and evaluate LLM apps

How Mendable leverages Langsmith to debug Tools & Actions image

How Mendable leverages Langsmith to debug Tools & Actions

It is no secret that 2024 will be the year we start seeing more LLMs baked into our workflows. This means that the way we interact with LLM models will be less just Question and Answer and more action-based.

At Mendable.ai, we are seeing this transformation first hand. Late last year, we equipped ~1000 customer success + sales people at a $20+ billion tech company with GTM assistants that help with tech guidance, process help, and industry expertise. In five months, the platform achieved $1.3 million in savings, and it’s projected to save $3 million this year due to decreased research time and dependency on technical resources. Now we are working with that same company to enable these assistants to take action, enabling even more efficiency improvements.

An example use case is a salesperson who wants to get the latest focus areas for a prospect and their company. When asking an assistant enabled with our Tools & Actions “What are the latest key initiatives for X?”, the AI can:

  1. Call the CRM API and get the exact team the salesperson is trying to sell to
  2. Use the Google News or DUNS API to get the latest news on the specific team and related initiatives
  3. Call the CoreSignal API to get the latest hiring trends for the company based on job postings and more
  4. Interpret the news and hiring trends, highlighting ways the salesperson can use these new found initiatives to sell in meeting

As you can see the introduction of Tools & Actions in Mendable expands capabilities quite a bit, enabling chatbots to access and utilize a wider range of data sources and perform various automated tasks. On the backend, to ensure the precision and efficiency of such features, Mendable leverages Langsmith’s debugging tools, a critical component in the development and optimization of our AI-driven functionalities.

Mendable Tools & Actions feature

Opening the ‘black box’ of agent execution

One of the biggest problems when building applications that depend on agentic behavior is reliability and lack of observability. Understanding the key interactions, decisions of an Agent loop can be quite tricky. Especially when it has been giving access to multiple resources and is embedded in a production pipeline.

While building Tools & Actions, the core aspect we had in mind was giving the ability for the user to create their own Tool by defining an API spec for it. We designed this so the user could input a tag such as <ai-generated-value> when creating the API request and the AI can fill that value at request time with an ‘AI generated’ value based on the user’s question and schema. This is one example, but there were a lot more just in time AI inputs/outputs that went into it. This posed some challenges in the building process that we weren’t expecting. Soon our development process was full of “console.logs” everywhere and high latency runs. Trying to debug why a tool wasn’t being called or why the API request had failed became a nightmare. We had no proper visibility on what the agentic behavior looked like nor if custom tools were working as expected.

Here is where Langsmith from Langchain came to help. If you are not familiar, Langsmith allows you to easily debug, evaluate and manage LLM apps. It, of course, integrates swiftly with Langchain. As we were already using parts of the OpenAI tool agents that Langchain provides, the integration was smooth.

Langsmith Platform

The Debugging Process

Langsmith allows us to have a peek inside of the agents’ brain. This is very useful for debugging how an agent’s thinking and decision process can impact the output.

When you enable tracing in a LangChain, the app captures and displays a detailed visualization of the runs’ call hierarchy. This feature allows you to explore the inputs, outputs, parameters, response times, feedback, token consumption, and other critical metrics of your run.

Langsmith Traces

When we connected Langsmith to our Tools & Actions module, we quickly spotted problems that we didn’t have the visibility for.

Take a look for instance on one of our first traces using Tools. As you can see here, the last call to ChatOpenAI took a long time: 7.23 seconds.

Langsmith Single Trace for Mendable's AgentExecutor

When you click on the 7.23s Run we saw that the prompt was massive, it had concatenated all of our RAG pipeline prompts/sources with our Tools & Actions, leading to a delay in the streaming process. This allowed to further optimize what chunks of the prompt needed to be used by the Tools & Action pipeline, reducing total latency overall.

Inspecting Tools

Another valuable aspect of having ease access to traces is the ability to inspect a tool input. As I mentioned in the beginning, we allow users to create custom tools in Mendable. With that we need to make sure that the building process of a tool in the UI is easy and quick but also performs well. This means that when we create a tool in the backend, it needs to have the correct schema defined partially by what the user inputted in our UI (API request details) but also by what the AI will automatically feed in at request time.

In the example below, it shows a Recent News Tool that was run. The question inside the query parameter was generated by the AI.

Recent News Tool used by Mendable

Making sure that query was accurate with what the user inputted but also optimized for the tool being used was very challenging. Thankfully Langsmith made visualizing that very easy. What we did is we ran the same tool with different queries ~20 times and quickly scrolled through Langsmith making sure the output and schema were accurate. The times that weren’t accurate, we were able to understand why by opening the trace further or by annotating in Langsmith so we could review it later.

One of our realizations was that the Tools description was critical for the correct schema and input to be generated. With this new insight we obtained from running the experiments, we went ahead and improved the AI generated part of that in our product and also made users aware that they need to provide good detailed descriptions when creating a Tool.

Building our Dataset

With all the optimization experiments taking over, the need to quickly save inputs/outputs for further evaluation became evident. With Langsmith we selected the runs that we wanted to add to our dataset and clicked the “Add to Dataset” button.

Add to Dataset in Langsmith

This was a very quick and easy win for us as we now had all the data in one place from our runs and we could even evaluate that using Langsmith itself.

Mendable Dataset In Langsmith

Conclusion

Langsmith’s debugging tools have been a game-changer for us. They’ve given us a clear window into how our ‘Tools and Actions’ AI agent thinks and acts, which has been helpful for tackling tricky issues like slow response times and making our debugging process way smoother. Mendable Tools & Actions has launched but we are still early in the process. We have been working with amazing enterprises such as Snapchat to help improve and tailor custom solutions to them. If you are interested in testing Mendable, send me a message in X at @nickscamara_ or email us at eric@mendable.ai with your use case.

Also, if you are looking to speed up your LLM development process, I would definitely recommend trying Langsmith out - especially if you already use Langchain in your pipeline.

I hope my insights were helpful and thanks to Langchain for being an awesome partner.