Semantic Evaluation

Validate dynamic outputs against an expected output semantically using our AI-based service, LLM Evaluator.

Usage 🚀

Refer to the Setup Guide for installing dependencies for both Java and Python.

JAVA

There are two clients available with the Java SDK: SyncClient and AsyncClient.

Java Client Code

// for the async client
import org.qyrus.ai_sdk.Clients.AsyncClient;
AsyncClient client = new AsyncClient(<API_TOKEN>, null);

// for the sync client
import org.qyrus.ai_sdk.Clients.SyncClient;
SyncClient client = new SyncClient(<API_TOKEN>, null);

Using SyncClient

Here's an example of using LLM Evaluator with SyncClient.

Java SyncClient Code
import io.github.cdimascio.dotenv.Dotenv;
import java.util.ArrayList;
import java.util.List;
import org.qyrus.ai_sdk.Clients.SyncClient;
// plus the LLMEval response type from your SDK package

private static void testLLMEvaluator() {

    Dotenv dotenv = Dotenv.load();
    String QYRUS_AI_SDK_API_TOKEN = dotenv.get("QYRUS_AI_SDK_API_TOKEN");

    SyncClient client = new SyncClient(QYRUS_AI_SDK_API_TOKEN, null);
    String context = "application is about generating dynamic text for messages on phone";
    String expected_output = "Winning lottery of 10k$";
    List<String> executed_output = new ArrayList<>();
    executed_output.add("You have won 10000 dollars");
    String guardrails = "No sensitive info";

    long startTime = System.currentTimeMillis();
    int numberOfRequests = 1;

    for (int i = 0; i < numberOfRequests; i++) {
        try {
            // Evaluate the executed output against the expected output,
            // given the application context and guardrails.
            LLMEval.LLMEvalResponse response = client.llmevaluator.evaluate(context, expected_output, executed_output, guardrails);
            // The response exposes the evaluation report via getReport().
            String report = response.getReport();
            System.out.println("Report: " + report);

        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    long endTime = System.currentTimeMillis();
    System.out.println("Synchronous Total time for LLM Eval request: " + (endTime - startTime) + " ms");
}

Using AsyncClient

Here's an example of using LLM Evaluator with AsyncClient.

Java AsyncClient Code
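
The following is a minimal sketch, assuming AsyncClient exposes the same llmevaluator.evaluate call as SyncClient and returns a CompletableFuture of the same response type; verify the exact return type against your SDK version.

import io.github.cdimascio.dotenv.Dotenv;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import org.qyrus.ai_sdk.Clients.AsyncClient;

private static void testLLMEvaluatorAsync() {

    Dotenv dotenv = Dotenv.load();
    String QYRUS_AI_SDK_API_TOKEN = dotenv.get("QYRUS_AI_SDK_API_TOKEN");

    AsyncClient client = new AsyncClient(QYRUS_AI_SDK_API_TOKEN, null);
    String context = "application is about generating dynamic text for messages on phone";
    String expected_output = "Winning lottery of 10k$";
    List<String> executed_output = new ArrayList<>();
    executed_output.add("You have won 10000 dollars");
    String guardrails = "No sensitive info";

    // Assumed: the async variant returns a CompletableFuture of the same
    // LLMEval.LLMEvalResponse type used by SyncClient.
    CompletableFuture<LLMEval.LLMEvalResponse> future =
            client.llmevaluator.evaluate(context, expected_output, executed_output, guardrails);

    future.thenAccept(response -> System.out.println("Report: " + response.getReport()))
          .exceptionally(e -> { e.printStackTrace(); return null; })
          .join(); // block so the example completes before the JVM exits
}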

PYTHON

There are two clients available with the Python SDK: SyncQyrusAI and AsyncQyrusAI.

Python Client Code
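
A minimal sketch of client construction; the import path and the api_token parameter are assumptions mirroring the Java clients, so check the Setup Guide for the exact names.

# A minimal sketch; the module path and the api_token parameter are
# assumptions mirroring the Java clients.
from qyrus_ai import AsyncQyrusAI, SyncQyrusAI  # hypothetical import path

# for the async client
async_client = AsyncQyrusAI(api_token="<API_TOKEN>")

# for the sync client
sync_client = SyncQyrusAI(api_token="<API_TOKEN>")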

Using SyncQyrusAI

Here's an example of using LLM Evaluator with SyncQyrusAI.

Python SyncQyrusAI Code
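
The sketch below mirrors the Java SyncClient example; the evaluate() method and response.report attribute are assumed names based on the Java SDK surface.

# Mirrors the Java SyncClient example; evaluate() and response.report are
# assumed names based on the Java SDK surface.
import os
from qyrus_ai import SyncQyrusAI  # hypothetical import path

client = SyncQyrusAI(api_token=os.environ["QYRUS_AI_SDK_API_TOKEN"])

context = "application is about generating dynamic text for messages on phone"
expected_output = "Winning lottery of 10k$"
executed_output = ["You have won 10000 dollars"]
guardrails = "No sensitive info"

# Assumed to take the same four arguments as the Java SDK's evaluate().
response = client.llm_evaluator.evaluate(context, expected_output, executed_output, guardrails)
print("Report:", response.report)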

Using AsyncQyrusAI

Here's an example of using LLM Evaluator with AsyncQyrusAI.

Python AsyncQyrusAI Code
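
A minimal sketch mirroring the sync example; it assumes evaluate() is awaitable on the async client with the same parameters.

# Mirrors the sync example; assumes evaluate() is awaitable on the async client.
import asyncio
import os
from qyrus_ai import AsyncQyrusAI  # hypothetical import path

async def main():
    client = AsyncQyrusAI(api_token=os.environ["QYRUS_AI_SDK_API_TOKEN"])
    response = await client.llm_evaluator.evaluate(
        "application is about generating dynamic text for messages on phone",  # context
        "Winning lottery of 10k$",                                             # expected output
        ["You have won 10000 dollars"],                                        # executed output
        "No sensitive info",                                                   # guardrails
    )
    print("Report:", response.report)  # assumed attribute name

asyncio.run(main())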

Python-only: RAG and MCP Testing

The Python SDK includes additional LLM Evaluator capabilities for RAG (Retrieval-Augmented Generation) and MCP (Model Context Protocol / tool-calling) testing. These helpers are available via the llm_evaluator.evaluator object on both AsyncQyrusAI and SyncQyrusAI.

Initialize LLM Evaluator
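
A minimal sketch; per the note above, the RAG/MCP helpers hang off the llm_evaluator.evaluator object. The import path is an assumption.

# The evaluator helpers are reached via llm_evaluator.evaluator (documented
# above); the import path is an assumption.
import os
from qyrus_ai import AsyncQyrusAI  # hypothetical import path

client = AsyncQyrusAI(api_token=os.environ["QYRUS_AI_SDK_API_TOKEN"])
evaluator = client.llm_evaluator.evaluator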

Evaluate RAG (Retrieval-Augmented Generation) Systems
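
A hypothetical sketch of a RAG evaluation call, continuing from the evaluator initialized above (inside an async function); evaluate_rag and the RAGTestCase model are illustrative names, not confirmed API, though the SDK does accept Pydantic inputs.

# Hypothetical: evaluate_rag() and RAGTestCase are illustrative names only.
from pydantic import BaseModel

class RAGTestCase(BaseModel):  # illustrative stand-in for the SDK's model
    query: str
    retrieved_contexts: list[str]
    generated_answer: str
    expected_answer: str

case = RAGTestCase(
    query="What is the refund window?",
    retrieved_contexts=["Refunds are accepted within 30 days of purchase."],
    generated_answer="You can get a refund within 30 days.",
    expected_answer="Refunds are allowed within 30 days.",
)
result = await evaluator.evaluate_rag(case)  # hypothetical method name
print(result)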

Evaluate MCP (Model Context Protocol) Tool-Calling Systems
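
A hypothetical sketch of checking a tool-calling decision, again continuing from the evaluator above; evaluate_mcp and MCPTestCase are illustrative names, not confirmed API.

# Hypothetical: evaluate_mcp() and MCPTestCase are illustrative names only.
from pydantic import BaseModel

class MCPTestCase(BaseModel):  # illustrative stand-in for the SDK's model
    user_request: str
    expected_tool: str
    actual_tool_call: dict

case = MCPTestCase(
    user_request="Book a meeting for 3pm tomorrow",
    expected_tool="create_calendar_event",
    actual_tool_call={"name": "create_calendar_event",
                      "arguments": {"date": "tomorrow", "time": "15:00"}},
)
result = await evaluator.evaluate_mcp(case)  # hypothetical method name
print(result)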

Batch Evaluation
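
A hypothetical sketch of running several test cases in one call; evaluate_batch is an illustrative name only.

# Hypothetical: evaluate_batch() is an illustrative name for submitting
# several test cases at once.
cases = [rag_case_one, rag_case_two]  # test cases built as shown above
results = await evaluator.evaluate_batch(cases)  # hypothetical method name
for result in results:
    print(result)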

Using JSON Input (Alternative to Pydantic)
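
The heading above indicates JSON input is accepted as an alternative to Pydantic models; the sketch below passes a plain dict with the same fields, though the exact accepted shape is an assumption.

# Hypothetical: a plain dict with the same fields as the Pydantic model;
# the exact accepted shape may differ.
case_json = {
    "query": "What is the refund window?",
    "retrieved_contexts": ["Refunds are accepted within 30 days of purchase."],
    "generated_answer": "You can get a refund within 30 days.",
    "expected_answer": "Refunds are allowed within 30 days.",
}
result = await evaluator.evaluate_rag(case_json)  # hypothetical method name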

Legacy Judge Evaluation (Backwards Compatibility)
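
The legacy judge-based evaluation (the flow shown in the Java examples above) remains available; a sketch assuming it takes the same four arguments as the Java evaluate() call.

# Assumed to mirror the Java evaluate() signature: context, expected output,
# executed outputs, guardrails.
response = await client.llm_evaluator.evaluate(
    "application is about generating dynamic text for messages on phone",
    "Winning lottery of 10k$",
    ["You have won 10000 dollars"],
    "No sensitive info",
)
print("Report:", response.report)  # assumed attribute name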

Synchronous Usage
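
A hypothetical sketch of the same helpers on SyncQyrusAI, without await.

# Hypothetical: the sync client exposes the same evaluator surface, minus await.
import os
from qyrus_ai import SyncQyrusAI  # hypothetical import path

client = SyncQyrusAI(api_token=os.environ["QYRUS_AI_SDK_API_TOKEN"])
evaluator = client.llm_evaluator.evaluator
result = evaluator.evaluate_rag(case)  # hypothetical method name; case built as above
print(result)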

Advanced MCP with Schema Validation
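
A hypothetical sketch of supplying a tool schema so tool-call arguments can be validated alongside the tool choice; the tool_schema parameter is an illustrative name only.

# Hypothetical: tool_schema is an illustrative parameter name for validating
# tool-call arguments against a JSON Schema.
tool_schema = {
    "name": "create_calendar_event",
    "parameters": {
        "type": "object",
        "properties": {
            "date": {"type": "string"},
            "time": {"type": "string"},
        },
        "required": ["date", "time"],
    },
}
result = await evaluator.evaluate_mcp(case, tool_schema=tool_schema)  # hypothetical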

Note: The legacy LLM Evaluator (the original judge-based evaluation) is accessible via REST APIs. The new RAG and MCP testing capabilities are currently available only in the Python SDK and are not yet exposed via the REST API; they will be made available via REST APIs soon.
