Build a Groundedness Verification Tool Using the Upstage API and LangChain



The Upstage Groundedness Check service is a powerful API for verifying that AI-generated responses are firmly rooted in reliable source material. By submitting context-answer pairs to the endpoint, we can instantly determine whether the provided context supports a given answer and receive a confidence assessment of that grounding. In this tutorial, we show how to use its core capabilities, including single-pair verification, batch processing, and multi-domain testing, to ensure that our AI systems produce factual and trustworthy content across a variety of subjects.

!pip install -qU langchain-core langchain-upstage


import os
import json
from typing import List, Dict, Any
from langchain_upstage import UpstageGroundednessCheck


os.environ("UPSTAGE_API_KEY") = "Use Your API Key Here"

We install the latest langchain-core and langchain-upstage packages, import the Python modules needed for data handling and type hints, and set our Upstage API key in the environment to authenticate all subsequent groundedness check requests.
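
Before building a wrapper around it, it helps to see the groundedness endpoint called directly. The short sketch below is purely illustrative (the example context and answer are placeholders): it instantiates UpstageGroundednessCheck, submits a single context-answer pair with invoke, and prints the raw verdict returned by the service.

from langchain_upstage import UpstageGroundednessCheck

# Standalone sanity check against the groundedness endpoint (illustrative example)
raw_checker = UpstageGroundednessCheck()

quick_result = raw_checker.invoke({
    "context": "Paris is the capital of France.",
    "answer": "The capital of France is Paris.",
})
print(quick_result)  # typically a verdict string such as "grounded" or "notGrounded"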

class AdvancedGroundednessChecker:
    """Advanced wrapper for Upstage Groundedness Check with batch processing and analysis"""
   
    def __init__(self):
        self.checker = UpstageGroundednessCheck()
        self.results = []
   
    def check_single(self, context: str, answer: str) -> Dict[str, Any]:
        """Check groundedness for a single context-answer pair"""
        request = {"context": context, "answer": answer}
        response = self.checker.invoke(request)
       
        result = {
            "context": context,
            "answer": answer,
            "grounded": response,
            "confidence": self._extract_confidence(response)
        }
        self.results.append(result)
        return result
   
    def batch_check(self, test_cases: List[Dict[str, str]]) -> List[Dict[str, Any]]:
        """Process multiple test cases"""
        batch_results = []
        for case in test_cases:
            result = self.check_single(case["context"], case["answer"])
            batch_results.append(result)
        return batch_results
   
    def _extract_confidence(self, response) -> str:
        """Extract confidence level from response"""
        if hasattr(response, 'lower'):
            normalized = response.lower()
            # Check the negative verdict first: "not grounded" also contains "grounded"
            if 'not grounded' in normalized or 'notgrounded' in normalized:
                return 'low'
            elif 'grounded' in normalized:
                return 'high'
        return 'medium'
   
    def analyze_results(self) -> Dict[str, Any]:
        """Analyze batch results"""
        total = len(self.results)
        # "notGrounded" verdicts also contain "grounded", so exclude negatives when counting
        grounded = sum(
            1 for r in self.results
            if 'grounded' in str(r["grounded"]).lower()
            and 'not' not in str(r["grounded"]).lower()
        )
       
        return {
            "total_checks": total,
            "grounded_count": grounded,
            "not_grounded_count": total - grounded,
            "accuracy_rate": grounded / total if total > 0 else 0
        }


checker = AdvancedGroundednessChecker()

The AdvancedGroundednessChecker class wraps the Upstage Groundedness Check API in a simple, reusable interface that lets us run single and batch context-answer checks while accumulating the results. It also includes helper methods to extract a confidence label from each response and compute overall accuracy statistics across all checks.

print("=== Test Case 1: Height Discrepancy ===")
result1 = checker.check_single(
    context="Mauna Kea is an inactive volcano on the island of Hawai'i.",
    answer="Mauna Kea is 5,207.3 meters tall."
)
print(f"Result: {result1('grounded')}")


print("\n=== Test Case 2: Correct Information ===")
result2 = checker.check_single(
    context="Python is a high-level programming language created by Guido van Rossum in 1991. It emphasizes code readability and simplicity.",
    answer="Python was made by Guido van Rossum & focuses on code readability."
)
print(f"Result: {result2('grounded')}")


print("\n=== Test Case 3: Partial Information ===")
result3 = checker.check_single(
    context="The Great Wall of China is approximately 13,000 miles long and took over 2,000 years to build.",
    answer="The Great Wall of China is very long."
)
print(f"Result: {result3('grounded')}")


print("\n=== Test Case 4: Contradictory Information ===")
result4 = checker.check_single(
    context="Water boils at 100 degrees Celsius at sea level atmospheric pressure.",
    answer="Water boils at 90 degrees Celsius at sea level."
)
print(f"Result: {result4('grounded')}")

We run four standalone groundedness checks with the AdvancedGroundednessChecker, covering a factual height error, a correct statement, a partial but vague match, and a contradictory claim. We print each verdict to illustrate how the service flags grounded versus ungrounded answers across these different scenarios.
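
The value stored under result["grounded"] is the raw verdict returned by the endpoint. As a rough, illustrative sketch (assuming the verdict is a string along the lines of "grounded" or "notGrounded"; the exact labels may vary by API version), we can branch on it like this:

for i, result in enumerate([result1, result2, result3, result4], 1):
    verdict = str(result["grounded"]).lower()
    if "grounded" in verdict and "not" not in verdict:
        label = "supported by the context"
    elif "grounded" in verdict:
        label = "NOT supported by the context"
    else:
        label = "uncertain verdict"
    print(f"Case {i}: {label}")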

print("\n=== Batch Processing Example ===")
test_cases = [
    {
        "context": "Shakespeare wrote Romeo and Juliet in the late 16th century.",
        "answer": "Romeo and Juliet was written by Shakespeare."
    },
    {
        "context": "The speed of light is approximately 299,792,458 meters per second.",
        "answer": "Light travels at about 300,000 kilometers per second."
    },
    {
        "context": "Earth has one natural satellite called the Moon.",
        "answer": "Earth has two moons."
    }
]


batch_results = checker.batch_check(test_cases)
for i, result in enumerate(batch_results, 1):
    print(f"Batch Test {i}: {result('grounded')}")


print("\n=== Results Analysis ===")
analysis = checker.analyze_results()
print(f"Total checks performed: {analysis('total_checks')}")
print(f"Grounded responses: {analysis('grounded_count')}")
print(f"Not grounded responses: {analysis('not_grounded_count')}")
print(f"Groundedness rate: {analysis('accuracy_rate'):.2%}")


print("\n=== Multi-domain Testing ===")
domains = {
    "Science": {
        "context": "Photosynthesis is the process by which plants convert sunlight, carbon dioxide, & water into glucose and oxygen.",
        "answer": "Plants use photosynthesis to make food from sunlight and CO2."
    },
    "History": {
        "context": "World War II ended in 1945 after the surrender of Japan following the atomic bombings.",
        "answer": "WWII ended in 1944 with Germany's surrender."
    },
    "Geography": {
        "context": "Mount Everest is the highest mountain on Earth, located in the Himalayas at 8,848.86 meters.",
        "answer": "Mount Everest is the tallest mountain and is located in the Himalayas."
    }
}


for domain, test_case in domains.items():
    result = checker.check_single(test_case["context"], test_case["answer"])
    print(f"{domain}: {result['grounded']}")

We run batch groundedness checks over the predefined test cases, print each individual verdict, and then compute and display overall accuracy metrics. We also perform multi-domain validations across science, history, and geography to illustrate how the checker behaves on different subjects.
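
Because every check is appended to checker.results, the accumulated history can be sliced however we like. The snippet below is a small, purely illustrative addition that tallies the stored "confidence" labels and lists the answers flagged as not grounded:

from collections import Counter

# Tally the confidence labels assigned by _extract_confidence across all checks so far
confidence_counts = Counter(r["confidence"] for r in checker.results)
print("Confidence distribution:", dict(confidence_counts))

# List the answers whose verdict was flagged as not grounded
flagged = [r["answer"] for r in checker.results if r["confidence"] == "low"]
print("Answers flagged as not grounded:")
for answer in flagged:
    print(" -", answer)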

def create_test_report(checker_instance):
    """Generate a detailed test report"""
    report = {
        "summary": checker_instance.analyze_results(),
        "detailed_results": checker_instance.results,
        "recommendations": ()
    }
   
    accuracy = report["summary"]["accuracy_rate"]
    if accuracy < 0.7:
        report["recommendations"].append("Consider reviewing answer generation process")
    if accuracy > 0.9:
        report["recommendations"].append("High accuracy - system performing well")
   
    return report


print("\n=== Final Test Report ===")
report = create_test_report(checker)
print(f"Overall Performance: {report('summary')('accuracy_rate'):.2%}")
print("Recommendations:", report("recommendations"))


print("\n=== Tutorial Complete ===")
print("This tutorial demonstrated:")
print("• Basic groundedness checking")
print("• Batch processing capabilities")
print("• Multi-domain testing")
print("• Results analysis and reporting")
print("• Advanced wrapper implementation")

Finally, we define a create_test_report helper that compiles all the accumulated groundedness checks into a summary report, with overall accuracy and tailored recommendations, and then print the final performance metrics along with a recap of the tutorial's key demonstrations.
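
Since json is already imported, the report can also be persisted for later review. The snippet below is a minimal sketch (the groundedness_report.json filename is arbitrary) that serializes the summary and per-check details to disk; default=str guards against verdict objects that are not natively JSON-serializable.

# Save the full report to disk for later inspection (filename is arbitrary)
with open("groundedness_report.json", "w") as f:
    json.dump(report, f, indent=2, default=str)
print("Report saved to groundedness_report.json")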

In conclusion, with the Upstage Groundedness Check at our disposal, we gain a scalable, domain-agnostic solution for real-time fact verification and confidence scoring. Whether we are validating isolated claims or processing large batches of answers, Upstage delivers clear grounded/not-grounded verdicts and confidence measures that let us monitor accuracy rates and generate actionable quality reports. By integrating this service into our workflow, we can improve the reliability of AI-generated outputs and maintain rigorous standards of factual integrity across all applications.




Sana Hassan

Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, Sana brings a fresh perspective to the intersection of AI and real-life solutions.
