In this tutorial, we will learn how to leverage the Adala framework to build a modular active learning pipeline for medical symptom classification. We start by installing and verifying Adala alongside the required dependencies, then integrate Google Gemini as a custom annotator to classify symptoms into predefined medical domains. Using a simple three-iteration active learning loop that prioritizes critical symptoms such as chest pain, we will see how to select, annotate, and visualize classification confidence, gaining practical insight into the model's behavior and Adala's extensible architecture.
!pip install -q git+https://github.com/HumanSignal/Adala.git
!pip list | grep adala
We install the latest Adala version directly from its GitHub repository. The subsequent pip list | grep adala command then scans the package list in your environment for any entries containing "adala", giving a quick confirmation that the library was installed successfully.
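The same check can be done from Python rather than the shell. The following is a minimal sketch, not part of the original notebook: describe_install is a hypothetical helper name, and it assumes the distribution and the importable module are both called "adala".

```python
from importlib import metadata, util


def describe_install(package_name):
    """Return (is_importable, version_or_None) for a top-level package name."""
    # find_spec returns None when no module of that name is on the search path
    importable = util.find_spec(package_name) is not None
    try:
        version = metadata.version(package_name)
    except metadata.PackageNotFoundError:
        # No installed distribution metadata under that name
        version = None
    return importable, version


# Assumption: the package is published and imported under the name "adala"
print(describe_install("adala"))
```

If the first element is False, the package is not importable from the current environment and the install step should be repeated.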
import sys
import os
print("Python path:", sys.path)
print("Checking if adala is in installed packages...")
!find /usr/local -name "*adala*" -type d | grep -v "__pycache__"
!git clone https://github.com/HumanSignal/Adala.git
!ls -la Adala
We print the current Python module search paths, then search the /usr/local directory for any installed "adala" folders (excluding __pycache__) to verify that the package is available. Next, we clone the Adala GitHub repository into the working directory and list its contents so that you can confirm that all source files were fetched correctly.
import sys
sys.path.append('/content/Adala')
By appending the cloned Adala folder to sys.path, we tell Python to treat /content/Adala as an importable package directory. This ensures that subsequent import statements … can resolve against your local clone in addition to any installed version.
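Python searches sys.path in order, so an appended directory is consulted only after the earlier entries. If the local clone should take precedence over an installed copy, it can be inserted at the front instead. The sketch below illustrates the difference; add_to_path and the prefer_local flag are hypothetical names introduced here, not part of Adala or the original notebook.

```python
import sys


def add_to_path(directory, prefer_local=False):
    """Add directory to sys.path exactly once and return its position.

    prefer_local=True puts it first, so it wins over installed copies;
    otherwise it goes last, acting only as a fallback.
    """
    if directory in sys.path:
        sys.path.remove(directory)  # avoid duplicate entries
    if prefer_local:
        sys.path.insert(0, directory)
    else:
        sys.path.append(directory)
    return sys.path.index(directory)


# Same path as the tutorial; appended, so installed packages are searched first
position = add_to_path("/content/Adala")
print("Clone is at sys.path index", position)
```

Note that modules already imported stay cached in sys.modules, so changing sys.path only affects imports that happen afterwards.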
!pip install -q google-generativeai pandas matplotlib
import google.generativeai as genai
import pandas as pd
import json
import re
import numpy as np
import matplotlib.pyplot as plt
from getpass import getpass
We install the Google Generative AI SDK alongside data analysis and plotting libraries (pandas and matplotlib), then import the key modules: genai for interacting with Gemini, pandas for tabular data, json and re for parsing, numpy for numerical operations, matplotlib.pyplot for visualization, and getpass for securely prompting the user for their API key.
try:
    from Adala.adala.annotators.base import BaseAnnotator
    from Adala.adala.strategies.random_strategy import RandomStrategy
    from Adala.adala.utils.custom_types import TextSample, LabeledSample
    print("Successfully imported Adala components")
except Exception as e:
    print(f"Error importing: {e}")
    print("Falling back to simplified implementation...")
This try/except block attempts to load Adala's core classes, BaseAnnotator, RandomStrategy, TextSample, and LabeledSample, so that we can take advantage of its built-in annotators and sampling strategies. On success, it confirms that the Adala components are available; if any import fails, it catches the error, prints the exception message, and falls back gracefully to a simpler implementation.
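The notebook builds its fallback objects on the fly with type(). One possible alternative for the simplified implementation, offered purely as an assumption, is a pair of small dataclasses. SimpleTextSample and SimpleLabeledSample are hypothetical names and do not mirror Adala's actual TextSample and LabeledSample definitions.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleTextSample:
    """Hypothetical stand-in for a text sample: just the raw text."""
    text: str


@dataclass
class SimpleLabeledSample:
    """Hypothetical stand-in for a labeled sample."""
    text: str
    labels: str
    # default_factory gives every instance its own dict
    metadata: dict = field(default_factory=dict)


sample = SimpleTextSample("Persistent dry cough with occasional wheezing")
labeled = SimpleLabeledSample(sample.text, "Respiratory", {"confidence": 0.9})
print(labeled)
```

Dataclasses give readable reprs and equality for free, which makes debugging an annotation loop somewhat easier than anonymous classes.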
GEMINI_API_KEY = getpass("Enter your Gemini API Key: ")
genai.configure(api_key=GEMINI_API_KEY)
We securely prompt you to enter your Gemini API key without echoing it to the notebook. Then we configure the Google Generative AI client (genai) with that key to authenticate all subsequent calls.
CATEGORIES = ["Cardiovascular", "Respiratory", "Gastrointestinal", "Neurological"]

class GeminiAnnotator:
    def __init__(self, model_name="models/gemini-2.0-flash-lite", categories=None):
        self.model = genai.GenerativeModel(model_name=model_name,
                                           generation_config={"temperature": 0.1})
        self.categories = categories

    def annotate(self, samples):
        results = []
        for sample in samples:
            # Ask for a JSON object so the reply can be parsed mechanically
            prompt = f"""Classify this medical symptom into one of these categories:
            {', '.join(self.categories)}.
            Return JSON format: {{"category": "selected_category",
            "confidence": 0.XX, "explanation": "brief_reason"}}

            SYMPTOM: {sample.text}"""

            try:
                response = self.model.generate_content(prompt).text
                # Pull out the {...} span in case the model wraps it in extra text
                json_match = re.search(r'(\{.*\})', response, re.DOTALL)
                result = json.loads(json_match.group(1) if json_match else response)

                labeled_sample = type('LabeledSample', (), {
                    'text': sample.text,
                    'labels': result["category"],
                    'metadata': {
                        "confidence": result["confidence"],
                        "explanation": result["explanation"]
                    }
                })
            except Exception as e:
                # Any API or parsing failure yields an "unknown" label
                labeled_sample = type('LabeledSample', (), {
                    'text': sample.text,
                    'labels': "unknown",
                    'metadata': {"error": str(e)}
                })
            results.append(labeled_sample)
        return results
We define a list of medical categories and implement a GeminiAnnotator class that wraps Google Gemini's generative model for symptom classification. In its annotate method, it builds a JSON-returning prompt for each text sample, parses the model's response into a structured label, a confidence score, and an explanation, and wraps them in lightweight LabeledSample objects, falling back to an "unknown" label if any errors occur.
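The parsing step described above can be isolated and exercised without calling Gemini. The sketch below mirrors the regex-then-json.loads approach and the "unknown" fallback; parse_annotation is a hypothetical helper name, and the sample reply is made up for illustration rather than taken from real model output.

```python
import json
import re


def parse_annotation(response_text):
    """Extract the {...} span from a model reply and decode it as JSON.

    Returns a dict with 'labels' and 'metadata'; on any parsing problem
    the label is "unknown" and the error message is kept in metadata.
    """
    match = re.search(r"\{.*\}", response_text, re.DOTALL)
    try:
        result = json.loads(match.group(0) if match else response_text)
        return {
            "labels": result["category"],
            "metadata": {
                "confidence": result["confidence"],
                "explanation": result["explanation"],
            },
        }
    except (json.JSONDecodeError, KeyError, TypeError) as e:
        return {"labels": "unknown", "metadata": {"error": str(e)}}


# Made-up reply with chatter around the JSON object
reply = ('Here is the result: {"category": "Cardiovascular", '
         '"confidence": 0.92, "explanation": "Exertional chest pain"} Done.')
print(parse_annotation(reply)["labels"])
print(parse_annotation("no structured output")["labels"])
```

Keeping the parser separate from the API call makes it possible to test the fallback path with canned strings before spending any API quota.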
sample_data = [
    "Chest pain radiating to left arm during exercise",
    "Persistent dry cough with occasional wheezing",
    "Severe headache with sensitivity to light",
    "Stomach cramps and nausea after eating",
    "Numbness in fingers of right hand",
    "Shortness of breath when climbing stairs"
]

# Build a real list (not a generator) so it can be re-scanned on every iteration
text_samples = [type('TextSample', (), {'text': text}) for text in sample_data]

annotator = GeminiAnnotator(categories=CATEGORIES)
labeled_samples = []
We define a list of raw symptom strings and wrap each one in a lightweight TextSample object so it can be passed to the annotator. We then instantiate the GeminiAnnotator with the predefined category set and prepare an empty labeled_samples list to store the results of the upcoming annotation iterations.
print("\nRunning Active Learning Loop:")
for i in range(3):
    print(f"\n--- Iteration {i+1} ---")

    # Skip samples that have already been labeled
    remaining = [s for s in text_samples if s not in [getattr(l, '_sample', l) for l in labeled_samples]]
    if not remaining:
        break

    # Base score of 0.1, plus 0.5 when a priority keyword appears
    scores = np.zeros(len(remaining))
    for j, sample in enumerate(remaining):
        scores[j] = 0.1
        if any(term in sample.text.lower() for term in ["chest", "heart", "pain"]):
            scores[j] += 0.5

    # Annotate the single highest-scoring sample
    selected_idx = np.argmax(scores)
    selected = [remaining[selected_idx]]

    newly_labeled = annotator.annotate(selected)
    for sample in newly_labeled:
        sample._sample = selected[0]
    labeled_samples.extend(newly_labeled)

    latest = labeled_samples[-1]
    print(f"Text: {latest.text}")
    print(f"Category: {latest.labels}")
    print(f"Confidence: {latest.metadata.get('confidence', 0)}")
    print(f"Explanation: {latest.metadata.get('explanation', '')[:100]}...")
This active learning loop runs for three iterations, each time filtering out already-labeled samples and assigning a base score of 0.1, boosted by 0.5 for keywords like "chest", "heart", or "pain", to prioritize critical symptoms. It then selects the highest-scoring sample, invokes the GeminiAnnotator to generate a category, a confidence score, and an explanation, and prints those details for review.
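The scoring rule is simple enough to factor out and check on its own. The following sketch uses plain Python instead of NumPy; priority_score and select_next are hypothetical helper names, while the keyword list and the 0.1 and 0.5 values come from the loop above.

```python
PRIORITY_TERMS = ("chest", "heart", "pain")


def priority_score(text, base=0.1, boost=0.5):
    """Base score, plus a single boost if any priority keyword appears."""
    score = base
    if any(term in text.lower() for term in PRIORITY_TERMS):
        score += boost
    return score


def select_next(texts):
    """Return the index of the highest-scoring text (first one wins ties)."""
    scores = [priority_score(t) for t in texts]
    return max(range(len(texts)), key=scores.__getitem__)


pool = [
    "Persistent dry cough with occasional wheezing",
    "Chest pain radiating to left arm during exercise",
    "Severe headache with sensitivity to light",
]
print(pool[select_next(pool)])
```

The boost is applied once per sample regardless of how many keywords match, so "Chest pain" scores the same as a text containing only "pain". When no remaining sample contains a keyword, all scores tie and the first remaining sample is chosen, which matches the behavior of np.argmax.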
categories = [s.labels for s in labeled_samples]
confidence = [s.metadata.get("confidence", 0) for s in labeled_samples]
plt.figure(figsize=(10, 5))
plt.bar(range(len(categories)), confidence, color="skyblue")
plt.xticks(range(len(categories)), categories, rotation=45)
plt.title('Classification Confidence by Category')
plt.tight_layout()
plt.show()
Finally, we extract the predicted category labels and their confidence scores and use Matplotlib to draw a vertical bar chart, where the height of each bar reflects the model's confidence in that category. The category names are rotated for readability, a title is added, and tight_layout() ensures that the chart elements are neatly arranged before display.
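The bar heights must be numeric, but a language model may return the confidence as a string, and samples that hit the error path carry no confidence at all. A defensive conversion before plotting might look like the sketch below; to_confidence is a hypothetical helper, and clamping to the range 0 to 1 is an assumption about how confidence should be interpreted, not something the notebook does.

```python
def to_confidence(value, default=0.0):
    """Coerce a confidence value to a float clamped to [0, 1]."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        # None, non-numeric strings, or unsupported types
        return default
    if number != number:  # NaN is the only value not equal to itself
        return default
    return min(1.0, max(0.0, number))


raw_values = ["0.85", 0.92, None, "high", 1.7]
print([to_confidence(v) for v in raw_values])
```

Applying such a conversion to each value taken from the metadata would keep plt.bar from failing on mixed types.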
In conclusion, by combining Adala's plug-and-play annotators and sampling strategies with the generative power of Google Gemini, we have built a streamlined workflow that iteratively improves annotation quality on medical text. This tutorial guided you through installation, setup, and a custom GeminiAnnotator, and demonstrated how to implement priority-based sampling and confidence visualization. With this foundation, you can easily swap in other models, expand your category set, or integrate more advanced active learning strategies to tackle larger and more complex annotation tasks.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an artificial intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform has more than 2 million monthly views, illustrating its popularity among audiences.
