"""
title: Enhanced Context Counter for OpenWebUI
author: AG
author_url: https://github.com/open-webui
funding_url: https://github.com/open-webui
version: 4.0
license: MIT
requirements: tiktoken, pyee
description: Advanced context window tracker and metrics dashboard for OpenWebUI with comprehensive LLM support, cost tracking, and performance analytics.
# ENHANCED CONTEXT COUNTER FOR OPENWEBUI v4.0
## Description
The Enhanced Context Counter is a sophisticated Function Filter for OpenWebUI that provides real-time monitoring and analytics for LLM interactions. It tracks token usage, estimates costs, monitors performance metrics, and provides actionable insights through a configurable status display. The system supports a wide range of LLMs through multi-source model detection and offers extensive customization options via Valves and UserValves.
## Key Features
- **Comprehensive Model Support**: Multi-source model detection using OpenRouter API, exports, hardcoded defaults, and user-defined custom models.
- **Advanced Token Counting**: Primary tiktoken-based counting with intelligent fallbacks, content-specific adjustments, and calibration factors.
- **Cost Estimation & Budgeting**: Precise cost calculation with input/output breakdown and multi-level budget tracking (daily, monthly, session).
- **Performance Analytics**: Real-time token rate calculation, adaptive window sizing, and comprehensive session statistics.
- **Intelligent Context Management**: Context window monitoring with progress visualization, warnings, and smart trimming suggestions.
- **Persistent Cost Tracking**: File-based tracking with thread-safe operations for user, daily, and monthly costs.
- **Highly Configurable UI**: Customizable status line with modular components and visual indicators.
## Other Features
- **Image Token Estimation**: Heuristic-based calculation using defaults, resolution analysis, and model-specific overrides.
- **Calibration Integration**: Status display based on external calibration results for accuracy verification.
- **Error Resilience**: Graceful fallbacks for missing dependencies, API failures, and unrecognized models.
- **Content-Type Detection**: Specialized handling for different content types (code, JSON, tables, etc.).
- **Cache Optimization**: Token counting cache with adaptive pruning for performance enhancement.
- **Cost Optimization Hints**: Actionable suggestions for reducing costs based on usage patterns.
- **Extensive Logging**: Configurable logging with rotation for diagnostics and troubleshooting.
## Valve Configuration Guide
The function offers extensive customization through Valves (global settings) and UserValves (per-user overrides):
### Core Valves
- **[Model Detection]**: Configure model recognition with `fuzzy_match_threshold`, `vendor_family_map`, and `heuristic_rules`.
- **[Token Counting]**: Adjust accuracy with `model_correction_factors` and `content_correction_factors`.
- **[Cost/Budget]**: Set `budget_amount`, `monthly_budget_amount`, and `budget_tracking_mode` for financial controls.
- **[UI/UX]**: Customize display with toggles like `show_progress_bar`, `show_cost`, and `progress_bar_style`.
- **[Performance]**: Fine-tune with `adaptive_rate_averaging` and related window settings.
- **[Cache]**: Optimize with `enable_token_cache` and `token_cache_size`.
- **[Warnings]**: Configure alerts with percentage thresholds for context and budget usage.
### UserValves
Users can override global settings with personal preferences (illustrated in the sketch after this list):
- Custom budget amounts and warning thresholds
- Model aliases for simplified model references
- Personal correction factors for token counting accuracy
- Visual style preferences for the status display
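For illustration only, a per-user override could be constructed like this (a minimal sketch using the `UserValves` fields defined below; the values are hypothetical):
```
user_valves = Filter.UserValves(
    budget_amount=25.0,                      # personal budget in dollars (overrides global)
    warn_at_percentage=60.0,                 # warn earlier than the global 75% default
    model_aliases={"sonnet": "anthropic/claude-3.5-sonnet"},
    model_correction_factors={"google/gemini-2.0-flash-001": 1.15},
)
```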
## UI Status Line Breakdown
The status line provides a comprehensive overview of the current session's metrics in a compact format:
```
🪙 48/1.0M tokens (0.00%) [▱▱▱▱▱] | 🔽5/🔼43 | 💰 $0.000000 | 🏦 Daily: $0.009221/$100.00 (0.0%) | ⏱️ 5.1s (8.4 t/s) | 🗓️ $99.99 left (0.01%) this month | Text: 48 | 🔧 Not Calibrated
```
### Status Components
- **🪙 48/1.0M tokens (0.00%)**: Total tokens used / context window size with percentage
- **[▱▱▱▱▱]**: Visual progress bar showing context window usage
- **🔽5/🔼43**: Input/Output token breakdown (5 input, 43 output)
- **💰 $0.000000**: Total estimated cost for the current session
- **🏦 Daily: $0.009221/$100.00 (0.0%)**: Daily budget usage (spent/total and percentage)
- **⏱️ 5.1s (8.4 t/s)**: Elapsed time and tokens per second rate
- **🗓️ $99.99 left (0.01%) this month**: Monthly budget status (remaining amount and percentage used)
- **Text: 48**: Text token count (excludes image tokens if present)
- **🔧 Not Calibrated**: Calibration status of token counting accuracy
### Display Modes
The status line adapts to different levels of detail based on configuration:
1. **Minimal**: Shows only essential information (tokens, context percentage)
   ```
   🪙 48/1.0M tokens (0.00%)
   ```
2. **Standard**: Includes core metrics (default mode)
   ```
   🪙 48/1.0M tokens (0.00%) [▱▱▱▱▱] | 🔽5/🔼43 | 💰 $0.000000 | ⏱️ 5.1s (8.4 t/s)
   ```
3. **Detailed**: Displays all available metrics including budgets, token breakdowns, and calibration status
   ```
   🪙 48/1.0M tokens (0.00%) [▱▱▱▱▱] | 🔽5/🔼43 | 💰 $0.000000 | 🏦 Daily: $0.009221/$100.00 (0.0%) | ⏱️ 5.1s (8.4 t/s) | 🗓️ $99.99 left (0.01%) this month | Text: 48 | 🔧 Not Calibrated
   ```
The display automatically adjusts based on available space and configured preferences in the Valves settings.
## Roadmap (2025-2026)
1. Enhanced model family detection with ML-based classification
2. Advanced content-specific token counting with specialized encoders
3. Interactive UI components for real-time adjustments and analytics
4. Predictive budget forecasting based on usage patterns
5. Cross-session analytics with visualization and reporting
6. API for external integration with monitoring and alerting systems
"""
import time
# import tiktoken # Moved down
import logging
import asyncio
import os
import re
import json
import urllib.request
import urllib.error  # Import specific error type
from typing import List, Optional, Dict, Callable, Any, Awaitable, Tuple
from pydantic import BaseModel, Field
# from decimal import ROUND_HALF_UP, Decimal # Removed unused import
from collections import deque  # Feature 4: Import deque
from datetime import date, datetime, timedelta
from pyee import EventEmitter  # Added for cache TTL check
# Fix 1: Import locking libraries (Corrected Structure)
fcntl = None
msvcrt = None
LOCK_EX = 0
LOCK_SH = 0
LOCK_NB = 0
LOCK_UN = 0
try:
    if os.name == "posix":
        import fcntl
        LOCK_EX = fcntl.LOCK_EX
        LOCK_SH = fcntl.LOCK_SH
        LOCK_NB = fcntl.LOCK_NB
        LOCK_UN = fcntl.LOCK_UN
        print("DEBUG: fcntl imported successfully.")  # Debug print
    elif os.name == "nt":
        import msvcrt
        print("DEBUG: msvcrt imported successfully.")  # Debug print
except ImportError as e:
    print(f"DEBUG: Locking library import failed ({os.name}): {e}")  # Debug print
    # Locking will not be available, functions using it should handle this.
# Define fallback functions *before* trying the import
def fallback_get_last_assistant_message(messages):
    """Fallback: Get the last assistant message."""
    for message in reversed(messages):
        if message.get("role") == "assistant" and "content" in message:
            return message.get("content", "")
    return ""
def fallback_get_messages_content(messages):
    """Fallback: Get all message content joined."""
    return "\n".join(
        [
            msg.get("content", "")
            for msg in messages
            if isinstance(msg.get("content"), str)
        ]
    )
# Import helpers from OpenWebUI - handle gracefully if not available
try:
    # Attempt to import from the expected location
    from open_webui.utils.misc import get_last_assistant_message, get_messages_content  # type: ignore
    print("DEBUG: open_webui.utils.misc imported successfully.")  # Debug print
except ImportError:
    print("DEBUG: open_webui.utils.misc import failed. Using fallbacks.")  # Debug print
    # Assign fallbacks if import fails
    get_last_assistant_message = fallback_get_last_assistant_message
    get_messages_content = fallback_get_messages_content
# Global error handler for uncaught exceptions
import sys
def global_exception_handler(exc_type, exc_value, exc_traceback):
    if issubclass(exc_type, KeyboardInterrupt):
        sys.__excepthook__(exc_type, exc_value, exc_traceback)
        return
    logger.error("Uncaught exception", exc_info=(exc_type, exc_value, exc_traceback))
sys.excepthook = global_exception_handler
# Add asyncio error handler
def handle_async_exception(loop, context):
    msg = context.get("exception", context["message"])
    logger.error(f"Asyncio error: {msg}")
try:
    loop = asyncio.get_event_loop()
    loop.set_exception_handler(handle_async_exception)
except Exception:
    pass
# Set up logging
logger = logging.getLogger("EnhancedContextCounter")
handler = logging.StreamHandler()
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
handler.setFormatter(formatter)
logger.addHandler(handler)
# logger.setLevel(logging.INFO)  # Fix 2: Moved level setting down
# Check for tiktoken availability
TIKTOKEN_AVAILABLE = False
try:
    import tiktoken
    TIKTOKEN_AVAILABLE = True
except ImportError:
    logger.warning(
        "tiktoken package not available. Will use fallback token counting method."
    )
    # Print a more detailed message to help users troubleshoot
    print(
        "NOTE: 'tiktoken' package is not installed. Using fallback token counting method instead."
    )
    print("TROUBLESHOOTING: If you're seeing installation errors, try these steps:")
    print("  1. Check your Python version (tiktoken requires Python 3.8+)")
    print("  2. Try installing with: pip install -U setuptools wheel")
    print("  3. For Linux systems, you may need 'build-essential' package")
    print("  4. Consider using alternatives like 'transformers' tokenizers if needed")
    print("The context counter will continue to work with reduced accuracy.")
    # We'll handle this in the token counting methods with our fallback mechanism
# Constants
PROGRESS_CHARS = ["▱", "▰"]  # Empty/filled progress bar characters
WARNING_EMOJI = "⚠️"
TOKEN_EMOJI = "🪙"
CLOCK_EMOJI = "⏱️"
MONEY_EMOJI = "💰"
CHART_EMOJI = "📊"
CACHE_EMOJI = "🗃"
INPUT_TOKEN_EMOJI = "📥"  # Added for input token visualization
OUTPUT_TOKEN_EMOJI = "📤"  # Added for output token visualization (cost breakdown)
BUDGET_EMOJI = "🏦"  # Feature 2: Emoji for budget
SCISSORS_EMOJI = "✂️"  # For trimming hint
# Feature 7: Fallback token counting estimates
CHAR_PER_TOKEN_ESTIMATE = {
    "text": 4.0,
    "code": 3.5,
    "json": 3.0,
    "table": 3.8,  # Estimate for tables
    "list": 4.2,   # Estimate for lists
    "default": 4.0, # Default estimate
}
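# Illustrative fallback sketch (an assumption, not the Filter's actual counting method):
# when tiktoken is unavailable, a character-ratio estimate can be derived from the table above:
#     ratio = CHAR_PER_TOKEN_ESTIMATE.get(content_type, CHAR_PER_TOKEN_ESTIMATE["default"])
#     est_tokens = max(1, int(len(text) / ratio))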
# ANSI Colors for terminal output
COLORS = {
    "reset": "\033[0m",
    "bold": "\033[1m",
    "blue": "\033[34m",
    "green": "\033[32m",
    "yellow": "\033[33m",
    "red": "\033[31m",
    "cyan": "\033[36m",
    "magenta": "\033[35m",
    # Light versions
    "light_blue": "\033[94m",
    "light_green": "\033[92m",
    "light_yellow": "\033[93m",
    "light_red": "\033[91m",
    "light_cyan": "\033[96m",
    "light_magenta": "\033[95m",
    # Background colors
    "bg_blue": "\033[44m",
    "bg_green": "\033[42m",
    "bg_yellow": "\033[43m",
    "bg_red": "\033[41m",
}
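# Illustrative usage of the palette above (only honored when colored_output is enabled
# via the OPENWEBUI_USE_COLORS environment variable):
#     print(f"{COLORS['bold']}{COLORS['green']}OK{COLORS['reset']}")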
# Configuration settings
DATA_DIR = "data"
CACHE_DIR = os.path.join(DATA_DIR, ".cache")
TOKEN_CACHE_DIR = os.path.join(CACHE_DIR, "token_cache")
PATTERN_CACHE_DIR = os.path.join(CACHE_DIR, "patterns")
USER_COST_FILE = os.path.join(DATA_DIR, f"costs-{time.strftime('%Y')}.json")
DAILY_COST_FILE = os.path.join(
    DATA_DIR, "daily_costs.json"
)  # Feature 2: Daily cost file
CACHE_TTL = 432000  # 5 days
CACHE_MAXSIZE = 1000
DECIMALS = "0.00000001"
MODEL_CACHE_FILE = os.path.join(
    CACHE_DIR, "openrouter_models.json"
)  # For Model Data Cache feature
MODEL_CACHE_TTL_SECONDS = 86400  # 24 hours for Model Data Cache
# --- Pydantic Models for Custom Model Definitions ---
class CustomModelPricing(BaseModel):
    input: float = Field(default=0.0, description="Cost per input token (default: 0.0)")
    output: float = Field(
        default=0.0, description="Cost per output token (default: 0.0)"
    )
class CustomModelDefinition(BaseModel):
    id: str = Field(
        ...,
        description="Unique ID for the custom model (e.g., 'ollama/llama3', 'my-local-model')",
    )
    context_length: int = Field(..., description="Context window size in tokens")
    pricing: CustomModelPricing = Field(
        default_factory=CustomModelPricing, description="Optional pricing per token"
    )
    # Optional: Add a 'family' field if needed for specific tokenizer logic later
    # family: Optional[str] = Field(default=None, description="Optional model family hint (e.g., 'llama', 'mistral') for tokenizer selection")
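# Illustrative construction of the structured definition above (hypothetical values);
# note that the Filter below currently consumes the plaintext valve instead:
#     CustomModelDefinition(
#         id="ollama/llama3",
#         context_length=8192,
#         pricing=CustomModelPricing(input=0.0, output=0.0),
#     )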
class Filter:
    """A Function that provides enhanced metrics display with context tracking, cost estimation and performance stats."""
    class Valves(BaseModel):
        """Configuration valves for the Enhanced Context Counter function."""
        # VALIDATION ERROR FIX: The structured `custom_models` list field was removed due to persistent
        # Pydantic validation errors when loading from saved config. The issue occurred because
        # the nested structure with custom model objects couldn't be properly deserialized from
        # the saved JSON configuration, causing validation errors on startup.
        #
        # SOLUTION: The `custom_models_plaintext` field below provides a more robust alternative
        # that uses simple string parsing instead of complex nested objects. This approach is
        # more resilient to serialization/deserialization issues and allows for easier user input.
        # The corresponding commented-out code for the structured approach has been removed for clarity.
        custom_models_plaintext: str = Field(
            default="openai/gpt-4o 128000 0.000005 0.000015",
            title="Custom Models (Plaintext - One per line)",
            description="Add models easily. Enter one model per line in this format: <ID> <Context> <Input Cost> <Output Cost>\nExample:\nopenai/gpt-4o 128000 0.000005 0.000015\nmylocal/mistral-7b 32768 0 0\n\nDetails: ID=Model Identifier, Context=Max Tokens, Costs=USD **per token** (use 0 for free models)."
        )
        # --- Image Token Estimation Heuristics ---
        default_image_tokens: int = Field(
            default=500,
            title="[Image Tokens] Default Tokens Per Image",
            description="Default estimated tokens per image if no other heuristic applies.",
        )
        tokens_per_megapixel: int = Field(
            default=100,
            title="[Image Tokens] Tokens Per Megapixel",
            description="Estimated tokens per megapixel of image resolution.",
        )
        model_image_token_overrides: Dict[str, int] = Field(
            default_factory=dict,
            title="[Image Tokens] Model-Specific Image Token Overrides",
            description="Overrides for specific models, e.g., {'gpt-4-vision': 1500}",
        )
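        # Assumed order of application for the heuristics above (sketch only, not the exact estimator):
        #     model_image_token_overrides.get(model_id)  ->  tokens_per_megapixel * megapixels  ->  default_image_tokens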
        # --- Model Detection Improvement Plan Phase 1 ---
        force_openrouter_refresh: bool = Field(
            default=True,
            title="[Model Detection] Force OpenRouter Refresh",
            description="Force refresh OpenRouter model list on next startup or manual trigger.",
        )
        openrouter_refresh_interval_hours: int = Field(
            default=1,
            title="[Model Detection] OpenRouter Refresh Interval (hours)",
            description="How often to refresh OpenRouter model list automatically.",
        )
        # --- Model Detection Improvement Plan Phase 2 ---
        fuzzy_match_threshold: int = Field(
            default=90,
            title="[Model Detection] Fuzzy Match Similarity Threshold (%)",
            description="Minimum similarity score (0-100) to accept fuzzy match for model detection.",
        )
        # --- Model Detection Improvement Plan Phase 3 ---
        vendor_family_map: Dict[str, Tuple[str, int]] = Field(
            default_factory=lambda: {
                "openai": ("gpt4", 128000),
                "anthropic": ("claude", 200000),
                "google": ("gemini", 1000000),
                "mistralai": ("mistral", 32768),
                "meta-llama": ("llama", 128000),  # Updated default Llama family context to 128k
                "qwen": ("qwen", 32768),
                "cohere": ("cohere", 128000),
                "x-ai": ("grok", 131072),
                "cognitivecomputations": ("dolphin", 32768),
                "deepseek": ("deepseek", 131072),
                "all-hands": ("openhands", 32768),
                "openrouter": ("quasar", 32768),
            },
            title="[Model Detection] Vendor to Family Map",
            description="Map vendor prefixes to (family, context_size) fallback if exact/fuzzy match fails.",
        )
        # --- Model Detection Improvement Plan Phase 4 ---
        heuristic_rules: Dict[str, Tuple[str, int]] = Field(
            default_factory=lambda: {
                "flash": ("gemini", 1000000),
                "gemini": ("gemini", 1000000),
                "sonnet": ("claude", 200000),
                "opus": ("claude", 200000),
                "haiku": ("claude", 200000),  # Corrected Haiku context
                "mixtral": ("mixtral", 32768),
                "pixtral": ("mixtral", 32768),  # Corrected Pixtral context (Mistral docs say 16k, but OR lists 32k/131k - using 32k as safer default)
                "deephermes": ("llama", 8192),
                "dolphin": ("dolphin", 32768),
                "quasar": ("quasar", 32768),
                "grok": ("grok", 131072),
                "llama": ("llama", 8192),  # Base Llama rule
                "nemotron": ("llama", 128000),  # Updated Nemotron context
                "deepseek": ("deepseek", 131072),
                "command": ("cohere", 128000),
                "openhands": ("openhands", 32768),
                "claude": ("claude", 200000),
                "gpt-4.5": ("gpt4", 128000),
                "gpt-4o": ("gpt4o", 128000),
                "gpt-4-turbo": ("gpt4", 128000),  # Added specific turbo rule
                "gpt-4": ("gpt4", 128000),  # Keep general gpt-4 rule
                "gpt-3.5": ("gpt35", 16385),
                "o1": ("openai", 200000),  # Corrected o1 context
                "o3": ("openai", 200000),  # Corrected o3 context
            },
            title="[Model Detection] Heuristic Substring Rules",
            description="Map substrings to (family, context_size) fallback if other methods fail.",
        )
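        # Illustrative fallback lookup over these rules (an assumption, not the exact detection code):
        #     family, ctx = next(
        #         (v for k, v in heuristic_rules.items() if k in model_id.lower()),
        #         ("unknown", fallback_context_size),
        #     )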
        # --- End Phase 4 additions ---
        # --- Unknown Model Logging ---
        log_unknown_models: bool = Field(
            default=True,
            title="[Model Detection] Log Unknown Models",
            description="Enable logging of unrecognized model names for continuous improvement.",
        )
        show_detection_source: bool = Field(
            default=True,
            title="[UI/UX] Show Model Detection Source",
            description="Show how the model was detected (exact, alias, fuzzy, vendor, heuristic, fallback) in the UI status message.",
        )
        unknown_models_log_max_size_kb: int = Field(
            default=1024,
            title="[Model Detection] Unknown Models Log Max Size (KB)",
            description="Rotate unknown_models.log if it exceeds this size (in KB).",
        )
        # Enable all features by default for testing
        enable_token_cache: bool = True
        enable_model_data_cache: bool = True
        enable_pattern_recognition: bool = True
        enable_content_detection: bool = True
        content_specific_counting: bool = True
        intelligent_trimming_hint: bool = True
        persist_user_costs: bool = True
        show_status: bool = True
        show_progress: bool = True
        show_tokens_per_message: bool = True
        show_after_completion: bool = True
        use_enhanced_visuals: bool = True
        show_cost_summary: bool = True
        show_metrics_panel: bool = True
        colored_output: bool = True
        show_cache_metrics: bool = True
        show_content_breakdown: bool = True
        adaptive_rate_averaging: bool = True
        # --- End Unknown Model Logging ---
        # [General]
        priority: int = Field(
            default=100,
            title="[General] Priority",
            description="Priority level for execution (lower runs earlier in pipeline). Set high (e.g., 100) to ensure UI status appears after all other filters.",
        )
        status_emission_retries: int = Field(
            default=3,
            title="[General] Status Emission Retries",
            description="Number of retries for failed status emissions.",
        )
        retry_delay_ms: int = Field(
            default=100,
            title="[General] Status Emission Retry Delay (ms)",
            description="Delay between status emission retries in milliseconds.",
        )
        # [UI/UX]
        show_status: bool = Field(
            default=True,
            title="[UI/UX] Show Status Updates",
            description="Show status updates in the UI.",
        )
        show_progress: bool = Field(
            default=True,
            title="[UI/UX] Show Progress Bar",
            description="Show progress bar for context usage.",
        )
        bar_length: int = Field(
            default=5,
            title="[UI/UX] Progress Bar Length",
            description="Length of the visual progress bar.",
        )
        progress_bar_style: str = Field(
            default="standard", # Options: standard, minimal, none
            title="[UI/UX] Progress Bar Style",
            description="Style of the progress bar ('standard'=[ā°ā±ā±ā±ā±], 'minimal'=ā°, 'none'=hide).",
        )
        show_tokens_per_message: bool = Field(
            default=True,
            title="[UI/UX] Show Tokens Per Message",
            description="Show token count for each message in status.",
        )
        show_after_completion: bool = Field(
            default=True,
            title="[UI/UX] Show Status After Completion",
            description="Continue showing context usage after text generation is complete.",
        )
        # --- Modular UI Status Toggles ---
        show_total_tokens: bool = Field(default=True, title="[UI/UX] Show Total Tokens")
        show_text_image_split: bool = Field(default=True, title="[UI/UX] Show Text/Image Token Split")
        show_context_percentage: bool = Field(default=True, title="[UI/UX] Show Context Percentage")
        show_progress_bar: bool = Field(default=True, title="[UI/UX] Show Progress Bar")
        show_cost: bool = Field(default=True, title="[UI/UX] Show Cost")
        show_cost_breakdown: bool = Field(default=True, title="[UI/UX] Show Cost Breakdown")
        show_budget_info: bool = Field(default=True, title="[UI/UX] Show Budget Info (Daily/Session)")
        show_monthly_budget_info: bool = Field(default=True, title="[UI/UX] Show Monthly Budget Info")
        show_daily_spend_info: bool = Field(default=True, title="[UI/UX] Show Daily Spend Info")
        show_performance_metrics: bool = Field(default=True, title="[UI/UX] Show Performance Metrics")
        show_image_token_warning: bool = Field(default=True, title="[UI/UX] Show Image Token Warning")
        show_trimming_hint: bool = Field(default=True, title="[UI/UX] Show Trimming Hint")
        show_cache_hit_rate: bool = Field(default=False, title="[UI/UX] Show Cache Hit Rate (Debug)")
        show_error_rate: bool = Field(default=False, title="[UI/UX] Show Error Rate") # Removed (Debug) suffix
        show_cost_comparisons: bool = Field(default=False, title="[UI/UX] Show Cost Comparisons")
        show_calibration_status: bool = Field(default=True, title="[UI/UX] Show Calibration Status")
        show_calibration_timestamp: bool = Field(default=True, title="[UI/UX] Show Calibration Timestamp") # Added new valve
        use_enhanced_visuals: bool = Field(
            default=True,
            title="[UI/UX] Use Enhanced Visuals",
            description="Use enhanced visual elements like symbols and better formatting.",
        )
        show_cost_summary: bool = Field(
            default=True,
            title="[UI/UX] Show Cost Summary",
            description="Show cost summary in the metrics display.",
        )
        show_metrics_panel: bool = Field(
            default=True,
            title="[UI/UX] Show Performance Metrics",
            description="Show expanded metrics panel with performance stats.",
        )
        colored_output: bool = Field(
            default=True,
            title="[UI/UX] Colored Terminal Output",
            description="Use ANSI color codes in the terminal output (requires env var).",
        )
        show_cache_metrics: bool = Field(
            default=True,
            title="[UI/UX] Show Cache Metrics",
            description="Show cache hit/miss metrics.", # Removed detailed/debug mention
        )
        show_content_breakdown: bool = Field(
            default=True, # Keep this default, relates to future use
            title="[UI/UX] Show Content Breakdown",
            description="Show breakdown of token usage by content type in status (Future Use).",
        )
        # [Warnings]
        warn_at_percentage: float = Field(
            default=75.0,
            title="[Warnings] Context Warning Threshold (%)",
            description="Percentage of context window at which to show warnings.",
        )
        # [Calibration]
        log_ui_token_counts: bool = Field(
            default=True,
            title="[Calibration] Log UI Token Counts",
            description="If enabled, logs plugin token counts and prompts user to input UI counts for calibration.",
        )
        # [Token Counting Correction Factors]
        model_correction_factors: Dict[str, float] = Field(
            default_factory=lambda: {
                "all-hands/openhands-lm-32b-v0.1": 1.0,
                "anthropic/claude-3.5-haiku": 1.2864,
                "anthropic/claude-3.5-sonnet": 1.2519,
                "anthropic/claude-3.7-sonnet": 1.1926,
                "anthropic/claude-3.7-sonnet:thinking": 1.2788,
                "cognitivecomputations/dolphin-mixtral-8x22b": 1.0,
                "deepseek/deepseek-chat-v3-0324": 1.1554,
                "deepseek/deepseek-chat-v3-0324:free": 1.1538,
                "deepseek/deepseek-r1": 1.0,
                "deepseek/deepseek-r1:free": 1.0,
                "google/gemini-2.0-flash-001": 1.1474,
                "google/gemini-2.0-flash-thinking-exp-1219:free": 1.3637,
                "google/gemini-2.0-flash-thinking-exp:free": 1.3637,
                "google/gemini-2.5-pro-exp-03-25:free": 1.3478,
                "google/gemini-2.5-pro-preview-03-25": 1.3637,
                "google/gemma-3-27b-it": 1.0,
                "google/gemma-3-27b-it:free": 1.0,
                "mistralai/pixtral-large-2411": 1.0,
                "nousresearch/deephermes-3-llama-3-8b-preview:free": 1.0,
                "nvidia/llama-3.1-nemotron-ultra-253b-v1:free": 1.0,
                "nvidia/llama-3.3-nemotron-super-49b-v1:free": 1.0685,
                "openai/chatgpt-4o-latest": 1.2140,
                "openai/gpt-4.5-preview": 1.1994,
                "openai/o1": 1.6884,
                "openai/o1-pro": 1.3998,
                "openai/o3-mini-high": 1.7964,
                "openrouter/quasar-alpha": 1.2492,
                "qwen/qwq-32b": 1.7994,
                "x-ai/grok-3-beta": 1.1121,
                "x-ai/grok-3-mini-beta": 1.7971,
            },
            title="[Token Counting] Model Correction Factors",
            description="Correction factors per model (e.g., {'gemini':1.1}) to adjust token counts.",
        )
        content_correction_factors: Dict[str, float] = Field(
            default_factory=lambda: {
                "plain_short": 1.0,
                "plain_long": 1.0,
                "code": 1.05,
                "json": 1.05,
                "markdown": 1.05,
                "emoji": 1.10,
            },
            title="[Token Counting] Content-Type Correction Factors",
            description="Correction factors per content type (e.g., {'code':1.05, 'emoji':1.1}) to adjust token counts.",
        )
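        # Sketch of how the correction factors are meant to combine (assumed formula):
        #     corrected = int(raw_tokens
        #                     * model_correction_factors.get(model_id, 1.0)
        #                     * content_correction_factors.get(content_type, 1.0))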
        critical_at_percentage: float = Field(
            default=90.0,
            title="[Warnings] Context Critical Threshold (%)",
            description="Percentage of context window at which to show critical warnings.",
        )
        budget_warning_percentage: float = Field(  # Feature 2: Budget warning threshold
            default=80.0,
            title="[Warnings] Budget Warning Threshold (%)",
            description="Percentage of budget usage at which to show a warning.",
        )
        prompt_cost_warning_threshold: float = Field(
            default=0.005,  # Half a cent
            title="[Warnings] Inlet Prompt Cost Threshold ($)",
            description="Warn via log if estimated input prompt cost exceeds this $ amount (0=disable).",
        )
        enable_cost_optimization_hints: bool = Field(
            default=True,
            title="[Warnings] Enable Cost Optimization Hints",
            description="Show actionable hints for cost optimization in the status line.",
        )
        expensive_model_cost_threshold: float = Field(
            default=0.00001,  # 1 cent per 1K tokens
            title="[Warnings] Expensive Model Threshold ($ per token)",
            description="Threshold for considering a model 'expensive' for cost hint suggestions.",
        )
        # [Cost/Budget]
        budget_amount: float = Field(  # Feature 2: Default budget (can be overridden by UserValves)
            default=100.0,
            title="[Cost/Budget] Budget Amount ($)",
            description="Default budget amount in dollars (per day or session).",
        )
        budget_tracking_mode: str = Field(  # Feature 2: Add budget tracking mode
            default="daily",  # Options: 'daily', 'session'
            title="[Cost/Budget] Budget Tracking Mode",
            description="Track budget usage 'daily' or per 'session'.",
        )
        monthly_budget_amount: float = Field(  # Added monthly budget valve
            default=100.0,
            title="[Cost/Budget] Monthly Budget Amount ($)",
            description="Monthly budget amount in dollars (0 to disable).",
        )
        compensation: float = Field(
            default=1.0,
            title="[Cost/Budget] Cost Compensation Factor",
            description="Compensation factor for cost calculation (e.g., 1.1 for a 10% markup).",
        )
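        # Cost estimation sketch (assumed formula; per-token prices come from model_pricing):
        #     cost = (input_tokens * price["input"] + output_tokens * price["output"]) * compensation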
        # [Performance]
        metrics_refresh_rate: float = Field(
            default=0.5,
            title="[Performance] Metrics Refresh Rate (s)",
            description="Refresh rate for metrics display in seconds (Not currently used).",
        )
        stream_update_interval: float = Field( # Feature 9: Add Valve
            default=0.2,
            title="[Performance] Stream Update Interval (s)",
            description="Interval in seconds for updating stream status (e.g., tokens/sec).",
        )
        adaptive_rate_averaging: bool = Field(
            default=True,
            title="[Performance] Enable Adaptive Rate Averaging",
            description="Enable adaptive window for rolling token rate calculation.",
        )
        rate_avg_window_min: int = Field(
            default=3,
            title="[Performance] Adaptive Rate Min Window",
            description="Minimum window size (samples) for adaptive rate.",
        )
        rate_avg_window_max: int = Field(
            default=30,
            title="[Performance] Adaptive Rate Max Window",
            description="Maximum window size (samples) for adaptive rate.",
        )
        rate_fast_threshold: float = Field(
            default=150.0,
            title="[Performance] Adaptive Rate Fast Threshold (t/s)",
            description="Tokens/sec threshold to consider 'fast' for adaptive rate window.",
        )
        rate_slow_threshold: float = Field(
            default=10.0,
            title="[Performance] Adaptive Rate Slow Threshold (t/s)",
            description="Tokens/sec threshold to consider 'slow' for adaptive rate window.",
        )
        # [Cache]
        enable_token_cache: bool = Field(
            default=True,
            title="[Cache] Enable Token Cache",
            description="Enable token counting cache for improved performance.",
        )
        token_cache_size: int = Field(
            default=2000,
            title="[Cache] Token Cache Size",
            description="Maximum number of entries in the token cache.",
        )
        enable_model_data_cache: bool = Field(
            default=True,
            title="[Cache] Enable Model Data Cache",
            description="Enable caching of fetched OpenRouter model list to reduce API calls on startup.",
        )
        # [Models]
        # [Token Counting]
        enable_pattern_recognition: bool = Field(
            default=True,
            title="[Token Counting] Enable Pattern Recognition",
            description="Enable detection of repeating patterns for optimized token counting (Future Use).",
        )
        enable_content_detection: bool = Field(
            default=True,
            title="[Token Counting] Enable Content Type Detection",
            description="Enable detection of content types (code, JSON, etc.) for specialized handling.",
        )
        content_specific_counting: bool = Field(
            default=True,
            title="[Token Counting] Content-Specific Counting",
            description="Use specialized counting methods for different content types (Future Use).",
        )
        # [Trimming]
        intelligent_trimming_hint: bool = Field(
            default=True,
            title="[Trimming] Enable Intelligent Trimming Hint",
            description="Show intelligent hints about trimming early messages when context is critical.",
        )
        trimming_hint_message_count: int = Field(
            default=3,
            title="[Trimming] Trimming Hint Message Count",
            description="Number of early messages to analyze for the trimming hint.",
        )
        # [Persistence]
        persist_user_costs: bool = Field(
            default=True,
            title="[Persistence] Persist User Costs",
            description="Save user cost data to disk (data/costs-{year}.json).",
        )
        # [Debug/Log]
        log_level: str = Field(
            default="INFO",
            title="[Debug/Log] Log Level",
            description="Logging level (DEBUG, INFO, WARNING, ERROR).",
        )
        debug: bool = Field(
            default=False,
            title="[Debug/Log] Enable Debug Logging",
            description="Enable debug logging.",
        )
    class UserValves(BaseModel):
        """Per-user configuration options."""
        enabled: bool = Field(
            default=True, description="Enable or disable the function for this user"
        )
        show_status: bool = Field(
            default=True, description="Show status updates for this user"
        )
        warn_at_percentage: float = Field(
            default=75.0, description="Custom warning threshold for this user"
        )
        bar_style: str = Field(
            default="standard",
            description="Visual style for progress bar (standard, minimal, detailed)",
        )
        budget_amount: Optional[float] = Field( # Feature 2: Allow user override
            default=None,
            description="User's budget amount in dollars (overrides global)",
        )
        model_aliases: Dict[str, str] = Field( # Feature 1: Add model aliases
            default_factory=dict,
            description="User-defined aliases for model IDs (e.g., {'o3mh': 'OR.openai/o3-mini-high'})",
        )
        # New: User override correction factors
        model_correction_factors: Dict[str, float] = Field(
            default_factory=dict,
            description="User override correction factors per model (e.g., {'gemini':1.1})",
        )
        content_correction_factors: Dict[str, float] = Field(
            default_factory=dict,
            description="User override correction factors per content type (e.g., {'code':1.05})",
        )
        enable_correction_factors: bool = Field(
            default=True,
            description="Enable applying correction factors (user or global)",
        )
        monthly_budget_amount: Optional[float] = Field(  # Add monthly budget override for users
            default=None,
            description="User's monthly budget amount in dollars (overrides global, 0 to disable)",
        )
    def __init__(self):
        """Initialize the enhanced context counter with model context sizes and pricing data."""
        self.valves = self.Valves()
        # Request tracking for status persistence
        self.current_request_id = None
        # Set debug mode based on valve configuration
        self.debug_mode = self.valves.debug
        # Ensure absolute path for logs directory
        log_dir = os.path.join(os.getcwd(), "logs")
        log_file = os.path.join(log_dir, "context_counter.log")
        # Safely create log directory
        os.makedirs(log_dir, exist_ok=True)
        # Add file handler for persistent logging
        file_handler = logging.FileHandler(log_file)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)
        # Set log level based on config *before* logging init message (Fix 2)
        logger.setLevel(
            logging.DEBUG
            if self.debug_mode
            else getattr(logging, self.valves.log_level, logging.INFO)
        )
        # Log the absolute path to the log file for easy location
        print(f"CONTEXT COUNTER: Log file created at: {os.path.abspath(log_file)}")
        logger.info(
            f"Logging initialized. Log file location: {os.path.abspath(log_file)}"
        )  # Fix 2: Moved after setLevel
        # Initialize context recognition patterns cache for improved performance
        self.language_patterns_cache = {}
        self.content_recognition_patterns = self._initialize_content_patterns()
        # Check for environment variable to enable colored output
        self.valves.colored_output = os.environ.get(
            "OPENWEBUI_USE_COLORS", "false"
        ).lower() in ["true", "1", "yes", "y"]
        logger.info(
            f"ANSI Color support is {'ENABLED' if self.valves.colored_output else 'DISABLED'}"
            )
        try:
            os.makedirs(DATA_DIR, exist_ok=True)
        except Exception as e:
            logger.warning(f"Could not create data directory: {e}")
        if not os.path.exists(CACHE_DIR):
            try:
                os.makedirs(CACHE_DIR)
            except Exception as e:
                logger.warning(f"Could not create cache directory: {e}")
        # ENHANCEMENT: Always try to load context sizes from OpenRouter API by default
        self.dynamic_contexts = (
            os.environ.get("OPENWEBUI_DYNAMIC_CONTEXTS", "true").lower() == "true"
        )
        self.openrouter_api_key = os.environ.get("OPENROUTER_API_KEY", "")
        # --- Model Data Initialization Order ---
        # 1. Load from Cache (if enabled and valid)
        # 2. Fetch from OpenRouter API (if enabled and cache invalid/disabled) -> Update Cache
        # 3. Load Hardcoded Defaults
        # 4. Load from JSON/MD Export (if exists)
        # 5. Load Custom Models from Valves (Highest Priority)
        # Initialize dictionaries
        self.model_contexts = {}
        self.model_pricing = {}
        self.using_dynamic_pricing = False  # Reset flag
        # 1. & 2. Load from Cache or Fetch from OpenRouter API
        fetched_models = {}
        if self.dynamic_contexts:  # Check if dynamic loading is generally enabled
            cache_valid = False
            if self.valves.enable_model_data_cache:
                fetched_models, cache_valid = self._load_models_from_cache()
            if not cache_valid:
                fetched_models = self._fetch_openrouter_model_list()
                if fetched_models and self.valves.enable_model_data_cache:
                    self._save_models_to_cache(
                        fetched_models
                    )  # Update cache if fetch successful
            # Merge fetched data using setdefault (lowest priority)
            for model_id, data in fetched_models.items():
                if data.get("context_length") is not None:
                    # Ensure context length is an integer
                    try:
                        context_len = int(data["context_length"])
                        if context_len > 0:
                            self.model_contexts.setdefault(model_id, context_len)
                    except (ValueError, TypeError):
                        logger.warning(
                            f"Invalid context length '{data['context_length']}' for model {model_id} from API/Cache."
                        )
                if data.get("pricing") and (
                    data["pricing"]["input"] > 0 or data["pricing"]["output"] > 0
                ):
                    self.model_pricing.setdefault(model_id, data["pricing"])
                    # Set flag only if API provided actual pricing
                    if not self.using_dynamic_pricing and (
                        data["pricing"]["input"] > 0 or data["pricing"]["output"] > 0
                    ):
                        self.using_dynamic_pricing = True
        # 3. Load Hardcoded Defaults (Medium priority - fills gaps)
        # HARDCODED CONSTANTS FOR MAJOR MODELS
        # Force correct context sizes for known models
        self.GPT4O_CONTEXT_SIZE = 128000
        self.GEMINI_FLASH_CONTEXT_SIZE = 1000000  # Corrected to exactly 1M tokens
        self.GEMINI_PRO_CONTEXT_SIZE = 256000
        self.CLAUDE3_OPUS_CONTEXT_SIZE = 200000
        # Initialize fallback for unrecognized models (no longer using default_context_size from valves)
        self.fallback_context_size = 4096  # Fallback if everything else fails
        # Hardcoded model context windows (will be supplemented/overridden)
        hardcoded_contexts = {
            "all-hands/openhands-lm-32b-v0.1": 32768, # Updated from API
            "anthropic/claude-3.5-haiku": 200000,
            "anthropic/claude-3.5-sonnet": 200000,
            "anthropic/claude-3.7-sonnet": 200000,
            "anthropic/claude-3.7-sonnet:thinking": 200000,
            "cognitivecomputations/dolphin-mixtral-8x22b": 65536, # Updated from API
            "deepseek/deepseek-chat-v3-0324": 163840, # Updated from API
            "deepseek/deepseek-chat-v3-0324:free": 163840,
            "deepseek/deepseek-r1": 163840,
            "deepseek/deepseek-r1:free": 163840,
            "google/gemini-2.0-flash-001": 1048576, # Updated from API
            "google/gemini-2.0-flash-thinking-exp-1219:free": 1048576, # Updated from API
            "google/gemini-2.0-flash-thinking-exp:free": 1048576,
            "google/gemini-2.5-pro-exp-03-25:free": 1048576, # Updated from API
            "google/gemini-2.5-pro-preview-03-25": 1048576, # Updated from API
            "google/gemma-3-27b-it": 131072,
            "google/gemma-3-27b-it:free": 131072, # Updated from API
            "mistralai/pixtral-large-2411": 131072,
            "nousresearch/deephermes-3-llama-3-8b-preview:free": 131072,
            "nvidia/llama-3.1-nemotron-ultra-253b-v1:free": 131072,
            "nvidia/llama-3.3-nemotron-super-49b-v1:free": 131072,
            "openai/chatgpt-4o-latest": 128000,
            "openai/gpt-4.1": 1048576, # Updated from API
            "openai/gpt-4.1-mini": 1048576, # Updated from API
            "openai/gpt-4.1-nano": 1048576, # Updated from API
            "openai/gpt-4.5-preview": 128000,
            "openai/o1": 200000,
            "openai/o1-pro": 200000,
            "openai/o3-mini-high": 200000,
            "qwen/qwq-32b": 131072,
            "x-ai/grok-3-beta": 131072,
            "x-ai/grok-3-mini-beta": 131072,
        }
        # Merge hardcoded contexts using setdefault
        for model_id, context in hardcoded_contexts.items():
            self.model_contexts.setdefault(model_id, context)
        # Base model pricing data as fallback (will be supplemented/overridden)
        # Prices are per token (API prices are per million, converted here)
        hardcoded_pricing = {
            "all-hands/openhands-lm-32b-v0.1": {"input": 9.0e-07, "output": 9.0e-07}, # Updated from API
            "anthropic/claude-3.5-haiku": {"input": 2.5e-07, "output": 1.25e-06}, # Updated from API
            "anthropic/claude-3.5-sonnet": {"input": 3.0e-06, "output": 1.5e-05}, # Updated from API
            "anthropic/claude-3.7-sonnet": {"input": 3.0e-06, "output": 1.5e-05}, # Updated from API
            "anthropic/claude-3.7-sonnet:thinking": {"input": 3.0e-06, "output": 1.5e-05}, # Updated from API
            "cognitivecomputations/dolphin-mixtral-8x22b": {"input": 6.5e-07, "output": 6.5e-07}, # Updated from API
            "deepseek/deepseek-chat-v3-0324": {"input": 1.4e-07, "output": 2.8e-07}, # Updated from API
            "deepseek/deepseek-chat-v3-0324:free": {"input": 0.0, "output": 0.0},
            "deepseek/deepseek-r1": {"input": 1.4e-07, "output": 2.8e-07}, # Updated from API
            "deepseek/deepseek-r1:free": {"input": 0.0, "output": 0.0},
            "google/gemini-2.0-flash-001": {"input": 1.25e-07, "output": 3.75e-07}, # Updated from API
            "google/gemini-2.0-flash-thinking-exp-1219:free": {"input": 0.0, "output": 0.0},
            "google/gemini-2.0-flash-thinking-exp:free": {"input": 0.0, "output": 0.0},
            "google/gemini-2.5-pro-exp-03-25:free": {"input": 0.0, "output": 0.0},
            "google/gemini-2.5-pro-preview-03-25": {"input": 5.0e-07, "output": 1.5e-06}, # Updated from API
            "google/gemma-3-27b-it": {"input": 1e-06, "output": 1e-06},
            "google/gemma-3-27b-it:free": {"input": 0.0, "output": 0.0},
            "mistralai/pixtral-large-2411": {"input": 1e-06, "output": 1e-06},
            "nousresearch/deephermes-3-llama-3-8b-preview:free": {"input": 0.0, "output": 0.0},
            "nvidia/llama-3.1-nemotron-ultra-253b-v1:free": {"input": 0.0, "output": 0.0},
            "nvidia/llama-3.3-nemotron-super-49b-v1:free": {"input": 0.0, "output": 0.0},
            "openai/chatgpt-4o-latest": {"input": 5.0e-06, "output": 1.5e-05}, # Updated from API
            "openai/gpt-4.1": {"input": 1e-06, "output": 3.0e-06}, # Updated from API
            "openai/gpt-4.1-mini": {"input": 5.0e-07, "output": 1.5e-06}, # Updated from API
            "openai/gpt-4.1-nano": {"input": 1.0e-07, "output": 3.0e-07}, # Updated from API
            "openai/gpt-4.5-preview": {"input": 6.0e-06, "output": 1.2e-05}, # Updated from API
            "openai/o1": {"input": 5.0e-06, "output": 1.5e-05}, # Updated from API
            "openai/o1-pro": {"input": 1.0e-05, "output": 3.0e-05}, # Updated from API
            "openai/o3-mini-high": {"input": 2.0e-07, "output": 6.0e-07}, # Updated from API
            "qwen/qwq-32b": {"input": 5.0e-07, "output": 5.0e-07}, # Updated from API
            "x-ai/grok-3-beta": {"input": 1e-06, "output": 1e-06},
            "x-ai/grok-3-mini-beta": {"input": 1.0e-07, "output": 1.0e-07}, # Updated from API
        }
        # Merge hardcoded pricing using setdefault
        for model_id, pricing in hardcoded_pricing.items():
            self.model_pricing.setdefault(model_id, pricing)
        # 4. Load from JSON/MD Export (Higher Priority - Overwrites API/Hardcoded)
        try:
            # Construct path relative to the script's directory if possible, or use absolute
            # Assuming script runs from within openwebui-context-counter directory
            base_dir = (
                os.path.dirname(os.path.abspath(__file__))
                if "__file__" in globals()
                else os.getcwd()
            )
            memory_bank_dir = os.path.join(base_dir, "memory-bank")
            # Look for export files (handle potential naming variations)
            export_file_pattern = re.compile(r"models-export-.*\.json")
            export_md_pattern = re.compile(r"models-export-.*\.md")
            json_export_path = None
            md_export_path = None
            if os.path.exists(memory_bank_dir):
                for filename in os.listdir(memory_bank_dir):
                    if export_file_pattern.match(filename):
                        json_export_path = os.path.join(memory_bank_dir, filename)
                        break  # Use the first one found
                    elif export_md_pattern.match(filename):
                        md_export_path = os.path.join(memory_bank_dir, filename)
                        # Don't break, prefer JSON if both exist
            if json_export_path:
                logger.info(f"Found model export file at {json_export_path}")
                self.load_models_from_json_export(json_export_path)
                logger.info(
                    f"Successfully loaded/overwrote model data from JSON export file"
                )
            elif md_export_path:  # Only use MD if JSON wasn't found
                logger.info(f"Found markdown model export file at {md_export_path}")
                self.load_models_from_json_export(md_export_path, is_markdown=True)
                logger.info(
                    f"Successfully loaded/overwrote model data from markdown export file"
                )
            else:
                logger.info("No model export file found in memory-bank directory.")
        except Exception as e:
            logger.error(f"Error loading models from export: {str(e)}")
        # 5. Load Custom Models from Valves (Highest Priority - Overwrites All Previous)
        logger.info("Model data initialization complete.")
        logger.debug(f"Final unique model contexts loaded: {len(self.model_contexts)}")
        logger.debug(f"Final unique model pricings loaded: {len(self.model_pricing)}")
        # Parse plaintext definitions if provided (one per line)
        if self.valves.custom_models_plaintext:
            # Log the actual value received from the valve at INFO level for visibility
            logger.info(f"Parsing custom_models_plaintext valve content: {self.valves.custom_models_plaintext!r}")
            logger.debug(f"Attempting to parse custom_models_plaintext: {self.valves.custom_models_plaintext!r}") # Log the raw valve value
            for line_num, line in enumerate(self.valves.custom_models_plaintext.strip().splitlines()):
                line = line.strip()
                if not line or line.startswith('#'): # Skip empty lines and comments
                    continue
                logger.debug(f"Processing plaintext line {line_num + 1}: '{line}'")
                # Use split() and filter empty strings for more robust parsing
                parts = [part for part in line.split() if part]
                logger.debug(f"  Split parts (filtered): {parts}")
                if len(parts) >= 4:
                    id_, ctx_str, inp_str, outp_str = parts[0], parts[1], parts[2], parts[3]
                    logger.debug(f"  Extracted: id='{id_}', ctx='{ctx_str}', inp='{inp_str}', outp='{outp_str}'")
                    try:
                        ctx_i = int(ctx_str)
                        inp_f = float(inp_str)
                        outp_f = float(outp_str)
                        # Add directly to the main dictionaries, overwriting if needed
                        self.model_contexts[id_] = ctx_i
                        self.model_pricing[id_] = {"input": inp_f, "output": outp_f}
                        logger.debug(f"  Successfully applied custom model from plaintext: {id_}")
                    except (ValueError, TypeError) as e:
                        # Log conversion errors as ERROR for better visibility
                        logger.error(f"  ERROR parsing custom model definition line {line_num + 1}: '{line}' - Error: {e}")
                else:
                    logger.warning(f"  Skipping invalid custom model line {line_num + 1} (expected 4+ parts): '{line}'")
        # REMOVED: Logic for merging structured custom_models list, as the valve was removed.
        # for custom in self.valves.custom_models:
        #     if custom.id not in self.model_contexts: # Avoid overwriting plaintext entries
        #          self.model_contexts[custom.id] = custom.context_length
        #          self.model_pricing[custom.id] = {"input": custom.pricing.input, "output": custom.pricing.output}
        #          # logger.debug(f"Applied custom model from structured list: {custom.id}") # Logic removed
        # logger.info(f"Applied custom models: {len(self.valves.custom_models)} from structured list, potentially more from plaintext.") # Logic removed
        logger.info(f"Applied custom models from plaintext valve.") # Log message remains relevant
        # Tokenizer cache
        self.encoders = {}
        # Token pattern recognition components
        self.token_cache = {}
        self.token_cache_stats = {"hits": 0, "misses": 0, "prunes": 0}
        self.pattern_frequency = {}
        self.content_type_stats = {}
        self.token_cache_last_prune = time.time()  # Track when we last pruned the cache
        self.token_cache_entry_times = {}  # Track when each cache entry was added/accessed
        # Per-message tracking
        self.message_tokens = {}
        # Cost tracking
        self.user_cost_file = USER_COST_FILE
        self._ensure_cost_file_exists()
        self.daily_cost_file = DAILY_COST_FILE  # Feature 2: Daily cost file path
        self._ensure_daily_cost_file_exists()  # Feature 2: Ensure daily cost file exists
        # Stream token tracking for dynamic rate calculation
        self.stream_token_counter = 0
        self.stream_start_time = None
        self.current_token_rate = 0.0
        self.last_stream_update = 0.0
        # Adaptive rate deque - use max window size initially
        self.stream_history = deque(maxlen=self.valves.rate_avg_window_max)
        # self.stream_update_interval = 0.5  # Feature 9: Removed hardcoded value, use valve instead
        # self.last_event_emitter = None  # Removed - incompatible
        # Timing info
        self.start_time = None
        # Session stats
        self.session_stats = {
            "total_tokens": 0,
            "input_tokens": 0,
            "output_tokens": 0,
            "requests": 0,
            "avg_tokens_per_req": 0,
            "avg_tokens_per_sec": 0,
            "success_rate": 100.0,
            "peak_concurrency": 1,
            "total_cost": 0.0,
            "budget_remaining": self.valves.budget_amount,
            "error_count": 0,  # Feature 6: Add error counter
            "session_cost": 0.0,  # Feature 2: Add session cost tracking
            "daily_cost": 0.0,  # Feature 2: Add daily cost tracking (loaded later)
            # Enhanced performance tracking
            "tokens_added_last_cycle": 0,
            "tokens_removed_last_cycle": 0,
            "last_update_time": time.time(),
            "message_generation_time": 0.0,
            "prev_total_tokens": 0,
        }
        # self.using_dynamic_pricing flag is set during data loading
        # Load initial calibration status
        self.calibration_status_display = self._load_calibration_status()
        # Initialize request counter
        self.request_counter = 0
        # Initial parsing attempt during initialization
        self._parse_and_apply_plaintext_models()
        # DEBUG: Log final pricing dictionary after initial loading/parsing in __init__
        logger.debug(f"Model pricing after __init__: {self.model_pricing}")
    def _parse_and_apply_plaintext_models(self):
        """Parses the plaintext custom models valve and updates pricing/context dictionaries.
        
        This method is a critical part of the model detection and pricing system, replacing
        the previous structured approach that used Pydantic models. It provides several key benefits:
        
        1. Robustness: Simple string parsing is more resilient to serialization/deserialization issues
           that were causing persistent Pydantic validation errors with the previous approach.
        
        2. User-Friendly: Allows users to define custom models in a simple text format:
           <MODEL_ID> <CONTEXT_SIZE> <INPUT_COST_PER_TOKEN> <OUTPUT_COST_PER_TOKEN>
           Example: openai/o4-mini-high 200000 0.0000011 0.0000044
        
        3. Flexibility: Supports comments (lines starting with #) and handles whitespace variations.
        
        4. Priority: Custom models defined here take highest priority, overriding any values from
           other sources (API, hardcoded defaults, exports).
        
        5. Real-time Updates: This method is called both during initialization and in the outlet
           method to ensure the latest custom model definitions are always applied before processing.
        
        The method parses each non-empty, non-comment line from the valve content, extracts the
        model ID, context size, and pricing information, converts them to the appropriate types,
        and adds them directly to the main model_contexts and model_pricing dictionaries.
        """
        if self.valves.custom_models_plaintext:
            # Log the raw valve value at INFO level for visibility
            logger.info(f"Parsing custom_models_plaintext valve content: {self.valves.custom_models_plaintext!r}")
            for line_num, line in enumerate(self.valves.custom_models_plaintext.strip().splitlines()):
                line = line.strip()
                if not line or line.startswith('#'): # Skip empty lines and comments
                    continue
                logger.debug(f"Processing plaintext line {line_num + 1}: '{line}'")
                parts = re.split(r'\s+', line)
                logger.debug(f"  Split parts: {parts}")
                if len(parts) >= 4:
                    id_, ctx_str, inp_str, outp_str = parts[0], parts[1], parts[2], parts[3]
                    logger.debug(f"  Extracted: id='{id_}', ctx='{ctx_str}', inp='{inp_str}', outp='{outp_str}'")
                    try:
                        ctx_i = int(ctx_str)
                        inp_f = float(inp_str)
                        outp_f = float(outp_str)
                        # Add directly to the main dictionaries, overwriting if needed
                        self.model_contexts[id_] = ctx_i
                        self.model_pricing[id_] = {"input": inp_f, "output": outp_f}
                        logger.debug(f"  Successfully applied custom model from plaintext: {id_}")
                    except (ValueError, TypeError) as e:
                        # Log conversion errors as ERROR for better visibility
                        logger.error(f"  ERROR parsing custom model definition line {line_num + 1}: '{line}' - Error: {e}")
                else:
                    logger.warning(f"  Skipping invalid custom model line {line_num + 1} (expected 4+ parts): '{line}'")
        else:
            logger.debug("custom_models_plaintext valve is empty, skipping parsing.")
    # --- New Methods for Dynamic Loading & Caching ---
    def _load_calibration_status(self) -> str:
        """Loads the calibration status string from the status file."""
        status_file = os.path.join(DATA_DIR, "calibration_status.json")
        default_status = "Not Calibrated" # Changed default from "Unknown"
        try:
            if os.path.exists(status_file):
                with open(status_file, "r", encoding="utf-8") as f:
                    status_data = json.load(f)
                    # Basic validation
                    if isinstance(status_data, dict) and "status_string" in status_data:
                        loaded_status = status_data["status_string"]
                        # Add timestamp info if available
                        timestamp = status_data.get("analysis_timestamp")
                        if timestamp:
                            try:
                                # Format timestamp nicely
                                dt_obj = datetime.fromisoformat(timestamp)
                                # Example: " (as of Apr 16 14:30)"
                                formatted_ts = dt_obj.strftime(" (as of %b %d %H:%M)")
                                loaded_status += formatted_ts
                            except ValueError:
                                pass  # Ignore invalid timestamp format
                        logger.info(f"Loaded calibration status: {loaded_status}")
                        return loaded_status
                    else:
                        logger.warning(f"Invalid format in {status_file}. Using default status.")
                        return default_status
            else:
                logger.info(f"Calibration status file not found ({status_file}). Using default status.")
                return default_status
        except (IOError, json.JSONDecodeError) as e:
            logger.error(f"Error loading calibration status from {status_file}: {e}")
            return default_status
        except Exception as e:  # Catch any other unexpected errors
            logger.error(f"Unexpected error loading calibration status: {e}")
            return default_status
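    # Expected shape of calibration_status.json (illustrative values; only "status_string"
    # is required, "analysis_timestamp" is an optional ISO-8601 string):
    #   {"status_string": "Calibrated", "analysis_timestamp": "2025-04-16T14:30:00"}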
    def _load_models_from_cache(self) -> Tuple[Dict, bool]:
        """Loads model data from cache file if valid."""
        if not os.path.exists(MODEL_CACHE_FILE):
            logger.info("Model cache file not found.")
            return {}, False
        try:
            with open(MODEL_CACHE_FILE, "r", encoding="utf-8") as f:
                cache_data = json.load(f)
            cache_timestamp_str = cache_data.get("timestamp")
            if not cache_timestamp_str:
                logger.warning("Model cache file missing timestamp.")
                return {}, False
            cache_timestamp = datetime.fromisoformat(cache_timestamp_str)
            if datetime.now() - cache_timestamp > timedelta(
                seconds=MODEL_CACHE_TTL_SECONDS
            ):
                logger.info("Model cache file is expired.")
                return {}, False
            logger.info(
                f"Loading model data from valid cache file (Timestamp: {cache_timestamp_str})."
            )
            return cache_data.get("models", {}), True
        except (json.JSONDecodeError, OSError, ValueError) as e:
            logger.error(f"Error loading model cache file: {e}")
            return {}, False
    def _save_models_to_cache(self, models_data: Dict):
        """Saves fetched model data to the cache file."""
        try:
            os.makedirs(os.path.dirname(MODEL_CACHE_FILE), exist_ok=True)
            cache_content = {
                "timestamp": datetime.now().isoformat(),
                "models": models_data,
            }
            with open(MODEL_CACHE_FILE, "w", encoding="utf-8") as f:
                json.dump(cache_content, f)
            logger.info(f"Saved fetched model data to cache file: {MODEL_CACHE_FILE}")
        except (OSError, TypeError) as e:
            logger.error(f"Error saving model data to cache: {e}")
    def _fetch_openrouter_model_list(self) -> Dict:
        """Fetches the model list from OpenRouter API."""
        logger.info("Attempting to fetch model list from OpenRouter API...")
        api_url = "https://openrouter.ai/api/v1/models"
        fetched_models = {}
        try:
            req = urllib.request.Request(
                api_url,
                headers={
                    "HTTP-Referer": "https://github.com/open-webui/open-webui",
                    "X-Title": "Context Counter Function",
                },
            )
            # Increased timeout for potentially slow API
            with urllib.request.urlopen(req, timeout=15) as response:
                if response.status == 200:
                    data = json.loads(response.read().decode("utf-8"))
                    models_data = data.get("data", [])
                    for model in models_data:
                        model_id = model.get("id")
                        if not model_id:
                            continue
                        context = model.get("context_length")
                        pricing = model.get("pricing", {})
                        # Handle potential None or non-string values gracefully.
                        # Note: the OpenRouter /models endpoint reports per-token prices
                        # under the "prompt"/"completion" keys; the original "input"/"output"
                        # keys are kept as a fallback in case the schema differs.
                        input_price_str = str(
                            pricing.get("prompt", pricing.get("input", "0")) or "0"
                        )
                        output_price_str = str(
                            pricing.get("completion", pricing.get("output", "0")) or "0"
                        )
                        try:
                            # Prices from the API are already per token
                            input_per_token = float(input_price_str)
                            output_per_token = float(output_price_str)
                        except ValueError:
                            logger.warning(
                                f"Could not parse pricing for {model_id}: input='{input_price_str}', output='{output_price_str}'"
                            )
                            input_per_token = 0.0
                            output_per_token = 0.0
                        fetched_models[model_id] = {
                            "context_length": context,
                            "pricing": {
                                "input": input_per_token,
                                "output": output_per_token,
                            },
                        }
                    logger.info(
                        f"Successfully fetched data for {len(fetched_models)} models from OpenRouter."
                    )
                    return fetched_models
                else:
                    logger.error(
                        f"Failed to fetch model list from OpenRouter. Status: {response.status}"
                    )
                    return {}
        except urllib.error.URLError as e:
            logger.error(f"Network error fetching OpenRouter model list: {e}")
            return {}
        except TimeoutError:
            logger.error("Timeout error fetching OpenRouter model list.")
            return {}
        except json.JSONDecodeError as e:
            logger.error(
                f"Error decoding JSON response from OpenRouter model list: {e}"
            )
            return {}
        except Exception as e:
            logger.error(f"Unexpected error fetching OpenRouter model list: {e}")
            return {}
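    # Shape of the dictionary returned by _fetch_openrouter_model_list (values are
    # illustrative, not real prices):
    #   {"openai/gpt-4o": {"context_length": 128000,
    #                      "pricing": {"input": 2.5e-06, "output": 1e-05}}, ...}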
    # --- End New Methods ---
    def _initialize_content_patterns(self):
        """Initialize content recognition patterns for different content types.
        This function builds pattern dictionaries for detecting various content types
        like code, JSON, tables, mathematical formulas, etc. These patterns are used
        throughout the token counting and content type detection system.
        Returns:
            Dictionary mapping content types to pattern dictionaries
        """
        # Directly define basic patterns as the enhanced module is not used/available.
        logger.debug("Initializing basic content patterns.")
        patterns = {
            # Programming language patterns
            "code": {
                # Function/method definitions
                "function_defs": [
                    r"(function|def|func|fn|sub|method|procedure)\s+[\w_]+ *\(",  # Basic function def
                    r"(public|private|protected|static|async)\s+[\w_]+\s+[\w_]+ *\(",  # OOP methods
                    r"^\s*class\s+[\w_]+",  # Class definitions
                ],
                # Variable declarations
                "variable_defs": [
                    r"(var|let|const|int|float|double|string|bool)\s+[\w_]+\s*=",
                ],
            },
            # JSON patterns
            "json": {
                "structure": [
                    r'^\s*\{\s*"[\w_]+"\s*:',  # Object start
                    r'^\s*\[\s*\{\s*"[\w_]+"\s*:',  # Array of objects start
                ],
            },
            # Markdown table patterns
            "table": {
                "headers": r"\|[\s-:]*\|[\s-:]*\|",  # Table header/divider row
                "structure": r"(\|[^\|]+)+\|",  # Basic row structure
            },
            # Markdown list patterns
            "list": {
                "unordered": r"(\n\s*[-*+ā¢]\s+[^\n]+)+",  # Unordered lists with different markers
                "ordered": r"(\n\s*\d+[.)\]]\s+[^\n]+)+",  # Numbered lists
            },
        }
        return patterns
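    # Minimal usage sketch for the patterns above (assumes the dict returned by
    # _initialize_content_patterns is bound to a local name `patterns`):
    #   import re
    #   fn_pattern = patterns["code"]["function_defs"][0]
    #   bool(re.search(fn_pattern, "def count_tokens(text):"))  # -> True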
    def _ensure_cost_file_exists(self):
        """Ensure that the cost tracking file exists."""
        if not os.path.exists(self.user_cost_file):
            try:
                # Create parent directories if they don't exist
                os.makedirs(os.path.dirname(self.user_cost_file), exist_ok=True)
                # Create empty cost file with empty JSON object
                with open(self.user_cost_file, "w", encoding="UTF-8") as f:
                    json.dump({}, f)
            except Exception as e:
                logger.error(f"Failed to create user cost file: {str(e)}")
    # Feature 2: Add method to ensure daily cost file exists
    def _ensure_daily_cost_file_exists(self):
        """Ensure the daily cost tracking file exists and initialize if needed."""
        if not os.path.exists(self.daily_cost_file):
            try:
                os.makedirs(os.path.dirname(self.daily_cost_file), exist_ok=True)
                with open(self.daily_cost_file, "w", encoding="UTF-8") as f:
                    # Initialize with today's date and zero cost
                    today_str = date.today().isoformat()
                    json.dump({today_str: 0.0}, f)
            except Exception as e:
                logger.error(f"Failed to create daily cost file: {str(e)}")
    def _load_and_update_daily_cost(self, cost_to_add: float) -> float:
        """Load today's cost, add the new cost, save with locking, and return the updated total."""
        today_str = date.today().isoformat()
        daily_costs = {}
        updated_today_cost = 0.0
        f = None  # Initialize f to None
        lock_acquired = False  # Initialize lock status
        try:
            f = open(self.daily_cost_file, "a+", encoding="UTF-8")
            f.seek(0)  # Move to beginning for reading
            # --- Acquire lock (OS-dependent) ---
            if os.name == "posix" and fcntl:
                try:
                    fcntl.flock(f, LOCK_EX | LOCK_NB)  # Try non-blocking exclusive lock
                    lock_acquired = True
                except IOError:
                    logger.warning("Daily cost file (POSIX) is locked, waiting...")
                    fcntl.flock(f, LOCK_EX)  # Fallback to blocking lock
                    lock_acquired = True
            elif os.name == "nt" and msvcrt:
                # Windows locking: Lock the entire file exclusively
                try:
                    msvcrt.locking(
                        f.fileno(), msvcrt.LK_NBLCK, 1
                    )  # Try non-blocking exclusive lock
                    lock_acquired = True
                except IOError:
                    logger.warning(
                        "Daily cost file (Windows) is locked, retrying with blocking lock..."
                    )
                    time.sleep(0.1)  # Simple delay before retry
                    try:
                        msvcrt.locking(
                            f.fileno(), msvcrt.LK_LOCK, 1
                        )  # Blocking lock on retry
                        lock_acquired = True
                    except IOError as e_lock:
                        logger.error(
                            f"Could not acquire lock on daily cost file (Windows) after retry: {e_lock}"
                        )
                        lock_acquired = False
                        logger.error("Lock acquisition failed; the daily cost update will be skipped.")
            else:
                logger.warning(
                    "File locking not supported on this OS. Daily cost updates may be unreliable under concurrency."
                )
                lock_acquired = True
            # --- Perform Read/Update/Write ONLY if lock is held or not needed/supported ---
            if lock_acquired:
                try:
                    content = f.read()
                    if content:
                        daily_costs = json.loads(content)
                    else:
                        daily_costs = {}
                except json.JSONDecodeError:
                    logger.error("Failed to parse daily cost file, resetting.")
                    daily_costs = {}
                today_cost = daily_costs.get(today_str, 0.0)
                updated_today_cost = today_cost + cost_to_add
                daily_costs = {today_str: updated_today_cost}
                f.seek(0)
                f.truncate()
                json.dump(daily_costs, f)
                f.flush()
                os.fsync(f.fileno())
                logger.debug(f"Updated daily cost for {today_str}: Added ${cost_to_add:.6f}, New Total: ${updated_today_cost:.6f}")
                return updated_today_cost
            else:
                logger.error("Failed to acquire lock on daily cost file. Update skipped.")
                try:
                    f.seek(0)
                    content = f.read()
                    if content:
                        daily_costs = json.loads(content)
                    else:
                        daily_costs = {}
                except Exception:
                    daily_costs = {}
                return daily_costs.get(today_str, 0.0)
        except IOError as e:
            logger.error(f"File lock/IO error for daily costs: {e}")
            return daily_costs.get(today_str, 0.0)
        except Exception as e:
            logger.error(f"Failed to update daily cost: {str(e)}")
            return daily_costs.get(today_str, 0.0)
        finally:
            if f:
                if lock_acquired:
                    if os.name == "posix" and fcntl:
                        fcntl.flock(f, LOCK_UN)
                    elif os.name == "nt" and msvcrt:
                        try:
                            msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1)
                        except IOError:
                            pass
                f.close()
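    # On-disk format of the daily cost file (illustrative): {"2025-01-01": 0.0321}.
    # Only the current day's key is kept; the entry is rewritten when the date rolls over.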
    # --- Monthly Cost Tracking ---
    def _load_and_update_monthly_cost(self, cost_to_add: float) -> Dict[str, float]:
        """Load this month's cost, add the new cost, save with locking, and return month info."""
        current_month_str = date.today().strftime("%Y-%m")
        # Define monthly cost file path dynamically based on current month
        monthly_cost_file = os.path.join(DATA_DIR, f"monthly_costs_{current_month_str}.json")
        monthly_costs = {}
        updated_month_cost = 0.0
        f = None
        lock_acquired = False
        try:
            # Ensure data directory exists
            os.makedirs(os.path.dirname(monthly_cost_file), exist_ok=True)
            # Check if the file exists for the *current* month. If not, it's a new month.
            is_new_month = not os.path.exists(monthly_cost_file)
            # Open file (create if doesn't exist)
            f = open(monthly_cost_file, "a+", encoding="UTF-8")
            f.seek(0)  # Move to beginning for reading
            # --- Acquire lock (OS-dependent) ---
            if os.name == "posix" and fcntl:
                try:
                    fcntl.flock(f, LOCK_EX | LOCK_NB)  # Try non-blocking exclusive lock
                    lock_acquired = True
                except IOError:
                    logger.warning("Monthly cost file (POSIX) is locked, waiting...")
                    fcntl.flock(f, LOCK_EX)  # Fallback to blocking lock
                    lock_acquired = True
            elif os.name == "nt" and msvcrt:
                # Windows locking: Lock the entire file exclusively
                try:
                    msvcrt.locking(f.fileno(), msvcrt.LK_NBLCK, 1)  # Try non-blocking exclusive lock
                    lock_acquired = True
                except IOError:
                    logger.warning("Monthly cost file (Windows) is locked, retrying...")
                    time.sleep(0.1)  # Simple delay before retry
                    try:
                        msvcrt.locking(f.fileno(), msvcrt.LK_LOCK, 1)  # Blocking lock on retry
                        lock_acquired = True
                    except IOError as e_lock:
                        logger.error(f"Could not acquire lock on monthly cost file (Windows): {e_lock}")
                        lock_acquired = False  # Indicate failure
            else:
                logger.warning("File locking not supported on this OS for monthly costs.")
                lock_acquired = True  # Proceed assuming no lock needed
            # --- Perform Read/Update/Write ONLY if lock is held or assumed ---
            if lock_acquired:
                try:
                    content = f.read()
                    if content and not is_new_month:  # Only load if file existed and wasn't empty
                        monthly_costs = json.loads(content)
                        # Get the cost for the current month if it exists in the file
                        updated_month_cost = monthly_costs.get(current_month_str, 0.0)
                    else:  # New month or empty file
                         monthly_costs = {}
                         updated_month_cost = 0.0  # Start fresh for the new month
                         logger.info(f"Starting new monthly cost tracking for {current_month_str}.")
                except json.JSONDecodeError:
                    logger.error("Failed to parse monthly cost file, resetting for the month.")
                    monthly_costs = {}
                    updated_month_cost = 0.0
                # Add the new cost
                updated_month_cost += cost_to_add
                monthly_costs[current_month_str] = updated_month_cost # Store/update with current month key
                # Write back the updated data for the current month
                f.seek(0)
                f.truncate()
                json.dump(monthly_costs, f) # Save the dictionary containing the current month's cost
                f.flush()
                os.fsync(f.fileno()) # Ensure data is written to disk
                logger.debug(f"Updated monthly cost for {current_month_str}: Added ${cost_to_add:.6f}, New Total: ${updated_month_cost:.6f}")
                # Determine the applicable monthly budget (user override or global valve)
                # Note: __user__ is not available here, so we rely on the global valve for now.
                # The check against user override needs to happen in the outlet method.
                monthly_budget = self.valves.monthly_budget_amount  # Use global valve here
                remaining_monthly = monthly_budget - updated_month_cost if monthly_budget > 0 else 0  # Calculate remaining only if budget > 0
                used_monthly_percent = (updated_month_cost / monthly_budget * 100) if monthly_budget > 0 else 0
                return {
                    "current_month_cost": updated_month_cost,  # Total cost accumulated this month
                    "remaining_monthly": remaining_monthly,
                    "used_monthly_percent": used_monthly_percent,
                    "monthly_budget": monthly_budget  # The budget amount used for calculation
                }
            else:
                logger.error("Failed to acquire lock on monthly cost file. Update skipped.")
                # Attempt to read the current value without the lock; fall back to 0.0 on failure
                current_cost = 0.0
                try:
                    f.seek(0)
                    content = f.read()
                    if content:
                        monthly_costs = json.loads(content)
                        current_cost = monthly_costs.get(current_month_str, 0.0)
                except Exception:
                    pass  # Ignore read errors if lock failed
                # Return last known state, using the global budget valve as fallback
                fallback_budget = self.valves.monthly_budget_amount
                return {
                    "current_month_cost": current_cost,
                    "remaining_monthly": fallback_budget - current_cost if fallback_budget > 0 else 0,
                    "used_monthly_percent": (current_cost / fallback_budget * 100) if fallback_budget > 0 else 0,
                    "monthly_budget": fallback_budget,
                }
        except Exception as e:
            logger.error(f"Failed to update monthly cost: {str(e)}")
            # Return defaults in case of error, using the global budget valve
            fallback_budget = self.valves.monthly_budget_amount
            return {
                "current_month_cost": 0.0,
                "remaining_monthly": fallback_budget,
                "used_monthly_percent": 0.0,
                "monthly_budget": fallback_budget,
            }
        finally:
            if f:
                if lock_acquired:
                    if os.name == "posix" and fcntl:
                        fcntl.flock(f, LOCK_UN)
                    elif os.name == "nt" and msvcrt:
                        try:
                            msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1)
                        except IOError: pass  # Ignore errors on unlock
                f.close()
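    # On-disk format of the monthly cost file monthly_costs_YYYY-MM.json (illustrative):
    #   {"2025-01": 1.2345}
    # A fresh file is created for each calendar month, so totals reset automatically.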
    def _update_user_cost(
        self,
        user_email: str,
        model: str,
        input_tokens: int,
        output_tokens: int,
        total_cost: float,
    ):
        """Update the user's cost data in the cost file."""
        if not self.valves.persist_user_costs:
            return
        if not user_email:
            logger.warning("User email not provided for cost tracking")
            return
        try:
            # Read existing cost data
            costs = {}
            if os.path.exists(self.user_cost_file):
                with open(self.user_cost_file, "r", encoding="UTF-8") as f:
                    try:
                        costs = json.load(f)
                    except json.JSONDecodeError:
                        logger.error("Failed to parse user cost file, creating new one")
                        costs = {}
            # Ensure user entry exists
            if user_email not in costs:
                costs[user_email] = []
            # Add new cost entry
            costs[user_email].append(
                {
                    "model": model,
                    "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
                    "input_tokens": input_tokens,
                    "output_tokens": output_tokens,
                    "total_cost": str(total_cost),
                }
            )
            # Write updated cost data
            with open(self.user_cost_file, "w", encoding="UTF-8") as f:
                json.dump(costs, f, indent=4)
        except Exception as e:
            logger.error(f"Failed to update user cost: {str(e)}")
    def normalize_model_name(self, model_id: str) -> str:
        """Normalize model names to standard format for better detection.
        This function standardizes model identifiers by:
        1. Converting to lowercase
        2. Removing vendor prefixes (openai/, anthropic/, google/, etc.)
        3. Standardizing separators (replacing spaces, underscores with hyphens)
        4. Removing version qualifiers (like :free, :beta)
        5. Removing suffixes like -tuned
        Examples:
            "openai/gpt-4-turbo" ā "gpt-4-turbo"
            "Google/gemini-1.5-flash" ā "gemini-1.5-flash"
        """
        if not model_id:
            return ""
        # Start with lowercase and strip whitespace
        normalized = model_id.lower().strip()
        # Remove common vendor prefixes
        prefixes = [
            # Provider paths
            "openai/",
            "anthropic/",
            "google/",
            "mistralai/",
            "meta-llama/",
            "cohere/",
            "meta/",
            "qwen/",
            "x-ai/",
            "azure/",
            "amazon/",
            "together.ai/",
            "deepinfra/",
            "fireworks.ai/",
            "replicate/",
            # Provider names without slashes
            "openai",
            "github",
            "google_genai",
            "anthropic",
            "google",
            "microsoft",
            "replicate",
            "perplexity",
            "huggingface",
            "meta",
            "amazon",
            "nvidia",
        ]
        # Strip the first matching prefix (slash-qualified provider prefixes are listed first)
        for prefix in prefixes:
            if normalized.startswith(prefix):
                normalized = normalized[len(prefix) :]
                break
        # Standardize separators (convert spaces and underscores to hyphens)
        normalized = re.sub(r"[-_\s]+", "-", normalized)
        # Remove any extra hyphens at start or end
        normalized = normalized.strip("-")
        # Log the normalization result for debugging
        if model_id != normalized:
            logger.debug(f"Normalized model name: '{model_id}' ā '{normalized}'")
        return normalized
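    # Additional illustrative normalizations implied by the rules above (assumed examples):
    #   "Meta-Llama/Llama_3 70B"  -> "llama-3-70b"
    #   "mistralai/Mixtral 8x7B"  -> "mixtral-8x7b"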
    def load_models_from_json_export(
        self, file_path: str, is_markdown: bool = False
    ) -> None:
        """Load model data from JSON export file."""
        try:
            # Read the file
            with open(file_path, "r", encoding="utf-8") as f:
                content = f.read()
            # Extract JSON content
            json_content = content
            if is_markdown:
                # Look for JSON block in markdown
                json_blocks = re.findall(r"```json\s*([\s\S]*?)\s*```", content)
                if json_blocks:
                    json_content = json_blocks[0]
                else:
                    # Try looking for any code block
                    code_blocks = re.findall(r"```\s*([\s\S]*?)\s*```", content)
                    if code_blocks:
                        json_content = code_blocks[0]
                    else:
                        # Try to find JSON-like content
                        json_like = re.search(r"(\{[\s\S]*\}|\[[\s\S]*\])", content)
                        if json_like:
                            json_content = json_like.group(0)
            # Parse JSON
            models_data = json.loads(json_content)
            # Process model data
            loaded_count = 0
            # Handle different JSON structures
            if isinstance(models_data, list):
                # Array of model objects
                for model in models_data:
                    if "id" in model:
                        model_id = model["id"]
                        # Extract context size if available
                        context_size = None
                        if "context_length" in model:
                            context_size = model["context_length"]
                        elif "context_window" in model:
                            context_size = model["context_window"]
                        # Apply context size if valid (OVERWRITE existing)
                        if (
                            context_size
                            and isinstance(context_size, (int, float))
                            and context_size > 0
                        ):
                            self.model_contexts[model_id] = int(context_size)
                            loaded_count += 1
                        # Extract pricing if available
                        input_price = None
                        output_price = None
                        if "pricing" in model:
                            pricing = model["pricing"]
                            if isinstance(pricing, dict):
                                if "input" in pricing:
                                    input_price = pricing["input"]
                                if "output" in pricing:
                                    output_price = pricing["output"]
                        # Apply pricing if valid (OVERWRITE existing)
                        if input_price is not None and output_price is not None:
                            try:
                                # Assume prices in export might be per-million or per-token
                                input_p = float(input_price)
                                output_p = float(output_price)
                                # Heuristic: if price > 0.01, assume per-million
                                if input_p > 0.01:
                                    input_p /= 1_000_000
                                if output_p > 0.01:
                                    output_p /= 1_000_000
                                self.model_pricing[model_id] = {
                                    "input": input_p,
                                    "output": output_p,
                                }
                            except ValueError:
                                logger.warning(
                                    f"Could not parse pricing from export for {model_id}"
                                )
            # Log results
            logger.info(
                f"Loaded/Overwrote {loaded_count} models from JSON export at {file_path}"
            )
        except Exception as e:
            logger.error(f"Error loading models from JSON export: {str(e)}")
    def is_claude(self, model_id: str) -> bool:
        """Check if a model is any variant of Anthropic Claude."""
        if not model_id:
            return False
        # Use normalized name for consistent matching
        model_id_lower = self.normalize_model_name(model_id).lower()
        # Comprehensive Claude indicators
        claude_indicators = [
            "claude",
            "anthropic-claude",
            "anthropic/claude",
            "anthropic",
            "claude-3",
            "claude-3.5",
            "claude-3.7",
            "claude-4",
            "opus",
            "sonnet",
            "haiku",
            "instant",
        ]
        # If ANY of these substrings appears in the model ID, assume it's Claude
        for indicator in claude_indicators:
            if indicator in model_id_lower:
                return True
        return False
    def is_gpt4o(self, model_id: str) -> bool:
        """Check if a model is any variant of GPT-4o."""
        if not model_id:
            return False
        # Use normalized name for consistent matching
        model_id_lower = self.normalize_model_name(model_id).lower()
        # Check for ANY variation that might indicate GPT-4o
        gpt4o_indicators = [
            "gpt-4o",
            "gpt4o",
            "gpt4-o",
            "gpt-4-o",
            "gpt-4o-mini",
            "gpt4omini",
        ]
        # If ANY of these substrings appears in the model ID, assume it's GPT-4o
        for indicator in gpt4o_indicators:
            if indicator in model_id_lower:
                return True
        return False
    def is_gemini(self, model_id: str) -> bool:
        """Check if a model is any variant of Google Gemini."""
        if not model_id:
            return False
        # Use normalized name for consistent matching
        model_id_lower = self.normalize_model_name(model_id).lower()
        # Check for ANY variation that might indicate Gemini
        gemini_indicators = [
            "gemini",
            "google-gemini",
            "gemini-pro",
            "gemini-flash",
            "gemini-1.5",
            "gemini-2.0",
            "gemini-2.5",
        ]
        # If ANY of these substrings appears in the model ID, assume it's Gemini
        for indicator in gemini_indicators:
            if indicator in model_id_lower:
                return True
        return False
    def extract_openrouter_model(self, model_name: str) -> str:
        """Extract the actual model ID from an OpenRouter prefixed model."""
        if model_name.startswith("OR."):
            # Extract the actual model part after "OR." prefix
            return model_name[3:]
        return model_name
    def add_or_prefix(self, model_name: str) -> str:
        """Add OR. prefix to a model name if it doesn't already have it."""
        if not model_name.startswith("OR."):
            return f"OR.{model_name}"
        return model_name
    # Removed load_openrouter_contexts and load_openrouter_pricing as logic is now in __init__ / _fetch_openrouter_model_list
    def _detect_content_type(self, text: str) -> str:
        """Detect the type of content in the text."""
        if not text or not self.valves.enable_content_detection:
            return "text"
        # Code blocks (triple backticks)
        if re.search(r"```[a-zA-Z]*\n[\s\S]*?\n```", text):
            return "code"
        # JSON objects or arrays
        if (text.strip().startswith("{") and text.strip().endswith("}")) or (
            text.strip().startswith("[") and text.strip().endswith("]")
        ):
            try:
                json.loads(text.strip())
                return "json"
            except ValueError:
                # Not valid JSON; fall through to the other checks
                pass
        # Markdown tables
        if re.search(r"\|[-]+\|[-]+\|", text):
            return "table"
        # Markdown headers
        if re.search(r"^#{1,6}\s", text, re.MULTILINE):
            return "markdown_header"
        # Markdown links or images
        if re.search(r"!\[.*?\]\(.*?\)", text) or re.search(r"\[.*?\]\(.*?\)", text):
            return "markdown_link"
        # HTML/XML blocks
        if re.search(r"<[a-zA-Z][\s\S]*?>", text):
            return "html"
        # Emojis (basic unicode emoji range)
        if re.search(r"[\U0001F600-\U0001F64F]", text) or re.search(
            r"[\U0001F300-\U0001F5FF]", text
        ):
            return "emoji"
        # Quoted text (blockquotes)
        if re.search(r"^\s*>", text, re.MULTILINE):
            return "blockquote"
        # Inline code snippets
        if re.search(r"`[^`]+`", text):
            return "inline_code"
        # Default fallback
        return "text"
    def _update_content_type_stats(self, content_type: str) -> None:
        """Update statistics on content type detection."""
        if content_type not in self.content_type_stats:
            self.content_type_stats[content_type] = 0
        self.content_type_stats[content_type] += 1
    def _get_appropriate_encoding(self, model_name: str) -> str:
        """Determine the appropriate tiktoken encoding for a model.
        Different models may use different tokenization schemes. This method
        determines the best encoding to use based on the model family.
        Args:
            model_name: The name of the model to get encoding for
        Returns:
            The name of the appropriate tiktoken encoding
        """
        # Normalize the model name for better matching
        normalized = self.normalize_model_name(model_name)
        # Most models here are approximated with cl100k_base (GPT-4, Claude, etc.)
        if self.is_claude(model_name) or "claude" in normalized:
            return "cl100k_base"  # Claude uses a proprietary tokenizer; cl100k_base is the closest approximation available via tiktoken
        if self.is_gpt4o(model_name) or "gpt-4" in normalized:
            # Note: GPT-4o models technically use the newer o200k_base encoding;
            # cl100k_base is used here as an approximation for the whole GPT-4 family.
            return "cl100k_base"
        if "gpt-3.5" in normalized:
            return "cl100k_base"
        # Check for older GPT models that use different encoders
        if "gpt-3" in normalized:
            return "p50k_base"
        # Gemini and other models - best approximation is cl100k_base
        # There's no perfect match but this is closest
        return "cl100k_base"
    def _preprocess_for_claude(self, text: str) -> str:
        """Apply Claude-specific preprocessing for more accurate token counting.
        Claude treats certain patterns differently from other models, especially
        with regard to whitespace, newlines, and XML-like tags.
        Args:
            text: The text to preprocess
        Returns:
            Preprocessed text optimized for Claude tokenization
        """
        # Note: This is a simplification. Full Claude token counting requires
        # their proprietary tokenizer, but these adjustments help approximate it better.
        # Claude is more sensitive to repeated newlines
        # Compress multiple newlines to reduce excessive token counts
        processed = re.sub(r"\n{3,}", "\n\n", text)
        # Claude considers <tag> structures as special
        # We don't modify them but account for them in our counting
        return processed
    def _preprocess_for_gemini(self, text: str) -> str:
        """Apply Gemini-specific preprocessing for more accurate token counting.
        Gemini has different handling for certain markdown patterns and
        for content like code blocks. This preprocessing helps approximate
        its token counting better.
        Args:
            text: The text to preprocess
        Returns:
            Preprocessed text optimized for Gemini tokenization
        """
        # Note: This is an approximation since Gemini uses a different underlying tokenizer
        # Gemini handles code blocks slightly differently
        # Mark them to account for this in counting
        processed = text
        # Gemini may parse markdown differently, but for now we'll
        # rely on the general-purpose tokenizer
        return processed
    def _fallback_count_tokens(self, text: str, model_name: str) -> int:
        """Fallback method for token counting when tiktoken is unavailable.
        This provides more sophisticated estimation based on model family
        and content type when the standard tokenizer isn't available.
        Args:
            text: The text to count tokens for
            model_name: The model name to determine estimation approach
        Returns:
            Estimated token count
        """
        # Check for content type to apply different estimation rules
        content_type = self._detect_content_type(text)
        # Basic character count as starting point
        char_count = len(text)
        # Word count (most tokenizers roughly track words, but with variations)
        # word_count = len(text.split()) # Less reliable than char count for tokens
        # Feature 7: Use refined estimates based on content type
        ratio = CHAR_PER_TOKEN_ESTIMATE.get(
            content_type, CHAR_PER_TOKEN_ESTIMATE["default"]
        )
        estimated_tokens = char_count / ratio
        # Model-specific adjustments (can be added if needed)
        # if self.is_claude(model_name):
        #     estimated_tokens *= 1.05  # Example adjustment
        return int(estimated_tokens)
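    # Worked example of the fallback estimate (assuming CHAR_PER_TOKEN_ESTIMATE contains,
    # e.g., {"default": 4.0, "code": 3.0}; the real values are defined elsewhere in this
    # file): 1,200 characters of detected code would be estimated as 1200 / 3.0 = 400 tokens.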
    def _count_structured_tokens(self, text: str, content_type: str, encoder) -> int:
        """Apply specialized token counting for different content types.
        Different types of content (code, JSON, tables) tokenize differently.
        This method applies content-specific optimizations.
        Args:
            text: The text to count tokens for
            content_type: The detected content type
            encoder: The tiktoken encoder to use
        Returns:
            Optimized token count for the specific content type
        """
        if content_type == "code":
            # Code tends to have special tokens, indentation, etc.
            # For now, we'll use the regular encoder but we could add
            # code-specific optimizations here
            return len(encoder.encode(text, disallowed_special=()))
        elif content_type == "json":
            # JSON has many special characters that become tokens
            # For accurate counting, we apply the regular encoder
            return len(encoder.encode(text, disallowed_special=()))
        elif content_type == "table":
            # Tables have special handling in some models
            # For now, use regular encoding
            return len(encoder.encode(text, disallowed_special=()))
        else:
            # Regular text
            return len(encoder.encode(text, disallowed_special=()))
    def count_tokens(self, text: str, model_name: str = "default") -> int:
        """Count tokens in text using the appropriate tokenizer for the model.
        This enhanced method uses model-specific tokenization approaches and
        content-specific optimizations for more accurate token counts across
        different types of content and model families.
        Args:
            text: The text to count tokens for
            model_name: The model name to determine tokenization approach
        Returns:
            Token count for the provided text
        """
        if not text:
            return 0
        # Check cache first for performance optimization
        if self.valves.enable_token_cache:
            cache_key = f"{model_name}:{hash(text)}"
            if cache_key in self.token_cache:
                self.token_cache_stats["hits"] += 1
                return self.token_cache[cache_key]
            else:
                self.token_cache_stats["misses"] += 1
        # Determine content type for specialized handling
        content_type = "text"
        if self.valves.enable_content_detection:
            content_type = self._detect_content_type(text)
            self._update_content_type_stats(content_type)
        # Use tiktoken if available
        if TIKTOKEN_AVAILABLE:
            # Get appropriate encoding based on model family
            encoding_name = self._get_appropriate_encoding(model_name)
            # Get or create encoder
            if encoding_name not in self.encoders:
                try:
                    self.encoders[encoding_name] = tiktoken.get_encoding(encoding_name)
                except Exception as e:
                    logger.error(f"Failed to get encoding {encoding_name}: {e}")
                    return self._fallback_count_tokens(text, model_name)
            encoder = self.encoders[encoding_name]
            # Apply model-specific preprocessing for better accuracy
            processed_text = text
            if self.is_claude(model_name):
                processed_text = self._preprocess_for_claude(text)
            elif self.is_gemini(model_name):
                processed_text = self._preprocess_for_gemini(text)
            # Apply content-specific counting if enabled
            if self.valves.content_specific_counting and content_type != "text":
                token_count = self._count_structured_tokens(
                    processed_text, content_type, encoder
                )
            else:
                token_count = len(encoder.encode(processed_text, disallowed_special=()))
        else:
            # Enhanced fallback token counting method when tiktoken not available
            token_count = self._fallback_count_tokens(text, model_name)
        # Apply model-specific correction factor
        model_factor = self.valves.model_correction_factors.get(model_name, 1.0)
        # Apply content-type correction factor
        content_factor = self.valves.content_correction_factors.get(content_type, 1.0)
        corrected_tokens = int(token_count * model_factor * content_factor)
        logger.debug(f"Count tokens for '{model_name}' ({content_type}): Base={token_count}, ModelFactor={model_factor:.4f}, ContentFactor={content_factor:.4f} -> Corrected={corrected_tokens}")
        # Add to cache with timestamp. The corrected count is cached so that cache hits
        # return the same value as freshly computed counts.
        if self.valves.enable_token_cache:
            self.token_cache[cache_key] = corrected_tokens
            # Record the access time
            self.token_cache_entry_times[cache_key] = time.time()
            # Periodically check if cache needs pruning (every 100 misses)
            if self.token_cache_stats["misses"] % 100 == 0:
                self._prune_token_cache()
        # Return the corrected token count
        return corrected_tokens
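    # Minimal usage sketch for count_tokens (assuming `counter` is an instance of this
    # class created by OpenWebUI):
    #   n = counter.count_tokens("def add(a, b):\n    return a + b", "openai/gpt-4o")
    # `n` is the tiktoken count (or the fallback estimate) scaled by any model and
    # content-type correction factors configured in the valves.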
    def _prune_token_cache(self) -> None:
        """Prune the token cache to prevent unbounded memory growth.
        This method uses both time-based and size-based approaches to maintain
        the token cache at a reasonable size:
        1. Time-based: Removes entries older than CACHE_TTL (default: 5 days)
        2. Size-based: If the cache exceeds the configured maximum size, removes
                      the oldest entries until it's within the limit
        The pruning approach prioritizes keeping recently used entries while
        removing stale ones that are unlikely to be used again.
        """
        now = time.time()
        prune_count = 0
        # Track the last time we did a full prune to avoid doing it too often
        if now - self.token_cache_last_prune < 3600:  # Only do full prune once per hour
            return
        try:
            # 1. Time-based pruning: Remove entries older than CACHE_TTL
            expired_keys = []
            for key, timestamp in list(self.token_cache_entry_times.items()):
                if now - timestamp > CACHE_TTL:
                    expired_keys.append(key)
            # Remove expired entries
            for key in expired_keys:
                if key in self.token_cache:
                    del self.token_cache[key]
                if key in self.token_cache_entry_times:
                    del self.token_cache_entry_times[key]
                prune_count += 1
            # 2. Size-based pruning: If still too large, remove oldest entries
            if len(self.token_cache) > self.valves.token_cache_size:
                # Sort by timestamp (oldest first)
                sorted_entries = sorted(
                    self.token_cache_entry_times.items(), key=lambda x: x[1]
                )
                # Calculate how many entries to remove
                excess = len(self.token_cache) - self.valves.token_cache_size
                keys_to_remove = [k for k, _ in sorted_entries[:excess]]
                # Remove excess entries (oldest first)
                for key in keys_to_remove:
                    if key in self.token_cache:
                        del self.token_cache[key]
                    if key in self.token_cache_entry_times:
                        del self.token_cache_entry_times[key]
                    prune_count += 1
            # Update prune stats
            self.token_cache_stats["prunes"] += prune_count
            # Update the last prune time
            self.token_cache_last_prune = now
            # Log pruning results
            if prune_count > 0:
                logger.debug(
                    f"Pruned {prune_count} entries from token cache. Cache size: {len(self.token_cache)}"
                )
        except Exception as e:
            # Non-critical operation - log but don't crash
            logger.error(f"Error pruning token cache: {e}")
    def _is_experimental_model(self, model_name: str) -> bool:
        """Determine if a model is experimental or newly released.
        This helps identify models that might benefit from cost comparisons
        with more established models.
        Args:
            model_name: The model name to check
        Returns:
            True if the model is considered experimental, False otherwise
        """
        # Indicators in name suggesting experimental status
        experimental_indicators = [
            "preview",
            "alpha",
            "beta",
            "test",
            "exp",
            "experimental",
            "4.5",
            "3.7",
            "2.5",
            "o3",  # Version indicators for new models
            "-latest",
            "-dev",
            "-preview",
        ]
        # Normalize and lowercase for better matching
        normalized = self.normalize_model_name(model_name).lower()
        # Check for any experimental indicators
        for indicator in experimental_indicators:
            if indicator in normalized:
                return True
        # Also consider models not in our standard dictionary as experimental
        return model_name not in self.model_contexts
    def infer_model_family(self, model_name: str) -> Tuple[str, int]:
        """Infer model family and context size based on patterns in model name.
        This is a smart detection system that looks for specific keywords in model names
        and returns the most likely model family and context size.
        Returns:
            Tuple of (family_name, context_size)
        """
        model_name = model_name.lower()
        # Map of model family keywords to their likely context sizes
        family_patterns = [
            # Newest models - most specific first
            {"pattern": "o3", "family": "o3", "context": 128000},
            {"pattern": "opus", "family": "claude", "context": 200000},
            {"pattern": "claude-3", "family": "claude", "context": 200000},
            {"pattern": "claude-3.5", "family": "claude", "context": 200000},
            {"pattern": "claude-3.7", "family": "claude", "context": 200000},
            {"pattern": "claude-instant", "family": "claude", "context": 100000},
            {"pattern": "claude-haiku", "family": "claude", "context": 150000},
            {"pattern": "claude-sonnet", "family": "claude", "context": 180000},
            {"pattern": "gemini-flash", "family": "gemini", "context": 1000000},
            {"pattern": "gemini-2.5", "family": "gemini", "context": 256000},
            {"pattern": "gemini-pro", "family": "gemini", "context": 256000},
            {"pattern": "gpt-4o", "family": "gpt4o", "context": 128000},
            {"pattern": "gpt-4o-mini", "family": "gpt4o", "context": 128000},
            {"pattern": "gpt-4-turbo", "family": "gpt4", "context": 128000},
            {"pattern": "gpt-4-32k", "family": "gpt4", "context": 32768},
            {"pattern": "gpt-4", "family": "gpt4", "context": 8192},
            {"pattern": "gpt-3.5", "family": "gpt35", "context": 16385},
            # Other major model families
            {"pattern": "mixtral", "family": "mixtral", "context": 32768},
            {"pattern": "mistral", "family": "mistral", "context": 32768},
            {"pattern": "llama-3.1-70b", "family": "llama", "context": 131072},
            {"pattern": "llama-3.1", "family": "llama", "context": 131072},
            {"pattern": "llama-3-70b", "family": "llama", "context": 8192},
            {"pattern": "llama-3", "family": "llama", "context": 8192},
            {"pattern": "llama-2", "family": "llama", "context": 4096},
            {"pattern": "qwen-2.5", "family": "qwen", "context": 128000},
            {"pattern": "qwen-2", "family": "qwen", "context": 32768},
            {"pattern": "qwen-1.5", "family": "qwen", "context": 32768},
            {"pattern": "qwen", "family": "qwen", "context": 32768},
            {"pattern": "phi-3", "family": "phi", "context": 32768},
            {"pattern": "phi-2", "family": "phi", "context": 4096},
            {"pattern": "command-r-plus", "family": "cohere", "context": 128000},
            {"pattern": "command-r", "family": "cohere", "context": 128000},
            {"pattern": "command", "family": "cohere", "context": 128000},
            {"pattern": "grok-2", "family": "grok", "context": 131072},
            {"pattern": "grok-1.5", "family": "grok", "context": 131072},
            {"pattern": "grok-1", "family": "grok", "context": 8192},
            {"pattern": "grok", "family": "grok", "context": 8192},
        ]
        # Check for matches in the patterns list
        for entry in family_patterns:
            if entry["pattern"] in model_name:
                logger.debug(
                    f"Inferred family '{entry['family']}' for model '{model_name}' with context {entry['context']}"
                )
                return (entry["family"], entry["context"])
        # If no match found, return default using the defined fallback
        logger.debug(
            f"Could not infer family for model '{model_name}', using fallback context {self.fallback_context_size}"
        )
        return ("unknown", self.fallback_context_size)
    class ModelNotRecognizedError(Exception):
        """Exception raised when a model cannot be recognized."""
        pass
    def get_context_size(
        self, model_name: str, __user__: Optional[dict] = None
    ) -> int:  # Feature 1: Add __user__
        """Get the context size for a model.
        Raises:
            ModelNotRecognizedError: If the model cannot be recognized and no context size
                                    can be determined.
        """
        if not model_name:
            raise self.ModelNotRecognizedError("Model name is empty or missing")
        # Feature 1: Check user aliases first (handle both dict and Pydantic model for valves)
        resolved_model_name = model_name
        if __user__ and "valves" in __user__:
            user_valves_data = __user__["valves"]
            aliases = None
            if isinstance(
                user_valves_data, self.UserValves
            ):  # Check if it's the Pydantic model instance
                aliases = getattr(user_valves_data, "model_aliases", None)
            elif isinstance(user_valves_data, dict):  # Check if it's a dictionary
                aliases = user_valves_data.get("model_aliases")
            if aliases and isinstance(aliases, dict):
                resolved_model_name = aliases.get(model_name, model_name)
                if resolved_model_name != model_name:
                    logger.debug(
                        f"Resolved alias '{model_name}' to '{resolved_model_name}'"
                    )
        try:
            # Try exact match first (using resolved name)
            if resolved_model_name in self.model_contexts:
                context_size = self.model_contexts[resolved_model_name]
                logger.debug(
                    f"Model '{resolved_model_name}' detected via: Exact Match. Context: {context_size}"
                )
                return context_size
            # Try with OR. prefix (for OpenRouter models without prefixes)
            or_prefixed = self.add_or_prefix(resolved_model_name)
            if or_prefixed in self.model_contexts:
                context_size = self.model_contexts[or_prefixed]
                logger.debug(
                    f"Model '{resolved_model_name}' detected via: OR Prefix Match ('{or_prefixed}'). Context: {context_size}"
                )
                return context_size
            # Try normalized name match
            normalized = self.normalize_model_name(resolved_model_name)
            if normalized in self.model_contexts:
                context_size = self.model_contexts[normalized]
                logger.debug(
                    f"Model '{resolved_model_name}' detected via: Normalized Match ('{normalized}'). Context: {context_size}"
                )
                return context_size
            # --- Phase 2: Fuzzy matching ---
            best_match = None
            best_score = 0
            threshold = self.valves.fuzzy_match_threshold
            for known_model in self.model_contexts.keys():
                score = self._simple_similarity(normalized, known_model)
                if score > best_score:
                    best_score = score
                    best_match = known_model
            if best_score >= threshold:
                context_size = self.model_contexts[best_match]
                logger.debug(
                    f"Model '{resolved_model_name}' detected via: Fuzzy Match ('{best_match}', {best_score}%). Context: {context_size}"
                )
                return context_size
            # Try to match by known model family detection methods
            if self.is_gpt4o(resolved_model_name):
                context_size = self.GPT4O_CONTEXT_SIZE
                logger.debug(
                    f"Model '{resolved_model_name}' detected via: Specific Check (is_gpt4o). Context: {context_size}"
                )
                return context_size
            if self.is_claude(resolved_model_name):
                context_size = 200000  # Latest Claude models have 200K context by default
                logger.debug(
                    f"Model '{resolved_model_name}' detected via: Specific Check (is_claude). Context: {context_size}"
                )
                return context_size
            if self.is_gemini(resolved_model_name):
                context_size = self.GEMINI_FLASH_CONTEXT_SIZE  # Default to the larger context
                logger.debug(
                    f"Model '{resolved_model_name}' detected via: Specific Check (is_gemini). Context: {context_size}"
                )
                return context_size
            # --- REMOVED Fallback: No longer inferring family as a last resort ---
            # family, context_size = self.infer_model_family(resolved_model_name)
            # if family != "unknown":
            #     logger.debug(
            #         f"Model '{resolved_model_name}' detected via: Family Inference ('{family}'). Context: {context_size}"
            #     )
            #     # Cache this result for future lookups
            #     self.model_contexts[resolved_model_name] = context_size
            #     return context_size
            # If we get here, the model could not be recognized after all checks
            model_name_display = (
                resolved_model_name
                if len(resolved_model_name) <= 30
                else f"{resolved_model_name[:27]}..."
            )
            logger.warning(f"Model not recognized: '{resolved_model_name}'")
            # Log unknown model if enabled
            if self.valves.log_unknown_models:
                try:
                    log_dir = os.path.join(os.getcwd(), "logs")
                    os.makedirs(log_dir, exist_ok=True)
                    log_path = os.path.join(log_dir, "unknown_models.log")
                    # Rotate log if too big
                    max_bytes = self.valves.unknown_models_log_max_size_kb * 1024
                    if (
                        os.path.exists(log_path)
                        and os.path.getsize(log_path) > max_bytes
                    ):
                        rotated_path = log_path + ".1"
                        if os.path.exists(rotated_path):
                            os.remove(rotated_path)
                        os.rename(log_path, rotated_path)
                    # Deduplicate recent entries (simple approach)
                    recent_lines = set()
                    if os.path.exists(log_path):
                        with open(log_path, "r", encoding="utf-8") as f:
                            for line in f.readlines()[-100:]:
                                recent_lines.add(line.strip())
                    entry = f"{datetime.now().isoformat()} | {resolved_model_name}"
                    if entry not in recent_lines:
                        with open(log_path, "a", encoding="utf-8") as f:
                            f.write(entry + "\n")
                except Exception as e:
                    logger.error(
                        f"Failed to log unknown model '{resolved_model_name}': {e}"
                    )
            # Raise error with detection source info
            raise self.ModelNotRecognizedError(
                f"'{model_name_display}' unknown context size. Detection source: none"
            )
        except self.ModelNotRecognizedError:
            # Re-raise model not recognized errors
            raise
        except Exception as e:
            logger.error(f"Error getting context size: {e}")
            raise self.ModelNotRecognizedError(
                f"Error determining context size for model '{resolved_model_name}': {str(e)}"
            )
    def _simple_similarity(self, s1: str, s2: str) -> int:
        """Compute a simple similarity ratio (0-100) based on Levenshtein distance."""
        if s1 == s2:
            return 100
        len_s1 = len(s1)
        len_s2 = len(s2)
        max_len = max(len_s1, len_s2)
        if max_len == 0:
            return 100
        dist = self._levenshtein_distance(s1, s2)
        similarity = int(100 * (1 - dist / max_len))
        return similarity
    def _levenshtein_distance(self, s1: str, s2: str) -> int:
        """Compute Levenshtein distance between two strings."""
        if len(s1) < len(s2):
            return self._levenshtein_distance(s2, s1)
        if len(s2) == 0:
            return len(s1)
        previous_row = list(range(len(s2) + 1))
        for i, c1 in enumerate(s1):
            current_row = [i + 1]
            for j, c2 in enumerate(s2):
                insertions = previous_row[j + 1] + 1
                deletions = current_row[j] + 1
                substitutions = previous_row[j] + (c1 != c2)
                current_row.append(min(insertions, deletions, substitutions))
            previous_row = current_row
        return previous_row[-1]
    def _format_number(self, num: int) -> str:
        """Format large numbers to use K, M suffixes for better readability."""
        if num >= 1000000:
            return f"{num/1000000:.1f}M"
        elif num >= 1000:
            return f"{num/1000:.1f}K"
        else:
            return f"{num}"
    def _format_progress_bar(self, percentage: float) -> str:
        """Format a progress bar based on the configured style."""
        style = self.valves.progress_bar_style.lower()
        if style == "none":
            return ""
        if style == "minimal":
            # Show a single block if usage > 0, otherwise empty
            return PROGRESS_CHARS[1] if percentage > 0 else PROGRESS_CHARS[0]
        if style != "standard":
            # Unrecognized styles fall back to the standard bar
            logger.warning(f"Unrecognized progress_bar_style '{self.valves.progress_bar_style}', defaulting to 'standard'.")
        # Standard style: scale the filled segment to the configured bar length
        filled_length = int(self.valves.bar_length * percentage / 100)
        bar = PROGRESS_CHARS[1] * filled_length + PROGRESS_CHARS[0] * (
            self.valves.bar_length - filled_length
        )
        return f"[{bar}]"  # Return the bar enclosed in brackets
    def _format_cost(self, cost: float, context_size: int) -> str:
        """Format cost with appropriate decimal precision based on context size and amount.
        For very small costs (common with large context windows like 1M+), this ensures
        that at least the first two significant digits are visible, making the cost
        more readable and meaningful to users.
        Args:
            cost: The calculated cost amount
            context_size: The context window size (used to adjust formatting logic)
        Returns:
            Formatted cost string with appropriate precision
        """
        # For tiny costs (very large contexts like 1M+), show enough decimals to display first 2 significant digits
        if cost < 0.01:
            # Small costs: show six decimal places
            return f"${cost:.6f}"
        # For small contexts/higher costs, use fewer decimals
        elif cost < 0.1:
            return f"${cost:.5f}"
        elif cost < 1.0:
            return f"${cost:.4f}"
        elif cost < 10.0:
            return f"${cost:.3f}"
        else:
            return f"${cost:.2f}"
    def _generate_status_message(
        self,
        status_prefix,
        total_tokens,
        context_size,
        context_percentage,
        progress_bar,
        input_tokens,
        output_tokens,
        cost,
        model_name,  # Added model_name for cost breakdown
        elapsed_time=None,
        tokens_per_second=None,
        rolling_rate=None,  # Added rolling_rate
        cost_comparisons=None,
        status_level="normal",
        budget_info=None, # Daily/Session budget info
        trimming_hint=None,
        # --- Add new parameters ---
        monthly_budget_info: Optional[Dict] = None, # Pass pre-calculated monthly info
        daily_spend: float = 0.0,
        text_tokens: int = 0,
        image_tokens: int = 0,
        calibration_status: str = "Approximate",
    ) -> str:
        """Generate a status message with controlled length to avoid truncation.
        Prioritizes different elements of the status message based on importance:
        1. Base information (tokens, context percentage)
        2. Token breakdown
        3. Cost information (with visual breakdown in detailed mode)
        4. Budget Information
        5. Performance metrics
        6. Trimming Hint (if applicable)
        7. Cost comparisons (only for experimental models)
        8. Cache Hit Rate (detailed/debug)
        9. Error Rate (detailed/debug)
        Args:
            Various status elements to include in the message
        Returns:
            Formatted status message with appropriate length
        """
        max_length = 120  # Define max length early
        # ----- GUARANTEED NON-TRUNCATING DISPLAY MODES -----
        # Base information (highest priority)
        status_message = f"{status_prefix}{TOKEN_EMOJI} {self._format_number(total_tokens)}/{self._format_number(context_size)} tokens ({context_percentage:.2f}%)"
        # Add progress bar if enabled and style is not 'none'
        if self.valves.show_progress and self.valves.progress_bar_style.lower() != "none":
             # The _format_progress_bar function now returns the complete formatted bar (or single char)
             status_message += f" {progress_bar}"
        # Add token breakdown (high priority)
        token_breakdown = f" | š½{self._format_number(input_tokens)}/š¼{self._format_number(output_tokens)}"  # Simpler breakdown with new emojis
        # Add cost information (medium priority)
        cost_info = ""
        if self.valves.show_cost_summary:
            # Removed cost_indicator logic for fallback pricing
            # cost_indicator = ("*" if pricing_source_fallback else "")
            # Improvement #4: Visual Cost Breakdown
            input_cost_part = input_tokens * self._get_model_pricing(
                model_name, None
            ).get(
                "input", 0
            )  # Recalculate parts for ratio
            output_cost_part = output_tokens * self._get_model_pricing(
                model_name, None
            ).get("output", 0)
            # REMOVE cost breakdown percentages
            cost_info = f" | {MONEY_EMOJI} {cost}" # Removed cost_indicator
        # Feature 2 & Item 1/7: Combine Daily Budget/Spend for brevity
        budget_msg = ""
        if budget_info and budget_info["budget"] > 0 and self.valves.show_budget_info:
            # Format: Spent/Total (% Used)
            daily_spend_formatted = self._format_cost(daily_spend, context_size)
            daily_budget_formatted = self._format_cost(budget_info['budget'], context_size)
            daily_percent_formatted = f"{budget_info['used_percent']:.1f}%" # Use 1 decimal for brevity
            daily_warning = f"{WARNING_EMOJI} " if budget_info.get("warning", False) else ""
            budget_msg = f" | {daily_warning}{BUDGET_EMOJI} Daily: {daily_spend_formatted}/{daily_budget_formatted} ({daily_percent_formatted})"
        elif self.valves.show_daily_spend_info and daily_spend > 0: # Show only spend if budget is off/0
            daily_spend_formatted = self._format_cost(daily_spend, context_size)
            budget_msg = f" | 📅 {daily_spend_formatted} spent today"
        # Add monthly budget info if enabled and available (using pre-calculated info)
        monthly_budget_msg = ""
        if self.valves.show_monthly_budget_info and monthly_budget_info: # Keep monthly separate for now
            effective_monthly_budget = monthly_budget_info.get("budget", 0.0)
            current_month_cost = monthly_budget_info.get("current_month_cost", 0.0)
            if effective_monthly_budget > 0:  # Budget is enabled
                # Use the result of _format_cost directly as it includes '$'
                remaining_monthly_formatted = self._format_cost(monthly_budget_info['remaining'], context_size)
                used_percent_formatted = f"{monthly_budget_info['used_percent']:.2f}"
                monthly_warning = f"{WARNING_EMOJI} " if monthly_budget_info.get("warning", False) else ""
                # Remove the extra '$' from the f-string
                monthly_budget_msg = f" | {monthly_warning}šļø {remaining_monthly_formatted} left ({used_percent_formatted}%) this month"
            else:  # Budget is disabled (0), show total spent
                # Use the result of _format_cost directly as it includes '$'
                current_month_cost_formatted = self._format_cost(current_month_cost, context_size)
                 # Remove the extra '$' from the f-string
                monthly_budget_msg = f" | šļø {current_month_cost_formatted} spent this month"
        # Add daily spend info if enabled (currently folded into budget_msg above; kept for reference)
        daily_spend_msg = ""
        if self.valves.show_daily_spend_info and daily_spend > 0:
            daily_spend_formatted = self._format_cost(daily_spend, context_size)  # Format daily spend
            daily_spend_msg = f" | 📅 {daily_spend_formatted} spent today"
        # Add text/image split if enabled
        text_image_msg = ""
        if self.valves.show_text_image_split and (text_tokens > 0 or image_tokens > 0): # Show if either is non-zero
             text_image_msg = f" | Text: {self._format_number(text_tokens)}"
             if image_tokens > 0:
                 # Add ~ to indicate image tokens are estimated
                 text_image_msg += f", Img: ~{self._format_number(image_tokens)}"
        # Add trimming hint if enabled and applicable
        trimming_hint_msg = ""
        # Use the passed trimming_hint directly if it exists and the valve is enabled
        if self.valves.show_trimming_hint and trimming_hint:
             trimming_hint_msg = f" | {trimming_hint}"
        # Add cost optimization hints if enabled
        cost_optimization_msg = ""
        if self.valves.enable_cost_optimization_hints:
            # 1. High prompt cost hint
            if input_tokens > 0 and input_tokens > output_tokens * 0.5:  # If input is significant compared to output
                input_cost = input_tokens * self._get_model_pricing(model_name, None).get("input", 0)
                if input_cost >= self.valves.prompt_cost_warning_threshold:
                    cost_optimization_msg = f" | š” Consider summarizing input to reduce cost"
            
            # 2. Expensive model suggestion
            if not cost_optimization_msg:  # Only show one hint at a time
                model_input_price = self._get_model_pricing(model_name, None).get("input", 0)
                model_output_price = self._get_model_pricing(model_name, None).get("output", 0)
                # Check if either input or output price exceeds threshold
                if model_input_price > self.valves.expensive_model_cost_threshold or model_output_price > self.valves.expensive_model_cost_threshold:
                    # Suggest cheaper alternatives based on model family
                    if "gpt-4" in model_name.lower() or "o1" in model_name.lower():
                        cost_optimization_msg = f" | š” For simpler tasks, try GPT-3.5 or Claude Haiku"
                    elif "claude-3" in model_name.lower() and "haiku" not in model_name.lower():
                        cost_optimization_msg = f" | š” For simpler tasks, try Claude Haiku"
                    elif "gemini-pro" in model_name.lower():
                        cost_optimization_msg = f" | š” For simpler tasks, try Gemini Flash"
            
            # 3. Budget limit warnings with specific hints
            if not cost_optimization_msg:  # Only show one hint at a time
                # Daily/session budget warning
                if budget_info and budget_info.get("warning", False):
                    cost_optimization_msg = f" | š” Daily budget nearing limit. Consider switching model"
                # Monthly budget warning
                elif monthly_budget_info and monthly_budget_info.get("warning", False):
                    cost_optimization_msg = f" | š” Monthly budget nearing limit. Consider switching model"
        # Add calibration status if enabled (using the passed status string)
        calibration_status_msg = ""
        if self.valves.show_calibration_status and calibration_status:
             # Determine emoji based on the content of the status string
             if "Calibrated" in calibration_status:
                 cal_emoji = "🔧"
             elif "Unknown" in calibration_status or "Approximate" in calibration_status:
                 cal_emoji = WARNING_EMOJI # Use warning for unknown/approximate
             else:
                 cal_emoji = WARNING_EMOJI # Default to warning for any other status
             # Use the passed status string, optionally removing timestamp
             status_text = calibration_status
             if not self.valves.show_calibration_timestamp: # Item 3: Check new valve
                 # Remove timestamp part like " (as of Apr 16 14:30)"
                 status_text = re.sub(r"\s+\(as of .*\)", "", calibration_status)
             calibration_status_msg = f" | {cal_emoji} {status_text}"
        # Add performance metrics if enabled (lower priority)
        performance_msg = ""
        if self.valves.show_metrics_panel and elapsed_time and elapsed_time > 0:
            # Use rolling rate if available and valid, otherwise use overall rate
            rate_to_display = (
                rolling_rate
                if rolling_rate is not None and rolling_rate > 0
                else tokens_per_second
            )
            rate_indicator = (
                "~" if rolling_rate is not None and rolling_rate > 0 else ""
            )  # Indicate rolling avg
            performance_msg = f" | {CLOCK_EMOJI} {elapsed_time:.1f}s ({rate_indicator}{rate_to_display:.1f} t/s)"
        # Add cost comparisons for experimental models (lowest priority)
        comparisons_msg = ""
        if cost_comparisons:
            comparisons_msg = " | Compare: " + ", ".join(
                [
                    f"{m.split('/')[-1].split('-')[0]}: {self._format_cost(c, context_size)}"
                    for m, c in cost_comparisons.items()
                ]
            )
        # Item 6: Add Cache Hit Rate (controlled by valve, not just debug)
        cache_msg = ""
        if self.valves.show_cache_hit_rate: # Use the new valve
            hits = self.token_cache_stats["hits"]
            misses = self.token_cache_stats["misses"]
            total_lookups = hits + misses
            if total_lookups > 0:
                hit_rate = (
                    hits / (total_lookups + 1e-9)
                ) * 100  # Avoid division by zero
                cache_msg = (
                    f" | {CACHE_EMOJI} Cache: {hit_rate:.1f}% ({hits}/{total_lookups})"
                )
            else:
                cache_msg = f" | {CACHE_EMOJI} Cache: N/A"
        # Item 6: Add Error Rate (controlled by valve, show even if 0 if valve is on)
        error_msg = ""
        if self.valves.show_error_rate: # Use the new valve
            errors = self.session_stats["error_count"]
            requests = self.session_stats["requests"]
            # Show even if errors are 0, if the valve is enabled
            error_msg = f" | Errors: {errors}/{requests}"
        # --- Assemble the message, checking length ---
        # Always include token breakdown (high priority)
        if len(status_message) + len(token_breakdown) <= max_length:
            status_message += token_breakdown
        # Add cost information if fits
        if len(status_message) + len(cost_info) <= max_length:
            status_message += cost_info
        # Feature 2: Add Budget info if fits
        if len(status_message) + len(budget_msg) <= max_length:
            status_message += budget_msg
        # Add performance metrics if fits
        if len(status_message) + len(performance_msg) <= max_length:
            status_message += performance_msg
        # Add Trimming Hint if applicable and fits
        trim_msg = ""
        if status_level == "critical" and trimming_hint:
            trim_msg = f" {trimming_hint}"
            if len(status_message) + len(trim_msg) <= max_length:
                status_message += trim_msg
            else:  # Fallback if hint is too long
                clear_suggestion = " (Clear Chat?)"
            if len(status_message) + len(clear_suggestion) <= max_length:
                status_message += clear_suggestion
        elif (
            status_level == "critical"
        ):  # Default suggestion if hint disabled/not generated
            clear_suggestion = " (Clear Chat?)"
            if len(status_message) + len(clear_suggestion) <= max_length:
                status_message += clear_suggestion
        # Add Cache Rate if fits (detailed/debug)
        if len(status_message) + len(cache_msg) <= max_length:
            status_message += cache_msg
        # Add Error Rate if fits (detailed/debug)
        if len(status_message) + len(error_msg) <= max_length:
            status_message += error_msg
        # Item 6: Add cost comparisons if fits (controlled by valve)
        if self.valves.show_cost_comparisons:
            if len(status_message) + len(comparisons_msg) <= max_length:
                status_message += comparisons_msg
            elif comparisons_msg and len(status_message) <= max_length - 30:
                # If full comparisons won't fit, add a shortened version
                # Just include first model comparison
                short_comparison = " | Compare: " + ", ".join(
                    [
                        f"{m.split('/')[-1].split('-')[0]}: {self._format_cost(c, context_size)}"
                        for m, c in list(cost_comparisons.items())[:1]
                    ]
                )
                if len(status_message) + len(short_comparison) <= max_length:
                    status_message += short_comparison
        # Append remaining optional UI elements if enabled (ignore length limit for these)
        # Note: Daily spend is now part of the combined budget_msg or shown if budget is off
        status_message += monthly_budget_msg # Monthly budget kept separate
        # status_message += daily_spend_msg # Removed as it's combined or conditional in budget_msg
        status_message += text_image_msg
        # Add image token warning if enabled and images are present
        image_token_warning_msg = ""
        if self.valves.show_image_token_warning and image_tokens > 0:
            image_token_warning_msg = f" | ~{self._format_number(image_tokens)} img tokens (est.)"
        status_message += image_token_warning_msg
        status_message += trimming_hint_msg
        status_message += calibration_status_msg
        return status_message
    def _is_experimental_model(self, model_name: str) -> bool:
        """Determine if a model is experimental or newly released.
        This helps identify models that might benefit from cost comparisons
        with more established models.
        Args:
            model_name: The model name to check
        Returns:
            True if the model is considered experimental, False otherwise
        """
        # Indicators in name suggesting experimental status
        experimental_indicators = [
            "preview",
            "alpha",
            "beta",
            "test",
            "exp",
            "experimental",
            "4.5",
            "3.7",
            "2.5",
            "o3",  # Version indicators for new models
            "-latest",
            "-dev",
            "-preview",
        ]
        # Normalize and lowercase for better matching
        normalized = self.normalize_model_name(model_name).lower()
        # Check for any experimental indicators
        for indicator in experimental_indicators:
            if indicator in normalized:
                return True
        # Also consider models not in our standard dictionary as experimental
        return model_name not in self.model_contexts
    def _find_similar_model(self, model_name: str) -> str:
        """Find a similar known model to use for estimates when exact model is unknown.
        This method attempts to match the unknown model to a known model family
        by examining patterns in the model name, allowing for reasonable
        pricing and context estimates even when a model isn't explicitly recognized.
        Args:
            model_name: The unrecognized model name to find a similar model for
        Returns:
            The name of a similar known model to use for estimates
        """
        # Normalize the model name for better pattern matching
        normalized = self.normalize_model_name(model_name)
        # Try to match by model family
        if self.is_gpt4o(model_name) or "gpt-4" in normalized:
            return "gpt-4"
        elif self.is_claude(model_name):
            return "claude-3-sonnet"
        elif self.is_gemini(model_name):
            return "gemini-pro"
        elif "mistral" in normalized:
            return "mistral-medium"
        elif "llama" in normalized:
            return "meta-llama/llama-3-70b-instruct"
        elif "qwen" in normalized:
            return "qwen-2.5-72b"
        # Default fallback
        return "gpt-3.5-turbo"
    def _get_model_pricing(
        self, model_name: str, __user__: Optional[dict] = None
    ) -> Dict[str, float]:  # Feature 1: Add __user__
        """Get pricing information for a model."""
        # DEBUG: Log the input model name
        logger.debug(f"Entering _get_model_pricing with model_name: '{model_name}'")
        # Feature 1: Check user aliases first (handle both dict and Pydantic model for valves)
        resolved_model_name = model_name
        if __user__ and "valves" in __user__:
            user_valves_data = __user__["valves"]
            aliases = None
            if isinstance(
                user_valves_data, self.UserValves
            ):  # Check if it's the Pydantic model instance
                aliases = getattr(user_valves_data, "model_aliases", None)
            elif isinstance(user_valves_data, dict):  # Check if it's a dictionary
                aliases = user_valves_data.get("model_aliases")
            if aliases and isinstance(aliases, dict):
                resolved_model_name = aliases.get(model_name, model_name)
                if resolved_model_name != model_name:
                    logger.debug(
                        f"Pricing: Resolved alias '{model_name}' to '{resolved_model_name}'"
                    )
        # DEBUG: Log the resolved model name used for lookup
        logger.debug(f"Resolved model name for pricing lookup: '{resolved_model_name}'")
        if not resolved_model_name:
            logger.warning("Empty model name passed to _get_model_pricing, using default fallback.")
            return {"input": 0.0000015, "output": 0.000002} # Default to GPT-3.5 pricing
        # --- Pricing Lookup Logic ---
        pricing_info = None
        source = "None"
        # 1. Try exact match
        if resolved_model_name in self.model_pricing:
            pricing_info = self.model_pricing[resolved_model_name]
            source = "Exact Match"
        # 2. Try normalized name
        else:
            normalized = self.normalize_model_name(resolved_model_name)
            if normalized in self.model_pricing:
                pricing_info = self.model_pricing[normalized]
                source = "Normalized Match"
        # --- Validation and Fallback ---
        # Check if pricing was found and if it's non-zero (unless it's a known free model)
        is_free_model = "free" in resolved_model_name.lower()
        if pricing_info and (pricing_info.get("input", 0) > 0 or pricing_info.get("output", 0) > 0 or is_free_model):
            logger.debug(f"Pricing for '{resolved_model_name}' found via {source}: {pricing_info}")
            return pricing_info
        else:
            # If pricing is missing or zero for a non-free model, log a warning and use generic fallback
            # DEBUG: Include current pricing keys and original model name in the warning
            current_pricing_keys = list(self.model_pricing.keys())
            logger.warning(f"Pricing not found or zero via {source} for resolved_model_name='{resolved_model_name}' (original input='{model_name}'). Using generic fallback (GPT-3.5). Current pricing keys: {current_pricing_keys}")
            # The hardcoded_pricing dictionary is local to __init__ and merged into self.model_pricing.
            # We don't need to access it separately here. The check above handles if the merged value is valid.
            # If it wasn't found or was invalid, we use the generic fallback.
            return {"input": 0.0000015, "output": 0.000002} # Generic fallback pricing
    def _calculate_cost(
        self,
        input_tokens: int,
        output_tokens: int,
        model_name: str,
        __user__: Optional[dict] = None,
    ) -> float:  # Feature 1: Add __user__
        """Calculate the cost of a conversation based on token usage."""
        # --- Free model detection ---
        if "free" in model_name.lower():
            logger.info(
                f"Cost calculation skipped: '{model_name}' detected as free model."
            )
            return 0.0
        pricing = self._get_model_pricing(
            model_name, __user__
        )  # Feature 1: Pass __user__
        # Log the pricing being used for debugging
        logger.debug(f"Calculating cost for '{model_name}' using pricing: Input=${pricing.get('input', 0):.8f}, Output=${pricing.get('output', 0):.8f}")
        input_cost = input_tokens * pricing.get("input", 0) # Use .get with default 0
        output_cost = output_tokens * pricing.get("output", 0) # Use .get with default 0
        total_cost = input_cost + output_cost
        # Apply compensation factor (for profit, rounding, etc.)
        if self.valves.compensation != 1.0:
            total_cost *= self.valves.compensation
        return total_cost
    async def _emit_status(
        self, event_emitter, description: str, done: bool = False
    ) -> None:
        """Emit a status event to update the UI."""
        if not event_emitter or not self.valves.show_status:
            return
        logger.debug(f"Emitting status: {description}, done={done}")
        # Use simplified status to ensure compatibility
        message = {"type": "status", "data": {"description": description, "done": done}}
        try:
            await event_emitter(message) # Correct location for await
            logger.debug("Status emitted successfully")
        except Exception as e:
            logger.error(f"Error emitting status: {str(e)}")
    # INLET METHOD (Pre-processing)
    def inlet(self, body: dict, __user__: Optional[dict] = None) -> dict:
        """Process requests before they reach the LLM.
        This is the entry point for the Filter function. It initializes timing,
        session stats, and prepares for token counting. It also checks input prompt cost.
        Args:
            body: The request body containing messages
            __user__: Optional user information
        Returns:
            The potentially modified request body
        """
        # Start timing for performance metrics
        self.start_time = time.time()
        # Increment request counter
        self.request_counter += 1
        self.session_stats["requests"] += 1
        # Feature 2: Reset session cost if tracking mode is 'session'
        if self.valves.budget_tracking_mode == "session":
            self.session_stats["session_cost"] = 0.0
            logger.debug("Reset session cost for new request.")
        # Log the request
        logger.debug(f"Processing request #{self.request_counter}")
        # --- Inlet Cost Prediction ---
        if self.valves.prompt_cost_warning_threshold > 0:
            try:
                model_name = body.get("model", "default")
                messages = body.get("messages", [])
                # Consider only the last user message for prompt cost check
                last_user_message_content = ""
                if messages and messages[-1].get("role") == "user":
                    content = messages[-1].get("content", "")
                    if isinstance(content, str):
                        last_user_message_content = content
                if last_user_message_content:
                    prompt_tokens = self.count_tokens(
                        last_user_message_content, model_name
                    )
                    pricing = self._get_model_pricing(model_name, __user__)
                    estimated_input_cost = prompt_tokens * pricing.get("input", 0)
                    if (
                        estimated_input_cost
                        >= self.valves.prompt_cost_warning_threshold
                    ):
                        logger.warning(
                            f"Estimated input cost (${estimated_input_cost:.6f}) for the current prompt "
                            f"exceeds threshold (${self.valves.prompt_cost_warning_threshold:.6f}). "
                            f"Model: {model_name}, Prompt Tokens: {prompt_tokens}"
                        )
                        # Note: Emitting status from inlet might be unreliable/unsupported in OpenWebUI
                        # await self._emit_status(__event_emitter__, f"ā ļø High Prompt Cost: ~${estimated_input_cost:.4f}", False) # Requires __event_emitter__ in signature
            except Exception as e:
                logger.error(f"Error during inlet cost prediction: {e}")
        # --- End Inlet Cost Prediction ---
        # Return the unmodified body (this filter doesn't modify inputs)
        return body
    # STREAM METHOD (Real-time processing)
    def stream(self, event: dict) -> dict:
        """Process streamed chunks from the LLM in real-time.
        This enhanced method tracks token generation speed in real-time using an adaptive window.
        Args:
            event: The stream event from the LLM
        Returns:
            The potentially modified event
        """
        # Get request ID from the event
        request_id = event.get("id", "")
        # Only process events for the current request to maintain status persistence
        # This prevents overwriting status of previous messages
        if not request_id or request_id == self.current_request_id:
            # Get current time for performance measurements
            now = time.time()
            # Initialize stream timing if not started
            if self.stream_start_time is None:
                self.stream_start_time = now
                self.stream_token_counter = 0
                self.stream_history.clear()  # Clear history for new stream
                # Reset deque maxlen based on config
                self.stream_history = deque(maxlen=self.valves.rate_avg_window_max)
            # Count tokens in this chunk
            chunk_token_count = 0
            for choice in event.get("choices", []):
                if "delta" in choice and "content" in choice.get("delta", {}):
                    content = choice["delta"]["content"]
                    # Use simple approximation for streaming - refine later with actual tokenizer if needed
                    token_count = len(content) // 4
                    self.stream_token_counter += token_count
                    chunk_token_count += token_count  # Track tokens in this specific chunk for rolling rate
            # Update rolling rate history
            if chunk_token_count > 0:
                self.stream_history.append((now, self.stream_token_counter))
            # Calculate and update rate periodically using configurable interval
            if now - self.last_stream_update >= self.valves.stream_update_interval:
                elapsed = now - self.stream_start_time
                rolling_rate = None
                # --- Adaptive Token Rate Averaging ---
                window_size = len(self.stream_history)
                if self.valves.adaptive_rate_averaging and window_size > 1:
                    # Determine adaptive window based on current overall rate
                    current_overall_rate = (
                        self.stream_token_counter / elapsed if elapsed > 0 else 0
                    )
                    if current_overall_rate > self.valves.rate_fast_threshold:
                        adaptive_window = self.valves.rate_avg_window_min
                    elif current_overall_rate < self.valves.rate_slow_threshold:
                        adaptive_window = self.valves.rate_avg_window_max
                    else:
                        # Linear interpolation between min and max window size
                        ratio = (
                            current_overall_rate - self.valves.rate_slow_threshold
                        ) / (
                            self.valves.rate_fast_threshold
                            - self.valves.rate_slow_threshold
                            + 1e-9
                        )  # Avoid div by zero
                        adaptive_window = int(
                            self.valves.rate_avg_window_max
                            - ratio
                            * (
                                self.valves.rate_avg_window_max
                                - self.valves.rate_avg_window_min
                            )
                        )
                        adaptive_window = max(
                            self.valves.rate_avg_window_min,
                            min(adaptive_window, self.valves.rate_avg_window_max),
                        )
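                        # Illustrative numbers (not the actual valve defaults): with slow/fast
                        # thresholds of 5 and 50 t/s and a window range of 3..10, an overall
                        # rate of 27.5 t/s gives ratio 0.5 and adaptive_window = int(10 - 0.5 * 7) = 6.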
                    # Adjust deque maxlen if needed (only if different from current)
                    if self.stream_history.maxlen != adaptive_window:
                        logger.debug(f"Adapting rate window to: {adaptive_window}")
                        # Create new deque with new maxlen and populate with recent history
                        new_history = deque(
                            list(self.stream_history)[-adaptive_window:],
                            maxlen=adaptive_window,
                        )
                        self.stream_history = new_history
                        window_size = len(
                            self.stream_history
                        )  # Update window size after potential resize
                    # Use the (potentially adapted) window slice, ensuring at least 2 samples
                    calc_window = min(
                        window_size, self.stream_history.maxlen
                    )  # Use current maxlen
                    if calc_window > 1:
                        # Correct slicing: Use negative index to get last 'calc_window' elements
                        relevant_history = list(self.stream_history)[-calc_window:]
                        first_time, first_tokens = relevant_history[0]
                        last_time, last_tokens = relevant_history[-1]
                        time_diff = last_time - first_time
                        token_diff = last_tokens - first_tokens
                        if time_diff > 0.1:  # Avoid division by zero or unstable rates
                            rolling_rate = token_diff / time_diff
                            logger.debug(
                                f"Adaptive Rate: Window={calc_window}, Rate={rolling_rate:.1f} t/s"
                            )
                # Fallback to overall rate if rolling rate couldn't be calculated
                if (
                    rolling_rate is None
                    and elapsed > 0
                    and self.stream_token_counter > 0
                ):
                    self.current_token_rate = self.stream_token_counter / elapsed
                elif rolling_rate is not None:
                    self.current_token_rate = (
                        rolling_rate  # Store the calculated rolling rate
                    )
                # --- End Adaptive Token Rate Averaging ---
                self.last_stream_update = now
        else:
            logger.debug(f"Ignoring stream event for non-current request: {request_id}")
        return event
    # OUTLET METHOD (Post-processing)
    async def outlet(
        self,
        body: dict,
        __event_emitter__: Callable[[Any], Awaitable[None]],
        __user__: Optional[dict] = None,
        __model__: Optional[dict] = None,
    ) -> dict:
        """Process responses after they've been generated by the LLM.
        This is where the main token counting, cost calculation, and metrics display happen.
        Args:
            body: The response body containing messages
            __event_emitter__: Function to emit status updates to UI
            __user__: Optional user information
            __model__: Optional model information
        Returns:
            The potentially modified response body
        """
        logger.debug("Entering outlet method...")
        # Re-parse custom models from valves *before* processing to ensure latest config is used
        logger.debug("Re-parsing custom models in outlet...")
        self._parse_and_apply_plaintext_models()
        logger.debug(f"Model pricing after re-parsing in outlet: {self.model_pricing}")
        # DEBUG: Log entire response body keys and usage info
        try:
            logger.debug(f"Response body keys: {list(body.keys())}")
            logger.debug(f"Response body 'usage': {body.get('usage')}")
        except Exception:
            pass
        # Store the current request ID to maintain status persistence
        self.current_request_id = body.get("id", f"request-{self.request_counter}")
        logger.debug(f"Processing outlet for request ID: {self.current_request_id}")
        # End timing for performance metrics
        end_time = time.time()
        elapsed_time = (
            end_time - self.start_time if self.start_time else 0
        )  # Handle case where inlet might not have run
        # Extract model information
        model_name = __model__.get("id", "default") if __model__ else "default"
        # Get all messages
        messages = body.get("messages", [])
        # Extract the last assistant message
        get_last_assistant_message(messages)
        # Count tokens in all messages
        input_tokens = 0
        output_tokens = 0
        total_text_tokens = 0
        total_image_tokens = 0
        # Store token counts per message for trimming hint
        message_token_counts = []
        for idx, message in enumerate(messages):
            role = message.get("role", "")
            content = message.get("content", "")
            msg_tokens = 0
            image_tokens = 0
            text_tokens = 0
            if isinstance(content, str):
                # --- Image token estimation ---
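                # Detects markdown images, inline data URIs, and bare image URLs, e.g.
                # "![diagram](https://example.com/arch.png)" or "data:image/png;base64,...".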
                images = []
                images += re.findall(r"!\[.*?\]\((.*?)\)", content)
                if "data:image/" in content:
                    images += re.findall(r'(data:image/[^"\')\s]+)', content)
                images += re.findall(
                    r"(https?://[^\s)]+(?:\.png|\.jpg|\.jpeg|\.gif|\.webp|\.bmp|\.tiff))",
                    content,
                    re.IGNORECASE,
                )
                for img in images:
                    override = self.valves.model_image_token_overrides.get(
                        model_name
                    )
                    if override:
                        est_tokens = override
                    else:
                        est_tokens = self.valves.default_image_tokens
                    image_tokens += est_tokens
                # Count text tokens only
                text_tokens = self.count_tokens(content, model_name)
                # Add image tokens to message tokens
                msg_tokens = text_tokens + image_tokens
                # Sum separate totals
                total_text_tokens += text_tokens
                total_image_tokens += image_tokens
                if role == "user":
                    input_tokens += msg_tokens
                elif role == "assistant":
                    output_tokens += msg_tokens
            # Store token count for trimming hint analysis
            message_token_counts.append(
                {"role": role, "tokens": msg_tokens, "index": idx}
            )
        # Calculate total tokens (after processing all messages)
        total_tokens = input_tokens + output_tokens
        # Update session stats
        self.session_stats["total_tokens"] = total_tokens
        self.session_stats["input_tokens"] = input_tokens
        self.session_stats["output_tokens"] = output_tokens
        try:
            # Try to get context size and calculate percentage (Pass __user__ for alias check - Feature 1)
            context_size = self.get_context_size(model_name, __user__)
            context_percentage = (
                (total_tokens / context_size) * 100 if context_size > 0 else 0
            )
            # Calculate performance metrics using the potentially adaptive rate from stream
            tokens_per_second = (
                self.current_token_rate if self.current_token_rate > 0 else 0.0
            )
            if (
                elapsed_time > 0 and tokens_per_second == 0.0
            ):  # Fallback if stream rate wasn't set
                tokens_per_second = output_tokens / elapsed_time
            self.session_stats["avg_tokens_per_sec"] = tokens_per_second
            self.session_stats["message_generation_time"] = elapsed_time
            # Calculate cost (Pass __user__ for alias check - Feature 1)
            total_cost = self._calculate_cost(
                input_tokens, output_tokens, model_name, __user__
            )
            self.session_stats["total_cost"] = total_cost
            # Feature 2: Update and get current period's cost
            current_period_cost = 0.0
            if self.valves.budget_tracking_mode == "session":
                self.session_stats["session_cost"] += total_cost
                current_period_cost = self.session_stats["session_cost"]
            elif self.valves.budget_tracking_mode == "daily":
                current_period_cost = self._load_and_update_daily_cost(total_cost)
                self.session_stats["daily_cost"] = (
                    current_period_cost  # Store for potential display
                )
            # Feature 2: Calculate DAILY/SESSION budget info
            budget_info = None
            user_budget = self.valves.budget_amount # Default global daily/session budget
            if __user__ and "valves" in __user__:
                user_valves_data = __user__["valves"]
                override_budget = None
                if isinstance(
                    user_valves_data, self.UserValves
                ):  # Pydantic model instance
                    override_budget = getattr(
                        user_valves_data, "budget_amount", None
                    )
                elif isinstance(user_valves_data, dict):  # Dictionary
                    override_budget = user_valves_data.get("budget_amount")
                if override_budget is not None:
                    user_budget = override_budget # Use user override if set and valid
            if user_budget > 0: # Only calculate if budget is enabled
                budget_remaining = user_budget - current_period_cost
                budget_used_percent = (current_period_cost / user_budget * 100) if user_budget > 0 else 0
                budget_warning = (budget_used_percent >= self.valves.budget_warning_percentage)
                budget_info = {
                    "budget": user_budget, # The budget amount used (daily/session)
                    "remaining": budget_remaining,
                    "used_percent": budget_used_percent,
                    "warning": budget_warning,
                }
            # --- Monthly Budget Calculation (using potential user override) ---
            user_monthly_budget = self.valves.monthly_budget_amount # Default global monthly budget
            if __user__ and "valves" in __user__:
                 user_valves_data = __user__["valves"]
                 override_monthly_budget = None
                 if isinstance(user_valves_data, self.UserValves): # Pydantic model
                     override_monthly_budget = getattr(user_valves_data, "monthly_budget_amount", None)
                 elif isinstance(user_valves_data, dict): # Dictionary
                     override_monthly_budget = user_valves_data.get("monthly_budget_amount")
                 if override_monthly_budget is not None:
                     user_monthly_budget = override_monthly_budget # Use user override
            # Update the monthly cost file *now* with the final total_cost for this request
            # This ensures the cost is persisted correctly before generating the final status message
            monthly_cost_data = self._load_and_update_monthly_cost(total_cost)
            # Recalculate monthly budget info using the potentially overridden budget
            monthly_budget_info = None
            if user_monthly_budget > 0 and monthly_cost_data:
                current_month_total_cost = monthly_cost_data.get("current_month_cost", 0.0)
                monthly_remaining = user_monthly_budget - current_month_total_cost
                monthly_used_percent = (current_month_total_cost / user_monthly_budget * 100) if user_monthly_budget > 0 else 0
                monthly_warning = (monthly_used_percent >= self.valves.budget_warning_percentage)
                monthly_budget_info = {
                    "budget": user_monthly_budget, # The effective monthly budget
                    "remaining": monthly_remaining,
                    "used_percent": monthly_used_percent,
                    "warning": monthly_warning,
                    "current_month_cost": current_month_total_cost # Include total cost for display if budget=0
                }
            elif monthly_cost_data: # Handle case where budget is 0 but we want to show total spent
                 monthly_budget_info = {
                     "budget": 0.0,
                     "remaining": 0.0,
                     "used_percent": 0.0,
                     "warning": False,
                     "current_month_cost": monthly_cost_data.get("current_month_cost", 0.0)
                 }
            # Format metrics for display
            progress_bar = self._format_progress_bar(context_percentage)
            # Determine status level based on context usage
            status_level = "normal"
            if context_percentage >= self.valves.critical_at_percentage:
                status_level = "critical"
            elif context_percentage >= self.valves.warn_at_percentage:
                status_level = "warning"
            # --- Intelligent Trimming Hint ---
            trimming_hint = None
            if (
                status_level == "critical"
                and self.valves.intelligent_trimming_hint
                and len(messages) > 1
            ):
                tokens_to_trim = 0
                # Analyze first few messages (excluding potential system prompt at index 0)
                num_msgs_to_analyze = min(
                    self.valves.trimming_hint_message_count, len(messages) - 1
                )
                start_index = 1 if messages[0].get("role") == "system" else 0
                # Ensure we don't go out of bounds and analyze at least one message if possible
                end_index = min(
                    start_index + num_msgs_to_analyze, len(message_token_counts)
                )
                if end_index > start_index:
                    for i in range(start_index, end_index):
                        tokens_to_trim += message_token_counts[i]["tokens"]
                    if tokens_to_trim > 0:
                        trimming_hint = f"{SCISSORS_EMOJI} Trim ~{self._format_number(tokens_to_trim)} tokens?"
            # --- End Intelligent Trimming Hint ---
            # Reset stream tracking for next generation
            self.stream_token_counter = 0
            self.stream_start_time = None
            self.stream_history.clear()  # Clear history
            self.current_token_rate = 0.0  # Reset rate
            # Determine status prefix based on context usage
            if context_percentage >= self.valves.critical_at_percentage:
                status_prefix = f"{WARNING_EMOJI} CRITICAL: "
            elif context_percentage >= self.valves.warn_at_percentage:
                status_prefix = f"{WARNING_EMOJI} WARNING: "
            else:
                status_prefix = ""
            # Get cost comparisons for experimental models
            cost_comparisons = None
            if (
                self._is_experimental_model(model_name)
                and self.valves.show_cost_summary
            ):
                # Select models to compare with - use correct model ID format
                comparison_models = ["openai/gpt-4o"]  # Removed claude-3-sonnet as requested
                cost_comparisons = {}
                # Calculate costs for comparison models
                for comp_model in comparison_models:
                    if comp_model != model_name:  # Don't compare to self
                        # Pass __user__ for alias check (Feature 1)
                        comp_pricing = self._get_model_pricing(comp_model, __user__)
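                        # Apply the comparison model's input/output rates to this request's token counts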
                        comp_cost = (input_tokens * comp_pricing["input"]) + (
                            output_tokens * comp_pricing["output"]
                        )
                        cost_comparisons[comp_model] = comp_cost
            # Format cost with appropriate precision
            formatted_cost = self._format_cost(total_cost, context_size)
            # Use the calibration status loaded during initialization
            calibration_status = self.calibration_status_display
            # Generate optimized status message with controlled length, passing the calculated monthly info
            status_message = self._generate_status_message(
                status_prefix=status_prefix,
                total_tokens=total_tokens,
                context_size=context_size,
                context_percentage=context_percentage,
                progress_bar=progress_bar,
                input_tokens=input_tokens,
                output_tokens=output_tokens,
                cost=formatted_cost,
                model_name=model_name,  # Pass model_name for cost breakdown
                elapsed_time=elapsed_time,
                tokens_per_second=tokens_per_second,  # Pass the calculated rate
                rolling_rate=(
                    self.current_token_rate
                    if self.valves.adaptive_rate_averaging
                    else None
                ),  # Pass rolling rate if adaptive
                cost_comparisons=cost_comparisons,
                status_level=status_level,
                budget_info=budget_info, # Pass daily/session budget info
                monthly_budget_info=monthly_budget_info, # Pass the final monthly budget info
                trimming_hint=trimming_hint,  # Pass the hint
                # --- Pass additional metrics (Ensure they are defined) ---
                daily_spend=self.session_stats.get("daily_cost", 0.0), # Use .get with default
                text_tokens=total_text_tokens, # Already defined in this scope
                image_tokens=total_image_tokens, # Already defined in this scope
                calibration_status=calibration_status, # Pass loaded status
            )
            # Update user cost tracking if user info is available
            if __user__ and "email" in __user__:
                self._update_user_cost(
                    __user__["email"],
                    model_name,
                    input_tokens,
                    output_tokens,
                    total_cost,
                )
            logger.info(
                f"Request #{self.request_counter} completed: {total_tokens} tokens ({context_percentage:.1f}%), ${total_cost:.6f}"
            )
            # Emit final status update to UI
            if self.valves.show_status:
                await self._emit_status(__event_emitter__, status_message, True)
            # Return the unmodified body (this filter doesn't modify outputs)
            logger.debug("Exiting outlet method (main try block successful).")  # DEBUG
            return body
        except self.ModelNotRecognizedError as e:
            # Model not recognized - Log the error and emit a clear status message
            logger.warning(f"Model not recognized: {model_name}. Cannot determine context size or pricing.")
            self.session_stats["error_count"] += 1
            # Generate a simple error message for the UI
            status_message = f"{WARNING_EMOJI} Model not recognized: '{model_name}'"
            # Emit the error status update
            if self.valves.show_status:
                logger.debug("Attempting final status emission for unrecognized model...")
                await self._emit_status(__event_emitter__, status_message, True)
            # --- Calibration discrepancy logging (still useful even if model unknown) ---
            try:
                ui_usage = body.get("usage", {})
                plugin_total = total_tokens
                ui_total = ui_usage.get("total_tokens", 0)
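                # Only log when the UI actually reported usage and the difference exceeds 5%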
                if (
                    ui_total and abs(plugin_total - ui_total) / max(ui_total, 1) > 0.05
                ):  # >5% diff
                    log_dir = os.path.join(os.getcwd(), "logs")
                    os.makedirs(log_dir, exist_ok=True)
                    log_path = os.path.join(log_dir, "token_discrepancies.log")
                    with open(log_path, "a", encoding="utf-8") as f:
                        f.write(
                            f"{datetime.now().isoformat()} | Model: {model_name} | Plugin: {plugin_total} | UI: {ui_total} | Diff: {plugin_total - ui_total}\n"
                        )
            except Exception as e:
                logger.error(f"Error logging token discrepancy: {e}")
            # Return the unmodified body (this filter doesn't modify outputs)
            logger.debug("Exiting outlet method (main try block successful).")  # DEBUG
            return body
        except (KeyError, TypeError, ValueError) as specific_err:
            logger.error(f"Data or calculation error in outlet processing: {specific_err.__class__.__name__} - {str(specific_err)}")
            logger.exception("Exception details:") # Keep traceback for context
            self.session_stats["error_count"] += 1
            # Emit a generic error status if possible
            if self.valves.show_status:
                try:
                    error_message = f"{WARNING_EMOJI} ERROR processing data"
                    await self._emit_status(__event_emitter__, error_message, True)
                except Exception as emit_err:
                    logger.error(f"Error emitting status during specific error handling: {emit_err}")
        except Exception as e:  # General exception handler for the main try block
            logger.error(f"General error in outlet processing: {e}")  # Clarify log message
            logger.exception("Exception details:")  # Log full traceback
            self.session_stats["error_count"] += 1  # Increment error count
            # Emit an error status if possible
            if self.valves.show_status:
                try:
                    error_message = f"{WARNING_EMOJI} ERROR: {str(e)}"
                    await self._emit_status(__event_emitter__, error_message, True)
                except Exception as emit_err:  # Catch specific exception during emission
                    logger.error(f"Error emitting status during general exception handling: {emit_err}")
            # Return the unmodified body in case of error
            logger.debug("Exiting outlet method (except block).")  # DEBUG
            return body