Our paper on runtime out-of-domain detection for domain-specific LLMs has been accepted to AAAI 2026! 🎉
- aidphoenix82
- 4 days ago
- 1 min read
LLMs are now widely deployed across specialized domains. Such domain-specific models typically build on a foundation model and are adapted to the target domain through techniques such as retrieval-augmented generation, full fine-tuning, or parameter-efficient fine-tuning. For these models to behave responsibly, their behavior should remain strictly confined to the intended professional scope: queries that fall outside the predefined domain should be identified and appropriately declined. Motivated by this requirement, the paper proposes a runtime monitoring framework that detects anomalies in the model's internal representations to flag potentially out-of-domain user queries, enabling refusal or safety-aligned responses in subsequent stages.
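The paper itself isn't reproduced here, but to give a feel for what "anomaly detection on internal representations" can look like, here is a minimal sketch of one common approach: fit a Gaussian to the hidden states of known in-domain queries during calibration, then flag queries whose hidden states sit too far from that distribution under the Mahalanobis distance. Everything in the sketch is an illustrative assumption rather than the paper's actual method; in particular, the hidden states are synthetic stand-ins for what a real deployment would extract from the model (e.g., the final-layer state of the last token).

```python
# Minimal sketch of representation-based out-of-domain detection.
# Hypothetical setup: in a real system, each feature vector would be a hidden
# state extracted from the deployed model; here we use synthetic vectors so
# the sketch runs on its own.

import numpy as np

rng = np.random.default_rng(0)
DIM = 64

# --- Calibration: hidden states of known in-domain queries (synthetic here) ---
in_domain_states = rng.normal(loc=0.0, scale=1.0, size=(500, DIM))

mu = in_domain_states.mean(axis=0)
cov = np.cov(in_domain_states, rowvar=False)
# Regularize before inverting, in case the sample covariance is ill-conditioned.
cov_inv = np.linalg.inv(cov + 1e-3 * np.eye(DIM))

def mahalanobis_score(h: np.ndarray) -> float:
    """Distance of a hidden state from the in-domain distribution."""
    d = h - mu
    return float(np.sqrt(d @ cov_inv @ d))

# Threshold chosen so ~99% of calibration queries pass (an arbitrary choice).
threshold = np.percentile([mahalanobis_score(h) for h in in_domain_states], 99)

def is_out_of_domain(h: np.ndarray) -> bool:
    """Runtime check: flag hidden states far from the calibration distribution."""
    return mahalanobis_score(h) > threshold

# --- Runtime checks on new queries ---
in_query = rng.normal(0.0, 1.0, DIM)   # resembles the calibration data
ood_query = rng.normal(4.0, 1.0, DIM)  # shifted distribution: plausibly out-of-domain
print(is_out_of_domain(in_query))   # expected: False
print(is_out_of_domain(ood_query))  # expected: True
```

A detector like this only produces a flag; as the abstract notes, turning that flag into a refusal or a safety-aligned response is handled in a subsequent stage of the pipeline.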