
Our paper on unlearnable examples has been accepted to AAAI 2026! 🎉

  • aidphoenix82
  • 4 days ago
  • 1 min read

LLMs have been widely deployed across many specialized domains. Such domain-specific models typically build on a foundation model and are adapted to the target domain through techniques such as retrieval-augmented generation, full fine-tuning, or parameter-efficient fine-tuning. A responsible domain-specific language model should keep its behavior strictly within its intended professional scope: queries that fall outside the predefined domain should be identified and reasonably declined. Motivated by this requirement, the paper proposes a runtime monitoring framework that detects anomalies in the model’s internal representations to flag potentially out-of-domain user queries, enabling appropriate refusal or safety-aligned responses in subsequent stages.
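The post does not describe the paper's actual detector, but the general idea of flagging out-of-domain queries from internal representations can be illustrated with a minimal sketch: fit a simple statistical profile (here, a Gaussian) over hidden-state embeddings of in-domain queries, then score new queries by Mahalanobis distance and refuse those beyond a threshold. All function names, the embedding dimensionality, and the thresholding scheme below are illustrative assumptions, not the method from the paper.

```python
import numpy as np

def fit_domain_profile(embeddings):
    """Fit a Gaussian profile (mean, inverse covariance) over
    in-domain hidden-state embeddings of shape (n_samples, dim).
    This stand-in profile is an assumption for illustration."""
    mu = embeddings.mean(axis=0)
    cov = np.cov(embeddings, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])  # regularize so the matrix is invertible
    return mu, np.linalg.inv(cov)

def anomaly_score(x, mu, cov_inv):
    """Mahalanobis distance of one query embedding from the domain profile."""
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

def is_out_of_domain(x, mu, cov_inv, threshold):
    """Flag a query whose embedding lies far from the in-domain cluster."""
    return anomaly_score(x, mu, cov_inv) > threshold

# Toy demo with synthetic "embeddings" in place of real hidden states.
rng = np.random.default_rng(0)
in_domain = rng.normal(0.0, 1.0, size=(500, 8))   # in-domain query embeddings
mu, cov_inv = fit_domain_profile(in_domain)
ood_query = np.full(8, 6.0)                        # far from the in-domain cluster
```

A flagged query would then be routed to a refusal or safety-aligned response path instead of the domain model's normal generation, matching the runtime-monitoring setup described above.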


 
 
 

