TY - GEN
T1 - FROM MODELS TO MICROTHEORIES
T2 - 13th International Conference on Learning Representations, ICLR 2025
AU - Weir, Nathaniel
AU - Mishra, Bhavana Dalvi
AU - Weller, Orion
AU - Tafjord, Oyvind
AU - Hornstein, Samuel
AU - Sabol, Alexander
AU - Jansen, Peter
AU - Van Durme, Benjamin
AU - Clark, Peter
N1 - Publisher Copyright:
© 2025 13th International Conference on Learning Representations, ICLR 2025. All rights reserved.
PY - 2025
Y1 - 2025
N2 - Recent reasoning methods (e.g., chain-of-thought) help users understand how language models (LMs) answer a single question, but they do little to reveal the LM's overall understanding, or “theory,” about the question's topic, making it still hard to trust the model. Our goal is to materialize such theories - here called microtheories (a linguistic analog of logical microtheories (Blair et al., 1992)) - as a set of sentences encapsulating an LM's core knowledge about a topic. These statements systematically work together to entail answers to a set of questions to both engender trust and improve performance. Our approach is to first populate a knowledge store with (model-generated) sentences that entail answers to training questions, and then distill those down to a core microtheory which is concise, general, and non-redundant. We show that, when added to a general corpus (e.g., Wikipedia), microtheories can supply critical information not necessarily present in the corpus, improving both a model's ability to ground its answers to verifiable knowledge (i.e., show how answers are systematically entailed by documents in the corpus, grounding up to +8% more answers), and the accuracy of those grounded answers (up to +8% absolute). We also show that, in a human evaluation in the medical domain, our distilled microtheories contain a significantly higher concentration of topically critical facts than the non-distilled knowledge store. Finally, we show we can quantify the coverage of a microtheory for a topic (characterized by a dataset) using a notion of p-relevance. Together, these suggest that microtheories are an efficient distillation of an LM's topic-relevant knowledge, that they can usefully augment existing corpora, and can provide both performance gains and an interpretable, verifiable window into the model's knowledge of a topic.
AB - Recent reasoning methods (e.g., chain-of-thought) help users understand how language models (LMs) answer a single question, but they do little to reveal the LM's overall understanding, or “theory,” about the question's topic, making it still hard to trust the model. Our goal is to materialize such theories - here called microtheories (a linguistic analog of logical microtheories (Blair et al., 1992)) - as a set of sentences encapsulating an LM's core knowledge about a topic. These statements systematically work together to entail answers to a set of questions to both engender trust and improve performance. Our approach is to first populate a knowledge store with (model-generated) sentences that entail answers to training questions, and then distill those down to a core microtheory which is concise, general, and non-redundant. We show that, when added to a general corpus (e.g., Wikipedia), microtheories can supply critical information not necessarily present in the corpus, improving both a model's ability to ground its answers to verifiable knowledge (i.e., show how answers are systematically entailed by documents in the corpus, grounding up to +8% more answers), and the accuracy of those grounded answers (up to +8% absolute). We also show that, in a human evaluation in the medical domain, our distilled microtheories contain a significantly higher concentration of topically critical facts than the non-distilled knowledge store. Finally, we show we can quantify the coverage of a microtheory for a topic (characterized by a dataset) using a notion of p-relevance. Together, these suggest that microtheories are an efficient distillation of an LM's topic-relevant knowledge, that they can usefully augment existing corpora, and can provide both performance gains and an interpretable, verifiable window into the model's knowledge of a topic.
UR - https://www.scopus.com/pages/publications/105010265718
UR - https://www.scopus.com/pages/publications/105010265718#tab=citedBy
M3 - Conference contribution
AN - SCOPUS:105010265718
T3 - 13th International Conference on Learning Representations, ICLR 2025
SP - 49706
EP - 49731
BT - 13th International Conference on Learning Representations, ICLR 2025
PB - International Conference on Learning Representations, ICLR
Y2 - 24 April 2025 through 28 April 2025
ER -