arxiv:2510.24592

ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization

Published on Oct 28 · Submitted by Guoxin Chen on Oct 30
Abstract

ReForm, a reflective autoformalization method, improves the semantic accuracy of formal statements generated from natural language mathematics through iterative refinement and semantic consistency evaluation.

AI-generated summary

Autoformalization, which translates natural language mathematics into machine-verifiable formal statements, is critical for using formal mathematical reasoning to solve math problems stated in natural language. While Large Language Models can generate syntactically correct formal statements, they often fail to preserve the original problem's semantic intent. This limitation arises because existing LLM approaches treat autoformalization as a simple translation task, lacking the mechanisms for self-reflection and iterative refinement that human experts naturally employ. To address these issues, we propose ReForm, a Reflective Autoformalization method that tightly integrates semantic consistency evaluation into the autoformalization process. This enables the model to iteratively generate formal statements, assess their semantic fidelity, and self-correct identified errors through progressive refinement. To effectively train this reflective model, we introduce Prospective Bounded Sequence Optimization (PBSO), which employs different rewards at different sequence positions to ensure that the model develops both accurate autoformalization and correct semantic validation, preventing superficial critiques that would undermine the purpose of reflection. Extensive experiments across four autoformalization benchmarks demonstrate that ReForm achieves an average improvement of 17.2 percentage points over the strongest baselines. To further ensure evaluation reliability, we introduce ConsistencyCheck, a benchmark of 859 expert-annotated items that not only validates LLMs as judges but also reveals that autoformalization is inherently difficult: even human experts produce semantic errors in up to 38.5% of cases.
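The failure mode described above is a statement that type-checks yet changes the problem's meaning. A minimal Lean 4 illustration of that gap (a hypothetical example, not taken from the paper; it assumes Mathlib for ℝ):

```lean
import Mathlib

-- Natural-language problem (hypothetical): "Show that for every real number x
-- greater than 1, x² is greater than x."

-- Faithful formalization: keeps the hypothesis x > 1 from the problem.
theorem faithful (x : ℝ) (hx : 1 < x) : x < x ^ 2 := by sorry

-- Syntactically valid but semantically unfaithful: the hypothesis x > 1 is
-- dropped, so the statement no longer matches the problem (it is false at x = 1/2).
theorem unfaithful (x : ℝ) : x < x ^ 2 := by sorry
```

Both statements type-check (proofs left as `sorry`), so a compiler pass alone cannot tell them apart; catching the second one requires the kind of semantic validation ReForm builds into the generation loop.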

Community

From the paper author and submitter:

ReForm is a Reflective Autoformalization framework that enables LLMs to iteratively generate, validate, and self-correct formal mathematical statements (Lean 4) through an integrated generation-validation loop.

  • Reflective Autoformalization Paradigm: Introduces an iterative "generate → validate → refine" cycle that enables models to autonomously identify and correct semantic errors, unifying generation and verification in a single process (a minimal sketch follows this list).

  • Prospective Bounded Sequence Optimization (PBSO): A novel RL algorithm designed for heterogeneous rewards at different sequence positions, enabling stable training of a single model that is both an accurate autoformalizer and a reliable semantic validator (see the position-dependent reward sketch after this list).

  • ConsistencyCheck Benchmark: 859 expert-annotated items for evaluating semantic consistency, revealing that even human experts produce errors in up to 38.5% of cases.
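Taken together, the loop and the PBSO reward can be sketched in a few lines of Python. This is a minimal reconstruction from the bullet points above, not the paper's implementation: the function names (generate_formal, check_semantics, refine), the span bookkeeping, and the reward values are all assumptions.

```python
# Hypothetical sketch of ReForm's generate → validate → refine loop and a
# PBSO-style position-dependent reward, reconstructed from the summary above.
# All interfaces here are assumed, not taken from the paper's code.

def reflective_autoformalize(problem_nl, model, max_rounds=3):
    """Iteratively draft a formal statement, self-validate it, and revise."""
    statement = model.generate_formal(problem_nl)  # initial Lean 4 draft
    for _ in range(max_rounds):
        critique = model.check_semantics(problem_nl, statement)  # self-validation
        if critique.is_consistent:  # model judges the draft faithful: stop early
            break
        statement = model.refine(problem_nl, statement, critique)  # self-correct
    return statement


def pbso_token_rewards(num_tokens, formal_span, critique_span, r_formal, r_critique):
    """Assign heterogeneous rewards by sequence position, as PBSO is described:
    statement tokens are credited with an autoformalization reward, critique
    tokens with a separate validation reward, so superficial critiques are
    penalized independently of the final statement's quality."""
    rewards = [0.0] * num_tokens
    for i in range(*formal_span):    # tokens of the generated formal statement
        rewards[i] = r_formal        # e.g. semantic-consistency score
    for i in range(*critique_span):  # tokens of the self-critique
        rewards[i] = r_critique      # e.g. agreement with a reference verdict
    return rewards
```

Separating the two reward streams by position is what distinguishes this from a single trajectory-level reward: the validation tokens receive their own training signal, which is the stated mechanism for preventing superficial critiques.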


Models citing this paper 2

Datasets citing this paper 1

Spaces citing this paper 0


Collections including this paper 1