Get trending papers in your email inbox once a day!
Get trending papers in your email inbox!
SubscribeS^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
Recent studies have demonstrated the effectiveness of LLM test-time scaling. However, existing approaches to incentivize LLMs' deep thinking abilities generally require large-scale data or significant training efforts. Meanwhile, it remains unclear how to improve the thinking abilities of less powerful base models. In this work, we introduce S^2R, an efficient framework that enhances LLM reasoning by teaching models to self-verify and self-correct during inference. Specifically, we first initialize LLMs with iterative self-verification and self-correction behaviors through supervised fine-tuning on carefully curated data. The self-verification and self-correction skills are then further strengthened by both outcome-level and process-level reinforcement learning, with minimized resource requirements, enabling the model to adaptively refine its reasoning process during inference. Our results demonstrate that, with only 3.1k self-verifying and self-correcting behavior initialization samples, Qwen2.5-math-7B achieves an accuracy improvement from 51.0\% to 81.6\%, outperforming models trained on an equivalent amount of long-CoT distilled data. Extensive experiments and analysis based on three base models across both in-domain and out-of-domain benchmarks validate the effectiveness of S^2R. Our code and data are available at https://github.com/NineAbyss/S2R.
Classifier-Free Diffusion Guidance
Classifier guidance is a recently introduced method to trade off mode coverage and sample fidelity in conditional diffusion models post training, in the same spirit as low temperature sampling or truncation in other types of generative models. Classifier guidance combines the score estimate of a diffusion model with the gradient of an image classifier and thereby requires training an image classifier separate from the diffusion model. It also raises the question of whether guidance can be performed without a classifier. We show that guidance can be indeed performed by a pure generative model without such a classifier: in what we call classifier-free guidance, we jointly train a conditional and an unconditional diffusion model, and we combine the resulting conditional and unconditional score estimates to attain a trade-off between sample quality and diversity similar to that obtained using classifier guidance.
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
In this report, we present a series of math-specific large language models: Qwen2.5-Math and Qwen2.5-Math-Instruct-1.5B/7B/72B. The core innovation of the Qwen2.5 series lies in integrating the philosophy of self-improvement throughout the entire pipeline, from pre-training and post-training to inference: (1) During the pre-training phase, Qwen2-Math-Instruct is utilized to generate large-scale, high-quality mathematical data. (2) In the post-training phase, we develop a reward model (RM) by conducting massive sampling from Qwen2-Math-Instruct. This RM is then applied to the iterative evolution of data in supervised fine-tuning (SFT). With a stronger SFT model, it's possible to iteratively train and update the RM, which in turn guides the next round of SFT data iteration. On the final SFT model, we employ the ultimate RM for reinforcement learning, resulting in the Qwen2.5-Math-Instruct. (3) Furthermore, during the inference stage, the RM is used to guide sampling, optimizing the model's performance. Qwen2.5-Math-Instruct supports both Chinese and English, and possess advanced mathematical reasoning capabilities, including Chain-of-Thought (CoT) and Tool-Integrated Reasoning (TIR). We evaluate our models on 10 mathematics datasets in both English and Chinese, such as GSM8K, MATH, GaoKao, AMC23, and AIME24, covering a range of difficulties from grade school level to math competition problems.
Active causal structure learning with advice
We introduce the problem of active causal structure learning with advice. In the typical well-studied setting, the learning algorithm is given the essential graph for the observational distribution and is asked to recover the underlying causal directed acyclic graph (DAG) G^* while minimizing the number of interventions made. In our setting, we are additionally given side information about G^* as advice, e.g. a DAG G purported to be G^*. We ask whether the learning algorithm can benefit from the advice when it is close to being correct, while still having worst-case guarantees even when the advice is arbitrarily bad. Our work is in the same space as the growing body of research on algorithms with predictions. When the advice is a DAG G, we design an adaptive search algorithm to recover G^* whose intervention cost is at most O(max{1, log psi}) times the cost for verifying G^*; here, psi is a distance measure between G and G^* that is upper bounded by the number of variables n, and is exactly 0 when G=G^*. Our approximation factor matches the state-of-the-art for the advice-less setting.
G^2RPO-A: Guided Group Relative Policy Optimization with Adaptive Guidance
Reinforcement Learning with Verifiable Rewards (RLVR) has markedly enhanced the reasoning abilities of large language models (LLMs). Its success, however, largely depends on strong base models with rich world knowledge, yielding only modest improvements for small-size language models (SLMs). To address this limitation, we investigate Guided GRPO, which injects ground-truth reasoning steps into roll-out trajectories to compensate for SLMs' inherent weaknesses. Through a comprehensive study of various guidance configurations, we find that naively adding guidance delivers limited gains. These insights motivate G^2RPO-A, an adaptive algorithm that automatically adjusts guidance strength in response to the model's evolving training dynamics. Experiments on mathematical reasoning and code-generation benchmarks confirm that G^2RPO-A substantially outperforms vanilla GRPO. Our code and models are available at https://github.com/T-Lab-CUHKSZ/G2RPO-A.
Volumes of Nullhomotopies in Nilpotent Spaces
The Shadowing Principle of Manin has proved a valuable tool for addressing questions of quantitative topology raised by Gromov in the late 1900s. The principle informally provides a way for bounded algebraic maps between differential graded algebras to be translated into nearby genuine maps between their geometric realizations. We extend this principle to finite towers of principal K(G,n) fibrations, and in particular apply this construction to nilpotent spaces. As a specific application of the extended principle, we provide upper bounds on the asymptotic behavior of volumes of nullhomotopies of Lipschitz maps into nilpotent spaces. We further refine these bounds in the case when c = 1 to nearly meet those of the simply connected setting. We similarly refine these bounds in the event the target space is coformal, and demonstrate that the bounds in this setting are nearly sharp.
Greed is Good: A Unifying Perspective on Guided Generation
Training-free guided generation is a widely used and powerful technique that allows the end user to exert further control over the generative process of flow/diffusion models. Generally speaking, two families of techniques have emerged for solving this problem for gradient-based guidance: namely, posterior guidance (i.e., guidance via projecting the current sample to the target distribution via the target prediction model) and end-to-end guidance (i.e., guidance by performing backpropagation throughout the entire ODE solve). In this work, we show that these two seemingly separate families can actually be unified by looking at posterior guidance as a greedy strategy of end-to-end guidance. We explore the theoretical connections between these two families and provide an in-depth theoretical of these two techniques relative to the continuous ideal gradients. Motivated by this analysis we then show a method for interpolating between these two families enabling a trade-off between compute and accuracy of the guidance gradients. We then validate this work on several inverse image problems and property-guided molecular generation.
Classifier-free Guidance with Adaptive Scaling
Classifier-free guidance (CFG) is an essential mechanism in contemporary text-driven diffusion models. In practice, in controlling the impact of guidance we can see the trade-off between the quality of the generated images and correspondence to the prompt. When we use strong guidance, generated images fit the conditioned text perfectly but at the cost of their quality. Dually, we can use small guidance to generate high-quality results, but the generated images do not suit our prompt. In this paper, we present beta-CFG (beta-adaptive scaling in Classifier-Free Guidance), which controls the impact of guidance during generation to solve the above trade-off. First, beta-CFG stabilizes the effects of guiding by gradient-based adaptive normalization. Second, beta-CFG uses the family of single-modal (beta-distribution), time-dependent curves to dynamically adapt the trade-off between prompt matching and the quality of samples during the diffusion denoising process. Our model obtained better FID scores, maintaining the text-to-image CLIP similarity scores at a level similar to that of the reference CFG.
Characteristic Guidance: Non-linear Correction for Diffusion Model at Large Guidance Scale
Popular guidance for denoising diffusion probabilistic model (DDPM) linearly combines distinct conditional models together to provide enhanced control over samples. However, this approach overlooks nonlinear effects that become significant when guidance scale is large. To address this issue, we propose characteristic guidance, a guidance method that provides first-principle non-linear correction for classifier-free guidance. Such correction forces the guided DDPMs to respect the Fokker-Planck (FP) equation of diffusion process, in a way that is training-free and compatible with existing sampling methods. Experiments show that characteristic guidance enhances semantic characteristics of prompts and mitigate irregularities in image generation, proving effective in diverse applications ranging from simulating magnet phase transitions to latent space sampling.
Using Explanations to Guide Models
Deep neural networks are highly performant, but might base their decision on spurious or background features that co-occur with certain classes, which can hurt generalization. To mitigate this issue, the usage of 'model guidance' has gained popularity recently: for this, models are guided to be "right for the right reasons" by regularizing the models' explanations to highlight the right features. Experimental validation of these approaches has thus far however been limited to relatively simple and / or synthetic datasets. To gain a better understanding of which model-guiding approaches actually transfer to more challenging real-world datasets, in this work we conduct an in-depth evaluation across various loss functions, attribution methods, models, and 'guidance depths' on the PASCAL VOC 2007 and MS COCO 2014 datasets, and show that model guidance can sometimes even improve model performance. In this context, we further propose a novel energy loss, show its effectiveness in directing the model to focus on object features. We also show that these gains can be achieved even with a small fraction (e.g. 1%) of bounding box annotations, highlighting the cost effectiveness of this approach. Lastly, we show that this approach can also improve generalization under distribution shifts. Code will be made available.
Dreamguider: Improved Training free Diffusion-based Conditional Generation
Diffusion models have emerged as a formidable tool for training-free conditional generation.However, a key hurdle in inference-time guidance techniques is the need for compute-heavy backpropagation through the diffusion network for estimating the guidance direction. Moreover, these techniques often require handcrafted parameter tuning on a case-by-case basis. Although some recent works have introduced minimal compute methods for linear inverse problems, a generic lightweight guidance solution to both linear and non-linear guidance problems is still missing. To this end, we propose Dreamguider, a method that enables inference-time guidance without compute-heavy backpropagation through the diffusion network. The key idea is to regulate the gradient flow through a time-varying factor. Moreover, we propose an empirical guidance scale that works for a wide variety of tasks, hence removing the need for handcrafted parameter tuning. We further introduce an effective lightweight augmentation strategy that significantly boosts the performance during inference-time guidance. We present experiments using Dreamguider on multiple tasks across multiple datasets and models to show the effectiveness of the proposed modules. To facilitate further research, we will make the code public after the review process.
Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2
We present AlphaGeometry2, a significantly improved version of AlphaGeometry introduced in Trinh et al. (2024), which has now surpassed an average gold medalist in solving Olympiad geometry problems. To achieve this, we first extend the original AlphaGeometry language to tackle harder problems involving movements of objects, and problems containing linear equations of angles, ratios, and distances. This, together with other additions, has markedly improved the coverage rate of the AlphaGeometry language on International Math Olympiads (IMO) 2000-2024 geometry problems from 66% to 88%. The search process of AlphaGeometry2 has also been greatly improved through the use of Gemini architecture for better language modeling, and a novel knowledge-sharing mechanism that combines multiple search trees. Together with further enhancements to the symbolic engine and synthetic data generation, we have significantly boosted the overall solving rate of AlphaGeometry2 to 84% for all geometry problems over the last 25 years, compared to 54% previously. AlphaGeometry2 was also part of the system that achieved silver-medal standard at IMO 2024 https://dpmd.ai/imo-silver. Last but not least, we report progress towards using AlphaGeometry2 as a part of a fully automated system that reliably solves geometry problems directly from natural language input.
The Numerical Stability of Hyperbolic Representation Learning
Given the exponential growth of the volume of the ball w.r.t. its radius, the hyperbolic space is capable of embedding trees with arbitrarily small distortion and hence has received wide attention for representing hierarchical datasets. However, this exponential growth property comes at a price of numerical instability such that training hyperbolic learning models will sometimes lead to catastrophic NaN problems, encountering unrepresentable values in floating point arithmetic. In this work, we carefully analyze the limitation of two popular models for the hyperbolic space, namely, the Poincar\'e ball and the Lorentz model. We first show that, under the 64 bit arithmetic system, the Poincar\'e ball has a relatively larger capacity than the Lorentz model for correctly representing points. Then, we theoretically validate the superiority of the Lorentz model over the Poincar\'e ball from the perspective of optimization. Given the numerical limitations of both models, we identify one Euclidean parametrization of the hyperbolic space which can alleviate these limitations. We further extend this Euclidean parametrization to hyperbolic hyperplanes and exhibits its ability in improving the performance of hyperbolic SVM.
Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling
Self-improvement methods enable large language models (LLMs) to generate solutions themselves and iteratively train on filtered, high-quality rationales. This process proves effective and reduces the reliance on human supervision in LLMs' reasoning, but the performance soon plateaus. We delve into the process and find that models tend to over-sample on easy queries and under-sample on queries they have yet to master. As iterations proceed, this imbalance in sampling is exacerbated, leading to a long-tail distribution where solutions to difficult queries almost diminish. This phenomenon limits the performance gain of self-improving models. A straightforward solution is brute-force sampling to balance the distribution, which significantly raises computational costs. In this paper, we introduce Guided Self-Improvement (GSI), a strategy aimed at improving the efficiency of sampling challenging heavy-tailed data. It leverages Socratic-style guidance signals to help LLM reasoning with complex queries, reducing the exploration effort and minimizing computational overhead. Experiments on four models across diverse mathematical tasks show that GSI strikes a balance between performance and efficiency, while also being effective on held-out tasks.
Domain Guidance: A Simple Transfer Approach for a Pre-trained Diffusion Model
Recent advancements in diffusion models have revolutionized generative modeling. However, the impressive and vivid outputs they produce often come at the cost of significant model scaling and increased computational demands. Consequently, building personalized diffusion models based on off-the-shelf models has emerged as an appealing alternative. In this paper, we introduce a novel perspective on conditional generation for transferring a pre-trained model. From this viewpoint, we propose *Domain Guidance*, a straightforward transfer approach that leverages pre-trained knowledge to guide the sampling process toward the target domain. Domain Guidance shares a formulation similar to advanced classifier-free guidance, facilitating better domain alignment and higher-quality generations. We provide both empirical and theoretical analyses of the mechanisms behind Domain Guidance. Our experimental results demonstrate its substantial effectiveness across various transfer benchmarks, achieving over a 19.6% improvement in FID and a 23.4% improvement in FD_DINOv2 compared to standard fine-tuning. Notably, existing fine-tuned models can seamlessly integrate Domain Guidance to leverage these benefits, without additional training.
Bridging the Gap: Addressing Discrepancies in Diffusion Model Training for Classifier-Free Guidance
Diffusion models have emerged as a pivotal advancement in generative models, setting new standards to the quality of the generated instances. In the current paper we aim to underscore a discrepancy between conventional training methods and the desired conditional sampling behavior of these models. While the prevalent classifier-free guidance technique works well, it's not without flaws. At higher values for the guidance scale parameter w, we often get out of distribution samples and mode collapse, whereas at lower values for w we may not get the desired specificity. To address these challenges, we introduce an updated loss function that better aligns training objectives with sampling behaviors. Experimental validation with FID scores on CIFAR-10 elucidates our method's ability to produce higher quality samples with fewer sampling timesteps, and be more robust to the choice of guidance scale w. We also experiment with fine-tuning Stable Diffusion on the proposed loss, to provide early evidence that large diffusion models may also benefit from this refined loss function.
Phemenological Modeling of Eclipsing Binary Stars
We review the method NAV (New Algol Variable) first introduced in 2012Ap.....55..536A, which uses the locally-dependent shapes of eclipses in an addition to the trigonometric polynomial of the second order (which typically describes the "out-of-eclipse" part of the light curve with effects of reflection, ellipticity and O'Connell). Eclipsing binary stars are believed to show distinct eclipses only if belonging to the EA type. With a decreasing eclipse width, the statistically optimal value of the trigonometric polynomial s (2003ASPC..292..391A) drastically increases from ~2 for elliptic (EL) variables without eclipses, ~6-8 for EW and up to ~30-50 for some EA with narrow eclipses. In this case of large number of parameters, the smoothing curve becomes very noisy and apparent waves (the Gibbs phenomenon) may be seen. The NAV set of the parameters may be used for classification in the GCVS, VSX and similar catalogs. The maximal number of parameters is m=12, which corresponds to s=5, if correcting both the period and the initial epoch. We have applied the method to few stars, also in a case of multi-color photometry (2015JASS...32..127A), when it is possible to use the phenomenological parameters from the NAV fit to estimate physical parameters using statistical dependencies. We conclude that the NAV approximation is better than the TP one even for the case of EW-type stars with much wider eclipses. It may also be used to determine timings (see 2005ASPC..335...37A for a review of methods) or to determine parameters in the case of variable period, using a complete light curve modeling the phase variations. The method is illustrated on 2MASS J11080447-6143290 (EA-type), USNO-B1.0 1265-0306001 and USNO-B1.0 1266-0313413 (EW-type) and compared to various other methods from the literature.
Adaptive Guidance Accelerates Reinforcement Learning of Reasoning Models
We study the process through which reasoning models trained with reinforcement learning on verifiable rewards (RLVR) can learn to solve new problems. We find that RLVR drives performance in two main ways: (1) by compressing pass@k into pass@1 and (2) via "capability gain" in which models learn to solve new problems that they previously could not solve even at high k. We find that while capability gain exists across model scales, learning to solve new problems is primarily driven through self-distillation. We demonstrate these findings across model scales ranging from 0.5B to 72B parameters on >500,000 reasoning problems with prompts and verifiable final answers across math, science, and code domains. We further show that we can significantly improve pass@k rates by leveraging natural language guidance for the model to consider within context while still requiring the model to derive a solution chain from scratch. Based of these insights, we derive Guide -- a new class of online training algorithms. Guide adaptively incorporates hints into the model's context on problems for which all rollouts were initially incorrect and adjusts the importance sampling ratio for the "off-policy" trajectories in order to optimize the policy for contexts in which the hints are no longer present. We describe variants of Guide for GRPO and PPO and empirically show that Guide-GRPO on 7B and 32B parameter models improves generalization over its vanilla counterpart with up to 4% macro-average improvement across math benchmarks. We include careful ablations to analyze Guide's components and theoretically analyze Guide's learning efficiency.
GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation
We present GR-2, a state-of-the-art generalist robot agent for versatile and generalizable robot manipulation. GR-2 is first pre-trained on a vast number of Internet videos to capture the dynamics of the world. This large-scale pre-training, involving 38 million video clips and over 50 billion tokens, equips GR-2 with the ability to generalize across a wide range of robotic tasks and environments during subsequent policy learning. Following this, GR-2 is fine-tuned for both video generation and action prediction using robot trajectories. It exhibits impressive multi-task learning capabilities, achieving an average success rate of 97.7% across more than 100 tasks. Moreover, GR-2 demonstrates exceptional generalization to new, previously unseen scenarios, including novel backgrounds, environments, objects, and tasks. Notably, GR-2 scales effectively with model size, underscoring its potential for continued growth and application. Project page: https://gr2-manipulation.github.io.
Steering LLM Thinking with Budget Guidance
Recent deep-thinking large language models often reason extensively to improve performance, but such lengthy reasoning is not always desirable, as it incurs excessive inference costs with disproportionate performance gains. Controlling reasoning length without sacrificing performance is therefore important, but remains challenging, especially under tight thinking budgets. We propose budget guidance, a simple yet effective method for steering the reasoning process of LLMs toward a target budget without requiring any LLM fine-tuning. Our approach introduces a lightweight predictor that models a Gamma distribution over the remaining thinking length during next-token generation. This signal is then used to guide generation in a soft, token-level manner, ensuring that the overall reasoning trace adheres to the specified thinking budget. Budget guidance enables natural control of the thinking length, along with significant token efficiency improvements over baseline methods on challenging math benchmarks. For instance, it achieves up to a 26% accuracy gain on the MATH-500 benchmark under tight budgets compared to baseline methods, while maintaining competitive accuracy with only 63% of the thinking tokens used by the full-thinking model. Budget guidance also generalizes to broader task domains and exhibits emergent capabilities, such as estimating question difficulty. The source code is available at: https://github.com/UMass-Embodied-AGI/BudgetGuidance.
On the SDEs and Scaling Rules for Adaptive Gradient Algorithms
Approximating Stochastic Gradient Descent (SGD) as a Stochastic Differential Equation (SDE) has allowed researchers to enjoy the benefits of studying a continuous optimization trajectory while carefully preserving the stochasticity of SGD. Analogous study of adaptive gradient methods, such as RMSprop and Adam, has been challenging because there were no rigorously proven SDE approximations for these methods. This paper derives the SDE approximations for RMSprop and Adam, giving theoretical guarantees of their correctness as well as experimental validation of their applicability to common large-scaling vision and language settings. A key practical result is the derivation of a square root scaling rule to adjust the optimization hyperparameters of RMSprop and Adam when changing batch size, and its empirical validation in deep learning settings.
Elucidating The Design Space of Classifier-Guided Diffusion Generation
Guidance in conditional diffusion generation is of great importance for sample quality and controllability. However, existing guidance schemes are to be desired. On one hand, mainstream methods such as classifier guidance and classifier-free guidance both require extra training with labeled data, which is time-consuming and unable to adapt to new conditions. On the other hand, training-free methods such as universal guidance, though more flexible, have yet to demonstrate comparable performance. In this work, through a comprehensive investigation into the design space, we show that it is possible to achieve significant performance improvements over existing guidance schemes by leveraging off-the-shelf classifiers in a training-free fashion, enjoying the best of both worlds. Employing calibration as a general guideline, we propose several pre-conditioning techniques to better exploit pretrained off-the-shelf classifiers for guiding diffusion generation. Extensive experiments on ImageNet validate our proposed method, showing that state-of-the-art diffusion models (DDPM, EDM, DiT) can be further improved (up to 20%) using off-the-shelf classifiers with barely any extra computational cost. With the proliferation of publicly available pretrained classifiers, our proposed approach has great potential and can be readily scaled up to text-to-image generation tasks. The code is available at https://github.com/AlexMaOLS/EluCD/tree/main.
Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion
We study the problem of symbolic music generation (e.g., generating piano rolls), with a technical focus on non-differentiable rule guidance. Musical rules are often expressed in symbolic form on note characteristics, such as note density or chord progression, many of which are non-differentiable which pose a challenge when using them for guided diffusion. We propose Stochastic Control Guidance (SCG), a novel guidance method that only requires forward evaluation of rule functions that can work with pre-trained diffusion models in a plug-and-play way, thus achieving training-free guidance for non-differentiable rules for the first time. Additionally, we introduce a latent diffusion architecture for symbolic music generation with high time resolution, which can be composed with SCG in a plug-and-play fashion. Compared to standard strong baselines in symbolic music generation, this framework demonstrates marked advancements in music quality and rule-based controllability, outperforming current state-of-the-art generators in a variety of settings. For detailed demonstrations, code and model checkpoints, please visit our project website: https://scg-rule-guided-music.github.io/.
An Old-Fashioned Framework for Machine Learning in Turbulence Modeling
The objective is to provide clear and well-motivated guidance to Machine Learning (ML) teams, founded on our experience in empirical turbulence modeling. Guidance is also needed for modeling outside ML. ML is not yet successful in turbulence modeling, and many papers have produced unusable proposals either due to errors in math or physics, or to severe overfitting. We believe that "Turbulence Culture" (TC) takes years to learn and is difficult to convey especially considering the modern lack of time for careful study; important facts which are self-evident after a career in turbulence research and modeling and extensive reading are easy to miss. In addition, many of them are not absolute facts, a consequence of the gaps in our understanding of turbulence and the weak connection of models to first principles. Some of the mathematical facts are rigorous, but the physical aspects often are not. Turbulence models are surprisingly arbitrary. Disagreement between experts confuses the new entrants. In addition, several key properties of the models are ascertained through non-trivial analytical properties of the differential equations, which puts them out of reach of purely data-driven ML-type approaches. The best example is the crucial behavior of the model at the edge of the turbulent region (ETR). The knowledge we wish to put out here may be divided into "Mission" and "Requirements," each combining physics and mathematics. Clear lists of "Hard" and "Soft" constraints are presented. A concrete example of how DNS data could be used, possibly allied with ML, is first carried through and illustrates the large number of decisions needed. Our focus is on creating effective products which will empower CFD, rather than on publications.
Token Perturbation Guidance for Diffusion Models
Classifier-free guidance (CFG) has become an essential component of modern diffusion models to enhance both generation quality and alignment with input conditions. However, CFG requires specific training procedures and is limited to conditional generation. To address these limitations, we propose Token Perturbation Guidance (TPG), a novel method that applies perturbation matrices directly to intermediate token representations within the diffusion network. TPG employs a norm-preserving shuffling operation to provide effective and stable guidance signals that improve generation quality without architectural changes. As a result, TPG is training-free and agnostic to input conditions, making it readily applicable to both conditional and unconditional generation. We further analyze the guidance term provided by TPG and show that its effect on sampling more closely resembles CFG compared to existing training-free guidance techniques. Extensive experiments on SDXL and Stable Diffusion 2.1 show that TPG achieves nearly a 2times improvement in FID for unconditional generation over the SDXL baseline, while closely matching CFG in prompt alignment. These results establish TPG as a general, condition-agnostic guidance method that brings CFG-like benefits to a broader class of diffusion models. The code is available at https://github.com/TaatiTeam/Token-Perturbation-Guidance
Fast, Expressive SE(n) Equivariant Networks through Weight-Sharing in Position-Orientation Space
Based on the theory of homogeneous spaces we derive geometrically optimal edge attributes to be used within the flexible message-passing framework. We formalize the notion of weight sharing in convolutional networks as the sharing of message functions over point-pairs that should be treated equally. We define equivalence classes of point-pairs that are identical up to a transformation in the group and derive attributes that uniquely identify these classes. Weight sharing is then obtained by conditioning message functions on these attributes. As an application of the theory, we develop an efficient equivariant group convolutional network for processing 3D point clouds. The theory of homogeneous spaces tells us how to do group convolutions with feature maps over the homogeneous space of positions R^3, position and orientations R^3 {times} S^2, and the group SE(3) itself. Among these, R^3 {times} S^2 is an optimal choice due to the ability to represent directional information, which R^3 methods cannot, and it significantly enhances computational efficiency compared to indexing features on the full SE(3) group. We support this claim with state-of-the-art results -- in accuracy and speed -- on five different benchmarks in 2D and 3D, including interatomic potential energy prediction, trajectory forecasting in N-body systems, and generating molecules via equivariant diffusion models.
New Radio Observations of the Supernova Remnant CTA 1
We present new radio images of the supernova remnant (SNR) CTA 1 at 1420 and 408 MHz, and in the 21 cm line of H I observed with the Dominion Radio Astrophysical Observatory Synthesis Telescope and at 1420 MHz observed with the Effelsberg 100 m telescope. We confirm previously described continuum features and elaborate further on filamentary features identified using the high-resolution (1') maps from these new observations. We investigate the abrupt change in sign of rotation measure (RM) across the SNR, using the linear polarization observations in the four bands around 1420 MHz. Following X. H. Sun et al.'s (2011) investigation, we both confirm that the distribution of signs of the RMs for extragalactic sources in the area appears to match that of the shell, as well as combine the data from the four bands to estimate the relative depolarization and the intrinsic rotation measure of the SNR. We do not conclusively reject X. H. Sun et al.'s (2011) claim of a Faraday screen in the foreground causing the distribution of RMs that we observe; however, we do suggest an alternative explanation of a swept-up stellar wind from the progenitor star with a toroidal magnetic field. Finally, we expand on the analysis of the H I observations by applying the Rolling Hough Transform to isolate filamentary structure and better identify H I emission with the SNR. Further constraining the H I velocity channels associated with CTA 1, we use more recent Galactic rotation curves to calculate an updated kinematic distance of 1.09 +/- 0.2 kpc.
Growth of spinors in the generalized Seiberg-Witten equations on mathbb R^4 and mathbb R^3
The classical Seiberg-Witten equations in dimensions three and four admit a natural generalization within a unified framework known as the generalized Seiberg-Witten (GSW) equations, which encompasses many important equations in gauge theory. This article proves that the averaged L^2-norm of any spinor with non-constant pointwise norm in the GSW equations on mathbb R^4 and mathbb R^3, measured over large-radius spheres, grows faster than a power of the radius, under a suitable curvature decay assumption. Separately, it is shown that if the Yang-Mills-Higgs energy of any solution of these equations is finite, then the pointwise norm of the spinor in it must converge to a non-negative constant at infinity. These two behaviors cannot occur simultaneously unless the spinor has constant pointwise norm. This work may be seen as partial generalization of results obtained by Taubes[Tau17a], and Nagy and Oliveira [NO19] for the Kapustin-Witten equations.
Machine Learning Algebraic Geometry for Physics
We review some recent applications of machine learning to algebraic geometry and physics. Since problems in algebraic geometry can typically be reformulated as mappings between tensors, this makes them particularly amenable to supervised learning. Additionally, unsupervised methods can provide insight into the structure of such geometrical data. At the heart of this programme is the question of how geometry can be machine learned, and indeed how AI helps one to do mathematics. This is a chapter contribution to the book Machine learning and Algebraic Geometry, edited by A. Kasprzyk et al.
Proposing and solving olympiad geometry with guided tree search
Mathematics olympiads are prestigious competitions, with problem proposing and solving highly honored. Building artificial intelligence that proposes and solves olympiads presents an unresolved challenge in automated theorem discovery and proving, especially in geometry for its combination of numerical and spatial elements. We introduce TongGeometry, a Euclidean geometry system supporting tree-search-based guided problem proposing and solving. The efficient geometry system establishes the most extensive repository of geometry theorems to date: within the same computational budget as the existing state-of-the-art, TongGeometry discovers 6.7 billion geometry theorems requiring auxiliary constructions, including 4.1 billion exhibiting geometric symmetry. Among them, 10 theorems were proposed to regional mathematical olympiads with 3 of TongGeometry's proposals selected in real competitions, earning spots in a national team qualifying exam or a top civil olympiad in China and the US. Guided by fine-tuned large language models, TongGeometry solved all International Mathematical Olympiad geometry in IMO-AG-30, outperforming gold medalists for the first time. It also surpasses the existing state-of-the-art across a broader spectrum of olympiad-level problems. The full capabilities of the system can be utilized on a consumer-grade machine, making the model more accessible and fostering widespread democratization of its use. By analogy, unlike existing systems that merely solve problems like students, TongGeometry acts like a geometry coach, discovering, presenting, and proving theorems.
Bounds on geometric wakefields in collimators and step transitions of arbitrary cross sections
We present the wakefield conformal mapping technique that can be readily applied to the analysis of the radiation generated by an ultra-relativistic particle in the step transition and a collimator. We derive simple analytical expressions for the lower and upper bounds of both longitudinal and transverse wake potentials. We test the derived expressions against well-known formulas in several representative examples. The proposed method can greatly simplify the optimization of collimating sections, as well as become a useful tool in the shape optimization problems.
No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models
Classifier-free guidance (CFG) has become the standard method for enhancing the quality of conditional diffusion models. However, employing CFG requires either training an unconditional model alongside the main diffusion model or modifying the training procedure by periodically inserting a null condition. There is also no clear extension of CFG to unconditional models. In this paper, we revisit the core principles of CFG and introduce a new method, independent condition guidance (ICG), which provides the benefits of CFG without the need for any special training procedures. Our approach streamlines the training process of conditional diffusion models and can also be applied during inference on any pre-trained conditional model. Additionally, by leveraging the time-step information encoded in all diffusion networks, we propose an extension of CFG, called time-step guidance (TSG), which can be applied to any diffusion model, including unconditional ones. Our guidance techniques are easy to implement and have the same sampling cost as CFG. Through extensive experiments, we demonstrate that ICG matches the performance of standard CFG across various conditional diffusion models. Moreover, we show that TSG improves generation quality in a manner similar to CFG, without relying on any conditional information.
Guided Flows for Generative Modeling and Decision Making
Classifier-free guidance is a key component for enhancing the performance of conditional generative models across diverse tasks. While it has previously demonstrated remarkable improvements for the sample quality, it has only been exclusively employed for diffusion models. In this paper, we integrate classifier-free guidance into Flow Matching (FM) models, an alternative simulation-free approach that trains Continuous Normalizing Flows (CNFs) based on regressing vector fields. We explore the usage of Guided Flows for a variety of downstream applications. We show that Guided Flows significantly improves the sample quality in conditional image generation and zero-shot text-to-speech synthesis, boasting state-of-the-art performance. Notably, we are the first to apply flow models for plan generation in the offline reinforcement learning setting, showcasing a 10x speedup in computation compared to diffusion models while maintaining comparable performance.
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations
In this paper, we present an innovative process-oriented math process reward model called Math-Shepherd, which assigns a reward score to each step of math problem solutions. The training of Math-Shepherd is achieved using automatically constructed process-wise supervision data, breaking the bottleneck of heavy reliance on manual annotation in existing work. We explore the effectiveness of Math-Shepherd in two scenarios: 1) Verification: Math-Shepherd is utilized for reranking multiple outputs generated by Large Language Models (LLMs); 2) Reinforcement Learning: Math-Shepherd is employed to reinforce LLMs with step-by-step Proximal Policy Optimization (PPO). With Math-Shepherd, a series of open-source LLMs demonstrates exceptional performance. For instance, the step-by-step PPO with Math-Shepherd significantly improves the accuracy of Mistral-7B (77.9\%to84.1\% on GSM8K and 28.6\%to33.0\% on MATH). The accuracy can be further enhanced to 89.1\% and 43.5\% on GSM8K and MATH with the verification of Math-Shepherd, respectively. We believe that automatic process supervision holds significant potential for the future evolution of LLMs.
The S2 orbit and tidally disrupted binaries: indications for collisional depletion in the Galactic center
The properties of the stellar cluster surrounding Sagittarius A* can be assessed indirectly through the motion of the S-stars. Specifically, the current accuracy to which the prograde precession of the S2 star is measured allows to place significant constraints on the extended mass enclosed by its orbit. We suggest that high velocity destructive collisions (DCs) offer a natural mechanism for depleting the mass inside the S2 orbit, thus allowing to reconcile the measured precession and the existence of a dense stellar cluster. Such a solution is especially necessary when considering that stars are supplied to the inner part of the cluster by both dynamical relaxation and by stars being captured in tight orbits during tidal disruption of binaries. We use analytic arguments and results from simulations to demonstrate that in order to obtain a precession that is consistent with observations, collisional depletion is necessary if the capture rate is greater than a few 10^{-6} yr^{-1}. We also show that fluctuations arising from the finite number of stars cannot serve as an alternative to DCs for generating consistency with the observed S2 precession. We conclude that astrometric observations of the S-stars provide a meaningful indication that the inner part of our galactic center is shaped by collisional depletion, supporting the hypothesis that DCs occur in galactic nuclei at an astrophysically significant rate.
Plan Then Action:High-Level Planning Guidance Reinforcement Learning for LLM Reasoning
Large language models (LLMs) have demonstrated remarkable reasoning abilities in complex tasks, often relying on Chain-of-Thought (CoT) reasoning. However, due to their autoregressive token-level generation, the reasoning process is largely constrained to local decision-making and lacks global planning. This limitation frequently results in redundant, incoherent, or inaccurate reasoning, which significantly degrades overall performance. Existing approaches, such as tree-based algorithms and reinforcement learning (RL), attempt to address this issue but suffer from high computational costs and often fail to produce optimal reasoning trajectories. To tackle this challenge, we propose Plan-Then-Action Enhanced Reasoning with Group Relative Policy Optimization PTA-GRPO, a two-stage framework designed to improve both high-level planning and fine-grained CoT reasoning. In the first stage, we leverage advanced LLMs to distill CoT into compact high-level guidance, which is then used for supervised fine-tuning (SFT). In the second stage, we introduce a guidance-aware RL method that jointly optimizes the final output and the quality of high-level guidance, thereby enhancing reasoning effectiveness. We conduct extensive experiments on multiple mathematical reasoning benchmarks, including MATH, AIME2024, AIME2025, and AMC, across diverse base models such as Qwen2.5-7B-Instruct, Qwen3-8B, Qwen3-14B, and LLaMA3.2-3B. Experimental results demonstrate that PTA-GRPO consistently achieves stable and significant improvements across different models and tasks, validating its effectiveness and generalization.
On Loewner energy and curve composition
The composition gamma circ eta of Jordan curves gamma and eta in universal Teichm\"uller space is defined through the composition h_gamma circ h_eta of their conformal weldings. We show that whenever gamma and eta are curves of finite Loewner energy I^L, the energy of the composition satisfies $I^L(gamma circ eta) lesssim_K I^L(gamma) + I^L(eta), with an explicit constant in terms of the quasiconformal K of \gamma and \eta. We also study the asymptotic growth rate of the Loewner energy under n self-compositions \gamma^n := \gamma \circ \cdots \circ \gamma, showing limsup_{n rightarrow infty} 1{n}log I^L(gamma^n) lesssim_K 1, again with explicit constant. Our approach is to define a new conformally-covariant rooted welding functional W_h(y), and show W_h(y) \asymp_K I^L(\gamma) when h is a welding of \gamma and y is any root (a point in the domain of h). In the course of our arguments we also give several new expressions for the Loewner energy, including generalized formulas in terms of the Riemann maps f and g for \gamma which hold irrespective of the placement of \gamma on the Riemann sphere, the normalization of f and g, and what disks D, D^c \subset \mathbb{C} serve as domains. An additional corollary is that I^L(\gamma) is bounded above by a constant only depending on the Weil--Petersson distance from \gamma$ to the circle.
Gumbel-Softmax Flow Matching with Straight-Through Guidance for Controllable Biological Sequence Generation
Flow matching in the continuous simplex has emerged as a promising strategy for DNA sequence design, but struggles to scale to higher simplex dimensions required for peptide and protein generation. We introduce Gumbel-Softmax Flow and Score Matching, a generative framework on the simplex based on a novel Gumbel-Softmax interpolant with a time-dependent temperature. Using this interpolant, we introduce Gumbel-Softmax Flow Matching by deriving a parameterized velocity field that transports from smooth categorical distributions to distributions concentrated at a single vertex of the simplex. We alternatively present Gumbel-Softmax Score Matching which learns to regress the gradient of the probability density. Our framework enables high-quality, diverse generation and scales efficiently to higher-dimensional simplices. To enable training-free guidance, we propose Straight-Through Guided Flows (STGFlow), a classifier-based guidance method that leverages straight-through estimators to steer the unconditional velocity field toward optimal vertices of the simplex. STGFlow enables efficient inference-time guidance using classifiers pre-trained on clean sequences, and can be used with any discrete flow method. Together, these components form a robust framework for controllable de novo sequence generation. We demonstrate state-of-the-art performance in conditional DNA promoter design, sequence-only protein generation, and target-binding peptide design for rare disease treatment.
Advancing Math Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages
Advancements in LLMs have significantly expanded their capabilities across various domains. However, mathematical reasoning remains a challenging area, prompting the development of math-specific LLMs. These models typically follow a two-stage training paradigm: pre-training with math-related corpora and post-training with problem datasets for SFT. Despite these efforts, the improvements in mathematical reasoning achieved through continued pre-training (CPT) are often less significant compared to those obtained via SFT. This study addresses this discrepancy by exploring alternative strategies during the pre-training phase, focusing on the use of problem-solving data over general mathematical corpora. We investigate three primary research questions: (1) Can problem-solving data enhance the model's mathematical reasoning capabilities more effectively than general mathematical corpora during CPT? (2) Are synthetic data from the same source equally effective, and which synthesis methods are most efficient? (3) How do the capabilities developed from the same problem-solving data differ between the CPT and SFT stages, and what factors contribute to these differences? Our findings indicate that problem-solving data significantly enhances the model's mathematical capabilities compared to general mathematical corpora. We also identify effective data synthesis methods, demonstrating that the tutorship amplification synthesis method achieves the best performance. Furthermore, while SFT facilitates instruction-following abilities, it underperforms compared to CPT with the same data, which can be partially attributed to its poor learning capacity for hard multi-step problem-solving data. These insights provide valuable guidance for optimizing the mathematical reasoning capabilities of LLMs, culminating in our development of a powerful mathematical base model called JiuZhang-8B.
Climate Modelling in Low-Precision: Effects of both Deterministic & Stochastic Rounding
Motivated by recent advances in operational weather forecasting, we study the efficacy of low-precision arithmetic for climate simulations. We develop a framework to measure rounding error in a climate model which provides a stress-test for a low-precision version of the model, and we apply our method to a variety of models including the Lorenz system; a shallow water approximation for flow over a ridge; and a coarse resolution global atmospheric model with simplified parameterisations (SPEEDY). Although double precision (52 significant bits) is standard across operational climate models, in our experiments we find that single precision (23 sbits) is more than enough and that as low as half precision (10 sbits) is often sufficient. For example, SPEEDY can be run with 12 sbits across the entire code with negligible rounding error and this can be lowered to 10 sbits if very minor errors are accepted, amounting to less than 0.1 mm/6hr for the average grid-point precipitation, for example. Our test is based on the Wasserstein metric and this provides stringent non-parametric bounds on rounding error accounting for annual means as well as extreme weather events. In addition, by testing models using both round-to-nearest (RN) and stochastic rounding (SR) we find that SR can mitigate rounding error across a range of applications. Thus our results also provide evidence that SR could be relevant to next-generation climate models. While many studies have shown that low-precision arithmetic can be suitable on short-term weather forecasting timescales, our results give the first evidence that a similar low precision level can be suitable for climate.
Phemenological Modelling of a Group of Eclipsing Binary Stars
Phenomenological modeling of variable stars allows determination of a set of the parameters, which are needed for classification in the "General Catalogue of Variable Stars" and similar catalogs. We apply a recent method NAV ("New Algol Variable") to eclipsing binary stars of different types. Although all periodic functions may be represented as Fourier series with an infinite number of coefficients, this is impossible for a finite number of the observations. Thus one may use a restricted Fourier series, i.e. a trigonometric polynomial (TP) of order s either for fitting the light curve, or to make a periodogram analysis. However, the number of parameters needed drastically increases with decreasing width of minimum. In the NAV algorithm, the special shape of minimum is used, so the number of parameters is limited to 10 (if the period and initial epoch are fixed) or 12 (not fixed). We illustrate the NAV method by application to a recently discovered Algol-type eclipsing variable 2MASS J11080308-6145589 (in the field of previously known variable star RS Car) and compare results to that obtained using the TP fits. For this system, the statistically optimal number of parameters is 44, but the fit is still worse than that of the NAV fit. Application to the system GSC 3692-00624 argues that the NAV fit is better than the TP one even for the case of EW-type stars with much wider eclipses. Model parameters are listed.
Diprotodon on the sky. The Large Galactic Supernova Remnant (SNR) G278.94+1.35
We present a re-discovery of G278.94+1.35 as possibly one of the largest known Galactic supernova remnants (SNR) - that we name Diprotodon. While previously established as a Galactic SNR, Diprotodon is visible in our new EMU and GLEAM radio continuum images at an angular size of 3.33x3.23 deg, much larger than previously measured. At the previously suggested distance of 2.7 kpc, this implies a diameter of 157x152 pc. This size would qualify Diprotodon as the largest known SNR and pushes our estimates of SNR sizes to the upper limits. We investigate the environment in which the SNR is located and examine various scenarios that might explain such a large and relatively bright SNR appearance. We find that Diprotodon is most likely at a much closer distance of sim1 kpc, implying its diameter is 58x56 pc and it is in the radiative evolutionary phase. We also present a new Fermi-LAT data analysis that confirms the angular extent of the SNR in gamma-rays. The origin of the high-energy emission remains somewhat puzzling, and the scenarios we explore reveal new puzzles, given this unexpected and unique observation of a seemingly evolved SNR having a hard GeV spectrum with no breaks. We explore both leptonic and hadronic scenarios, as well as the possibility that the high-energy emission arises from the leftover particle population of a historic pulsar wind nebula.
JT-Math: A Multi-Stage Framework for Advanced Mathematical Reasoning in Large Language Models
Mathematical reasoning is a cornerstone of artificial general intelligence and a primary benchmark for evaluating the capabilities of Large Language Models (LLMs). While state-of-the-art models show promise, they often falter when faced with complex problems that demand deep conceptual understanding and intricate, multi-step deliberation. To address this challenge, we introduce JT-Math-8B, a series of open-source models comprising base, instruct, and thinking versions, built upon a systematic, multi-stage optimization framework. Our pre-training corpus is a high-quality, 210B-token dataset curated through a dedicated data pipeline that uses model-based validation to ensure quality and diversity. The Instruct Model is optimized for direct, concise answers through Supervised Fine-Tuning (SFT) and a GRPO-based reinforcement learning (RL) method. The Thinking Model is trained for complex problem-solving using a Long Chain-of-Thought (Long CoT) approach, combining SFT with a novel, multi-stage RL curriculum that progressively increases task difficulty and context length up to 32K tokens. JT-Math-8B achieves state-of-the-art results among open-source models of similar size, surpassing prominent models like OpenAI's O1-mini and GPT-4o , and demonstrating superior performance on competition-level mathematics.
Super-resolution of Sentinel-2 images: Learning a globally applicable deep neural network
The Sentinel-2 satellite mission delivers multi-spectral imagery with 13 spectral bands, acquired at three different spatial resolutions. The aim of this research is to super-resolve the lower-resolution (20 m and 60 m Ground Sampling Distance - GSD) bands to 10 m GSD, so as to obtain a complete data cube at the maximal sensor resolution. We employ a state-of-the-art convolutional neural network (CNN) to perform end-to-end upsampling, which is trained with data at lower resolution, i.e., from 40->20 m, respectively 360->60 m GSD. In this way, one has access to a virtually infinite amount of training data, by downsampling real Sentinel-2 images. We use data sampled globally over a wide range of geographical locations, to obtain a network that generalises across different climate zones and land-cover types, and can super-resolve arbitrary Sentinel-2 images without the need of retraining. In quantitative evaluations (at lower scale, where ground truth is available), our network, which we call DSen2, outperforms the best competing approach by almost 50% in RMSE, while better preserving the spectral characteristics. It also delivers visually convincing results at the full 10 m GSD. The code is available at https://github.com/lanha/DSen2
Stepwise Self-Consistent Mathematical Reasoning with Large Language Models
Using Large Language Models for complex mathematical reasoning is difficult, primarily due to the complexity of multi-step reasoning. The main challenges of this process include (1) selecting critical intermediate results to advance the procedure, and (2) limited exploration of potential solutions. To address these issues, we introduce a novel algorithm, namely Stepwise Self-Consistent Chain-of-Thought (SSC-CoT). SSC-CoT employs a strategy of selecting intermediate steps based on the intersection of various reasoning chains. Additionally, SSC-CoT enables the model to discover critical intermediate steps by querying a knowledge graph comprising relevant domain knowledge. To validate SSC-CoT, we present a new dataset, TriMaster100, tailored for complex trigonometry problems. This dataset contains 100 questions, with each solution broken down into scored intermediate steps, facilitating a comprehensive evaluation of the mathematical reasoning process. On TriMaster100, SSC-CoT triples the effectiveness of the state-of-the-art methods. Furthermore, we benchmark SSC-CoT on the widely recognized complex mathematical question dataset, MATH level 5, and it surpasses the second-best method by 7.2% in accuracy. Code and the TriMaster100 dataset can be found at: https://github.com/zhao-zilong/ssc-cot.

 
			 
			 
			 
			 
			 
			 
			 
			 
			 
			 
			