new

Get trending papers in your email inbox!

Subscribe

Daily Papers

byAK and the research community

Oct 30

Transformers Discover Molecular Structure Without Graph Priors

Graph Neural Networks (GNNs) are the dominant architecture for molecular machine learning, particularly for molecular property prediction and machine learning interatomic potentials (MLIPs). GNNs perform message passing on predefined graphs often induced by a fixed radius cutoff or k-nearest neighbor scheme. While this design aligns with the locality present in many molecular tasks, a hard-coded graph can limit expressivity due to the fixed receptive field and slows down inference with sparse graph operations. In this work, we investigate whether pure, unmodified Transformers trained directly on Cartesian coordinatesx2013without predefined graphs or physical priorsx2013can approximate molecular energies and forces. As a starting point for our analysis, we demonstrate how to train a Transformer to competitive energy and force mean absolute errors under a matched training compute budget, relative to a state-of-the-art equivariant GNN on the OMol25 dataset. We discover that the Transformer learns physically consistent patternsx2013such as attention weights that decay inversely with interatomic distancex2013and flexibly adapts them across different molecular environments due to the absence of hard-coded biases. The use of a standard Transformer also unlocks predictable improvements with respect to scaling training resources, consistent with empirical scaling laws observed in other domains. Our results demonstrate that many favorable properties of GNNs can emerge adaptively in Transformers, challenging the necessity of hard-coded graph inductive biases and pointing toward standardized, scalable architectures for molecular modeling.

Free Discontinuity Regression: With an Application to the Economic Effects of Internet Shutdowns

Sharp, multidimensional changepoints-abrupt shifts in a regression surface whose locations and magnitudes are unknown-arise in settings as varied as gene-expression profiling, financial covariance breaks, climate-regime detection, and urban socioeconomic mapping. Despite their prevalence, there are no current approaches that jointly estimate the location and size of the discontinuity set in a one-shot approach with statistical guarantees. We therefore introduce Free Discontinuity Regression (FDR), a fully nonparametric estimator that simultaneously (i) smooths a regression surface, (ii) segments it into contiguous regions, and (iii) provably recovers the precise locations and sizes of its jumps. By extending a convex relaxation of the Mumford-Shah functional to random spatial sampling and correlated noise, FDR overcomes the fixed-grid and i.i.d. noise assumptions of classical image-segmentation approaches, thus enabling its application to real-world data of any dimension. This yields the first identification and uniform consistency results for multivariate jump surfaces: under mild SBV regularity, the estimated function, its discontinuity set, and all jump sizes converge to their true population counterparts. Hyperparameters are selected automatically from the data using Stein's Unbiased Risk Estimate, and large-scale simulations up to three dimensions validate the theoretical results and demonstrate good finite-sample performance. Applying FDR to an internet shutdown in India reveals a 25-35% reduction in economic activity around the estimated shutdown boundaries-much larger than previous estimates. By unifying smoothing, segmentation, and effect-size recovery in a general statistical setting, FDR turns free-discontinuity ideas into a practical tool with formal guarantees for modern multivariate data.

  • 2 authors
·
Sep 25, 2023

The Lipschitz-Variance-Margin Tradeoff for Enhanced Randomized Smoothing

Real-life applications of deep neural networks are hindered by their unsteady predictions when faced with noisy inputs and adversarial attacks. The certified radius in this context is a crucial indicator of the robustness of models. However how to design an efficient classifier with an associated certified radius? Randomized smoothing provides a promising framework by relying on noise injection into the inputs to obtain a smoothed and robust classifier. In this paper, we first show that the variance introduced by the Monte-Carlo sampling in the randomized smoothing procedure estimate closely interacts with two other important properties of the classifier, i.e. its Lipschitz constant and margin. More precisely, our work emphasizes the dual impact of the Lipschitz constant of the base classifier, on both the smoothed classifier and the empirical variance. To increase the certified robust radius, we introduce a different way to convert logits to probability vectors for the base classifier to leverage the variance-margin trade-off. We leverage the use of Bernstein's concentration inequality along with enhanced Lipschitz bounds for randomized smoothing. Experimental results show a significant improvement in certified accuracy compared to current state-of-the-art methods. Our novel certification procedure allows us to use pre-trained models with randomized smoothing, effectively improving the current certification radius in a zero-shot manner.

  • 4 authors
·
Sep 28, 2023

Enhancing Worldwide Image Geolocation by Ensembling Satellite-Based Ground-Level Attribute Predictors

Geolocating images of a ground-level scene entails estimating the location on Earth where the picture was taken, in absence of GPS or other location metadata. Typically, methods are evaluated by measuring the Great Circle Distance (GCD) between a predicted location and ground truth. However, this measurement is limited because it only evaluates a single point, not estimates of regions or score heatmaps. This is especially important in applications to rural, wilderness and under-sampled areas, where finding the exact location may not be possible, and when used in aggregate systems that progressively narrow down locations. In this paper, we introduce a novel metric, Recall vs Area (RvA), which measures the accuracy of estimated distributions of locations. RvA treats image geolocation results similarly to document retrieval, measuring recall as a function of area: For a ranked list of (possibly non-contiguous) predicted regions, we measure the accumulated area required for the region to contain the ground truth coordinate. This produces a curve similar to a precision-recall curve, where "precision" is replaced by square kilometers area, allowing evaluation of performance for different downstream search area budgets. Following directly from this view of the problem, we then examine a simple ensembling approach to global-scale image geolocation, which incorporates information from multiple sources to help address domain shift, and can readily incorporate multiple models, attribute predictors, and data sources. We study its effectiveness by combining the geolocation models GeoEstimation and the current SOTA GeoCLIP, with attribute predictors based on ORNL LandScan and ESA-CCI Land Cover. We find significant improvements in image geolocation for areas that are under-represented in the training set, particularly non-urban areas, on both Im2GPS3k and Street View images.

  • 3 authors
·
Jul 18, 2024

Unveiling the soft X-ray source population towards the inner Galactic disk with XMM-Newton

Across the Galactic disk lies a diverse population of X-ray sources, with the fainter end remaining poorly understood due to past survey sensitivity limits. We aim to classify and characterize faint X-ray sources detected in the eROSITA All-Sky Survey (eRASS1) towards the inner Galactic disk (350^circ < l < 360^circ, -1^circ < b < 1^circ) using deeper XMM-Newton observations (typical exposure of sim 20,ks). We analyzed 189 eRASS1 sources, combining X-ray spectral fitting (0.2--10,keV) with Gaia astrometric and photometric data for robust classification. Our results show that the eRASS1 catalog towards the Galactic disk is overwhelmingly dominated by coronal sources (sim 74%), primarily active stars and binaries, with sim 8% being wind-powered massive stars and sim 18% being accreting compact objects. We propose an empirical hardness-ratio cut (HR > -0.2) to efficiently isolate these non-coronal sources. By stacking the classified population and comparing with the Galactic Ridge X-ray Emission (GRXE), we estimate that sim 6% of the GRXE flux in the 0.5--2.0,keV band is resolved into point sources above the eRASS1 flux limit (sim 5times 10^{-14},erg,cm^{-2},s^{-1}). This resolved soft-band emission is dominated by active stars, while hard-band flux originates primarily from X-ray binaries. We conclude that the eRASS1 catalog retains a non-negligible population of compact objects that can be effectively distinguished using X-ray color selection.

  • 8 authors
·
Oct 27

Faster Rates of Convergence to Stationary Points in Differentially Private Optimization

We study the problem of approximating stationary points of Lipschitz and smooth functions under (varepsilon,delta)-differential privacy (DP) in both the finite-sum and stochastic settings. A point w is called an alpha-stationary point of a function F:R^drightarrowR if |nabla F(w)|leq alpha. We provide a new efficient algorithm that finds an Obig(big[sqrt{d}{nvarepsilon}big]^{2/3}big)-stationary point in the finite-sum setting, where n is the number of samples. This improves on the previous best rate of Obig(big[sqrt{d}{nvarepsilon}big]^{1/2}big). We also give a new construction that improves over the existing rates in the stochastic optimization setting, where the goal is to find approximate stationary points of the population risk. Our construction finds a Obig(1{n^{1/3}} + big[sqrt{d}{nvarepsilon}big]^{1/2}big)-stationary point of the population risk in time linear in n. Furthermore, under the additional assumption of convexity, we completely characterize the sample complexity of finding stationary points of the population risk (up to polylog factors) and show that the optimal rate on population stationarity is tilde Thetabig(1{n}+sqrt{d}{nvarepsilon}big). Finally, we show that our methods can be used to provide dimension-independent rates of Obig(1{n}+minbig(big[sqrt{rank}{nvarepsilon}big]^{2/3},1{(nvarepsilon)^{2/5}}big)big) on population stationarity for Generalized Linear Models (GLM), where rank is the rank of the design matrix, which improves upon the previous best known rate.

  • 6 authors
·
Jun 1, 2022

RFLA: A Stealthy Reflected Light Adversarial Attack in the Physical World

Physical adversarial attacks against deep neural networks (DNNs) have recently gained increasing attention. The current mainstream physical attacks use printed adversarial patches or camouflage to alter the appearance of the target object. However, these approaches generate conspicuous adversarial patterns that show poor stealthiness. Another physical deployable attack is the optical attack, featuring stealthiness while exhibiting weakly in the daytime with sunlight. In this paper, we propose a novel Reflected Light Attack (RFLA), featuring effective and stealthy in both the digital and physical world, which is implemented by placing the color transparent plastic sheet and a paper cut of a specific shape in front of the mirror to create different colored geometries on the target object. To achieve these goals, we devise a general framework based on the circle to model the reflected light on the target object. Specifically, we optimize a circle (composed of a coordinate and radius) to carry various geometrical shapes determined by the optimized angle. The fill color of the geometry shape and its corresponding transparency are also optimized. We extensively evaluate the effectiveness of RFLA on different datasets and models. Experiment results suggest that the proposed method achieves over 99% success rate on different datasets and models in the digital world. Additionally, we verify the effectiveness of the proposed method in different physical environments by using sunlight or a flashlight.

  • 5 authors
·
Jul 14, 2023