---
title: ReSym Space
emoji: 🐢
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 5.22.0
app_file: app.py
pinned: false
---

# ReSym Space

This is a space for testing the models from the [ReSym
artifacts](https://github.com/lt-asset/resym). Sadly, at the time of writing,
not all of ReSym is publicly available; specifically, the Prolog component
is [not available](https://github.com/lt-asset/resym/issues/2).

This space simply runs inference on the two pretrained models available as
part of the ReSym artifacts. It takes a variable name and some decompiled code
as input, and outputs the variable type and other information.

The examples are randomly selected from `vardecoder_test.jsonl`. As a result, the fields do not always parse correctly.

## Disclaimer

I'm not a ReSym developer and I may have messed something up. In particular,
the variable names that appear in the decompiled code must be listed in the
prompt, and I reused some of ReSym's own code to do this.

## Known Issues / Oddities

### sub_40FD86

We do not get the same results for `sub_40FD86`. In fact, we don't even create the same prompt. The prompt in `vardecoder_test.jsonl` is:

What are the original name and data type of variables `v3`, `v4`, `v5`?

It's unclear why `a1`, `a2`, and `result` are not listed.
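
To make the mismatch concrete, here is a minimal sketch of how a prompt in this format could be assembled. The regex, the function names `find_variables` and `build_prompt`, and the sample decompilation are my own assumptions for illustration; this is not ReSym's actual extraction code, and only the prompt wording comes from `vardecoder_test.jsonl`.

```python
import re

# Assumption: decompiler-style names look like a1, v3, or result.
# ReSym's own extraction logic may differ from this regex.
VARIABLE_PATTERN = re.compile(r"\b(?:a\d+|v\d+|result)\b")


def find_variables(code: str) -> list[str]:
    """Return matching variable names in order of first appearance."""
    seen: list[str] = []
    for match in VARIABLE_PATTERN.finditer(code):
        name = match.group(0)
        if name not in seen:
            seen.append(name)
    return seen


def build_prompt(variables: list[str]) -> str:
    """Format the question using the wording seen in the dataset prompt."""
    names = ", ".join(f"`{v}`" for v in variables)
    return f"What are the original name and data type of variables {names}?"


# Made-up decompilation, for illustration only.
sample_code = """__int64 sub_40FD86(__int64 a1, int a2)
{
  int v3;
  __int64 result;
  v3 = a2 + 1;
  result = a1 + v3;
  return result;
}"""

sample_variables = find_variables(sample_code)
sample_prompt = build_prompt(sample_variables)
```

A scan like this one lists arguments and `result` alongside the `vN` locals, which is the behaviour that differs from the dataset prompt quoted above.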

### `first_token` weirdness

The [example
inference](https://github.com/lt-asset/resym/blob/main/training_src/fielddecoder_inf.py)
scripts get the first token of the output and include it in the prompt.
Technically this is data leakage, but since the first token is usually part of
the prompt (a variable name or field expression) it's probably OK? But it's
also pretty weird.
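
As a rough illustration of what this amounts to, the sketch below appends the first token of the expected answer to the prompt. It is not the ReSym script: `add_first_token` is a hypothetical helper, whitespace splitting stands in for the model tokenizer, and the prompt and answer strings are made up.

```python
def add_first_token(prompt: str, expected_output: str) -> str:
    """Append the first token of the ground-truth answer to the prompt.

    Assumption: tokens are whitespace-separated. The real scripts work with
    the model's tokenizer, so the actual first token may be shorter.
    """
    tokens = expected_output.split()
    first_token = tokens[0] if tokens else ""
    return prompt + first_token


# Made-up strings, for illustration only.
example_prompt = "What are the original name and data type of variables `v3`?\n"
example_answer = "v3: count, int"
primed_prompt = add_first_token(example_prompt, example_answer)
```

In this example the leaked token is just the variable name, which already appears in the question; that is the case where the leakage is probably harmless.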

### Indentation

Some decompilations in the dataset have whitespace for indentation included, and
some do not.

### `field_access_driver` clang parser

ReSym uses a clang-based parsing tool to extract field accesses. The tool still
outputs the field accesses even if the code does not parse correctly. This
seems to be by design, so I am doing this too. Otherwise, most of the ReSym
examples do not work, because external functions and variables are not properly
declared.

Another oddity is that sometimes the field access driver will output a field
access expression of `""`. This appears to be a bug in the field access driver.
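
One way to cope with the empty expressions is to filter them out before building a prompt. The sketch below assumes the driver output has already been split into a list of strings; `drop_empty_accesses` is a hypothetical helper, and because I am not sure whether the bad entry arrives as an empty string or as a literal pair of quotes, it drops both.

```python
def drop_empty_accesses(expressions: list[str]) -> list[str]:
    """Remove blank field access expressions, keeping the rest in order."""
    kept = []
    for expression in expressions:
        cleaned = expression.strip()
        # Skip both a truly empty string and a literal pair of quotes.
        if cleaned in ("", '""'):
            continue
        kept.append(cleaned)
    return kept


# Made-up expressions, for illustration only.
raw_accesses = ["*(a1 + 8)", "", '""', "   ", "*(_DWORD *)(a1 + 16)"]
filtered_accesses = drop_empty_accesses(raw_accesses)
```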

### Other

* ReSym's parser fails for functions whose name is not automatically generated

## Todo

* Test field decoding more