datasets:
  - nvidia/PhysicalAI-Robotics-mindmap-Franka-Mug-in-Drawer
  - nvidia/PhysicalAI-Robotics-mindmap-GR1-Drill-in-Box
  - nvidia/PhysicalAI-Robotics-mindmap-GR1-Stick-in-Bin
---

# Model Overview

### Description:

``mindmap`` is a 3D diffusion policy that generates robot trajectories based on a semantic 3D reconstruction of the environment, enabling robots with spatial memory.

Trained models are available on Hugging Face: [PhysicalAI-Robotics-mindmap-Checkpoints](https://huggingface.co/nvidia/PhysicalAI-Robotics-mindmap-Checkpoints)

### License/Terms of Use

- Model: [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/)
- Code: [NVIDIA License (NSCLv1)](https://github.com/NVlabs/nvblox_mindmap/tree/public/LICENSE.md)

### Deployment Geography:

Global

### Use Case

The trained ``mindmap`` policies allow for quick evaluation of the ``mindmap`` concept on selected simulated robotic manipulation tasks.

- Researchers, Academics, Open-Source Community: AI-driven robotics research and algorithm development.
- Developers: Integrate and customize AI for various robotic applications.
- Startups & Companies: Accelerate robotics development and reduce training costs.

## Reference(s):

- ``mindmap`` paper:
  - Remo Steiner, Alexander Millane, David Tingdahl, Clemens Volk, Vikram Ramasamy, Xinjie Yao, Peter Du, Soha Pouya and Shiwei Sheng. "**mindmap: Spatial Memory in Deep Feature Maps for 3D Action Policies**". CoRL 2025 Workshop RemembeRL. [arXiv preprint arXiv:2509.20297 (2025).](https://arxiv.org/abs/2509.20297)
- ``mindmap`` codebase:
  - [github.com/NVlabs/nvblox_mindmap](https://github.com/NVlabs/nvblox_mindmap)

## Model Architecture:

**Architecture Type:** Denoising Diffusion Probabilistic Model

**Network Architecture:**

``mindmap`` is a Denoising Diffusion Probabilistic Model that samples robot trajectories conditioned on sensor observations and a 3D reconstruction of the environment. Images are first passed through a Vision Foundation Model and then back-projected, using the depth image, to a pointcloud. In parallel, a reconstruction of the scene is built that accumulates metric-semantic information from past observations. The two 3D data sources, the instantaneous visual observation and the reconstruction, are passed to a transformer that iteratively denoises robot trajectories.

**This model was developed based on:** [3D Diffuser Actor](https://3d-diffuser-actor.github.io/)

**Number of model parameters:** ~3M trainable, plus ~100M frozen in the image encoder

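The back-projection step described above can be illustrated with a minimal pinhole-camera sketch. This is not ``mindmap``'s actual implementation; the intrinsics `fx`, `fy`, `cx`, `cy` and the toy image size are assumed for illustration:

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy):
    """Back-project a depth image to a 3D pointcloud in camera coordinates.

    depth: [H, W] array of metric depth values.
    Returns: [H*W, 3] array of XYZ points (pinhole model).
    """
    h, w = depth.shape
    # Pixel coordinate grids: u along columns, v along rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Toy example: a flat surface 2 m in front of a 4x4-pixel camera.
points = backproject_depth(np.full((4, 4), 2.0), fx=2.0, fy=2.0, cx=1.5, cy=1.5)
```

In the full pipeline the per-pixel features from the Vision Foundation Model would be attached to these points, yielding the featurized pointclouds described in the Input section.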
## Input:

**Input Type(s):**
- RGB: Image frames
- Geometry: Depth frames converted to 3D pointclouds
- State: Robot proprioception
- Reconstruction: Metric-semantic reconstruction represented as a featurized pointcloud

**Input Format(s):**
- RGB: float32 in the range `[0, 1]`
- Geometry: float32 in world coordinates
- State: float32 in world coordinates
- Reconstruction (represented as a feature pointcloud):
  - Points: float32 in world coordinates
  - Features: float32

**Input Parameters:**
- RGB: `[NUM_CAMERAS, 3, HEIGHT, WIDTH]` - 512x512 resolution for the provided checkpoints
- Geometry: `[NUM_CAMERAS, 3, HEIGHT, WIDTH]` - 512x512 resolution for the provided checkpoints
- State: `[HISTORY_LENGTH, NUM_GRIPPERS, 8]` - consisting of end-effector translation, rotation (quaternion, wxyz) and closedness
- Reconstruction (represented as a feature pointcloud):
  - Points: `[NUM_POINTS, 3]` - `NUM_POINTS` is 2048 for the provided checkpoints
  - Features: `[NUM_POINTS, FEATURE_DIM]` - `FEATURE_DIM` is 768 for the `RADIO_V25_B` feature extractor used for the provided checkpoints

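The input shapes above can be made concrete by allocating dummy arrays. This is a NumPy sketch for shape-checking only; the `NUM_CAMERAS`, `HISTORY_LENGTH` and `NUM_GRIPPERS` values are assumptions, while the remaining sizes follow the provided checkpoints:

```python
import numpy as np

# Assumed embodiment configuration (illustrative, not from the checkpoints).
NUM_CAMERAS, HEIGHT, WIDTH = 2, 512, 512
HISTORY_LENGTH, NUM_GRIPPERS = 3, 1
# Sizes stated for the provided checkpoints.
NUM_POINTS, FEATURE_DIM = 2048, 768

rgb = np.zeros((NUM_CAMERAS, 3, HEIGHT, WIDTH), dtype=np.float32)       # values in [0, 1]
geometry = np.zeros((NUM_CAMERAS, 3, HEIGHT, WIDTH), dtype=np.float32)  # per-pixel 3D coordinates
state = np.zeros((HISTORY_LENGTH, NUM_GRIPPERS, 8), dtype=np.float32)   # translation + quat (wxyz) + closedness
recon_points = np.zeros((NUM_POINTS, 3), dtype=np.float32)              # reconstruction pointcloud
recon_features = np.zeros((NUM_POINTS, FEATURE_DIM), dtype=np.float32)  # per-point features
```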
## Output:

**Output Type(s):** Robot actions

**Output Format:** float32

**Output Parameters:**
- Gripper: `[PREDICTION_HORIZON, NUM_GRIPPERS, 8]` - consisting of end-effector translation, rotation (quaternion, wxyz) and closedness
- Head Yaw: `[PREDICTION_HORIZON, 1]` - only for humanoid embodiments

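A sketch of unpacking one 8-dimensional gripper action into its documented components. The `[tx, ty, tz, qw, qx, qy, qz, closedness]` layout is inferred from the translation/rotation (wxyz)/closedness ordering stated above and should be verified against the codebase:

```python
import numpy as np

def split_gripper_action(action):
    """Split one 8-D gripper action into translation, quaternion (wxyz) and closedness.

    Assumed layout: [tx, ty, tz, qw, qx, qy, qz, closedness].
    """
    translation = action[:3]
    quaternion_wxyz = action[3:7]
    closedness = action[7]
    return translation, quaternion_wxyz, closedness

# Identity-rotation example with an open gripper.
t, q, c = split_gripper_action(
    np.array([0.1, 0.0, 0.3, 1.0, 0.0, 0.0, 0.0, 0.0], dtype=np.float32)
)
```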
## Software Integration:

**Runtime Engine(s):** PyTorch

**Supported Hardware Microarchitecture Compatibility:**
- NVIDIA Ampere
- NVIDIA Blackwell
- NVIDIA Jetson
- NVIDIA Hopper
- NVIDIA Lovelace
- NVIDIA Pascal
- NVIDIA Turing
- NVIDIA Volta

**Preferred/Supported Operating System(s):**
- Linux

## Model Version(s):

This is the initial version of the model, version 1.0.0.

## Training, Testing, and Evaluation Datasets:

Datasets:
- cube_stacking_checkpoint: [Franka Cube Stacking Dataset](https://huggingface.co/datasets/nvidia/PhysicalAI-Robotics-mindmap-Franka-Cube-Stacking)
- mug_in_drawer_checkpoint: [Franka Mug in Drawer Dataset](https://huggingface.co/datasets/nvidia/PhysicalAI-Robotics-mindmap-Franka-Mug-in-Drawer)
- drill_in_box_checkpoint: [GR1 Drill in Box Dataset](https://huggingface.co/datasets/nvidia/PhysicalAI-Robotics-mindmap-GR1-Drill-in-Box)
- stick_in_bin_checkpoint: [GR1 Stick in Bin Dataset](https://huggingface.co/datasets/nvidia/PhysicalAI-Robotics-mindmap-GR1-Stick-in-Bin)

The models were trained on 100 (GR1) and 130 (Franka) demonstrations. The evaluation set consisted of 20 distinct demonstrations. Closed-loop testing was performed on 100 demonstrations disjoint from the training set.

# Inference:

**Engine:** PyTorch

**Test Hardware:** Linux, L40S

## Model Limitations:

This model is not tested or intended for use in mission-critical applications that require functional safety. Use of the model in those applications is at the user's own risk and sole responsibility, including taking the necessary steps to add needed guardrails or safety mechanisms.

- Risk: The policy is only effective in the exact simulation environment it was trained on.
  - Mitigation: Retrain the model on new simulation environments.
- Risk: The policy was never tested on a physical robot and likely only works in simulation.
  - Mitigation: Expand training, testing and validation to physical robot platforms.

## Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards.

Please report security vulnerabilities or NVIDIA AI concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).

# Bias

Field | Response
:---|:---
Participation considerations from adversely impacted groups [protected classes](https://www.senate.ca.gov/content/protected-classes) in model design and testing: | Not Applicable
Bias Metric (If Measured): | Not Applicable
(For GPAI Models): Which characteristic (feature) show(s) the greatest difference in performance? | Not Applicable
(For GPAI Models): Which feature(s) have the worst performance overall? | Not Applicable
Measures taken to mitigate against unwanted bias: | Not Applicable
(For GPAI Models): If using internal data, description of methods implemented in data acquisition or processing, if any, to address the prevalence of identifiable biases in the training, testing, and validation data: | Not Applicable
(For GPAI Models): Tools used to assess statistical imbalances and highlight patterns that may introduce bias into AI models: | Not Applicable

# Explainability

Field | Response
:---|:---
Intended Task/Domain: | Robotic manipulation
Model Type: | Denoising Diffusion Probabilistic Model
Intended Users: | Roboticists and researchers in academia and industry who are interested in robot manipulation research
Output: | Actions consisting of end-effector poses, gripper states and head orientation.
(For GPAI Models): Tools used to evaluate datasets to identify synthetic data and ensure data authenticity: | Not Applicable
Describe how the model works: | ``mindmap`` is a Denoising Diffusion Probabilistic Model that samples robot trajectories conditioned on sensor observations and a 3D reconstruction of the environment.
Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable
Technical Limitations & Mitigation: | The policy is only effective in the exact simulation environment it was trained on; it was never tested on a physical robot and likely only works in simulation.
Verified to have met prescribed NVIDIA quality standards: | Yes
Performance Metrics: | Closed-loop success rate on simulated robotic manipulation tasks.
Potential Known Risks: | The model might be susceptible to rendering changes on the simulation tasks it was trained on.
Licensing: | [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/)

# Safety and Security

Field | Response
:---|:---
Model Application Field(s): | Robotics
Describe the life-critical impact (if present): | Not Applicable
(For GPAI Models): Description of methods implemented in data acquisition or processing, if any, to address other types of potentially harmful data in the training, testing, and validation data: | Not GPAI
(For GPAI Models): Description of any methods implemented in data acquisition or processing, if any, to address illegal or harmful content in the training data, including, but not limited to, child sexual abuse material (CSAM) and non-consensual intimate imagery (NCII): | Not GPAI
Use Case Restrictions: | Abide by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/)
Model and dataset restrictions: | The Principle of Least Privilege (PoLP) is applied, limiting access for dataset generation and model development. Dataset access is restricted during training, and dataset license constraints are adhered to.

# Privacy

Field | Response
:---|:---
Generatable or reverse engineerable personal data? | No
Personal data used to create this model? | No
Was consent obtained for any personal data used? | Not Applicable
(For GPAI Models): A description of any methods implemented in data acquisition or processing, if any, to address the prevalence of personal data in the training data, where relevant and applicable: | Not Applicable
How often is the dataset reviewed? | Before Release
Is there provenance for all datasets used in training? | Yes
Does data labeling (annotation, metadata) comply with privacy laws? | Yes
Is data compliant with data subject requests for data correction or removal, if such a request was made? | Yes
Applicable Privacy Policy: | [NVIDIA Privacy Policy](https://www.nvidia.com/en-us/about-nvidia/privacy-policy)