Space: rianders/mpi_data_store

rianders committed on Mar 24, 2024

Commit f1b2eb8 (1 parent: eceebf5)

updated main pages

Files changed (2)

README.md +51 -12
app.py +29 -6

README.md CHANGED

@@ -1,12 +1,51 @@
----
-title: Mpi Data Store
-emoji: 🏆
-colorFrom: purple
-colorTo: red
-sdk: streamlit
-sdk_version: 1.32.2
-app_file: app.py
-pinned: false
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# Data Processing Interface
+This project is a Streamlit-based interface designed to facilitate the mining, processing, and embedding of data from public GitHub repositories. It allows for the interactive selection and configuration of data sources, model parameters, and processing options, making it easier to manage data extraction and transformation tasks.
+## Installation
+Before running the app, install the necessary dependencies. This project requires Python 3.8 or later, the minimum supported by Streamlit 1.32.
+1. Clone the repository to your local machine:
+```
+git clone https://github.com/yourusername/yourprojectname.git
+cd yourprojectname
+```
+2. Install the required Python packages:
+```
+pip install streamlit pandas tqdm
+```
+Make sure to install any other dependencies specific to your project.
+## Running the App
+To run the app, navigate to the project directory in your terminal and execute the following command:
+```
+streamlit run app.py
+```
+## App Structure
+The app is organized into multiple pages, each dedicated to a specific part of the data processing workflow:
+- **Main Page:** Provides an overview and status of the data processing steps.
+- **Data Source Configuration:** Allows for the selection of a GitHub repository and specification of an output directory for generated data.
+- **Data Loading:** Enables directory selection within the repository and file type filtering for processing.
+- **Model Selection and Configuration:** Offers options to select and configure the embedding model and the question-and-answering model.
+- **Processing and Embedding:** Displays the process status, allows parameter tuning, and provides options to save preprocessed pages, processed pages, and vector store data.
+## Navigating the Interface
+After launching the app, use the sidebar to navigate between the different pages. Each page includes interactive elements, such as input fields, dropdown menus, and checkboxes, allowing you to customize each step of the data processing pipeline.
+Be sure to follow the instructions on each page to configure and run the data processing tasks correctly.
+## Contributing
+We welcome contributions to this project! If you have suggestions for improvements or encounter any issues, please open an issue or submit a pull request.
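
The README's Data Loading page is described as combining directory selection with file type filtering. The commit does not show how that page is implemented, so the following is only a minimal sketch of one way the filtering step could work. The function name `filter_files` and its parameters are hypothetical and do not appear in the repository.

```python
from pathlib import Path
from typing import Iterable, List


def filter_files(root: Path, extensions: Iterable[str]) -> List[Path]:
    """Return the files under `root` whose suffix is in `extensions`.

    Hypothetical helper for illustration; not part of this commit.
    """
    # Normalize to lowercase with a leading dot so "md", ".md" and ".MD" all match.
    wanted = {"." + ext.lower().lstrip(".") for ext in extensions}
    # rglob("*") walks the selected directory recursively.
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in wanted
    )
```

A page could call `filter_files(Path(selected_dir), ["md", "py"])` with values collected from its input widgets, where `selected_dir` is likewise an assumed name.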

app.py CHANGED

@@ -1,10 +1,33 @@
 import streamlit as st
-# Main script that Streamlit runs.
-# Individual pages are in the 'pages' folder.
-st.set_page_config(page_title="Data Processing Interface", layout="wide")
-st.title("Data Processing Interface")
-# The content of this script can be minimal, as the pages are defined in separate files.
-st.write("Please navigate to the sections using the sidebar.")

+from main_page import main as main_page
+# Import other pages. Assume each has a main function to run the page.
+from pages.data_source_config import main as data_source_config
+from pages.data_loading import main as data_loading
+# Add imports for other pages similarly...
+# Initialize session state for page navigation if not already set
+if 'page' not in st.session_state:
+    st.session_state.page = 'main_page'
+# Define a function to change the page
+def change_page(page_name):
+    st.session_state.page = page_name
+# Page selection via sidebar buttons
+st.sidebar.title("Navigation")
+st.sidebar.button("Main Page", on_click=change_page, args=('main_page',))
+st.sidebar.button("Data Source Configuration", on_click=change_page, args=('data_source_config',))
+st.sidebar.button("Data Loading", on_click=change_page, args=('data_loading',))
+# Add buttons for other pages similarly...
+# Page dispatch
+if st.session_state.page == 'main_page':
+    main_page()
+elif st.session_state.page == 'data_source_config':
+    data_source_config()
+elif st.session_state.page == 'data_loading':
+    data_loading()
+# Add elif blocks for other pages...
+# The above could be optimized by mapping page names to functions
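
The closing comment in app.py suggests mapping page names to functions instead of the if/elif chain. A minimal sketch of that idea follows. The page functions are hypothetical stand-ins that return strings so the example runs without Streamlit; in app.py they would be the imported `main` functions. `PAGES` and `render` are assumed names, not part of the commit.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for the page functions imported in app.py.
def main_page() -> str:
    return "main page"


def data_source_config() -> str:
    return "data source configuration page"


def data_loading() -> str:
    return "data loading page"


# One entry per page; adding a page means adding one line here.
PAGES: Dict[str, Callable[[], str]] = {
    "main_page": main_page,
    "data_source_config": data_source_config,
    "data_loading": data_loading,
}


def render(page_name: str, default: str = "main_page") -> str:
    """Call the function registered for `page_name`, falling back to `default`."""
    return PAGES.get(page_name, PAGES[default])()
```

In app.py the dispatch block would then reduce to a single lookup keyed on `st.session_state.page`, and the sidebar buttons could be generated by looping over the same dictionary.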