rianders committed on
Commit f1b2eb8 · 1 Parent(s): eceebf5

updated main pages

Files changed (2)
  1. README.md +51 -12
  2. app.py +29 -6
README.md CHANGED
@@ -1,12 +1,51 @@
- ---
- title: Mpi Data Store
- emoji: 🏆
- colorFrom: purple
- colorTo: red
- sdk: streamlit
- sdk_version: 1.32.2
- app_file: app.py
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Data Processing Interface
+
+ This project is a Streamlit interface for mining, processing, and embedding data from public GitHub repositories. It lets you interactively select and configure data sources, model parameters, and processing options, which simplifies data extraction and transformation tasks.
+
+ ## Installation
+
+ Before running the app, install the required dependencies. This project requires Python 3.8 or later (the minimum supported by Streamlit 1.32).
+
+ 1. Clone the repository to your local machine:
+
+ ```
+ git clone https://github.com/yourusername/yourprojectname.git
+ cd yourprojectname
+ ```
+
+ 2. Install the required Python packages:
+
+ ```
+ pip install streamlit pandas tqdm
+ ```
+
+ Make sure to install any other dependencies specific to your project.
+
+ ## Running the App
+
+ To run the app, navigate to the project directory in your terminal and execute the following command:
+
+ ```
+ streamlit run app.py
+ ```
+
+ ## App Structure
+
+ The app is organized into multiple pages, each dedicated to a specific part of the data processing workflow:
+
+ - **Main Page:** Provides an overview and status of the data processing steps.
+ - **Data Source Configuration:** Allows for the selection of a GitHub repository and specification of an output directory for generated data.
+ - **Data Loading:** Enables directory selection within the repository and file type filtering for processing.
+ - **Model Selection and Configuration:** Offers options to select and configure the embedding model and the question-and-answering model.
+ - **Processing and Embedding:** Displays the process status, allows parameter tuning, and provides options to save preprocessed pages, processed pages, and vector store data.
+
+ ## Navigating the Interface
+
+ After launching the app, use the sidebar to navigate between the different pages. Each page includes interactive elements, such as input fields, dropdown menus, and checkboxes, allowing you to customize each step of the data processing pipeline.
+
+ Follow the instructions on each page to configure and run the data processing tasks.
+
+ ## Contributing
+
+ We welcome contributions to this project! If you have suggestions for improvements or encounter any issues, please open an issue or submit a pull request.
+
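The README says the Data Loading page lets you pick a directory inside the repository and filter files by type. The commit does not show how that filter is implemented, so the following is only a minimal sketch of the idea in plain Python (testable outside Streamlit). The function name `list_files_by_type` and its parameters are hypothetical, and skipping the `.git` folder is an assumption about what a cloned repository needs.

```python
from pathlib import Path
from typing import Iterable, List


def list_files_by_type(root: str, extensions: Iterable[str]) -> List[str]:
    """Return sorted POSIX paths (relative to root) of files whose suffix is in extensions.

    Hypothetical helper: extensions may be given with or without a leading dot
    (e.g. ["md", ".py"]) and are compared case-insensitively.
    """
    # Normalize every extension to lowercase with a leading dot.
    wanted = {
        ext.lower() if ext.startswith(".") else "." + ext.lower()
        for ext in extensions
    }
    base = Path(root)
    matches: List[str] = []
    for path in base.rglob("*"):
        rel = path.relative_to(base)
        # Assumption: Git metadata in a cloned repository should never be processed.
        if ".git" in rel.parts:
            continue
        if path.is_file() and path.suffix.lower() in wanted:
            matches.append(rel.as_posix())
    return sorted(matches)
```

On the page itself, the extension list could come from a widget such as `st.multiselect("File types", ["md", "py", "txt"])`, with the selected directory passed as `root`.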
app.py CHANGED
@@ -1,10 +1,33 @@
  import streamlit as st

- # Main script that Streamlit runs.
- # Individual pages are in the 'pages' folder.
- st.set_page_config(page_title="Data Processing Interface", layout="wide")
- st.title("Data Processing Interface")

- # The content of this script can be minimal, as the pages are defined in separate files.
- st.write("Please navigate to the sections using the sidebar.")

  import streamlit as st
+ from main_page import main as main_page

+ # Import other pages. Assume each has a main function to run the page.
+ from pages.data_source_config import main as data_source_config
+ from pages.data_loading import main as data_loading
+ # Add imports for other pages similarly...

+ # Initialize session state for page navigation if not already set
+ if 'page' not in st.session_state:
+     st.session_state.page = 'main_page'

+ # Define a function to change the page
+ def change_page(page_name):
+     st.session_state.page = page_name
+
+ # Page selection buttons in the sidebar
+ st.sidebar.title("Navigation")
+ st.sidebar.button("Main Page", on_click=change_page, args=('main_page',))
+ st.sidebar.button("Data Source Configuration", on_click=change_page, args=('data_source_config',))
+ st.sidebar.button("Data Loading", on_click=change_page, args=('data_loading',))
+ # Add buttons for other pages similarly...
+
+ # Page dispatch
+ if st.session_state.page == 'main_page':
+     main_page()
+ elif st.session_state.page == 'data_source_config':
+     data_source_config()
+ elif st.session_state.page == 'data_loading':
+     data_loading()
+ # Add elif blocks for other pages...
+
+ # The above could be optimized by mapping page names to functions
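The closing comment in app.py suggests replacing the if/elif chain with a mapping from page names to functions. A minimal sketch of that idea follows. To keep it runnable without a Streamlit server, session state is passed in as any mutable mapping (in the app this would be `st.session_state`, which supports dict-style access), and the three page functions are hypothetical stubs standing in for the imported `main` functions. The `PAGES` table, `run_current_page`, and the fallback behavior are illustrative choices, not part of this commit.

```python
from typing import Any, Callable, Dict, MutableMapping, Tuple


# Hypothetical stubs standing in for the page modules' main() functions.
def main_page() -> str:
    return "main_page"


def data_source_config() -> str:
    return "data_source_config"


def data_loading() -> str:
    return "data_loading"


DEFAULT_PAGE = "main_page"

# Page name -> (sidebar label, render function). Adding a page means adding one entry.
PAGES: Dict[str, Tuple[str, Callable[[], Any]]] = {
    "main_page": ("Main Page", main_page),
    "data_source_config": ("Data Source Configuration", data_source_config),
    "data_loading": ("Data Loading", data_loading),
}


def change_page(state: MutableMapping[str, Any], page_name: str) -> None:
    """Record the selected page in session state."""
    state["page"] = page_name


def run_current_page(state: MutableMapping[str, Any]) -> Any:
    """Initialize the page key if needed, then call the selected page's function.

    Unknown page names fall back to the default page instead of rendering nothing.
    """
    if "page" not in state:
        state["page"] = DEFAULT_PAGE
    _, render = PAGES.get(state["page"], PAGES[DEFAULT_PAGE])
    return render()


# In app.py the wiring could look like this (needs Streamlit, so shown as comments):
# st.sidebar.title("Navigation")
# for name, (label, _) in PAGES.items():
#     st.sidebar.button(label, on_click=change_page, args=(st.session_state, name))
# run_current_page(st.session_state)
```

Generating both the sidebar buttons and the dispatch from one table keeps them in sync when pages are added. One thing worth checking: Streamlit treats a `pages/` directory next to the entry script as a multipage app and adds its own sidebar page list, so these custom buttons may appear alongside that built-in navigation.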