# Contributing to TheDataGuy Chat

Thank you for your interest in contributing to TheDataGuy Chat! This document provides guidelines and instructions for contributing to this repository.

## Project Overview

TheDataGuy Chat is a Q&A chatbot powered by the content from [TheDataGuy blog](https://thedataguy.pro/blog/). It uses RAG (Retrieval Augmented Generation) to provide informative answers about topics such as RAGAS, RAG evaluation, building research agents, metric-driven development, and data science best practices.

## Development Environment Setup

### Prerequisites

- Python 3.13 or higher
- [uv](https://github.com/astral-sh/uv) for Python package management
- Docker (optional, for containerized development)
- OpenAI API key

### Local Setup

1. Clone the repository:
   ```bash
   git clone https://github.com/mafzaal/lets-talk.git
   cd lets-talk
   ```

2. Create a `.env` file with the necessary environment variables:
   ```
   OPENAI_API_KEY=your_openai_api_key
   VECTOR_STORAGE_PATH=./db/vector_store_tdg
   LLM_MODEL=gpt-4o-mini
   EMBEDDING_MODEL=Snowflake/snowflake-arctic-embed-l
   
   # Vector Database Creation Configuration (optional)
   FORCE_RECREATE=False      # Whether to force recreation of the vector store
   OUTPUT_DIR=./stats        # Directory to save stats and artifacts
   USE_CHUNKING=True         # Whether to split documents into chunks
   SHOULD_SAVE_STATS=True    # Whether to save statistics about the documents
   ```

3. Install dependencies:
   ```bash
   uv sync
   ```

4. Build the vector store:
   ```bash
   ./scripts/build-vector-store.sh
   ```

5. Run the application:
   ```bash
   chainlit run py-src/app.py --host 0.0.0.0 --port 7860
   ```
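The boolean flags in the `.env` example above (`FORCE_RECREATE`, `USE_CHUNKING`, `SHOULD_SAVE_STATS`) arrive as plain strings and must be parsed. The sketch below is a hypothetical illustration of that parsing; the project's actual logic lives in `py-src/lets_talk/config.py` and the real variable handling may differ:

```python
import os

def env_bool(name: str, default: bool = False) -> bool:
    """Parse a boolean environment variable such as FORCE_RECREATE."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes", "on")

# Hypothetical usage mirroring the .env example; the real names and
# defaults are defined in py-src/lets_talk/config.py.
FORCE_RECREATE = env_bool("FORCE_RECREATE", default=False)
OUTPUT_DIR = os.environ.get("OUTPUT_DIR", "./stats")
```

Values like `True`/`False` in `.env` files are case-insensitive under this scheme, which avoids surprises when the file is edited by hand.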

### Using Docker

1. Build the Docker image:
   ```bash
   docker build -t lets-talk .
   ```

2. Run the container:
   ```bash
   docker run -p 7860:7860 --env-file ./.env lets-talk
   ```

## Project Structure

```
lets-talk/
β”œβ”€β”€ data/                  # Raw blog post content
β”œβ”€β”€ py-src/                # Python source code
β”‚   β”œβ”€β”€ lets_talk/         # Core application modules
β”‚   β”‚   β”œβ”€β”€ agent.py       # Agent implementation
β”‚   β”‚   β”œβ”€β”€ config.py      # Configuration settings
β”‚   β”‚   β”œβ”€β”€ models.py      # Data models
β”‚   β”‚   β”œβ”€β”€ prompts.py     # LLM prompt templates
β”‚   β”‚   β”œβ”€β”€ rag.py         # RAG implementation
β”‚   β”‚   β”œβ”€β”€ rss_tool.py    # RSS feed integration
β”‚   β”‚   β”œβ”€β”€ tools.py       # Tool implementations
β”‚   β”‚   └── utils/         # Utility functions
β”‚   β”œβ”€β”€ app.py             # Main application entry point
β”‚   β”œβ”€β”€ pipeline.py        # Data processing pipeline
β”‚   └── notebooks/         # Jupyter notebooks for analysis
β”œβ”€β”€ db/                    # Vector database storage
β”œβ”€β”€ evals/                 # Evaluation datasets and results
└── scripts/               # Utility scripts
```

## Adding New Blog Posts

When new blog posts are published on TheDataGuy.pro, follow these steps to add them to the chat application:

1. Add the markdown content to the `data/` directory in a new folder named after the post slug
2. Run the vector store update script:
   ```bash
   python py-src/pipeline.py --force-recreate
   ```

## Workflow

1. **Fork** the repository on GitHub
2. **Clone** your fork to your local machine
3. Create a new **branch** for your feature or bug fix
4. Make your changes
5. Run the tests to ensure everything works
6. **Commit** your changes with clear, descriptive commit messages
7. **Push** your branch to your fork on GitHub
8. Submit a **Pull Request** to the main repository

## Code Style

- Follow PEP 8 style guidelines for Python code
- Use meaningful variable and function names
- Add docstrings to all functions and classes
- Include type hints where appropriate
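As a point of reference, a function following these conventions might look like the following. `format_context` is a made-up example for illustration, not code from the project:

```python
def format_context(documents: list[str], separator: str = "\n\n") -> str:
    """Join retrieved documents into a single prompt context string.

    Args:
        documents: Retrieved document snippets, most relevant first.
        separator: Text placed between consecutive snippets.

    Returns:
        The concatenated context, ready to interpolate into a prompt.
    """
    # Strip stray whitespace from each snippet before joining.
    return separator.join(doc.strip() for doc in documents)
```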

## Testing

- Write tests for new features and bug fixes
- Ensure all tests pass before submitting a Pull Request
- Use the Ragas evaluation framework to test RAG performance
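A minimal pytest-style test might look like the sketch below. The function under test is a stand-in; in a real test you would import the utilities you changed from `lets_talk` instead of defining them inline:

```python
# test_chunking.py: a minimal pytest-style example.

def chunk_text(text: str, size: int) -> list[str]:
    """Stand-in utility: split text into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def test_chunk_text_covers_all_characters() -> None:
    text = "retrieval augmented generation"
    chunks = chunk_text(text, size=7)
    # No characters lost or reordered, and no chunk exceeds the limit.
    assert "".join(chunks) == text
    assert all(len(chunk) <= 7 for chunk in chunks)
```

Run the suite with `pytest` from the repository root; pytest discovers files named `test_*.py` automatically.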

## Documentation

- Update relevant documentation when making changes
- Add docstrings to all functions, classes, and modules
- Keep the README and other documentation up to date

## License

By contributing to this project, you agree that your contributions will be licensed under the same license as the project (MIT License).

## Contact

If you have any questions or need further clarification, please reach out to the project maintainer via the [contact form](https://thedataguy.pro/contact/).