{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "04cabe4c",
   "metadata": {},
   "source": [
    "Uncomment and run the cell below if dependencies are not installed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "cc4d2b9b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %pip install -q pyyaml\n",
    "# %pip install -q requests\n",
    "# %pip install -q python-dotenv\n",
    "# %pip install -qU langchain-community\n",
    "# %pip install -q pypdf\n",
    "# %pip install -qU langchain-groq\n",
    "# %pip install -q chromadb\n",
    "# %pip install -q sentence-transformers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "7cdfaebc",
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "import os\n",
    "\n",
    "project_root = os.path.abspath(\"..\")  # adjust depending on where this notebook lives relative to src/\n",
    "if project_root not in sys.path:\n",
    "    sys.path.insert(0, project_root)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "72e187e0",
   "metadata": {},
   "outputs": [],
   "source": [
    "from src.pipeline import ChatPipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "f79416f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "from src.utils import load_config"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "ba557b13",
   "metadata": {},
   "outputs": [],
   "source": [
    "cp = ChatPipeline()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "49dc2580",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "d:\\Thesis\\Vinayak Rana\\LLM\\RAG\\src\\embedding.py:16: LangChainDeprecationWarning: The class `HuggingFaceEmbeddings` was deprecated in LangChain 0.2.2 and will be removed in 1.0. An updated version of the class exists in the :class:`~langchain-huggingface package and should be used instead. To use it run `pip install -U :class:`~langchain-huggingface` and import as `from :class:`~langchain_huggingface import HuggingFaceEmbeddings``.\n",
      "  return HuggingFaceEmbeddings(model_name=self.model_name)\n",
      "c:\\Users\\vinny\\Miniconda3\\envs\\scholarchatbot\\lib\\site-packages\\tqdm\\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n",
      "d:\\Thesis\\Vinayak Rana\\LLM\\RAG\\src\\pipeline.py:79: LangChainDeprecationWarning: Since Chroma 0.4.x the manual persistence method is no longer supported as docs are automatically persisted.\n",
      "  vector_store.persist()\n",
      "d:\\Thesis\\Vinayak Rana\\LLM\\RAG\\llm\\answer_generator.py:23: LangChainDeprecationWarning: Please see the migration guide at: https://python.langchain.com/docs/versions/migrating_memory/\n",
      "  self.memory = ConversationBufferWindowMemory(\n"
     ]
    }
   ],
   "source": [
    "cp.setup(arxiv_id=\"2407.05040\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "ca77354b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Based on the provided context, here\\'s a differentiation between Self-Instruct, Evol-Instruct, and OSSInstruct:\\n\\n1. **Self-Instruct**: This technique is used to align language models with self-generated instructions. It involves generating instruction-following data points through the Self-Instruct technique, which is utilized in Codealpaca and CodeLlama. The Self-Instruct technique is described in the paper \"Self-instruct: Aligning language models with self-generated instructions\" by Yizhong Wang et al. (2022).\\n\\n2. **Evol-Instruct**: This technique is used to evolve instruction-following data in both depth and breadth dimensions. It is employed in Wizardcoder to further evolve the Codealpaca dataset. The Evol-Instruct method is described in the paper \"EvolInstruct\" by Can Xu et al. (2023a).\\n\\n3. **OSSInstruct**: This technique is used to create instruction-following data from unlabeled open-source code snippets. It is employed in Magicoder to construct a method. The OSSInstruct technique is not described in detail in the provided context, but it is mentioned as a distinct method used in Magicoder.\\n\\nIn summary, Self-Instruct generates instruction-following data points, Evol-Instruct evolves instruction-following data, and OSSInstruct creates instruction-following data from open-source code snippets.'"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cp.query(\"can you differentiate between self instruct , evol instruct and OSS ?\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "scholarchatbot",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}