From Store
The first experiment involved a single document. Attempting to load all documents at once produced many warnings, so the Python code was adapted to add them to the index one by one. This may not have been a necessary step, but here it is.
Basic Setup
```python
import os.path
import logging
import sys

from llama_index import VectorStoreIndex
from llama_index import SimpleDirectoryReader
from llama_index import StorageContext

FIRST_DOC = "data/13s-90.txt"

logging.basicConfig(stream=sys.stderr, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stderr))
```
Define how to read a single document
```python
def read_doc(the_name):
    """Load a single document with the given file name."""
    return SimpleDirectoryReader(input_files=[the_name]).load_data()
```
Create your index from the first document
```python
the_index = VectorStoreIndex.from_documents(read_doc(FIRST_DOC))
```
Define how to push another individual document into the index
```python
def push_doc(the_name):
    new_doc = read_doc(the_name)
    for doc in new_doc:
        the_index.insert(doc)
```
Define how to push a list of documents into the index
```python
def push_these_docs(ll):
    for it in ll:
        push_doc(it)
```
Define your list of documents
Here it's just all files in the data subdirectory.
```python
dir_path = "data"
some_nice_docs = [
    "data/" + f
    for f in os.listdir(dir_path)
    if os.path.isfile(os.path.join(dir_path, f))
]
```
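The same list can also be built with `pathlib` from the standard library. A minimal, self-contained sketch (the function name `list_data_files` is just for illustration; it is not part of the original script):

```python
from pathlib import Path

def list_data_files(dir_path="data"):
    """Return the paths (as strings) of all regular files directly under dir_path,
    skipping subdirectories - equivalent to the os.listdir comprehension above."""
    return sorted(str(p) for p in Path(dir_path).iterdir() if p.is_file())
```

Sorting the result is optional, but it makes the insertion order reproducible between runs.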
Go!
With that all set up, you can do the work of creating the index.
```python
push_these_docs(some_nice_docs)
```
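The next section reloads the index from `./Store2`, so the freshly built index has to be written to disk first. A minimal sketch, assuming the `storage_context.persist` method that llama_index exposes on an index (the helper name `save_index` is illustrative, not part of the original script):

```python
PERSIST_DIR = "./Store2"

def save_index(index, persist_dir=PERSIST_DIR):
    """Write the index's storage context to disk so it can be reloaded later
    with load_index_from_storage. Assumes the llama_index persist API."""
    index.storage_context.persist(persist_dir=persist_dir)

# After push_these_docs(...) has finished:
# save_index(the_index)
```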
Use your new index store
```python
# Setup as above, plus:
from llama_index import load_index_from_storage

the_index = load_index_from_storage(
    StorageContext.from_defaults(persist_dir="./Store2")
)

def do_query(the_query):
    return the_index.as_query_engine().query(the_query)

def pq(q):
    print(do_query(q))
```
Now, from Python, you can call pq('Please list all names in the archive') or whatever other suitable question arises.
Usage
Now that the store has been created as above, you can reuse the index.
Download the whole index and some simple scripts with which to access it here:
Unzip these files and open a shell in the resulting directory.
Make sure you have all the appropriate preconditions, as before:
- A Debian-based Linux system such as Ubuntu.
- Python 3 installed.
- OPENAI_API_KEY exported in the environment.
- Install llama-index:
pip install llama-index
If you want a one-off demo, you can run the ./rag.sh script.
Note that this loads the whole index for every question,
which is time-consuming.
To load the index once and ask many questions,
run python -i fs.py
Then, once it has loaded, you can ask a question with
pq('what are some competency areas for cost estimators')
or similar.