From Store




First Report


The first experiment involved a single document. Attempting to load all the documents at once produced many warnings, so the Python code was customized to add them to the index one by one. This may not have been a necessary step, but here it is.

Basic Setup


import os.path
import logging
import sys
from llama_index import VectorStoreIndex
from llama_index import SimpleDirectoryReader
from llama_index import StorageContext

FIRST_DOC = "data/13s-90.txt"

logging.basicConfig(stream=sys.stderr, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stderr))


Define how to read a single document

def read_doc(the_name):
    """Load a single document with the given file name."""
    return SimpleDirectoryReader(input_files=[the_name]).load_data()


Create your index from the first document

the_index = VectorStoreIndex.from_documents(read_doc(FIRST_DOC))



Define how to push another individual document into the index

def push_doc(the_name):
    """Read one document and insert it into the existing index."""
    new_doc = read_doc(the_name)
    for doc in new_doc:
        the_index.insert(doc)


Define how to push a list of documents into the index

def push_these_docs(ll):
    """Insert each document in the list ll into the index, one at a time."""
    for it in ll:
        push_doc(it)


Define your list of documents

Here it's just all files in the data subdirectory

dir_path = "data"
some_nice_docs = [os.path.join(dir_path, f) for f in os.listdir(dir_path) if os.path.isfile(os.path.join(dir_path, f))]
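If the data subdirectory contains anything besides plain-text documents, it can help to filter by extension and sort for a stable insertion order. A small variation on the line above (the .txt filter is an assumption; adjust it to match your files):

```python
import os

def list_text_files(dir_path):
    # Return sorted paths of regular .txt files directly under dir_path,
    # skipping subdirectories and files with other extensions.
    return sorted(
        os.path.join(dir_path, f)
        for f in os.listdir(dir_path)
        if f.endswith(".txt") and os.path.isfile(os.path.join(dir_path, f))
    )
```

Then some_nice_docs = list_text_files("data") gives the same kind of list, just filtered and deterministic.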

Go!

Now that everything is set up, you can do the work of creating the index.

push_these_docs(some_nice_docs)
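The index built by push_these_docs lives only in memory. The next section loads it from a ./Store2 directory, so persist it to disk first; a minimal sketch, assuming the_index from above and the ./Store2 directory name used below:

```python
# Write the index (embeddings, docstore, metadata) to disk for later reuse.
the_index.storage_context.persist(persist_dir="./Store2")
```

After this, load_index_from_storage can rebuild the same index without re-embedding every document.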

Use your new index store

# setup as above, plus 

from llama_index import load_index_from_storage


the_index = load_index_from_storage(StorageContext.from_defaults(persist_dir="./Store2"))



def do_query(the_query):
    return the_index.as_query_engine().query(the_query)


def pq(q):
    """Run a query and print the response."""
    print(do_query(q))



Now from Python you can call pq('Please list all names in the archive') or whatever other suitable question arises.

Usage

Now that the store has been created as above, you can reuse the index.

Download the whole index and some simple scripts with which to access it here:

Files Index

Unzip these files and open a shell in the resulting directory.

Make sure you have all the appropriate preconditions, as before:

  • Preconditions:
    • A Debian-based Linux system such as Ubuntu.
    • Python 3 installed.
    • OPENAI_API_KEY exported in the environment.
  • Install llama-index: pip install llama-index

If you want a one-off demo, you can run the ./rag.sh script. Note that this reloads the whole index for every question, which is time-consuming. To load the index once and ask many questions, run python -i fs.py

Then, once it has loaded, you can ask a question with pq('what are some competency areas for cost estimators') or similar.