Is the index blocked for reading while upserting data?

nicm · February 8, 2022, 8:26am

As the topic says: Is it possible to read from the index while upserting data? We need to constantly index new data while making “search” queries.

sophiem · February 8, 2022, 2:25pm

Thanks for your question! With the Pinecone ANN (Pinecone’s standard solution), the database remains available and new data becomes searchable almost immediately (well under 1 s). When Pinecone uses another ANN algorithm such as HNSW, this isn’t true: the database stays up, but queries may not reflect the most recently upserted data.
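To make the distinction concrete, here is a toy model in plain Python. It is purely illustrative: `LiveIndex` and `SnapshotIndex` are hypothetical names, neither class is Pinecone’s engine or API, and both use exact brute-force search rather than a real ANN structure. In the live version an upsert is visible to the very next query; in the snapshot version the index keeps answering queries, but it only sees new vectors after a rebuild.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class LiveIndex:
    """Toy index: an upsert lands directly in the data that queries search."""

    def __init__(self):
        self.vectors = {}

    def upsert(self, vec_id, values):
        self.vectors[vec_id] = values

    def _searchable(self):
        return self.vectors

    def query(self, values, top_k=1):
        # Exact brute-force ranking by cosine similarity (not a real ANN structure)
        data = self._searchable()
        ranked = sorted(data, key=lambda i: cosine(values, data[i]), reverse=True)
        return ranked[:top_k]


class SnapshotIndex(LiveIndex):
    """Toy index: queries search a snapshot that changes only on rebuild()."""

    def __init__(self):
        super().__init__()
        self.snapshot = {}

    def rebuild(self):
        self.snapshot = dict(self.vectors)

    def _searchable(self):
        return self.snapshot


live, snap = LiveIndex(), SnapshotIndex()
live.upsert("a", [1.0, 0.0])
snap.upsert("a", [1.0, 0.0])
print(live.query([1.0, 0.0]))  # ['a']: visible right after the upsert
print(snap.query([1.0, 0.0]))  # []: still answering, but not fresh
snap.rebuild()
print(snap.query([1.0, 0.0]))  # ['a']
```

The sketch says nothing about how either engine works internally; it only shows the observable difference between “available and fresh” and “available but stale”.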

greg · February 10, 2022, 3:45pm

Hey @nicm, to add to that: All indexes run on our proprietary ANN engine, which has the benefit of live index updates. So you can keep updating the index and your queries will return the freshest results, with no downtime.

randywreed · February 8, 2023, 8:53pm

Does this mean that on the free plan, there’s no refresh? I’ve been trying to query new namespaces, and they don’t seem to show up unless init is run again.

Cory_Pinecone · February 8, 2023, 9:11pm

Hi @randywreed,

The free tier indexes use the same underlying algorithms as the paid tiers. The only difference between free and paid is the number of pods you can have running simultaneously: one pod in free, unlimited in paid. But how Pinecone works is exactly the same regardless of tier.

Can you share the code you’re using to upsert and query? There may be something else going on. Be sure not to include any private data like API keys.

randywreed · February 8, 2023, 9:14pm

Here’s my lookup code:

import os

import gradio as gr
import pinecone

# lookup_jesus(question, namespace), which runs the actual query, is defined elsewhere (not shown)

pinecone.init(
    api_key=os.getenv("PINECONE"),
    environment="us-west1-gcp"
)


def get_pinecone_idx():
    stats=pinecone.Index("gospels").describe_index_stats()
    pinecone_indexes = []
    for namespace in stats['namespaces']:
        pinecone_indexes.append(namespace)

    return pinecone_indexes


demo=gr.Blocks()

with demo:
    index_list = get_pinecone_idx()
    for i in index_list:
        print(i)
    selected_index = gr.Dropdown(index_list, label="Select Pinecone Index")
    question=gr.Textbox(label="Question")
    # The first positional argument of gr.Button is the text shown on the button
    b1=gr.Button("Lookup Jesus")

    out=gr.Textbox(label="Jesus Answer")

    b1.click(lookup_jesus, inputs=[question,selected_index], outputs=out)

#iface = gr.Interface(fn=lookup_jesus, [gr.inputs.Dropdown([index_list], "text"], "text", question=question, selected_index=selected_index, outputs="text")
#iface = gr.Interface(fn=lookup_jesus, inputs="text", outputs="text")
demo.launch(server_name="0.0.0.0")

This is the upsert code:

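import openai
import pandas as pd
import pinecone
from tqdm import tqdm

# Assumes pinecone.init(...) has already been called and MODEL (the OpenAI
# embedding model name) is defined elsewhere in the script.
indexname="gospels"
filename="GoTLambdin.tsv"
namespace="got-lambdin"

if indexname not in pinecone.list_indexes():
    pinecone.create_index(indexname, dimension=1536)
index=pinecone.Index(indexname)

# read in the TSV file and create a dataframe
df=pd.read_csv(filename, sep="\t")
df.head()

count=0
batch_size=32

for i in tqdm(range(0, len(df), batch_size)):
    batch=df.iloc[i:i+batch_size]
    combined = batch['Chapter'].astype(str) + batch['Verse'].astype(str)
    # Drop rows with a non-numeric Chapter/Verse *before* collecting texts, so
    # ids, embeddings and metadata have the same length and line up row by row
    # (zip() would otherwise silently truncate and pair ids with the wrong vectors)
    keep = combined.str.isnumeric()
    batch, combined = batch[keep], combined[keep]
    if batch.empty:
        continue
    ids_batch = combined.tolist()
    txtbatch=batch["Text"].tolist()

    res=openai.Embedding.create(
        engine=MODEL,
        input=txtbatch)
    embeddings=[r["embedding"] for r in res['data']]
    meta=[{"text": text} for text in txtbatch]
    print(len(ids_batch),len(embeddings),len(meta))
    index.upsert(list(zip(ids_batch, embeddings,meta)),namespace=namespace)
    count+=len(batch)
print(f"Indexed {count} documents")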
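One more thing worth checking in the upsert loop: concatenating Chapter and Verse with no separator can map two different verses to the same ID (chapter 1, verse 11 and chapter 11, verse 1 both become “111”). Because an upsert with an existing ID overwrites that vector, the later verse silently replaces the earlier one. A minimal sketch of an ID scheme with a separator follows; `verse_id`, `find_collisions` and the `-` format are illustrative choices, not something from the thread.

```python
def verse_id(chapter, verse):
    """Build a vector ID from chapter and verse, keeping the two parts distinct."""
    # The separator keeps (1, 11) and (11, 1) from both becoming "111"
    return f"{chapter}-{verse}"


def find_collisions(pairs, make_id):
    """Return each ID that more than one (chapter, verse) pair maps to."""
    seen = {}
    for chapter, verse in pairs:
        seen.setdefault(make_id(chapter, verse), []).append((chapter, verse))
    return {key: val for key, val in seen.items() if len(val) > 1}


pairs = [(1, 11), (11, 1), (2, 3)]
print(find_collisions(pairs, lambda c, v: f"{c}{v}"))  # {'111': [(1, 11), (11, 1)]}
print(find_collisions(pairs, verse_id))                # {}
```

If you switch to a separator, apply the `isnumeric` check to Chapter and Verse separately, since a string like `"1-11"` is not numeric. Also note that vectors already indexed under the old IDs are not overwritten by the new ones, so re-indexing into the same namespace would leave both.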

Cory_Pinecone · February 8, 2023, 10:15pm

Just to walk through what’s happening: your application connects to your index, iterates through its list of namespaces, and stores them in a list. The list is named “pinecone_indexes,” but it doesn’t store index data, just namespace labels. Your application then presents that list of namespaces as the indexes to choose from.
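One detail that may matter here: `get_pinecone_idx()` is called once, while the `gr.Blocks` layout is being built, so the dropdown lists only the namespaces that existed when the app started. A namespace created afterwards would not appear until the app is restarted or the list is refreshed. A minimal sketch, assuming the stats response supports `stats["namespaces"]` as in the code above: keep the extraction in a small function and call it whenever a fresh list is needed. `namespace_names` and `current_namespaces` are hypothetical helper names, and the fetcher in the usage lines is a stand-in for something like `pinecone.Index("gospels").describe_index_stats`.

```python
from typing import Any, Callable, List


def namespace_names(stats: Any) -> List[str]:
    """Return the namespace labels from a describe_index_stats()-style response."""
    # Sorted so the dropdown order stays stable between refreshes
    return sorted(stats["namespaces"])


def current_namespaces(fetch_stats: Callable[[], Any]) -> List[str]:
    """Fetch fresh stats on every call instead of reusing a list built at startup."""
    return namespace_names(fetch_stats())


# Stand-in fetcher: the second call "sees" a namespace created after startup
responses = iter([
    {"namespaces": {"got-lambdin": {}}},
    {"namespaces": {"got-lambdin": {}, "mark": {}}},
])
fetch = lambda: next(responses)
print(current_namespaces(fetch))  # ['got-lambdin']
print(current_namespaces(fetch))  # ['got-lambdin', 'mark']
```

How the fresh list gets back into the UI depends on the Gradio version; in Gradio 3.x, for example, a refresh button’s click handler can return `gr.Dropdown.update(choices=...)`.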

The upsert portion always writes to the got-lambdin namespace. Is that intended, or is it hardcoded only for this example?

Can you share the code that creates the lookup_jesus function? That seems to be the one that’s connecting and running the query.

Also, can you share what you’re seeing that indicates a new namespace isn’t showing up after it’s created? From the code shared so far, it isn’t clear what result you’re getting compared with what you expect.