Custom serverless index

yevk.aws · May 9, 2024, 4:03pm

All,
Below is my use case that I am using for testing and learning purposes.
I have an Excel spreadsheet with red light camera information. Essentially, it lists the intersections where cameras are installed, with additional metadata. The spreadsheet is located in an S3 bucket, and I am using a serverless index.

The default index is created, but a search (something like “find red light cameras at specific street intersections”) does not return good results.
I can run this search either through Bedrock or with Python similar to:
• result = getEmbedding(search =
• pinecone.query(vector=result, top_k=3…

I also have Python code (using chromadb) that does a custom search along these lines:
• ExcelDataFrame
• getEmbeddings
• create collection and add embeddings
• collection.query

In that case I get more accurate results.

My questions are:
Would it be possible to create a custom index (or modify the default one) against the same dataset to emphasize the specific keys that I am interested in?
Or to modify the search to retrieve more accurate results? (I tried adding metadata, but it looks like metadata only filters the returned set.)

Any information would be greatly appreciated.

Thank you
YK

ZacharyProser · May 13, 2024, 12:59pm

Hi @yevk.aws and thanks for your question!

Diagnosing what might be going wrong is difficult without seeing your relevant code.

Is there any way you could share more complete snippets of how you’re chunking, embedding and upserting your documents, for example? And the queries you’re running against your index?

At a high level, it does sound like metadata filtering may be relevant to your use case if you need to “weight” certain keywords highly, while also performing a vector search.
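
For illustration, here is a minimal sketch of what a filtered vector query could look like. It assumes the Pinecone Python client's `index.query` keyword arguments (`vector`, `top_k`, `filter`, `include_metadata`). The helper `build_query_kwargs` and the metadata field `street` are hypothetical names, not something from your index. Keep in mind that a filter only narrows the set of candidates; it does not change how the remaining vectors are scored.

```python
from typing import Any, Dict, List, Optional


def build_query_kwargs(query_vector: List[float],
                       street: Optional[str] = None,
                       top_k: int = 3) -> Dict[str, Any]:
    """Assemble keyword arguments for a (optionally filtered) vector query."""
    kwargs: Dict[str, Any] = {
        "vector": query_vector,
        "top_k": top_k,
        "include_metadata": True,  # return stored metadata with each match
    }
    if street is not None:
        # "street" is a hypothetical metadata key; use whatever key was
        # stored alongside the vectors at upsert time.
        kwargs["filter"] = {"street": {"$eq": street}}
    return kwargs


# Usage, assuming `index` is an existing Pinecone index handle and
# `query_vector` was produced by the same embedding model used at upsert time:
#   results = index.query(**build_query_kwargs(query_vector, street="MAIN ST"))
```

If the filter field is not present in a vector's metadata, that vector will simply not match, so the metadata has to be written at upsert time for this to work.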

Hope that helps, and looking forward to hearing back from you!

Best,
Zack

yevk.aws · May 14, 2024, 3:39pm

@ZacharyProser ,
Sorry for the late response. How can I share code, and which part should I share? As for the metadata filtering, I guess I need to insert metadata into the index. I will try and let you know.
Thank you for your help

ZacharyProser · May 14, 2024, 3:56pm

Hi @yevk.aws,

You can write your code directly in your forum post like this:

from pinecone import Pinecone
...

You can highlight your code and click the preformatted text button at the top of the editor, or you can wrap your code in backticks like this: ``` (at the beginning and end of your code).

However, PLEASE BE SURE NOT TO INCLUDE YOUR SECRETS, such as your PINECONE_API_KEY, because this forum is open to the public internet.

We’d like to see any of the code you’re asking for help with, so if you can sanitize your current program (remove any secrets) and post it here, we’ll be better able to assist.

Hope this helps!

Best,
Zack

yevk.aws · May 17, 2024, 4:30pm

@ZacharyProser
Below are the additional steps that I took:

1. Creation of a “direct” custom index in Pinecone (PC):
• read the dataframe from the Excel spreadsheet
• for each row, create a vector:
  • get the embedding for a specific column value
  • add additional metadata from the other columns
• upsert the vectors into the index
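
The steps above can be sketched roughly as follows. This is only an illustration under assumptions: the column names (`camera_id`, `intersection`, `approach`) and the `fake_embed` function are hypothetical stand-ins. In practice the rows would come from something like `pandas.read_excel(...).to_dict("records")`, and `embed` would call a real embedding model whose output dimension matches the index. The resulting dictionaries use the `id` / `values` / `metadata` shape that Pinecone's upsert accepts.

```python
from typing import Any, Callable, Dict, Iterable, List


def rows_to_vectors(rows: Iterable[Dict[str, Any]],
                    embed: Callable[[str], List[float]],
                    text_column: str,
                    id_column: str) -> List[Dict[str, Any]]:
    """Build one upsert record per row: embed one column, keep the rest as metadata."""
    vectors = []
    for row in rows:
        # Every column except the id becomes metadata; empty cells are skipped.
        metadata = {k: v for k, v in row.items()
                    if k != id_column and v is not None}
        vectors.append({
            "id": str(row[id_column]),
            "values": embed(str(row[text_column])),
            "metadata": metadata,
        })
    return vectors


def fake_embed(text: str) -> List[float]:
    """Placeholder only; a real embedding model call would go here."""
    return [float(len(text)), float(text.count(" "))]


sample_rows = [
    {"camera_id": 1, "intersection": "MAIN ST AND 1ST AVE", "approach": "NB"},
]
vectors = rows_to_vectors(sample_rows, fake_embed, "intersection", "camera_id")
# With a real index handle: index.upsert(vectors=vectors)
```

Embedding only the column that queries will be about (here the intersection name) and keeping the other columns as metadata is one plausible reason this approach returns closer matches than embedding the whole spreadsheet as generic text chunks.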

Custom search against the index in PC works well; however, I was not able to use it in a Bedrock knowledge base.

2. The Excel spreadsheet was split into multiple JSON files (one file per row):
• the knowledge base was created from the JSON files
• the number of vectors after sync equals the number of JSON files
• knowledge base search shows decent results
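
For reference, the split into one JSON file per row can be sketched with the standard library alone. The function name `write_rows_as_json`, the file naming scheme, and the sample columns are assumptions made for this example; the rows would normally come from the spreadsheet, and the output folder would then be uploaded to the S3 bucket that the knowledge base syncs from.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterable, List


def write_rows_as_json(rows: Iterable[Dict[str, Any]],
                       out_dir: str,
                       prefix: str = "row") -> List[Path]:
    """Write each row to its own JSON file and return the file paths."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for i, row in enumerate(rows):
        target = out_path / f"{prefix}_{i:05d}.json"
        # default=str is a fallback for values json cannot encode (e.g. dates).
        target.write_text(
            json.dumps(row, ensure_ascii=False, indent=2, default=str),
            encoding="utf-8",
        )
        written.append(target)
    return written


# Small demonstration that writes to a temporary directory.
sample_rows = [
    {"camera_id": 1, "intersection": "MAIN ST AND 1ST AVE"},
    {"camera_id": 2, "intersection": "OAK ST AND 5TH AVE"},
]
with tempfile.TemporaryDirectory() as tmp:
    demo_names = [p.name for p in write_rows_as_json(sample_rows, tmp)]
```

Because each file holds a single small record, each one most likely ends up as a single chunk, which would explain why the vector count after sync equals the number of files.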

Does this make sense, or are there potentially better ways?
Thank you
Yevgeniy