I’m estimating the cost for our project. I’m using a serverless index. The read and write units are clear to me, but I’m a bit confused about the storage. How do we calculate the storage cost using the number of embeddings and the embedding dimensions? Is there a Gigabyte to dollar mapping? We have 95,327,174 embeddings with 512 dimensions.
Storage is charged per GB, per hour. You can see the pricing page for more information on costs.
The size of an embedding is determined by 3 things:
Size of the ID
Size of the vector embedding itself
Size and cardinality of metadata
Each dimension in a vector uses 4 bytes of memory, so in this case each embedding takes 512 * 4 = 2,048 bytes. Multiply that by the number of vectors to get the size of your data:
2,048 (size of vector in bytes) * 95,327,174 (number of vectors) = 195,230,052,352 bytes
That is roughly 182 GB (counting 1 GB as 1024^3 bytes) for storing the vectors.
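If you want to rerun this arithmetic for a different vector count or dimension, here is a minimal Python sketch. The function name and constants are just illustrative, and it assumes 4 bytes per dimension as described above; it does not account for IDs or metadata.

```python
# Rough vector-only storage estimate. Assumes 4 bytes per dimension
# (as stated above) and ignores IDs and metadata.
BYTES_PER_DIMENSION = 4
BYTES_PER_GB = 1024 ** 3  # binary gigabyte, which matches the ~182 GB figure


def vector_storage_bytes(num_vectors: int, dimensions: int) -> int:
    """Return the raw size in bytes of the vector values alone."""
    return num_vectors * dimensions * BYTES_PER_DIMENSION


total_bytes = vector_storage_bytes(95_327_174, 512)
total_gb = total_bytes / BYTES_PER_GB

print(f"{total_bytes:,} bytes")
print(f"about {total_gb:.0f} GB")
```

Note that if you count 1 GB as 10^9 bytes instead, the same data comes out closer to 195 GB, so it is worth checking which convention applies to your bill.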
You can calculate the approximate monthly cost with the following formula: storage in GB * cost per GB per hour * 720 (number of hours in a 30-day month)
Using the current pricing for AWS, this comes out to around $59/month in storage costs. If you know the size of your IDs and metadata, you can add that to the storage calculation to get an estimate of the full cost.
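Here is a rough Python sketch of the monthly cost formula, with optional per-record ID and metadata sizes folded in. The helper names are made up for illustration, and the price used in the example is only a placeholder chosen because it roughly reproduces the figure above; it is not an official rate, so substitute the value from the pricing page.

```python
# Sketch of: storage in GB * price per GB per hour * 720 hours.
# All names here are illustrative, not part of any Pinecone SDK.
HOURS_PER_MONTH = 720          # 30-day month
BYTES_PER_GB = 1024 ** 3
BYTES_PER_DIMENSION = 4        # assumes 4 bytes per dimension


def record_bytes(dimensions: int, id_bytes: int = 0, metadata_bytes: int = 0) -> int:
    """Approximate size of one record: vector values plus ID and metadata."""
    return dimensions * BYTES_PER_DIMENSION + id_bytes + metadata_bytes


def monthly_storage_cost(storage_gb: float, price_per_gb_hour: float) -> float:
    """Approximate monthly storage cost in dollars."""
    return storage_gb * price_per_gb_hour * HOURS_PER_MONTH


# Placeholder price only. Look up the real value on the pricing page.
EXAMPLE_PRICE_PER_GB_HOUR = 0.00045

storage_gb = 95_327_174 * record_bytes(512) / BYTES_PER_GB
cost = monthly_storage_cost(storage_gb, EXAMPLE_PRICE_PER_GB_HOUR)
print(f"about {storage_gb:.0f} GB, about ${cost:.0f}/month")
```

Once you have measured your average ID and metadata sizes on a test index, pass them as `id_bytes` and `metadata_bytes` to see how much they add to the estimate.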
It may be a good idea to make a test index with a subset of your embeddings to get a better idea of the size of your metadata, which will help you more accurately estimate your storage costs. You can find the cost information for a specific project on the usage page in the Pinecone console.
The cost per GB per hour can be found on Pinecone’s pricing page, which I linked above. For my calculations, I used Pinecone’s current storage price for standard users.