Assuming unit-length vectors, it is easy to reduce both the memory consumed by the vectors and the time taken by upserts by nearly a factor of 4: store them as arrays of 8-bit integers (such as char[]) in the range [-128…127] instead of 4-byte floats in the range [-1.0…1.0]. Namely, each unit-length vector is multiplied by 127 and its elements are rounded to the nearest integer.
This has the following properties:
- Nearly a factor of 4 reduction of space taken by the vectors.
- Nearly a 4x speed-up of upserts.
- The unscaled scores for the response are easily recovered by dividing by 127.
- Recall is affected only slightly. Namely:
  a. Multiplying by 127 preserves the ordering for both the dot-product and the cosine similarity metrics (and maybe for the Euclidean distance as well; certainly if the query vector is scaled too).
  b. In my experiments (using the dot-product metric), rounding to the nearest integer barely changes the scores in the response, so the ordering changes only locally, among results with very similar scores.
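The properties above can be checked with a minimal sketch in pure Python. The random vectors, the dimension, and the helper names (normalize, quantize, dot) are illustrative assumptions, not part of any existing API. The query is kept in floating point, so dividing by 127 is enough to recover the unscaled scores.

```python
import math
import random
from array import array

SCALE = 127  # a unit-length vector times 127 fits in a signed 8-bit integer

def normalize(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def quantize(unit_vector):
    """Multiply by 127 and round each element to the nearest integer (int8).

    Python's round() breaks exact ties towards the even integer.
    """
    return array("b", (round(x * SCALE) for x in unit_vector))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Illustrative data: 200 random unit-length vectors of dimension 64.
random.seed(0)
dim, count = 64, 200
vectors = [normalize([random.gauss(0, 1) for _ in range(dim)]) for _ in range(count)]
query = normalize([random.gauss(0, 1) for _ in range(dim)])

exact = [dot(v, query) for v in vectors]
# Dividing by 127 undoes the scaling, so these are comparable to the float scores.
approx = [dot(quantize(v), query) / SCALE for v in vectors]

max_error = max(abs(e - a) for e, a in zip(exact, approx))
top_exact = sorted(range(count), key=lambda i: -exact[i])[:10]
top_approx = sorted(range(count), key=lambda i: -approx[i])[:10]

print(f"max score error: {max_error:.5f}")
print(f"top-10 overlap:  {len(set(top_exact) & set(top_approx))}/10")
```

Each element is off by at most 0.5/127 after rounding, so for a unit-length query the score error is bounded by (0.5/127)·sqrt(dim), and in practice it is typically much smaller than that bound.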
I propose that the metadata_config argument of create_index gets a field to enable compression to bytes. The compression itself can happen at the client, so as to reduce the amount of data sent when upserting the vectors.
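To illustrate the client-side part, here is a rough sketch of what the compression step could look like before the data is sent. The function name compress_for_upsert is hypothetical, and the sketch only compares raw vector payload sizes; it does not call any client library or assume any wire format.

```python
from array import array

def compress_for_upsert(unit_vector):
    """Hypothetical client-side step: pack a unit-length vector as int8 bytes."""
    return array("b", (round(x * 127) for x in unit_vector)).tobytes()

vector = [0.5, -0.5, 0.5, -0.5]  # unit length: 4 * 0.25 = 1

float_payload = array("f", vector).tobytes()  # 4 bytes per element
int8_payload = compress_for_upsert(vector)    # 1 byte per element

print(len(float_payload), len(int8_payload))
```

The saving applies to the vector values only; ids and metadata sent alongside them are unchanged, which is one reason the overall reduction is "nearly" rather than exactly a factor of 4.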