Getting vectors from Hugging Face sentence transformer models for Pinecone

cschmidt · February 7, 2022, 5:02pm

I asked this question in the Pinecone/Hugging Face joint presentation a while back:

The sentence-transformers models on Hugging Face default to the “sentence-similarity” task rather than the “feature-extraction” task you’d need to get the vectors to use in Pinecone. I got around it by cloning the models and modifying the metadata in README.md. Is there a better way?

While more specific to Hugging Face, it might be of general interest here.

Best,
Craig

sophiem · February 7, 2022, 5:05pm

Thanks for your question. Yes, when you query the Inference API, you can specify the task you want (“sentence-similarity” or “feature-extraction”). So even when the model defines “sentence-similarity”, you can get the embeddings from the Inference API.

cschmidt · February 7, 2022, 5:21pm

Can you give an example of how you pass feature-extraction to the Hugging Face API?

jamesbriggs · February 8, 2022, 2:38pm

I’m not aware of a way to pass feature-extraction directly to the API. Generally, I would load the models using the sentence-transformers library and create the embeddings with model.encode. Otherwise, with Hugging Face transformers directly, there is the long way (pulled from the SBERT model card):

from transformers import AutoTokenizer, AutoModel
import torch

def mean_pooling(model_output, attention_mask):
    # Average the token embeddings, using the attention mask so padding tokens are ignored
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    # Clamp the per-sentence token count to avoid division by zero
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

# Sentences we want embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

# Load the tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/bert-base-nli-mean-tokens')
model = AutoModel.from_pretrained('sentence-transformers/bert-base-nli-mean-tokens')

# Tokenize the sentences into padded PyTorch tensors
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings without tracking gradients
with torch.no_grad():
    model_output = model(**encoded_input)

# Pool the token embeddings into one vector per sentence
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
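
To make the pooling step above easier to follow, here is a minimal sketch that runs the same mean_pooling function on a tiny hand-made tensor instead of real model output. The toy values, the 2-dimensional embedding size, and the tuple standing in for model_output are assumptions made purely for illustration; no model is downloaded.

```python
import torch

def mean_pooling(model_output, attention_mask):
    # Same function as in the post: masked average over the token dimension
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

# Hypothetical toy batch: 2 sentences, 3 token positions, 2 embedding dimensions
token_embeddings = torch.tensor([
    [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],  # third position is padding
    [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]],      # no padding
])

# 1 marks a real token, 0 marks padding (as a tokenizer would return)
attention_mask = torch.tensor([[1, 1, 0], [1, 1, 1]])

# A one-element tuple stands in for the model output, since only index 0 is read
pooled = mean_pooling((token_embeddings,), attention_mask)
print(pooled)
```

The first row averages only the two real tokens, so the padded [100.0, 100.0] position has no effect on the result. Each row of the pooled tensor is the kind of fixed-size vector you would then send to Pinecone.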