Send embedding vectors to Pinecone with Langchain

I’m getting the error below when trying to send embedding vectors to Pinecone with LangChain. Can you help? Could the problem be related to the free version and the size of the data I want to upload?

import os

import pinecone
from langchain.vectorstores import Pinecone

# Read the API key and environment from environment variables
pinecone.init(
    api_key=os.getenv('mykey'),
    environment=os.getenv('myenv')
)

Error block

vstore = Pinecone.from_texts(texts, embeddings, index_name='cxanalytics')


TypeError Traceback (most recent call last)
Input In [58], in <cell line: 3>()
1 # Send embedding vectors to Pinecone with Langchain
----> 3 vstore = Pinecone.from_texts(texts, embeddings, index_name='cxanalytics')

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\langchain\vectorstores\pinecone.py:209, in Pinecone.from_texts(cls, texts, embedding, metadatas, ids, batch_size, text_key, index_name, namespace, **kwargs)
203 except ImportError:
204 raise ValueError(
205 "Could not import pinecone python package. "
206 "Please install it with pip install pinecone-client."
207 )
--> 209 indexes = pinecone.list_indexes() # checks if provided index exists
211 if index_name in indexes:
212 index = pinecone.Index(index_name)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pinecone\manage.py:185, in list_indexes()
183 """Lists all indexes."""
184 api_instance = _get_api_instance()
--> 185 response = api_instance.list_indexes()
186 return response

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pinecone\core\client\api_client.py:776, in Endpoint.__call__(self, *args, **kwargs)
765 def __call__(self, *args, **kwargs):
766 """ This method is invoked when endpoints are called
767 Example:
768
(…)
774
775 """
--> 776 return self.callable(self, *args, **kwargs)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pinecone\core\client\api\index_operations_api.py:1132, in IndexOperationsApi.__init__.<locals>.__list_indexes(self, **kwargs)
1128 kwargs['_check_return_type'] = kwargs.get(
1129 '_check_return_type', True
1130 )
1131 kwargs['_host_index'] = kwargs.get('_host_index')
--> 1132 return self.call_with_http_info(**kwargs)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pinecone\core\client\api_client.py:838, in Endpoint.call_with_http_info(self, **kwargs)
834 header_list = self.api_client.select_header_content_type(
835 content_type_headers_list)
836 params['header']['Content-Type'] = header_list
--> 838 return self.api_client.call_api(
839 self.settings['endpoint_path'], self.settings['http_method'],
840 params['path'],
841 params['query'],
842 params['header'],
843 body=params['body'],
844 post_params=params['form'],
845 files=params['file'],
846 response_type=self.settings['response_type'],
847 auth_settings=self.settings['auth'],
848 async_req=kwargs['async_req'],
849 _check_type=kwargs['_check_return_type'],
850 _return_http_data_only=kwargs['_return_http_data_only'],
851 _preload_content=kwargs['_preload_content'],
852 _request_timeout=kwargs['_request_timeout'],
853 _host=_host,
854 collection_formats=params['collection_format'])

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pinecone\core\client\api_client.py:413, in ApiClient.call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, async_req, _return_http_data_only, collection_formats, _preload_content, _request_timeout, _host, _check_type)
359 """Makes the HTTP request (synchronous) and returns deserialized data.
360
361 To make an async_req request, set the async_req parameter.
(…)
410 then the method will return the response directly.
411 """
412 if not async_req:
--> 413 return self.__call_api(resource_path, method,
414 path_params, query_params, header_params,
415 body, post_params, files,
416 response_type, auth_settings,
417 _return_http_data_only, collection_formats,
418 _preload_content, _request_timeout, _host,
419 _check_type)
421 return self.pool.apply_async(self.__call_api, (resource_path,
422 method, path_params,
423 query_params,
(…)
431 _request_timeout,
432 _host, _check_type))

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pinecone\core\client\api_client.py:200, in ApiClient.__call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, _return_http_data_only, collection_formats, _preload_content, _request_timeout, _host, _check_type)
196 url = _host + resource_path
198 try:
199 # perform request and return response
--> 200 response_data = self.request(
201 method, url, query_params=query_params, headers=header_params,
202 post_params=post_params, body=body,
203 _preload_content=_preload_content,
204 _request_timeout=_request_timeout)
205 except ApiException as e:
206 e.body = e.body.decode('utf-8')

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pinecone\core\client\api_client.py:439, in ApiClient.request(self, method, url, query_params, headers, post_params, body, _preload_content, _request_timeout)
437 """Makes the HTTP request using RESTClient."""
438 if method == "GET":
--> 439 return self.rest_client.GET(url,
440 query_params=query_params,
441 _preload_content=_preload_content,
442 _request_timeout=_request_timeout,
443 headers=headers)
444 elif method == "HEAD":
445 return self.rest_client.HEAD(url,
446 query_params=query_params,
447 _preload_content=_preload_content,
448 _request_timeout=_request_timeout,
449 headers=headers)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pinecone\core\client\rest.py:236, in RESTClientObject.GET(self, url, headers, query_params, _preload_content, _request_timeout)
234 def GET(self, url, headers=None, query_params=None, _preload_content=True,
235 _request_timeout=None):
--> 236 return self.request("GET", url,
237 headers=headers,
238 _preload_content=_preload_content,
239 _request_timeout=_request_timeout,
240 query_params=query_params)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pinecone\core\client\rest.py:202, in RESTClientObject.request(self, method, url, query_params, headers, body, post_params, _preload_content, _request_timeout)
199 raise ApiException(status=0, reason=msg)
200 # For GET, HEAD
201 else:
--> 202 r = self.pool_manager.request(method, url,
203 fields=query_params,
204 preload_content=_preload_content,
205 timeout=timeout,
206 headers=headers)
207 except urllib3.exceptions.SSLError as e:
208 msg = "{0}\n{1}".format(type(e).__name__, str(e))

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\request.py:74, in RequestMethods.request(self, method, url, fields, headers, **urlopen_kw)
71 urlopen_kw["request_url"] = url
73 if method in self._encode_url_methods:
--> 74 return self.request_encode_url(
75 method, url, fields=fields, headers=headers, **urlopen_kw
76 )
77 else:
78 return self.request_encode_body(
79 method, url, fields=fields, headers=headers, **urlopen_kw
80 )

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\request.py:96, in RequestMethods.request_encode_url(self, method, url, fields, headers, **urlopen_kw)
93 if fields:
94 url += "?" + urlencode(fields)
--> 96 return self.urlopen(method, url, **extra_kw)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\poolmanager.py:376, in PoolManager.urlopen(self, method, url, redirect, **kw)
374 response = conn.urlopen(method, url, **kw)
375 else:
--> 376 response = conn.urlopen(method, u.request_uri, **kw)
378 redirect_location = redirect and response.get_redirect_location()
379 if not redirect_location:

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\connectionpool.py:703, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
700 self._prepare_proxy(conn)
702 # Make the request on the httplib connection object.
--> 703 httplib_response = self._make_request(
704 conn,
705 method,
706 url,
707 timeout=timeout_obj,
708 body=body,
709 headers=headers,
710 chunked=chunked,
711 )
713 # If we're going to release the connection in finally:, then
714 # the response doesn't need to know about the connection. Otherwise
715 # it will also try to release it and we'll have a double-release
716 # mess.
717 response_conn = conn if not release_conn else None

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\connectionpool.py:398, in HTTPConnectionPool._make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
396 conn.request_chunked(method, url, **httplib_request_kw)
397 else:
--> 398 conn.request(method, url, **httplib_request_kw)
400 # We are swallowing BrokenPipeError (errno.EPIPE) since the server is
401 # legitimately able to close the connection after sending a valid response.
402 # With this behaviour, the received response is still readable.
403 except BrokenPipeError:
404 # Python 3

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\connection.py:244, in HTTPConnection.request(self, method, url, body, headers)
242 if "user-agent" not in (six.ensure_str(k.lower()) for k in headers):
243 headers["User-Agent"] = _get_default_user_agent()
--> 244 super(HTTPConnection, self).request(method, url, body=body, headers=headers)

File ~\AppData\Local\Programs\Python\Python310\lib\http\client.py:1282, in HTTPConnection.request(self, method, url, body, headers, encode_chunked)
1279 def request(self, method, url, body=None, headers={}, *,
1280 encode_chunked=False):
1281 """Send a complete request to the server."""
--> 1282 self._send_request(method, url, body, headers, encode_chunked)

File ~\AppData\Local\Programs\Python\Python310\lib\http\client.py:1323, in HTTPConnection._send_request(self, method, url, body, headers, encode_chunked)
1320 encode_chunked = False
1322 for hdr, value in headers.items():
--> 1323 self.putheader(hdr, value)
1324 if isinstance(body, str):
1325 # RFC 2616 Section 3.7.1 says that text default has a
1326 # default charset of iso-8859-1.
1327 body = _encode(body, 'body')

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\connection.py:224, in HTTPConnection.putheader(self, header, *values)
222 """ """
223 if not any(isinstance(v, str) and v == SKIP_HEADER for v in values):
--> 224 _HTTPConnection.putheader(self, header, *values)
225 elif six.ensure_str(header.lower()) not in SKIPPABLE_HEADERS:
226 raise ValueError(
227 "urllib3.util.SKIP_HEADER only supports '%s'"
228 % ("', '".join(map(str.title, sorted(SKIPPABLE_HEADERS))),)
229 )

File ~\AppData\Local\Programs\Python\Python310\lib\http\client.py:1259, in HTTPConnection.putheader(self, header, *values)
1256 elif isinstance(one_value, int):
1257 values[i] = str(one_value).encode('ascii')
--> 1259 if _is_illegal_header_value(values[i]):
1260 raise ValueError('Invalid header value %r' % (values[i],))
1262 value = b'\r\n\t'.join(values)

TypeError: expected string or bytes-like object

Hi @aeyesiltas,

Thanks for your question, and I’m sorry to hear you’re encountering this issue.

Here’s what I found by digging into your stack trace and the relevant LangChain code:

  1. LangChain’s Pinecone.from_texts method is invoked.
  2. This function checks if the specified index_name exists by calling pinecone.list_indexes().
  3. The list_indexes function tries to fetch a list of all indexes from Pinecone.
  4. This results in a series of calls within the Pinecone library to make an HTTP request to list the indexes.
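  5. The error is raised inside this HTTP request, before the request is sent. The last frame is http.client’s putheader, where the check _is_illegal_header_value(values[i]) fails with TypeError: expected string or bytes-like object. That check fails this way when a header value is None (or another non-string, non-integer object), so one of the request headers is not a string. The most likely candidate is the API key header: if os.getenv('mykey') returned None, pinecone.init() received api_key=None.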
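To see why a missing key surfaces as this particular TypeError, here is a minimal sketch that calls http.client’s putheader directly with a None value, which is the same check your traceback ends in. It assumes the key travels in a header named Api-Key (an assumption for illustration; the header name does not change the result), and header_error_for is a hypothetical helper. No network traffic happens, because HTTPConnection only connects when the request is actually sent.

```python
import http.client


def header_error_for(value):
    """Hypothetical helper: return the error http.client raises for a header value, or None."""
    # Creating the connection object and starting a request only buffers data;
    # nothing is sent over the network until the headers are finished.
    conn = http.client.HTTPConnection("example.com")
    conn.putrequest("GET", "/")
    try:
        # "Api-Key" is assumed here as the header carrying the Pinecone key.
        conn.putheader("Api-Key", value)
    except (TypeError, ValueError) as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


if __name__ == "__main__":
    print(header_error_for(None))          # what os.getenv() returns for an unset variable
    print(header_error_for("my-api-key"))  # a normal string value passes the check
```

With None, the call fails with TypeError: expected string or bytes-like object (newer Python versions also name the offending type), while a string value passes the check and the helper returns None.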

Several things might be causing this issue:

  1. API Key or Environment Issue: The api_key and environment values are read from environment variables. os.getenv() returns None when a variable is not set in the process running your code, and a None API key would be consistent with the TypeError at the end of your traceback.
  2. Network Issue: A network-related issue could prevent the Pinecone library from making the necessary HTTP requests. This would likely be transient, though. Are you still encountering this issue today?
  3. Free Version Limitations: The most stringent limit is capacity, about 100k vectors with a dimensionality of 1536. Because Pinecone.from_texts stores one vector per text, you can index at most roughly 100k documents or text chunks on the free tier. If your dataset is larger, you’ll need to prioritize which documents get indexed or consider upgrading to a paid tier. Note, however, that this particular error is raised in pinecone.list_indexes(), before any vectors are uploaded, so the size of your data is unlikely to be its cause.
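Since each text becomes one vector, a rough pre-upload check is just a count. The sketch below assumes the ~100k figure quoted above (worth re-checking against Pinecone’s current plan limits); fits_free_tier and vectors_after_upload are hypothetical helpers, and already_stored could be filled from the total_vector_count field returned by describe_index_stats().

```python
# Approximate free-tier capacity for 1536-dimensional vectors, as quoted above.
# This is an assumption to re-check against Pinecone's current plan limits.
FREE_TIER_VECTOR_LIMIT = 100_000


def vectors_after_upload(texts, already_stored=0):
    """Hypothetical helper: from_texts upserts one vector per text."""
    return already_stored + len(texts)


def fits_free_tier(texts, already_stored=0, limit=FREE_TIER_VECTOR_LIMIT):
    """Hypothetical helper: return (fits, total) for a planned upload."""
    total = vectors_after_upload(texts, already_stored)
    return total <= limit, total


if __name__ == "__main__":
    sample_texts = ["first chunk", "second chunk", "third chunk"]  # hypothetical data
    ok, total = fits_free_tier(sample_texts, already_stored=99_998)
    print(ok, total)  # False 100001
```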

Steps to Debug and Resolve the Issue:

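  1. Check Environment Variables: Ensure that os.getenv('mykey') and os.getenv('myenv') return the values you expect rather than None. Note that os.getenv() takes the name of an environment variable (for example PINECONE_API_KEY), not the key itself. You can print them out to verify, or just check whether they are None if you’d rather not print the key.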
  2. Rate/Data Limits: You can use the describe_index_stats method to check if you’re close to exceeding the free tier limit of vectors:
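import pinecone

# Use your own API key and environment here
pinecone.init(api_key='YOUR_API_KEY', environment='us-east1-gcp')
index = pinecone.Index('example-index')  # replace with your index name, e.g. 'cxanalytics'

# The response includes total_vector_count, which you can compare against the free tier limit
index_stats_response = index.describe_index_stats()
print(index_stats_response)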
  3. Update Libraries: Ensure you’re using the latest versions of the LangChain and Pinecone libraries. Sometimes, issues get fixed in newer releases. We’ve fixed many issues in LangChain’s Pinecone integration, and LangChain has also shipped a ton of improvements since May.

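  4. Detailed Error Message: Because the failure happens while the request headers are being built, Pinecone never returns a response, so there is no server-side error message to inspect. If the issue persists after you’ve confirmed the API key is set, please share the full traceback along with your langchain and pinecone-client versions.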
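The environment-variable and library-version checks above can be combined into a small preflight script to run before pinecone.init(). This is a sketch: preflight is a hypothetical helper, and the variable names PINECONE_API_KEY and PINECONE_ENVIRONMENT are assumptions, so substitute the names you actually pass to os.getenv(). It only reports whether each variable is set, so the key itself is never printed.

```python
import os
from importlib import metadata


def preflight(key_var="PINECONE_API_KEY", env_var="PINECONE_ENVIRONMENT"):
    """Hypothetical helper: list config problems and installed package versions."""
    problems = []
    for name in (key_var, env_var):
        value = os.getenv(name)  # None when the variable is not set
        if value is None:
            problems.append(f"{name} is not set")
        elif not value.strip():
            problems.append(f"{name} is empty")

    versions = {}
    for package in ("langchain", "pinecone-client"):
        try:
            versions[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            versions[package] = "not installed"
    return problems, versions


if __name__ == "__main__":
    problems, versions = preflight()
    for problem in problems:
        print("Problem:", problem)
    print("Installed versions:", versions)
```

If any problem is reported, fix the environment (or the variable names you read) before calling Pinecone.from_texts again.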

I hope that is helpful! Let me know if the issue persists or if you have any follow-up questions.
