Unreadable encoded text when querying vectors from LaTeX docs

I’m querying chunks of a split LaTeX document along with their corresponding vectors. However, the text in the metadata field of the returned matches is garbled and unreadable.

See the text field below:

{
            "id": "c2f70739-5f75-4805-972c-e44236108c45",
            "score": 0.754803538,
            "values": [],
            "metadata": {
                "loc.lines.from": 31789,
                "loc.lines.to": 31789,
                "text":"��1�WF�� �d�D��:&c�=d��E���œj�\u000b��#\"Kd�F�$Z����M���!�\f��h��e"
            }
        },

Here is a sample input document passed to PineconeStore.fromDocuments in langchain.js:

Document {
    pageContent: 'References}\n' +
      '[1] W. Dai, "b-money," \\href{http://www.weidai.com/bmoney.txt}{http://www.weidai.com/bmoney.txt}, 1998.\n' +
      '\n' +
      '[2] H. Massias, X.S. Avila, and J.-J. Quisquater, "Design of a secure timestamping service with minimal trust requirements," In 20th Symposium on Information Theory in the Benelux, May 1999.\n' +
      '\n' +
      '[3] S. Haber, W.S. Stornetta, "How to time-stamp a digital document," In Journal of Cryptology, vol 3, no 2, pages \\(99-111,1991\\).\n' +
      '\n' +
      '[4] D. Bayer, S. Haber, W.S. Stornetta, "Improving the efficiency and reliability of digital time-stamping," In Sequences II: Methods in Communication, Security and Computer Science, pages 329-334, 1993.\n' +
      '\n' +
      '[5] S. Haber, W.S. Stornetta, "Secure names for bit-strings," In Proceedings of the 4th ACM Conference on Computer and Communications Security, pages 28-35, April 1997.\n' +
      '\n' +
      '[6] A. Back, "Hashcash - a denial of service counter-measure," \\href{http://www.hashcash.org/papers/hashcash.pdf}{http://www.hashcash.org/papers/hashcash.pdf}, 2002.\n' +
      '\n' +
      '[7] R.C. Merkle, "Protocols for public key cryptosystems," In Proc. 1980 Symposium on Security and Privacy, IEEE Computer Society, pages 122-133, April',
    metadata: { loc: [Object] }
  },

Hi @steventsao,

Thank you for your question, and I’m sorry you’re encountering this issue.

Could you please share some of your relevant application code that does the upsert to Pinecone?

As a sanity check, it may also help to print the text values to the console just before the upsert. If they are already garbled at that point, the encoding issue is in the application code rather than in Pinecone.
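A rough sketch of that check, in case it is useful: the garbled sample above contains U+FFFD replacement characters and control characters such as \u000b and \f, so one option is to flag any chunk where those make up a noticeable share of the text. The helper names (suspiciousCharRatio, findGarbledChunks), the ChunkLike shape, and the 0.05 threshold are all assumptions for illustration. None of them come from langchain.js or the Pinecone client.

```typescript
// Hypothetical helpers for illustration only; not part of langchain.js or Pinecone.
// ChunkLike mirrors just the pageContent field of the Document shown in the question.
interface ChunkLike {
  pageContent: string;
}

// Fraction of UTF-16 code units that are U+FFFD (the replacement character)
// or control characters other than tab, newline, and carriage return.
function suspiciousCharRatio(text: string): number {
  if (text.length === 0) return 0;
  let suspicious = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    const isReplacement = code === 0xfffd;
    const isControl =
      code < 0x20 && code !== 0x09 && code !== 0x0a && code !== 0x0d;
    if (isReplacement || isControl) suspicious++;
  }
  return suspicious / text.length;
}

// Returns the indexes of chunks whose ratio exceeds the (assumed) threshold
// and logs a warning for each one.
function findGarbledChunks(docs: ChunkLike[], threshold = 0.05): number[] {
  const flagged: number[] = [];
  docs.forEach((doc, i) => {
    const ratio = suspiciousCharRatio(doc.pageContent);
    if (ratio > threshold) {
      console.warn(`chunk ${i} looks garbled (ratio ${ratio.toFixed(2)})`);
      flagged.push(i);
    }
  });
  return flagged;
}
```

Calling findGarbledChunks on the array of documents right before PineconeStore.fromDocuments should show whether any chunk is already unreadable at that stage. If nothing is flagged, the text is probably being altered somewhere after the split.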

I hope that is helpful!