Which embedding model do I use for my RAG: OpenAI, BGE, or Cohere?

Embeddings turn text into vectors for semantic search. The models differ in language quality (especially for Dutch), cost per million tokens, and vendor lock-in. For an SMB with Dutch content, a multilingual model almost always beats an English-only one.
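To make the mechanics concrete, here is a minimal sketch of vector search. The `toy_embed` function is a stand-in, not a real model; real embedding models return dense vectors of hundreds to thousands of floats, but the retrieval step (cosine similarity against pre-embedded documents) works the same way:

```python
import math

def toy_embed(text: str) -> list[float]:
    # Stand-in for a real embedding model: a tiny bag-of-characters
    # vector, purely to make the example runnable without an API key.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: the standard ranking metric for embeddings.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Embed the documents once, then score each query against them.
docs = ["openingstijden kantoor", "factuur betalen", "wachtwoord resetten"]
doc_vecs = [toy_embed(d) for d in docs]

query = "hoe reset ik mijn wachtwoord"
scores = [cosine(toy_embed(query), v) for v in doc_vecs]
best = docs[scores.index(max(scores))]  # best-matching document
```

Swapping `toy_embed` for a real model client is the only change needed; the search logic stays identical across OpenAI, Cohere, and self-hosted models.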

Try this first

  1. OpenAI text-embedding-3-small or -large: simple, reliable, decent multilingual. Cheap on tokens. Vendor coupling with OpenAI, and you ship documents to their API.
  2. Cohere Embed v3 (multilingual): strong multilingual, explicit query and document modes. Cheap. EU data location via Bedrock in Frankfurt.
  3. BGE M3 or multilingual-e5: open source, self-host on GPU or even CPU. No token billing, hosting cost instead. Pick this for on-prem or strict data locality.
  4. Test on your own content: take twenty real questions, embed your documents with each model, and check which model's top-3 hits are most correct for your domain. Domain fit beats benchmark score.
  5. Pin the model and version in code. An upgrade means re-embedding everything, so plan it as a migration, not a tweak.
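Step 4 can be scripted as a small eval harness. A minimal sketch, assuming you wrap each candidate model in an `embed(text) -> vector` function; the `char_embed` stub below is a placeholder for those real clients, purely to keep this runnable:

```python
# Top-3 retrieval eval: embed the corpus once per model, then check
# whether the known-correct document lands in the top 3 for each
# test question. Higher hit rate = better domain fit.
import math
from typing import Callable

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def char_embed(text: str) -> list[float]:
    # Placeholder embedder; swap in a real model client here.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def top3_hit_rate(embed, docs, cases):
    doc_vecs = [embed(d) for d in docs]  # embed corpus once per model
    hits = 0
    for question, correct_doc in cases:
        q = embed(question)
        ranked = sorted(range(len(docs)),
                        key=lambda i: cosine(q, doc_vecs[i]), reverse=True)
        if correct_doc in [docs[i] for i in ranked[:3]]:
            hits += 1
    return hits / len(cases)

docs = ["wachtwoord resetten", "factuur betalen",
        "openingstijden kantoor", "vpn instellen"]
cases = [("hoe reset ik mijn wachtwoord", "wachtwoord resetten"),
         ("wanneer is het kantoor open", "openingstijden kantoor")]

# One entry per candidate model; only a stub is shown here.
models: dict[str, Callable] = {"stub-model": char_embed}
for name, embed in models.items():
    print(name, top3_hit_rate(embed, docs, cases))
```

With twenty real questions and your own documents, the hit rates per model give a direct, domain-specific comparison that no public benchmark provides.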

When to bring us in

Want a comparison on your own documents and questions? We can run all three models against an eval set.

None of the above fits?

Describe your situation below. We pass your input plus the steps you already saw to our AI and return tailored next-step advice. If it's too risky to DIY, we'll say so.

Who are you?

For the AI question we need your email and company name, so we can follow up if the AI gets stuck and to prevent abuse.

Limited to 2 questions per hour and 5 per day, kept lean so the AI stays useful. For more than that, contacting us directly works better for both of us.

Or skip the DIY entirely

Our Managed IT clients do not look these things up. One point of contact, a fixed monthly price, resolved within working hours.