Convert text to vectors, measure semantic similarity, and use vector databases for fast search
Embeddings convert text into vectors that capture meaning. Similar meaning = similar vectors, even with different words.
"king" -> [0.42, -0.13, 0.87, ...]
"queen" -> [0.38, -0.10, 0.92, ...] (close to king)
"apple" -> [0.91, -0.45, 0.12, ...] (far from king)def cosine_similarity(v1, v2):
dot = sum(a*b for a,b in zip(v1, v2))
norm1 = sum(a*a for a in v1)**0.5
norm2 = sum(b*b for b in v2)**0.5
return dot / (norm1 * norm2)| Model | Dimensions | Cost |
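A quick sanity check with hand-picked vectors (toy values, not real embeddings):

```python
print(cosine_similarity([1, 0], [1, 0]))   # 1.0  - same direction
print(cosine_similarity([1, 0], [0, 1]))   # 0.0  - orthogonal, unrelated
print(cosine_similarity([1, 0], [-1, 0]))  # -1.0 - opposite direction
```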
A few common embedding models:

| Model | Dimensions | Cost |
|---|---|---|
| OpenAI text-embedding-3-small | 512-1536 | $0.02/M tokens |
| OpenAI text-embedding-3-large | 256-3072 | $0.13/M tokens |
| BGE-M3 | 1024 | Free (open source) |
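To get a real embedding, call a provider's API. A minimal sketch using the OpenAI Python SDK (v1+); it assumes `OPENAI_API_KEY` is set in your environment:

```python
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

def get_embedding(text, model="text-embedding-3-small"):
    # One embedding per input string; text-embedding-3 models also accept
    # a `dimensions` argument if you want shorter vectors.
    response = client.embeddings.create(model=model, input=text)
    return response.data[0].embedding

vec = get_embedding("The cat sat on the mat")
print(len(vec))  # 1536 by default for text-embedding-3-small
```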
Once you have embeddings, a vector database stores them and answers nearest-neighbor queries quickly:

| Database | Type | Best For |
|---|---|---|
| pgvector | PostgreSQL extension | Already using Postgres |
| Pinecone | Managed cloud | Production at scale |
| Qdrant | Self-hosted or cloud | Performance |
| Chroma | Embedded | Development |
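For development, Chroma's in-memory client is a quick way to try this end to end. A sketch, assuming the `chromadb` package; when you pass raw documents instead of vectors, Chroma embeds them with its built-in default model:

```python
import chromadb

client = chromadb.Client()  # in-memory, good for development
collection = client.create_collection("docs")

# Chroma embeds these documents with its default embedding model.
collection.add(
    ids=["1", "2", "3"],
    documents=[
        "The cat sat on the mat",
        "A feline rests on a rug",
        "Quantum physics is strange",
    ],
)

results = collection.query(query_texts=["Where is the cat?"], n_results=2)
print(results["documents"])  # the two closest documents
```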
Putting it together. The mock below stands in for a real embedding call (such as the OpenAI sketch above), so its scores won't reflect meaning; with real embeddings, the first two sentences would score clearly higher against each other than either does against the third:

```python
def get_embedding(text):
    # Mock embedding - in practice, call OpenAI or another embedding API.
    # Pad to a fixed length so every vector has the same dimensionality.
    return [hash(c) % 100 / 100 for c in text[:100].ljust(100)]

texts = ["The cat sat on the mat", "A feline rests on a rug", "Quantum physics is strange"]
for i in range(len(texts)):
    for j in range(i + 1, len(texts)):
        sim = cosine_similarity(get_embedding(texts[i]), get_embedding(texts[j]))
        print(f"{texts[i]!r} vs {texts[j]!r}: {sim:.3f}")
```
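Semantic search is then just "rank the corpus by similarity to the query." A brute-force sketch reusing the functions above (the query string is made up for illustration); this linear scan is exactly what vector databases replace with approximate nearest-neighbor indexes:

```python
def search(query, corpus, top_k=2):
    # O(len(corpus)) scan: embed the query, score every document, sort.
    query_vec = get_embedding(query)
    scored = [(cosine_similarity(query_vec, get_embedding(doc)), doc)
              for doc in corpus]
    return sorted(scored, reverse=True)[:top_k]

for score, doc in search("Where is the cat?", texts):
    print(f"{score:.3f}  {doc}")
```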