Get Started with Vector Databases

Companies are raising millions of dollars to build a brand-new kind of modern database. Do you know which kind?

Vector databases. Yes, vector DBs have recently gained a lot of attention, and companies are calling them the new kind of database for the AI era. Vector databases are fascinating and enable many great applications, especially now that Large Language Models are being adopted everywhere.

So let's dive in and see what vector databases are.

First,

Why?

It is often estimated that around 80% of the data out there is unstructured: social media posts, images, videos, audio, and so on. We cannot easily fit such data into a relational database, and even if we do, we cannot quickly search for or retrieve the item we need.

Have you ever wondered how big search platforms like Google or Bing, or music platforms like Spotify, search through their documents?

When we search for something on Google, we get different results that are similar to each other in context. How is Google able to group them? This is where the concept of "EMBEDDINGS" comes in. Embeddings are an effective way to capture which documents are similar. Here, a document can be anything: an image, text, or a video.

In simple terms: Embedding is the process of representing data as a series of numbers, in the form of a vector in a multi-dimensional space, so that similar documents end up near each other, forming something like a cluster.
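To make "near each other" concrete, here is a minimal sketch in Python. The three-dimensional vectors are hypothetical values written by hand purely for illustration. Real embeddings are produced by a trained model and usually have hundreds of dimensions. Cosine similarity is used here as one common way to measure closeness, and the helper names are my own.

```python
import math

# Hypothetical 3-dimensional embeddings, written by hand for illustration.
# Real embeddings come from a trained model, not from manual values.
embeddings = {
    "cat":    [0.90, 0.80, 0.10],
    "kitten": [0.85, 0.90, 0.15],
    "car":    [0.10, 0.20, 0.95],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (close to 1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(word):
    """Return the other word whose vector is closest to the given word's vector."""
    others = [w for w in embeddings if w != word]
    return max(others, key=lambda w: cosine_similarity(embeddings[word], embeddings[w]))

print(most_similar("cat"))  # "kitten" sits much closer to "cat" than "car" does
```

With these toy values, "cat" and "kitten" point in almost the same direction, while "car" points elsewhere. That is the clustering effect described above.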

*** We will look at embeddings in detail in a dedicated post ***

What?

A vector database indexes and stores vector embeddings for fast retrieval and similarity search.

Two main components:

1) Convert the data (a word, sentence, or paragraph) into a vector embedding.

2) Index the vector embedding before storing it in the database. An index is a data structure that speeds up the search process.
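The two components above can be sketched as a toy in-memory store. This is only an illustration under stated assumptions: the class name TinyVectorStore and its methods are hypothetical, the vectors are hand-written stand-ins for step 1, and the index is a very simple random-hyperplane hashing scheme (a basic form of locality-sensitive hashing). Production vector databases typically use more sophisticated approximate nearest-neighbor indexes, such as HNSW.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class TinyVectorStore:
    """Hypothetical toy store: buckets vectors by which side of random hyperplanes they fall on."""

    def __init__(self, dim, num_planes=4, seed=0):
        rng = random.Random(seed)
        # Each random hyperplane contributes one bit to a vector's bucket key.
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_planes)]
        self.buckets = {}  # bucket key -> list of (doc_id, vector)

    def _key(self, vector):
        return tuple(
            sum(p * v for p, v in zip(plane, vector)) >= 0 for plane in self.planes
        )

    def add(self, doc_id, vector):
        # Step 2: index the embedding (pick its bucket) and store it.
        self.buckets.setdefault(self._key(vector), []).append((doc_id, vector))

    def search(self, query, k=1):
        # Look only in the query's bucket; fall back to a full scan if it is empty.
        candidates = self.buckets.get(self._key(query))
        if not candidates:
            candidates = [item for bucket in self.buckets.values() for item in bucket]
        ranked = sorted(candidates, key=lambda item: cosine(query, item[1]), reverse=True)
        return [doc_id for doc_id, _ in ranked[:k]]

# Step 1 is assumed done: these hand-written vectors stand in for model output.
store = TinyVectorStore(dim=3)
store.add("cat", [0.90, 0.80, 0.10])
store.add("kitten", [0.85, 0.90, 0.15])
store.add("car", [0.10, 0.20, 0.95])
print(store.search([0.90, 0.80, 0.10], k=1))  # the stored "cat" vector is its own best match
```

The point of the index is that a search compares the query against one bucket instead of every stored vector. The trade-off is that a near neighbor that landed in a different bucket can be missed, which is why such indexes are called approximate.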

Use Cases:

Long-term memory for LLMs

Semantic search: search based on meaning or context. (This is the major advancement. Previously, search worked only by matching the same or common words, but now we can search by understanding the context.)

Similarity search for text, image, audio, or video data.
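To illustrate the difference between keyword search and semantic search, here is a small sketch. The two-dimensional vectors (think of the axes as "travel" and "food") are hypothetical values assigned by hand. In a real system they would come from an embedding model.

```python
import math

# Hypothetical 2-dimensional embeddings, assigned by hand for illustration.
docs = {
    "low-cost airline tickets": [0.95, 0.05],
    "cheap pizza near me":      [0.05, 0.95],
}
query_text = "cheap flights"
query_vector = [0.90, 0.10]  # hypothetical embedding of the query

def keyword_score(query, doc):
    """Count the words that the query and the document share."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

keyword_best = max(docs, key=lambda d: keyword_score(query_text, d))
semantic_best = max(docs, key=lambda d: cosine(query_vector, docs[d]))

print("keyword search picks: ", keyword_best)
print("semantic search picks:", semantic_best)
```

Keyword matching latches onto the shared word "cheap" and picks the pizza result, while the vector comparison picks the airline tickets even though that document shares no words with the query.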

Some of the available vector databases include:

1) Weaviate

2) Chroma

3) Redis (via its vector search support)

Get in touch here : https://www.linkedin.com/in/hariprasad-alluru-9bb6a9183/