What is Smart Hadith?
Smart Hadith is a free tool that uses AI to retrieve Hadiths relevant to a specific situation.
The tool currently has two versions:
English: https://huggingface.co/spaces/Adr740/SmartHadith_ENG (not all hadiths are present; the data was gathered from this repository: Hadith Dataset)
French: https://huggingface.co/spaces/Adr740/SmartHadith-FR (not all hadiths are present, only 3000)
This work is open to contributions. A lot of work is still needed, especially on consolidating the data and gathering all the hadiths. Ideally, the goal is a fully open and exhaustive data source of all hadiths.
WARNING:
This tool is intended for reference purposes only and is not intended to be taken as religious advice. The hadith displayed by this tool are not intended to be used as a sole source of religious guidance.
Users are responsible for conducting their own research and seeking guidance from religious scholars.
Please note that the content displayed by this tool is not guaranteed to be accurate, complete, or up-to-date.
The developers of this tool will not be held responsible for any decisions made by the users of this tool that are based on the content displayed by this tool.
How does it work?
Smart Hadith uses a technology called semantic search. The idea behind semantic search (as opposed to keyword search) is that you can retrieve documents from a large set of texts based on their meaning rather than on the exact words they contain.
Keyword search tools look for documents containing the exact words of the query and do not take meaning into account.
For example, if the search query is “animal” then a keyword-based search will look for Hadith containing the word “animal”. This makes it impossible to use sentences as a query and limits retrieval to only the exact word sought.
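The limitation described above can be illustrated with a minimal sketch. The sentences and the `keyword_search` helper below are made up for illustration (they are not hadiths and not the tool's code); the helper simply checks whether the query appears as an exact substring.

```python
# Placeholder sentences, invented for this example only.
documents = [
    "The farmer fed the animal at dawn.",
    "A thirsty dog drank from the well.",
    "The market closes at noon.",
]


def keyword_search(query, docs):
    """Return the documents that contain the query as an exact substring
    (case-insensitive). Hypothetical helper, for illustration only."""
    needle = query.lower()
    return [doc for doc in docs if needle in doc.lower()]


# A single keyword only matches documents containing that exact word.
single_word_hits = keyword_search("animal", documents)

# A full sentence used as a query matches nothing at all.
sentence_hits = keyword_search("I am looking for texts about animals", documents)

print(single_word_hits)
print(sentence_hits)
```

Note that the sentence about the dog is never returned, even though a dog is an animal: exact matching has no notion of meaning.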
In semantic search, the engine captures the meaning of the query and looks for Hadiths with a similar meaning. Given the input “I am looking for hadiths about animals”, the engine returns the Hadiths whose meaning is closest to what is sought, whether or not they contain the word “animal”.
We use the OpenAI embedding model and no vector database; everything is handled in pandas DataFrames.
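As a rough sketch of what a search without a vector database could look like, the example below ranks the rows of a pandas DataFrame against a query vector. Everything in it is an assumption made for illustration: the column names, the `search` and `cosine_similarity` helpers, the made-up three-value vectors (standing in for embeddings that a model would produce), and the choice of cosine similarity as the scoring function. It is not the tool's actual code.

```python
import numpy as np
import pandas as pd

# Toy corpus: each row holds a text and its precomputed embedding.
# The texts and 3-value vectors are invented; real embeddings would
# come from an embedding model and have many more values.
df = pd.DataFrame({
    "text": ["text about animals", "text about charity", "text about travel"],
    "embedding": [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.2, 0.9]],
})


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def search(query_embedding, frame, top_k=2):
    """Score every row against the query and return the top_k best rows."""
    scored = frame.assign(
        score=frame["embedding"].apply(
            lambda emb: cosine_similarity(query_embedding, emb)
        )
    )
    return scored.sort_values("score", ascending=False).head(top_k)


# Stand-in for the embedding of a user query about animals.
query_embedding = [0.8, 0.2, 0.1]
results = search(query_embedding, df)
print(results[["text", "score"]])
```

Scoring every row like this is a linear scan, which is a reasonable trade-off for a dataset of a few thousand entries and avoids running a separate database.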
If you are curious about how this works, the following section goes into more detail. If you want to get in touch to contribute or help me make this tool better, please reach out by email at adamrida.ra@gmail.com
What is semantic search?
The core idea behind semantic search is that we want to transform the user query into what we call an embedding. An embedding is simply a vector of numbers.
The reason for this transformation is that the vector space in which embeddings live has a useful property.
The distance between two vectors reflects how close the original texts are in meaning. This means that two vectors that are close together correspond to two texts with similar meanings.
Let’s take a toy example to make this easier to visualize. Say we have the words “Apple”, “Fruit”, and “Car”, and that each embedding contains only two values (the embeddings we use contain hundreds of values).
A good embedding model would give embedding values that could look something like this:
Apple: [10,10]
Fruit: [8,9]
Car: [1,4]
Starting from “Fruit”: [8,9], the closest vector is [10,10] for “Apple” (a straight-line distance of about 2.2), while the vector for “Car” is much further away (about 8.6).
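The toy example can be checked with a few lines of Python. This is a minimal sketch that assumes plain Euclidean (straight-line) distance as the measure of closeness; the `nearest` helper is a name introduced here for illustration.

```python
import math

# Toy two-value embeddings from the example above.
embeddings = {
    "Apple": [10, 10],
    "Fruit": [8, 9],
    "Car": [1, 4],
}


def nearest(word, vectors):
    """Return (other_word, distance) pairs sorted from closest to furthest,
    using Euclidean distance. Hypothetical helper for illustration."""
    origin = vectors[word]
    distances = {
        other: math.dist(origin, vec)
        for other, vec in vectors.items()
        if other != word
    }
    return sorted(distances.items(), key=lambda item: item[1])


ranking = nearest("Fruit", embeddings)
for other, distance in ranking:
    print(f"{other}: {distance:.2f}")
```

“Apple” comes first in the ranking and “Car” last, matching the intuition that an apple is closer in meaning to a fruit than a car is.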
Thanks for reading! If you have more questions or want more details on the technical aspects, feel free to reach out. Mail: adamrida.ra@gmail.com
My website: adrida.github.io