Large Language Models (LLMs) such as ChatGPT by OpenAI can answer questions about many
topics, but a model only knows what it was trained on, and that does not include your
personal data and information. If your company or organisation has a pool of data in its
database that is not online, or that was created after the model was trained, would it not
be useful if you and others, such as your customers and staff, could query that data
conversationally?

This is where the Design and Implementation of a Document Query System using Natural
Language Processing comes into play. It is a software application that lets you upload your
own documents and query them in a chat-like manner. The general aim of this project is to
enhance the user experience of searching for information in documents and data
repositories. The querying method used in this project differs from commonly known methods
in that it uses Natural Language Processing (NLP) techniques, which do not require the user
to have prior knowledge of the document's content or keywords; users can query the uploaded
documents in their own terms. It is an extension of LangChain and DeepLearning.AI. The
application is written in Python, and the user interface is designed with the Kivy and
KivyMD frameworks for cross-platform compilation and usage. Documents of different formats,
such as .pdf, .doc, .csv, .xls, etc., can be loaded from different sources into the app. The
project can be extended into a conversational AI that provides information to the general
public while authenticating sources, to address the problem of fake news spread through
social media.

The final project deliverable is a stand-alone software application with a simple
interface: a text input widget, a chat area, an upload button widget, and a send button
widget. The workflow is as follows: upload a document with the upload widget, type a query
into the text input widget, and press send. The query is processed by a Natural Language
Processing model to understand its context; the information with the highest relevance to
that context is then retrieved from the document and displayed in the chat area for the
user to see.
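
The retrieval step described above can be sketched in plain Python. This is a minimal sketch, not the project's code, and it does not use LangChain: it reads only .txt and .csv files with the standard library, splits the text into overlapping word chunks, and ranks the chunks with TF-IDF cosine similarity as a stand-in for the NLP model. Because TF-IDF scores shared words, it will not understand paraphrased questions the way the model-based system described here is meant to. The function names `load_document`, `split_into_chunks`, `retrieve`, and `answer_query`, and the default chunk size of 50 words with a 10-word overlap, are illustrative assumptions.

```python
# Minimal sketch (assumption): TF-IDF stands in for the project's NLP model,
# and all function names here are hypothetical.
import csv
import math
import re
from collections import Counter
from pathlib import Path


def load_document(path: str) -> str:
    """Read a .txt or .csv file as plain text (other formats need extra libraries)."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".txt":
        return p.read_text(encoding="utf-8")
    if suffix == ".csv":
        with p.open(newline="", encoding="utf-8") as f:
            # One line of text per CSV row, with cells joined by spaces.
            return "\n".join(" ".join(row) for row in csv.reader(f))
    raise ValueError(f"format not handled in this sketch: {suffix}")


def split_into_chunks(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words; neighbouring windows share `overlap` words."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def tokenize(text: str) -> list[str]:
    """Lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k chunks ranked by TF-IDF cosine similarity to the query."""
    counts = [Counter(tokenize(c)) for c in chunks]
    n = len(counts)
    # Document frequency: the number of chunks that contain each term.
    df = Counter(term for c in counts for term in c)

    def weights(tf: Counter) -> dict[str, float]:
        # Smoothed IDF, so terms found in every chunk still carry a small weight.
        return {t: f * (math.log((1 + n) / (1 + df[t])) + 1) for t, f in tf.items()}

    def cosine(a: dict[str, float], b: dict[str, float]) -> float:
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = (math.sqrt(sum(w * w for w in a.values()))
                * math.sqrt(sum(w * w for w in b.values())))
        return dot / norm if norm else 0.0

    q = weights(Counter(tokenize(query)))
    # Highest similarity first; ties keep the document order.
    ranked = sorted(zip(chunks, counts),
                    key=lambda pair: cosine(q, weights(pair[1])), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


def answer_query(path: str, query: str) -> str:
    """Hypothetical send-button handler: load, chunk, and return the best chunk."""
    best = retrieve(query, split_into_chunks(load_document(path)))
    return best[0] if best else "No content found in the document."
```

For example, `answer_query("policies.txt", "When are invoices due?")`, with a hypothetical `policies.txt`, returns the passage that shares the most weighted terms with the question; in a Kivy front end, that string could then be appended to the chat area. Swapping `retrieve` for an embedding-based search would keep the same load, chunk, and rank structure.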
This can be used for advanced information retrieval, data analysis, information
referencing, and more.
This project can also be used to learn about code organisation and debugging.
Technologies: Kivy, shell scripting, Python, Buildozer
The source code can be found on my GitHub account.