HBKU research explores use of AI in detecting fraudulent text messages
The College of Science and Engineering (CSE) at Hamad Bin Khalifa University, which focuses on world-class research and innovation in the field of conversational artificial intelligence, aims to design and develop practical tools with clear value for local industry.
With this in mind, the college recently conducted a research project on detecting fraudulent text messages with strong privacy protections, said CSE associate professor Dr. David Yang.
The research project, funded by the Qatar National Research Fund (QNRF), was carried out in collaboration with Ooredoo.
“The project aims to design and develop data analytics solutions that preserve the confidentiality of telecommunications data, with a focus on detecting fraudulent messages, protecting customers and improving the customer experience in general,” Dr. Yang said.
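The article does not say how the project protected confidentiality. One common, generic technique for analyzing telecom records without exposing subscriber identities is keyed pseudonymization: each phone number is replaced by an HMAC token that stays consistent across records but cannot be reversed without the secret key. The sketch below is an assumption for illustration only, not the project's method; the key, field names and record layout are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key held only by the data owner (e.g. the operator).
# A real deployment would load it from a key-management system.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def pseudonymize(phone_number: str, key: bytes = SECRET_KEY) -> str:
    """Replace a subscriber identifier with a keyed, irreversible token.

    The same number always maps to the same token, so analysts can still
    count messages per sender or link records without seeing the number.
    """
    # Normalize formatting so "+974 5555-0101" and "+97455550101" match.
    normalized = phone_number.replace(" ", "").replace("-", "")
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def prepare_record(record: dict) -> dict:
    """Return a copy of a (hypothetical) message record with identifiers pseudonymized."""
    return {
        "sender": pseudonymize(record["sender"]),
        "recipient": pseudonymize(record["recipient"]),
        "text": record["text"],
    }


if __name__ == "__main__":
    raw = {
        "sender": "+974 5555-0101",
        "recipient": "+974 5555-0202",
        "text": "Your account is locked, click here",
    }
    print(prepare_record(raw))
```

Using a keyed HMAC rather than a plain hash matters here: phone numbers come from a small, guessable space, so an unkeyed SHA-256 of a number could be reversed by hashing every possible number.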
“Among the results of this project are a new natural language processing (NLP) model based on the Transformer architecture, designed to analyze and classify text messages, and graph analysis tools that analyze relationships between customers to identify vulnerable customers who tend to be victims of fraud,” he added.
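The project's classifier is a Transformer model, which is far too large to sketch here. To show the shape of the task itself (a message goes in, a "fraud" or "legit" label comes out), the sketch below swaps in a much simpler, classic technique: a multinomial Naive Bayes classifier with add-one smoothing. The six training messages are invented toy data, and the class and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the message and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesSMS:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages: list[str], labels: list[str]) -> "NaiveBayesSMS":
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        self.doc_counts = Counter(labels)
        for text, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary = every word seen in any class.
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            # Log prior: how common this class is in the training data.
            score = math.log(self.doc_counts[c] / total_docs)
            total_words = sum(self.word_counts[c].values())
            for token in tokenize(text):
                # Smoothed log likelihood, so unseen words never zero out a class.
                count = self.word_counts[c][token] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            scores[c] = score
        return max(scores, key=scores.get)


# Invented toy training data, for illustration only.
TRAIN = [
    ("Congratulations you won a prize claim now at this link", "fraud"),
    ("Your bank account is suspended verify your PIN here", "fraud"),
    ("Urgent send your card number to claim the reward", "fraud"),
    ("Are we still meeting for lunch tomorrow", "legit"),
    ("Your bill for this month is ready in the app", "legit"),
    ("Call me when you get home", "legit"),
]

model = NaiveBayesSMS().fit([t for t, _ in TRAIN], [label for _, label in TRAIN])
print(model.predict("Claim your prize now, verify your card"))  # fraud
```

A bag-of-words model like this ignores word order and context, which is one reason Transformer-based models are preferred for the real task.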
The project was funded by QNRF as part of Cycle 10 of the National Priority Research Program (NPRP) and resulted in two prestigious academic awards.
Conversational AI is a type of AI that enables computers and devices to understand and respond to human language. This type of AI is used in chatbots, digital assistants, and other applications that rely on natural language processing.
Discussing the prospects of conversational artificial intelligence and its applications across industries, Dr. Yang said that such an AI algorithm understands the question and then generates its own answer, which is fluent, concise and precise.
Explaining how natural language understanding (NLU) works, Dr. Yang said: “We don’t know exactly how natural language understanding happens. What we do know is how to build an AI system for this purpose. Typically, this is done using a large-scale Transformer, which is a deep learning model trained on a large corpus of text obtained from the Internet.”
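The central operation inside a Transformer is scaled dot-product attention, softmax(QKᵀ / √d_k)·V, introduced in the 2017 paper "Attention Is All You Need". The sketch below implements that textbook formula in plain Python for a single attention head. It is a generic illustration, not the HBKU model: real Transformers add learned projection matrices, multiple heads and many stacked layers, and the vectors used in the usage note are made up.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    queries, keys: lists of vectors of length d_k (one per token).
    values: list of vectors, one per token, in the same order as keys.
    Returns one output vector per query: a weighted mix of the values.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # How strongly this token attends to every token in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, component by component.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    # A query that matches the first key picks out (almost) the first value.
    print(attention([[10.0, 0.0]], [[10.0, 0.0], [0.0, 10.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

When all keys score equally, the weights are uniform and the output is simply the average of the value vectors; when one key matches the query strongly, the output is dominated by that key's value.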
According to Dr. Yang, as of August 2022, the Transformer architecture is well understood, and there is an abundance of text on the internet that can be used for model training.
“So, with sufficient computing resources, anyone can create an NLU model. However, we still don’t have a good theoretical understanding of how AI understands natural language,” he added.
Giving examples of NLU applications and the industries where they are most widely used, Dr. Yang said that many people regularly use Apple’s Siri (or Amazon Echo and Google Home); individuals often write emails using the text auto-completion available in Outlook and Gmail; and many websites deploy chatbots to answer user questions in a customer service setting.
“A typical example of technology widely used in Qatar is the chatbot, whose AI capabilities are provided by major cloud computing platforms such as Google Cloud and IBM Watson. In addition, machine translation between Arabic and English is also a common use of NLP.”
He said data is essential for training any AI model. “For NLU, unstructured data can be used to train a generic model, a process sometimes referred to as ‘pre-training.’ Structured data can then be used to train a task-specific model, a process called ‘fine-tuning.’ For example, we can pre-train a model for understanding the Arabic language using unstructured data obtained from the Internet, and then fine-tune it with structured data from a specific domain, such as telecommunications customer service, so that it can interact with users as a chatbot in that domain,” Dr. Yang said.
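The two-stage workflow described above can be illustrated with a deliberately tiny analogy. Stage 1 learns generic word representations from unlabeled sentences by counting which words appear near each other; stage 2 uses a handful of labeled domain examples to build one prototype vector per intent. This is a toy sketch under stated assumptions: the English corpus stands in for the Arabic example, the intents "billing" and "network" are hypothetical, and unlike real fine-tuning, the pre-trained vectors are kept frozen in stage 2 for brevity.

```python
import math
from collections import Counter, defaultdict


def pretrain_word_vectors(corpus: list[str], window: int = 2) -> dict[str, Counter]:
    """Stage 1 ("pre-training"): count co-occurring words in unlabeled text."""
    vectors: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def embed(text: str, vectors: dict[str, Counter]) -> Counter:
    """Represent a sentence as the sum of its words' context vectors."""
    total = Counter()
    for word in text.lower().split():
        total.update(vectors.get(word, Counter()))  # unknown words add nothing
    return total


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def fine_tune(examples: list[tuple[str, str]], vectors) -> dict[str, Counter]:
    """Stage 2 (simplified): build one prototype vector per labeled intent."""
    prototypes: dict[str, Counter] = defaultdict(Counter)
    for text, label in examples:
        prototypes[label].update(embed(text, vectors))
    return prototypes


def classify(text: str, vectors, prototypes) -> str:
    """Pick the intent whose prototype is most similar to the query."""
    query = embed(text, vectors)
    return max(prototypes, key=lambda label: cosine(query, prototypes[label]))


# Toy unlabeled corpus (stage 1) and two labeled examples (stage 2).
CORPUS = [
    "please pay your bill online",
    "pay the invoice before the due date",
    "the invoice shows the amount due",
    "my internet connection is slow",
    "the connection keeps dropping tonight",
    "slow internet speed at home",
]
LABELED = [("i want to pay my bill", "billing"), ("internet is slow", "network")]

VECTORS = pretrain_word_vectors(CORPUS)
PROTOTYPES = fine_tune(LABELED, VECTORS)
print(classify("invoice due", VECTORS, PROTOTYPES))  # billing
```

Neither "invoice" nor "due" appears in the labeled examples; the query is still routed to "billing" because the unlabeled corpus taught the model that "invoice" keeps company with "pay". That is the basic benefit pre-training brings to a small task-specific dataset.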
He said another important consideration is the ethics of artificial intelligence. For example, he said, a chatbot trained on unmoderated internet forums tended to respond with offensive language, such as profanity and racial slurs. To avoid this, the training data should be “sanitized” by removing such examples of offensive language.
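The simplest form of the sanitizing step described above is a blocklist filter that drops any training example containing a listed term. The sketch below assumes that approach; the three blocklist entries are hypothetical placeholders, and production systems typically combine a curated lexicon with a trained toxicity classifier, since keyword lists miss misspellings and context-dependent abuse.

```python
import re

# Hypothetical placeholder blocklist; a real system would use a curated,
# language-specific lexicon rather than three made-up tokens.
BLOCKLIST = {"badword1", "badword2", "slur1"}


def is_offensive(text: str, blocklist: set[str] = BLOCKLIST) -> bool:
    """True if any whole-word token of the text is on the blocklist."""
    # Matching whole tokens avoids flagging innocent words that merely
    # contain a blocked string as a substring.
    tokens = re.findall(r"\w+", text.lower())
    return any(token in blocklist for token in tokens)


def sanitize(corpus: list[str], blocklist: set[str] = BLOCKLIST) -> list[str]:
    """Drop every training example that contains a blocklisted term."""
    return [text for text in corpus if not is_offensive(text, blocklist)]


if __name__ == "__main__":
    raw = ["hello there", "you BADWORD1 fool", "thanks!"]
    print(sanitize(raw))
```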
“With a modern AI model architecture and enough data, the AI model eventually learns the language. The perceived difficulty of a language or its accents is not actually the main challenge; the main challenge is that for some less widely spoken languages, we don’t have much publicly available data on the internet,” he said.
This project led to two prestigious academic awards: first place in the NLP4IF competition at the EMNLP 2019 conference and the best paper award at the VLDB 2021 conference.