Haris Riaz
710 Gould Simpson
1040 E 4th Street
Tucson, AZ 85721
I am a PhD student at the University of Arizona, advised by Professor Mihai Surdeanu. My primary research interest is the faithfulness and causality of reasoning in Large Language Models (LLMs). Specifically, I study strategies for internally integrating knowledge from symbolic reasoners into LLMs via reward modeling, and for generating synthetic reward-feedback data using weak supervision techniques. I have also worked on meta-algorithms for synthetic data generation, on incorporating causal reasoning and pragmatics into retrieval-augmented generation (RAG) frameworks as a downstream application of my research, and on exploiting linguistic hints as weak supervision for named entity recognition (NER).
I recently completed an internship at Amazon Web Services, where I was an Applied Scientist Intern on the Amazon Science Bedrock team. I worked on an agentic meta-approach to formally diverse synthetic data generation that can adapt LLMs to specific domains using only a small amount of synthetic data and no real data. This work resulted in a paper currently under review at ACL 2025.
Before joining the UofA, I completed my undergraduate studies in Computer Science at the School of Electrical Engineering and Computer Science at the National University of Sciences and Technology in 2021.
Besides work, I am learning to play the guitar, and I enjoy rock climbing and hiking the many trails around Tucson. A long time ago, I memorized the spelling of every word in the English dictionary and was a finalist in the 4th Dawn in Education National Spelling Bee.
news
| Date | News |
|---|---|
| Dec 31, 2024 | New paper "MetaSynth: Meta-Prompting Your Large Language Model to Generate Formally Diverse Synthetic Data". Under submission. |
| Dec 10, 2024 | Our work "Say Less Mean More: Leveraging Pragmatics in Retrieval-Augmented Generation" was presented as a contributed lightning talk at the MusIML workshop, co-located with NeurIPS 2024. |
| Nov 03, 2024 | Serving as a reviewer for ICLR 2025. |
| Oct 15, 2024 | New paper: "Say Less Mean More: Leveraging Pragmatics in Retrieval-Augmented Generation". Under submission. |
| May 28, 2024 | Started my internship as an Applied Scientist Intern at Amazon Bedrock, hosted by Sourav Bhabesh and Vinnayak Arannil in Herndon, Virginia! |
| Mar 13, 2024 | Our paper "Best of Both Worlds: A Pliable and Generalizable Neuro-Symbolic Approach for Relation Classification" was accepted at NAACL 2024 Findings! |
| Feb 20, 2024 | Our paper "ELLEN: Extremely Lightly Supervised Learning For Efficient Named Entity Recognition" was accepted at LREC-COLING 2024! |
| Feb 16, 2024 | Attended Stanford TreeHacks 2024. Our hackathon project eventually morphed into StoryEngine ($750k pre-seed backed by a16z). Note: I am not affiliated with StoryEngine (all credit goes to my hackathon teammate Wanrong He). |
| Jan 15, 2024 | Serving as a reviewer for NAACL 2024. |
| Apr 04, 2022 | Serving as a reviewer for the Second Workshop on Pattern-based Approaches to NLP in the Age of Deep Learning (Pan-DL), co-located with EMNLP 2023. |
| Apr 04, 2022 | Awarded an AI Talent Bursary to attend AI Week, organized by the Alberta Machine Intelligence Institute (AMII). |
| Jan 12, 2022 | Started my PhD at the University of Arizona, working with Professor Mihai Surdeanu. |
| Jun 01, 2021 | Graduated with a Bachelor's in CS from NUST-SEECS. My final-year project on Handwritten Sequence Recognition with Time Series Transformers was one of three selected for the Rector's Gold Medal for best final-year CS project. |
| May 07, 2021 | Serving as a volunteer for ICLR 2021. |
selected publications
- MetaSynth: Meta-Prompting Your Large Language Model to Generate Formally Diverse Synthetic Data. Jan 2025. Under submission.