Hierarchical Prompting Taxonomy (HPT)

A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles

Category: NLP, LLM Evaluation

Overview

Hierarchical Prompting Taxonomy (HPT) is a universal evaluation framework for large language models (LLMs), grounded in human cognitive principles. HPT introduces the Hierarchical Prompting Framework (HPF), which arranges five distinct prompting strategies into a hierarchy ordered by their cognitive demands. HPT also proposes the Hierarchical Prompting Index (HPI) to quantify task complexity and the cognitive competencies of LLMs across diverse datasets.
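
To make the hierarchy concrete, here is a minimal sketch of how such a framework could be applied to a single sample: prompts are tried in order of increasing cognitive demand, and the level at which the model first answers correctly becomes its score. The level templates, the `query_model` and `is_correct` callables, and the failure penalty are hypothetical placeholders, not the paper's implementation.

```python
from typing import Callable, List, Tuple

# Prompting strategies ordered from lowest to highest cognitive demand
# (level 1 = simplest). These templates are illustrative assumptions.
LEVELS: List[Tuple[int, str]] = [
    (1, "Answer the question: {question}"),
    (2, "Answer the question, thinking step by step: {question}"),
    (3, "Here are three worked examples:\n{examples}\nNow answer: {question}"),
    (4, "Break the problem into sub-problems, solve them in order, then answer: {question}"),
    (5, "First generate relevant background knowledge, then use it to answer: {question}"),
]

def hierarchical_prompt(question: str,
                        query_model: Callable[[str], str],
                        is_correct: Callable[[str], bool],
                        examples: str = "") -> int:
    """Try strategies from least to most cognitively demanding.

    Returns the level at which the model first answers correctly,
    or len(LEVELS) + 1 as a penalty when every level fails.
    """
    for level, template in LEVELS:
        prompt = template.format(question=question, examples=examples)
        if is_correct(query_model(prompt)):
            return level
    return len(LEVELS) + 1  # penalty score when no strategy succeeds
```

Scoring the first successful level rewards models that solve a task with simpler prompts, which is what lets the same scale double as a measure of task complexity.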

HPT enables comprehensive evaluation of LLMs' problem-solving abilities and of dataset complexity, providing a standardized metric for task difficulty. In experiments, HPF improved LLM performance by 2% to 63% over standard prompting baselines, with GSM8K identified as the most cognitively demanding task among those tested.

Authors: Devichand Budagam, Ashutosh Kumar, Mahsa Khoshnoodi, Sankalp KJ, Vinija Jain, Aman Chadha

Technologies Used

Python, PyTorch, Transformers, Hugging Face

Features & Functionality

  • Hierarchical Prompting Framework (HPF) with five unique prompting strategies
  • Hierarchical Prompting Index (HPI) for quantifying task complexity (see the sketch after this list)
  • Evaluation of LLMs' cognitive competencies across diverse datasets
  • Standardized metric for dataset and task complexity
  • Open-source implementation for reproducibility
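
Building on the per-sample scores from the earlier sketch, the following shows one way a dataset-level HPI could be aggregated. The simple averaging (with failed samples carrying the penalty score) is an assumption, not necessarily the paper's exact formula.

```python
from statistics import mean
from typing import List

def hierarchical_prompting_index(level_scores: List[int]) -> float:
    """Average per-sample level scores into a dataset-level HPI.

    Lower values suggest the model solves the dataset with
    cognitively simpler prompts; higher values suggest greater
    task complexity for that model.
    """
    return mean(level_scores)

# Example: four samples solved at levels 1, 2, 2 and 5,
# plus one failure scored with the penalty value 6.
print(hierarchical_prompting_index([1, 2, 2, 5, 6]))  # -> 3.2
```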

Challenges & Learnings

The main challenge was designing a framework that meaningfully aligns LLM evaluation with human cognitive principles, and developing metrics that are both interpretable and robust across tasks and models.

This project deepened my understanding of prompt engineering, cognitive science, and the evaluation of large language models in NLP.