Data samples and off-the-shelf datasets

Built using our deep expertise in GenAI use cases. Powered by exceptional raters. Sure to improve your model performance.

Access Datasets at Scale, with Speed

Download free samples

Why Deccan's datasets

Strong, high-touch QC processes

High Inter-Rater Reliability

Diverse Demographics

Experience across coding, finance, content moderation, & more

Use-cases

Multimodal QA / Data Interpretation SFT

This dataset enables use cases such as market research and analytics over infographics. Each datapoint consists of images (infographics, reports, charts), a Prompt (a complex analytical question about the image), and a detailed, step-by-step Response (the answer).

Download free samples

Indic languages SFT

This is a high-quality, single-turn Indic-language LLM fine-tuning dataset that enables general-purpose LLMs to extend their multilingual capabilities.

Download free samples

Super-Pristine Single Shot Text2SQL SFT

This is a high-quality, single-shot Text2SQL LLM fine-tuning dataset that enables Conversational Business Intelligence use cases. Each datapoint is a pair: a Natural Language Question (NLQ) and a SQL query. The NLQ poses a complex business-insight question over the database, and the SQL query answers it.
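
As a rough illustration of what one NLQ/SQL pair might look like in practice, the sketch below stores a pair and checks that the SQL runs against a toy database. The field names (`nlq`, `sql`), the `orders` table, and the sample question are all hypothetical, invented for this example; the actual sample files may use a different schema and layout.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Text2SQLExample:
    """One datapoint: a question and the query that answers it (hypothetical field names)."""
    nlq: str
    sql: str


def run_example(conn: sqlite3.Connection, example: Text2SQLExample) -> list:
    """Execute the example's SQL and return all result rows."""
    return conn.execute(example.sql).fetchall()


# Toy in-memory database, assumed purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL);
    INSERT INTO orders (region, amount) VALUES
        ('North', 120.0), ('South', 80.0), ('North', 50.0);
    """
)

example = Text2SQLExample(
    nlq="Which region has the highest total order amount?",
    sql=(
        "SELECT region, SUM(amount) AS total FROM orders "
        "GROUP BY region ORDER BY total DESC LIMIT 1"
    ),
)

rows = run_example(conn, example)
print(rows)
```

Running each query against a known database like this is one simple way to sanity-check pairs before fine-tuning on them.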

Download free samples

Indic languages Code Switching

This is a high-quality, single-turn Indic-language LLM RLHF dataset that enables general-purpose LLMs to extend their multilingual capabilities.

Download free samples

Preference Ranking

This is a high-quality RLHF (PPO) dataset designed to address end-consumer use cases across various domains. Each data point consists of a challenging prompt along with a pair of responses generated by two different large language models (LLMs).
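
A minimal sketch of how such a data point could be represented and turned into the chosen/rejected form that many reward-model training pipelines expect. The field names and the `preferred` label are assumptions made for illustration; the description above only states that each data point has a prompt and two responses, so the real samples may encode the ranking differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    """One data point: a prompt and two candidate responses (hypothetical field names)."""
    prompt: str
    response_a: str
    response_b: str
    preferred: str  # assumed rater label: "a" or "b"


def to_chosen_rejected(pair: PreferencePair) -> dict:
    """Map a labelled pair to a prompt/chosen/rejected record."""
    if pair.preferred not in ("a", "b"):
        raise ValueError(f"preferred must be 'a' or 'b', got {pair.preferred!r}")
    if pair.preferred == "a":
        chosen, rejected = pair.response_a, pair.response_b
    else:
        chosen, rejected = pair.response_b, pair.response_a
    return {"prompt": pair.prompt, "chosen": chosen, "rejected": rejected}


pair = PreferencePair(
    prompt="Explain compound interest to a first-time saver.",
    response_a="Compound interest is interest earned on both your deposit and past interest.",
    response_b="It is a type of interest.",
    preferred="a",
)

record = to_chosen_rejected(pair)
print(record["chosen"])
```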

Download free samples

QA in Finance

This dataset utilises financial documents from U.S. companies, including annual reports (Form 10-K), annual general meetings, and quarterly reports. To simulate real-world scenarios, the annotators (finance experts) have carefully designed questions that reflect the needs of different financial companies and varying levels of financial literacy. 

Download free samples

Reliable, Pristine Data

Relevance & precision
Business-alignment
Anti-cheating indicators
Bias mitigation
Security and privacy

Fraud-mitigation

Compliance
Diversity
Ethical sourcing

Certifications

ISO 27001
SOC2
GDPR
Coming soon

Quickly build superior AI powered by reliable data.