Quality human data is running out. Over half of web text is now AI-generated. Datoric is an AI research lab studying how data composition, expert reasoning, and multi-modal structure drive the next wave of model capabilities.
Our Mission
Frontier labs each spend over $1B per year on human data. The world's stock of quality human text, roughly 300 trillion tokens, is on track to be fully consumed by 2027. Meanwhile, synthetic contamination is degrading the open web as a training source.
The science of what makes training data effective remains poorly understood. Why do expert reasoning traces produce outsized capability gains? How does multi-modal composition affect emergence? Why do models collapse when trained on their own outputs?
We founded Datoric to answer these questions and build the research foundations for the post-data-wall era of AI.
Research Focus Areas
Quality human text is approaching exhaustion at ~300T tokens, while over half of web content is now AI-generated. We study how data scarcity and synthetic contamination reshape the training landscape.
The highest-value training signal isn't answers. It's the reasoning behind them. We research how professional decision traces in medicine, law, and engineering transfer to model capability.
How do paired modalities like voice, vision, text, and sensor data interact during training? We investigate cross-modal data composition and the capability emergence it produces.
No collection infrastructure exists for agent training data. We study tool-use traces, error recovery patterns, and multi-step task completions, where small seed datasets yield outsized capability gains.
Models trained purely on synthetic outputs collapse. We research the optimal interplay between human-sourced data and synthetic augmentation: the flywheel that drives frontier performance.
Frontier models fail dramatically on underrepresented languages, scoring as low as 35% on native-speaker benchmarks. We study how linguistic diversity and cultural context shape model behavior at scale.
Our Approach
01
We identify where models systematically fail, from 35% accuracy on underrepresented language benchmarks to missing decision traces in professional domains like debugging, diagnosis, and legal reasoning.
02
We construct targeted datasets with domain practitioners: expert reasoning traces, multi-modal paired data, and agentic task demonstrations that encode the knowledge synthetic generation cannot replicate.
03
Every intervention is evaluated against domain-specific benchmarks, not generic leaderboards. We publish our findings on data composition, human-synthetic flywheels, and capability emergence.
Teams
Data Science studies the theory of training data quality: measuring signal density, mapping distributional gaps, and understanding why models trained on expert reasoning traces outperform those trained on volume alone.
Multi-Modal Systems investigates how models learn from heterogeneous data, studying cross-modal transfer, paired data composition, and the 50-100% capability premiums that multi-modal training produces over single-modality approaches.
Agentic Intelligence researches the data infrastructure for AI agents: tool-use traces, error recovery patterns, and GUI interaction data, where 312 human demonstrations can be augmented to 27,000 training instances with 141% capability improvement.
Evaluation & Benchmarks builds domain-specific evaluation frameworks that expose failures generic benchmarks miss, measuring model performance in professional contexts across medicine, law, finance, and multilingual settings.
Research Notes
Every paper measures a specific failure pattern in a specific domain. We publish the full methodology, the dataset, and the statistical significance tests alongside every claim.
ElevenLabs Scribe v2 leads at WER 0.277. AssemblyAI Universal-3 Pro silently collapses 5 of 9 Indic languages to romanized Latin script, a failure invisible to WER alone. Only Scribe v2 and Sarvam Saaras v3 cover all 10 target languages.
April 16, 2026 · 12 MIN READ
Read paper
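The romanization failure highlighted above is worth unpacking: a transcript can drift into the wrong writing system while still looking superficially plausible, so script coverage has to be checked separately from the error-rate metric. Below is a minimal, illustrative sketch of such a check using Unicode character names; the function names and the 0.5 threshold are our own assumptions here, not the paper's methodology.

```python
import unicodedata

def dominant_script_ratio(text: str, script_keyword: str) -> float:
    """Fraction of alphabetic characters whose Unicode name contains
    the given script keyword (e.g. 'DEVANAGARI', 'LATIN')."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    hits = sum(1 for c in letters
               if script_keyword in unicodedata.name(c, ""))
    return hits / len(letters)

def is_romanized(transcript: str,
                 expected_script: str = "DEVANAGARI",
                 threshold: float = 0.5) -> bool:
    """Flag a transcript whose expected native script has mostly
    been replaced by another (typically Latin) script."""
    return dominant_script_ratio(transcript, expected_script) < threshold

# A Hindi transcript should be overwhelmingly Devanagari:
# is_romanized("नमस्ते दुनिया") -> False
# is_romanized("namaste duniya") -> True (silent romanization)
```

A check like this catches the collapse mode directly, whereas a word-level error rate computed against romanized references can score such output as fluent.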
Sycophancy is universal: authoritative framing reduces contradiction detection by 16–18 pp across all seven frontier video models. No model is immune. All are overconfident when wrong.
April 7, 2026 · 14 MIN READ
Read paper
Across seven frontier video models, accuracy falls 44 points from 30-second to 5-minute clips. Error detection is catastrophic frontier-wide. The best model catches barely one in four intentional errors.
March 24, 2026 · 13 MIN READ
Read paper
Three of five dedicated ASR providers cannot serve low-resource African languages. Only ElevenLabs Scribe and Gemini 2.5 Pro produce usable transcripts, statistically tied at the top.
March 10, 2026 · 12 MIN READ
Read paper
Across 12 systems spanning five dedicated ASR providers, four audio-native MLLMs, and three Claude text controls, ElevenLabs Scribe beats every audio-native multimodal LLM on WER. GPT-4o Audio is the reliability outlier.
February 24, 2026 · 11 MIN READ
Read paper
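For readers unfamiliar with the headline metric above: word error rate is the word-level Levenshtein distance between a hypothesis transcript and its reference, divided by the number of reference words. A minimal, illustrative implementation follows; it is a sketch of the standard metric, not the evaluation harness used in the paper.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edits to turn the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# wer("the cat sat", "the dog sat") -> one substitution over
# three reference words, i.e. 1/3
```

Lower is better: a WER of 0.277, as reported for the leading system, means roughly 28 word-level errors per 100 reference words.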
Get in Touch
Whether you're a lab exploring data composition, a domain expert with unique datasets, or simply curious about our work — we'd love to hear from you.
Get in touch