<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Tinkering Tokens]]></title><description><![CDATA[next token prediction, experiments, half-baked potatoes and thoughts]]></description><link>https://tinkeringtokens.substack.com</link><image><url>https://substackcdn.com/image/fetch/$s_!kk0i!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbcb657f-03a9-49fc-94fb-a72a63b67290_1280x1280.png</url><title>Tinkering Tokens</title><link>https://tinkeringtokens.substack.com</link></image><generator>Substack</generator><lastBuildDate>Fri, 12 Jun 2026 18:09:54 GMT</lastBuildDate><atom:link href="https://tinkeringtokens.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Usman]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[tinkeringtokens@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[tinkeringtokens@substack.com]]></itunes:email><itunes:name><![CDATA[Usman]]></itunes:name></itunes:owner><itunes:author><![CDATA[Usman]]></itunes:author><googleplay:owner><![CDATA[tinkeringtokens@substack.com]]></googleplay:owner><googleplay:email><![CDATA[tinkeringtokens@substack.com]]></googleplay:email><googleplay:author><![CDATA[Usman]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Lost in translation]]></title><description><![CDATA[Why machine learning is producing better molecules but not better drugs]]></description><link>https://tinkeringtokens.substack.com/p/lost-in-translation</link><guid 
isPermaLink="false">https://tinkeringtokens.substack.com/p/lost-in-translation</guid><dc:creator><![CDATA[Usman]]></dc:creator><pubDate>Fri, 10 Apr 2026 06:10:50 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!kQp2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6032449-7710-46c7-a159-70cbd64ecc8e_1024x797.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As an outsider to AI for biology, I was excited about the potential of what AI could do for drug discovery. A good part of this last year for me was spent realising why biology matters and why it might be the most important thing for me to work on. Abhishaike (The Owl Man) has a much better piece on <a href="https://www.owlposting.com/p/ask-not-why-would-you-work-in-biology">why anyone should be working on biology</a>. As an ML researcher coming to this field, and chronically online on Twitter, I had somewhat higher expectations for where things stand. The claims from insiders and new startups sometimes left an impression that we will get much better drugs, much faster, and for much cheaper. Longer lives for all and maybe even freedom from aging, from disease, from biological fragility altogether, and no pain of losing a loved one. After digging through the literature and understanding the computational landscape of AI in biology, I came out with a lot of nuance. Let&#8217;s start with the story of this drug called DSP-1181 (one can make a case for how biologists suck at naming more than software developers).</p><p>In January 2020, Exscientia and Sumitomo Dainippon Pharma announced that a molecule called DSP-1181 had entered clinical trials. After a drug is tested computationally and in animals, it needs to go through different phases of testing in humans to make sure it is safe and effective. 
The drug discovery pipeline for DSP-1181 used machine learning to design a serotonin receptor agonist for obsessive-compulsive disorder, and the headline was speed: twelve months from initial screening to a drug candidate ready for human testing, versus an industry average of 4-6 years. Machine learning was used to narrow the search space of possible molecules and to make sure that DSP-1181 bound its target with high potency and selectivity. In the preclinical stages, the models performed well at their intended tasks. DSP-1181 had the solubility, stability, and safety profile that typically takes years of trial and error to achieve.</p><p>However, the program was later discontinued after the study did not meet its expected criteria. 
In other words, in initial human testing the drug was safe and reached the bloodstream, but that never translated into treating the disease.</p><p>The story of DSP-1181, like those of other AI-discovered drugs, is a good way to tether ourselves to reality. It forces us to look at what success actually means once we leave the computational realm, and why we do not have zero-shot drug discovery and probably will not for a very long time.</p><p><strong>Eroom&#8217;s Law</strong></p><p>Drug discovery is a scientifically and economically complex process. Insiders casually cite figures of more than $2.5 billion and ten or more years to bring a new drug to market. The process of finding a cure burns through cash, and most attempts fail. In 2012, Jack Scannell and his colleagues called this pattern Eroom&#8217;s law, Moore&#8217;s law spelled backwards. They argued that the number of new drugs approved per billion dollars of R&amp;D has halved roughly every nine years since 1950.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kQp2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6032449-7710-46c7-a159-70cbd64ecc8e_1024x797.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kQp2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6032449-7710-46c7-a159-70cbd64ecc8e_1024x797.jpeg 424w, https://substackcdn.com/image/fetch/$s_!kQp2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6032449-7710-46c7-a159-70cbd64ecc8e_1024x797.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!kQp2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6032449-7710-46c7-a159-70cbd64ecc8e_1024x797.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!kQp2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6032449-7710-46c7-a159-70cbd64ecc8e_1024x797.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kQp2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6032449-7710-46c7-a159-70cbd64ecc8e_1024x797.jpeg" width="1024" height="797" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a6032449-7710-46c7-a159-70cbd64ecc8e_1024x797.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:797,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kQp2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6032449-7710-46c7-a159-70cbd64ecc8e_1024x797.jpeg 424w, https://substackcdn.com/image/fetch/$s_!kQp2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6032449-7710-46c7-a159-70cbd64ecc8e_1024x797.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!kQp2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6032449-7710-46c7-a159-70cbd64ecc8e_1024x797.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!kQp2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6032449-7710-46c7-a159-70cbd64ecc8e_1024x797.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Despite all the improvements in technology over the past few decades, including high-throughput screening, combinatorial 
chemistry, genomics, proteomics, and computational modeling, none of them has translated into higher overall productivity. Researchers keep getting better at the early stages of discovery: identifying targets, designing molecules, and testing compounds in cells and animals. But the late-stage failure rate has not budged. Most drugs that enter clinical trials still fail, typically in Phase II or Phase III, after the bulk of the money has already been spent. <a href="https://www.writingruxandrabio.com/p/a-manifesto-for-reviving-biopharma">Ruxandra Teslo</a> has written some very good primers on this topic recently.</p><p>This is a strange thing to wrap your head around if, like me, you come from the world of software. We are used to tools getting better and productivity going up, <strong>but in drug discovery, the tools have gotten better, and productivity and output are still going down</strong>.</p><p><strong>AI In Biology</strong></p><p>Statistical models in biology go back decades. The earliest ones relied on simple linear regression, trying to correlate genetic variations with observable traits or disease risks, such as how fast you metabolize a drug or whether it will have toxic effects in the body. As computational power increased and machine learning techniques advanced, the models grew more sophisticated, but the basic approach stayed the same for a long time.</p><p>In the early 2010s, the field was mostly Quantitative Structure-Activity Relationship (QSAR) modeling and property prediction. QSAR relates a quantitative description of a molecule&#8217;s chemical structure to a physical or biological activity. Models were trained to predict properties like solubility, binding affinity, and permeability from a molecule&#8217;s chemical structure. 
These models were narrow: each property of a drug got its own model, trained on its own dataset. The input was the chemical structure combined with other features scientists cared about, and the output was a prediction for one very specific property.</p><p>The progression from here rhymes with other fields where deep learning started gaining traction. Around 2016, deep learning&#8217;s success in vision and language processing started translating into drug discovery, and new architectures emerged that were better suited to chemical and biological data. <a href="https://medium.com/data-science/towards-geometric-deep-learning-iv-chemical-precursors-of-gnns-11273d74125">Graph neural networks (GNNs)</a> were one key development during this period. A molecule already has a natural graph structure in which the atoms are nodes and the bonds connecting them are edges. Prior approaches had to convert this structure into a flat list of features, which meant losing information about how atoms were connected to each other. GNNs could take the molecular graph as input directly, reasoning about structure in a way that matched how chemists actually think about molecules.</p><p>Being able to represent proteins and genomes as sequences led to an increasing use of transformers (the architecture powering LLMs) in biology. A protein, which is a chain of amino acids, can be written as a string of letters, with one letter (or a three-letter abbreviation) standing for each amino acid. For genomes, the letters A, T, C, and G written in a specific order encode the information in our genes.</p><p>A key example of this paradigm is DeepMind&#8217;s AlphaFold 2. Its release was a significant step for AI-driven drug discovery, because most drugs work by binding to specific proteins in the body, and to design a drug that binds well, you need to know what the protein looks like in 3D. 
The amino acid sequence alone doesn&#8217;t tell you that, and experimentally determining a protein&#8217;s structure used to take months or years. Structure prediction models like AlphaFold shortened those timelines by predicting a protein&#8217;s fold in minutes or hours from just the sequence.</p><p>In the last few years, with the success of LLMs, foundation models have become popular in computational biology. These models are defined by scale and self-supervision. They train much bigger architectures on far more data (hundreds of millions of protein sequences, tens of millions of chemical structures), and they learn general patterns from unlabeled data <em>before</em> being fine-tuned for specific tasks. We now have protein language models like <a href="https://huggingface.co/docs/transformers/en/model_doc/esm">ESM</a> for structure prediction, chemical models for molecular generation, and <a href="https://www.nature.com/articles/s41586-023-06415-8">diffusion models</a> that can design new proteins with specified properties.</p><p>If you zoom back out and look at the whole timeline, there is a funnel effect. We started with hyper-specific objectives and very small datasets. Over time, two things happened at once: the scope of problems we could attack computationally kept expanding, and the architecture moved from narrow task-specific models to foundation models that handle multiple tasks at once. The problem space expanded because ML proved useful, and the model architecture broadened because we realized that pretraining on everything helps with specific tasks.</p><p><strong>The Generalisation vs Translation Gap</strong></p><p>If our models are getting better and we are collecting more data over time, then why can&#8217;t we turn around Eroom&#8217;s law? 
Surely, if models like AlphaFold are cutting the time to determine a protein&#8217;s structure from months to minutes, we should be getting closer to the promised future where AI cures every disease?</p><p>Reality has a surprising amount of detail, and getting a drug to work in humans is not as simple as asking a model to generate a compound with the properties we want. A few gaps indicate that we aren&#8217;t zero-shotting drugs anytime soon. The first is the generalization gap, the standard failure mode that anyone working in ML knows. A generalization gap appears when a model does well on its training data but its performance doesn&#8217;t carry over to situations it never encountered in training. While models like AlphaFold have made structure prediction much easier than before, they still have a long way to go before they generalize well. Claus Wilke has written thoughtfully about this <a href="https://blog.genesmindsmachines.com/p/no-alphafold-has-not-completely-solved">here</a>. To sum up, biology is very difficult, and there are still far too many edge cases left to cover.</p><p>But building models for drug discovery introduces another gap, which I call the translation gap, and it is arguably harder than the generalisation gap. Here, a model&#8217;s predictions may generalize well within the computational domain but still fail when the compounds are actually tested in a lab or in humans. A common failure mode is that a chemist has no practical way to make the compounds a model generates, i.e., the compounds can&#8217;t be synthesised in the real world. 
Another, more important failure mode is the one we saw earlier with clinical trials and Eroom&#8217;s law, where compounds that look good in the lab don&#8217;t work in humans.</p><p>Why is the translation gap uniquely difficult in biology? One of the issues is incomplete biological representation. In silico (computational) methods can&#8217;t capture full biological complexity. They&#8217;re typically used alongside in vitro (test tube) data, both to build the model and to test it, but a disease in a human involves many interacting molecular pathways, which makes it very difficult for computational models to replicate.</p><p>In other domains of AI, if your model generalizes, you&#8217;re mostly done. In drug discovery, even a perfectly generalizing model might fail in the lab because biology has context-dependent behavior that&#8217;s hard to capture in the training data.</p><p>This matters because it means the standard ML playbook (bigger models, more data, better benchmarks) doesn&#8217;t automatically translate to better drugs. You can have SOTA performance on every benchmark and still not move the needle on actual drug discovery.</p><p><strong>Where are we?</strong></p><p>The pattern across the industry is clear. AI has widened the top of the drug development pipeline and sped up the early stages. Generative models can now propose structures that satisfy multiple constraints at once. Property prediction, docking, ADMET (absorption, distribution, metabolism, excretion, and toxicity) models, and toxicity flags are all meaningfully better than they were ten years ago and can go through millions of candidates very quickly.</p><p>We can see the impact in the real world (albeit with a small sample size): AI-discovered drugs have a much higher <a href="https://pubmed.ncbi.nlm.nih.gov/38692505/">success rate</a> in Phase I clinical trials. The late stages, however, which are also the expensive ones, remain as brutal as they have always been. 
Most of the money in drug development goes into clinical trials, and most of the failures happen in Phase II and Phase III, after all the early computational and experimental work is already done. The models we have built have so far not made much of a dent in the success rate there.</p><p>If you widen the top of the pipeline by a factor of ten but do not change the success rate in the late stages, you have not actually changed the economics of drug development. You have just given yourself more candidates to fail expensively.</p><p>From an outsider&#8217;s perspective, the honest reading of where we are is that AI has helped us make great advances on one part of the problem: the part where molecules need to be good molecules, with the right properties, safe and stable. The part that hasn&#8217;t been solved is the part where good molecules need to become good medicines. That gap, from computational prediction to clinical reality, is still as wide as it ever was.</p><p>In <a href="https://www.chemistryworld.com/opinion/robots-queuing-up-to-fail/4020705.article">Derek Lowe&#8217;s</a> words: &#8220;Compounds found (wholly or partly) by such AI methods are still going to be subject to the same white-knuckle dice-rolling as all the others when they get into human trials, because we have (as yet) no computational tools that really help us predict whether we have picked the right target, the right disease, the right biochemical pathway, or the right compound to affect it without doing anything unexpected along the way.&#8221;</p><p>Eroom&#8217;s law is still intact, and the curve hasn&#8217;t bent. 
Bending it is going to require something more than better molecular optimization, something closer to actually understanding disease at a depth that our current models don&#8217;t come close to.</p><p>The gap remains.</p>]]></content:encoded></item><item><title><![CDATA[Coming soon]]></title><description><![CDATA[This is Tinkering Tokens.]]></description><link>https://tinkeringtokens.substack.com/p/coming-soon</link><guid isPermaLink="false">https://tinkeringtokens.substack.com/p/coming-soon</guid><dc:creator><![CDATA[Usman]]></dc:creator><pubDate>Sun, 01 Mar 2026 04:31:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!kk0i!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbcb657f-03a9-49fc-94fb-a72a63b67290_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is Tinkering Tokens.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://tinkeringtokens.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" 
href="https://tinkeringtokens.substack.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item></channel></rss>