What does it take for a computer to learn the rules of RNA base pairing? People are training large language models for RNA structure prediction, some with hundreds of millions of parameters. One exciting early result has been that these models learn the rules of Watson-Crick-Franklin base pairing directly from data. A research group at Harvard decided to find out how small a model could be and still achieve this result. They trained a tiny probabilistic model with only 21 parameters using gradient descent. With as few as 50 RNA sequences, and no corresponding structures, the rules of base pairing popped out after only a few training epochs. So the answer to their original question was that it takes "a lot less than you may think." I don't think this means the large-scale training efforts are necessarily dumb or misguided. But the result suggests there's a lot of efficiency and performance still to be eked out through architectural innovation. There's a lot of underlying structure to the language of biology.
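To give a feel for how pairing rules can emerge from sequences alone, here is a minimal sketch in the same spirit, not the Harvard group's actual model. It assumes synthetic hairpin-stem sequences (a random 5' arm followed by its reverse complement), and fits a 4x4 table of logits (16 parameters here, not the paper's 21) by gradient descent on the likelihood of the bases that sit opposite each other, without ever telling the model which bases are complementary.

```python
import numpy as np

rng = np.random.default_rng(0)
NTS = "ACGU"
COMP = {"A": "U", "U": "A", "C": "G", "G": "C"}

# ~50 synthetic hairpin-stem sequences: a random 5' arm followed by
# its reverse complement. The model never sees a structure, only
# the raw strings plus the geometric assumption that position i
# pairs with position L-1-i.
def make_hairpin(stem_len=10):
    arm = rng.choice(list(NTS), size=stem_len)
    return "".join(arm) + "".join(COMP[b] for b in arm[::-1])

seqs = [make_hairpin() for _ in range(50)]

# Tiny probabilistic model: softmax over a 4x4 logit table gives the
# joint probability that bases (a, b) appear opposite each other.
idx = {b: i for i, b in enumerate(NTS)}
theta = np.zeros((4, 4))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Empirical joint distribution of opposing bases across all sequences.
emp = np.zeros((4, 4))
for s in seqs:
    L = len(s)
    for i in range(L // 2):
        emp[idx[s[i]], idx[s[L - 1 - i]]] += 1
emp /= emp.sum()

# Gradient descent on the negative log-likelihood: for a softmax over
# all 16 entries, the gradient is softmax(theta) - emp.
for epoch in range(200):
    theta -= 1.0 * (softmax(theta) - emp)

# The learned table concentrates on the Watson-Crick-Franklin pairs.
p = softmax(theta)
for b in NTS:
    print(f"{b} pairs with {NTS[np.argmax(p[idx[b]])]}")
```

Because the complementary arm makes A-U and G-C the only opposing pairs in the data, the learned distribution collapses onto those four entries: the "rule" of base pairing falls out of a handful of parameters and a few passes over the data.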