The Bitter Lesson - Richard Sutton's now-famous observation - is a surprising realisation. For years, researchers crafted increasingly elaborate, hand-tuned algorithms for tasks such as image and speech recognition. These approaches proved less competitive than more recent ones that relied on scale - massive amounts of data and compute. Programmers used to build chess engines by encoding deep domain knowledge. Yet, in 2017, DeepMind's AlphaZero outperformed previous, carefully engineered systems using a machine learning approach called Deep Reinforcement Learning (and a ton of computation and simulation time!).
A chess engine fundamentally solves a search problem: find the move that looks best within an acceptable amount of time. A better engine, therefore, finds a better move sooner. Past engines used brute-force Tree Search to explore millions of positions per second, relying on a hand-crafted evaluation function to pick the best move. AlphaZero does things differently: while it still performs a Tree Search, its search is guided by a neural network that learns its own evaluation and policy by playing against itself. This lets the network discover strategies through scale rather than human insight.
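To make the contrast concrete, here is a toy sketch: a depth-limited negamax search with a pluggable evaluation function. The stand-in game and trivial heuristic are invented for illustration; AlphaZero itself uses Monte Carlo Tree Search guided by a policy/value network, not negamax.

```python
from dataclasses import dataclass

# Toy stand-in for chess: players alternately take 1 or 2 stones;
# whoever takes the last stone wins.
@dataclass(frozen=True)
class Position:
    stones: int

    def is_terminal(self) -> bool:
        return self.stones == 0

    def legal_moves(self):
        return [m for m in (1, 2) if m <= self.stones]

    def apply(self, move: int) -> "Position":
        return Position(self.stones - move)

def hand_crafted_eval(pos: Position) -> float:
    # A classical engine encodes human knowledge here: material counts,
    # pawn structure, king safety... Our toy heuristic is trivial.
    return -1.0 if pos.is_terminal() else 0.0

def negamax(pos: Position, depth: int, evaluate) -> float:
    # Depth-limited search, scored from the player to move's perspective.
    if depth == 0 or pos.is_terminal():
        return evaluate(pos)
    return max(-negamax(pos.apply(m), depth - 1, evaluate)
               for m in pos.legal_moves())

# The classical recipe: deep search + hand-crafted evaluation.
print(negamax(Position(stones=7), depth=6, evaluate=hand_crafted_eval))
# AlphaZero keeps the search but replaces the hand-crafted evaluation
# with a value/policy network learned purely by self-play.
```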
The Bitter Lesson applies to a certain class of problems: success must be clearly defined by metrics, and we must be able to gather data or simulate the problem. Designing a controller falls squarely within this definition. A controller decides how to drive a system so it behaves the way we want. It observes the system’s current state (using sensors), computes actions (thrusts, voltages) and sends commands to effectors (motors) to steer the system toward a target or maintain stability.
The most ubiquitous form of control is Proportional-Integral-Derivative (PID) Control. It is used everywhere, from your coffee machine’s temperature regulation to keeping aircraft flying safely. PID uses feedback to make a variable match a desired set-point, adjusting the control signal through three terms. The Proportional term pushes the output in proportion to the current error; increasing it gives a faster response, but too much causes overshoot and instability. The Integral term accumulates the small, persistent errors that P alone cannot remove, eliminating steady-state offset. Lastly, the Derivative term reacts to the rate of error change and damps the response; it acts predictively and prevents overshoot, but too much D makes the controller sensitive to noise. In sum, a PID controller is highly sensitive to its (human-set) gains.
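A minimal discrete-time PID loop might look like the sketch below; the gains and set-point are illustrative, not tuned for any real plant.

```python
class PID:
    """Minimal discrete-time PID controller; gains are illustrative."""

    def __init__(self, kp: float, ki: float, kd: float, dt: float):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint: float, measurement: float) -> float:
        error = setpoint - measurement
        self.integral += error * self.dt                  # I: accumulate persistent error
        derivative = (error - self.prev_error) / self.dt  # D: react to rate of change
        self.prev_error = error
        return (self.kp * error            # P: push in proportion to current error
                + self.ki * self.integral
                + self.kd * derivative)

# e.g. a coffee-machine thermostat ticking at 10 Hz:
pid = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.1)
heater_power = pid.update(setpoint=92.0, measurement=85.0)
print(heater_power)
```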
The Bitter Lesson is relevant for a subset of control problems. For years, humans have hand-designed physics-based models and gains for systems like factory process controllers, through extensive tuning in simulation and reality. These are well-understood systems that are easy to model. Problems in robotics, such as dextrous manipulation and locomotion, have far more complicated dynamics: friction, slippage and contact produce errors that can make our models blow up. Robotic systems also use many actuators - your (biological) hand alone has 27 bones and over 20 degrees of freedom - making control a high-dimensional problem. As such, researchers and engineers have been taking the data-driven route to solve problems in this space. How have they fared?
What is a high-dimensional problem? It is one where we must make decisions across a very large number of independent variables - Degrees of Freedom (DoF), in robotics parlance. A thermostat has a single dimension of control (adjusting heater output from temperature feedback). A humanoid hand, with force sensors on the fingertips, can have more than 50 dimensions. The space of joint actions grows combinatorially, and errors in one dimension propagate easily into the others, making for brittle behaviour.
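A quick back-of-envelope calculation shows the explosion. Discretising each joint into just five torque levels (an arbitrary choice for illustration):

```python
# Count joint-action combinations per time step if each actuated
# joint is discretised into five torque levels.
levels = 5
for dof in (1, 6, 27, 50):   # thermostat, arm, hand, full humanoid
    print(f"{dof:>2} DoF -> {levels ** dof:.3g} candidate actions per step")
# 1 DoF -> 5; 50 DoF -> ~8.9e34. Exhaustively searching this space is
# hopeless, which is one reason high-DoF control leans on learned policies.
```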
Dextrous Dactyl
Many of us have played with a Rubik’s Cube. A passionate subset may even have solved one. A smaller subset still may have solved many in impressive times using finger tricks. As we can infer, this task demands high dexterity and good in-hand manipulation skills. For years, engineers developed elaborate models of contact, friction and force-torque observers with the ambition of making a robot hand that could rotate and re-grasp objects reliably. But these would usually require extensive re-tuning and sometimes fail due to sensor noise. In 2019, OpenAI demonstrated a robot hand, Dactyl, that could reliably manipulate a Rubik’s Cube in the real world without per-object modelling. It used the same machine learning paradigm as AlphaZero - Deep Reinforcement Learning - where the model learns an optimal policy by trial and error. But what enabled this system to generalise to the noisy real world was Domain Randomisation (DR), a technique which provides “a variety of experiences rather than maximising realism”. DR varies object properties and adds delays and noise in simulation. From this training process, the team observed emergent behaviour such as the Tip Pinch grasp, which uses the thumb and little finger for precision grasps (humans, on the other hand (pun intended!), use the index or middle finger with the thumb; the team attributes this to the extra degree of freedom in the robotic hand’s little finger).
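A schematic of the idea in Python: each training episode draws its “world” from wide ranges, so the policy never overfits to any single simulator. The parameter ranges and helper names here are invented for illustration, not OpenAI’s actual values.

```python
import random

def randomised_episode_params() -> dict:
    # Resample physics, latency and sensor noise for every episode.
    # All ranges are illustrative placeholders.
    return {
        "cube_size_scale": random.uniform(0.95, 1.05),
        "cube_mass_scale": random.uniform(0.5, 1.5),
        "friction_scale":  random.uniform(0.7, 1.3),
        "action_delay_ms": random.uniform(0.0, 40.0),
        "obs_noise_std":   random.uniform(0.0, 0.02),
    }

for episode in range(3):
    params = randomised_episode_params()
    # env = make_hand_env(**params)    # hypothetical simulator factory
    # rollout_and_update_policy(env)   # e.g. PPO
    print(episode, params)
```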
You may have come across outrageous videos of robots doing parkour. The robot in those videos is Boston Dynamics’ Atlas, which pushes the model-based control paradigm to its limit using Model Predictive Control (more specifically, a hierarchy of controllers, with MPC at the top and several PID controllers at the lower levels). The team developed a full-joint, full-link inertia model that enabled Atlas to carry and throw heavy objects whilst re-planning every joint trajectory. However, every increase in model fidelity comes with a steep increase in computational cost, and brittleness when reality deviates from simulation. Recognising this “curse of the model”, there has been a push towards Model-Free Reinforcement Learning, where a single neural policy is trained in massive, randomised simulation (as we saw with Dactyl).
Model Predictive Control is an advanced control strategy that solves an optimisation problem at every time step. Unlike PID, MPC predicts the future trajectory over a finite horizon and chooses a control move that optimises a performance criterion while respecting constraints. Say we wanted a drone to reach a target point whilst avoiding obstacles. We could write a cost function that drives us towards the goal and penalises bad behaviour (such as extreme control effort), with constraints on thrust and obstacle clearance. The optimisation problem would then be solved at every time step, for example with a commercially available Sequential Quadratic Programming (SQP) solver.
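A minimal receding-horizon sketch for a one-dimensional “drone” (a double integrator) follows. SciPy’s SLSQP, an SQP-style solver, stands in for the dedicated solvers used in production; the horizon, weights and bounds are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

dt, N, target, u_max = 0.1, 10, 1.0, 2.0   # step, horizon, goal, thrust limit

def horizon_cost(u: np.ndarray, x0: np.ndarray) -> float:
    # Simulate the double integrator over the horizon, accumulating
    # tracking error plus a penalty on extreme control effort.
    p, v, cost = x0[0], x0[1], 0.0
    for uk in u:
        v += uk * dt
        p += v * dt
        cost += (p - target) ** 2 + 0.01 * uk ** 2
    return cost

x = np.array([0.0, 0.0])       # start at rest at p = 0
for _ in range(30):            # closed loop: re-solve at every time step
    res = minimize(horizon_cost, x0=np.zeros(N), args=(x,),
                   bounds=[(-u_max, u_max)] * N,   # thrust constraints
                   method="SLSQP")                 # SciPy's SQP-style solver
    u0 = res.x[0]              # apply only the first planned move...
    v_next = x[1] + u0 * dt
    x = np.array([x[0] + v_next * dt, v_next])     # ...then re-plan
print(x)                       # position should approach the target
# Obstacle avoidance would enter as inequality constraints via the
# `constraints=` argument rather than simple bounds.
```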
In 2025, Boston Dynamics began to integrate Reinforcement Learning into Atlas’ control stack, with the goal of enabling more dynamic and generalisable manipulation from simulation. But as we saw, the impact of Reinforcement Learning on robotic control was visible as early as 2019. With the proliferation of open-source physics engines, Reinforcement Learning libraries and easy access to cloud GPUs, startups can now build end-to-end robotic agents in months. Companies such as Figure.ai and Agility Robotics have emerged, promising full-body autonomy using model-free approaches.
This intersects with another trend: Large Language Models (LLMs). LLMs are excellent at reasoning tasks, such as breaking down a broad objective (“make me a cup of coffee”) into sub-tasks (“find the cup” → “pick up the cup” → “place the cup under the dispenser” → “press the button” to start the coffee machine). Each sub-task would then invoke a motor primitive trained using reinforcement learning. LLMs can also revise the plan in real time if the environment changes (“no cups in the drying rack”) and pass new instructions downstream. This provides a very attractive way to tackle the Goal Planning problem in robotics.
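Schematically, the planning loop looks like the sketch below. The `llm()` stub stands in for a call to any hosted chat model, and the primitive names are hypothetical; it is hard-coded here so the sketch runs offline.

```python
PRIMITIVES = {
    "find_cup": lambda: print("searching for cup..."),
    "pick_up_cup": lambda: print("grasping cup..."),
    "place_cup_under_dispenser": lambda: print("placing cup..."),
    "press_button": lambda: print("pressing button..."),
}

def llm(prompt: str) -> list[str]:
    # Stand-in for a hosted chat model: a real system would send the
    # prompt to an API and parse the reply into a list of sub-tasks.
    return ["find_cup", "pick_up_cup",
            "place_cup_under_dispenser", "press_button"]

def execute(goal: str) -> None:
    plan = llm(f"Decompose into primitives {list(PRIMITIVES)}: {goal}")
    for step in plan:
        PRIMITIVES[step]()   # each primitive wraps an RL-trained policy
        # On failure ("no cups in the drying rack") a real system would
        # re-prompt the LLM with the new observation and re-plan.

execute("make me a cup of coffee")
```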
Multimodal LLMs have also shown great promise in perception. Robots must understand the world they inhabit, and LLMs can fuse information across modalities to build rich, semantic scene representations (for example, recognising that there are no cups in the drying rack). This has been extended to Vision-Language-Action (VLA) models, which integrate vision, language and action into a single foundation model. VLAs are trained on large-scale robot interaction data, and allow policies to generalise across novel, messy environments by mapping multimodal inputs to action primitives. Companies have embedded VLAs into simulation loops to achieve open-world generalisation, such as Physical Intelligence with its Pi 0.5 model.
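As a toy illustration of the mapping - random projections standing in for the large vision and language encoders of a real VLA, with all shapes and names invented:

```python
import numpy as np

rng = np.random.default_rng(0)
W_vision = rng.normal(size=(64, 3 * 32 * 32))  # stand-in "vision encoder"
W_text   = rng.normal(size=(64, 128))          # stand-in "language encoder"
W_action = rng.normal(size=(7, 128))           # action head for a 7-DoF arm

def embed_text(instruction: str) -> np.ndarray:
    # Stand-in tokenizer: a bag-of-characters histogram.
    vec = np.zeros(128)
    for ch in instruction:
        vec[ord(ch) % 128] += 1.0
    return vec

def vla_policy(image: np.ndarray, instruction: str) -> np.ndarray:
    # One model: multimodal input in, action out.
    z = np.concatenate([W_vision @ image.ravel(),
                        W_text @ embed_text(instruction)])
    return np.tanh(W_action @ z)   # e.g. joint-velocity commands

action = vla_policy(rng.random((3, 32, 32)), "pick up the cup")
print(action.shape)   # (7,): one command per arm joint
```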
Coupled with existing localisation and mapping algorithms, we are now at a unique inflection point: a robot can maintain a global 3D map to know its position, use a single foundation model for perception and control, and perform fine-grained manipulation.
In sum, the Bitter Lesson has meant:
- Dextrous, high Degree-of-Freedom manipulation for fine-grained tasks
- Simulation-to-Real generalisation through physics simulators and Domain Randomisation
- Robust policies in days/weeks instead of extensive hand-tuning and control engineering
- Logically-reasoned goal decomposition and planning with LLMs
- Multimodal recognition and perception
The Bitter Lesson in Robotics, for Singapore
Singapore has a tight labour market, with 1.64 job vacancies for every unemployed jobseeker. Many of these vacancies are in technology and healthcare. Singapore is also set to become a ‘super-aged’ society by 2026, exacerbating the problem. At the same time, the state aspires to be an innovation economy, with industrial policy focused on capturing higher value-added sectors such as Pharmaceutical R&D, Financial Services and Product Development. There is ample room for automation to play a key role in economic growth, which forms a key objective of the Smart Nation vision. Singapore already ranks second globally in robot density and aspires to develop local companies and home-grown IP to address industry requirements.
Singapore’s foundations are well-positioned for this goal. On the strategic front, robotics initiatives have been bundled under the multi-agency National Robotics Programme. The state also runs several regulatory sandboxes to support product development. For Autonomous Vehicles (AVs), the Land Transport Authority has developed the CETRAN site at Nanyang Technological University to build up technical capabilities in AV testing and certification. Tuas Port functions as a sandbox for port automation, and Punggol Digital District (PDD) is Singapore’s first digital testbed for urban infrastructure; its innovative Open Digital Platform uses a stream-first data exchange protocol, allowing systems (e.g. traffic lights) and service providers to interface in real time. Several Statutory Boards and Government-Linked Companies (GLCs), including Changi Airport Group and ST Engineering, have also established in-house robotics engineering teams.
Its R&D ecosystem, comprising A*STAR (a national research organisation focused on applied R&D for industry) and university-based labs, has produced several robotics-related publications at top robotics and machine learning conferences such as ICRA and ICLR. There have been a few spinoffs, such as MooVita, which develops autonomous buses. The degree of organic development varies across hardware and software. At the recent RoboSG! 2025, notable companies in this space included LionsBot, KABAM Robotics and dConstruct Robotics. These companies perform a mix of organic software development and hardware integration, sourcing only powertrains and sensors from OEMs rather than complete robots. Notably, dConstruct (which is more focused on navigation) even uses 3D Gaussian Splatting in its perception stack for SLAM, showing an effort to adopt more recent approaches. These companies are focused on going global: LionsBot distributes its cleaning robots to more than 30 countries, KABAM deploys its security robots in the US and Australia, and dConstruct is in the midst of a global expansion strategy, appearing at GITEX Asia.
In sum, Singapore has made significant progress in leveraging robotics for tasks such as inspections and security. Some companies have also developed commercial solutions for perception, and system integrators use OEM robots to perform these tasks. However, there is a gap between what we have observed above and developments on the US East and West Coasts. The Bitter Lesson has meant that it is now possible to perform fine-grained manipulation (to a degree) and locomotion, and to coordinate control and perception using VLAs. One emerging startup working in this area is Menlo Research, which performs its R&D at Sim Lim Square. Its goal is to become an open R&D lab that ships products, which it sees falling under the buckets of human augmentation and robotics. We endorse this vision, given that knowledge spillover will be key to fostering a competitive robotics ecosystem for Singapore.
Menlo monetises by shipping experiments and offering services around AI model deployment and customisation. It sees a moat in its open-source community, positioning its shared research as both marketing and product. This mirrors the efforts of YC-backed K-Scale Labs, which is shipping an open humanoid robot design together with its simulation pipeline.
One immediate call to action is for entities and companies in Singapore to close the adoption gap. Given that robotics is multi-disciplinary, requiring a mix of mechanical, electrical and computing expertise, this can be challenging. What we note is that recent developments mark a shift towards computing expertise: training massive RL policies in cloud-based simulation, then deploying these models at the edge to evaluate their real-world performance. Thus, we call for three lines of action:
Institutions (e.g. Polytechnics and Universities) should adopt open-source simulators such as MuJoCo within their curricula. Students should gain the experience of training an RL model in simulation and performing a Sim2Real transfer as part of experimental procedure (a minimal sketch of such an exercise follows these recommendations). This may also include fine-tuning VLAs. There should be no segregation between Control in the Engineering faculties and RL in the Computing faculties; engineers and computer scientists should have equal access to the knowledge and the infrastructure needed for training (cloud GPUs), or even work together in this process. Students should also get greater access to research experience, given that much activity in this area occurs in research labs. A few of them will continue as researchers and spin off their research into companies such as Menlo.
GLCs and Statutory Boards should be cognisant of these developments, which will require greater acceptance that R&D need not be restricted to institutional labs and can benefit from taking place in industry, which has data and scale. These entities provide a fertile environment for automation, and many of these efforts may begin as proofs-of-concept. We encourage further development of ‘Open Innovation Challenges’, which promote public-private collaboration and drive the development of Singapore’s robotics ecosystem. Notably, one company working with HTX was accepted to Y Combinator. Given that these challenges involve real problems, it is likely that this pattern will continue.
Further work is needed on the anticipated gaps. While deep learning has been shown to exceed the capabilities of classical control paradigms when it comes to generalisation, learned policies lack the interpretability and the respect for constraints that techniques such as MPC provide out of the box. These are the risks. There is ongoing work in Safe RL and explainability for LLMs which may close this gap, and which could represent the next leap in robotic control and planning. Singapore has done notable work in increasing the trustworthiness of AI and fostering global collaboration in that area; we believe this should be extended to physical intelligence as well.
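As a concrete version of the classroom exercise proposed above, here is a minimal sketch. It assumes Gymnasium with the MuJoCo extras installed (`pip install "gymnasium[mujoco]"`), and substitutes crude random search over linear policies for a proper RL algorithm such as PPO.

```python
import numpy as np
import gymnasium as gym

# Swap in "InvertedPendulum-v4" if your Gymnasium version lacks v5.
env = gym.make("InvertedPendulum-v5")

def average_return(weights: np.ndarray, episodes: int = 3) -> float:
    total = 0.0
    for _ in range(episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            action = np.clip(weights @ obs, -3.0, 3.0)   # linear policy
            obs, reward, terminated, truncated, _ = env.step(np.array([action]))
            total += reward
            done = terminated or truncated
    return total / episodes

# Crude random search; a real course would train with an RL library,
# then repeat the evaluation under randomised masses and frictions
# (domain randomisation) before attempting any hardware transfer.
best_w, best_r = None, -np.inf
for _ in range(50):
    w = np.random.randn(4)   # observations are 4-dimensional for this task
    r = average_return(w)
    if r > best_r:
        best_w, best_r = w, r
print("best average return over 3 episodes:", best_r)
```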
Conclusion
Autonomy has made the unimaginable possible. Xiaomi now operates a ‘dark factory’ able to produce 60 smartphones a minute, around the clock. Further leveraging automation will be key to increasing the productivity and competitiveness of Singapore’s existing industries. It will also yield a generation of new software and hardware products to compete in export markets. For this to happen, more of us have to be aware of The Bitter Lesson. It is one lesson that will stand the test of time.