- ☝️Amazon will help you reach the highest shelf, 👀 Meta’s new eyes and ears, 🙈 Stanford builds robot you
May 12 Issue #16 - Jules Per Token AI Daily Newsletter
Today’s issue is a robotics recap - what you might have missed.
☝️Can’t reach the highest shelf? Amazon’s Vulcan got you
Amazon just gave us a peek into its future workforce, and surprise: it still includes humans (barely). In a world where AI handles more warehouse grunt work, the retail giant is retraining its people to maintain, troubleshoot, and babysit the bots doing the heavy lifting.
History has mostly shown that humans move into the brand-new jobs that automation creates, and Vulcan gives us a sneak peek at what working-class jobs look like amid a robot army.
Fun Fact: Amazon’s robots already help process 75% of orders.

👀 Meta’s new eyes and ears
In April, Meta dropped two new open-source AI models, and they’re all about seeing and sensing the world better. Together, they give AI better situational awareness—one step closer to robotic systems that can understand and interact with the physical world like a human. Cue sci-fi robot thriller music.
- Perception Language Model (PLM): a vision-language system trained to describe what someone is doing in a video.
- Locate 3D: a spatial reasoning model that helps robots figure out where things are, using language and camera data to pinpoint real-world objects.
Fun Fact: Locate 3D doesn’t just “see” objects—it understands phrases like “the thing behind the couch next to the weird lamp.” Maybe Meta will be able to shoo your dog away from chewing up your new sneakers.
🙈 When your fav restaurant only offers pickup, Stanford’s robot-you has got your back
Early last week, Stanford researchers unveiled TWIST—the Teleoperated Whole-Body Imitation System—a major leap in humanoid robotics. TWIST enables a robot to mimic full-body human movements in real time using just one neural network. The system works by capturing human motion via MoCap, retargeting it to a humanoid robot, and training a controller using reinforcement learning plus behavior cloning.
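For the curious, here’s a toy sketch of the TWIST-style pipeline: retarget a human pose to robot joints, then behavior-clone a controller from demonstrations. All function names and numbers here are made up for illustration—the real system uses neural networks and full MoCap data, not a one-parameter controller.

```python
# Toy sketch of a MoCap-to-robot imitation pipeline (hypothetical, not TWIST's code).

def retarget(human_pose, scale=0.8):
    """Map human joint angles onto a (smaller) humanoid - a toy linear retarget."""
    return [scale * q for q in human_pose]

def behavior_clone(demos, lr=0.1, epochs=200):
    """Fit a one-gain controller u = k * error by gradient descent on demo pairs."""
    k = 0.0
    for _ in range(epochs):
        grad = sum(2 * (k * err - action) * err for err, action in demos)
        k -= lr * grad / len(demos)
    return k

# Demonstrations: (tracking error, expert action) pairs from an expert with gain 1.5.
demos = [(e, 1.5 * e) for e in (-1.0, -0.5, 0.5, 1.0)]
k = behavior_clone(demos)          # learned gain converges toward 1.5
targets = retarget([0.4, 1.2])     # robot-scale joint targets from a human pose
```

The real trick in TWIST is that one network handles all of this across every motion, rather than a separate controller per skill.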
TL;DR: Robot-you will mimic your movements until it learns enough to go off into the world on its own.
Fun Fact: Thanks to one unified neural network, the robot doesn’t need different programs to walk vs. punch—it’s like muscle memory, but metal.

I would love to hear from you! Just hit “Reply” if you have any questions or feedback. Or if you want to be featured in this newsletter. Or “Forward” if you want to share with a friend! - Jules
Subscribe here (and read our past newsletters) at www.julespertoken.com