A perfect day in Los Angeles starts with a stroll along the Venice Beach boardwalk. Then a ride on the Ferris wheel in neighboring Santa Monica. Then a visit to the Getty Museum, some nine miles away by car. After that, Beverly Hills, then Hollywood to see the Walk of Fame, then Griffith Park for a hike, then Chinatown for dim sum, then downtown, perhaps to catch an evening show at the Walt Disney Concert Hall.
Or at least, that’s what a chatbot thinks a “perfect day” is. This agenda was custom-made for me by Microsoft Copilot after I told it I had one day in town to explore the sights and asked it to plan accordingly. “Certainly! 🌴🌆 Here’s a jam-packed 24-hour itinerary,” Copilot responded, before rattling off an eight-part answer. What I didn’t tell Copilot is that I already live here—and know that such an itinerary is perfect only if your idea of bliss is spending most of the day traversing one of the country’s most sprawling, traffic-clogged cities, frantically popping from landmark to landmark.
I asked Copilot to make me a travel itinerary because Microsoft has trotted it out as an example of how people can use the ChatGPT-like assistant. It can supposedly help you pick a destination, compare flight prices, and settle on attractions that are “popular with tourists—or just a little more off the beaten path.” Of all the things you might ask a chatbot, AI companies love to suggest you ask for help planning upcoming travel. Open up ChatGPT and you might see this hypothetical prompt: “Plan a trip to see the best of New York in 3 days.” Google’s Gemini chatbot offers similar ones. Meta’s line of chatbot assistants on Instagram and Facebook includes “Lorena,” your own personal travel expert. And Rabbit, the company behind a new AI gadget, pulled out the travel example for a keynote video last month.
If one were to play AI-marketing bingo, “trip itinerary” would get crossed off basically every time. Over a year into the generative-AI revolution, companies so frequently suggest that people use their tools in this way that you’d think chatbots would excel at it. But they don’t.
In theory, chatbots that can instantaneously create travel plans are a marketer’s dream. The use case is easy to understand: Planning a vacation can be a real challenge for people. First, it involves toggling among flight listings, hotel availability, and ticketing websites for major attractions. Then, it requires more nuanced research, to figure out which local restaurants are actually good and which are overpriced tourist scams, or what time to set off on a big hike so that you aren’t left in the woods after sunset.
Most of this travel information already lives on the internet or in books, meaning that it has likely already been incorporated into a chatbot’s training data. “There are probably thousands of places on webpages that describe a trip to Boston,” Kathleen Creel, a professor of philosophy and computer science at Northeastern University, told me. “There’s travel sites. There’s tour companies. There’s people on Reddit talking about their trip to Boston. There’s people on Reddit talking about living in Boston and what they like.” An AI tool trained on all of this data can process it to spit out a personalized itinerary.
But in practice, AI travel plans leave something to be desired. When I told ChatGPT that I was a “huge foodie” and asked it to adjust an L.A. itinerary accordingly, it suggested I go to a Michelin-starred restaurant for dinner. It didn’t say which one. It just told me that L.A. had some and that, if I liked food, I should go to one. That’s sort of like telling a person who likes music that maybe they’d be into a Grammy-winning artist and leaving it at that. ChatGPT suggested I wrap up my day by getting a “sweet treat” at Milk Bar, a chain of high-end bakeries from the New York pastry chef Christina Tosi.
Perhaps I’m just picky, but a team of researchers at Fudan University in Shanghai, Ohio State University, Penn State, and Meta came to a similar conclusion. They tested chatbots on 1,000 sample queries, such as “Please create a travel itinerary for a solo traveler departing from Jacksonville and heading to Los Angeles for a period of 3 days, from March 25th to March 27th, 2022. The budget for this trip is now set at $2,400.” They then evaluated whether the chatbots were able to provide answers that met all the criteria in the prompt. The chatbots pretty much failed across the board. Of the four tested, OpenAI’s GPT-4 model did the best, but even it successfully answered only six queries out of 1,000, or 0.6 percent. (The research has not yet been peer reviewed.)
The chatbots failed for a variety of reasons: They made reasoning errors, and sometimes made stuff up. “I can’t emphasize this enough: These kinds of tools are meant to supplement, not supplant, our decision-making process,” Brigitte Tousignant, the communications lead for the AI company Hugging Face, told me. She used her company’s chatbot to plan a week-long trip to Montreal and was “pleasantly surprised” with how specific the results were. Then she noticed that the bot suggested she attend three comedy and music festivals that each take place during different times of the year.
With these drawbacks in mind, I asked five AI companies—Microsoft, Google, OpenAI, Meta, and Rabbit—why they mention using these tools for travel planning. Only Microsoft and Google commented for this story. “The early value proposition of AI in travel planning is the significant time savings and information gathering it offers,” a Microsoft spokesperson told me in an email statement. “We’ve seen people use it with great results.” Aarush Selvan, a senior product manager for Gemini experiences at Google, told me that people had used the company’s chatbot to plan travel or get trip inspiration right from its initial launch.
Someday, AI may actually be able to plan you a remarkable trip—particularly if these bots become agents who can actually take action, like booking flights on your behalf. Google isn’t quite there, but it has integrated Google Flights and Google Maps into its Gemini chatbot, which pulls up flight options when you ask for a travel plan. “We know we’re really just scratching the surface here,” Selvan told me.
Until then, each nudge from an AI company to use its tools to plan a trip serves as a reminder of the chatbot limbo we’re in. It’s been more than a year since ChatGPT was released, and the initial hype has died down. These tools are impressive, and clearly have a lot of potential. But exactly what these tools are best for right now is still murky. “A lot of what are emerging as truly useful use cases of AI are not these sort of sexy consumer-facing things,” Creel said. “They’re things like machine learning for science or the fact that large language models have these surprising applications in drug discovery or protein design or things like that.” These applications may change our health systems, and our world. But unfortunately they won’t make it any easier to sip cocktails by the beach.