A few simple steps to take Copilot from Overpromised to Game-Changer

I recently had the pleasure of attending Collabdays Hamburg 2024, a community event where MVPs shared their insights on working with Microsoft 365, Power Platform, and, of course, Copilot. As someone always eager to explore the latest tools, I jumped at the opportunity to participate in several sessions focused on Microsoft Copilot. But here’s the kicker: almost without exception, the instructors—most of whom were Microsoft MVPs—struggled to get Copilot to do what they wanted. What’s worse, when Copilot generated a solution somewhat similar to the desired outcome, suggesting a modification in the prompt box could completely change everything, forcing the user to start from scratch. Yes, we saw the potential of what Copilot could do—but we certainly didn’t feel that Copilot was enterprise-ready.

Just as I was about to write Copilot off and attend sessions on other topics, I joined another demo on Copilot, this time by Daniel Laskewitz, a Senior Cloud Advocate at Microsoft, and what I saw blew me away. Daniel didn’t just make Copilot work; he made it shine. He set up a Copilot that could pull accurate information from two specific websites and deliver travel tips exactly the way he wanted. For the first time all day, I saw a Copilot that I could imagine putting on a client-facing website—something that none of the other presenters, no matter how skilled or reputable, had managed to pull off.

This left me with a burning question: Why was Daniel’s Copilot so much more effective? As I dug a little deeper, I realized that the answer was surprisingly simple: all we needed to do was recognize the current limits of Copilot and break our requests down into bite-sized pieces that it could work with. Sounds simple, right? Yet, in every single Copilot presentation, we were told to simply write what we wanted Copilot to do in a prompt box. When that didn’t work, we reacted in the most human way possible and concluded that Copilot wasn’t ready.

Sure, Copilot is still growing, and we are just scratching the surface of what it could do. But before we can fully leverage Copilot, we need to be honest with ourselves, recognize its strengths and limitations, and figure out how to work with the current boundaries. I bet that all the participants at Collabdays would be completely fine with working within Copilot’s current limits and being trained on how to get the best results within these limitations, rather than being oversold on all the amazing things that Copilot should be able to do but can’t reliably achieve just yet.

The “secret method” to making Copilot work

So what exactly did Daniel do differently that worked? The answer is surprisingly straightforward but powerful: he made full use of the existing features in Copilot Studio, particularly topics and actions, and used them to precisely define how his copilots interact with users on different topics and, in some cases, what tasks the copilots should carry out in different scenarios. For more information on topics and actions in Copilot Studio, please see the official Microsoft documentation.

To help you navigate this process, I have created the following flowchart that roughly outlines the workflow for creating a reliable copilot:

Here’s how the process works:

  1. Define the Goal of the Interaction: Start by clearly defining what you want Copilot to achieve. This might be something like providing travel tips or retrieving specific data.
  2. Does Copilot Recognize the User’s Intent? If Copilot automatically understands the user’s request, it will ask questions to gather more information and create relevant topics. If not, you’ll need to manually add the necessary topics.
  3. Are There Any Missing Topics? Review the topics that Copilot has created. If any are missing or not aligned with your goal, you’ll need to add them manually.
  4. Define the Actions Copilot Should Take: Once all topics are in place, specify the exact actions Copilot should perform in response to user inputs within each topic.
  5. Review the Logical Steps in Each Topic: Go through each topic and action to ensure they follow a logical sequence that leads to the desired outcome.
  6. Perform Final Test and Refine: Test the entire setup in various scenarios to see how Copilot performs. Refine the topics and actions as necessary to improve reliability.
  7. Publish and Monitor: Once you’re satisfied with Copilot’s performance, publish the setup and monitor its interactions to ensure it continues to meet user needs effectively.
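
To make the topic-and-action idea concrete, here is a minimal Python sketch of the routing logic the workflow above describes. Everything in it (the Topic class, route_message, the sample trigger phrases) is my own illustration, not Copilot Studio's actual API; in the real product you configure topics and actions through the Studio interface.

# Minimal sketch of the "topics and actions" idea. All names here are
# illustrative assumptions, not Copilot Studio's actual API.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Topic:
    name: str
    trigger_phrases: list[str]    # phrases that should activate this topic
    action: Callable[[str], str]  # what the copilot does once triggered

def travel_tips(user_message: str) -> str:
    # In a real copilot, this action would query only the approved sources.
    return "Here are travel tips drawn from the two configured websites..."

def fallback(user_message: str) -> str:
    # Step 2 of the workflow: an unrecognized intent means a topic is
    # missing and needs to be added manually.
    return "Sorry, I don't have a topic for that yet."

TOPICS = [
    Topic("Travel tips", ["travel tips", "where should i go"], travel_tips),
]

def route_message(user_message: str) -> str:
    """Fire the first topic whose trigger phrase appears in the message."""
    lowered = user_message.lower()
    for topic in TOPICS:
        if any(phrase in lowered for phrase in topic.trigger_phrases):
            return topic.action(user_message)
    return fallback(user_message)

print(route_message("Do you have travel tips for Hamburg?"))

The shape is what matters here: explicit triggers decide which topic fires, each topic has one well-defined action, and anything unmatched falls through to a fallback that tells you a topic still needs to be added.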

Another thing Daniel did was identify interactions where accurate information is essential and turn down the “temperature” of the copilot, so that it responds with the information available to it and nothing else.
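
Copilot Studio exposes this as a setting in its interface, so no code is needed there. But to illustrate what “turning down the temperature” means for any GPT-backed assistant, here is a sketch using the OpenAI Python client; this is an assumption for illustration, not how Copilot Studio works under the hood.

# Illustration only: Copilot Studio exposes temperature-style controls in
# its UI, not through this API.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0,  # deterministic, stick-to-the-facts responses
    messages=[
        {"role": "system",
         "content": "Answer only from the provided context. If the answer "
                    "is not in the context, say you don't know."},
        {"role": "user",
         "content": "When does check-in open? Context: Check-in opens at 15:00."},
    ],
)
print(response.choices[0].message.content)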

Copilot Studio is designed to automate most of the process of creating a copilot, but at the moment, it doesn’t always perfectly recognize user intent or generate the necessary topics and actions on its own. This means you may need to intervene and ensure that everything is set up correctly. Once it is properly set up, though, it can reliably follow instructions and find and access the data sources it needs.

So what is Copilot missing?

Microsoft Copilot, powered by GPT models, is undoubtedly a revolutionary tool, but it’s not without its shortcomings. Researchers have found several inherent limitations in GPT models that affect their overall reliability:

  1. Logical reasoning: GPT models are vast machine learning models designed to predict and generate text based on massive datasets. However, these models often struggle with logical consistency and reasoning. For instance, in scenarios where Copilot is expected to draw inferences or navigate multi-step reasoning, the outputs can sometimes be incoherent or logically flawed.
  2. Hallucination: Another critical issue that Copilot inherits from GPT models is hallucination, where the AI generates information that is factually incorrect or completely fabricated. This happens because GPT models are trained on vast amounts of unstructured data but inherently lack an understanding of the logic behind that data. OpenAI, the creator of the GPT models, has introduced new training methods so that GPT-4 models can “learn” from and reduce the errors they make. But at the time of writing, GPT-3.5 is still the standard model behind Copilot, and mistakes still happen even with GPT-4o.

In other words, if the task is for a copilot to sort through a large amount of data, find the relevant text, and respond to a user, it can do so reliably and eloquently. However, if the user gives it a task that requires it to reason critically about how to perform that task, provide recommendations, or make decisions based on user inputs, Copilot’s reliability can be compromised by these inherent weaknesses.

Expectancy value misalignment

During my time at Collabdays Hamburg 2024, I observed a fascinating phenomenon. The instructors at the event were all Microsoft MVPs—seasoned IT professionals who are likely well-acquainted with the limitations of GPT models, such as GPT-3.5. Despite their deep understanding of technology, these experts consistently attempted to use the Copilot prompt box to achieve complex tasks, rather than utilizing the more effective built-in features in Copilot Studio to fine-tune and define Copilot’s parameters. This approach resulted in repeated failures to achieve the desired outcomes during the sessions.

To delve deeper into this issue, I conducted interviews with several of the instructors. Many of them admitted that when Copilot is provided with specific instructions and structured data, it performs tasks reliably and accurately. However, this structured approach was not demonstrated in the sessions.

Why is that?

My personal observation, after attending several workshops on Copilot, is that Microsoft’s communication about Copilot’s capabilities may have contributed to a misalignment between Copilot’s strengths and the expectations of its users. Copilot has consistently been marketed as a smart AI capable of performing complex tasks with little human assistance. This has led users to assign it tasks that require complex logical reasoning—something it still struggles with.

This situation underscores the importance of aligning user expectations with the actual capabilities of AI tools. When expectations are set too high without sufficient grounding in the reality of the technology’s limitations, even the most knowledgeable users can fall into the trap of expecting too much, leading to frustration and a diminished perception of the tool’s value.

Honesty is truly the best policy

Microsoft Copilot is undeniably a game changer. Imagine scrolling through hundreds of pages of dense policy documents, trying to find the correct instructions, and then quoting precisely the right passage to draft an appropriate email. This process, which would take a human hours, can be done by Copilot within seconds. The ability to sift through vast amounts of information, distill the essence, and present it in a usable form is a remarkable feat—one that can significantly boost efficiency in any professional setting.

However, there’s a critical caveat that users must acknowledge: Copilot, despite its impressive capabilities, still requires human guidance to be truly effective. The technology is powerful, but it’s not infallible. If users expect Copilot to handle every task perfectly without any oversight, they are not being honest with themselves. This kind of thinking can lead to frustration when Copilot doesn’t deliver the desired results, prompting some to unfairly blame Microsoft and the tool itself, rather than recognizing that they might not be using it optimally.

At its core, Copilot is designed to assist, not replace, human judgment. It excels when users provide clear, structured instructions, and when they recognize the need to step in and guide the tool through more complex tasks. In this sense, honesty is not just about acknowledging Copilot’s limitations—it’s about being truthful about one’s own role in the process. Expecting a tool to do all the heavy lifting without any effort or input is a recipe for disappointment.

But honesty isn’t just required on the part of users. Microsoft also has a responsibility to be transparent about what Copilot can and cannot do. Overselling the tool as a flawless AI that requires little to no human intervention can lead to a significant disconnect between expectations and reality. If users are not properly trained on how to use Copilot effectively—understanding its strengths and working around its limitations—they might feel misled and think they’ve been sold on empty promises.

The solution lies in balance. By being upfront about Copilot’s capabilities and its current limitations, and by offering robust training on how to best use the tool, Microsoft can empower its users to harness the true power of Copilot. This way, users won’t be turned off by unmet expectations but will instead see Copilot for what it truly is: a powerful ally that, with the right guidance, can rocket boost their efficiency and productivity.

Advice for Microsoft: How to Make Copilot a Must-Have Tool for Businesses

While Microsoft Copilot has shown great promise, there’s no denying that it still has room for improvement. As powerful as it is, Copilot needs refinement if Microsoft wants to attract more users and ensure widespread adoption across business systems. Here are some key recommendations to help Copilot reach its full potential.

1. Focus on Step-by-Step Task Management in the Medium Term

One of the current pitfalls of Copilot’s interface is that users are given just one prompt box to perform complex tasks. This essentially sets Copilot up to fail, as it must interpret a user’s prompts and handle intricate tasks that require complex logic. Instead, Microsoft could introduce an interface that allows users to guide Copilot step-by-step.

For example, when a user is creating a Power Automate flow, rather than requiring them to describe an entire workflow in a single text box, Microsoft could develop an interface that prompts users to guide Copilot when they want to add a new action. This would make it easier to find the correct steps, select settings, and ensure each phase of the flow works as expected. By breaking tasks into smaller pieces, Copilot would help users create complex workflows with less effort and fewer mistakes, making it more accessible for novices and experts alike.
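
As a rough illustration of what such step-by-step guidance could feel like, here is a hypothetical Python sketch. The function names and the confirm-each-step loop are invented; nothing here is a real Power Automate or Copilot API.

# Hypothetical sketch: confirming one action at a time instead of
# describing a whole workflow in a single prompt.

def suggest_next_action(flow_so_far: list[str], user_hint: str) -> str:
    # Stand-in for a Copilot call that proposes ONE next step from a hint.
    return f"Add action based on: {user_hint!r}"

def build_flow_interactively() -> list[str]:
    flow: list[str] = []
    while True:
        hint = input("Describe the next step (or 'done'): ")
        if hint.strip().lower() == "done":
            break
        proposal = suggest_next_action(flow, hint)
        if input(f"Accept '{proposal}'? [y/n] ").strip().lower() == "y":
            flow.append(proposal)  # each step is confirmed before moving on
    return flow

if __name__ == "__main__":
    print(build_flow_interactively())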

2. Invest in Strengthening Logical Reasoning Capabilities

Copilot still struggles with logical reasoning, and improving its capacity for critical thinking is essential if Microsoft wants it to meet the expectations it has set.

Of course, this is a significant challenge, requiring a huge investment of time, resources, and training data. However, training Copilot to handle logical reasoning more effectively would significantly increase its reliability and make it a trusted tool in areas that require detailed analysis, decision-making, or problem-solving. This improvement would address one of the key complaints from users who rely on Copilot for tasks beyond simple queries.

3. Enable Cross-AI Collaboration for Specialized Tasks

Lastly, Copilot is not always the best AI model for certain tasks. For example, I once asked ChatGPT if GPT could be fine-tuned to become a super chess model. The chatbot admitted that it would likely never match the level of specialized chess AI models because their infrastructure is completely different. However, it could interpret the results from those models and help users understand the board the way those engines do.

Rather than trying to make Copilot an expert in every field, Microsoft should allow it to interact with other specialized AI models, such as those focused on niche tasks like medical diagnostics, financial analysis, or design. This kind of cross-AI collaboration could position Copilot as a versatile assistant, capable of tapping into expert systems when necessary, without compromising its broader functionality.
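
Here is a minimal sketch of what such a dispatcher could look like, with an invented registry of specialist models; nothing here reflects an existing Copilot integration.

# Hypothetical dispatcher: a general assistant that hands specialized
# questions to expert models. The registry and expert functions are
# invented for illustration.

from typing import Callable

def chess_engine(query: str) -> str:
    # Stand-in for a dedicated chess engine's analysis.
    return "Engine analysis: 1. e4 is the strongest opening move here."

def general_assistant(query: str) -> str:
    # Stand-in for the general-purpose model (e.g. Copilot itself).
    return f"General answer to: {query}"

SPECIALISTS: dict[str, Callable[[str], str]] = {
    "chess": chess_engine,
    # "medical": medical_model, "finance": finance_model, ...
}

def answer(query: str) -> str:
    """Route to a specialist if one matches; otherwise answer generally."""
    for keyword, expert in SPECIALISTS.items():
        if keyword in query.lower():
            raw = expert(query)
            # The general model's job: interpret the expert's output
            # for the user.
            return general_assistant(f"Explain this result plainly: {raw}")
    return general_assistant(query)

print(answer("What's the best chess move from the starting position?"))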

Final Thoughts on Empowering Copilot for the Future

To truly make Copilot a must-have tool for businesses, Microsoft needs to focus on two key areas: transparency and user empowerment.

First and foremost, Microsoft should be completely transparent about Copilot’s current capabilities and limitations. While Copilot can indeed save time by processing large amounts of information quickly, it still requires human guidance, particularly for tasks that involve complex reasoning. Users need to know that Copilot is not a magical solution for every problem but a powerful assistant that, with the right input, can dramatically increase productivity. By setting realistic expectations and training users to understand how to work within Copilot’s limitations, Microsoft can prevent frustration and foster long-term user engagement.

Second, Microsoft should focus on making Copilot more intuitive and user-friendly by guiding users through tasks in manageable steps. Adding functionality that allows users to interact with Copilot step by step for complex workflows—rather than relying on a single prompt—would make Copilot more effective and accessible to a broader audience. This would minimize errors and reduce the complexity of using AI for intricate processes, such as creating automated workflows in Power Automate.

In the long term, investing in Copilot’s logical reasoning capabilities and enabling cross-AI collaboration would be game-changing. By enhancing its reasoning abilities and integrating with specialized AI models for tasks like medical diagnostics or financial analysis, Microsoft can position Copilot as a versatile assistant capable of handling both general and niche tasks.

Ultimately, the key to Copilot’s success lies in balancing innovation with user support. By addressing these areas, Microsoft can ensure that Copilot lives up to its promise, becoming an indispensable tool that helps users work smarter, not harder.
