{
    "componentChunkName": "component---src-templates-blog-post-tsx",
    "path": "/blog/2022-05-03-backstage-entity-provider/",
    "result": {"data":{"blogPost":{"title":"Tutorial: Using Github Webhooks with Backstage Entity Provider","slug":"/blog/2022-05-03-backstage-entity-provider/","authorNodes":[{"name":"Min Kim","slug":"/people/min-kim/"}],"markdown":{"html":"<p><a href=\"https://backstage.io/docs/features/software-catalog/external-integrations#custom-entity-providers\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Entity Providers</a> are a more scalable and robust alternative to <a href=\"https://backstage.io/docs/features/software-catalog/external-integrations#custom-processors\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Entity Processors</a>. The Backstage team introduced Entity Providers to solve problems that big deployments of Backstage were experiencing with the ingestion pipeline. If you take a look at their documentation on <a href=\"https://backstage.io/docs/features/software-catalog/life-of-an-entity\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">The Life of an Entity</a>, it illustrates how your new ingestion pipelines should be structured.</p>\n<p>Before the Entity Providers came into the picture, the preexisting processors essentially took on the roles of both the entity providers <em>and</em> the processors described in the documentation. I bring this up as it may provide clarity to anyone confused by some of the overlapping functionality of the current processors and entity providers in the Backstage plugins; some of those processors have not yet been updated to adapt to the new proposed ingestion pipeline.</p>\n<p>Getting back to why Entity Providers were introduced, there are some common issues with ingestion <em>processors</em>:</p>\n<ul>\n<li>The processing queue would get filled with no-op processing, putting pressure on external systems. 
In some cases, this led to rate-limited requests.</li>\n<li>Writing performant custom processors required implementing caching, which many teams did not do.</li>\n<li>A failure in an external service could orphan entities, causing components to disappear from the catalog.</li>\n</ul>\n<p>Entity providers eliminate these problems by giving developers complete control over the execution of ingestion:</p>\n<ul>\n<li>Entity Providers do not have an implicit queue that drives their execution; instead, developers specify the mechanism that drives each entity provider. A driver for an entity provider can be a simple callback that runs on an interval, an event listener triggered by a WebSocket connection, or a response to an HTTP request.</li>\n<li>There is no need to cache requests because there is no implicit orphaning of entities. Entities are only mutated when the Entity Provider calls <em>applyMutation</em> on the connection. The developer can control update frequency by configuring the mechanism that drives the entity provider.</li>\n<li>Entity Providers automatically and efficiently merge small changes to many entities without explicitly applying deltas. It’s also possible to apply just the deltas if the data source provides them.</li>\n</ul>\n<p>Since entity providers do not have an implicit queue, you’ll need to specify what drives each entity provider. You could use a task scheduler and specify the frequency of entity provider executions. Ideally, your entity provider would respond to changes in the original data source and remain idle at all other times. 
Developers can accomplish this with webhooks and streaming.</p>\n<p>In this tutorial, we'll walk you through the steps of adding <code class=\"language-text\">GitHubOrgEntityProvider</code> to your catalog and then show you how you can configure Github Webhooks to trigger mutations to your Backstage database.</p>\n<h2 id=\"adding-an-entity-provider\" style=\"position:relative;\"><a href=\"#adding-an-entity-provider\" aria-label=\"adding an entity provider permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Adding an Entity Provider</h2>\n<p>Assuming you already have your own instance of Backstage, let's install the necessary packages for adding the Github Org Entity Provider:</p>\n<div class=\"gatsby-highlight\" data-language=\"text\"><pre class=\"language-text\"><code class=\"language-text\">yarn workspace backend add @backstage/plugin-catalog-backend-module-github @backstage/integration</code></pre></div>\n<p>Then you can import the provider to your catalog builder in <code class=\"language-text\">packages/backend/src/plugins/catalog.ts</code>. 
Be sure to replace the orgUrl with your own:</p>\n<div class=\"gatsby-highlight\" data-language=\"diff\"><pre class=\"language-diff\"><code class=\"language-diff\"><span class=\"token deleted-sign deleted\"><span class=\"token prefix deleted\">-</span> import { CatalogBuilder } from '@backstage/plugin-catalog-backend';\n</span><span class=\"token inserted-sign inserted\"><span class=\"token prefix inserted\">+</span> import { CatalogBuilder, EntityProvider } from '@backstage/plugin-catalog-backend';\n</span>...\n<span class=\"token inserted-sign inserted\"><span class=\"token prefix inserted\">+</span> import { ScmIntegrations, DefaultGithubCredentialsProvider } from '@backstage/integration';\n<span class=\"token prefix inserted\">+</span> import { GitHubOrgEntityProvider } from '@backstage/plugin-catalog-backend-module-github';\n</span>\nexport default async function createPlugin(\n<span class=\"token unchanged\"><span class=\"token prefix unchanged\"> </span> env: PluginEnvironment,\n</span>): Promise&lt;Router> {\n<span class=\"token unchanged\"><span class=\"token prefix unchanged\"> </span> const builder = await CatalogBuilder.create(env);\n<span class=\"token prefix unchanged\"> </span> builder.addProcessor(new ScaffolderEntitiesProcessor());\n</span>\n<span class=\"token inserted-sign inserted\"><span class=\"token prefix inserted\">+</span>  const integrations = ScmIntegrations.fromConfig(env.config);\n<span class=\"token prefix inserted\">+</span>\n<span class=\"token prefix inserted\">+</span>  const githubCredentialsProvider = DefaultGithubCredentialsProvider.fromIntegrations(integrations);\n<span class=\"token prefix inserted\">+</span>\n<span class=\"token prefix inserted\">+</span>  const gitProvider = GitHubOrgEntityProvider.fromConfig(env.config, {\n<span class=\"token prefix inserted\">+</span>    id: \"github-org-entity-provider\",\n<span class=\"token prefix inserted\">+</span>    orgUrl: \"https://github.com/my-organization\", // 🚨 
REPLACE\n<span class=\"token prefix inserted\">+</span>    logger: env.logger,\n<span class=\"token prefix inserted\">+</span>    githubCredentialsProvider\n<span class=\"token prefix inserted\">+</span>  });\n<span class=\"token prefix inserted\">+</span>\n<span class=\"token prefix inserted\">+</span>  builder.addEntityProvider(gitProvider as EntityProvider);\n</span>\n<span class=\"token unchanged\"><span class=\"token prefix unchanged\"> </span> const { processingEngine, router } = await builder.build();\n<span class=\"token prefix unchanged\"> </span> await processingEngine.start();\n</span>\n<span class=\"token unchanged\"><span class=\"token prefix unchanged\"> </span> return router;\n</span>}</code></pre></div>\n<p>If you run your Backstage app now, you will notice that nothing has changed in your catalog. As mentioned earlier, developers must provide the mechanism that will drive the entity provider. For the sake of this example, let's call the GitHubOrgEntityProvider's <code class=\"language-text\">read()</code> at the end of our catalog builder:</p>\n<div class=\"gatsby-highlight\" data-language=\"diff\"><pre class=\"language-diff\"><code class=\"language-diff\"><span class=\"token unchanged\"><span class=\"token prefix unchanged\"> </span> await processingEngine.start();\n</span><span class=\"token inserted-sign inserted\"><span class=\"token prefix inserted\">+</span>  await gitProvider.read();\n</span>\n<span class=\"token unchanged\"><span class=\"token prefix unchanged\"> </span> return router;\n</span>}</code></pre></div>\n<p>When you restart Backstage, you <em>should</em> see the members and teams of your organization. If you do not, check the permissions of your Github authentication. 
Whether you're using a personal access token or a <a href=\"https://backstage.io/docs/integrations/github/github-apps\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Github App</a>, you need to make sure it has permissions for <code class=\"language-text\">read:user</code> and <code class=\"language-text\">read:org</code>.</p>\n<p>Calling <code class=\"language-text\">read()</code> at the end of the catalog building process only updates your database once, at startup, so next we're going to configure a webhook to trigger the updates.</p>\n<h2 id=\"configure-github-webhook\" style=\"position:relative;\"><a href=\"#configure-github-webhook\" aria-label=\"configure github webhook permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>Configure Github Webhook</h2>\n<p>Let's start by creating your webhook on Github. 
From your Github organization's settings page, click <code class=\"language-text\">Webhooks</code> in the sidebar and then <code class=\"language-text\">Add webhook</code>.</p>\n<p><figure class=\"figure\"><img src=\"/img/2022-05-03-add-webhook.png\"><figcaption class=\"figure-caption\">github-webhook-section-screenshot</figcaption></figure></p>\n<p>You can get a <code class=\"language-text\">Payload URL</code> from <a href=\"https://smee.io\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">smee.io</a>, a service that proxies payloads from your webhook for local development.</p>\n<p>Set the content type to <code class=\"language-text\">application/json</code>, then specify the events you want by selecting <code class=\"language-text\">Let me select individual events</code>. The <code class=\"language-text\">GitHubOrgEntityProvider</code> adds users and teams to your catalog, so the webhook events we want to receive from Github are <code class=\"language-text\">Organization</code> and <code class=\"language-text\">Teams</code>.</p>\n<blockquote>\n<p>For your actual deployment, you'll want to change the Payload URL from the smee URL to the URL of your live Backstage app.</p>\n</blockquote>\n<p>Once the webhook is added, follow the instructions displayed on <a href=\"https://smee.io\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">smee.io</a> to install their client and use <code class=\"language-text\">http://localhost:7007/api/catalog/github/webhook</code> as the target URL.</p>\n<p>Next, update your catalog builder so that it runs <code class=\"language-text\">read()</code> when a webhook event is posted to <code class=\"language-text\">/github/webhook</code>:</p>\n<div class=\"gatsby-highlight\" data-language=\"diff\"><pre class=\"language-diff\"><code class=\"language-diff\"><span class=\"token unchanged\"><span class=\"token prefix unchanged\"> </span> await processingEngine.start();\n</span><span class=\"token deleted-sign 
deleted\"><span class=\"token prefix deleted\">-</span>  await gitProvider.read();\n</span><span class=\"token inserted-sign inserted\"><span class=\"token prefix inserted\">+</span>  router.post(\"/github/webhook\", async (req, res) => {\n<span class=\"token prefix inserted\">+</span>    const event = req.headers[\"x-github-event\"];\n<span class=\"token prefix inserted\">+</span>    if (event === \"membership\" || event === \"organization\" || event === \"team\") {\n<span class=\"token prefix inserted\">+</span>      await gitProvider.read();\n<span class=\"token prefix inserted\">+</span>    }\n<span class=\"token prefix inserted\">+</span>    // Acknowledge the delivery so the request does not hang.\n<span class=\"token prefix inserted\">+</span>    res.sendStatus(204);\n<span class=\"token prefix inserted\">+</span>  });\n</span>\n<span class=\"token unchanged\"><span class=\"token prefix unchanged\"> </span> return router;\n</span>}</code></pre></div>\n<p>If you run the smee client and your Backstage app, you'll see that the entity provider updates the database only when a webhook event is posted by Github.</p>\n<p>If you are concerned about the possibility of a webhook being missed, you might want to consider using the Backstage <a href=\"https://github.com/backstage/backstage/tree/master/packages/backend-tasks\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">task scheduler</a> to run <code class=\"language-text\">read()</code> once a day for that extra assurance.</p>\n<h2 id=\"whats-next\" style=\"position:relative;\"><a href=\"#whats-next\" aria-label=\"whats next permalink\" class=\"anchor before\"><svg aria-hidden=\"true\" focusable=\"false\" height=\"16\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"16\"><path fill-rule=\"evenodd\" d=\"M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z\"></path></svg></a>What's next?</h2>\n<p>In this tutorial we quickly went over the 
steps of adding an entity provider to your catalog and using smee to proxy webhook events to your local environment, but this is just the beginning!</p>\n<p>If you look at the implementation of <a href=\"https://github.com/backstage/backstage/blob/master/plugins/catalog-backend-module-github/src/GitHubOrgEntityProvider.ts#L106-L147\" target=\"_blank\" rel=\"nofollow noopener noreferrer\"><code class=\"language-text\">GitHubOrgEntityProvider</code></a>, the <code class=\"language-text\">read()</code> function queries data directly from Github and runs a <code class=\"language-text\">full</code> mutation. In its current state, depending on the size of your organization, your <code class=\"language-text\">read()</code> function might end up triggering way too frequently - resulting in too many Github requests and performing a full mutation of your database each time.</p>\n<p>When you create your own custom entity provider, you will want to create a function that applies a <code class=\"language-text\">delta</code> mutation just from the data received from the webhook events. You can read more about the mutation types <a href=\"https://backstage.io/docs/features/software-catalog/external-integrations#provider-mutations\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">here</a>.</p>","frontmatter":{"date":"May 03, 2022","description":"In this short tutorial, Min will show you how to configure Github Webhooks for Backstage's Github Entity Provider","tags":["backstage"],"img":{"childImageSharp":{"fixed":{"src":"/static/089fea5cc9ab03dd1375a107244fcc3e/31987/2022-Github-with-Backstage.png"}}}}}}},"pageContext":{"id":"a0a44e3c-99e5-5cd1-95a4-832596e08207"}},
    "staticQueryHashes": ["1241260443"]}