Implementing Dynamic Rendering for Search Engines on pixiv.net

mu-ko
2019.9.19

Japanese version: pixivで検索エンジン向けにDynamic Renderingを実装した話 - pixiv inside

Hello, I am mu-ko, a product manager at pixiv.

This year, we implemented Dynamic Rendering for search engines such as Google on our largest service, pixiv.net. In this article, I will cover the background behind that decision, how we verified the implementation, and points to watch out for.

What is pixiv

pixiv is a service for creators to post illustrations, manga, and novels.

As of September 2019, pixiv has over 40 million registered users and 85 million posted works. About 60% of traffic comes from Japan, with the rest coming from overseas.

In 2018, to make it easier for users outside Japan to find artworks, we added a feature that lets users suggest translations for Japanese tags, so that even users who can't read Japanese can find artworks on pixiv.

I want to know how my translation will be used – pixiv Help Center

Number of pages and crawls

  • About 85 million work pages
  • etc

pixiv in numbers (pixivの数字)

  • Googlebot crawls 1 to 2 million pages / day [1]

  • Bingbot crawls 8 to 10 million pages / day

The background behind the implementation of Dynamic Rendering

pixiv has been gradually transitioning to a single-page application (SPA) since 2018. However, the redesign was visible only to logged-in users, because we had concluded that turning pixiv's logged-out pages into an SPA might eventually reduce traffic from Google Search and similar search engines.

If search traffic decreases, discovering new artworks will become harder.

However, we wanted to release the new (SPA) design to logged-out users as well. Since the SPA is rendered on the client side, our options for keeping those pages crawlable were Server Side Rendering (SSR) or Dynamic Rendering. Implementing SSR was cost-prohibitive, so we decided to implement Dynamic Rendering instead.

To implement Dynamic Rendering we used a customized Rendertron.

Rendertron is a headless Chrome rendering solution designed to render & serialise web pages on the fly.

GitHub - GoogleChrome/rendertron: A Headless Chrome rendering solution
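As a rough illustration of the routing idea behind Dynamic Rendering, the sketch below sends requests from known crawlers to a Rendertron instance through its `/render/<url>` endpoint and leaves everyone else on the normal SPA. It is not pixiv's actual setup: the host name, the list of bot tokens, and the helper names are all assumptions made for the example.

```python
from urllib.parse import quote

# Hypothetical values: neither the host nor the bot list comes from the article.
RENDERTRON_BASE = "http://rendertron.internal:3000"
BOT_UA_TOKENS = ("googlebot", "bingbot")


def is_search_bot(user_agent: str) -> bool:
    """Return True when the User-Agent contains a known crawler token."""
    ua = user_agent.lower()
    return any(token in ua for token in BOT_UA_TOKENS)


def rendertron_url(page_url: str, mobile: bool = False) -> str:
    """Build a Rendertron /render/<url> request for the given page.

    The page URL is percent-encoded so that its own query string is not
    mistaken for Rendertron's parameters.
    """
    url = f"{RENDERTRON_BASE}/render/{quote(page_url, safe=':/')}"
    return url + "?mobile" if mobile else url


def choose_upstream(user_agent: str, page_url: str) -> str:
    """Crawlers get the pre-rendered page; everyone else gets the SPA as-is."""
    if is_search_bot(user_agent):
        wants_mobile = "mobile" in user_agent.lower()
        return rendertron_url(page_url, mobile=wants_mobile)
    return page_url
```

In a real deployment this decision would typically live in a reverse proxy or application middleware, and matching on the User-Agent alone means anyone can request the rendered version by spoofing the header.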

Why Dynamic Rendering?

  • Allow the bots to index the pages faster

Googlebot can render and index Client Side Rendered (CSR) pages, but rendering is slow and pages are likely to sit in the Render Queue for some time, so CSR SPA pages take longer to be indexed.

Understand the JavaScript SEO basics | Search | Google Developers

  • Development costs were lower than SSR

Since the app was not originally built with SSR in mind, rebuilding it would take considerable effort and time that we could instead spend on other development.

  • It was confirmed that there is no problem using Dynamic Rendering for Bingbot

More information is available on Bing blogs.

Disadvantages of Dynamic Rendering

  • The response can only be “good enough” for bots, unlike SSR.
  • Can’t be used to render the pages to end users.
  • To check the Dynamically Rendered pages we had to implement a custom workflow when monitoring the renderer servers.

Examining the Dynamic Rendering implementation

When implementing Dynamic Rendering, we measured the index rate and the time it took for pages to be indexed.

We also measured how long indexing took without Dynamic Rendering, when pages were subject to the Render Queue.

We examined the following:

  1. Does pre-rendering using Rendertron affect the content of the page?
  2. Does the page pass the mobile-friendly test?
  3. Is the structured data on the page correctly recognized by Google’s Structured Data Testing Tool?
  4. How are pages crawled by URL inspection tools?

------- Examinations after passing the first 4 steps -------

  5. Is there a change in the crawl volume in the access logs?
  6. Is there an effect on the index?

1. Does pre-rendering using Rendertron affect the content of the page?

We checked the HTML output from Rendertron and confirmed that some things were missing from it:

  • CSS on React pages - we are using Styled Components and the implementation does not render the styles inside <style> tags by default
  • Lazy-loaded content - the default viewport for Rendertron was too small for our use case
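One way to catch the missing-CSS problem automatically is to look for `<style>` elements that come back empty in the serialized HTML, which is what happens when rules are injected through the CSSOM rather than written as text. The following is a minimal sketch of such a check; the class and function names are made up for this example and it is not part of Rendertron or of pixiv's tooling.

```python
from html.parser import HTMLParser
from typing import List


class StyleAudit(HTMLParser):
    """Record how many non-whitespace characters each <style> element holds."""

    def __init__(self) -> None:
        super().__init__()
        self._in_style = False
        self.style_lengths: List[int] = []

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self._in_style = True
            self.style_lengths.append(0)

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        if self._in_style:
            self.style_lengths[-1] += len(data.strip())


def empty_style_count(html: str) -> int:
    """Count <style> elements whose text content is empty."""
    audit = StyleAudit()
    audit.feed(html)
    audit.close()
    return sum(1 for length in audit.style_lengths if length == 0)
```

Running `empty_style_count` on the HTML returned by the renderer and alerting when the result is non-zero gives an early warning that crawlers may be seeing unstyled pages.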

2. Does the page pass the mobile-friendly test?

We confirmed with the Mobile-Friendly Test tool that the page passes. Rendertron allows us to request the mobile version of the website for Googlebot.

3. Is the structured data on the page correctly recognized by Google’s Structured Data Testing Tool?

We tested this by enabling Dynamic Rendering for the UA of Structured Data Testing Tool.

4. How are pages crawled by URL inspection tools?

We implemented Dynamic Rendering on a small number of pages and checked how they were recognized in Google Search Console.

The result of a crawl can be checked under "VIEW CRAWLED PAGE". We compared this HTML with the client-side rendered page and made sure no content was missing.
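The comparison between the crawled HTML and the client-side rendered page can be partly automated. The sketch below extracts the visible words from two HTML strings and reports the words that appear in the reference page but not in the crawled one. It is only a coarse check under simple assumptions (whitespace-separated words, `<script>` and `<style>` ignored), and the names are invented for the example.

```python
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Collect whitespace-separated words outside <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.words: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words.extend(data.split())


def visible_words(html: str) -> List[str]:
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.words


def missing_words(reference_html: str, crawled_html: str) -> List[str]:
    """Words present in the reference page but absent from the crawled page."""
    missing = set(visible_words(reference_html)) - set(visible_words(crawled_html))
    return sorted(missing)
```

An empty result does not prove the two pages are equivalent (word order and repetition are ignored), but a non-empty one is a quick signal that lazy-loaded or late-rendered content did not make it into the crawled HTML.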

5. Is there a change in the crawl volume in the access logs?

By scanning our access logs for Googlebot's UA, we investigated whether the amount of crawling on SPA pages changed.

By doing this research, we also found out how long it takes for a newly published page to be crawled.
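A log scan of this kind might look like the sketch below, which counts requests per day whose User-Agent contains "Googlebot". The article does not say what log format pixiv uses, so the regular expression assumes the common "combined" access log layout, and the function name is hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Assumes the "combined" access log format; adjust for the real log layout.
LOG_LINE = re.compile(
    r'\[(?P<day>[^:\]]+):[^\]]*\] '           # [19/Sep/2019:10:00:00 +0900]
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '  # "GET /path HTTP/1.1"
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'      # status, size, referer, UA
)


def googlebot_hits_per_day(lines: Iterable[str]) -> Dict[str, int]:
    """Count requests per day whose User-Agent mentions Googlebot."""
    hits: Counter = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("ua"):
            hits[match.group("day")] += 1
    return dict(hits)
```

Because the match is on the reported User-Agent only, spoofed requests are counted too, and the Mediapartners-Google crawler is excluded simply because its User-Agent does not contain "Googlebot". Filtering on `path` as well would narrow the count to SPA pages.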

6. Is there an effect on the index?

We investigated whether there is a difference in the index rate between pages with and without Dynamic Rendering.

We performed more than 20 tests in total and checked the effect of various adjustments.

To summarize the results, the final index rate of newly published SPA pages can be 10-15% lower when Dynamic Rendering is not implemented.

In addition, even when the final index rate reached the same level, new pages took more than 3 days longer to be indexed.

In some tests, SPA pages were indexed smoothly without Dynamic Rendering. Even so, without Dynamic Rendering, indexing is delayed because pages get stuck in the Render Queue.
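For readers who want to run a similar comparison, the two metrics used here (index rate and time to index) can be computed from a list of publish dates and first-indexed dates. The sketch below shows one way to do that; the data structure and function names are assumptions for illustration, and how the indexed date is obtained (for example, from Search Console) is left out.

```python
from datetime import date
from statistics import median
from typing import List, Optional, Tuple

# (published_on, first_indexed_on); None means the page was never indexed.
Page = Tuple[date, Optional[date]]


def index_rate(pages: List[Page]) -> float:
    """Fraction of pages that ended up in the index."""
    if not pages:
        return 0.0
    indexed = sum(1 for _, indexed_on in pages if indexed_on is not None)
    return indexed / len(pages)


def median_days_to_index(pages: List[Page]) -> Optional[float]:
    """Median number of days between publication and indexing."""
    delays = [
        (indexed_on - published_on).days
        for published_on, indexed_on in pages
        if indexed_on is not None
    ]
    return median(delays) if delays else None
```

Computing both values for a group of pages served with Dynamic Rendering and a group served without it gives the kind of side-by-side comparison described in this section.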

Important notes when implementing Dynamic Rendering

Make sure the content does not change

If the amount of content decreases, it may affect the index rate and how the page is evaluated. We therefore recommend performing the verification above to confirm that no content has been lost.

Pay attention to response time

We need to be careful about the time it takes for the Rendertron server to handle Googlebot requests. Rendertron's default timeout is 10 seconds. We lowered it to 3 seconds and optimized the SPA accordingly, cutting the time spent waiting on unnecessary loading and thereby increasing the number of requests that can be processed at the same time.
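To see why the timeout matters for capacity, a back-of-the-envelope estimate helps: the number of renders in flight is roughly the request rate multiplied by the time each render takes. The sketch below applies that to the upper end of the Googlebot figure quoted earlier (2 million pages per day), under the simplifying and pessimistic assumptions that every crawl goes through the renderer, that requests arrive evenly, and that each render runs all the way to the timeout.

```python
import math

SECONDS_PER_DAY = 86_400


def concurrent_renders_needed(pages_per_day: int, seconds_per_render: float) -> int:
    """Renders in flight at once: arrival rate times time per render."""
    requests_per_second = pages_per_day / SECONDS_PER_DAY
    return math.ceil(requests_per_second * seconds_per_render)


# Worst case with a 10-second timeout versus a 3-second timeout.
print(concurrent_renders_needed(2_000_000, 10))  # 232
print(concurrent_renders_needed(2_000_000, 3))   # 70
```

Real crawl traffic is bursty and most renders finish well before the timeout, so these are illustrative upper bounds rather than sizing guidance, but they show how a shorter timeout reduces the worst-case load on the renderer servers.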

Rendertron can cache rendered pages, but pixiv did not use that feature: our page contents update frequently, so cache management would be extremely complicated.

Finally

pixiv has decided to make its frontend SPA-based in order to improve development efficiency and user convenience over the next 10 years. On top of that, we implemented Dynamic Rendering so that more people can reach the submitted artworks.

I hope this article will help those who are struggling with SEO on SPA-based sites.

If you have more questions, you can find me at:


  1. This is the number of requests by crawlers whose User-Agent is reported as Googlebot. The Mediapartners-Google crawler is not included.
    Google crawlers (user agents) - Search Console Help 
