Posted by ajtowns
Sep 5, 2023/18:15 UTC
Scraping appears to be the recommended way to preserve forum content, and several resources and discussions in the Discourse community support it. One guide covers archiving an old forum when starting a new Discourse forum and favours producing a static HTML archive of the content. Other discussions and tools describe how to build such archives in practice.
Two endpoints make scraping easier. The /raw/ endpoint returns the raw content of posts directly, and an API endpoint returns posts as JSON, which allows data to be extracted systematically.
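As a rough illustration of the two endpoints mentioned above, here is a minimal Python sketch using only the standard library. Several details are assumptions rather than anything stated in the post: the `https://` scheme, the reading of `87` as a topic ID, the `latest_posts` key in the JSON response, and all function names (`raw_url`, `http_get`, `fetch_raw`, `fetch_latest_posts`), which are hypothetical helpers. Check the actual responses before relying on it.

```python
from __future__ import annotations

import json
from typing import Callable
from urllib.request import urlopen

# Assumption: the forum is served over HTTPS at this host.
BASE_URL = "https://delvingbitcoin.org"


def raw_url(topic_id: int, base_url: str = BASE_URL) -> str:
    """Build the /raw/ URL for an ID (assumed here to be a topic ID)."""
    return f"{base_url}/raw/{topic_id}"


def http_get(url: str, timeout: float = 30.0) -> str:
    """Fetch a URL and return the body decoded as UTF-8 text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def fetch_raw(topic_id: int, get: Callable[[str], str] = http_get) -> str:
    """Return the raw text served by the /raw/ endpoint."""
    return get(raw_url(topic_id))


def fetch_latest_posts(get: Callable[[str], str] = http_get) -> list[dict]:
    """Return the posts listed by /posts.json.

    Assumption: the response is a JSON object whose "latest_posts" key
    holds a list of post objects; an empty list is returned otherwise.
    """
    data = json.loads(get(f"{BASE_URL}/posts.json"))
    return data.get("latest_posts", [])
```

Calling `fetch_raw(87)` would request the example URL from the post, and `fetch_latest_posts()` would request `posts.json`. The `get` parameter exists so the fetcher can be swapped for a cached or rate-limited one, which is advisable when scraping a live forum.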
The Discourse community has also discussed providing public data dumps. That discussion treats data export as a priority and recognises uses for forum content beyond its original context, such as AI training.
Taken together, these resources describe a structured, community-endorsed approach to preserving content by scraping. The references are: Archive an old forum, Improving Discourse Static HTML Archive, A Basic Discourse Archival Tool, Exporting the complete forum as static HTML pages, the /raw/ endpoint (delvingbitcoin.org/raw/87), the API endpoint (delvingbitcoin.org/posts.json), and the discussion on Discourse Public Data Dump.