RE: [Tool/Web] Get specific part by id out of LDraw with all needed files included
2022-07-13, 16:37
So I did a kind of Python crash course and tried to write a small crawler for the unofficial parts.
It looks pretty good so far and does what I intended it to do. Given my prior Python experience of zero, I consider this a major milestone :-)
You pass it the URL of a part in the unofficial library and it compiles a list of the subparts that have to be downloaded so that the part "works". You can then (in the future, once I have the download and zipping working) pass the package to e.g. Studio.
It looks at the "Required (unofficial) subfiles" section of the part's page and recursively walks through the entries listed there.
Code:
import requests
import time
from bs4 import BeautifulSoup
from urllib.parse import urljoin

class CrawledPart():
    def __init__(self, Part, PartLink, DATLink):
        self.Part = Part
        self.PartLink = PartLink
        self.DATLink = DATLink

class PartFetcher():
    def fetch(self, partno):
        # partno is the full URL of the part's detail page
        # test parts: u9247 (lots of data), u9576 (no subfiles)
        url = "https://www.ldraw.org/cgi-bin/ptdetail.cgi?f=parts/"
        liburl = "https://www.ldraw.org/library/unofficial/"
        time.sleep(0.5)  # be gentle with the server
        print("Part to fetch: " + partno)
        r = requests.get(partno)
        # cut the page off at the "Related" marker -- we only need the
        # "Required (unofficial) subfiles" section above it
        doc = BeautifulSoup(r.text.split("Related")[0], "html.parser")
        # class ".list" contains the list of the required sub-parts
        link = doc.select(".list")
        if len(link) > 0:
            for subpart in link[0].select(".header"):
                Part = subpart.attrs["href"]
                PartLink = urljoin(url, subpart.attrs["href"])
                DATLink = urljoin(liburl, subpart.attrs["href"].split("=")[1])
                crawled = CrawledPart(Part, PartLink, DATLink)
                crawledparts.append(crawled)  # module-level list, set up in the call below
                print("Subpart: " + PartLink)
                subparts = PartFetcher()
                subparts.fetch(PartLink)  # recurse into the subpart's own page
        return crawledparts
And the call:
Code:
subparts = PartFetcher()
crawledparts = []
parturl = "https://www.ldraw.org/cgi-bin/ptdetail.cgi?f=parts/71603.dat"
for item in subparts.fetch(parturl):
    print(item.DATLink)
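One caveat with the recursive walk: if the same subfile is required by several parent parts, it gets fetched and added to the list once per parent. A small guard could skip pages that were already crawled; a sketch only (the visited set is my addition, the rest of fetch() stays exactly as in the listing above):
Code:
visited = set()  # module-level, next to crawledparts

class PartFetcher():
    def fetch(self, partno):
        if partno in visited:  # this page was crawled already
            return crawledparts
        visited.add(partno)
        # ... rest of fetch() unchanged from the listing above ...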
What remains:
- Include the root part's link in the list as well
- Actually download the single files
- Pack them into a ZIP with the correct folder structure (a rough sketch for these last two steps follows below)
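For the last two points, something along these lines should do. This is a rough sketch, assuming the crawledparts list from the call above and that the path after /library/unofficial/ in each DATLink (e.g. parts/... or p/...) is exactly the folder structure the ZIP needs; download_and_pack is just a name I made up:
Code:
import zipfile
import requests

def download_and_pack(crawledparts, zipname="part_package.zip"):
    # fetch every crawled DAT file and store it in the ZIP under its
    # library-relative path (parts/..., parts/s/..., p/..., ...)
    with zipfile.ZipFile(zipname, "w", zipfile.ZIP_DEFLATED) as zf:
        for item in crawledparts:
            r = requests.get(item.DATLink)
            r.raise_for_status()
            # e.g. .../library/unofficial/parts/s/123s01.dat -> parts/s/123s01.dat
            arcname = item.DATLink.split("/library/unofficial/")[1]
            zf.writestr(arcname, r.text)
Writing straight from the response into the ZIP avoids temp files on disk, and r.text keeps the DAT files as plain text.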