PROJECT: Auction Results in the Search Results – How would I go about this?

Whilst I am overwhelmed by the sheer level of technical knowledge required for my dream project, a level far beyond what I am (currently) capable of, I’ve decided to step back, relax and just think it all through. By breaking it into tiny projects I might, slowly, slowly, make progress and iterate on the idea as needed.

Where to start?

1) Define the product: something that shows auction results directly in the search results, as a dedicated Google feature along the lines of the recipe boxes, map results etc. The product should include images, as these are critical for an art dealer.

2) Research a list of auction houses, start nationally.

3) Build a search engine crawler that will visit the websites of these auction houses, scrape the results and present them in the as-yet-non-existent search product or, in the short term, on a website.

4) Display the data in a uniform, engaging way. There is Artnet already, but their info is behind a login and I don’t think the UI is particularly effective, easy or engaging. I haven’t even tried to look things up on a mobile…

Does this all sound wonderfully simple? Well, it won’t be, because I’ll have to:

a) work out how to build a search engine crawler. Looks like I’ll be learning Python after all…

b) work out how to collect and present what will be a mess of data. At this point, I have no idea how I will plough through it all algorithmically and make sure the fields populate as they are meant to. There is nothing worse than an automated product that doesn’t work.
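To make (a) and (b) a little less abstract, here is a minimal sketch of the scraping half, using only Python’s standard library. It assumes a completely hypothetical page layout (a div with class "lot" holding spans with classes "artist", "title" and "price", plus an img), and the names LotResult, LotParser and parse_lots are my own inventions. Every real auction house will lay its pages out differently, so each site would need its own version of this. The sample HTML is made up purely for illustration.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List, Optional


@dataclass
class LotResult:
    """One auction result in a uniform shape, whatever the source site looks like."""
    artist: str
    title: str
    price: str
    image_url: Optional[str] = None


class LotParser(HTMLParser):
    """Reads lots from an ASSUMED layout: <div class="lot"> containing
    <span class="artist|title|price"> and an optional <img>.
    Nested <div>s inside a lot are not handled in this sketch."""

    REQUIRED = ("artist", "title", "price")

    def __init__(self):
        super().__init__()
        self.results: List[LotResult] = []
        self._current = None  # fields of the lot currently being read
        self._field = None    # which field the next piece of text belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and "lot" in classes:
            self._current = {}
        elif self._current is not None:
            if tag == "span":
                self._field = next((c for c in classes if c in self.REQUIRED), None)
            elif tag == "img" and attrs.get("src"):
                self._current["image_url"] = attrs["src"]

    def handle_data(self, data):
        if self._current is not None and self._field:
            previous = self._current.get(self._field, "")
            self._current[self._field] = previous + data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._current is not None:
            # Keep the lot only if every required field was populated
            if all(self._current.get(f) for f in self.REQUIRED):
                self.results.append(LotResult(**self._current))
            self._current = None


def parse_lots(html: str) -> List[LotResult]:
    """Return all complete lots found in one page of HTML."""
    parser = LotParser()
    parser.feed(html)
    parser.close()
    return parser.results


# Made-up sample page: one complete lot, one with missing fields
SAMPLE_HTML = """
<div class="lot">
  <img src="/images/lot1.jpg">
  <span class="artist">Henry Moore (1898 – 1986)</span>
  <span class="title">Untitled sketch</span>
  <span class="price">£1,000</span>
</div>
<div class="lot">
  <span class="artist">Moore, Henry</span>
</div>
"""

for lot in parse_lots(SAMPLE_HTML):
    print(lot)
```

The incomplete second lot is dropped rather than shown half-empty, which is one way of avoiding fields that fail to populate. A real crawler would also need to download the pages (urllib.request can do that) and check each site’s robots.txt first (urllib.robotparser exists for exactly this).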

I believe the data lake I end up with will be my biggest challenge. How to order all of that? How to make sure all artworks are properly collected under the right artist?

For example, making sure the following are all grouped under one man:

Henry Moore

Henry Moore (1898 – 1986)

Henry Moore (British, 1898 – 1986)

circle of Henry Moore

attributed to Henry Moore

Moore, Henry

The ‘circle of’ and ‘attributed to’ are iffy, but I believe it is still important to offer these results initially, whilst giving people the opportunity to filter them out.
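As a first stab at the grouping problem, here is a small sketch that reduces each of the six variants above to one artist name plus an optional qualifier, so that ‘circle of’ and ‘attributed to’ results can be shown by default and filtered out on request. The function name parse_attribution and the Attribution tuple are my own placeholders, and the rules only cover the patterns listed above. Real catalogue entries will certainly throw up more.

```python
import re
from typing import NamedTuple, Optional

# Only the two qualifiers mentioned above; a real list would be longer
QUALIFIERS = ("circle of", "attributed to")


class Attribution(NamedTuple):
    artist: str               # cleaned-up name, e.g. "Henry Moore"
    qualifier: Optional[str]  # "circle of", "attributed to" or None


def parse_attribution(raw: str) -> Attribution:
    text = raw.strip()

    # Drop a trailing bracketed note such as "(British, 1898 – 1986)"
    text = re.sub(r"\s*\([^)]*\)\s*$", "", text)

    # Peel off a leading qualifier, remembering which one it was
    qualifier = None
    lowered = text.lower()
    for q in QUALIFIERS:
        if lowered.startswith(q + " "):
            qualifier = q
            text = text[len(q):].strip()
            break

    # Turn "Moore, Henry" into "Henry Moore"
    if "," in text:
        surname, _, forenames = text.partition(",")
        text = f"{forenames.strip()} {surname.strip()}"

    # Collapse any doubled-up whitespace
    return Attribution(" ".join(text.split()), qualifier)


variants = [
    "Henry Moore",
    "Henry Moore (1898 – 1986)",
    "Henry Moore (British, 1898 – 1986)",
    "circle of Henry Moore",
    "attributed to Henry Moore",
    "Moore, Henry",
]

for variant in variants:
    print(repr(variant), "->", parse_attribution(variant))

# Filtering out the iffy ones is then a one-liner
firm_only = [v for v in variants if parse_attribution(v).qualifier is None]
```

All six variants come out with the artist "Henry Moore", and firm_only keeps the four entries without a qualifier. Note that this sketch throws away the dates and nationality in the brackets; they would be worth keeping in practice, if only to tell apart two artists who share a name.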

At present, spelling mistakes or formatting errors mean results get lost. I also know that you can ‘hack’ the results by asking the auction houses not to submit their results to the existing product.
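For the spelling-mistake side of this, one option is fuzzy matching against a list of known artist names. The sketch below uses difflib from Python’s standard library. The function name match_artist, the 0.85 cutoff and the tiny known_artists list are all my own assumptions for illustration, not anything an existing product does, and the cutoff in particular would need tuning against real data.

```python
from difflib import get_close_matches
from typing import List, Optional


def match_artist(name: str, known: List[str], cutoff: float = 0.85) -> Optional[str]:
    """Return the closest known artist name, or None if nothing is close enough.

    The comparison ignores case; `cutoff` is a similarity score between 0 and 1.
    """
    lookup = {k.lower(): k for k in known}
    hits = get_close_matches(name.strip().lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


# Illustrative list only; a real one would come from the collected data
known_artists = ["Henry Moore", "Barbara Hepworth"]

print(match_artist("Henry Moor", known_artists))     # misspelling still finds "Henry Moore"
print(match_artist("Pablo Picasso", known_artists))  # not in the list, so None
```

A high cutoff errs on the side of leaving a result unmatched rather than filing it under the wrong artist, which seems the safer failure for something an art dealer would rely on.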

Work needed.

