Why didn't you grill that guest more?

I got an email from a listener (whom I will refer to only by his first name, Alex) who had some feedback on my recent episode with Andy Martin from Zillow. The email was long but I can paraphrase the core question in this snippet:

...some of his answers were vague and I wish you had interrogated him more on how the model works. It would have been much more interesting to hear about the feature importance of their models and getting into the nitty-gritty of making a good prediction.

Andy and I talked quite a bit about Zillow's Zestimate service, and Alex apparently felt that I had asked only surface-level questions about the process for creating Zestimate. First off, Alex, thanks for your candid feedback. We really appreciate hearing raw opinions like this at Data Skeptic.

I think fundamentally, Alex's question is more about my approach as an interviewer than about Zillow specifically, so I'll answer the question from that perspective.

Longtime listeners to Data Skeptic might notice that I rarely interview people from high-profile companies. Yes, I've interviewed several people from Microsoft, but never anyone from Google, Facebook, LinkedIn, Cloudera, etc.

At least one of those companies seems to keep a tight leash on letting employees talk to the press, which is part of the explanation. But more importantly, my goal with Data Skeptic is always to bring you content that is informative, original, and interesting. I don't want to cover the same topics that other media outlets already cover and I will never let the program be a platform for a PR person to repeat their scripted talking points. Several listeners have observed that I rarely interview executives. My goal is always to get the message directly from the people doing the interesting work. When communicating with large companies, that can sometimes be a challenge.

Setting things up with Zillow was relatively easy. They were professional and accommodating at every step. There was one point in our interview process for which they gave the response "we can't talk about that particular detail", but in my own assessment, our interview is no weaker for leaving that singular detail out of the discussion. As an aside, some readers might be interested to know that there has only been one time in the history of Data Skeptic when I canceled an interview mid-stream. It was with a mid-sized company where a PR person was listening in and continuously interjecting about what could and could not be talked about. There came a point when I determined that the restrictions on what we could discuss made the interview no longer worth airing. Nothing even remotely like that happened in my chat with Andy.

If there were deeply informative questions that I failed to ask him, it was due to the limits of my own creativity and not any restriction Zillow wanted to impose. My overall assessment is that they're a pretty open company. Still, I'm trying to understand more about Alex's perspective.

Admittedly, I did not ask "what features go into your model and what are the exact weights your algorithm found for each feature?" But algorithms are only part of the equation. Having the right data and the best pre-processing steps is critical and non-trivial. Even if Zillow had been willing to disclose deep parametric details about their model for Zestimate on the air, what good would it have done anyone?

Knowing some fine tuning detail of a model is a long way off from leveraging that information in a competitive way. The models Zillow is able to produce which result in their Zestimate are (presumably) the product of a large data collection phase, a data filtering / cleaning phase, a modeling phase, and ultimately a use of that model to extrapolate. There are probably 5 non-trivial steps in that process that they've become domain experts in. Saying "our coefficient for having a pool in Dallas is 1.05" is about as useful as me telling you that my PayPal account's password has 3 times as many symbols as letters.
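To make that point concrete, here is a toy sketch of my own. It has nothing to do with how Zillow actually builds Zestimate; the three "pipelines" and the $300,000 base price are invented purely for illustration. It shows the same leaked coefficient of 1.05 producing very different predictions depending on pre-processing choices that the number alone doesn't reveal.

```python
import math

# The hypothetical "leaked" coefficient from the example above.
POOL_COEFFICIENT = 1.05


def predict_additive(base_price, has_pool):
    """Assumed pipeline A: the coefficient is in units of $10,000 added to the price."""
    return base_price + POOL_COEFFICIENT * 10_000 * (1 if has_pool else 0)


def predict_multiplicative(base_price, has_pool):
    """Assumed pipeline B: the coefficient is a multiplier applied to the price."""
    return base_price * (POOL_COEFFICIENT if has_pool else 1.0)


def predict_log_linear(base_price, has_pool):
    """Assumed pipeline C: the model was fit on log(price), so the coefficient adds in log space."""
    return math.exp(math.log(base_price) + POOL_COEFFICIENT * (1 if has_pool else 0))


if __name__ == "__main__":
    base = 300_000  # made-up price of a house without a pool
    for predict in (predict_additive, predict_multiplicative, predict_log_linear):
        print(f"{predict.__name__}: {predict(base, True):,.0f}")
```

All three functions agree on a house without a pool, yet they disagree widely once the pool is added. Without knowing the target transformation, the units, and the rest of the feature set, the coefficient by itself tells you very little.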

My objective in all my interviews is to bring out details that are interesting to the use case at hand and, more importantly, that may inspire at least one listener in how they tackle a problem they're working on. So while some listeners may feel that this or other interviews fail to press for details that others might consider proprietary, I actually look at this in a totally different way...

If your business is so fragile that revealing the value of some coefficient on a podcast is a credible threat to your market share, then your business is already hopeless. What is interesting to me personally is the process by which people tackle problems and leverage data. Their precise calculations don't seem to add all that much value, and are frequently perceived (rightly or wrongly) as proprietary data / trade secrets.

Alex did end his email with a few kind words, leading me to believe he was overall happy with the episode. I certainly hope so, because I really enjoyed my conversation with Andy and I think it's a noteworthy episode. But if listeners tune in thinking that the next episode is going to reveal the secret optimal coefficient of a model that companies don't want you to know... well, maybe Data Skeptic isn't the right source for that.

I'm appreciative of Andy and Zillow as a company that they were willing to come on Data Skeptic and share so many details about what Zillow does. Did our discussion reveal any trade secrets for the casual real estate investor to exploit? Maybe not. But it also seems unlikely one little podcast interview could undermine what Zillow has created anyway. My questions were aimed at revealing the most novel and useful details about what their data science team works on, and I hope most listeners see it the same way.

But thank you, Alex, for calling this out. I do appreciate feedback of all kinds. If you still think I failed to grill Andy on some particular nuance that he should have revealed about Zestimate, I'd like to claim that the burden of proof is now on you. What detail should I have asked about and what should he have revealed? In all sincerity, I am open to your or any other commenter's thoughts on this. My personal perspective is that trade secrets tend to get more attention than necessary. Stealing the formula for the exact blend of herbs and spices to make Kentucky Fried Chicken is a long way off from launching the supply chains necessary to build a vast fast food empire.