In San Francisco (CA), we meet Lukas Biewald, founder and CEO of CrowdFlower. Lukas tells the story of how he came up with the idea and founded CrowdFlower, explains how the current business model works, and offers some advice for young entrepreneurs.


Martin: Today we are in San Francisco in the CrowdFlower office. Hi, Lukas. Who are you and what do you do?

Lukas: I am the founder and CEO of CrowdFlower. We help data scientists enrich their data. We make it easy to turn your massive data into clean, enriched, complete data. That is useful for data scientists because they have to analyze that data or build models.

Martin: How did you come up with that idea of CrowdFlower?

Lukas: That is simple. I was working as a data scientist. I always felt that the most important part of the process was collecting and cleaning the data. In a way, it was my least favorite part of the job, because what I really wanted to do was run analyses and build good models. I got interested in building tools to help clean up data. I found that they were very useful, and then I thought this could be useful for other people, so I built the company around helping people clean up their data.

Martin: Great! Can you walk us through the process once you started the company, maybe the first three months or so? What was it like starting the company?

Lukas: It was hard. It was a different time. We are a little older than a lot of other startups. I remember that with Y Combinator, it wasn’t so clear yet that it was such an important thing. There were a lot fewer resources for tech entrepreneurs. AngelList as we know it didn’t exist; it was literally an email list. They would email your company out and people would decide to invest or not.

It is really somewhat shocking when you go from having a job to starting a company because you have no infrastructure around you to help you. I remember we closed our first deal and we needed to receive a fax, print it and then send the fax back, and so the customer was asking: “Hey, what is your fax number?” I remember we got on our bikes and went to Best Buy, which is a store in America. We literally got a big box. My co-founder and I carried it back, plugged it in and said: “Okay, you can send us the fax now.” There is no infrastructure and you are not getting a paycheck, which is scary.

I think my parents were concerned that maybe I was unemployed. It was super hard. I think it was a lot harder than I was expecting. I was used to building products and having everything else around me taken care of, and I think I didn’t realize how much work goes into it: you are doing finances, marketing, sales, all those things.

Martin: How long did it take you to get the first financing and the first customer?

Lukas: The first customer happened early because we needed money. We sold the product long before it was ready. I think if we had had more access to capital we might have waited longer. It took us eighteen months before we raised any financing. Again, back then raising seed rounds was hard. There was a lot less interest in doing seed investments. I remember people would laugh at me: “You have no business plan, you do not have enough customers. There is no way we’re going to invest in you.” I think times have changed a lot.


Martin: Let’s talk about the business model of CrowdFlower. Did this business model change over time?

Lukas: Yes, it changed a lot over time. The way CrowdFlower works is you set up the data cleanup project that you want. We use a workforce to go into the jobs and clean the data up. Let’s say I am a data scientist at eBay and I want to know if a search result is good or bad. It is something that data scientists at eBay are interested in, because if you search for an iPhone and you get a car with an iPhone adapter in it as a result, that’s a really bad result and you don’t buy anything. At eBay, they basically write down: “Here are my rules, here’s what it means for a result to be relevant, here’s what it means for a result to be not relevant, and then here’s a big list of search queries and search results. And I want the crowd to tell me which ones are relevant and not relevant.” These are set up in our software, then the crowd does some of the labeling, and then we use machine learning to do even more labeling.

In the early days of the company, we operated as a managed service. You would tell us what you wanted and what your requirements were, and then we would operate the software to get the results for you. We charged per result and we priced differently based on the type of application. We then switched models: now, if you are at eBay and you want these results, you go into our platform, you set everything up and the platform does all the management. We’ve gone from a managed service model, where we priced per task and took a fixed percentage, to one where you pay for a platform license and can then use our platform as much as you need to. We operate as a SaaS software company now. That is our business model, as opposed to a managed service.

Martin: The example of eBay sounds to me more like you wanted to get this feedback cycle from customers for calibrating the model. In the beginning, you said that you help data scientists to clean their data, which is different – can you give us an example of this as well?

Lukas: Clean and collect. That is why we say ‘enrich’. I think it is the best word we could think of; it means clean and collect. Cleaning often involves business data. We work with Autodesk data scientists, for example. They have a huge list of customers and they want to do analytics on their customer base, but many times they have duplicate customers in there. It can even get complicated: for example, are YouTube and Google the same company? It depends on what you’re trying to accomplish. They send us a big list of customers and then we clean up the records. We say these two records are the same, or we might say this record is mis-categorized, this company isn’t a tech company, it is more of a media company, or we might say this address looks like it is wrong, this phone number looks like it is wrong. That type of thing.

Martin: When we are talking about enrichment, you need to get that enrichment right. How do you acquire the people who do the work, helping data scientists and enriching the data?

Lukas: We post a task online and in some cases we pay people directly to do the tasks and in other cases, we have partnered with companies that have big workforces. We’ve made deals with companies around the world where we can post tasks on their website and pay people for doing jobs.

Martin: How do you control the quality of the work? For example, I am a data scientist and I have tons of data. I need to be sure that the quality I get from you is high, because otherwise the analyses will be wrong.

Lukas: Our software does it in many different ways. One simple way, simple but effective, is that our customers can hide questions in the job where they already know the answer, so they do some of the labeling themselves. They could say: “Okay, this business and this business are the same.” If somebody gets that wrong, then they didn’t understand my instructions. That is the simplest way to check the labels. We use that like a test when people come in.

We also ask different people the same question and we expect them to agree, and if they don’t, then we worry about that. We also have people look at other people’s results and say if they’re good. Our software platform manages all that. Our customers go in and write down what they want, and then our platform takes care of controlling the quality and building the different types of tasks that you need to make sure that the results are good.
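The two checks Lukas describes here, hidden test questions with known answers and agreement between several people on the same question, can be illustrated with a minimal Python sketch. This is not CrowdFlower’s implementation; the function names, the 0.7 accuracy cutoff and the sample data are all assumptions made up for illustration.

```python
from collections import Counter


def worker_accuracy(answers, gold):
    """Share of hidden test questions this worker answered correctly."""
    scored = [q for q in answers if q in gold]
    if not scored:
        return None  # the worker has not seen any test question yet
    correct = sum(answers[q] == gold[q] for q in scored)
    return correct / len(scored)


def aggregate(judgments, gold, min_accuracy=0.7):
    """Majority vote per question, counting only workers who pass the test questions.

    min_accuracy is a hypothetical cutoff, not a value from the interview.
    Returns {question: (winning_label, agreement_share)}.
    """
    trusted = []
    for worker, answers in judgments.items():
        accuracy = worker_accuracy(answers, gold)
        if accuracy is not None and accuracy >= min_accuracy:
            trusted.append(worker)

    votes = {}
    for worker in trusted:
        for question, label in judgments[worker].items():
            if question not in gold:  # test questions are not part of the output
                votes.setdefault(question, Counter())[label] += 1

    results = {}
    for question, counter in votes.items():
        label, count = counter.most_common(1)[0]
        results[question] = (label, count / sum(counter.values()))
    return results


# Made-up sample data: g1 and g2 are hidden test questions, q1 and q2 are real work.
gold = {"g1": "same", "g2": "different"}
judgments = {
    "ann": {"g1": "same", "g2": "different", "q1": "same", "q2": "different"},
    "bob": {"g1": "same", "g2": "different", "q1": "same", "q2": "same"},
    "cat": {"g1": "different", "g2": "same", "q1": "different", "q2": "same"},
    "dan": {"g1": "same", "g2": "different", "q1": "same", "q2": "different"},
}
results = aggregate(judgments, gold)
```

In this sample, the worker who fails both test questions is ignored, and a question where the remaining workers disagree comes back with a lower agreement share, which is the signal to "worry about".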

Martin: As I understand, those people are only enriching the data and not modeling features or something like that?

Lukas: Just enriching the data. Another thing that our software does is watch the people label data, and it actually builds an algorithm that predicts what label people are going to choose. In some cases, our machine learning software can figure out what the right label should be without even going to a human being.

Martin: The people who are enriching the data, are they doing this full time or is it just a hobby for them?

Lukas: Typically, part time.

Martin: This means that whenever I have a bigger job to solve, I can hire a flexible workforce via CrowdFlower?

Lukas: Exactly.

Martin: How did you acquire the first customers?

Lukas: It was hard. We saw the market need and we asked all our friends: “Hey, do you know anyone that needs this kind of data cleanup?” Luckily, we were able to get to some people who were willing to try us, and we took good care of the early customers. Some of them are still with us. LinkedIn was very, very early. I remember this guy, DJ Patil; he did our first deal and he has become a very famous data scientist. He signed the first contract.

In fact, one of our very early customers became an angel investor, actually two. It is interesting, one of our early customers, Gary Kremen, was the founder of and he used us and invested. Then Travis Kalanick, who is now famous as the CEO of Uber, was at the time actually just an engineer working on data problems. He was a very early user of the platform. He called in and he wanted to meet in person. He angel invested in CrowdFlower too. These are early people who helped us long before Uber.

Martin: When you entered the market, did you offer a high discount, or did you say: “Try this for free, and if you love it, we can provide you more at a decent price”?

Lukas: We never offered it for free, because I think the problem is that free is a little too easy for people. We felt like you need to pay something so that we know that you actually care about the results and you’re not just doing this as a favor to us.

Martin: Currently, are you using any distribution partners or do you only have direct sales? And is it more inbound or outbound sales?

Lukas: We are mainly direct sales. Our leads mainly come from inbound sources. One of the advantages we have is that we really sell to data scientists. I don’t think that a lot of companies have figured out how to reach data scientists well. I think Kaggle has done a good job, and some others, but because we have a narrow focus on a specific kind of customer, we find that inbound works well. We just try to make stuff that data scientists are going to be interested in. We put out data. That is what data scientists like. We post data sets that we think are interesting or useful, and we survey the data science industry. We ran a conference last week for data scientists. I think it was very successful because it was specific. We love data scientists. I was a data scientist. It is easy for us to talk to them.

I think we will probably start doing outbound, but in my experience inbound is harder to scale in a way, yet it is more the way people want to be sold to. I believe in content marketing, for example. Create content that is interesting. People can come to our website and they can learn about the state of the data science industry. That’s useful for them, and then optionally they can take a look at CrowdFlower. We don’t force you to put in your email address or anything like that. We make the content available, and then if people are interested and they want to collect data, they can try that.

Martin: What is your competitive advantage over other platforms where you can also have crowd workers working for you?

Lukas: I think the simple answer is quality. We’re the biggest platform in terms of volume. The only one that’s close in terms of volume is called Mechanical Turk. You have maybe heard of it. They have an advantage because they don’t charge any fee to use their marketplace. They have a very low barrier to entry. My experience is that quality is often very bad or at least uneven. What we try to do is make sure the quality is good. I think that we are the highest quality data cleanup provider out there.

Martin: Is the reason for this that you have certain mechanisms in place, or that you have a more highly qualified workforce?

Lukas: One is that our software is better. We are focused on data cleanup for data scientists. That means that our software is very specific to our application. We have good templates that work really well for the types of things data scientists want to do. We have good accuracy measurements and sophisticated tools. It is not for everyone. It is for people who really care about data quality and really want to make sure they get good results. What that means on what we call the contributor side of the marketplace is that people know that they’re being measured and they believe they’re being treated fairly. When we see someone do good work for us for a long time, we give them access to more and more work, which is what they want. That means that you can really trust the people who have been in our system a long time, because they’ve proven over and over that they can do high quality work, and they like coming back every day. I think better software and a better marketplace mean that when customers use us, they get high quality results.

Martin: Imagine I am a data scientist. I have several jobs to do. I have some one-time analyses to do, and I have some predictive analytics and algorithms I want to build. Are you only focusing on the first kind, one-time pattern analyses that need data enrichment, or do you also have some kind of ongoing data enrichment for live products?

Lukas: We love to do maintenance data enrichment for live products. We have tools to help with that. If you want to, you can use machine-learning tools to continuously label your data, and you can even set things up so that you send it to the machine learning first. If the machine learning is confident in its answer, you get that answer back, and if it is not confident, the record gets labeled by a person. Then those labels get fed back into the algorithm as training data. You can use our tools. Many of our customers are advanced data scientists and they choose to use their own tools, their own custom stuff. That’s great too.

I think we are especially strong in predictive analytics because it requires so much training data. It is something that many people don’t realize. I would say that the thing about predictive analytics in industry is that the best way to make your models effective is to give them lots and lots of training data. That’s a great market for us and we love to help people with that.

Martin: Imagine I am a marketplace. I have a job for you and say: “Hey, we have one million search results.” How much would it cost to label those one million search results?

Lukas: It depends on many things that you can set. There is a kind of cost, quality and speed tradeoff. If you are very price-sensitive, you just want to get it done as cheaply as possible and you are willing to wait, you can post a job at a low price and just wait until it finishes. If you really want to get results back faster and you want high quality results, then you have to pay a lot to get those results back. We do not mind any of these strategies. We make our money by giving you the platform and selling you a license for it, and then you can pick your tradeoffs. We have a wide range of templates that you can use to get those results back. We have simple templates where, if a task takes a few seconds, it might only cost you a few cents to say whether a result is relevant or not. If you have a complicated taxonomy or complicated rules, or if you want to target only our best contributors, then you might pay $1 per record or something like that.

Martin: Is this auction based? Imagine I would put in the job description and say okay, people can bid or I can set a price – how does it work?

Lukas: The way our stuff works today is you set the price and then people can choose to do it or not.


Martin: Let’s talk about your learnings over the last years. What have been the major learnings from your side?

Lukas: One thing that has really served us, that we didn’t do in the beginning and that I wish we had done earlier, is to focus on one particular kind of customer. I think for a lot of entrepreneurs who come out with a new tool, a really new approach, a new kind of thing, you can get lots of different people who are interested in using it. When we first launched CrowdFlower, we had many different kinds of people saying: “Wow, this is a cool tool. It is for surveys; I can do site usability testing with it and all these other amazing things.” That feels good because you have all these options, but it makes it impossible to do marketing. It was a scary, hard decision for us to say: “Hey, we’re going to focus on data science.” It was really difficult, and I think a lot of the team was worried, because data scientists were less than a quarter of our customer base. However, I felt that the data scientists were the happiest customers. I knew that if we focused on them we would be able to grow that market.

I think one of the skills of an entrepreneur is to ask not how things are now but how things could change. Back in 2012, it looked like the data science market was small. Some of our investors and the management were like: “This is too small. We cannot focus only on this market.” I think if you look at the trends, if you’re in it, you are thinking: “Wow, this market’s going to grow a lot.” Having some patience, saying we are going to focus on this now because it is going to set us up for success later, helps you make a good decision. That decision is one of the best decisions that we have ever made. In retrospect, it looks like an obvious decision, but at the time it was not obvious. We had two executives leave because they were not on board. The decision seemed too risky.

Martin: What other lessons did you learn over the years?

Lukas: I think another underrated piece of running a business, and I have actually seen this in many of my friends’ companies too, is that it is important to really like your customers. Everyone says it, and it is true. In many ways, your customer is your most important constituent. As a founder, you really want to like the people that you serve. One of the things that has made running this company fun is that I like data scientists. When I go to a data science conference, I am interested; I love hearing what they are up to. I feel much more comfortable hanging out with data scientists than, often, with the C-level executives of the companies that we’re in. I find their problems much more interesting than, say, managing a thousand-person team. I think that there is definitely an effective business strategy where you go to the top and target the C-suite. But I think for us as a business, our DNA is serving data scientists and making them successful.

I see that in many people who have gotten frustrated. Sometimes people go into business thinking they are going to serve one market that they like, and then they end up serving a different market. Sometimes it works great, but often it fails because they do not like that market. I have friends who sell to HR, and some of them love HR conferences. And then I have other friends who have discovered that HR is a great place to sell to, but who say: “You know, I don’t like these people that much.” You cannot succeed if you don’t really enjoy your customers.

That is something I think about when people come to me and say: “Hey, I want to start a company.” One way of looking at it is working backwards from whom you want to serve. Whom do you like? Maybe you like entrepreneurs. Then you can start with that: “Okay, I like entrepreneurs. What do they really want?” I think backing into it like that is a real recipe for success, because you are going to make something you want, and it’s going to be fun. If you are running a company, you spend so much of your time with customers. This is going to be most of your life.

Martin: Lukas, let’s talk about the growth options. You described that when the data science market was still small, you said: “Okay, as a data enrichment provider, I bet that the market will grow.” But at some point the market will be saturated with this kind of service. What other growth options do you see for your company?

Lukas: I would say we are far from selling to every data scientist in the world. I think that we are going to grow with that for a good long time. In the back of my mind, there are so many cool things to try, but I do not ever bring them up because we need to focus on saturating the data science market before we start to worry about expansion opportunities. I even think that there are ways to serve our market better so that we can actually increase the value that we’re creating for our customers beyond just what we’re doing today. You look at the average sales price today; I think that could actually go up a lot in a way that everyone feels good about, because we can make our software more useful and sell more modules to our customer base. I guess for me, in my situation, I look more at how we actually saturate the market, because we haven’t done it yet, and then at how we expand within the market that we are in.

For example, we launched a new AI module recently. It was an interesting experience for me because it was the biggest launch that we have had since we launched CrowdFlower. The original CrowdFlower did not have a machine-learning piece. Every record that you got back then was done by the crowd. Recently we launched a thing where we can say: now you can have it done by an artificial intelligence module. When we first built CrowdFlower, I think the hardest thing as an entrepreneur was to get feedback on what you are doing, because people are busy, and if they’re not using it, it is really hard to get people’s attention.

These books like The Lean Startup and The Four Steps to the Epiphany are excellent. They tell you: “Hey, run everything by customers before you make it.” That is easier said than done, because you cannot just call up a potential user and expect them to take your phone call. You really have to hustle to get in front of them. It was interesting to build this kind of second module for our data scientists to do data enrichment, because we actually had hundreds of people every day logging on to CrowdFlower, trying to enrich their data, using the tools. So we could say: “Hey, here is what I am thinking. I think we should build machine learning. Do you have any feedback on it?” Of course, they have tons of feedback because they are so excited that we’re going to make the tool even more useful for them. It was a much faster customer development process that we were able to run because we had this existing customer base, and they were all trying to do the same thing.

Martin: Just for clarification, the new AI module basically tries to do what the humans have been doing, based on machine learning, so that the data enrichment is done by an algorithm which improves over time?

Lukas: Exactly.

Martin: Is the feedback loop then done by your customers, so by the data scientists, or is it also done by the crowd?

Lukas: The feedback loop is done by the crowd, in the sense that the data scientist is controlling everything. The data scientist might say: “If the model is under 90% confident in the answer, I want a human to actually look at it.” We automatically feed that back into the algorithm so it can get smarter. If the model is 85% confident, we get a human to label it, and then the algorithm can see “I was right”, in which case it gets a little more confident, or maybe “I was wrong”, in which case it gets less confident and sort of retrains its parameters.
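The routing rule in this answer (below a confidence threshold a human labels the record, and that label goes back into the model) can be sketched in a few lines of Python. Everything here is an assumption for illustration: `FrequencyModel` is a toy stand-in for a real classifier, `ask_human` stands in for the crowd, and only the 90% threshold comes from the interview.

```python
from collections import Counter, defaultdict


class FrequencyModel:
    """Toy stand-in for a real classifier: remembers the labels seen per key."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def predict(self, key):
        """Return (label, confidence); confidence is 0.0 for unseen keys."""
        seen = self.counts.get(key)
        if not seen:
            return None, 0.0
        label, hits = seen.most_common(1)[0]
        return label, hits / sum(seen.values())

    def learn(self, key, label):
        """Feed one human label back in as training data."""
        self.counts[key][label] += 1


def label_record(model, key, ask_human, threshold=0.9):
    """Use the model's answer if it is confident enough, otherwise ask a human."""
    label, confidence = model.predict(key)
    if confidence >= threshold:
        return label, "model"
    human_label = ask_human(key)
    model.learn(key, human_label)  # the feedback step described above
    return human_label, "human"


# Made-up example: a lookup table plays the role of the crowd.
truth = {
    "iphone 6 case": "relevant",
    "car with iphone adapter": "not relevant",
}
model = FrequencyModel()
first = label_record(model, "iphone 6 case", truth.__getitem__)
second = label_record(model, "iphone 6 case", truth.__getitem__)
```

The first call goes to the human because the model has seen nothing yet; the second is answered by the model, which is where the cost saving Martin asks about next would come from.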

Martin: Thereby you can reduce the cost of data enrichment, because the crowd only has to focus on the last 10%.

Lukas: Exactly. A good question. I think I explained it well.

Martin: Lukas, thank you so much for sharing your knowledge.

Lukas: Thank you very much.

Martin: If you are a data scientist, you know data is the key to building some really awesome data products. If you want to enrich your data so that you can focus on the cool machine-learning stuff, then maybe you should think about CrowdFlower.
