How a tiny startup beat the tech giants of the world and ranked #1 in video search (Part 1)

One question I often got from our customers and investors was:

“How does your technology compare to Google’s or Microsoft’s?”

I’m sure what they REALLY wanted to ask was…

“Is your technology better than Google’s or Microsoft’s?”

It’s a difficult question to answer. It’s even more difficult for a deep tech AI startup, especially if the founders have neither a strong publication record nor an academic background. The answer usually takes one of two routes:

1. The bulldozer strategy: “Yes, we are better! Here is their technology’s benchmark performance and here is ours.”
→ Reaction: doubtful, questioning, and sometimes even resentful.
2. The sidestep strategy: “We provide better usability and can build out features for specific customer segments, and our customers love it!” (Talking about products and customers instead of technology)
→ Reaction: maybe persuaded, but still dissatisfied.

We always went with #2, even though our benchmark performance was better than other companies’. It gave us a natural segue to talk more about our customers and, more importantly, we definitely did not want to argue with the folks we were trying to turn into believers in our product and vision!

As the technical founder leading AI research and product development, I was often discouraged. Each time I heard the same question, I felt powerless, to the point of having nervous breakdowns. I constantly said “sorry” to my team, who worked day and night to build the incredible technology I knew we had.

That’s when I knew we had to participate in the ICCV VALUE (Video-And-Language-Understanding-Evaluation) Challenge hosted by Microsoft. The challenge had already started two weeks earlier, but who cares? This was the perfect opportunity to prove ourselves.

There were three reasons:

1. The challenge task, video retrieval, was spot on for us: it evaluates the performance of video search AI models.
2. The evaluation would be objective and thorough, covering benchmark video datasets from four diverse domains.
3. It was hosted and joined by the most prestigious AI institutions and tech giants such as Microsoft, Tencent, and Baidu, giving us the chance to compete against them directly.

If we could win the competition, there would be much to gain: credibility, branding, PR, hiring, confidence, …and most importantly, we would have a powerful, bulldozer answer to give to our customers and investors when asked something along the lines of, “Are you better than Google?”

As shiny as the opportunities looked if we could win the competition, the odds were obviously against us.

1. We had limited cloud GPU resources for training multiple models at the same time. We had been given $100K worth of free AWS credit upon joining Techstars and had already used up $50K, so we only had $50K to spare for the competition. For a competition of this size, $50K in compute is the same as not having compute at all.
2. We had limited human “labor”. Our entire company consisted of fewer than 10 people. 10 people minus the non-engineers minus the engineers who had to focus on product and PoC tasks with beta customers...? That only left me with 2 engineers, and that’s including myself.
3. We had limited datasets that we could use to train our model. Unlike the tech giants, who own seemingly infinite amounts of video to pre-train their models, our only option was to use public video datasets that are available to everyone.

And so, I believed that we had less than a 10% chance of winning, but still decided to participate. Just like any startup at some point, we needed to take a leap of faith and arm ourselves with a winning mentality. As the startup saying goes, the odds will always be 0% if we don’t do anything.

Next Post: Part 2 — Nuts & Bolts of ICCV VALUE Challenge
