The road for speech recognition is bumpy; speed and private cloud are the key factors

The industry regards speech recognition as the next breakout point for search engines. However, because recognition accuracy in practical applications still needs to improve, the commercialization of speech technology has not gone well.

Yunzhisheng, established only a little over a year ago, officially announced last week that it had secured RMB 100 million in Series A financing. Liang Jiaen, co-founder and CEO of Yunzhisheng, told a Nandu reporter that compared with traditional enterprise-facing (2B) speech recognition companies, Yunzhisheng's genes lean more toward the Internet: its free public cloud voice platform drives demand for customized private cloud services and lays the groundwork for further commercialization.

Less data, but winning on execution

Liang Jiaen, who came out of the Chinese Academy of Sciences, has been engaged in speech recognition research for more than ten years. His entrepreneurial team also has a strong technical background, with more than ten years of professional experience in speech recognition and semantic understanding. In his view, the need for voice interaction is becoming more and more urgent. As the mobile Internet grows, smartphones, smart TVs, wearable devices and the like all require a good interactive experience, and voice is the most direct way to interact. Users, for their part, are more willing to choose simple, natural voice interaction. "This is an industry trend, and intelligent interaction will become mainstream in the future," Liang Jiaen said.

With current technology, speech recognition can achieve very high accuracy under laboratory conditions. In practical applications, however, it often runs into environmental noise, dialects and accents, specialized topics and other problems, which ultimately hurt the user experience. Technical stability and maturity are therefore the threshold for starting a business in speech recognition. Liang Jiaen believes that a complete speech recognition system needs a large amount of data in addition to powerful algorithms. He admits that Yunzhisheng has far less data than the industry leader Keda Xunfei. However, by establishing a public cloud platform, it can accumulate data continuously and use it to optimize the system.

Specifically, the public cloud platform provides online large-vocabulary continuous speech recognition, and developers of any size can invoke the public cloud services directly through an API. In fact, more than 80% of Yunzhisheng's customers are SMEs and individual developers, which both differentiates it from and complements Keda Xunfei, whose roots are in serving large customers. This has not stopped big companies from favoring Yunzhisheng, however. Liang Jiaen said frankly that companies including LeTV and Hammer ROM in fact approached Keda Xunfei first. What finally drew them to Yunzhisheng, besides technology that had reached a certain level, was rapid execution, its biggest advantage. "In the case of our cooperation with Sogou, it took only two weeks from the first contact to the release of the Sogou voice assistant, when this generally takes months." These large companies have large numbers of users, which in turn brought a great deal of data to the public cloud platform.

Pushing private cloud customization

With the public cloud as the foundation, Yunzhisheng further explores the path of private clouds.

The so-called private cloud provides enterprises with customized intelligent interaction solutions, including speech recognition, semantic understanding, and speech synthesis. Liang Jiaen explained that the public cloud platform provides only basic voice technology services, whereas voice interaction is closely tied to each enterprise's business. For enterprises with a rigid need for voice, the public cloud alone cannot fully meet the demand; the recognition model also has to be optimized for the company's particular application environment. For example, in the cooperation between Yunzhisheng and LeTV, the voice assistant was deeply customized and integrated for the TV field, so that it performs better in the actual use of a smart TV. "The only ones really willing to pay are these users with a rigid need. Yunzhisheng's public cloud platform is free, and the enterprise-facing (2B) private cloud platform is the main source of revenue," Liang Jiaen said.

However, compared with the thousands of developers the public cloud platform has accumulated, only ten companies have customized private cloud services. How can the private cloud customization business be expanded to increase revenue? Liang Jiaen pointed out that once the public cloud platform grows large enough, some of its users will convert into private cloud users. This is why the public cloud is free: it attracts a large number of developers to connect to the platform and to understand and experience speech recognition. If this leads to more user activity and greater user stickiness for the developers' applications, they will recognize the value of voice and may even be willing to pay for better service. The public cloud therefore serves both to promote the brand and to cultivate users.

As for which fields to customize for, Liang Jiaen said that he will not limit himself for now, and that the company will be involved in mobile phones, televisions, vehicles, smart watches, and call centers. "We try to get to know different industries and learn which markets are big enough. However, we will definitely focus on two or three areas and then grow bigger."

Not competing with the platform's own developers

The fee for technical services alone may not be sustainable, and Yunzhisheng has a longer-term plan for profitability.

Liang Jiaen expects that the public cloud platform may have tens of thousands of developers in the future, and that once enough users have gathered, back-end monetization becomes possible. He envisions a chain of advertisers, the platform, and front-end developers. The app of a single developer may have only tens to millions of users, so its advertising value is small. Ten thousand developers and tens of thousands of applications, however, add up to a large number of users, which has both advertising value and recommendation value, and the revenue from advertisers is shared between the platform and the developers. To really make this chain work, Liang Jiaen believes the platform must reach at least 100 million users, and Yunzhisheng's current scale is still a long way from that.

In addition, Liang Jiaen said that he focuses only on platform development and does not intend to develop a voice app of his own. He believes that if Yunzhisheng also launched a consumer-facing app, it would compete heavily with developers, and the platform would not be viable. "Let the developers make the platform bigger, so that they can not only use our platform for free but also share in the benefits. In the Chinese Internet environment, this is the business model that can go far."

Yunzhisheng gathers developers through its voice cloud platform and plans to explore further business value in the future. This idea is the correct "big loop thinking" for the Internet age: focusing on the platform rather than simply making money from voice, and mobilizing the power of developers to build the entire ecosystem.

For a technology cloud platform, a clear model and strong marketing are good points at the start, but in the long run everything depends on whether the technology is competitive. Besides Yunzhisheng, several companies with technical backgrounds, such as Keda Xunfei and Sibichi, have launched voice clouds, which gives developers more choices. The market is big and there are opportunities. The key is whether a platform can outperform the others in recognition and understanding.

As far as the speech industry is concerned, the focus has gradually shifted to human-machine dialogue technology that can understand natural language in a natural environment. This includes how to understand user intent through multi-round dialogue in noisy settings such as cars and TVs, where the recognition rate is not high, and how to complete complex information searches, bookings, transactions and the like through dialogue. There is still a lot of room for development. Traditional voice companies, especially those that first recognize speech into text and then run semantic recognition on the text, will face challenges, and the few companies with platform technology potential, such as Yunzhisheng, are valuable.

