As deepfakes proliferate, OpenAI is refining the tech used to clone voices - but the company insists it’s doing so responsibly.

Today marks the preview debut of OpenAI’s Voice Engine, an expansion of the company’s existing text-to-speech API. Under development for about two years, Voice Engine allows users to upload any 15-second voice sample to generate a synthetic copy of that voice. But there’s no date for public availability yet, giving the company time to respond to how the model is used and abused.

“We want to make sure that everyone feels good about how it’s being deployed - that we understand the landscape of where this tech is dangerous and we have mitigations in place for that,” Jeff Harris, a member of the product staff at OpenAI, told TechCrunch in an interview.

The generative AI model powering Voice Engine has been hiding in plain sight for some time, Harris said. The same model underpins the voice and “read aloud” capabilities in ChatGPT, OpenAI’s AI-powered chatbot, as well as the preset voices available in OpenAI’s text-to-speech API. And Spotify’s been using it since early September to dub podcasts for high-profile hosts like Lex Fridman in different languages.

I asked Harris where the model’s training data came from - a bit of a touchy subject. He would only say that the Voice Engine model was trained on a mix of licensed and publicly available data.

Models like the one powering Voice Engine are trained on an enormous number of examples - in this case, speech recordings - usually sourced from public sites and data sets around the web. Many generative AI vendors see training data as a competitive advantage and thus keep it and info pertaining to it close to the chest. But training data details are also a potential source of IP-related lawsuits, another disincentive to reveal much.

OpenAI is already being sued over allegations the company violated IP law by training its AI on copyrighted content, including photos, artwork, code, articles and e-books, without providing the creators or owners credit or pay.

OpenAI has licensing agreements in place with some content providers, like Shutterstock and the news publisher Axel Springer, and allows webmasters to block its web crawler from scraping their site for training data. OpenAI also lets artists “opt out” of and remove their work from the data sets that the company uses to train its image-generating models, including its latest DALL-E 3.

But OpenAI offers no such opt-out scheme for its other products. And in a recent statement to the U.K.’s House of Lords, OpenAI suggested that it’s “impossible” to create useful AI models without copyrighted material, asserting that fair use - the legal doctrine that allows for the use of copyrighted works to make a secondary creation as long as it’s transformative - shields it where it concerns model training.

Surprisingly, Voice Engine isn’t trained or fine-tuned on user data. That’s owing in part to the ephemeral way in which the model - a combination of a diffusion process and transformer - generates speech.

“We take a small audio sample and text and generate realistic speech that matches the original speaker,” said Harris. “The audio that’s used is dropped after the request is complete.”

As he explained it, the model is simultaneously analyzing the speech data it pulls from and the text data meant to be read aloud, generating a matching voice without having to build a custom model per speaker.

A number of startups have delivered voice cloning products for years, from ElevenLabs to Replica Studios to Papercup to Deepdub to Respeecher. So have Big Tech incumbents such as Amazon, Google and Microsoft - the last of which is a major OpenAI investor, incidentally. Harris claimed that OpenAI’s approach delivers overall higher-quality speech.

We also know it will be priced aggressively. Although OpenAI removed Voice Engine’s pricing from the marketing materials it published today, in documents viewed by TechCrunch, Voice Engine is listed as costing $15 per one million characters, or ~162,500 words. That would fit Dickens’ “Oliver Twist” with a little room to spare. (An “HD” quality option costs twice that, but confusingly, an OpenAI spokesperson told TechCrunch that there’s no difference between HD and non-HD voices.)
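For readers who want to sanity-check the pricing figures quoted in the article ($15 per one million characters, roughly 162,500 words, with the “HD” option at twice that), here is a minimal back-of-the-envelope sketch in Python. The function names and the rounding are illustrative assumptions rather than anything OpenAI publishes, and the rates are only those listed in the documents TechCrunch viewed.

```python
# Figures quoted in the article; the helper names below are hypothetical.
PRICE_PER_MILLION_CHARS_USD = 15.00   # standard rate from documents seen by TechCrunch
WORDS_PER_MILLION_CHARS = 162_500     # the article's approximate conversion
HD_MULTIPLIER = 2                     # "HD" is described as costing twice the standard rate


def estimate_cost_usd(num_chars: int, hd: bool = False) -> float:
    """Estimate the cost of synthesizing `num_chars` characters of text."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    rate = PRICE_PER_MILLION_CHARS_USD * (HD_MULTIPLIER if hd else 1)
    return num_chars / 1_000_000 * rate


def chars_per_word() -> float:
    """Average characters per word implied by the article's conversion."""
    return 1_000_000 / WORDS_PER_MILLION_CHARS


def estimate_cost_for_words_usd(num_words: int, hd: bool = False) -> float:
    """Estimate cost from a word count, using the implied characters-per-word ratio."""
    return estimate_cost_usd(round(num_words * chars_per_word()), hd=hd)


if __name__ == "__main__":
    print(f"Implied characters per word: {chars_per_word():.2f}")
    print(f"500,000 characters, standard: ${estimate_cost_usd(500_000):.2f}")
    print(f"500,000 characters, HD:       ${estimate_cost_usd(500_000, hd=True):.2f}")
```

Under these assumptions, a 500,000-character manuscript would come to $7.50 at the standard rate and $15.00 at the “HD” rate. The article's word count implies a little over six characters per word, spaces and punctuation included.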