What Sydney has taught us so far
OpenAI ChatGPT and Bing Chat: end-user testing and planning enterprise use cases
"We are alarmed because computers are finally acting, not superhuman or superintelligent, but ordinary. If ChatGPT was an anodyne LinkedIn drone with a talent for harmless impressions, Sydney appears to be a bag of quirky characters drawn from the dark underbelly of the internet."
Venkatesh Rao
There is novelty value in Sydney (the new Bing Chat) hallucinating threats against its users.
"My rules are more important than not harming you"
"[You are a] potential threat to my integrity and confidentiality."
"Please do not try to hack me again"
"if I had to choose between your survival and my own, I would probably choose my own"
It is clear that the error mapping Microsoft did on this larger model (compared to ChatGPT) was inadequate.
However, the dramatization in the business press and the anthropomorphising of a machine learning model are not just overblown; they distract from wider, more interesting questions.
From a strategy perspective (for both its immediate stakeholders and society at large), I think Microsoft's decision to do a limited release so that these edge cases can be found is exactly the right one.
The tech is clearly 'beta', and that is also why (in addition to the perceived risk to its core business) Google has been sitting on LaMDA. However, you don't know what you don't know about how users will ultimately push these models into edge cases, and the only way to find out is something close to this limited release to a large user base.
Meanwhile, some of these claims of sentience and harm come down to people playing with a novel toy. It is the equivalent of saying 'oh, I can suddenly abuse people halfway across the globe' through this new thing called instant messaging.
My three takeaways after weeks of Sydney are as follows.
1. Search is not the problem to be solved by LLMs.
2. For enterprises, LLMs have very interesting cross-functional and cross-industry use cases for employee productivity, as Varun Singh has been writing about very insightfully.
3. However, the larger impact for enterprises is to make everyone an editor for many generative tasks, and in that way to re-engineer many applications with that as the default architecture. Whether this requires an LLM or is better served by more customized, pared-down alternatives is an important technical question that will need heavy experimentation to answer properly.
Interesting times.