Thinks and Links | May 24, 2024
Big Data & Analytics - Thinks and Links | News and insights at the intersection of cybersecurity, data, and AI
đź“« Subscribe to Thinks & Links direct to your inbox
Happy Friday!
Computers Built from the Silicon Up for AI
It’s no surprise that Microsoft’s latest developer event, “Microsoft Build,” continued the trend of technology companies doing ever more with AI. This week’s series of announcements introduced the Copilot+ family of PCs, designed with hardware, operating system, and pre-installed software components that take advantage of advances in Large Language Models (LLMs) and other AI capabilities. With enough processing power onboard to run local models, they have the potential to revolutionize how users experience AI and to test the limits of how much we’ll trust it.
The details from the conference include silicon that can process over 40 trillion operations per second (40+ TOPS). This brings practical AI inference to consumer devices without relying on a cloud service. AI models that run locally have many advantages, including preventing the leakage of sensitive data and remaining compliant with various regulatory restrictions. That local horsepower also enables the software capabilities announced alongside the hardware, notably “Recall”: a photographic-memory feature that will remember and understand everything you do on your computer by taking constant screenshots.
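For anyone curious what “local” looks like in practice, here’s a minimal sketch in Python using the open-source transformers library. The model name is just a placeholder, and a Copilot+ PC would run NPU-optimized models instead, but the privacy property is the same: the prompt and the output never leave the machine.

```python
# A minimal sketch of fully local inference: a small open model loaded and
# run on-device, with nothing sent to a cloud API. "distilgpt2" is only a
# placeholder; Copilot+ hardware would use NPU-optimized models instead.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2", device=-1)  # CPU only
result = generator("Summarize what I worked on today:", max_new_tokens=40)
print(result[0]["generated_text"])
```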
What remains to be seen is what happens with these screenshots. Are they stored locally and processed by the onboard models? Or will they be shipped to a Microsoft server to be processed by clusters of NVIDIA H100 GPUs with 1,000 TOPS? The cloud servers would offer better performance, make it easier to launch new features, and help get consumers used to this behavior. They would also become some of the most target-rich repositories of personal and sensitive data in the world. A data breach or a rogue employee could alter the course of AI computing and create a lot of problems for a lot of people. I expect we will see this type of product hit the market without constraint, then bad things will happen, and eventually regulation will catch up to prevent such a massive ingestion of private data.
However, Apple has shown with Face ID and similar features the power of keeping AI small and localized. When you use your face to unlock the phone and access credit cards, that AI model doesn’t leave the device. Apple has famously defended its customers’ right to privacy by building devices with local encryption and offline models. It will be interesting to see if and how Apple contributes to the AI technology wars in its upcoming announcements, hopefully with greater privacy assurances than we’ve seen from Microsoft or Google.
I understand the privacy concerns and the risks to technical IP. That said, on my personal PC I’m excited to try Recall. The power of AI to augment and accelerate everything I do can’t be overstated. It has already made me a better coder, researcher, and writer. Being able to search my digital memory to tie up loose ends on tasks, or to be a more responsive coworker and friend, sounds amazing. I hope the privacy concerns can be rapidly addressed because I really want this functionality. It’s going to be really interesting to watch companies that have done the work to put privacy and security protections in place begin to really use these capabilities to be more impactful. Working with them will feel like investigating a crime scene with Barry Allen: somehow the awkward forensic scientist always has the answer in a Flash. My advice to technology and security organizations would be to purchase a few Copilot+ laptops and start working to understand how they can be broken into, and then secured.
Understanding How a Large Language Model Thinks
https://www.anthropic.com/research/mapping-mind-language-model
Exciting new research out from Anthropic, titled “Mapping the Mind of a Large Language Model.” They’ve demonstrated a method for interpreting, at scale, how millions of neurons work together to understand and respond with concepts. The approach lets researchers adjust how the model responds at a granular level not previously possible through prompting alone. Examples include:
- Making the model respond as if it were the Golden Gate Bridge
- Amplifying the “sycophantic praise” feature so that responses became adoring and untruthful
- Identifying a path for monitoring and preventing models from responding in deceptive or undesirable ways
This is critical work for reducing some of the risks of LLMs in production, especially in high-risk use cases. It also has important implications for how “prompt engineering” might evolve.
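To make the idea concrete, here is a hypothetical sketch of the broader “activation steering” concept: nudge a model’s hidden state along a direction so its outputs drift toward a concept. This is not Anthropic’s actual sparse-autoencoder method, and the model, layer, and “concept direction” below are placeholders (a random vector), but it shows why working at the level of internal activations gives you a knob that prompting alone doesn’t.

```python
# Toy sketch of activation steering with a forward hook. Not Anthropic's
# method: the "direction" here is random noise standing in for a learned
# feature, and gpt2 / block 6 are arbitrary placeholder choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any small causal LM works for this toy demo
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Placeholder "concept direction" in the model's hidden space.
direction = torch.randn(model.config.hidden_size)
direction = direction / direction.norm()
strength = 4.0

def steer(module, inputs, output):
    # Add the concept direction to every token's hidden state at this layer.
    hidden = output[0] if isinstance(output, tuple) else output
    steered = hidden + strength * direction
    return (steered,) + output[1:] if isinstance(output, tuple) else steered

prompt = "Tell me about yourself."
ids = tok(prompt, return_tensors="pt")

baseline = model.generate(**ids, max_new_tokens=40, do_sample=False)

handle = model.transformer.h[6].register_forward_hook(steer)  # a middle block
steered_out = model.generate(**ids, max_new_tokens=40, do_sample=False)
handle.remove()

print("baseline:", tok.decode(baseline[0], skip_special_tokens=True))
print("steered: ", tok.decode(steered_out[0], skip_special_tokens=True))
```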
GPT-4o
https://www.technologyreview.com/2024/05/17/1092649/gpt-4o-chinese-token-polluted/
Researchers are finding evidence of poorly cleaned training data in Chinese and other languages within the latest OpenAI model. In particular, content advertising gambling and pornography websites appears to have formed enough of the inputs to the model to create dubious outputs. This oversight leads to bad publicity and unusable outputs, but it can also open pathways for jailbreaking models. Any new development that relies on GPT-4o (an otherwise fantastic model) will need to be wary of Chinese-character inputs.
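The linked article focuses on GPT-4o’s token vocabulary, so one quick way to poke at this yourself is to scan the tokenizer for suspiciously long Chinese tokens. A rough sketch using the open-source tiktoken library and “o200k_base,” the encoding published for GPT-4o; the length threshold is an arbitrary choice:

```python
# Rough sketch: scan the o200k_base vocabulary for unusually long CJK
# tokens, the kind of spammy artifact the linked article describes.
# The ">= 5 characters" threshold is an arbitrary illustrative cutoff.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")
suspicious = []
for token_id in range(enc.n_vocab):
    try:
        text = enc.decode_single_token_bytes(token_id).decode("utf-8")
    except (KeyError, UnicodeDecodeError):
        continue  # skip unused ids and tokens that are partial byte sequences
    cjk_chars = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    if cjk_chars >= 5:
        suspicious.append((token_id, text))

print(f"{len(suspicious)} long CJK tokens found; first few:")
for token_id, text in suspicious[:10]:
    print(token_id, text)
```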
ChatGPT + Google Drive / Microsoft OneDrive
https://openai.com/index/improvements-to-data-analysis-in-chatgpt/
You can now plug ChatGPT directly into your cloud file-sharing directory of choice. This will make data analytics much easier, and I’m sure everything in those directories is safe / approved for use in AI models. Nothing to see here…
Slack
https://techcrunch.com/2024/05/17/slack-under-attack-over-sneaky-ai-training-policy/
Earlier this week, news broke that Slack was indiscriminately using customer data to train AI models. A Hacker News post raised awareness that this was happening and that companies needed to opt out to keep the sensitive details of discussions on the platform from becoming part of the next model run. While the story and the need to opt out only became mainstream this week, the terms have allowed this training since last fall. Add this to the growing list of reasons why privacy and third-party risk practices must be integrated with company AI policies and governance.
The Smart IT Podcast - AI: Full Speed Ahead with Guardrails
https://the-smart-it-podcast.captivate.fm/episode/ai-full-speed-ahead-with-guardrails
This week’s guest brought a ton of insights to the podcast. I highly recommend listening in! Don’t have time? GPT-4o summarized it for me:
- The podcast discusses the vast potential and challenges of implementing AI technology within enterprises.
- It covers critical topics like job impacts, automating tasks, AI risks such as hallucinations and data leaks, and the necessity for governance and ethical frameworks.
- The conversation highlights the role of security teams and the importance of building AI skillsets and framing business cases for AI investments.
- Practical recommendations are offered, including establishing AI governance frameworks, upskilling employees, and securing funding for AI initiatives.
- Randy Lariar from Optiv Security shares his insights on effectively communicating the value of AI initiatives to leadership and ensuring safe AI deployments.
Have a Great Weekend!
Google’s new AI Overviews in search are a bit problematic…
You can chat with the newsletter archive at https://chat.openai.com/g/g-IjiJNup7g-thinks-and-links-digest