Big Data & Analytics - Thinks & Links | August 27, 2023

Big Data & Analytics - Thinks and Links | News and insights at the intersection of cybersecurity, data, and AI

Happy Weekend!

I’ve been on vacation for the past week, and I tried very hard to disconnect and recharge. However, I couldn’t help but think about data and AI topics. In lieu of a full blog post this week, here are a few tools I’ve been enjoying and some links that made me think…

ChatGPT + Code Interpreter (https://chat.openai.com/) – This is now my go-to example of what the future of LLM/AI workflows looks like. The ability to do data analysis, write code, or build ML models via short prompts and file uploads continues to amaze me and everyone I share it with. For $20/month, you can live in the future and try it too.

ChatGPT Mobile (https://apps.apple.com/us/app/chatgpt/id6448311069) – Works on the beach (thoroughly tested this week) and lets you play with prompt development, learn new things, and even come up with itineraries for your vacation travels (as long as you’re not looking for hot new restaurants launched since 2021).

Perplexity (https://www.perplexity.ai/) – This is another cool LLM-based tool that is similar to ChatGPT but also has an agent (“copilot”) capability. When you enable it, you can have it do research for you, or even write the first draft of a newsletter…

MidJourney (https://www.midjourney.com/) – This is my favorite AI image creator, where the only limit is what you can imagine and describe in the prompt. The tool is always improving, with recent additions including zooming out of an existing AI image and creating variations based on a selection within it. Helpfully for vacation, the whole thing works via Discord, and the mobile app is perfect for making art on your phone while sipping a cool drink at sunset.

Poe (https://poe.com/) – Another AI chatbot engine, but one where you can pick from different models, including Claude and Llama. You can also create or subscribe to “bots” that carry custom prompts to make them more useful. I built GPTYoda, which answers all queries in backwards speech and riddles. Someone else built a MidJourney bot that helps create detailed prompts for making more impressive images over on MidJourney.


CISA Keeps It Real: Software Must Be Secure By Design, and AI is Powerful Software

https://www.cisa.gov/news-events/news/software-must-be-secure-design-and-artificial-intelligence-no-exception

“Discussions of artificial intelligence (AI) often swirl with mysticism regarding how an AI system functions. The reality is far more simple: AI is a type of software system.”

This. If I could summarize the many briefings, emails, and discussions with clients about security and AI over the past year, it might be with the phrase above. Security is critical for software. AI is software. Therefore, AI must be secure. This article by the Cybersecurity and Infrastructure Security Agency explains why CISOs and their teams must be involved in the development of advanced AI capabilities. It includes links to further helpful resources, including:

Examples of assurance and assessment

NIST AI Risk Management Framework

Great research paper: Machine Learning: The High-Interest Credit Card of Technical Debt

Tracking AI Vulnerabilities similarly to how we currently track Common Vulnerabilities and Exposures (CVEs)

Instituting Machine Learning Bills of Materials (ML-BOMs) (see also SBOMs)

Fine-tuning For All

https://openai.com/blog/gpt-3-5-turbo-fine-tuning-and-api-updates

One of the common concerns with the Large Language Models underlying ChatGPT and Google Bard is that they are prone to hallucinations. These very large models were trained to be generally good at many things, so when you ask them for specialist answers they may mess up. Would you trust a friend who is excellent at trivia to perform heart surgery? Fine-tuning is a way for developers to provide domain-specific context to an LLM and make it more useful, and it has been shown to give far better results and lower hallucination rates. With this release, the fine-tuning workflow is now available via an API for GPT-3.5 Turbo, OpenAI’s most commonly used model. Anyone building serious applications on top of this model should read the article and consider taking the time to fine-tune before going to production.
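To get a feel for the workflow, here’s a minimal sketch using the OpenAI Python library as it stood at the time of the announcement (the v0.x SDK); the file name and training examples are my own placeholders, not anything from the article:

```python
# Minimal sketch of the gpt-3.5-turbo fine-tuning flow (openai Python SDK, v0.x era).
# Assumes openai.api_key is set; the JSONL file and its contents are hypothetical.
import openai

# 1. Upload a JSONL file of chat-formatted training examples, one per line:
#    {"messages": [{"role": "system", "content": "..."},
#                  {"role": "user", "content": "..."},
#                  {"role": "assistant", "content": "..."}]}
training_file = openai.File.create(
    file=open("domain_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# 2. Kick off a fine-tuning job against gpt-3.5-turbo.
job = openai.FineTuningJob.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)

# 3. Poll until the job finishes; the tuned model's name appears when it's done.
job = openai.FineTuningJob.retrieve(job.id)

# 4. Then call the tuned model like any other chat model.
response = openai.ChatCompletion.create(
    model=job.fine_tuned_model,  # e.g. "ft:gpt-3.5-turbo:my-org::abc123"
    messages=[{"role": "user", "content": "A domain-specific question"}],
)
print(response.choices[0].message.content)
```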

Open Source + Fine-Tuning > Closed Source

https://www.phind.com/blog/code-llama-beats-gpt4

A bit technical, but a sign of the trend: taking open models like Meta’s Llama and applying fine-tuning can outperform “closed” models like GPT-4. This points to a continuing race among LLM builders to deliver more and more impressive outcomes. For us as consumers and business leaders, this is great news – competition will bring more alternatives, cheaper options, and ultimately faster adoption of AI capabilities. (Bad actors are probably also watching this and doing their own development, so all the risks are accelerating too!)
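For a concrete sense of what “applying fine-tuning” to an open model involves, here’s a minimal sketch of one popular approach: LoRA adapters via Hugging Face’s transformers and peft libraries. This is not Phind’s actual recipe; the checkpoint and hyperparameters are illustrative assumptions:

```python
# Illustrative LoRA fine-tuning setup for an open code model using Hugging Face
# transformers + peft. NOT Phind's recipe; all names/values are examples only.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "codellama/CodeLlama-7b-hf"  # an openly released Code Llama checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA trains small low-rank adapter matrices instead of all model weights,
# which makes domain fine-tuning feasible on modest hardware.
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attention projections in Llama models
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of total parameters

# From here you would train with transformers.Trainer on a domain dataset,
# then merge or load the adapter for inference.
```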

AI as Clickbait

https://www.darkreading.com/attacks-breaches/attackers-dangle-ai-based-facebook-ad-lures-to-take-over-business-accounts

The hype over AI is probably more dangerous than the actual AI itself. Here is another example of hackers using the promise of AI superpowers to lure unsuspecting and insecure users into their traps.

Python + Excel = 💚🐍

https://techcommunity.microsoft.com/t5/excel-blog/announcing-python-in-excel-combining-the-power-of-python-and-the/ba-p/3893439

I would argue that Excel is the most important data analysis tool. You may write code or engineer databases, but at some point those insights need to be presented to a decision maker (likely via PowerPoint or Google Slides), and the best interim step to get from big data to action is Excel.

So imagine my delight at seeing that Python (also extremely important for data analysis) is going to be accessible directly from Excel. This will make for some exciting data analysis if done correctly. I’m looking forward to trying it out!
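Based on Microsoft’s announcement, Python runs inside a cell via the new =PY() formula, and the xl() helper pulls ranges or tables into pandas objects. A tiny sketch of what that might look like (the table and column names are hypothetical):

```python
# Typed into an Excel cell: =PY( followed by Python code.
# xl() reads a range or table into pandas; "Sales", "Region", and "Revenue"
# are made-up names for illustration.
=PY(
df = xl("Sales[#All]", headers=True)   # load the "Sales" table as a DataFrame
df.groupby("Region")["Revenue"].sum()  # result spills back into the grid
)
```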

From the Archives: Why AI Underperforms and What Companies Can Do About It

https://hbr.org/2019/03/why-ai-underperforms-and-what-companies-can-do-about-it

Although ostensibly about AI and data roles, the same message is true for cybersecurity and risk: it’s communication, not code, that holds companies back. This article was originally published in 2019 and holds true today, even as more and more “AI” hits the world. Safe and successful AI will rely on a culture of trust and communication. My favorite bit:

“AI strategies” fail because AI is a means, not an end. “Do you have an AI strategy?” makes as much sense as asking, “Do we have an Excel strategy?” But for companies to get past the hype and focus on the real potential that AI offers, they’ll have to start with how they communicate.

I do think having an Excel strategy sounds fun too…


Have a Great Weekend!