AI & ML interests

LLM, Agents, Quality, Security, Benchmarking

Recent Activity

pierlj updated a dataset 24 days ago
giskardai/phare
pierlj updated a dataset 25 days ago
giskardai/phare
alexcombessie updated a Space 5 months ago
giskardai/README

davidberenstein1957
posted an update 19 days ago
alexcombessie
updated a Space 5 months ago
pierlj
in giskardai/realharm 5 months ago

RealHarm

#2 opened 5 months ago by
PhunvVi
davidberenstein1957
posted an update 6 months ago
mattbit

Update README.md

#2 opened 6 months ago by
davidberenstein1957
davidberenstein1957
posted an update 6 months ago
🚨 LLMs recognise bias but also reproduce harmful stereotypes: an analysis of bias in leading LLMs

I've written a new entry in our series on Phare (phare.giskard.ai), the benchmark from Giskard, BPIFrance and Google DeepMind.

This time it covers bias: https://huggingface.co/blog/davidberenstein1957/llms-recognise-bias-but-also-produce-stereotypes

Previous entry on hallucinations: https://huggingface.co/blog/davidberenstein1957/phare-analysis-of-hallucination-in-leading-llms
davidberenstein1957
posted an update 7 months ago
davidberenstein1957
posted an update 8 months ago
julien-c
posted an update 8 months ago
BOOOOM: Today I'm dropping TINY AGENTS

the 50 lines of code Agent in JavaScript 🔥

I spent the last few weeks working on this, so I hope you will like it.

I've been diving into MCP (Model Context Protocol) to understand what the hype was all about.

It is fairly simple, but still quite powerful: MCP is a standard API to expose sets of Tools that can be hooked to LLMs.

But while doing that, came my second realization:

Once you have an MCP Client, an Agent is literally just a while loop on top of it. 🤯

โžก๏ธ read it exclusively on the official HF blog: https://huggingface.co/blog/tiny-agents
davidberenstein1957
posted an update 9 months ago
🔥 Announcing FLUX-Juiced: The Fastest Image Generation Endpoint (2.6x faster)!

Optimisations are widely applied and can reduce inference time, but their impact on quality often remains unclear, so we decided to challenge the status quo and create our own optimised version of FLUX.1[dev] called FLUX-juiced.

Blog: https://huggingface.co/blog/PrunaAI/flux-fastest-image-generation-endpoint
davidberenstein1957
posted an update 9 months ago
davidberenstein1957
posted an update 9 months ago
RealHarm: A Collection of Real-World Language Model Application Failures

I'm David from Giskard, and we work on securing your Agents.
Today, we are launching RealHarm: a dataset of real-world problematic interactions with AI agents, drawn from publicly reported incidents.

Check out the dataset and paper: https://realharm.giskard.ai/