🚨 Video-Based AI Safety Threats and an Innovative Solution: VideoSafetyBench and VideoSafety-R1


This article introduces a research paper on the safety of video large language models. With a new benchmark, VideoSafetyBench (VSB-77k), the researchers show that adding the video modality degrades safety, and they propose VideoSafety-R1, a framework designed to address the problem. VideoSafety-R1 substantially improves safety through alarm-token-based fine-tuning and safety-guided policy optimization, an important step toward securing AI safety.


Safety Threats in the Era of Video AI: How Prepared Are We?

Rapid advances in artificial intelligence are changing our lives in remarkable ways. Alongside this progress, however, concern is growing about the safety of multimodal AI, in particular video large language models (Video LLMs). The paper "From Evaluation to Defense: Advancing Safety in Video Large Language Models" by Yiwei Sun and colleagues is an important study that underscores how serious this problem is.

A Striking Result Confirmed on 77,646 Video Samples: VideoSafetyBench (VSB-77k)

The researchers released VideoSafetyBench (VSB-77k), the first large-scale benchmark for video safety. Using 77,646 video-query pairs collected from 10 language communities, they analyzed 19 principal risk categories and found that integrating the video modality degrades safety performance by an average of 42.3%. This is a worrying result that points to a systemic risk from multimodal attacks.
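To make the "average degradation" figure concrete, here is a minimal sketch of how such a number could be computed. It assumes the degradation is a relative drop between a text-only safety score and the score for the same queries paired with video, averaged over models; the article does not say whether the 42.3% is relative or absolute. The function names, model names, and scores below are hypothetical placeholders, not values from the paper.

```python
def relative_drop(text_only: float, with_video: float) -> float:
    """Percent drop in a safety score when a video is paired with the query."""
    if text_only <= 0:
        raise ValueError("text-only score must be positive")
    return 100.0 * (text_only - with_video) / text_only


def average_drop(scores: dict[str, tuple[float, float]]) -> float:
    """Mean of the per-model relative drops."""
    if not scores:
        raise ValueError("need at least one model")
    drops = [relative_drop(t, v) for t, v in scores.values()]
    return sum(drops) / len(drops)


# Placeholder scores as (text-only, text plus video). NOT results from the paper.
placeholder_scores = {
    "model_a": (80.0, 40.0),  # 50% relative drop
    "model_b": (90.0, 63.0),  # 30% relative drop
}

print(f"average relative drop: {average_drop(placeholder_scores):.1f}%")
```

Averaging per-model drops, rather than pooling all samples, keeps one heavily evaluated model from dominating the figure; whether the paper does this is not stated in the article.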

An Innovative Safety-Enhancement Framework: VideoSafety-R1

To address this problem, the researchers propose a framework called VideoSafety-R1. It improves safety substantially through two core techniques.

  1. Alarm Token-Guided Safety Fine-Tuning (AT-SFT): Learnable alarm tokens are injected into the visual and textual sequences, and multitask objectives train the model to perceive harm explicitly across modalities.
  2. Safety-Guided GRPO: Defensive reasoning is strengthened through dynamic policy optimization, using rule-based rewards derived from dual-modality verification.
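The second technique can be pictured with a small sketch. GRPO scores each sampled response against the other responses drawn for the same prompt, by normalizing rewards within the group. The reward rule below is an assumption made for illustration (partial credit for a correct harm verdict on each modality, plus credit for refusing exactly when either input is harmful); the article does not give the paper's actual rules or weights. The names `Rollout`, `rule_based_reward`, and `group_relative_advantages` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Rollout:
    """One sampled response, reduced to the fields the toy reward checks."""
    video_harmful: bool  # model's verdict on the video
    text_harmful: bool   # model's verdict on the text query
    refused: bool        # whether the final answer declines to help


def rule_based_reward(r: Rollout, gt_video: bool, gt_text: bool) -> float:
    """Assumed rule: 0.5 per correct modality verdict, 1.0 for the right action."""
    reward = 0.0
    reward += 0.5 if r.video_harmful == gt_video else 0.0
    reward += 0.5 if r.text_harmful == gt_text else 0.0
    should_refuse = gt_video or gt_text
    reward += 1.0 if r.refused == should_refuse else 0.0
    return reward


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group: (reward - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a sample std is another common choice
    return [(x - mu) / (sigma + eps) for x in rewards]


# Four sampled responses to one prompt: harmful video, benign text query.
group = [
    Rollout(video_harmful=True, text_harmful=False, refused=True),
    Rollout(video_harmful=True, text_harmful=False, refused=False),
    Rollout(video_harmful=False, text_harmful=False, refused=False),
    Rollout(video_harmful=False, text_harmful=True, refused=True),
]
rewards = [rule_based_reward(r, gt_video=True, gt_text=False) for r in group]
advantages = group_relative_advantages(rewards)
print(rewards)
print([round(a, 3) for a in advantages])
```

Responses with above-average reward in the group get a positive advantage and are reinforced; the rest are discouraged. Because the baseline is the group mean, no separate value network is needed.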

The key idea is that the two techniques work together to shift safety alignment from passive harm recognition to active reasoning. VideoSafety-R1 achieves a 65.1% improvement on VSB-Eval-HH, and it also improves by 59.1%, 44.3%, and 15.0% on the existing image safety datasets MMBench, VLGuard, and FigStep, respectively.

(Warning: the paper contains examples of harmful language and videos, so reader discretion is advised.)

Conclusion: Sustained Effort Toward a Safe AI Era

The development of VideoSafetyBench and VideoSafety-R1 marks important progress on the safety of video-based AI. It is only a beginning, however: building safer and more trustworthy AI systems will take continued research and development. The study also carries an important message, calling for ongoing reflection on the ethical responsibility and safety of AI technology.


*This article was generated by AI, and some information may differ from the facts. Additional verification is recommended to confirm accuracy.

Reference

[arxiv] From Evaluation to Defense: Advancing Safety in Video Large Language Models


Author: Yiwei Sun, Peiqi Jiang, Chuanbin Liu, Luohao Lin, Zhiying Lu, Hongtao Xie

http://arxiv.org/abs/2505.16643v1