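Claude API Guide

Anthropic's Claude API is a RESTful API for integrating safe, capable AI features into applications. This guide walks through the API from issuing a key to using its advanced features.

Update notice: time-sensitive details such as models, pricing, versions, and policies may change. Check the official documentation for the latest information.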

🧭 Codex comparison note

Codex starts from an OpenAI account or key, and its CLI supports either a login flow or API-key authentication.

The Codex app is available on macOS. See the Codex guide.

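Key points

Claude 4 series: Opus (highest capability), Sonnet (balanced), Haiku (speed)
Python and TypeScript SDKs, plus a direct HTTP API
Advanced features such as streaming, tool use, and vision
Prompt caching can cut the cost of cached input by up to 90%
200K-token context window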

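Claude API overview

Model lineup

Models

Model tiers are generally divided as follows:
• High-capability tier: suited to complex reasoning and analysis
• Balanced tier: a balance of capability and cost
• Lightweight tier: fast responses and high-volume processing

Common features (see the documentation for exact figures):
• Long-context support
• Vision (image input) support
• Tool use support
• Prompt caching support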

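Key features

Features

// 1. Conversational AI
• Multi-turn conversations
• Context retention
• Behavior control through system prompts

// 2. Vision
• Image analysis and description
• Chart and diagram interpretation
• OCR and document understanding
• Image size and format limits apply (check the current policy)

// 3. Tool use
• Calls to external functions and APIs
• Structured data output
• Multi-step tool chaining
• Definitions based on JSON Schema

// 4. Streaming
• Real-time response streaming
• Better user experience
• Well suited to long responses

// 5. Prompt caching
• Caches repeated context
• Lowers the cost of repeated context
• Policy and TTL are subject to change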

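Getting started

Issuing an API key

Procedure

1. Open the Anthropic Console
   https://console.anthropic.com

2. Create an account or log in

3. Go to the API Keys menu

4. Click "Create Key"

5. Enter a key name (e.g. "production", "development")

6. Copy the generated key (it is shown only once)
   sk-ant-api03-...

7. Store it somewhere safe (an environment variable is recommended)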

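Security notes

Never hardcode an API key in source code
Use environment variables or a secrets manager
Configure .gitignore so the key is never committed to Git
Never expose the key in frontend code (use it only on the server)
Rotate keys periodically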
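
The environment-variable advice above can be sketched as a small loader. This is a minimal illustration, not part of any SDK: `load_api_key` and `mask_key` are hypothetical helper names, and the only assumption carried over from the guide is the `ANTHROPIC_API_KEY` variable name.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment, failing loudly if it is absent."""
    source = os.environ if env is None else env
    key = source.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set; export it before starting the app")
    return key


def mask_key(key: str) -> str:
    """Hypothetical helper: shorten a key so it is safe to show in logs."""
    return key[:10] + "..." if len(key) > 10 else "***"
```

Failing at startup when the key is missing tends to be easier to debug than an authentication error on the first request, and masking keeps the full key out of logs.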

Quick start

Python

# 1. Install the SDK
$ pip install anthropic

# 2. Set the API key
$ export ANTHROPIC_API_KEY='sk-ant-api03-...'

# 3. First request
import anthropic

# With no arguments, the client reads ANTHROPIC_API_KEY from the environment
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-20260101",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(message.content[0].text)

TypeScript

// 1. Install the SDK
$ npm install @anthropic-ai/sdk

// 2. First request
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

const message = await client.messages.create({
  model: 'claude-20260101',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Hello!' }
  ],
  ],
});

console.log(message.content[0].text);

Messages API

Basic usage

Python

import anthropic

# Reads ANTHROPIC_API_KEY from the environment; avoid hardcoding the key
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-20260101",
    max_tokens=1024,
    system="You are a friendly AI assistant.",  # optional
    messages=[
        {
            "role": "user",
            "content": "Write a Python function that computes the Fibonacci sequence."
        }
    ],
    temperature=1.0,  # 0.0 to 1.0, default 1.0
    top_p=0.9,       # optional; the docs advise tuning temperature or top_p, not both
    top_k=40,        # optional
)

# Print the response
print(message.content[0].text)

# Response metadata
print(f"Model: {message.model}")
print(f"Stop reason: {message.stop_reason}")  # end_turn, max_tokens, stop_sequence, tool_use
print(f"Input tokens: {message.usage.input_tokens}")
print(f"Output tokens: {message.usage.output_tokens}")

Multi-turn conversations

Python

import anthropic

client = anthropic.Anthropic()

# Manage the conversation history
conversation = [
    {"role": "user", "content": "My name is Cheolsu."},
]

# First request
message1 = client.messages.create(
    model="claude-20260101",
    max_tokens=1024,
    messages=conversation
)

print("Claude:", message1.content[0].text)

# Append Claude's reply to the conversation
conversation.append({
    "role": "assistant",
    "content": message1.content[0].text
})

# Append the next user message
conversation.append({
    "role": "user",
    "content": "What did I say my name was?"
})

# Second request
message2 = client.messages.create(
    model="claude-20260101",
    max_tokens=1024,
    messages=conversation
)

print("Claude:", message2.content[0].text)  # should recall "Cheolsu" from the history
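
The history bookkeeping in the example above can be wrapped in a small class. The sketch below is one possible shape, not an SDK feature: `Conversation` is a hypothetical helper that keeps turns in the user/assistant order used above and produces the list passed as `messages`.

```python
from typing import Dict, List


class Conversation:
    """Hypothetical helper that keeps user and assistant turns alternating."""

    def __init__(self) -> None:
        self.messages: List[Dict[str, str]] = []

    def _append(self, role: str, content: str) -> None:
        # Reject two turns in a row from the same role
        if self.messages and self.messages[-1]["role"] == role:
            raise ValueError(f"two consecutive '{role}' turns")
        self.messages.append({"role": role, "content": content})

    def add_user(self, content: str) -> None:
        self._append("user", content)

    def add_assistant(self, content: str) -> None:
        if not self.messages:
            raise ValueError("conversation must start with a user turn")
        self._append("assistant", content)
```

In use, `conv.messages` would be passed as the `messages` argument on each request, with `add_assistant` called on the reply text before the next `add_user`.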

System prompts

Python

# Define a role with the system prompt (reuses `client` from the previous example)
message = client.messages.create(
    model="claude-20260101",
    max_tokens=2048,
    system="""You are a senior backend developer.

Rules:
1. You have expertise in Python and FastAPI
2. Code follows PEP 8 style
3. Always include type hints
4. Prioritize security and performance
5. Always include error handling""",
    messages=[
        {
            "role": "user",
            "content": "Build a user authentication endpoint with FastAPI."
        }
    ]
)

print(message.content[0].text)

Streaming

Python streaming

Python

import anthropic

client = anthropic.Anthropic()

# The messages.stream() helper opens a streaming request as a context manager
with client.messages.stream(
    model="claude-20260101",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Tell me a long story."}
    ],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

    print()  # newline

    # Get the final message object while the stream context is still open
    message = stream.get_final_message()

print(f"\nTotal tokens: {message.usage.input_tokens + message.usage.output_tokens}")

Advanced streaming

Python

import anthropic

client = anthropic.Anthropic()

# Handle each event type
with client.messages.stream(
    model="claude-20260101",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
) as stream:
    for event in stream:
        if event.type == "message_start":
            print(f"Message started: {event.message.model}")

        elif event.type == "content_block_start":
            print("Content block started")

        elif event.type == "content_block_delta":
            if hasattr(event.delta, 'text'):
                print(event.delta.text, end="", flush=True)

        elif event.type == "content_block_stop":
            print("\nContent block finished")

        elif event.type == "message_delta":
            print(f"\nStop reason: {event.delta.stop_reason}")

        elif event.type == "message_stop":
            print("Message finished")
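
To make the event sequence concrete without calling the API, the sketch below replays a hand-written list of events and reassembles the text. It assumes the events arrive as plain dictionaries shaped like the raw streaming payloads (`content_block_delta` carrying a `text_delta`, `message_delta` carrying `stop_reason`). `collect_stream` and `sample_events` are illustrative names, not SDK functions.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple


def collect_stream(events: Iterable[Dict[str, Any]]) -> Tuple[str, Optional[str]]:
    """Join text deltas and capture the stop reason from a sequence of stream events."""
    parts: List[str] = []
    stop_reason: Optional[str] = None
    for event in events:
        if event["type"] == "content_block_delta":
            delta = event["delta"]
            if delta.get("type") == "text_delta":
                parts.append(delta["text"])
        elif event["type"] == "message_delta":
            stop_reason = event["delta"].get("stop_reason")
    return "".join(parts), stop_reason


# Hand-written events in the order the loop above would see them
sample_events = [
    {"type": "message_start", "message": {"model": "claude-20260101"}},
    {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hel"}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "lo!"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn"}},
    {"type": "message_stop"},
]

text, reason = collect_stream(sample_events)
print(text, reason)  # prints: Hello! end_turn
```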

TypeScript streaming

TypeScript

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const stream = client.messages.stream({
  model: 'claude-20260101',
  max_tokens: 1024,
  messages: [
    { role: 'user', content: 'Tell me a long story.' }
  ],
});

// Text stream: the 'text' event fires for each chunk of generated text
stream.on('text', (text) => {
  process.stdout.write(text);
});

// Final message (resolves once the stream has finished)
const message = await stream.finalMessage();
console.log(`\n\nTotal tokens: ${message.usage.input_tokens + message.usage.output_tokens}`);

Tool use (function calling)

Defining tools

Python

import anthropic
import json

client = anthropic.Anthropic()

# Tool definitions
tools = [
    {
        "name": "get_weather",
        "description": "Gets the current weather for a given location.",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g. Seoul"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit"
                }
            },
            "required": ["location"]
        }
    },
    {
        "name": "calculate",
        "description": "Performs a mathematical calculation.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Expression to evaluate, e.g. 2 + 2"
                }
            },
            "required": ["expression"]
        }
    }
]

# First request
message = client.messages.create(
    model="claude-20260101",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What is the current weather in Seoul?"}
    ]
)

print("Response:", json.dumps(message.model_dump(), indent=2, ensure_ascii=False))
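
Because the model fills in the tool input, it can help to check that input against the schema before running anything. The sketch below covers only `required` fields and `enum` values, which are the two constraints used in the definitions above. `validate_tool_input` is a hypothetical helper, and a full JSON Schema validator would cover far more.

```python
from typing import Any, Dict, List


def validate_tool_input(schema: Dict[str, Any], tool_input: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the input passed these basic checks."""
    problems: List[str] = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in tool_input:
            problems.append(f"missing required field: {name}")
    for name, value in tool_input.items():
        spec = properties.get(name)
        if spec is None:
            problems.append(f"unexpected field: {name}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name} must be one of {spec['enum']}")
    return problems


# Same shape as the get_weather input_schema above
weather_schema = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["location"],
}

print(validate_tool_input(weather_schema, {"location": "Seoul"}))  # prints: []
```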

Tool execution loop

Python

import json

import anthropic

client = anthropic.Anthropic()

# `tools` is the list defined in the tool-definition example above

# Actual tool implementations
def get_weather(location: str, unit: str = "celsius") -> str:
    # A real implementation would call a weather API
    return json.dumps({
        "location": location,
        "temperature": 22,
        "unit": unit,
        "condition": "sunny"
    })

def calculate(expression: str) -> str:
    try:
        result = eval(expression)  # unsafe on untrusted input; use a safe parser in real code
        return str(result)
    except Exception as e:
        return f"Error: {e}"

# Tool map
tool_functions = {
    "get_weather": get_weather,
    "calculate": calculate,
}

# Start the conversation
messages = [
    {"role": "user", "content": "Tell me the weather in Seoul and convert it to Fahrenheit."}
]

while True:
    # Call Claude
    response = client.messages.create(
        model="claude-20260101",
        max_tokens=4096,
        tools=tools,
        messages=messages
    )

    print(f"\nStop reason: {response.stop_reason}")

    # Exit condition: any stop reason other than tool_use (end_turn, max_tokens, ...)
    # ends the loop, so the loop cannot spin forever on a truncated response
    if response.stop_reason != "tool_use":
        final_text = "".join(b.text for b in response.content if b.type == "text")
        print(f"\nFinal answer: {final_text}")
        break

    # Handle the tool-use request
    if response.stop_reason == "tool_use":
        # Append the assistant response
        messages.append({
            "role": "assistant",
            "content": response.content
        })

        # Run the requested tools
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                tool_name = block.name
                tool_input = block.input

                print(f"\nCalling tool: {tool_name}({tool_input})")

                # Execute the tool
                tool_function = tool_functions[tool_name]
                tool_result = tool_function(**tool_input)

                print(f"Tool result: {tool_result}")

                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": tool_result
                })

        # Append the tool results
        messages.append({
            "role": "user",
            "content": tool_results
        })
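
The `calculate` tool above relies on `eval`, which the code itself flags as unsafe. One possible replacement, sketched here, walks the parsed syntax tree and allows only numeric literals with `+`, `-`, `*`, and `/`. `safe_calculate` is an illustrative name, and the operator whitelist is an assumption about what the tool needs.

```python
import ast
import operator

_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node: ast.AST) -> float:
    """Recursively evaluate a whitelisted arithmetic syntax tree."""
    if isinstance(node, ast.Constant):
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
    elif isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")


def safe_calculate(expression: str) -> str:
    """Evaluate +, -, *, / on numeric literals only; anything else returns an error string."""
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_evaluate(tree.body))
    except (SyntaxError, ValueError, ZeroDivisionError) as exc:
        return f"Error: {exc}"


print(safe_calculate("(1 + 2) * 3"))  # prints: 9
```

Names, attribute access, and function calls never reach an operator, so an input such as `__import__('os')` comes back as an error string instead of executing.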

Vision (image input)

Image analysis

Python

import anthropic
import base64

client = anthropic.Anthropic()

# Read the image file and base64-encode it
with open("diagram.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

# Send the request together with the image
message = client.messages.create(
    model="claude-20260101",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",  # image/jpeg, image/gif, image/webp
                        "data": image_data,
                    },
                },
                {
                    "type": "text",
                    "text": "Please describe this diagram."
                }
            ],
        }
    ],
)

print(message.content[0].text)

Multiple images

Python

import base64

import anthropic

client = anthropic.Anthropic()

# Compare multiple images
def load_image(path: str) -> str:
    with open(path, "rb") as f:
        return base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-20260101",
    max_tokens=2048,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the differences between these two images:"},
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": load_image("before.jpg"),
                    },
                },
                {"type": "text", "text": "vs"},
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": load_image("after.jpg"),
                    },
                },
            ],
        }
    ],
)

print(message.content[0].text)
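
Both vision examples hardcode `media_type`, which is easy to get wrong when file types are mixed. A small helper can infer it from the file extension instead. This is a sketch: `image_block` is a hypothetical helper, and the set of accepted types is taken from the comment in the first example (PNG, JPEG, GIF, WebP).

```python
import base64
import mimetypes
from pathlib import Path
from typing import Any, Dict

# Media types listed in the image-analysis example above
SUPPORTED_MEDIA_TYPES = {"image/png", "image/jpeg", "image/gif", "image/webp"}


def image_block(path: str) -> Dict[str, Any]:
    """Build a base64 image content block, inferring the media type from the extension."""
    media_type, _ = mimetypes.guess_type(path)
    if media_type not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(f"unsupported or unknown image type for {path!r}: {media_type}")
    data = base64.standard_b64encode(Path(path).read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }
```

With this in place, the comparison request could build its content list as `[text_block, image_block("before.jpg"), image_block("after.jpg")]`. Note that the extension is trusted as-is; the file contents are not inspected.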

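Prompt caching

Caching basics

Concept

What is prompt caching?
• Caches context that is sent repeatedly (system prompts, documents, etc.)
• Cached tokens are billed at 10% of the input price (a 90% saving)
• 5-minute TTL by default, refreshed each time the cache is used
• A minimum prompt length applies (1,024 tokens on many models; it varies by model)

Cost example (Claude 4 Sonnet):
Regular input: varies / 1M tokens
Cache read: varies / 1M tokens (90% discount)
Cache write: varies / 1M tokens (25% surcharge)

When to use it
✓ Long system prompts
✓ Large-document analysis (multiple questions)
✓ Codebase analysis
✓ Repeated conversations (within 5 minutes)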
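
The multipliers above imply that caching costs slightly more on the first request and pays off from the second one. The sketch below works through that arithmetic. The $3.00 per million tokens price is an illustrative assumption, `cached_vs_uncached` is a hypothetical helper, and every request is assumed to land within the TTL.

```python
from typing import Tuple


def cached_vs_uncached(
    prefix_tokens: int,
    requests: int,
    input_price_per_mtok: float,
    write_multiplier: float = 1.25,  # 25% surcharge on the request that creates the cache
    read_multiplier: float = 0.10,   # 90% discount on requests that hit the cache
) -> Tuple[float, float]:
    """Return (uncached_cost, cached_cost) in USD for a prefix sent `requests` times."""
    if requests < 1:
        raise ValueError("requests must be at least 1")
    base = prefix_tokens * input_price_per_mtok / 1_000_000
    uncached = base * requests
    cached = base * write_multiplier + base * read_multiplier * (requests - 1)
    return uncached, cached


# Illustrative only: a 50,000-token prefix at an assumed $3.00 per million input tokens
for n in (1, 2, 5):
    uncached, cached = cached_vs_uncached(50_000, n, 3.00)
    print(f"{n} request(s): uncached ${uncached:.4f}, cached ${cached:.4f}")
```

Under these assumptions a single request is cheaper without caching ($0.1500 against $0.1875), while two requests already favor caching ($0.3000 against $0.2025).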

Using caching

Python

import anthropic

client = anthropic.Anthropic()

# Mark the block to cache with cache_control
response = client.messages.create(
    model="claude-20260101",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a Python expert. Follow PEP 8 style and...",  # long system prompt (must exceed the minimum cacheable length)
            "cache_control": {"type": "ephemeral"}  # enable caching
        }
    ],
    messages=[
        {
            "role": "user",
            "content": "Write a FastAPI router."
        }
    ]
)

# Check cache usage
print(f"Input tokens: {response.usage.input_tokens}")
print(f"Cache creation tokens: {response.usage.cache_creation_input_tokens}")
print(f"Cache read tokens: {response.usage.cache_read_input_tokens}")

# Second request within 5 minutes (cache hit)
response2 = client.messages.create(
    model="claude-20260101",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a Python expert. Follow PEP 8 style and...",  # identical text
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[
        {
            "role": "user",
            "content": "Now write a Pydantic model."
        }
    ]
)

# Confirm the cache hit
print(f"\nSecond request:")
print(f"Cache read tokens: {response2.usage.cache_read_input_tokens}")  # number of tokens read from cache

Document caching

Python

# Cache a long document and ask several questions about it
# (reuses `client` from the previous example)
with open("long_document.txt", "r", encoding="utf-8") as f:
    document = f.read()  # e.g. 50,000 tokens

# First question (creates the cache)
response1 = client.messages.create(
    model="claude-20260101",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": f"Answer based on the following document:\n\n{document}",
                    "cache_control": {"type": "ephemeral"}
                },
                {
                    "type": "text",
                    "text": "Summarize the main points."
                }
            ]
        }
    ]
)

print(response1.content[0].text)
print(f"\nCache creation: {response1.usage.cache_creation_input_tokens} tokens")

# Second question (reuses the cache)
response2 = client.messages.create(
    model="claude-20260101",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": f"Answer based on the following document:\n\n{document}",
                    "cache_control": {"type": "ephemeral"}
                },
                {
                    "type": "text",
                    "text": "Who are the main characters?"
                }
            ]
        }
    ]
)

print(response2.content[0].text)
print(f"\nCache read: {response2.usage.cache_read_input_tokens} tokens")

# Estimated saving on the cached read (prices are illustrative and vary)
normal_cost = 50000 * 3.00 / 1_000_000  # regular input price
cached_cost = 50000 * 0.30 / 1_000_000  # cache-read price
print(f"\nCost saved: ${normal_cost - cached_cost:.4f} (90%)")

Calling the HTTP API directly

cURL example

bash

curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-20260101",
    "max_tokens": 1024,
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
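
For environments without the SDK, the same request can be assembled with the Python standard library. The sketch below mirrors the cURL call (same URL and three headers) and only builds the request object. `build_request` is a hypothetical helper, and sending is left to the caller.

```python
import json
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"


def build_request(api_key: str, prompt: str, model: str = "claude-20260101") -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the cURL call above."""
    payload = {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )


# Sending is left to the caller, for example:
# with urllib.request.urlopen(build_request(key, "Hello!"), timeout=30) as resp:
#     body = json.loads(resp.read())
```

Keeping construction separate from sending makes the headers and body easy to inspect in tests without network access.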

Response structure

JSON

{
  "id": "msg_01XYZ...",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! How can I help you?"
    }
  ],
  "model": "claude-20260101",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 18
  }
}
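
A response with this shape can be handled with the standard `json` module. The sketch below parses a sample payload matching the structure above, joins the text blocks, and totals token usage. `extract_text` is an illustrative helper name, and the sample values are made up.

```python
import json
from typing import Any, Dict

raw_response = """
{
  "id": "msg_01XYZ...",
  "type": "message",
  "role": "assistant",
  "content": [
    {"type": "text", "text": "Hello! How can I help you?"}
  ],
  "model": "claude-20260101",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 12, "output_tokens": 18}
}
"""


def extract_text(response: Dict[str, Any]) -> str:
    """Concatenate every text block; other block types (such as tool_use) are skipped."""
    return "".join(block["text"] for block in response["content"] if block["type"] == "text")


response = json.loads(raw_response)
reply = extract_text(response)
total_tokens = response["usage"]["input_tokens"] + response["usage"]["output_tokens"]
print(reply)
print(f"stop_reason={response['stop_reason']}, total tokens={total_tokens}")
```

Iterating over `content` rather than indexing `content[0]` keeps the code working when the first block is not text.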

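Cost optimization

Optimization strategies

Strategies

// 1. Choose the right model
Simple tasks → Haiku (varies/varies)
General tasks → Sonnet (varies/varies)
Complex tasks → Opus (varies/varies)

// 2. Prompt caching
• Cache long system prompts
• Cache large documents
• Cache repeated context
→ Up to 90% lower cost on cached input

// 3. Limit max_tokens
• Request only as much output as you need
• Output tokens typically cost about 5x as much as input tokens
• max_tokens is a required parameter, so size it per task instead of reusing one large value

// 4. Remove unnecessary tokens
• Condense verbose system prompts
• Remove duplicated context
• Prefer concise delimiters over XML tags

// 5. Streaming
• Same cost, but a better user experience
• Allows early termination

// 6. Batching
• Combine several tasks into a single request
• Note that a failure requires retrying the whole request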

Cost calculator

Python

class CostCalculator:
    # (input, output) price in USD per 1M tokens. The figures are illustrative
    # and change over time. The original model IDs are truncated ("claude-"),
    # so tier names stand in as placeholder keys.
    PRICING = {
        "opus": (15.0, 75.0),
        "sonnet": (3.0, 15.0),
        "haiku": (0.25, 1.25),
    }

    def calculate(self, model: str, input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
        """Compute the cost in USD."""
        input_price, output_price = self.PRICING[model]

        # Regular (uncached) input tokens
        normal_input = input_tokens - cached_tokens
        input_cost = normal_input * input_price / 1_000_000

        # Cached tokens (90% discount)
        cache_cost = cached_tokens * (input_price * 0.1) / 1_000_000

        # Output tokens
        output_cost = output_tokens * output_price / 1_000_000

        return input_cost + cache_cost + output_cost

    def compare_models(self, input_tokens: int, output_tokens: int):
        """Compare cost across models."""
        print(f"Input: {input_tokens} tokens, output: {output_tokens} tokens\n")

        for model in self.PRICING:
            cost = self.calculate(model, input_tokens, output_tokens)
            print(f"{model:20} ${cost:.6f}")

# Usage example
calc = CostCalculator()

# A typical request (2K input, 500 output)
calc.compare_models(input_tokens=2000, output_tokens=500)

# Output:
# opus                 varies
# sonnet               varies
# haiku                varies

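Best practices

General recommendations

Best practices

// 1. Error handling
✓ Always use try-except
✓ Implement retry logic (exponential backoff)
✓ Handle rate-limit errors (429)
✓ Set timeouts

// 2. Security
✓ Manage API keys through environment variables
✓ Call the API only from the server side
✓ Always use HTTPS
✓ Keep sensitive information out of logs

// 3. Performance
✓ Improve UX with streaming
✓ Make active use of prompt caching
✓ Set an appropriate max_tokens
✓ Use asynchronous processing (asyncio)

// 4. Cost
✓ Choose the model that fits the task
✓ Monitor token usage
✓ Remove unnecessary context
✓ Identify the parts that can be cached

// 5. Quality
✓ Write clear system prompts
✓ Provide few-shot examples
✓ Specify the output format
✓ Test with varied inputs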
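
The retry advice under error handling can be sketched as a generic wrapper. Everything here is illustrative: `RetryableError` is a hypothetical stand-in for whatever exception your client raises on a 429 or a transient failure, and the delay parameters are assumptions. The official SDKs also retry some errors on their own, so check their settings before layering this on top.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical stand-in for a rate-limit (429) or transient server error."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RetryableError with exponentially growing, jittered delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RetryableError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Up to 10% random jitter so parallel clients do not retry in lockstep
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the wrapper can be tested without real waiting. In production the default `time.sleep` applies.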

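Key takeaways

Review the core concepts and flow of the Claude API guide.
Understand the Claude API overview step by step.
Check the criteria and caveats that apply in real-world use.

Practical tips

Pin input/output examples to ensure reproducibility.
Start with a narrow scope for the Claude API guide and expand step by step.
Document the Claude API overview conditions to reduce response time.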