Masking


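The experimental masking parser and transformer is an extensible module for masking and rehydrating strings. One of its primary use cases is redacting PII (Personally Identifiable Information) from a string before making a call to an LLM.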
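To make the mask/rehydrate idea concrete, here is a minimal conceptual sketch in plain TypeScript. It is not the library's implementation: the `maskEmails` and `rehydrateText` helpers and the counter-based token format are hypothetical and exist only to illustrate how a token-to-original map lets masked text be restored later.

```typescript
// Conceptual sketch only: NOT how the langchain masking module is implemented.
// The state maps each mask token back to the original value it replaced.
type MaskState = Map<string, string>;

// Hypothetical helper: replace every email-like match with a numbered token.
function maskEmails(text: string, state: MaskState): string {
  let counter = state.size;
  return text.replace(/\S+@\S+\.\S+/g, (match) => {
    const token = `[email-${counter}]`;
    counter += 1;
    state.set(token, match); // remember the original for rehydration
    return token;
  });
}

// Hypothetical helper: swap every known token back to its original value.
function rehydrateText(text: string, state: MaskState): string {
  let result = text;
  state.forEach((value, token) => {
    result = result.split(token).join(value);
  });
  return result;
}

const state: MaskState = new Map();
const original = "Write to jane.doe@email.com today";
const masked = maskEmails(original, state);
const restored = rehydrateText(masked, state);

console.log(masked); // Write to [email-0] today
console.log(restored); // Write to jane.doe@email.com today
```

Only the masked string would be sent to the LLM; the state stays on the caller's side until rehydration.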


Real world scenario


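A customer support system receives messages containing sensitive customer information. The system must parse these messages, mask any PII (such as names, email addresses, and phone numbers), and log them for analysis while complying with privacy regulations. Before the transcript is logged, a summary is generated using an LLM.

Get started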


Basic example


Use the RegexMaskingTransformer to create a simple mask for emails and phone numbers.

TIP

See this section for general instructions on installing integration packages.

npm install @langchain/openai

import {
  MaskingParser,
  RegexMaskingTransformer,
} from "langchain/experimental/masking";

// Define masking strategy
const emailMask = () => `[email-${Math.random().toString(16).slice(2)}]`;
const phoneMask = () => `[phone-${Math.random().toString(16).slice(2)}]`;

// Configure pii transformer
const piiMaskingTransformer = new RegexMaskingTransformer({
  email: { regex: /\S+@\S+\.\S+/g, mask: emailMask },
  phone: { regex: /\d{3}-\d{3}-\d{4}/g, mask: phoneMask },
});

// Register the transformer once, through the constructor.
// (Calling addTransformer with the same instance afterwards would register it twice.)
const maskingParser = new MaskingParser({
  transformers: [piiMaskingTransformer],
});

const input =
  "Contact me at jane.doe@email.com or 555-123-4567. Also reach me at john.smith@email.com";
const masked = await maskingParser.mask(input);

console.log(masked);
// Contact me at [email-a31e486e324f6] or [phone-da8fc1584f224]. Also reach me at [email-d5b6237633d95]

const rehydrated = await maskingParser.rehydrate(masked);
console.log(rehydrated);
// Contact me at jane.doe@email.com or 555-123-4567. Also reach me at john.smith@email.com

NOTE

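If you plan to store the masking state so that the original values can be rehydrated asynchronously, make sure you follow security best practices. In most cases you will want to define a custom hashing and salting strategy.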
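As one possible shape for such a strategy, the sketch below builds a mask function from a keyed hash (HMAC-SHA256) using Node's built-in `node:crypto` module. The `createSaltedMask` helper, the 16-character truncation, and the way the salt is generated are assumptions made for illustration, not part of the masking module; in practice the salt would come from a secret store, and this sketch assumes a Node.js runtime rather than an edge runtime.

```typescript
import { createHmac, randomBytes } from "node:crypto";

// Hypothetical helper: returns a mask function keyed with a secret salt.
// The same input and salt always yield the same token; a different salt
// yields an unrelated token, so tokens cannot be recomputed without the salt.
function createSaltedMask(label: string, salt: string | Buffer) {
  return (match: string): string => {
    const digest = createHmac("sha256", salt).update(match).digest("hex");
    // Truncated for readability; keep the full digest if collisions are a concern.
    return `[${label}-${digest.slice(0, 16)}]`;
  };
}

// Assumption: a random per-process salt. Load it from a secret manager
// if tokens must stay stable across restarts.
const salt = randomBytes(32);
const saltedEmailMask = createSaltedMask("email", salt);

const tokenA = saltedEmailMask("jane.doe@email.com");
const tokenB = saltedEmailMask("jane.doe@email.com");
const tokenC = createSaltedMask("email", randomBytes(32))("jane.doe@email.com");

console.log(tokenA === tokenB); // true: deterministic for a given salt
console.log(tokenA === tokenC); // false: a different salt gives a different token
```

The returned function has the same `(match: string) => string` shape as the mask functions used in the examples on this page, so it could be passed as the `mask` option of a pattern.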

Next.js stream

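An example Next.js chat endpoint that uses the RegexMaskingTransformer. Each time the API is called with a chat payload, both the current chat message and the chat message history are masked.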

// app/api/chat

import {
  MaskingParser,
  RegexMaskingTransformer,
} from "langchain/experimental/masking";
import { ChatOpenAI } from "@langchain/openai";
import { PromptTemplate } from "@langchain/core/prompts";
import { BytesOutputParser } from "@langchain/core/output_parsers";

export const runtime = "edge";

// Function to format chat messages for consistency
const formatMessage = (message: any) => `${message.role}: ${message.content}`;

const CUSTOMER_SUPPORT = `You are a customer support summarizer agent. Always include masked PII in your response.
  Current conversation:
  {chat_history}
  User: {input}
  AI:`;

// Configure Masking Parser
const maskingParser = new MaskingParser();
// Define transformations for masking emails and phone numbers using regular expressions
const piiMaskingTransformer = new RegexMaskingTransformer({
  email: { regex: /\S+@\S+\.\S+/g }, // If a regex is provided without a mask, we fall back to a simple default hashing function
  phone: { regex: /\d{3}-\d{3}-\d{4}/g },
});

maskingParser.addTransformer(piiMaskingTransformer);

export async function POST(req: Request) {
  try {
    const body = await req.json();
    const messages = body.messages ?? [];
    const formattedPreviousMessages = messages.slice(0, -1).map(formatMessage);
    const currentMessageContent = messages[messages.length - 1].content; // Extract the content of the last message
    // Mask sensitive information in the current message
    const guardedMessageContent = await maskingParser.mask(
      currentMessageContent
    );
    // Mask sensitive information in the chat history
    const guardedHistory = await maskingParser.mask(
      formattedPreviousMessages.join("\n")
    );

    const prompt = PromptTemplate.fromTemplate(CUSTOMER_SUPPORT);
    const model = new ChatOpenAI({ temperature: 0.8 });
    // Initialize an output parser that handles serialization and byte-encoding for streaming
    const outputParser = new BytesOutputParser();
    const chain = prompt.pipe(model).pipe(outputParser); // Chain the prompt, model, and output parser together

    console.log("[GUARDED INPUT]", guardedMessageContent); // Contact me at -1157967895 or -1626926859.
    console.log("[GUARDED HISTORY]", guardedHistory); // user: Contact me at -1157967895 or -1626926859. assistant: Thank you for providing your contact information.
    console.log("[STATE]", maskingParser.getState()); // { '-1157967895' => 'jane.doe@email.com', '-1626926859' => '555-123-4567'}

    // Stream the AI response based on the masked chat history and current message
    const stream = await chain.stream({
      chat_history: guardedHistory,
      input: guardedMessageContent,
    });

    return new Response(stream, {
      headers: { "content-type": "text/plain; charset=utf-8" },
    });
  } catch (e: any) {
    return new Response(JSON.stringify({ error: e.message }), {
      status: 500,
      headers: {
        "content-type": "application/json",
      },
    });
  }
}

Kitchen sink

import {
  MaskingParser,
  RegexMaskingTransformer,
} from "langchain/experimental/masking";

// A simple hash function for demonstration purposes
function simpleHash(input: string): string {
  let hash = 0;
  for (let i = 0; i < input.length; i += 1) {
    const char = input.charCodeAt(i);
    hash = (hash << 5) - hash + char;
    hash |= 0; // Convert to 32bit integer
  }
  return hash.toString(16);
}

const emailMask = (match: string) => `[email-${simpleHash(match)}]`;
const phoneMask = (match: string) => `[phone-${simpleHash(match)}]`;
const nameMask = (match: string) => `[name-${simpleHash(match)}]`;
const ssnMask = (match: string) => `[ssn-${simpleHash(match)}]`;
const creditCardMask = (match: string) => `[creditcard-${simpleHash(match)}]`;
const passportMask = (match: string) => `[passport-${simpleHash(match)}]`;
const licenseMask = (match: string) => `[license-${simpleHash(match)}]`;
const addressMask = (match: string) => `[address-${simpleHash(match)}]`;
const dobMask = (match: string) => `[dob-${simpleHash(match)}]`;
const bankAccountMask = (match: string) => `[bankaccount-${simpleHash(match)}]`;

// Regular expressions for different types of PII
const patterns = {
  email: { regex: /\S+@\S+\.\S+/g, mask: emailMask },
  phone: { regex: /\b\d{3}-\d{3}-\d{4}\b/g, mask: phoneMask },
  name: { regex: /\b[A-Z][a-z]+ [A-Z][a-z]+\b/g, mask: nameMask },
  ssn: { regex: /\b\d{3}-\d{2}-\d{4}\b/g, mask: ssnMask },
  creditCard: { regex: /\b(?:\d{4}[ -]?){3}\d{4}\b/g, mask: creditCardMask },
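  // JavaScript has no inline (?i) modifier; use the `i` flag for case-insensitive matching
  passport: { regex: /\b[A-Z]{1,2}\d{6,9}\b/gi, mask: passportMask },
  license: { regex: /\b[A-Z]{1,2}\d{6,8}\b/gi, mask: licenseMask },
  address: {
    // Street number followed by one or more capitalized words
    regex: /\b\d{1,5}\s[A-Z][a-z]+(?:\s[A-Z][a-z]+)*\b/g,
    mask: addressMask,
  },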
  dob: { regex: /\b\d{4}-\d{2}-\d{2}\b/g, mask: dobMask },
  bankAccount: { regex: /\b\d{8,17}\b/g, mask: bankAccountMask },
};

// Create a RegexMaskingTransformer with multiple patterns
const piiMaskingTransformer = new RegexMaskingTransformer(patterns);

// Hooks for different stages of masking and rehydrating
const onMaskingStart = (message: string) =>
  console.log(`Starting to mask message: ${message}`);
const onMaskingEnd = (maskedMessage: string) =>
  console.log(`Masked message: ${maskedMessage}`);
const onRehydratingStart = (message: string) =>
  console.log(`Starting to rehydrate message: ${message}`);
const onRehydratingEnd = (rehydratedMessage: string) =>
  console.log(`Rehydrated message: ${rehydratedMessage}`);

// Initialize MaskingParser with the transformer and hooks
const maskingParser = new MaskingParser({
  transformers: [piiMaskingTransformer],
  onMaskingStart,
  onMaskingEnd,
  onRehydratingStart,
  onRehydratingEnd,
});

// Example message containing multiple types of PII
const message =
  "Contact Jane Doe at jane.doe@email.com or 555-123-4567. Her SSN is 123-45-6789 and her credit card number is 1234-5678-9012-3456. Passport number: AB1234567, Driver's License: X1234567, Address: 123 Main St, Date of Birth: 1990-01-01, Bank Account: 12345678901234567.";

// Mask and rehydrate the message
maskingParser
  .mask(message)
  .then((maskedMessage: string) => {
    console.log(`Masked message: ${maskedMessage}`);
    return maskingParser.rehydrate(maskedMessage);
  })
  .then((rehydratedMessage: string) => {
    console.log(`Final rehydrated message: ${rehydratedMessage}`);
  });
