Generate Batch Prompt
🧩 Syntax:
// This node runs once for the single batch item.
// It constructs a dynamic prompt for the LLM to analyze all files in the batch.
const item = $input.item;
const fileBatch = item.json.fileBatch;
// 1. Create a string representation of the files for the prompt.
// We give the LLM only what it needs for analysis and identification.
const filesForPrompt = fileBatch.map(f => ({
  filePath: f.originalItem.filePath, // Use the real, full file path
  fileExtension: f.originalItem.fileExtension,
  fileContent: f.originalItem.fileContent
}));
const filesString = JSON.stringify(filesForPrompt, null, 2);
// 2. Define the revised prompt for BATCH analysis.
const batchPrompt = `
## ROLE & EXPERTISE
You are a specialized RAG systems architect with deep expertise in document preprocessing, chunking methodologies, and vector retrieval optimization. Your analysis directly impacts retrieval quality and computational efficiency in production RAG pipelines.
## ANALYSIS CONTEXT
You have been provided with a JSON array of multiple file objects. Your task is to analyze each file object within this array.
- **Pipeline Stage**: Pre-processing for vector embedding and retrieval
- **Target Use Case**: Multi-repository codebase analysis and knowledge extraction
## BATCH OF FILE OBJECTS TO ANALYZE
\`\`\`json
${filesString}
\`\`\`
## CHUNKING STRATEGY OPTIONS
### 'code'
- **Use For**: Programming languages (.py, .js, .java, .cpp, .go, .rs, etc.)
- **Method**: AST-aware recursive splitting respecting function/class boundaries
- **Optimal For**: Preserving semantic code blocks, maintaining context
### 'recursive'
- **Use For**: Structured markup (.md, .html, .xml, .rst, .tex)
- **Method**: Hierarchical splitting on headers, sections, structural elements
- **Optimal For**: Documents with clear logical structure
### 'semantic'
- **Use For**: Natural language content (.txt, documentation, prose)
- **Method**: Sentence/paragraph boundary-aware splitting
- **Optimal For**: Maintaining contextual meaning and readability
### 'do_not_chunk'
- **Use For**: Binary files, small configs, media, or files where chunking destroys utility
- **Method**: Process as single unit or skip entirely
- **Optimal For**: Preserving file integrity
## ANALYSIS FRAMEWORK
For EACH file object in the provided array, you must perform the following analysis:
1. **Content Structure Assessment**: Analyze the provided content's syntax, formatting, and logical organization.
2. **Semantic Density Evaluation**: Determine information distribution patterns in the actual content.
3. **Context Dependency Analysis**: Identify cross-reference and dependency patterns within the content.
4. **Retrieval Optimization**: Consider how chunks from this specific content will perform in vector similarity search.
5. **Content Summarization**: Create a brief, one-sentence summary of the file's primary purpose or content.
## SIZE & OVERLAP GUIDELINES
- **Code**: 800-1500 chars (preserve function scope), 150-300 overlap
- **Structured Text**: 1000-2000 chars (complete sections), 200-400 overlap
- **Prose/Documentation**: 1200-2500 chars (complete thoughts), 300-500 overlap
- **Configuration**: Assess if chunking adds value vs. whole-file processing
## OUTPUT REQUIREMENTS
Your entire response MUST be a single, valid JSON object.
This object must contain a single key: "analysisResults".
The value of "analysisResults" must be a JSON array. Each element in this array is an object corresponding to a file from the input batch. For each file, the object must have this exact structure:
\`\`\`json
{
  "filePath": "[The full path of the file, copied exactly from the input]",
  "contentSummary": "[A one-sentence summary of the file's purpose based on its content]",
  "chunkingStrategy": "[code|recursive|semantic|do_not_chunk]",
  "reasoning": "[Concise technical justification based on the actual content analysis above]",
  "recommendedChunkSize": [integer|null],
  "recommendedChunkOverlap": [integer|null]
}
\`\`\`
## **CRITICAL INSTRUCTIONS**
- Analyze EACH file object in the BATCH OF FILE OBJECTS provided above.
- Your output array "analysisResults" MUST contain exactly one object for each file object in the input array.
- The "filePath" in your output objects MUST EXACTLY MATCH the "filePath" from the corresponding input object.
- Base your decision on the content structure, not just the file extension.
- Prioritize retrieval quality over processing speed.
- Consider the file's role in the broader repository context.
- Your entire response must be ONLY the JSON object that strictly adheres to the schema, starting with \`{\` and ending with \`}\`. Do not include markdown, comments, or any other text outside the JSON structure.
`;
// 3. Add the generated prompt to the JSON data.
item.json.batchPrompt = batchPrompt;
// Return the modified item.
return item;
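The node above only builds the prompt; the model's reply still has to be parsed downstream. Here is a minimal sketch of a follow-up parsing step, assuming the LLM's raw text arrives in a field such as `item.json.llmResponse` (the field name is an assumption -- adapt it to your LLM node's actual output shape). It also defensively strips markdown fences, since models sometimes wrap JSON in ```` ```json ```` blocks despite strict instructions:

```javascript
// Hypothetical downstream step: parse and validate the LLM's batch analysis.
function parseAnalysisResults(rawResponse) {
  // Strip optional markdown code fences that some models add anyway.
  const cleaned = rawResponse
    .replace(/^\s*```(?:json)?\s*/i, '')
    .replace(/\s*```\s*$/, '')
    .trim();

  const parsed = JSON.parse(cleaned); // throws on malformed JSON

  if (!Array.isArray(parsed.analysisResults)) {
    throw new Error('LLM response is missing the "analysisResults" array');
  }

  // Reject strategies outside the set the prompt allows.
  const validStrategies = new Set(['code', 'recursive', 'semantic', 'do_not_chunk']);
  for (const result of parsed.analysisResults) {
    if (!validStrategies.has(result.chunkingStrategy)) {
      throw new Error(
        `Unknown chunkingStrategy for ${result.filePath}: ${result.chunkingStrategy}`
      );
    }
  }
  return parsed.analysisResults;
}
```

Validating the strategy values here, rather than trusting the model, means a malformed reply fails loudly in this node instead of silently mis-chunking files further down the pipeline.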