|
8548
|
FirefoxFileEoitViewHistoryBookmarksProfilesToolsWi FirefoxFileEoitViewHistoryBookmarksProfilesToolsWindow Helpgithub.com/jiminny/prophet/pull/468/changes#diff-d2f9fea9bc428b119ae26a66453e8eab7dd0782d4a4d93075ca5ce05903d77e8~ Google Gemini...l1 OpenJy 19798 evaluation for ai activity types #468All commitsPlatform Sprint 1 Q2 - Platform Tea@ Jy 19798 evaluation for ai activ+ New labnikolaybiaivanov wants to merge 9 commits into master from JY-19798-evaluation-for-ai-activity-types [v src/eval/eval-ai-activity-type.py SQ Filter files...• E eval_output/ai_activity_type/eval…..> • ground_truth>I inoutE src/evalE endpoints.pyE eval-ai-activity-type.py#.gitignore293294299are there soem issueswith this PRShow thinking vYes, there are several issues and code smellsin this Pull Request, ranging from performancebottlenecks to architectural anti-patterns.Here is a breakdown of the most notableissues, categorized by impact:Performance & Async Anti-Patterns• Sequential Requests Defeating Asyncluines ch-0onsioe_collect_ from_es, the code loopsthrough raw_samples and await sasession.post request for each sampleone by one. This means the script waitsfor one HTTP request to finish beforestarting the next, completely defeatingthe purpose of using aiohttp forasynchronous concurrency.• Fix: Collect these coroutines into alist and use asyncio.gather()(ideally wrapped in anasyncio. Semaphore to avoidoverwhelming the endpoint) to runthe replayability probes concurrently.• Blocking File I/O in Async Functions(Lines 110-114, 184-188, 268-269):Standard synchronous file operations(with open(...), json. dump,json.load ) are used inside async def• Enter a prompt for Gemini+Pro vYour Jiminny chats aren't used to improve our models. Gemini is Alane can make mistakes, Inciualne about peodleYour privacy & GeminiSummarize page328329330 +→0 • Daily - Platform • nowA100% CD8 • Tue 14 Apr 9:51:28• 2 1 203 viewed• Awaltind aporoval+330 UUUUu• Viewedparser = create_parser(AiActivityTypeEvaluator)parser.aad arcument"_-data-set",lyoe-sulndetault="initial",dest="data_set",nelo= Vataser Toler uncer eval outvurral acuvity tviereval catasets, etault.initial)",parser.aad arcument"--populate-ground-truth-1lm",action="store_true",dest="populate_ground_truth_llm",parser.add araument"--ground-truth-llm-endpoint",type=str,default="OPENAI: gpt-5_2",dest="ground_truth_llm_endpoint",parser.add_argument("--compare-to-ground-truth",type=str,choices=["es", "1lm"],default="es",dest="compare_to_ground_truth",aros = parser.parse arasAiActivityTypeEvaluator.DATA_SET = args.data_setA1Activitylypeevaluator.POPULATE_GROUND_TRUTH_LLM = bool(args.populate_ground_truth_Lm)AiActivityTypeEvaluator.GROUND_TRUTH_LLM_ENDPOINT = args.ground_truth_llm_endpointAiActivityTypeEvaluator.COMPARE_TO_GROUND_TRUTH = args.compare_to_ground_truthasyncio.run(run_evaluation(AiActivityTypeEvaluator, args))> .gitignore & ++4 0000• Viewed...You made it to the end!...
|
NULL
|
NULL
|
NULL
|
8548
|
|
8547
|
FirefoxFileEoitViewHistoryBookmarksProfilesToolsWi FirefoxFileEoitViewHistoryBookmarksProfilesToolsWindow Helpgithub.com/jiminny/prophet/pull/468/changes#diff-d2f9fea9bc428b119ae26a66453e8eab7dd0782d4a4d93075ca5ce05903d77e8~ Google Gemini...l1 OpenJy 19798 evaluation for ai activity types #468All commitsPlatform Sprint 1 Q2 - Platform Tea@ Jy 19798 evaluation for ai activ+ New labnikolaybiaivanov wants to merge 9 commits into master from JY-19798-evaluation-for-ai-activity-types [v src/eval/eval-ai-activity-type.py SQ Filter files...• E eval_output/ai_activity_type/eval…..> • ground_truth>I inoutE src/eval* endpoints.pyE eval-ai-activity-type.pyE .gitignore293294299are there soem issueswith this PRShow thinking vYes, there are several issues and code smellsin this Pull Request, ranging from performancebottlenecks to architectural anti-patterns.Here is a breakdown of the most notableissues, categorized by impact:Performance & Async Anti-Patterns• Sequential Requests Defeating AsyncTuines ch-06nsioe_collect_ from_es, the code loopsthrough raw_samples and await sasession.post request for each sampleone by one. This means the script waitsfor one HTTP request to finish beforestarting the next, completely defeatingthe purpose of using aiohttp forasynchronous concurrency.• Fix: Collect these coroutines into alist and use asyncio.gather()(ideally wrapped in anasyncio. Semaphore to avoidoverwhelming the endpoint) to runthe replayability probes concurrently.• Blocking File I/O in Async Functions(Lines 110-114, 184-188, 268-269):Standard synchronous file operations(with open(...), json. dump,json.load ) are used inside async def• Enter a prompt for Gemini+Pro vYour Jiminny chats aren't used to improve our models. Gemini is Alane can make mistakes, Inciualne aoout peodleYour privacy & GeminiSummarize page329330 +‹-0ll • | Daily - Platform • nowA100% CD8 • Tue 14 Apr 9:51:26• 2 1 203 viewed• Awaltind aporoval+330 UUUUu• Viewedparser = create_parser(AiActivityTypeEvaluator)parser.aad arcument"_-data-set",lyoe-sutndetault="initial",dest="data_set",nelo= Vataser Toler uncer eval outvurral acuvity tviereval catasets, etault.initial)",parser.aad arcument"--populate-ground-truth-1lm",action="store_true",dest="populate_ground_truth_llm",parser.add araument"--ground-truth-llm-endpoint",type=str,default="OPENAI: gpt-5_2",dest="ground_truth_llm_endpoint",parser.add_argument("--compare-to-ground-truth",type=str,choices=["es", "1lm"],default="es",dest="compare_to_ground_truth",aros = parser.parse arasAiActivityTypeEvaluator.DATA_SET = args.data_setA1Activitylypeevaluator.POPULATE_GROUND_TRUTH_LLM = bool(args.populate_ground_truth_Lm)AiActivityTypeEvaluator.GROUND_TRUTH_LLM_ENDPOINT = args.ground_truth_llm_endpointAiActivityTypeEvaluator.COMPARE_TO_GROUND_TRUTH = args.compare_to_ground_truthasyncio.run(run_evaluation(AiActivityTypeEvaluator, args))> .gitignore & ++4 0000• ViewedYou made it to the end!...
|
NULL
|
NULL
|
NULL
|
8547
|
|
8546
|
JY-20543 add AJ reports User pilot tracking by Lak JY-20543 add AJ reports User pilot tracking by LakyLak · Pull Request #11932 · jiminny/app
Platform Sprint 1 Q2 - Platform Team - Scrum Board - Jira
Platform Sprint 1 Q2 - Platform Team - Scrum Board - Jira
Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet
Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet
Close tab
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Tabs from other devices
Open history (⇧⌘H)
Open bookmarks (⌘B)
AI Chat settings
Close
WORK, Google Account: [EMAIL]
Main menu
New Chat
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said I’m on page “<tabTitle>Jy 19798 evaluation for ai activity types by nikol</tabTitle>” with “<selection>@@ -0,0 +1,330 @@1+#!/usr/bin/env python32+import asyncio3+import json4+import os5+from pathlib import Path6+from typing import Any, Dict, List7+8+import aiohttp9+10+from src.component.call_processing.schemes.ai_activity_type_request import AiActivityTypeRequest11+from src.component.call_processing.schemes.ai_activity_type_response import AiActivityTypeResponse12+from src.eval.abstract_evaluator import AbstractEvaluator13+from src.eval.collectors import collect_test_set as es_collect_test_set14+from src.eval.common_evaluator_runner import create_parser, run_evaluation15+from src.eval.endpoints import run_endpoint as run_endpoint_http16+from src.eval.models import EvaluationResult17+18+19+class AiActivityTypeEvaluator(AbstractEvaluator):20+ENDPOINT_PATH = "call/ai-activity-type"21+REQUEST_MODEL = AiActivityTypeRequest22+RESPONSE_MODEL = AiActivityTypeResponse23+OUTPUT_SUBFOLDER = "ai_activity_type"24+USES_LLM_EVALUATION = False25+26+READS_FROM_DATASET_FOLDER = True27+28+DATA_SET: str = "initial"29+30+MINIMUM_REQUEST_BODY_SIZE: int = 031+32+SKIP_UNREPLAYABLE_ON_COLLECT: bool = True33+REPLAYABILITY_CHECK_TIMEOUT_SECONDS: float = 20.034+35+POPULATE_GROUND_TRUTH_LLM: bool = False36+GROUND_TRUTH_LLM_ENDPOINT: str = "OPENAI:gpt-5_2"37+38+COMPARE_TO_GROUND_TRUTH: str = "es"39+40+def __init__(self, **kwargs: Any) -> None:41+super().__init__(**kwargs)42+_repo_root = Path(__file__).parents[2]43+_datasets_root = _repo_root / "eval_output" / self.OUTPUT_SUBFOLDER / "eval_datasets"44+_dataset_dir = _datasets_root / self.__class__.DATA_SET45+self._dataset_dir = _dataset_dir46+self._inputs_dir = _dataset_dir / "input"47+self._ground_truths_dir = _dataset_dir / "ground_truth"48+49+async def collect_test_set(self) -> List[Dict[str, Any]]:50+if self.es_host_name:51+return await self._collect_from_es()52+return await self._collect_from_files()53+54+async def _collect_from_es(self) -> List[Dict[str, Any]]:55+if self._inputs_dir.exists() or self._ground_truths_dir.exists():56+raise FileExistsError(57+f"Dataset folder already exists: {self._inputs_dir.parent}. "58+"Choose a different --data-set name or remove the folder manually."59+ )60+61+raw_samples = await es_collect_test_set(62+es_host_name=self.es_host_name,63+es_port=self.es_port,64+endpoint_path=self.ENDPOINT_PATH,65+limit=self.limit,66+language_filter=self.language_filter,67+min_prompt_length=self.min_prompt_length,68+is_json_response=self.IS_JSON_RESPONSE,69+exclude_languages=self.exclude_languages,70+ )71+72+self._inputs_dir.mkdir(parents=True, exist_ok=True)73+self._ground_truths_dir.mkdir(parents=True, exist_ok=True)74+75+min_size = self.MINIMUM_REQUEST_BODY_SIZE76+samples: List[Dict[str, Any]] = []77+skipped_no_response = 078+skipped_too_small = 079+skipped_unreplayable = 080+skipped_probe_errors = 081+82+endpoint_url = f"{self.api_host}/{self.ENDPOINT_PATH}"83+84+timeout = aiohttp.ClientTimeout(total=self.REPLAYABILITY_CHECK_TIMEOUT_SECONDS)85+async with aiohttp.ClientSession(timeout=timeout) as session:86+for sample in raw_samples:87+es_id = sample["id"]88+request_body = sample["request_body"]89+original_response = sample.get("original_response")90+91+if not original_response:92+skipped_no_response += 193+continue94+95+if min_size and len(json.dumps(request_body)) < min_size:96+skipped_too_small += 197+continue98+99+if self.SKIP_UNREPLAYABLE_ON_COLLECT:100+try:101+async with session.post(endpoint_url, json=request_body) as resp:102+if resp.status != 200:103+body_text = await resp.text()104+if "Cannot find activity" in body_text:105+skipped_unreplayable += 1106+continue107+except Exception:108+skipped_probe_errors += 1109+110+with open(self._inputs_dir / f"{es_id}.json", "w") as f:111+json.dump(request_body, f, indent=2)112+113+with open(self._ground_truths_dir / f"gt_{es_id}.json", "w") as f:114+json.dump(original_response, f, indent=2)115+116+samples.append(117+ {118+"id": es_id,119+"request_body": request_body,120+"original_response": original_response,121+"ground_truth": original_response,122+ }123+ )124+125+print(f"Wrote {len(samples)} samples to {self._inputs_dir.parent}")126+if skipped_no_response:127+print(f"Skipped {skipped_no_response} records with no response body")128+if skipped_too_small:129+print(130+f"Skipped {skipped_too_small} records whose request body was smaller "131+f"than {min_size} characters"132+ )133+if skipped_unreplayable:134+print(f"Skipped {skipped_unreplayable} records that are not replayable (missing activity in ES)")135+if skipped_probe_errors:136+print(f"Warning: replayability probe errored for {skipped_probe_errors} records (kept those records)")137+138+if self.__class__.POPULATE_GROUND_TRUTH_LLM:139+label_results = await run_endpoint_http(140+test_set=samples,141+endpoint_url=endpoint_url,142+request_model=self.REQUEST_MODEL,143+force_llm_endpoint=self.__class__.GROUND_TRUTH_LLM_ENDPOINT,144+is_streaming_response=self.IS_STREAMING_RESPONSE,145+timeout_seconds=300.0,146+parallel_requests=self.parallel_requests,147+return_prompt_in_run=self.RETURN_PROMPT_IN_RUN,148+ )149+150+for i, r in enumerate(label_results):151+forced = self.__class__.GROUND_TRUTH_LLM_ENDPOINT152+model = forced.split(":", 1)[1] if ":" in forced else forced153+samples[i]["ground_truth_llm_model"] = model154+155+if not r.get("success", True):156+samples[i]["ground_truth_llm"] = None157+samples[i]["ground_truth_llm_reasoning"] = None158+samples[i]["ground_truth_llm_error_type"] = r.get("error_type")159+samples[i]["ground_truth_llm_error_message"] = r.get("error_message")160+continue161+162+resp = r.get("new_response") or {}163+samples[i]["ground_truth_llm"] = resp.get("ai_activity_type")164+samples[i]["ground_truth_llm_reasoning"] = resp.get("ai_activity_type_reasoning")165+166+return samples167+168+async def _collect_from_files(self) -> List[Dict[str, Any]]:169+if not self._inputs_dir.exists():170+raise FileNotFoundError(171+f"Dataset folder not found: {self._inputs_dir}. "172+"Run --steps collect first or check your --data-set name."173+ )174+175+samples: List[Dict[str, Any]] = []176+for filename in sorted(os.listdir(self._inputs_dir)):177+input_path = self._inputs_dir / filename178+gt_path = self._ground_truths_dir / f"gt_{filename}"179+180+if not gt_path.exists():181+print(f"Warning: ground truth not found for {filename}, skipping")182+continue183+184+with open(input_path, "r") as f:185+request_body: Dict[str, Any] = json.load(f)186+187+with open(gt_path, "r") as f:188+ground_truth: Dict[str, Any] = json.load(f)189+190+samples.append(191+ {192+"id": Path(filename).stem,193+"request_body": request_body,194+"original_response": None,195+"ground_truth": ground_truth,196+ }197+ )198+199+print(f"Loaded {len(samples)} samples from {self._inputs_dir.parent}")200+return samples201+202+async def evaluate_against_ground_truth(self, results: List[Dict[str, Any]]) -> List[EvaluationResult]:203+evaluation_results: List[EvaluationResult] = []204+compare_to = self.__class__.COMPARE_TO_GROUND_TRUTH205+206+for res in results:207+req_id = res["request_id"]208+is_success = res.get("success", True)209+210+if not is_success:211+evaluation_results.append(212+EvaluationResult(213+request_id=req_id,214+request_body=res["request_body"],215+new_response=res.get("new_response"),216+valid_schema=False,217+quality_score=0.0,218+quality_feedback=f"Request failed: {res.get('error_type')} - {res.get('error_message')}",219+response_time=res.get("response_time"),220+success=False,221+error_type=res.get("error_type"),222+error_message=res.get("error_message"),223+ground_truth_match=False,224+ )225+ )226+continue227+228+pred = (res.get("new_response") or {}).get("ai_activity_type")229+230+if compare_to == "llm":231+if "ground_truth_llm" not in res:232+evaluation_results.append(233+EvaluationResult(234+request_id=req_id,235+request_body=res["request_body"],236+new_response=res.get("new_response"),237+valid_schema=res.get("valid_schema", True),238+quality_score=0.0,239+quality_feedback=(240+"Missing ground_truth_llm in run results. Ensure collect step populated it and "241+"endpoints.py passes it through."242+ ),243+response_time=res.get("response_time"),244+success=True,245+ground_truth_match=False,246+ )247+ )248+continue249+gt_label = res.get("ground_truth_llm")250+else:251+gt_path = self._ground_truths_dir / f"gt_{req_id}.json"252+if not gt_path.exists():253+evaluation_results.append(254+EvaluationResult(255+request_id=req_id,256+request_body=res["request_body"],257+new_response=res.get("new_response"),258+valid_schema=res.get("valid_schema", True),259+quality_score=0.0,260+quality_feedback=f"Ground truth file not found: {gt_path}",261+response_time=res.get("response_time"),262+success=True,263+ground_truth_match=False,264+ )265+ )266+continue267+268+with open(gt_path, "r") as f:269+ground_truth_dict: Dict[str, Any] = json.load(f)270+271+gt_label = (ground_truth_dict or {}).get("ai_activity_type")272+273+matched = pred == gt_label274+275+evaluation_results.append(276+EvaluationResult(277+request_id=req_id,278+request_body=res["request_body"],279+new_response=res.get("new_response"),280+valid_schema=res.get("valid_schema", True),281+quality_score=1.0 if matched else 0.0,282+quality_feedback="PASS" if matched else f"FAIL\nExpected: {gt_label}\nGot: {pred}",283+response_time=res.get("response_time"),284+success=True,285+ground_truth_match=matched,286+ )287+ )288+289+return evaluation_results290+291+292+if __name__ == "__main__":293+parser = create_parser(AiActivityTypeEvaluator)294+parser.add_argument(295+"--data-set",296+type=str,297+default="initial",298+dest="data_set",299+help="Dataset folder under eval_output/ai_activity_type/eval_datasets/ (default: initial)",300+ )301+302+parser.add_argument(303+"--populate-ground-truth-llm",304+action="store_true",305+dest="populate_ground_truth_llm",306+ )307+parser.add_argument(308+"--ground-truth-llm-endpoint",309+type=str,310+default="OPENAI:gpt-5_2",311+dest="ground_truth_llm_endpoint",312+ )313+314+parser.add_argument(315+"--compare-to-ground-truth",316+type=str,317+choices=["es", "llm"],318+default="es",319+dest="compare_to_ground_truth",320+ )321+322+args = parser.parse_args()323+AiActivityTypeEvaluator.DATA_SET = args.data_set324+325+AiActivityTypeEvaluator.POPULATE_GROUND_TRUTH_LLM = bool(args.populate_ground_truth_llm)326+AiActivityTypeEvaluator.GROUND_TRUTH_LLM_ENDPOINT = args.ground_truth_llm_endpoint327+328+AiActivityTypeEvaluator.COMPARE_TO_GROUND_TRUTH = args.compare_to_ground_truth329+330+asyncio.run(run_evaluation(AiActivityTypeEvaluator, args))</selection>” selected. Please summarize the selection using precise and concise language. Use headers and bulleted lists in the summary, to make it scannable. Maintain the meaning and factual accuracy.
You said
I’m on page “<tabTitle>Jy 19798 evaluation for ai activity types by nikol</tabTitle>” with “<selection>@@ -0,0 +1,330 @@1+#!/usr/bin/env python32+import asyncio3+import json4+import os5+from pathlib import Path6+from typing import Any, Dict, List7+8+import aiohttp9+10+from src.component.call_processing.schemes.ai_activity_type_request import AiActivityTypeRequest11+from src.component.call_processing.schemes.ai_activity_type_response import AiActivityTypeResponse12+from src.eval.abstract_evaluator import AbstractEvaluator13+from src.eval.collectors import collect_test_set as es_collect_test_set14+from src.eval.common_evaluator_runner import create_parser, run_evaluation15+from src.eval.endpoints import run_endpoint as run_endpoint_http16+from src.eval.models import EvaluationResult17+18+19+class AiActivityTypeEvaluator(AbstractEvaluator):20+ENDPOINT_PATH = "call/ai-activity-type"21+REQUEST_MODEL = AiActivityTypeRequest22+RESPONSE_MODEL = AiActivityTypeResponse23+OUTPUT_SUBFOLDER = "ai_activity_type"24+USES_LLM_EVALUATION = False25+26+READS_FROM_DATASET_FOLDER = True27+28+DATA_SET: str = "initial"29+30+MINIMUM_REQUEST_BODY_SIZE: int = 031+32+SKIP_UNREPLAYABLE_ON_COLLECT: bool = True33+REPLAYABILITY_CHECK_TIMEOUT_SECONDS: float = 20.034+35+POPULATE_GROUND_TRUTH_LLM: bool = False36+GROUND_TRUTH_LLM_ENDPOINT: str = "OPENAI:gpt-5_2"37+38+COMPARE_TO_GROUND_TRUTH: str = "es"39+40+def __init__(self, **kwargs: Any) -> None:41+super().__init__(**kwargs)42+_repo_root = Path(__file__).parents[2]43+_datasets_root = _repo_root / "eval_output" / self.OUTPUT_SUBFOLDER / "eval_datasets"44+_dataset_dir = _datasets_root / self.__class__.DATA_SET45+self._dataset_dir = _dataset_dir46+self._inputs_dir = _dataset_dir / "input"47+self._ground_truths_dir = _dataset_dir / "ground_truth"48+49+async def collect_test_set(self) -> List[Dict[str, Any]]:50+if self.es_host_name:51+return await self._collect_from_es()52+return await self._collect_from_files()53+54+async def _collect_from_es(self) -> List[Dict[str, Any]]:55+if self._inputs_dir.exists() or self._ground_truths_dir.exists():56+raise FileExistsError(57+f"Dataset folder already exists: {self._inputs_dir.parent}. "58+"Choose a different --data-set name or remove the folder manually."59+ )60+61+raw_samples = await es_collect_test_set(62+es_host_name=self.es_host_name,63+es_port=self.es_port,64+endpoint_path=self.ENDPOINT_PATH,65+limit=self.limit,66+language_filter=self.language_filter,67+min_prompt_length=self.min_prompt_length,68+is_json_response=self.IS_JSON_RESPONSE,69+exclude_languages=self.exclude_languages,70+ )71+72+self._inputs_dir.mkdir(parents=True, exist_ok=True)73+self._ground_truths_dir.mkdir(parents=True, exist_ok=True)74+75+min_size = self.MINIMUM_REQUEST_BODY_SIZE76+samples: List[Dict[str, Any]] = []77+skipped_no_response = 078+skipped_too_small = 079+skipped_unreplayable = 080+skipped_probe_errors = 081+82+endpoint_url = f"{self.api_host}/{self.ENDPOINT_PATH}"83+84+timeout = aiohttp.ClientTimeout(total=self.REPLAYABILITY_CHECK_TIMEOUT_SECONDS)85+async with aiohttp.ClientSession(timeout=timeout) as session:86+for sample in raw_samples:87+es_id = sample["id"]88+request_body = sample["request_body"]89+original_response = sample.get("original_response")90+91+if not original_response:92+skipped_no_response += 193+continue94+95+if min_size and len(json.dumps(request_body)) < min_size:96+skipped_too_small += 197+continue98+99+if self.SKIP_UNREPLAYABLE_ON_COLLECT:100+try:101+async with session.post(endpoint_url, json=request_body) as resp:102+if resp.status != 200:103+body_text = await resp.text()104+if "Cannot find activity" in body_text:105+skipped_unreplayable += 1106+continue107+except Exception:108+skipped_probe_errors += 1109+110+with open(self._inputs_dir / f"{es_id}.json", "w") as f:111+json.dump(request_body, f, indent=2)112+113+with open(self._ground_truths_dir / f"gt_{es_id}.json", "w") as f:114+json.dump(original_response, f, indent=2)115+116+samples.append(117+ {118+"id": es_id,119+"request_body": request_body,120+"original_response": original_response,121+"ground_truth": original_response,122+ }123+ )124+125+print(f"Wrote {len(samples)} samples to {self._inputs_dir.parent}")126+if skipped_no_response:127+print(f"Skipped {skipped_no_response} records with no response body")128+if skipped_too_small:129+print(130+f"Skipped {skipped_too_small} records whose request body was smaller "131+f"than {min_size} characters"132+ )133+if skipped_unreplayable:134+print(f"Skipped {skipped_unreplayable} records that are not replayable (missing activity in ES)")135+if skipped_probe_errors:136+print(f"Warning: replayability probe errored for {skipped_probe_errors} records (kept those records)")137+138+if self.__class__.POPULATE_GROUND_TRUTH_LLM:139+label_results = await run_endpoint_http(140+test_set=samples,141+endpoint_url=endpoint_url,142+request_model=self.REQUEST_MODEL,143+force_llm_endpoint=self.__class__.GROUND_TRUTH_LLM_ENDPOINT,144+is_streaming_response=self.IS_STREAMING_RESPONSE,145+timeout_seconds=300.0,146+parallel_requests=self.parallel_requests,147+return_prompt_in_run=self.RETURN_PROMPT_IN_RUN,148+ )149+150+for i, r in enumerate(label_results):151+forced = self.__class__.GROUND_TRUTH_LLM_ENDPOINT152+model = forced.split(":", 1)[1] if ":" in forced else forced153+samples[i]["ground_truth_llm_model"] = model154+155+if not r.get("success", True):156+samples[i]["ground_truth_llm"] = None157+samples[i]["ground_truth_llm_reasoning"] = None158+samples[i]["ground_truth_llm_error_type"] = r.get("error_type")159+samples[i]["ground_truth_llm_error_message"] = r.get("error_message")160+continue161+162+resp = r.get("new_response") or {}163+samples[i]["ground_truth_llm"] = resp.get("ai_activity_type")164+samples[i]["ground_truth_llm_reasoning"] = resp.get("ai_activity_type_reasoning")165+166+return samples167+168+async def _collect_from_files(self) -> List[Dict[str, Any]]:169+if not self._inputs_dir.exists():170+raise FileNotFoundError(171+f"Dataset folder not found: {self._inputs_dir}. "172+"Run --steps collect first or check your --data-set name."173+ )174+175+samples: List[Dict[str, Any]] = []176+for filename in sorted(os.listdir(self._inputs_dir)):177+input_path = self._inputs_dir / filename178+gt_path = self._ground_truths_dir / f"gt_{filename}"179+180+if not gt_path.exists():181+print(f"Warning: ground truth not found for {filename}, skipping")182+continue183+184+with open(input_path, "r") as f:185+request_body: Dict[str, Any] = json.load(f)186+187+with open(gt_path, "r") as f:188+ground_truth: Dict[str, Any] = json.load(f)189+190+samples.append(191+ {192+"id": Path(filename).stem,193+"request_body": request_body,194+"original_response": None,195+"ground_truth": ground_truth,196+ }197+ )198+199+print(f"Loaded {len(samples)} samples from {self._inputs_dir.parent}")200+return samples201+202+async def evaluate_against_ground_truth(self, results: List[Dict[str, Any]]) -> List[EvaluationResult]:203+evaluation_results: List[EvaluationResult] = []204+compare_to = self.__class__.COMPARE_TO_GROUND_TRUTH205+206+for res in results:207+req_id = res["request_id"]208+is_success = res.get("success", True)209+210+if not is_success:211+evaluation_results.append(212+EvaluationResult(213+request_id=req_id,214+request_body=res["request_body"],215+new_response=res.get("new_response"),216+valid_schema=False,217+quality_score=0.0,218+quality_feedback=f"Request failed: {res.get('error_type')} - {res.get('error_message')}",219+response_time=res.get("response_time"),220+success=False,221+error_type=res.get("error_type"),222+error_message=res.get("error_message"),223+ground_truth_match=False,224+ )225+ )226+continue227+228+pred = (res.get("new_response") or {}).get("ai_activity_type")229+230+if compare_to == "llm":231+if "ground_truth_llm" not in res:232+evaluation_results.append(233+EvaluationResult(234+request_id=req_id,235+request_body=res["request_body"],236+new_response=res.get("new_response"),237+valid_schema=res.get("valid_schema", True),238+quality_score=0.0,239+quality_feedback=(240+"Missing ground_truth_llm in run results. Ensure collect step populated it and "241+"endpoints.py passes it through."242+ ),243+response_time=res.get("response_time"),244+success=True,245+ground_truth_match=False,246+ )247+ )248+continue249+gt_label = res.get("ground_truth_llm")250+else:251+gt_path = self._ground_truths_dir / f"gt_{req_id}.json"252+if not gt_path.exists():253+evaluation_results.append(254+EvaluationResult(255+request_id=req_id,256+request_body=res["request_body"],257+new_response=res.get("new_response"),258+valid_schema=res.get("valid_schema", True),259+quality_score=0.0,260+quality_feedback=f"Ground truth file not found: {gt_path}",261+response_time=res.get("response_time"),262+success=True,263+ground_truth_match=False,264+ )265+ )266+continue267+268+with open(gt_path, "r") as f:269+ground_truth_dict: Dict[str, Any] = json.load(f)270+271+gt_label = (ground_truth_dict or {}).get("ai_activity_type")272+273+matched = pred == gt_label274+275+evaluation_results.append(276+EvaluationResult(277+request_id=req_id,278+request_body=res["request_body"],279+new_response=res.get("new_response"),280+valid_schema=res.get("valid_schema", True),281+quality_score=1.0 if matched else 0.0,282+quality_feedback="PASS" if matched else f"FAIL\nExpected: {gt_label}\nGot: {pred}",283+response_time=res.get("response_time"),284+success=True,285+ground_truth_match=matched,286+ )287+ )288+289+return evaluation_results290+291+292+if __name__ == "__main__":293+parser = create_parser(AiActivityTypeEvaluator)294+parser.add_argument(295+"--data-set",296+type=str,297+default="initial",298+dest="data_set",299+help="Dataset folder under eval_output/ai_activity_type/eval_datasets/ (default: initial)",300+ )301+302+parser.add_argument(303+"--populate-ground-truth-llm",304+action="store_true",305+dest="populate_ground_truth_llm",306+ )307+parser.add_argument(308+"--ground-truth-llm-endpoint",309+type=str,310+default="OPENAI:gpt-5_2",311+dest="ground_truth_llm_endpoint",312+ )313+314+parser.add_argument(315+"--compare-to-ground-truth",316+type=str,317+choices=["es", "llm"],318+default="es",319+dest="compare_to_ground_truth",320+ )321+322+args = parser.parse_args()323+AiActivityTypeEvaluator.DATA_SET = args.data_set324+325+AiActivityTypeEvaluator.POPULATE_GROUND_TRUTH_LLM = bool(args.populate_ground_truth_llm)326+AiActivityTypeEvaluator.GROUND_TRUTH_LLM_ENDPOINT = args.ground_truth_llm_endpoint327+328+AiActivityTypeEvaluator.COMPARE_TO_GROUND_TRUTH = args.compare_to_ground_truth329+330+asyncio.run(run_evaluation(AiActivityTypeEvaluator, args))</selection>” selected.
Please summarize the selection using precise and concise language. Use headers and bulleted lists in the summary, to make it scannable. Maintain the meaning and factual accuracy.
Expand
Listen
Show more options
Gemini said
Gemini said
This Python script defines the
AiActivityTypeEvaluator
class, a specialized tool for evaluating AI-driven activity type classification. It extends a base evaluation framework to handle data collection, ground-truth comparison, and performance scoring.
Core Functionality
Core Functionality
Data Collection (
collect_test_set
):
* Supports fetching raw samples from
Elasticsearch (ES)
or loading them from
local JSON files
.
Implements a
replayability check
: It probes the live API endpoint to ensure the activity exists before including it in the dataset.
Filters data based on request body size, language, and the presence of an original response.
Ground Truth Management:
Saves inputs and "gold standard" responses to a local directory structure (
eval_output/ai_activity_type/...
).
Features an optional mode to populate ground truth using a high-order LLM (e.g.,
gpt-5_2
) if the
POPULATE_GROUND_TRUTH_LLM
flag is set.
Evaluation Logic (
evaluate_against_ground_truth
):
Compares the model's predicted
ai_activity_type
against either the original ES response or the LLM-generated ground truth.
Assigns a
quality_score
of
1.0
for a match and
0.0
for a mismatch or request failure.
Key Configuration Constants
Key Configuration Constants
Constant
Purpose
ENDPOINT_PATH
The specific API route being tested:
call/ai-activity-type
.
COMPARE_TO_GROUND_TRUTH
Determines if the baseline is the original data (
es
) or an
llm
.
SKIP_UNREPLAYABLE_ON_COLLECT
If
True
, discards records where the API returns "Cannot find activity."
REPLAYABILITY_CHECK_TIMEOUT_SECONDS
Limits the probe request to 20 seconds.
Constant
ENDPOINT_PATH
COMPARE_TO_GROUND_TRUTH
SKIP_UNREPLAYABLE_ON_COLLECT
REPLAYABILITY_CHECK_TIMEOUT_SECONDS
Purpose
The specific API route being tested:
call/ai-activity-type
.
Determines if the baseline is the original data (
es
) or an
llm
.
If
True
, discards records where the API returns "Cannot find activity."
Limits the probe request to 20 seconds.
Copy table
Execution Workflow
Execution Workflow
Parsing:
The script uses
create_parser
to handle CLI arguments like
--data-set
,
--populate-ground-truth-llm
, and
--compare-to-ground-truth
.
Initialization:
Sets up directory paths for inputs and ground truths based on the selected dataset name.
Run:
Executes the evaluation via
asyncio.run(run_evaluation(...))
, which orchestrates the collection, endpoint calling, and result scoring phases.
Copy prompt
Edit
You said are there soem issues with this PR
You said
are there soem issues with this PR
Listen
Show more options
Show thinking
Gemini said
Gemini said
Yes, there are several issues and code smells in this Pull Request, ranging from performance bottlenecks to architectural anti-patterns.
Here is a breakdown of the most notable issues, categorized by impact:
Performance & Async Anti-Patterns
Performance & Async Anti-Patterns
Sequential Requests Defeating Async (Lines 86-106):
Inside
_collect_from_es
, the code loops through
raw_samples
and
await
s a
session.post
request for each sample one by one. This means the script waits for one HTTP request to finish before starting the next, completely defeating the purpose of using
aiohttp
for asynchronous concurrency.
Fix:
Collect these coroutines into a list and use
asyncio.gather()
(ideally wrapped in an
asyncio.Semaphore
to avoid overwhelming the endpoint) to run the replayability probes concurrently.
Blocking File I/O in Async Functions (Lines 110-114, 184-188, 268-269):
Standard synchronous file operations (
with open(...)
,
json.dump
,
json.load
) are used inside
async def
functions. In Python, this blocks the entire event loop while the disk is reading/writing.
Fix:
Use an asynchronous file I/O library like
aiofiles
, or run the blocking I/O operations in a separate thread using
asyncio.to_thread()
.
Architecture & State Management
Architecture & State Management
Mutating Class-Level Variables (Lines 323-328):
The script parses CLI arguments and directly mutates the class attributes (e.g.,
AiActivityTypeEvaluator.DATA_SET = args.data_set
). This is a major anti-pattern. If you ever try to run tests in parallel or create multiple instances of this evaluator in a single process, the state will bleed across them.
Fix:
Pass these arguments into the
__init__
constructor of the instance (
self.data_set = args.data_set
) rather than modifying the class definition globally.
Code Quality & Edge Cases
Code Quality & Edge Cases
Broad Exception Catching (Line 107):
Using a bare
except Exception:
during the replayability probe is dangerous. It will catch
aiohttp.ClientError
(which is intended), but it will also silently swallow
TimeoutError
or even memory errors, making debugging incredibly difficult if the endpoint goes down.
Fix:
Scope the exception to specific expected errors, like
except (aiohttp.ClientError, asyncio.TimeoutError):
.
Inefficient Size Calculation (Line 95):
len(json.dumps(request_body)) < min_size
requires serializing the entire dictionary to a string
just
to check its length. Since
min_size
defaults to
0
, the
if min_size
short-circuit saves it for now, but if anyone sets a minimum size, this will become an unnecessary CPU bottleneck on large datasets.
Potential Path Traversal Risk (Lines 110, 113, 251):
The script writes and reads files directly using...
|
Firefox
|
Jy 19798 evaluation for ai activity types by nikol Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet — Work...
|
github.com/jiminny/prophet/pull/468/changes#diff-d github.com/jiminny/prophet/pull/468/changes#diff-d2f9fea9bc428b119ae26a66453e8eab7dd0782d4a4d93075ca5ce05903d77e8...
|
8546
|
|
8545
|
FirefoxFileEoitViewHistoryBookmarksProfilesToolsWi FirefoxFileEoitViewHistoryBookmarksProfilesToolsWindow Helpgithub.com/jiminny/prophet/pull/468/changes#diff-d2f9fea9bc428b119ae26a66453e8eab7dd0782d4a4d93075ca5ce05903d77e8~ Google Gemini...l1 OpenJy 19798 evaluation for ai activity types #468All commitsPlatform Sprint 1 Q2 - Platform Tea@ Jy 19798 evaluation for ai activ+ New labnikolaybiaivanov wants to merge 9 commits into master from JY-19798-evaluation-for-ai-activity-types [v src/eval/eval-ai-activity-type.py SQ Filter files...• E eval_output/ai_activity_type/eval…..> • ground_truth>I inoutE src/evalE endpoints.pyE eval-ai-activity-type.py#.gitignore293294299are there soem issueswith this PRShow thinking vYes, there are several issues and code smellsin this Pull Request, ranging from performancebottlenecks to architectural anti-patterns.Here is a breakdown of the most notableissues, categorized by impact:Performance & Async Anti-Patterns• Sequential Requests Defeating Asyncluines ch-06t nsiee_collect_ from_es, the code loopsthrough raw_samples and await sasession.post request for each sampleone by one. This means the script waitsfor one HTTP request to finish beforestarting the next, completely defeatingthe purpose of using aiohttp forasynchronous concurrency.• Fix: Collect these coroutines into alist and use asyncio.gather()(ideally wrapped in anasyncio. Semaphore to avoidoverwhelming the endpoint) to runthe replayability probes concurrently.• Blocking File I/O in Async Functions(Lines 110-114, 184-188, 268-269):Standard synchronous file operations(with open(...), json. dump,json.load ) are used inside async def• Enter a prompt for Gemini+ProYour Jiminny chats aren't used to improve our models. Gemini is Alane can make mistakes, Inciualne aoout peodleYour privacy & GeminiSummarize page329330 Đ‹-0ll • | Daily - Platform • nowA100% CD8 • Tue 14 Apr 9:51:25• 2 1 203 viewed• Awaltind aporoval+330 UUUUu• Viewedparser = create_parser(AiActivityTypeEvaluator)parser.aad arcument"_-data-set",lyoe-sutndetault="initial",dest="data_set",nelo= Vataser Toler uncer eval outvurral acuvity tviereval catasets, etault.initial)",parser.aad arcument"--populate-ground-truth-1lm",action="store_true",dest="populate_ground_truth_llm",parser.add araument"--ground-truth-llm-endpoint",type=str,default="OPENAI: gpt-5_2",dest="ground_truth_llm_endpoint",parser.add_argument("--compare-to-ground-truth",type=str,choices=["es", "1lm"],default="es",dest="compare_to_ground_truth",aros = parser.parse arasAiActivityTypeEvaluator.DATA_SET = args.data_setA1Activitylypeevaluator.POPULATE_GROUND_TRUTH_LLM = bool(args.populate_ground_truth_Lm)AiActivityTypeEvaluator.GROUND_TRUTH_LLM_ENDPOINT = args.ground_truth_llm_endpointAiActivityTypeEvaluator.COMPARE_TO_GROUND_TRUTH = args.compare_to_ground_truthasyncio.run(run_evaluation(AiActivityTypeEvaluator, args))> .gitignore & ++4 0000• ViewedYou made it to the end!...
|
NULL
|
NULL
|
NULL
|
8545
|
|
8544
|
JY-20543 add AJ reports User pilot tracking by Lak JY-20543 add AJ reports User pilot tracking by LakyLak · Pull Request #11932 · jiminny/app
Platform Sprint 1 Q2 - Platform Team - Scrum Board - Jira
Platform Sprint 1 Q2 - Platform Team - Scrum Board - Jira
Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet
Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet
Close tab
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Tabs from other devices
Open history (⇧⌘H)
Open bookmarks (⌘B)
AI Chat settings
Close
WORK, Google Account: [EMAIL]
Main menu
New Chat
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said I’m on page “<tabTitle>Jy 19798 evaluation for ai activity types by nikol</tabTitle>” with “<selection>@@ -0,0 +1,330 @@1+#!/usr/bin/env python32+import asyncio3+import json4+import os5+from pathlib import Path6+from typing import Any, Dict, List7+8+import aiohttp9+10+from src.component.call_processing.schemes.ai_activity_type_request import AiActivityTypeRequest11+from src.component.call_processing.schemes.ai_activity_type_response import AiActivityTypeResponse12+from src.eval.abstract_evaluator import AbstractEvaluator13+from src.eval.collectors import collect_test_set as es_collect_test_set14+from src.eval.common_evaluator_runner import create_parser, run_evaluation15+from src.eval.endpoints import run_endpoint as run_endpoint_http16+from src.eval.models import EvaluationResult17+18+19+class AiActivityTypeEvaluator(AbstractEvaluator):20+ENDPOINT_PATH = "call/ai-activity-type"21+REQUEST_MODEL = AiActivityTypeRequest22+RESPONSE_MODEL = AiActivityTypeResponse23+OUTPUT_SUBFOLDER = "ai_activity_type"24+USES_LLM_EVALUATION = False25+26+READS_FROM_DATASET_FOLDER = True27+28+DATA_SET: str = "initial"29+30+MINIMUM_REQUEST_BODY_SIZE: int = 031+32+SKIP_UNREPLAYABLE_ON_COLLECT: bool = True33+REPLAYABILITY_CHECK_TIMEOUT_SECONDS: float = 20.034+35+POPULATE_GROUND_TRUTH_LLM: bool = False36+GROUND_TRUTH_LLM_ENDPOINT: str = "OPENAI:gpt-5_2"37+38+COMPARE_TO_GROUND_TRUTH: str = "es"39+40+def __init__(self, **kwargs: Any) -> None:41+super().__init__(**kwargs)42+_repo_root = Path(__file__).parents[2]43+_datasets_root = _repo_root / "eval_output" / self.OUTPUT_SUBFOLDER / "eval_datasets"44+_dataset_dir = _datasets_root / self.__class__.DATA_SET45+self._dataset_dir = _dataset_dir46+self._inputs_dir = _dataset_dir / "input"47+self._ground_truths_dir = _dataset_dir / "ground_truth"48+49+async def collect_test_set(self) -> List[Dict[str, Any]]:50+if self.es_host_name:51+return await self._collect_from_es()52+return await self._collect_from_files()53+54+async def _collect_from_es(self) -> List[Dict[str, Any]]:55+if self._inputs_dir.exists() or self._ground_truths_dir.exists():56+raise FileExistsError(57+f"Dataset folder already exists: {self._inputs_dir.parent}. "58+"Choose a different --data-set name or remove the folder manually."59+ )60+61+raw_samples = await es_collect_test_set(62+es_host_name=self.es_host_name,63+es_port=self.es_port,64+endpoint_path=self.ENDPOINT_PATH,65+limit=self.limit,66+language_filter=self.language_filter,67+min_prompt_length=self.min_prompt_length,68+is_json_response=self.IS_JSON_RESPONSE,69+exclude_languages=self.exclude_languages,70+ )71+72+self._inputs_dir.mkdir(parents=True, exist_ok=True)73+self._ground_truths_dir.mkdir(parents=True, exist_ok=True)74+75+min_size = self.MINIMUM_REQUEST_BODY_SIZE76+samples: List[Dict[str, Any]] = []77+skipped_no_response = 078+skipped_too_small = 079+skipped_unreplayable = 080+skipped_probe_errors = 081+82+endpoint_url = f"{self.api_host}/{self.ENDPOINT_PATH}"83+84+timeout = aiohttp.ClientTimeout(total=self.REPLAYABILITY_CHECK_TIMEOUT_SECONDS)85+async with aiohttp.ClientSession(timeout=timeout) as session:86+for sample in raw_samples:87+es_id = sample["id"]88+request_body = sample["request_body"]89+original_response = sample.get("original_response")90+91+if not original_response:92+skipped_no_response += 193+continue94+95+if min_size and len(json.dumps(request_body)) < min_size:96+skipped_too_small += 197+continue98+99+if self.SKIP_UNREPLAYABLE_ON_COLLECT:100+try:101+async with session.post(endpoint_url, json=request_body) as resp:102+if resp.status != 200:103+body_text = await resp.text()104+if "Cannot find activity" in body_text:105+skipped_unreplayable += 1106+continue107+except Exception:108+skipped_probe_errors += 1109+110+with open(self._inputs_dir / f"{es_id}.json", "w") as f:111+json.dump(request_body, f, indent=2)112+113+with open(self._ground_truths_dir / f"gt_{es_id}.json", "w") as f:114+json.dump(original_response, f, indent=2)115+116+samples.append(117+ {118+"id": es_id,119+"request_body": request_body,120+"original_response": original_response,121+"ground_truth": original_response,122+ }123+ )124+125+print(f"Wrote {len(samples)} samples to {self._inputs_dir.parent}")126+if skipped_no_response:127+print(f"Skipped {skipped_no_response} records with no response body")128+if skipped_too_small:129+print(130+f"Skipped {skipped_too_small} records whose request body was smaller "131+f"than {min_size} characters"132+ )133+if skipped_unreplayable:134+print(f"Skipped {skipped_unreplayable} records that are not replayable (missing activity in ES)")135+if skipped_probe_errors:136+print(f"Warning: replayability probe errored for {skipped_probe_errors} records (kept those records)")137+138+if self.__class__.POPULATE_GROUND_TRUTH_LLM:139+label_results = await run_endpoint_http(140+test_set=samples,141+endpoint_url=endpoint_url,142+request_model=self.REQUEST_MODEL,143+force_llm_endpoint=self.__class__.GROUND_TRUTH_LLM_ENDPOINT,144+is_streaming_response=self.IS_STREAMING_RESPONSE,145+timeout_seconds=300.0,146+parallel_requests=self.parallel_requests,147+return_prompt_in_run=self.RETURN_PROMPT_IN_RUN,148+ )149+150+for i, r in enumerate(label_results):151+forced = self.__class__.GROUND_TRUTH_LLM_ENDPOINT152+model = forced.split(":", 1)[1] if ":" in forced else forced153+samples[i]["ground_truth_llm_model"] = model154+155+if not r.get("success", True):156+samples[i]["ground_truth_llm"] = None157+samples[i]["ground_truth_llm_reasoning"] = None158+samples[i]["ground_truth_llm_error_type"] = r.get("error_type")159+samples[i]["ground_truth_llm_error_message"] = r.get("error_message")160+continue161+162+resp = r.get("new_response") or {}163+samples[i]["ground_truth_llm"] = resp.get("ai_activity_type")164+samples[i]["ground_truth_llm_reasoning"] = resp.get("ai_activity_type_reasoning")165+166+return samples167+168+async def _collect_from_files(self) -> List[Dict[str, Any]]:169+if not self._inputs_dir.exists():170+raise FileNotFoundError(171+f"Dataset folder not found: {self._inputs_dir}. "172+"Run --steps collect first or check your --data-set name."173+ )174+175+samples: List[Dict[str, Any]] = []176+for filename in sorted(os.listdir(self._inputs_dir)):177+input_path = self._inputs_dir / filename178+gt_path = self._ground_truths_dir / f"gt_{filename}"179+180+if not gt_path.exists():181+print(f"Warning: ground truth not found for {filename}, skipping")182+continue183+184+with open(input_path, "r") as f:185+request_body: Dict[str, Any] = json.load(f)186+187+with open(gt_path, "r") as f:188+ground_truth: Dict[str, Any] = json.load(f)189+190+samples.append(191+ {192+"id": Path(filename).stem,193+"request_body": request_body,194+"original_response": None,195+"ground_truth": ground_truth,196+ }197+ )198+199+print(f"Loaded {len(samples)} samples from {self._inputs_dir.parent}")200+return samples201+202+async def evaluate_against_ground_truth(self, results: List[Dict[str, Any]]) -> List[EvaluationResult]:203+evaluation_results: List[EvaluationResult] = []204+compare_to = self.__class__.COMPARE_TO_GROUND_TRUTH205+206+for res in results:207+req_id = res["request_id"]208+is_success = res.get("success", True)209+210+if not is_success:211+evaluation_results.append(212+EvaluationResult(213+request_id=req_id,214+request_body=res["request_body"],215+new_response=res.get("new_response"),216+valid_schema=False,217+quality_score=0.0,218+quality_feedback=f"Request failed: {res.get('error_type')} - {res.get('error_message')}",219+response_time=res.get("response_time"),220+success=False,221+error_type=res.get("error_type"),222+error_message=res.get("error_message"),223+ground_truth_match=False,224+ )225+ )226+continue227+228+pred = (res.get("new_response") or {}).get("ai_activity_type")229+230+if compare_to == "llm":231+if "ground_truth_llm" not in res:232+evaluation_results.append(233+EvaluationResult(234+request_id=req_id,235+request_body=res["request_body"],236+new_response=res.get("new_response"),237+valid_schema=res.get("valid_schema", True),238+quality_score=0.0,239+quality_feedback=(240+"Missing ground_truth_llm in run results. Ensure collect step populated it and "241+"endpoints.py passes it through."242+ ),243+response_time=res.get("response_time"),244+success=True,245+ground_truth_match=False,246+ )247+ )248+continue249+gt_label = res.get("ground_truth_llm")250+else:251+gt_path = self._ground_truths_dir / f"gt_{req_id}.json"252+if not gt_path.exists():253+evaluation_results.append(254+EvaluationResult(255+request_id=req_id,256+request_body=res["request_body"],257+new_response=res.get("new_response"),258+valid_schema=res.get("valid_schema", True),259+quality_score=0.0,260+quality_feedback=f"Ground truth file not found: {gt_path}",261+response_time=res.get("response_time"),262+success=True,263+ground_truth_match=False,264+ )265+ )266+continue267+268+with open(gt_path, "r") as f:269+ground_truth_dict: Dict[str, Any] = json.load(f)270+271+gt_label = (ground_truth_dict or {}).get("ai_activity_type")272+273+matched = pred == gt_label274+275+evaluation_results.append(276+EvaluationResult(277+request_id=req_id,278+request_body=res["request_body"],279+new_response=res.get("new_response"),280+valid_schema=res.get("valid_schema", True),281+quality_score=1.0 if matched else 0.0,282+quality_feedback="PASS" if matched else f"FAIL\nExpected: {gt_label}\nGot: {pred}",283+response_time=res.get("response_time"),284+success=True,285+ground_truth_match=matched,286+ )287+ )288+289+return evaluation_results290+291+292+if __name__ == "__main__":293+parser = create_parser(AiActivityTypeEvaluator)294+parser.add_argument(295+"--data-set",296+type=str,297+default="initial",298+dest="data_set",299+help="Dataset folder under eval_output/ai_activity_type/eval_datasets/ (default: initial)",300+ )301+302+parser.add_argument(303+"--populate-ground-truth-llm",304+action="store_true",305+dest="populate_ground_truth_llm",306+ )307+parser.add_argument(308+"--ground-truth-llm-endpoint",309+type=str,310+default="OPENAI:gpt-5_2",311+dest="ground_truth_llm_endpoint",312+ )313+314+parser.add_argument(315+"--compare-to-ground-truth",316+type=str,317+choices=["es", "llm"],318+default="es",319+dest="compare_to_ground_truth",320+ )321+322+args = parser.parse_args()323+AiActivityTypeEvaluator.DATA_SET = args.data_set324+325+AiActivityTypeEvaluator.POPULATE_GROUND_TRUTH_LLM = bool(args.populate_ground_truth_llm)326+AiActivityTypeEvaluator.GROUND_TRUTH_LLM_ENDPOINT = args.ground_truth_llm_endpoint327+328+AiActivityTypeEvaluator.COMPARE_TO_GROUND_TRUTH = args.compare_to_ground_truth329+330+asyncio.run(run_evaluation(AiActivityTypeEvaluator, args))</selection>” selected. Please summarize the selection using precise and concise language. Use headers and bulleted lists in the summary, to make it scannable. Maintain the meaning and factual accuracy.
You said
I’m on page “<tabTitle>Jy 19798 evaluation for ai activity types by nikol</tabTitle>” with “<selection>@@ -0,0 +1,330 @@1+#!/usr/bin/env python32+import asyncio3+import json4+import os5+from pathlib import Path6+from typing import Any, Dict, List7+8+import aiohttp9+10+from src.component.call_processing.schemes.ai_activity_type_request import AiActivityTypeRequest11+from src.component.call_processing.schemes.ai_activity_type_response import AiActivityTypeResponse12+from src.eval.abstract_evaluator import AbstractEvaluator13+from src.eval.collectors import collect_test_set as es_collect_test_set14+from src.eval.common_evaluator_runner import create_parser, run_evaluation15+from src.eval.endpoints import run_endpoint as run_endpoint_http16+from src.eval.models import EvaluationResult17+18+19+class AiActivityTypeEvaluator(AbstractEvaluator):20+ENDPOINT_PATH = "call/ai-activity-type"21+REQUEST_MODEL = AiActivityTypeRequest22+RESPONSE_MODEL = AiActivityTypeResponse23+OUTPUT_SUBFOLDER = "ai_activity_type"24+USES_LLM_EVALUATION = False25+26+READS_FROM_DATASET_FOLDER = True27+28+DATA_SET: str = "initial"29+30+MINIMUM_REQUEST_BODY_SIZE: int = 031+32+SKIP_UNREPLAYABLE_ON_COLLECT: bool = True33+REPLAYABILITY_CHECK_TIMEOUT_SECONDS: float = 20.034+35+POPULATE_GROUND_TRUTH_LLM: bool = False36+GROUND_TRUTH_LLM_ENDPOINT: str = "OPENAI:gpt-5_2"37+38+COMPARE_TO_GROUND_TRUTH: str = "es"39+40+def __init__(self, **kwargs: Any) -> None:41+super().__init__(**kwargs)42+_repo_root = Path(__file__).parents[2]43+_datasets_root = _repo_root / "eval_output" / self.OUTPUT_SUBFOLDER / "eval_datasets"44+_dataset_dir = _datasets_root / self.__class__.DATA_SET45+self._dataset_dir = _dataset_dir46+self._inputs_dir = _dataset_dir / "input"47+self._ground_truths_dir = _dataset_dir / "ground_truth"48+49+async def collect_test_set(self) -> List[Dict[str, Any]]:50+if self.es_host_name:51+return await self._collect_from_es()52+return await self._collect_from_files()53+54+async def _collect_from_es(self) -> List[Dict[str, Any]]:55+if self._inputs_dir.exists() or self._ground_truths_dir.exists():56+raise FileExistsError(57+f"Dataset folder already exists: {self._inputs_dir.parent}. "58+"Choose a different --data-set name or remove the folder manually."59+ )60+61+raw_samples = await es_collect_test_set(62+es_host_name=self.es_host_name,63+es_port=self.es_port,64+endpoint_path=self.ENDPOINT_PATH,65+limit=self.limit,66+language_filter=self.language_filter,67+min_prompt_length=self.min_prompt_length,68+is_json_response=self.IS_JSON_RESPONSE,69+exclude_languages=self.exclude_languages,70+ )71+72+self._inputs_dir.mkdir(parents=True, exist_ok=True)73+self._ground_truths_dir.mkdir(parents=True, exist_ok=True)74+75+min_size = self.MINIMUM_REQUEST_BODY_SIZE76+samples: List[Dict[str, Any]] = []77+skipped_no_response = 078+skipped_too_small = 079+skipped_unreplayable = 080+skipped_probe_errors = 081+82+endpoint_url = f"{self.api_host}/{self.ENDPOINT_PATH}"83+84+timeout = aiohttp.ClientTimeout(total=self.REPLAYABILITY_CHECK_TIMEOUT_SECONDS)85+async with aiohttp.ClientSession(timeout=timeout) as session:86+for sample in raw_samples:87+es_id = sample["id"]88+request_body = sample["request_body"]89+original_response = sample.get("original_response")90+91+if not original_response:92+skipped_no_response += 193+continue94+95+if min_size and len(json.dumps(request_body)) < min_size:96+skipped_too_small += 197+continue98+99+if self.SKIP_UNREPLAYABLE_ON_COLLECT:100+try:101+async with session.post(endpoint_url, json=request_body) as resp:102+if resp.status != 200:103+body_text = await resp.text()104+if "Cannot find activity" in body_text:105+skipped_unreplayable += 1106+continue107+except Exception:108+skipped_probe_errors += 1109+110+with open(self._inputs_dir / f"{es_id}.json", "w") as f:111+json.dump(request_body, f, indent=2)112+113+with open(self._ground_truths_dir / f"gt_{es_id}.json", "w") as f:114+json.dump(original_response, f, indent=2)115+116+samples.append(117+ {118+"id": es_id,119+"request_body": request_body,120+"original_response": original_response,121+"ground_truth": original_response,122+ }123+ )124+125+print(f"Wrote {len(samples)} samples to {self._inputs_dir.parent}")126+if skipped_no_response:127+print(f"Skipped {skipped_no_response} records with no response body")128+if skipped_too_small:129+print(130+f"Skipped {skipped_too_small} records whose request body was smaller "131+f"than {min_size} characters"132+ )133+if skipped_unreplayable:134+print(f"Skipped {skipped_unreplayable} records that are not replayable (missing activity in ES)")135+if skipped_probe_errors:136+print(f"Warning: replayability probe errored for {skipped_probe_errors} records (kept those records)")137+138+if self.__class__.POPULATE_GROUND_TRUTH_LLM:139+label_results = await run_endpoint_http(140+test_set=samples,141+endpoint_url=endpoint_url,142+request_model=self.REQUEST_MODEL,143+force_llm_endpoint=self.__class__.GROUND_TRUTH_LLM_ENDPOINT,144+is_streaming_response=self.IS_STREAMING_RESPONSE,145+timeout_seconds=300.0,146+parallel_requests=self.parallel_requests,147+return_prompt_in_run=self.RETURN_PROMPT_IN_RUN,148+ )149+150+for i, r in enumerate(label_results):151+forced = self.__class__.GROUND_TRUTH_LLM_ENDPOINT152+model = forced.split(":", 1)[1] if ":" in forced else forced153+samples[i]["ground_truth_llm_model"] = model154+155+if not r.get("success", True):156+samples[i]["ground_truth_llm"] = None157+samples[i]["ground_truth_llm_reasoning"] = None158+samples[i]["ground_truth_llm_error_type"] = r.get("error_type")159+samples[i]["ground_truth_llm_error_message"] = r.get("error_message")160+continue161+162+resp = r.get("new_response") or {}163+samples[i]["ground_truth_llm"] = resp.get("ai_activity_type")164+samples[i]["ground_truth_llm_reasoning"] = resp.get("ai_activity_type_reasoning")165+166+return samples167+168+async def _collect_from_files(self) -> List[Dict[str, Any]]:169+if not self._inputs_dir.exists():170+raise FileNotFoundError(171+f"Dataset folder not found: {self._inputs_dir}. "172+"Run --steps collect first or check your --data-set name."173+ )174+175+samples: List[Dict[str, Any]] = []176+for filename in sorted(os.listdir(self._inputs_dir)):177+input_path = self._inputs_dir / filename178+gt_path = self._ground_truths_dir / f"gt_{filename}"179+180+if not gt_path.exists():181+print(f"Warning: ground truth not found for {filename}, skipping")182+continue183+184+with open(input_path, "r") as f:185+request_body: Dict[str, Any] = json.load(f)186+187+with open(gt_path, "r") as f:188+ground_truth: Dict[str, Any] = json.load(f)189+190+samples.append(191+ {192+"id": Path(filename).stem,193+"request_body": request_body,194+"original_response": None,195+"ground_truth": ground_truth,196+ }197+ )198+199+print(f"Loaded {len(samples)} samples from {self._inputs_dir.parent}")200+return samples201+202+async def evaluate_against_ground_truth(self, results: List[Dict[str, Any]]) -> List[EvaluationResult]:203+evaluation_results: List[EvaluationResult] = []204+compare_to = self.__class__.COMPARE_TO_GROUND_TRUTH205+206+for res in results:207+req_id = res["request_id"]208+is_success = res.get("success", True)209+210+if not is_success:211+evaluation_results.append(212+EvaluationResult(213+request_id=req_id,214+request_body=res["request_body"],215+new_response=res.get("new_response"),216+valid_schema=False,217+quality_score=0.0,218+quality_feedback=f"Request failed: {res.get('error_type')} - {res.get('error_message')}",219+response_time=res.get("response_time"),220+success=False,221+error_type=res.get("error_type"),222+error_message=res.get("error_message"),223+ground_truth_match=False,224+ )225+ )226+continue227+228+pred = (res.get("new_response") or {}).get("ai_activity_type")229+230+if compare_to == "llm":231+if "ground_truth_llm" not in res:232+evaluation_results.append(233+EvaluationResult(234+request_id=req_id,235+request_body=res["request_body"],236+new_response=res.get("new_response"),237+valid_schema=res.get("valid_schema", True),238+quality_score=0.0,239+quality_feedback=(240+"Missing ground_truth_llm in run results. Ensure collect step populated it and "241+"endpoints.py passes it through."242+ ),243+response_time=res.get("response_time"),244+success=True,245+ground_truth_match=False,246+ )247+ )248+continue249+gt_label = res.get("ground_truth_llm")250+else:251+gt_path = self._ground_truths_dir / f"gt_{req_id}.json"252+if not gt_path.exists():253+evaluation_results.append(254+EvaluationResult(255+request_id=req_id,256+request_body=res["request_body"],257+new_response=res.get("new_response"),258+valid_schema=res.get("valid_schema", True),259+quality_score=0.0,260+quality_feedback=f"Ground truth file not found: {gt_path}",261+response_time=res.get("response_time"),262+success=True,263+ground_truth_match=False,264+ )265+ )266+continue267+268+with open(gt_path, "r") as f:269+ground_truth_dict: Dict[str, Any] = json.load(f)270+271+gt_label = (ground_truth_dict or {}).get("ai_activity_type")272+273+matched = pred == gt_label274+275+evaluation_results.append(276+EvaluationResult(277+request_id=req_id,278+request_body=res["request_body"],279+new_response=res.get("new_response"),280+valid_schema=res.get("valid_schema", True),281+quality_score=1.0 if matched else 0.0,282+quality_feedback="PASS" if matched else f"FAIL\nExpected: {gt_label}\nGot: {pred}",283+response_time=res.get("response_time"),284+success=True,285+ground_truth_match=matched,286+ )287+ )288+289+return evaluation_results290+291+292+if __name__ == "__main__":293+parser = create_parser(AiActivityTypeEvaluator)294+parser.add_argument(295+"--data-set",296+type=str,297+default="initial",298+dest="data_set",299+help="Dataset folder under eval_output/ai_activity_type/eval_datasets/ (default: initial)",300+ )301+302+parser.add_argument(303+"--populate-ground-truth-llm",304+action="store_true",305+dest="populate_ground_truth_llm",306+ )307+parser.add_argument(308+"--ground-truth-llm-endpoint",309+type=str,310+default="OPENAI:gpt-5_2",311+dest="ground_truth_llm_endpoint",312+ )313+314+parser.add_argument(315+"--compare-to-ground-truth",316+type=str,317+choices=["es", "llm"],318+default="es",319+dest="compare_to_ground_truth",320+ )321+322+args = parser.parse_args()323+AiActivityTypeEvaluator.DATA_SET = args.data_set324+325+AiActivityTypeEvaluator.POPULATE_GROUND_TRUTH_LLM = bool(args.populate_ground_truth_llm)326+AiActivityTypeEvaluator.GROUND_TRUTH_LLM_ENDPOINT = args.ground_truth_llm_endpoint327+328+AiActivityTypeEvaluator.COMPARE_TO_GROUND_TRUTH = args.compare_to_ground_truth329+330+asyncio.run(run_evaluation(AiActivityTypeEvaluator, args))</selection>” selected.
Please summarize the selection using precise and concise language. Use headers and bulleted lists in the summary, to make it scannable. Maintain the meaning and factual accuracy.
Expand
Listen
Show more options
Gemini said
Gemini said
This Python script defines the
AiActivityTypeEvaluator
class, a specialized tool for evaluating AI-driven activity type classification. It extends a base evaluation framework to handle data collection, ground-truth comparison, and performance scoring.
Core Functionality
Core Functionality
Data Collection (
collect_test_set
):
* Supports fetching raw samples from
Elasticsearch (ES)
or loading them from
local JSON files
.
Implements a
replayability check
: It probes the live API endpoint to ensure the activity exists before including it in the dataset.
Filters data based on request body size, language, and the presence of an original response.
Ground Truth Management:
Saves inputs and "gold standard" responses to a local directory structure (
eval_output/ai_activity_type/...
).
Features an optional mode to populate ground truth using a high-order LLM (e.g.,
gpt-5_2
) if the
POPULATE_GROUND_TRUTH_LLM
flag is set.
Evaluation Logic (
evaluate_against_ground_truth
):
Compares the model's predicted
ai_activity_type
against either the original ES response or the LLM-generated ground truth.
Assigns a
quality_score
of
1.0
for a match and
0.0
for a mismatch or request failure.
Key Configuration Constants
Key Configuration Constants
Constant
Purpose
ENDPOINT_PATH
The specific API route being tested:
call/ai-activity-type
.
COMPARE_TO_GROUND_TRUTH
Determines if the baseline is the original data (
es
) or an
llm
.
SKIP_UNREPLAYABLE_ON_COLLECT
If
True
, discards records where the API returns "Cannot find activity."
REPLAYABILITY_CHECK_TIMEOUT_SECONDS
Limits the probe request to 20 seconds.
Constant
ENDPOINT_PATH
COMPARE_TO_GROUND_TRUTH
SKIP_UNREPLAYABLE_ON_COLLECT
REPLAYABILITY_CHECK_TIMEOUT_SECONDS
Purpose
The specific API route being tested:
call/ai-activity-type
.
Determines if the baseline is the original data (
es
) or an
llm
.
If
True
, discards records where the API returns "Cannot find activity."
Limits the probe request to 20 seconds.
Copy table
Execution Workflow
Execution Workflow
Parsing:
The script uses
create_parser
to handle CLI arguments like
--data-set
,
--populate-ground-truth-llm
, and
--compare-to-ground-truth
.
Initialization:
Sets up directory paths for inputs and ground truths based on the selected dataset name.
Run:
Executes the evaluation via
asyncio.run(run_evaluation(...))
, which orchestrates the collection, endpoint calling, and result scoring phases.
Copy prompt
Edit
You said are there soem issues with this PR
You said
are there soem issues with this PR
Show more options
Show thinking
Gemini said
Gemini said
Yes,
there are several issues and code smells in this Pull Request,
ranging from performance bottlenecks to architectural anti-patterns.
Here is a breakdown of the most notable issues,
categorized by impact:
Performance & Async Anti-Patterns
Performance & Async Anti-Patterns
Sequential Requests Defeating Async (Lines 86-106):
Inside
_collect_from_es
,
the code loops through
raw_samples
and
await
s a
session.post
request for each sample one by one.
This means the script waits for one HTTP request to finish before starting the next,
completely defeating the purpose of using
aiohttp
for asynchronous concurrency.
Fix:
Collect these coroutines into a list and use
asyncio.gather()
(ideally wrapped in an
asyncio.Semaphore
to avoid overwhelming the endpoint) to run the replayability probes concurrently.
Blocking File I/O in Async Functions (Lines 110-114, 184-188, 268-269):
Standard synchronous file operations (
with open(...)
,
json.dump
,
json.load
) are used inside
async def
functions.
In Python,
this blocks the entire event loop while the disk is reading/writing.
Fix:
Use an asynchronous file I/O library like
aiofiles
,
or run the blocking I/O operations in a separate thread using
asyncio.to_thread()...
|
Firefox
|
Jy 19798 evaluation for ai activity types by nikol Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet — Work...
|
github.com/jiminny/prophet/pull/468/changes#diff-d github.com/jiminny/prophet/pull/468/changes#diff-d2f9fea9bc428b119ae26a66453e8eab7dd0782d4a4d93075ca5ce05903d77e8...
|
8544
|
|
8543
|
JY-20543 add AJ reports User pilot tracking by Lak JY-20543 add AJ reports User pilot tracking by LakyLak · Pull Request #11932 · jiminny/app
Platform Sprint 1 Q2 - Platform Team - Scrum Board - Jira
Platform Sprint 1 Q2 - Platform Team - Scrum Board - Jira
Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet
Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet
Close tab
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Tabs from other devices
Open history (⇧⌘H)
Open bookmarks (⌘B)
AI Chat settings
Close
WORK, Google Account: [EMAIL]
Main menu
New Chat
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said I’m on page “<tabTitle>Jy 19798 evaluation for ai activity types by nikol</tabTitle>” with “<selection>@@ -0,0 +1,330 @@1+#!/usr/bin/env python32+import asyncio3+import json4+import os5+from pathlib import Path6+from typing import Any, Dict, List7+8+import aiohttp9+10+from src.component.call_processing.schemes.ai_activity_type_request import AiActivityTypeRequest11+from src.component.call_processing.schemes.ai_activity_type_response import AiActivityTypeResponse12+from src.eval.abstract_evaluator import AbstractEvaluator13+from src.eval.collectors import collect_test_set as es_collect_test_set14+from src.eval.common_evaluator_runner import create_parser, run_evaluation15+from src.eval.endpoints import run_endpoint as run_endpoint_http16+from src.eval.models import EvaluationResult17+18+19+class AiActivityTypeEvaluator(AbstractEvaluator):20+ENDPOINT_PATH = "call/ai-activity-type"21+REQUEST_MODEL = AiActivityTypeRequest22+RESPONSE_MODEL = AiActivityTypeResponse23+OUTPUT_SUBFOLDER = "ai_activity_type"24+USES_LLM_EVALUATION = False25+26+READS_FROM_DATASET_FOLDER = True27+28+DATA_SET: str = "initial"29+30+MINIMUM_REQUEST_BODY_SIZE: int = 031+32+SKIP_UNREPLAYABLE_ON_COLLECT: bool = True33+REPLAYABILITY_CHECK_TIMEOUT_SECONDS: float = 20.034+35+POPULATE_GROUND_TRUTH_LLM: bool = False36+GROUND_TRUTH_LLM_ENDPOINT: str = "OPENAI:gpt-5_2"37+38+COMPARE_TO_GROUND_TRUTH: str = "es"39+40+def __init__(self, **kwargs: Any) -> None:41+super().__init__(**kwargs)42+_repo_root = Path(__file__).parents[2]43+_datasets_root = _repo_root / "eval_output" / self.OUTPUT_SUBFOLDER / "eval_datasets"44+_dataset_dir = _datasets_root / self.__class__.DATA_SET45+self._dataset_dir = _dataset_dir46+self._inputs_dir = _dataset_dir / "input"47+self._ground_truths_dir = _dataset_dir / "ground_truth"48+49+async def collect_test_set(self) -> List[Dict[str, Any]]:50+if self.es_host_name:51+return await self._collect_from_es()52+return await self._collect_from_files()53+54+async def _collect_from_es(self) -> List[Dict[str, Any]]:55+if self._inputs_dir.exists() or self._ground_truths_dir.exists():56+raise FileExistsError(57+f"Dataset folder already exists: {self._inputs_dir.parent}. "58+"Choose a different --data-set name or remove the folder manually."59+ )60+61+raw_samples = await es_collect_test_set(62+es_host_name=self.es_host_name,63+es_port=self.es_port,64+endpoint_path=self.ENDPOINT_PATH,65+limit=self.limit,66+language_filter=self.language_filter,67+min_prompt_length=self.min_prompt_length,68+is_json_response=self.IS_JSON_RESPONSE,69+exclude_languages=self.exclude_languages,70+ )71+72+self._inputs_dir.mkdir(parents=True, exist_ok=True)73+self._ground_truths_dir.mkdir(parents=True, exist_ok=True)74+75+min_size = self.MINIMUM_REQUEST_BODY_SIZE76+samples: List[Dict[str, Any]] = []77+skipped_no_response = 078+skipped_too_small = 079+skipped_unreplayable = 080+skipped_probe_errors = 081+82+endpoint_url = f"{self.api_host}/{self.ENDPOINT_PATH}"83+84+timeout = aiohttp.ClientTimeout(total=self.REPLAYABILITY_CHECK_TIMEOUT_SECONDS)85+async with aiohttp.ClientSession(timeout=timeout) as session:86+for sample in raw_samples:87+es_id = sample["id"]88+request_body = sample["request_body"]89+original_response = sample.get("original_response")90+91+if not original_response:92+skipped_no_response += 193+continue94+95+if min_size and len(json.dumps(request_body)) < min_size:96+skipped_too_small += 197+continue98+99+if self.SKIP_UNREPLAYABLE_ON_COLLECT:100+try:101+async with session.post(endpoint_url, json=request_body) as resp:102+if resp.status != 200:103+body_text = await resp.text()104+if "Cannot find activity" in body_text:105+skipped_unreplayable += 1106+continue107+except Exception:108+skipped_probe_errors += 1109+110+with open(self._inputs_dir / f"{es_id}.json", "w") as f:111+json.dump(request_body, f, indent=2)112+113+with open(self._ground_truths_dir / f"gt_{es_id}.json", "w") as f:114+json.dump(original_response, f, indent=2)115+116+samples.append(117+ {118+"id": es_id,119+"request_body": request_body,120+"original_response": original_response,121+"ground_truth": original_response,122+ }123+ )124+125+print(f"Wrote {len(samples)} samples to {self._inputs_dir.parent}")126+if skipped_no_response:127+print(f"Skipped {skipped_no_response} records with no response body")128+if skipped_too_small:129+print(130+f"Skipped {skipped_too_small} records whose request body was smaller "131+f"than {min_size} characters"132+ )133+if skipped_unreplayable:134+print(f"Skipped {skipped_unreplayable} records that are not replayable (missing activity in ES)")135+if skipped_probe_errors:136+print(f"Warning: replayability probe errored for {skipped_probe_errors} records (kept those records)")137+138+if self.__class__.POPULATE_GROUND_TRUTH_LLM:139+label_results = await run_endpoint_http(140+test_set=samples,141+endpoint_url=endpoint_url,142+request_model=self.REQUEST_MODEL,143+force_llm_endpoint=self.__class__.GROUND_TRUTH_LLM_ENDPOINT,144+is_streaming_response=self.IS_STREAMING_RESPONSE,145+timeout_seconds=300.0,146+parallel_requests=self.parallel_requests,147+return_prompt_in_run=self.RETURN_PROMPT_IN_RUN,148+ )149+150+for i, r in enumerate(label_results):151+forced = self.__class__.GROUND_TRUTH_LLM_ENDPOINT152+model = forced.split(":", 1)[1] if ":" in forced else forced153+samples[i]["ground_truth_llm_model"] = model154+155+if not r.get("success", True):156+samples[i]["ground_truth_llm"] = None157+samples[i]["ground_truth_llm_reasoning"] = None158+samples[i]["ground_truth_llm_error_type"] = r.get("error_type")159+samples[i]["ground_truth_llm_error_message"] = r.get("error_message")160+continue161+162+resp = r.get("new_response") or {}163+samples[i]["ground_truth_llm"] = resp.get("ai_activity_type")164+samples[i]["ground_truth_llm_reasoning"] = resp.get("ai_activity_type_reasoning")165+166+return samples167+168+async def _collect_from_files(self) -> List[Dict[str, Any]]:169+if not self._inputs_dir.exists():170+raise FileNotFoundError(171+f"Dataset folder not found: {self._inputs_dir}. "172+"Run --steps collect first or check your --data-set name."173+ )174+175+samples: List[Dict[str, Any]] = []176+for filename in sorted(os.listdir(self._inputs_dir)):177+input_path = self._inputs_dir / filename178+gt_path = self._ground_truths_dir / f"gt_{filename}"179+180+if not gt_path.exists():181+print(f"Warning: ground truth not found for {filename}, skipping")182+continue183+184+with open(input_path, "r") as f:185+request_body: Dict[str, Any] = json.load(f)186+187+with open(gt_path, "r") as f:188+ground_truth: Dict[str, Any] = json.load(f)189+190+samples.append(191+ {192+"id": Path(filename).stem,193+"request_body": request_body,194+"original_response": None,195+"ground_truth": ground_truth,196+ }197+ )198+199+print(f"Loaded {len(samples)} samples from {self._inputs_dir.parent}")200+return samples201+202+async def evaluate_against_ground_truth(self, results: List[Dict[str, Any]]) -> List[EvaluationResult]:203+evaluation_results: List[EvaluationResult] = []204+compare_to = self.__class__.COMPARE_TO_GROUND_TRUTH205+206+for res in results:207+req_id = res["request_id"]208+is_success = res.get("success", True)209+210+if not is_success:211+evaluation_results.append(212+EvaluationResult(213+request_id=req_id,214+request_body=res["request_body"],215+new_response=res.get("new_response"),216+valid_schema=False,217+quality_score=0.0,218+quality_feedback=f"Request failed: {res.get('error_type')} - {res.get('error_message')}",219+response_time=res.get("response_time"),220+success=False,221+error_type=res.get("error_type"),222+error_message=res.get("error_message"),223+ground_truth_match=False,224+ )225+ )226+continue227+228+pred = (res.get("new_response") or {}).get("ai_activity_type")229+230+if compare_to == "llm":231+if "ground_truth_llm" not in res:232+evaluation_results.append(233+EvaluationResult(234+request_id=req_id,235+request_body=res["request_body"],236+new_response=res.get("new_response"),237+valid_schema=res.get("valid_schema", True),238+quality_score=0.0,239+quality_feedback=(240+"Missing ground_truth_llm in run results. Ensure collect step populated it and "241+"endpoints.py passes it through."242+ ),243+response_time=res.get("response_time"),244+success=True,245+ground_truth_match=False,246+ )247+ )248+continue249+gt_label = res.get("ground_truth_llm")250+else:251+gt_path = self._ground_truths_dir / f"gt_{req_id}.json"252+if not gt_path.exists():253+evaluation_results.append(254+EvaluationResult(255+request_id=req_id,256+request_body=res["request_body"],257+new_response=res.get("new_response"),258+valid_schema=res.get("valid_schema", True),259+quality_score=0.0,260+quality_feedback=f"Ground truth file not found: {gt_path}",261+response_time=res.get("response_time"),262+success=True,263+ground_truth_match=False,264+ )265+ )266+continue267+268+with open(gt_path, "r") as f:269+ground_truth_dict: Dict[str, Any] = json.load(f)270+271+gt_label = (ground_truth_dict or {}).get("ai_activity_type")272+273+matched = pred == gt_label274+275+evaluation_results.append(276+EvaluationResult(277+request_id=req_id,278+request_body=res["request_body"],279+new_response=res.get("new_response"),280+valid_schema=res.get("valid_schema", True),281+quality_score=1.0 if matched else 0.0,282+quality_feedback="PASS" if matched else f"FAIL\nExpected: {gt_label}\nGot: {pred}",283+response_time=res.get("response_time"),284+success=True,285+ground_truth_match=matched,286+ )287+ )288+289+return evaluation_results290+291+292+if __name__ == "__main__":293+parser = create_parser(AiActivityTypeEvaluator)294+parser.add_argument(295+"--data-set",296+type=str,297+default="initial",298+dest="data_set",299+help="Dataset folder under eval_output/ai_activity_type/eval_datasets/ (default: initial)",300+ )301+302+parser.add_argument(303+"--populate-ground-truth-llm",304+action="store_true",305+dest="populate_ground_truth_llm",306+ )307+parser.add_argument(308+"--ground-truth-llm-endpoint",309+type=str,310+default="OPENAI:gpt-5_2",311+dest="ground_truth_llm_endpoint",312+ )313+314+parser.add_argument(315+"--compare-to-ground-truth",316+type=str,317+choices=["es", "llm"],318+default="es",319+dest="compare_to_ground_truth",320+ )321+322+args = parser.parse_args()323+AiActivityTypeEvaluator.DATA_SET = args.data_set324+325+AiActivityTypeEvaluator.POPULATE_GROUND_TRUTH_LLM = bool(args.populate_ground_truth_llm)326+AiActivityTypeEvaluator.GROUND_TRUTH_LLM_ENDPOINT = args.ground_truth_llm_endpoint327+328+AiActivityTypeEvaluator.COMPARE_TO_GROUND_TRUTH = args.compare_to_ground_truth329+330+asyncio.run(run_evaluation(AiActivityTypeEvaluator, args))</selection>” selected. Please summarize the selection using precise and concise language. Use headers and bulleted lists in the summary, to make it scannable. Maintain the meaning and factual accuracy.
You said
I’m on page “<tabTitle>Jy 19798 evaluation for ai activity types by nikol</tabTitle>” with “<selection>@@ -0,0 +1,330 @@1+#!/usr/bin/env python32+import asyncio3+import json4+import os5+from pathlib import Path6+from typing import Any, Dict, List7+8+import aiohttp9+10+from src.component.call_processing.schemes.ai_activity_type_request import AiActivityTypeRequest11+from src.component.call_processing.schemes.ai_activity_type_response import AiActivityTypeResponse12+from src.eval.abstract_evaluator import AbstractEvaluator13+from src.eval.collectors import collect_test_set as es_collect_test_set14+from src.eval.common_evaluator_runner import create_parser, run_evaluation15+from src.eval.endpoints import run_endpoint as run_endpoint_http16+from src.eval.models import EvaluationResult17+18+19+class AiActivityTypeEvaluator(AbstractEvaluator):20+ENDPOINT_PATH = "call/ai-activity-type"21+REQUEST_MODEL = AiActivityTypeRequest22+RESPONSE_MODEL = AiActivityTypeResponse23+OUTPUT_SUBFOLDER = "ai_activity_type"24+USES_LLM_EVALUATION = False25+26+READS_FROM_DATASET_FOLDER = True27+28+DATA_SET: str = "initial"29+30+MINIMUM_REQUEST_BODY_SIZE: int = 031+32+SKIP_UNREPLAYABLE_ON_COLLECT: bool = True33+REPLAYABILITY_CHECK_TIMEOUT_SECONDS: float = 20.034+35+POPULATE_GROUND_TRUTH_LLM: bool = False36+GROUND_TRUTH_LLM_ENDPOINT: str = "OPENAI:gpt-5_2"37+38+COMPARE_TO_GROUND_TRUTH: str = "es"39+40+def __init__(self, **kwargs: Any) -> None:41+super().__init__(**kwargs)42+_repo_root = Path(__file__).parents[2]43+_datasets_root = _repo_root / "eval_output" / self.OUTPUT_SUBFOLDER / "eval_datasets"44+_dataset_dir = _datasets_root / self.__class__.DATA_SET45+self._dataset_dir = _dataset_dir46+self._inputs_dir = _dataset_dir / "input"47+self._ground_truths_dir = _dataset_dir / "ground_truth"48+49+async def collect_test_set(self) -> List[Dict[str, Any]]:50+if self.es_host_name:51+return await self._collect_from_es()52+return await self._collect_from_files()53+54+async def _collect_from_es(self) -> List[Dict[str, Any]]:55+if self._inputs_dir.exists() or self._ground_truths_dir.exists():56+raise FileExistsError(57+f"Dataset folder already exists: {self._inputs_dir.parent}. "58+"Choose a different --data-set name or remove the folder manually."59+ )60+61+raw_samples = await es_collect_test_set(62+es_host_name=self.es_host_name,63+es_port=self.es_port,64+endpoint_path=self.ENDPOINT_PATH,65+limit=self.limit,66+language_filter=self.language_filter,67+min_prompt_length=self.min_prompt_length,68+is_json_response=self.IS_JSON_RESPONSE,69+exclude_languages=self.exclude_languages,70+ )71+72+self._inputs_dir.mkdir(parents=True, exist_ok=True)73+self._ground_truths_dir.mkdir(parents=True, exist_ok=True)74+75+min_size = self.MINIMUM_REQUEST_BODY_SIZE76+samples: List[Dict[str, Any]] = []77+skipped_no_response = 078+skipped_too_small = 079+skipped_unreplayable = 080+skipped_probe_errors = 081+82+endpoint_url = f"{self.api_host}/{self.ENDPOINT_PATH}"83+84+timeout = aiohttp.ClientTimeout(total=self.REPLAYABILITY_CHECK_TIMEOUT_SECONDS)85+async with aiohttp.ClientSession(timeout=timeout) as session:86+for sample in raw_samples:87+es_id = sample["id"]88+request_body = sample["request_body"]89+original_response = sample.get("original_response")90+91+if not original_response:92+skipped_no_response += 193+continue94+95+if min_size and len(json.dumps(request_body)) < min_size:96+skipped_too_small += 197+continue98+99+if self.SKIP_UNREPLAYABLE_ON_COLLECT:100+try:101+async with session.post(endpoint_url, json=request_body) as resp:102+if resp.status != 200:103+body_text = await resp.text()104+if "Cannot find activity" in body_text:105+skipped_unreplayable += 1106+continue107+except Exception:108+skipped_probe_errors += 1109+110+with open(self._inputs_dir / f"{es_id}.json", "w") as f:111+json.dump(request_body, f, indent=2)112+113+with open(self._ground_truths_dir / f"gt_{es_id}.json", "w") as f:114+json.dump(original_response, f, indent=2)115+116+samples.append(117+ {118+"id": es_id,119+"request_body": request_body,120+"original_response": original_response,121+"ground_truth": original_response,122+ }123+ )124+125+print(f"Wrote {len(samples)} samples to {self._inputs_dir.parent}")126+if skipped_no_response:127+print(f"Skipped {skipped_no_response} records with no response body")128+if skipped_too_small:129+print(130+f"Skipped {skipped_too_small} records whose request body was smaller "131+f"than {min_size} characters"132+ )133+if skipped_unreplayable:134+print(f"Skipped {skipped_unreplayable} records that are not replayable (missing activity in ES)")135+if skipped_probe_errors:136+print(f"Warning: replayability probe errored for {skipped_probe_errors} records (kept those records)")137+138+if self.__class__.POPULATE_GROUND_TRUTH_LLM:139+label_results = await run_endpoint_http(140+test_set=samples,141+endpoint_url=endpoint_url,142+request_model=self.REQUEST_MODEL,143+force_llm_endpoint=self.__class__.GROUND_TRUTH_LLM_ENDPOINT,144+is_streaming_response=self.IS_STREAMING_RESPONSE,145+timeout_seconds=300.0,146+parallel_requests=self.parallel_requests,147+return_prompt_in_run=self.RETURN_PROMPT_IN_RUN,148+ )149+150+for i, r in enumerate(label_results):151+forced = self.__class__.GROUND_TRUTH_LLM_ENDPOINT152+model = forced.split(":", 1)[1] if ":" in forced else forced153+samples[i]["ground_truth_llm_model"] = model154+155+if not r.get("success", True):156+samples[i]["ground_truth_llm"] = None157+samples[i]["ground_truth_llm_reasoning"] = None158+samples[i]["ground_truth_llm_error_type"] = r.get("error_type")159+samples[i]["ground_truth_llm_error_message"] = r.get("error_message")160+continue161+162+resp = r.get("new_response") or {}163+samples[i]["ground_truth_llm"] = resp.get("ai_activity_type")164+samples[i]["ground_truth_llm_reasoning"] = resp.get("ai_activity_type_reasoning")165+166+return samples167+168+async def _collect_from_files(self) -> List[Dict[str, Any]]:169+if not self._inputs_dir.exists():170+raise FileNotFoundError(171+f"Dataset folder not found: {self._inputs_dir}. "172+"Run --steps collect first or check your --data-set name."173+ )174+175+samples: List[Dict[str, Any]] = []176+for filename in sorted(os.listdir(self._inputs_dir)):177+input_path = self._inputs_dir / filename178+gt_path = self._ground_truths_dir / f"gt_{filename}"179+180+if not gt_path.exists():181+print(f"Warning: ground truth not found for {filename}, skipping")182+continue183+184+with open(input_path, "r") as f:185+request_body: Dict[str, Any] = json.load(f)186+187+with open(gt_path, "r") as f:188+ground_truth: Dict[str, Any] = json.load(f)189+190+samples.append(191+ {192+"id": Path(filename).stem,193+"request_body": request_body,194+"original_response": None,195+"ground_truth": ground_truth,196+ }197+ )198+199+print(f"Loaded {len(samples)} samples from {self._inputs_dir.parent}")200+return samples201+202+async def evaluate_against_ground_truth(self, results: List[Dict[str, Any]]) -> List[EvaluationResult]:203+evaluation_results: List[EvaluationResult] = []204+compare_to = self.__class__.COMPARE_TO_GROUND_TRUTH205+206+for res in results:207+req_id = res["request_id"]208+is_success = res.get("success", True)209+210+if not is_success:211+evaluation_results.append(212+EvaluationResult(213+request_id=req_id,214+request_body=res["request_body"],215+new_response=res.get("new_response"),216+valid_schema=False,217+quality_score=0.0,218+quality_feedback=f"Request failed: {res.get('error_type')} - {res.get('error_message')}",219+response_time=res.get("response_time"),220+success=False,221+error_type=res.get("error_type"),222+error_message=res.get("error_message"),223+ground_truth_match=False,224+ )225+ )226+continue227+228+pred = (res.get("new_response") or {}).get("ai_activity_type")229+230+if compare_to == "llm":231+if "ground_truth_llm" not in res:232+evaluation_results.append(233+EvaluationResult(234+request_id=req_id,235+request_body=res["request_body"],236+new_response=res.get("new_response"),237+valid_schema=res.get("valid_schema", True),238+quality_score=0.0,239+quality_feedback=(240+"Missing ground_truth_llm in run results. Ensure collect step populated it and "241+"endpoints.py passes it through."242+ ),243+response_time=res.get("response_time"),244+success=True,245+ground_truth_match=False,246+ )247+ )248+continue249+gt_label = res.get("ground_truth_llm")250+else:251+gt_path = self._ground_truths_dir / f"gt_{req_id}.json"252+if not gt_path.exists():253+evaluation_results.append(254+EvaluationResult(255+request_id=req_id,256+request_body=res["request_body"],257+new_response=res.get("new_response"),258+valid_schema=res.get("valid_schema", True),259+quality_score=0.0,260+quality_feedback=f"Ground truth file not found: {gt_path}",261+response_time=res.get("response_time"),262+success=True,263+ground_truth_match=False,264+ )265+ )266+continue267+268+with open(gt_path, "r") as f:269+ground_truth_dict: Dict[str, Any] = json.load(f)270+271+gt_label = (ground_truth_dict or {}).get("ai_activity_type")272+273+matched = pred == gt_label274+275+evaluation_results.append(276+EvaluationResult(277+request_id=req_id,278+request_body=res["request_body"],279+new_response=res.get("new_response"),280+valid_schema=res.get("valid_schema", True),281+quality_score=1.0 if matched else 0.0,282+quality_feedback="PASS" if matched else f"FAIL\nExpected: {gt_label}\nGot: {pred}",283+response_time=res.get("response_time"),284+success=True,285+ground_truth_match=matched,286+ )287+ )288+289+return evaluation_results290+291+292+if __name__ == "__main__":293+parser = create_parser(AiActivityTypeEvaluator)294+parser.add_argument(295+"--data-set",296+type=str,297+default="initial",298+dest="data_set",299+help="Dataset folder under eval_output/ai_activity_type/eval_datasets/ (default: initial)",300+ )301+302+parser.add_argument(303+"--populate-ground-truth-llm",304+action="store_true",305+dest="populate_ground_truth_llm",306+ )307+parser.add_argument(308+"--ground-truth-llm-endpoint",309+type=str,310+default="OPENAI:gpt-5_2",311+dest="ground_truth_llm_endpoint",312+ )313+314+parser.add_argument(315+"--compare-to-ground-truth",316+type=str,317+choices=["es", "llm"],318+default="es",319+dest="compare_to_ground_truth",320+ )321+322+args = parser.parse_args()323+AiActivityTypeEvaluator.DATA_SET = args.data_set324+325+AiActivityTypeEvaluator.POPULATE_GROUND_TRUTH_LLM = bool(args.populate_ground_truth_llm)326+AiActivityTypeEvaluator.GROUND_TRUTH_LLM_ENDPOINT = args.ground_truth_llm_endpoint327+328+AiActivityTypeEvaluator.COMPARE_TO_GROUND_TRUTH = args.compare_to_ground_truth329+330+asyncio.run(run_evaluation(AiActivityTypeEvaluator, args))</selection>” selected.
Please summarize the selection using precise and concise language. Use headers and bulleted lists in the summary, to make it scannable. Maintain the meaning and factual accuracy.
Expand
Listen
Show more options
Gemini said
Gemini said
This Python script defines the
AiActivityTypeEvaluator
class, a specialized tool for evaluating AI-driven activity type classification. It extends a base evaluation framework to handle data collection, ground-truth comparison, and performance scoring.
Core Functionality
Core Functionality
Data Collection (
collect_test_set
):
* Supports fetching raw samples from
Elasticsearch (ES)
or loading them from
local JSON files
.
Implements a
replayability check
: It probes the live API endpoint to ensure the activity exists before including it in the dataset.
Filters data based on request body size, language, and the presence of an original response.
Ground Truth Management:
Saves inputs and "gold standard" responses to a local directory structure (
eval_output/ai_activity_type/...
).
Features an optional mode to populate ground truth using a high-order LLM (e.g.,
gpt-5_2
) if the
POPULATE_GROUND_TRUTH_LLM
flag is set.
Evaluation Logic (
evaluate_against_ground_truth
):
Compares the model's predicted
ai_activity_type
against either the original ES response or the LLM-generated ground truth.
Assigns a
quality_score
of
1.0
for a match and
0.0
for a mismatch or request failure.
Key Configuration Constants
Key Configuration Constants
Constant
Purpose
ENDPOINT_PATH
The specific API route being tested:
call/ai-activity-type
.
COMPARE_TO_GROUND_TRUTH
Determines if the baseline is the original data (
es
) or an
llm
.
SKIP_UNREPLAYABLE_ON_COLLECT
If
True
, discards records where the API returns "Cannot find activity."
REPLAYABILITY_CHECK_TIMEOUT_SECONDS
Limits the probe request to 20 seconds.
Constant
ENDPOINT_PATH
COMPARE_TO_GROUND_TRUTH
SKIP_UNREPLAYABLE_ON_COLLECT
REPLAYABILITY_CHECK_TIMEOUT_SECONDS
Purpose
The specific API route being tested:
call/ai-activity-type
.
Determines if the baseline is the original data (
es
) or an
llm
.
If
True
, discards records where the API returns "Cannot find activity."
Limits the probe request to 20 seconds.
Copy table
Execution Workflow
Execution Workflow
Parsing:
The script uses
create_parser
to handle CLI arguments like
--data-set
,
--populate-ground-truth-llm
, and
--compare-to-ground-truth
.
Initialization:
Sets up directory paths for inputs and ground truths based on the selected dataset name.
Run:
Executes the evaluation via
asyncio.run(run_evaluation(...))
, which orchestrates the collection, endpoint calling, and result scoring phases.
Copy prompt
Edit
You said are there soem issues with this PR
You said
are there soem issues with this PR
Show more options
Show thinking
Gemini said
Gemini said
Yes,
there are several issues and code smells in this Pull Request,
ranging from performance bottlenecks to architectural anti-patterns.
Here is a breakdown of the most notable issues,
categorized by impact:
Performance & Async Anti-Patterns
Performance & Async Anti-Patterns
Sequential Requests Defeating Async (Lines 86-106):
Inside
_collect_from_es
,
the code loops through
raw_samples
and
await
s a
session.post
request for each sample one by one.
This means the script waits for one HTTP request to finish before starting the next,
completely defeating the purpose of using
aiohttp
for asynchronous concurrency.
Fix:
Collect these coroutines into a list and use...
|
Firefox
|
Jy 19798 evaluation for ai activity types by nikol Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet — Work...
|
github.com/jiminny/prophet/pull/468/changes#diff-d github.com/jiminny/prophet/pull/468/changes#diff-d2f9fea9bc428b119ae26a66453e8eab7dd0782d4a4d93075ca5ce05903d77e8...
|
8543
|
|
8542
|
JY-20543 add AJ reports User pilot tracking by Lak JY-20543 add AJ reports User pilot tracking by LakyLak · Pull Request #11932 · jiminny/app
Platform Sprint 1 Q2 - Platform Team - Scrum Board - Jira
Platform Sprint 1 Q2 - Platform Team - Scrum Board - Jira
Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet
Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet
Close tab
New Tab
Customize sidebar
Close Google Gemini (⌃X)
Tabs from other devices
Open history (⇧⌘H)
Open bookmarks (⌘B)
AI Chat settings
Close
WORK, Google Account: [EMAIL]
Main menu
New Chat
Open menu for conversation actions.
Conversation with Gemini
Conversation with Gemini
Copy prompt
You said I’m on page “<tabTitle>Jy 19798 evaluation for ai activity types by nikol</tabTitle>” with “<selection>@@ -0,0 +1,330 @@1+#!/usr/bin/env python32+import asyncio3+import json4+import os5+from pathlib import Path6+from typing import Any, Dict, List7+8+import aiohttp9+10+from src.component.call_processing.schemes.ai_activity_type_request import AiActivityTypeRequest11+from src.component.call_processing.schemes.ai_activity_type_response import AiActivityTypeResponse12+from src.eval.abstract_evaluator import AbstractEvaluator13+from src.eval.collectors import collect_test_set as es_collect_test_set14+from src.eval.common_evaluator_runner import create_parser, run_evaluation15+from src.eval.endpoints import run_endpoint as run_endpoint_http16+from src.eval.models import EvaluationResult17+18+19+class AiActivityTypeEvaluator(AbstractEvaluator):20+ENDPOINT_PATH = "call/ai-activity-type"21+REQUEST_MODEL = AiActivityTypeRequest22+RESPONSE_MODEL = AiActivityTypeResponse23+OUTPUT_SUBFOLDER = "ai_activity_type"24+USES_LLM_EVALUATION = False25+26+READS_FROM_DATASET_FOLDER = True27+28+DATA_SET: str = "initial"29+30+MINIMUM_REQUEST_BODY_SIZE: int = 031+32+SKIP_UNREPLAYABLE_ON_COLLECT: bool = True33+REPLAYABILITY_CHECK_TIMEOUT_SECONDS: float = 20.034+35+POPULATE_GROUND_TRUTH_LLM: bool = False36+GROUND_TRUTH_LLM_ENDPOINT: str = "OPENAI:gpt-5_2"37+38+COMPARE_TO_GROUND_TRUTH: str = "es"39+40+def __init__(self, **kwargs: Any) -> None:41+super().__init__(**kwargs)42+_repo_root = Path(__file__).parents[2]43+_datasets_root = _repo_root / "eval_output" / self.OUTPUT_SUBFOLDER / "eval_datasets"44+_dataset_dir = _datasets_root / self.__class__.DATA_SET45+self._dataset_dir = _dataset_dir46+self._inputs_dir = _dataset_dir / "input"47+self._ground_truths_dir = _dataset_dir / "ground_truth"48+49+async def collect_test_set(self) -> List[Dict[str, Any]]:50+if self.es_host_name:51+return await self._collect_from_es()52+return await self._collect_from_files()53+54+async def _collect_from_es(self) -> List[Dict[str, Any]]:55+if self._inputs_dir.exists() or self._ground_truths_dir.exists():56+raise FileExistsError(57+f"Dataset folder already exists: {self._inputs_dir.parent}. "58+"Choose a different --data-set name or remove the folder manually."59+ )60+61+raw_samples = await es_collect_test_set(62+es_host_name=self.es_host_name,63+es_port=self.es_port,64+endpoint_path=self.ENDPOINT_PATH,65+limit=self.limit,66+language_filter=self.language_filter,67+min_prompt_length=self.min_prompt_length,68+is_json_response=self.IS_JSON_RESPONSE,69+exclude_languages=self.exclude_languages,70+ )71+72+self._inputs_dir.mkdir(parents=True, exist_ok=True)73+self._ground_truths_dir.mkdir(parents=True, exist_ok=True)74+75+min_size = self.MINIMUM_REQUEST_BODY_SIZE76+samples: List[Dict[str, Any]] = []77+skipped_no_response = 078+skipped_too_small = 079+skipped_unreplayable = 080+skipped_probe_errors = 081+82+endpoint_url = f"{self.api_host}/{self.ENDPOINT_PATH}"83+84+timeout = aiohttp.ClientTimeout(total=self.REPLAYABILITY_CHECK_TIMEOUT_SECONDS)85+async with aiohttp.ClientSession(timeout=timeout) as session:86+for sample in raw_samples:87+es_id = sample["id"]88+request_body = sample["request_body"]89+original_response = sample.get("original_response")90+91+if not original_response:92+skipped_no_response += 193+continue94+95+if min_size and len(json.dumps(request_body)) < min_size:96+skipped_too_small += 197+continue98+99+if self.SKIP_UNREPLAYABLE_ON_COLLECT:100+try:101+async with session.post(endpoint_url, json=request_body) as resp:102+if resp.status != 200:103+body_text = await resp.text()104+if "Cannot find activity" in body_text:105+skipped_unreplayable += 1106+continue107+except Exception:108+skipped_probe_errors += 1109+110+with open(self._inputs_dir / f"{es_id}.json", "w") as f:111+json.dump(request_body, f, indent=2)112+113+with open(self._ground_truths_dir / f"gt_{es_id}.json", "w") as f:114+json.dump(original_response, f, indent=2)115+116+samples.append(117+ {118+"id": es_id,119+"request_body": request_body,120+"original_response": original_response,121+"ground_truth": original_response,122+ }123+ )124+125+print(f"Wrote {len(samples)} samples to {self._inputs_dir.parent}")126+if skipped_no_response:127+print(f"Skipped {skipped_no_response} records with no response body")128+if skipped_too_small:129+print(130+f"Skipped {skipped_too_small} records whose request body was smaller "131+f"than {min_size} characters"132+ )133+if skipped_unreplayable:134+print(f"Skipped {skipped_unreplayable} records that are not replayable (missing activity in ES)")135+if skipped_probe_errors:136+print(f"Warning: replayability probe errored for {skipped_probe_errors} records (kept those records)")137+138+if self.__class__.POPULATE_GROUND_TRUTH_LLM:139+label_results = await run_endpoint_http(140+test_set=samples,141+endpoint_url=endpoint_url,142+request_model=self.REQUEST_MODEL,143+force_llm_endpoint=self.__class__.GROUND_TRUTH_LLM_ENDPOINT,144+is_streaming_response=self.IS_STREAMING_RESPONSE,145+timeout_seconds=300.0,146+parallel_requests=self.parallel_requests,147+return_prompt_in_run=self.RETURN_PROMPT_IN_RUN,148+ )149+150+for i, r in enumerate(label_results):151+forced = self.__class__.GROUND_TRUTH_LLM_ENDPOINT152+model = forced.split(":", 1)[1] if ":" in forced else forced153+samples[i]["ground_truth_llm_model"] = model154+155+if not r.get("success", True):156+samples[i]["ground_truth_llm"] = None157+samples[i]["ground_truth_llm_reasoning"] = None158+samples[i]["ground_truth_llm_error_type"] = r.get("error_type")159+samples[i]["ground_truth_llm_error_message"] = r.get("error_message")160+continue161+162+resp = r.get("new_response") or {}163+samples[i]["ground_truth_llm"] = resp.get("ai_activity_type")164+samples[i]["ground_truth_llm_reasoning"] = resp.get("ai_activity_type_reasoning")165+166+return samples167+168+async def _collect_from_files(self) -> List[Dict[str, Any]]:169+if not self._inputs_dir.exists():170+raise FileNotFoundError(171+f"Dataset folder not found: {self._inputs_dir}. "172+"Run --steps collect first or check your --data-set name."173+ )174+175+samples: List[Dict[str, Any]] = []176+for filename in sorted(os.listdir(self._inputs_dir)):177+input_path = self._inputs_dir / filename178+gt_path = self._ground_truths_dir / f"gt_{filename}"179+180+if not gt_path.exists():181+print(f"Warning: ground truth not found for {filename}, skipping")182+continue183+184+with open(input_path, "r") as f:185+request_body: Dict[str, Any] = json.load(f)186+187+with open(gt_path, "r") as f:188+ground_truth: Dict[str, Any] = json.load(f)189+190+samples.append(191+ {192+"id": Path(filename).stem,193+"request_body": request_body,194+"original_response": None,195+"ground_truth": ground_truth,196+ }197+ )198+199+print(f"Loaded {len(samples)} samples from {self._inputs_dir.parent}")200+return samples201+202+async def evaluate_against_ground_truth(self, results: List[Dict[str, Any]]) -> List[EvaluationResult]:203+evaluation_results: List[EvaluationResult] = []204+compare_to = self.__class__.COMPARE_TO_GROUND_TRUTH205+206+for res in results:207+req_id = res["request_id"]208+is_success = res.get("success", True)209+210+if not is_success:211+evaluation_results.append(212+EvaluationResult(213+request_id=req_id,214+request_body=res["request_body"],215+new_response=res.get("new_response"),216+valid_schema=False,217+quality_score=0.0,218+quality_feedback=f"Request failed: {res.get('error_type')} - {res.get('error_message')}",219+response_time=res.get("response_time"),220+success=False,221+error_type=res.get("error_type"),222+error_message=res.get("error_message"),223+ground_truth_match=False,224+ )225+ )226+continue227+228+pred = (res.get("new_response") or {}).get("ai_activity_type")229+230+if compare_to == "llm":231+if "ground_truth_llm" not in res:232+evaluation_results.append(233+EvaluationResult(234+request_id=req_id,235+request_body=res["request_body"],236+new_response=res.get("new_response"),237+valid_schema=res.get("valid_schema", True),238+quality_score=0.0,239+quality_feedback=(240+"Missing ground_truth_llm in run results. Ensure collect step populated it and "241+"endpoints.py passes it through."242+ ),243+response_time=res.get("response_time"),244+success=True,245+ground_truth_match=False,246+ )247+ )248+continue249+gt_label = res.get("ground_truth_llm")250+else:251+gt_path = self._ground_truths_dir / f"gt_{req_id}.json"252+if not gt_path.exists():253+evaluation_results.append(254+EvaluationResult(255+request_id=req_id,256+request_body=res["request_body"],257+new_response=res.get("new_response"),258+valid_schema=res.get("valid_schema", True),259+quality_score=0.0,260+quality_feedback=f"Ground truth file not found: {gt_path}",261+response_time=res.get("response_time"),262+success=True,263+ground_truth_match=False,264+ )265+ )266+continue267+268+with open(gt_path, "r") as f:269+ground_truth_dict: Dict[str, Any] = json.load(f)270+271+gt_label = (ground_truth_dict or {}).get("ai_activity_type")272+273+matched = pred == gt_label274+275+evaluation_results.append(276+EvaluationResult(277+request_id=req_id,278+request_body=res["request_body"],279+new_response=res.get("new_response"),280+valid_schema=res.get("valid_schema", True),281+quality_score=1.0 if matched else 0.0,282+quality_feedback="PASS" if matched else f"FAIL\nExpected: {gt_label}\nGot: {pred}",283+response_time=res.get("response_time"),284+success=True,285+ground_truth_match=matched,286+ )287+ )288+289+return evaluation_results290+291+292+if __name__ == "__main__":293+parser = create_parser(AiActivityTypeEvaluator)294+parser.add_argument(295+"--data-set",296+type=str,297+default="initial",298+dest="data_set",299+help="Dataset folder under eval_output/ai_activity_type/eval_datasets/ (default: initial)",300+ )301+302+parser.add_argument(303+"--populate-ground-truth-llm",304+action="store_true",305+dest="populate_ground_truth_llm",306+ )307+parser.add_argument(308+"--ground-truth-llm-endpoint",309+type=str,310+default="OPENAI:gpt-5_2",311+dest="ground_truth_llm_endpoint",312+ )313+314+parser.add_argument(315+"--compare-to-ground-truth",316+type=str,317+choices=["es", "llm"],318+default="es",319+dest="compare_to_ground_truth",320+ )321+322+args = parser.parse_args()323+AiActivityTypeEvaluator.DATA_SET = args.data_set324+325+AiActivityTypeEvaluator.POPULATE_GROUND_TRUTH_LLM = bool(args.populate_ground_truth_llm)326+AiActivityTypeEvaluator.GROUND_TRUTH_LLM_ENDPOINT = args.ground_truth_llm_endpoint327+328+AiActivityTypeEvaluator.COMPARE_TO_GROUND_TRUTH = args.compare_to_ground_truth329+330+asyncio.run(run_evaluation(AiActivityTypeEvaluator, args))</selection>” selected. Please summarize the selection using precise and concise language. Use headers and bulleted lists in the summary, to make it scannable. Maintain the meaning and factual accuracy.
You said
I’m on page “<tabTitle>Jy 19798 evaluation for ai activity types by nikol</tabTitle>” with “<selection>@@ -0,0 +1,330 @@1+#!/usr/bin/env python32+import asyncio3+import json4+import os5+from pathlib import Path6+from typing import Any, Dict, List7+8+import aiohttp9+10+from src.component.call_processing.schemes.ai_activity_type_request import AiActivityTypeRequest11+from src.component.call_processing.schemes.ai_activity_type_response import AiActivityTypeResponse12+from src.eval.abstract_evaluator import AbstractEvaluator13+from src.eval.collectors import collect_test_set as es_collect_test_set14+from src.eval.common_evaluator_runner import create_parser, run_evaluation15+from src.eval.endpoints import run_endpoint as run_endpoint_http16+from src.eval.models import EvaluationResult17+18+19+class AiActivityTypeEvaluator(AbstractEvaluator):20+ENDPOINT_PATH = "call/ai-activity-type"21+REQUEST_MODEL = AiActivityTypeRequest22+RESPONSE_MODEL = AiActivityTypeResponse23+OUTPUT_SUBFOLDER = "ai_activity_type"24+USES_LLM_EVALUATION = False25+26+READS_FROM_DATASET_FOLDER = True27+28+DATA_SET: str = "initial"29+30+MINIMUM_REQUEST_BODY_SIZE: int = 031+32+SKIP_UNREPLAYABLE_ON_COLLECT: bool = True33+REPLAYABILITY_CHECK_TIMEOUT_SECONDS: float = 20.034+35+POPULATE_GROUND_TRUTH_LLM: bool = False36+GROUND_TRUTH_LLM_ENDPOINT: str = "OPENAI:gpt-5_2"37+38+COMPARE_TO_GROUND_TRUTH: str = "es"39+40+def __init__(self, **kwargs: Any) -> None:41+super().__init__(**kwargs)42+_repo_root = Path(__file__).parents[2]43+_datasets_root = _repo_root / "eval_output" / self.OUTPUT_SUBFOLDER / "eval_datasets"44+_dataset_dir = _datasets_root / self.__class__.DATA_SET45+self._dataset_dir = _dataset_dir46+self._inputs_dir = _dataset_dir / "input"47+self._ground_truths_dir = _dataset_dir / "ground_truth"48+49+async def collect_test_set(self) -> List[Dict[str, Any]]:50+if self.es_host_name:51+return await self._collect_from_es()52+return await self._collect_from_files()53+54+async def _collect_from_es(self) -> List[Dict[str, Any]]:55+if self._inputs_dir.exists() or self._ground_truths_dir.exists():56+raise FileExistsError(57+f"Dataset folder already exists: {self._inputs_dir.parent}. "58+"Choose a different --data-set name or remove the folder manually."59+ )60+61+raw_samples = await es_collect_test_set(62+es_host_name=self.es_host_name,63+es_port=self.es_port,64+endpoint_path=self.ENDPOINT_PATH,65+limit=self.limit,66+language_filter=self.language_filter,67+min_prompt_length=self.min_prompt_length,68+is_json_response=self.IS_JSON_RESPONSE,69+exclude_languages=self.exclude_languages,70+ )71+72+self._inputs_dir.mkdir(parents=True, exist_ok=True)73+self._ground_truths_dir.mkdir(parents=True, exist_ok=True)74+75+min_size = self.MINIMUM_REQUEST_BODY_SIZE76+samples: List[Dict[str, Any]] = []77+skipped_no_response = 078+skipped_too_small = 079+skipped_unreplayable = 080+skipped_probe_errors = 081+82+endpoint_url = f"{self.api_host}/{self.ENDPOINT_PATH}"83+84+timeout = aiohttp.ClientTimeout(total=self.REPLAYABILITY_CHECK_TIMEOUT_SECONDS)85+async with aiohttp.ClientSession(timeout=timeout) as session:86+for sample in raw_samples:87+es_id = sample["id"]88+request_body = sample["request_body"]89+original_response = sample.get("original_response")90+91+if not original_response:92+skipped_no_response += 193+continue94+95+if min_size and len(json.dumps(request_body)) < min_size:96+skipped_too_small += 197+continue98+99+if self.SKIP_UNREPLAYABLE_ON_COLLECT:100+try:101+async with session.post(endpoint_url, json=request_body) as resp:102+if resp.status != 200:103+body_text = await resp.text()104+if "Cannot find activity" in body_text:105+skipped_unreplayable += 1106+continue107+except Exception:108+skipped_probe_errors += 1109+110+with open(self._inputs_dir / f"{es_id}.json", "w") as f:111+json.dump(request_body, f, indent=2)112+113+with open(self._ground_truths_dir / f"gt_{es_id}.json", "w") as f:114+json.dump(original_response, f, indent=2)115+116+samples.append(117+ {118+"id": es_id,119+"request_body": request_body,120+"original_response": original_response,121+"ground_truth": original_response,122+ }123+ )124+125+print(f"Wrote {len(samples)} samples to {self._inputs_dir.parent}")126+if skipped_no_response:127+print(f"Skipped {skipped_no_response} records with no response body")128+if skipped_too_small:129+print(130+f"Skipped {skipped_too_small} records whose request body was smaller "131+f"than {min_size} characters"132+ )133+if skipped_unreplayable:134+print(f"Skipped {skipped_unreplayable} records that are not replayable (missing activity in ES)")135+if skipped_probe_errors:136+print(f"Warning: replayability probe errored for {skipped_probe_errors} records (kept those records)")137+138+if self.__class__.POPULATE_GROUND_TRUTH_LLM:139+label_results = await run_endpoint_http(140+test_set=samples,141+endpoint_url=endpoint_url,142+request_model=self.REQUEST_MODEL,143+force_llm_endpoint=self.__class__.GROUND_TRUTH_LLM_ENDPOINT,144+is_streaming_response=self.IS_STREAMING_RESPONSE,145+timeout_seconds=300.0,146+parallel_requests=self.parallel_requests,147+return_prompt_in_run=self.RETURN_PROMPT_IN_RUN,148+ )149+150+for i, r in enumerate(label_results):151+forced = self.__class__.GROUND_TRUTH_LLM_ENDPOINT152+model = forced.split(":", 1)[1] if ":" in forced else forced153+samples[i]["ground_truth_llm_model"] = model154+155+if not r.get("success", True):156+samples[i]["ground_truth_llm"] = None157+samples[i]["ground_truth_llm_reasoning"] = None158+samples[i]["ground_truth_llm_error_type"] = r.get("error_type")159+samples[i]["ground_truth_llm_error_message"] = r.get("error_message")160+continue161+162+resp = r.get("new_response") or {}163+samples[i]["ground_truth_llm"] = resp.get("ai_activity_type")164+samples[i]["ground_truth_llm_reasoning"] = resp.get("ai_activity_type_reasoning")165+166+return samples167+168+async def _collect_from_files(self) -> List[Dict[str, Any]]:169+if not self._inputs_dir.exists():170+raise FileNotFoundError(171+f"Dataset folder not found: {self._inputs_dir}. "172+"Run --steps collect first or check your --data-set name."173+ )174+175+samples: List[Dict[str, Any]] = []176+for filename in sorted(os.listdir(self._inputs_dir)):177+input_path = self._inputs_dir / filename178+gt_path = self._ground_truths_dir / f"gt_{filename}"179+180+if not gt_path.exists():181+print(f"Warning: ground truth not found for {filename}, skipping")182+continue183+184+with open(input_path, "r") as f:185+request_body: Dict[str, Any] = json.load(f)186+187+with open(gt_path, "r") as f:188+ground_truth: Dict[str, Any] = json.load(f)189+190+samples.append(191+ {192+"id": Path(filename).stem,193+"request_body": request_body,194+"original_response": None,195+"ground_truth": ground_truth,196+ }197+ )198+199+print(f"Loaded {len(samples)} samples from {self._inputs_dir.parent}")200+return samples201+202+async def evaluate_against_ground_truth(self, results: List[Dict[str, Any]]) -> List[EvaluationResult]:203+evaluation_results: List[EvaluationResult] = []204+compare_to = self.__class__.COMPARE_TO_GROUND_TRUTH205+206+for res in results:207+req_id = res["request_id"]208+is_success = res.get("success", True)209+210+if not is_success:211+evaluation_results.append(212+EvaluationResult(213+request_id=req_id,214+request_body=res["request_body"],215+new_response=res.get("new_response"),216+valid_schema=False,217+quality_score=0.0,218+quality_feedback=f"Request failed: {res.get('error_type')} - {res.get('error_message')}",219+response_time=res.get("response_time"),220+success=False,221+error_type=res.get("error_type"),222+error_message=res.get("error_message"),223+ground_truth_match=False,224+ )225+ )226+continue227+228+pred = (res.get("new_response") or {}).get("ai_activity_type")229+230+if compare_to == "llm":231+if "ground_truth_llm" not in res:232+evaluation_results.append(233+EvaluationResult(234+request_id=req_id,235+request_body=res["request_body"],236+new_response=res.get("new_response"),237+valid_schema=res.get("valid_schema", True),238+quality_score=0.0,239+quality_feedback=(240+"Missing ground_truth_llm in run results. Ensure collect step populated it and "241+"endpoints.py passes it through."242+ ),243+response_time=res.get("response_time"),244+success=True,245+ground_truth_match=False,246+ )247+ )248+continue249+gt_label = res.get("ground_truth_llm")250+else:251+gt_path = self._ground_truths_dir / f"gt_{req_id}.json"252+if not gt_path.exists():253+evaluation_results.append(254+EvaluationResult(255+request_id=req_id,256+request_body=res["request_body"],257+new_response=res.get("new_response"),258+valid_schema=res.get("valid_schema", True),259+quality_score=0.0,260+quality_feedback=f"Ground truth file not found: {gt_path}",261+response_time=res.get("response_time"),262+success=True,263+ground_truth_match=False,264+ )265+ )266+continue267+268+with open(gt_path, "r") as f:269+ground_truth_dict: Dict[str, Any] = json.load(f)270+271+gt_label = (ground_truth_dict or {}).get("ai_activity_type")272+273+matched = pred == gt_label274+275+evaluation_results.append(276+EvaluationResult(277+request_id=req_id,278+request_body=res["request_body"],279+new_response=res.get("new_response"),280+valid_schema=res.get("valid_schema", True),281+quality_score=1.0 if matched else 0.0,282+quality_feedback="PASS" if matched else f"FAIL\nExpected: {gt_label}\nGot: {pred}",283+response_time=res.get("response_time"),284+success=True,285+ground_truth_match=matched,286+ )287+ )288+289+return evaluation_results290+291+292+if __name__ == "__main__":293+parser = create_parser(AiActivityTypeEvaluator)294+parser.add_argument(295+"--data-set",296+type=str,297+default="initial",298+dest="data_set",299+help="Dataset folder under eval_output/ai_activity_type/eval_datasets/ (default: initial)",300+ )301+302+parser.add_argument(303+"--populate-ground-truth-llm",304+action="store_true",305+dest="populate_ground_truth_llm",306+ )307+parser.add_argument(308+"--ground-truth-llm-endpoint",309+type=str,310+default="OPENAI:gpt-5_2",311+dest="ground_truth_llm_endpoint",312+ )313+314+parser.add_argument(315+"--compare-to-ground-truth",316+type=str,317+choices=["es", "llm"],318+default="es",319+dest="compare_to_ground_truth",320+ )321+322+args = parser.parse_args()323+AiActivityTypeEvaluator.DATA_SET = args.data_set324+325+AiActivityTypeEvaluator.POPULATE_GROUND_TRUTH_LLM = bool(args.populate_ground_truth_llm)326+AiActivityTypeEvaluator.GROUND_TRUTH_LLM_ENDPOINT = args.ground_truth_llm_endpoint327+328+AiActivityTypeEvaluator.COMPARE_TO_GROUND_TRUTH = args.compare_to_ground_truth329+330+asyncio.run(run_evaluation(AiActivityTypeEvaluator, args))</selection>” selected.
Please summarize the selection using precise and concise language. Use headers and bulleted lists in the summary, to make it scannable. Maintain the meaning and factual accuracy.
Expand
Listen
Show more options
Gemini said
Gemini said
This Python script defines the
AiActivityTypeEvaluator
class, a specialized tool for evaluating AI-driven activity type classification. It extends a base evaluation framework to handle data collection, ground-truth comparison, and performance scoring.
Core Functionality
Core Functionality
Data Collection (
collect_test_set
):
* Supports fetching raw samples from
Elasticsearch (ES)
or loading them from
local JSON files
.
Implements a
replayability check
: It probes the live API endpoint to ensure the activity exists before including it in the dataset.
Filters data based on request body size, language, and the presence of an original response.
Ground Truth Management:
Saves inputs and "gold standard" responses to a local directory structure (
eval_output/ai_activity_type/...
).
Features an optional mode to populate ground truth using a high-order LLM (e.g.,
gpt-5_2
) if the
POPULATE_GROUND_TRUTH_LLM
flag is set.
Evaluation Logic (
evaluate_against_ground_truth
):
Compares the model's predicted
ai_activity_type
against either the original ES response or the LLM-generated ground truth.
Assigns a
quality_score
of
1.0
for a match and
0.0
for a mismatch or request failure.
Key Configuration Constants
Key Configuration Constants
Constant
Purpose
ENDPOINT_PATH
The specific API route being tested:
call/ai-activity-type
.
COMPARE_TO_GROUND_TRUTH
Determines if the baseline is the original data (
es
) or an
llm
.
SKIP_UNREPLAYABLE_ON_COLLECT
If
True
, discards records where the API returns "Cannot find activity."
REPLAYABILITY_CHECK_TIMEOUT_SECONDS
Limits the probe request to 20 seconds.
Constant
ENDPOINT_PATH
COMPARE_TO_GROUND_TRUTH
SKIP_UNREPLAYABLE_ON_COLLECT
REPLAYABILITY_CHECK_TIMEOUT_SECONDS
Purpose
The specific API route being tested:
call/ai-activity-type
.
Determines if the baseline is the original data (
es
) or an
llm
.
If
True
, discards records where the API returns "Cannot find activity."
Limits the probe request to 20 seconds.
Copy table
Execution Workflow
Execution Workflow
Parsing:
The script uses
create_parser
to handle CLI arguments like
--data-set
,
--populate-ground-truth-llm
, and
--compare-to-ground-truth
.
Initialization:
Sets up directory paths for inputs and ground truths based on the selected dataset name.
Run:
Executes the evaluation via
asyncio.run(run_evaluation(...))
, which orchestrates the collection, endpoint calling, and result scoring phases.
Copy prompt
Edit
You said are there soem issues with this PR
You said
are there soem issues with this PR
Show more options
Analyzing Implementation
Gemini said
Gemini said
Enter a prompt for Gemini
encrypted
Enter a prompt for Gemini
encrypted
Open upload file menu
Tools
Open mode picker
Pro
Stop response
Your Jiminny chats aren’t used to improve our models. Gemini is AI and can make mistakes, including about people.
Your privacy & Gemini Opens in a new window
Your privacy & Gemini
Opens in a new window
Summarize page
Summarize page
Skip to content
Skip to content
Open menu
Homepage (g then d)
jiminny
jiminny
prophet
prophet
Search or jump to…
Type
/
to search
Chat with Copilot
Open Copilot…
Create new...
Issues
Pull requests
Repositories
You have unread notifications(g then n)
Open user navigation menu
Repository navigation
Repository navigation
Code
Code
Issues
Issues
Pull requests (6)
Pull requests
(
6
)
Agents...
|
Firefox
|
Jy 19798 evaluation for ai activity types by nikol Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet — Work...
|
github.com/jiminny/prophet/pull/468/changes#diff-d github.com/jiminny/prophet/pull/468/changes#diff-d2f9fea9bc428b119ae26a66453e8eab7dd0782d4a4d93075ca5ce05903d77e8...
|
8542
|
|
8541
|
DialFilelEditViewTabsBookmarksHistoryExtensionsWin DialFilelEditViewTabsBookmarksHistoryExtensionsWindowHelplall| Daily - Platform • nowmeet.google.com / Daily - PlatformNikolay Yankov (Presenting)IEB)nbox (2) - nilC 3y 19798 eval(l Service-Desk xjr Foet Awesonx• Cost | Claude x@ Actions secrC C Dependathttps://github.comfjminny/prophet/actions/runs/24195639151/ob/70624720594?pr=468# AWS|E3 SSHEl Datados Clude3 CircieciD Sentry0 Tasksiminny / prophet&(> Code17 Pull requests• Actions85 Projectsa WildlSecurity and quality 20 z Insights ® Settings• Jy 19798 evaluation for ai activity types #1918|SummaryiA XOON|© Run Flake8 & MypyКи оcиilÔ Usage& Workflow fie> Annotations1 error and 1 warningRun Flake8 & Mypyfalled 5 days apo in 57s> O Setupjob> © Checkout code|> • Set up Python |> © Install Poetry|> • Load cached Poetry dependencies |> • Install dependencies using Poetry |Run Haked.1 •Run poetry run flakeß, -count -select-E9, F63,F7,F82,N293, N291,N292 -shov-source -statistics |/src/eval/eval-at-actavity-type.py:3301631 keyk no dorithe at cho of Taleasyncio.run(run_evaluation(AlActIvityTypeEvaluater, args))Error: Process conpleted with exit code 1.• Run Mypy |Post Load cached Poetry deoendençies.• Post Set up Python|• Actions sec:• Dependabot: xO Far Dependa:с:|Insights & Coachin.• DevDux.Q Type L] to search88Tue 14 Apr 9:500 Al BookmarksStefka StoyanovaRe-runjobs -Q Search logs|125Steliyan GeorgievKovalik9:50 AM | Daily - Platform5:238 • Tue 14 Apr 9:50:47Chat6Nikolay YankovNikolay Ivanov...
|
NULL
|
NULL
|
NULL
|
8541
|
|
8540
|
DiaFilelEditViewTabsBookmarksHistoryExtensionsWind DiaFilelEditViewTabsBookmarksHistoryExtensionsWindowHelp$0.| Daily - Platform • nowmeet.google.com / Daily - PlatformNikolay Yankov (Presenting)Inbox (2) - niE:1Q 8•Tue 14 Apr 9:50C 3y 19798 eval(7 Service-Deskjr Foet Ameson• Cost | Claude x©Actions secrC C DependatActions sec• Dependabot: xHa Dependatinny/prophet/actions/runs/24195639151/ob/70624720594?pr=468B AWSE3 SSHс *El Datados * Clude5 CircieclD Sentry8 TaskslInsights & Coachin.• Dev• Ux.0 Al BookmarksRun Flake8 & Mypy•, Search logs ISummary|Run Flake8 & MypyRun details© UsageWorkfiow fileStefka StoyanovaHDSet up Python> O Install Poetry|> • Load cached Poetry dependencies |> © Install dependencies using Poetry|v @ Run Flake81 •Run poetry run flakeß, -count -select-E9,F63,F7,F82,N293,N291,W292 --show-source -statistics./src/eval/eval-al-activity-type.py:330:63: N292 no newline at end of fileasyncio.runtrun_cvaluatzon(AlactavityTypetvaluator, args)Error: Process conpleted with exit code 1.• Run Mypy|• Post Load cached Poetry dependencies |• Post Set up Python|› © Post Checkout code> O Complete job• 2026 0i4u6, Inc.HortroPrivacy Security Status Community Docsvonest menesn25san GeorgievKovalik9:50 AM | Daily - Platform5:148 • Tue 14 Apr 9:50:38Chat6Nikolay YankovNikolay Ivanov...
|
NULL
|
NULL
|
NULL
|
8540
|
|
8539
|
firefoxFileEoitViewHistoryBookmarksProfilesToolsWi firefoxFileEoitViewHistoryBookmarksProfilesToolsWindow Helpgithub.com/liminny/prophet/pull/468/changes#diff-d2f9fea9bc428b119ae26a66453e8eab/dd0782d4a4d93075ca5ce05903d77e8Google Gemini...Platform Sprint 1 Q2 - Platform Tea( Jy 19798 evaluation for ai activ+ New labim on page"<tab Itle>Jv19798evalarion tor allactivity types bynikol</tabTitle>"...Jy 19798 evaluation for ai activity tvpes #468i OperAll commits-nikolaybiaivanov wants to merge 9 commits into master from JY-19798-evaluation-for-ai-activity-types L• Filter files...src/eval/endpoints.pyEv = eval_ output/ai_activity_type/eval...v src/eval/eval-ai-activity-type.py L> • ground_truth..•aa-0.0 +1.330 ad>I inoutF src/eval= enapolnts.py+ eval-ai-activity-type.pyE .gitignoreDaily - Platform • now• 2 1 203 viewed|Awallind aporovali100%• Tue 14 Apr 9:50:3645 nannOI• Viewed+330 UUUUuVlewedInis Pytnon seript derines theAlActivltylypetvaluator class, aspecialized tool tor evaluating Al-arivenactivity type classification. It extends a baseevaluation namework to nanole datacollection, ground-truth comparison, andperformance scoringCore Functionality0 Enter a oromot for GeminiPro vYour Jiminny chats aren't used to improve our models. Gemini is Alane can make mistakes, Inciualne aoout peodleYour orivacy & GeminSummarise page+ #!/usr/bin/env python3+ 1mport asynclo+ import json+ import os+ trom pathlib import Path+ from typing import Any, Dict, List+ import aiohttp+ from src.component.call_processing.schemes.ai activity type_request importALACELVLEV voekequest11+ from src.component.call_processing.schemes.ai activity type_response importA1ACCIVICYIypekesponse+ trom src.eval.abstract_evaluator import Abstractevaluator+ from src.eval.collectors import collect test set as es collect test set+ trom src.eval.common_evaluator_runner import create_parser, run_evaluation+ from src.eval.endpoints import run endpoint as run endpoint http+ trom src,eval,models 1moort Evaluatzonkesult+ class AiActivityTypeEvaluator(AbstractEvaluator):ENDPOINT PATH = "call/ai-activity-type"REQUEST_MODEL = AiActivityTypeRequestRESrUNst MoDEL = ALACCIviLyIvpekesponse23ouTPUT_sUBFOLDER = "a1_activity_type"USES LLM EVALUATION = FalseREADS_FROM_DATASET_ FOLDER = TrueDATA SET: str = "initial"MINIMUM_REQUEST BODY_ SIZE: int = 0SKLP_UNREPLAYABLE_ON_COLLECT: bool = TrueREPLAYABILITY CHECK TIMEOUT SECONDS: float = 20.0rurULAlL onuuNu Ikuin LlM. Dool = IalseGROUND TRUTH LIM ENDPOINT: str = "OPENAL:apt=5 2"LUMPAKE TU GKUUND IKuin: Str=esdeт 1nic selt, **kwaros. Any -> None:super().__init__(**kwargs)_repo_root = Path(_file_).parents (2]_datasets_root = _repo_root / "eval_output" / selt.OUTPUT_SUBFOLDER /"eval datasets"_dataset_dir = _datasets_root / selt.__class__.DATA_SE!self._dataset_dir =_dataset_dirselt._inputs_air =_dataset_air / "Input"self. around truths dir= datacet dir "around truth'l...
|
NULL
|
NULL
|
NULL
|
8539
|
|
8538
|
DiaFilelEditViewTabsBookmarksHistoryExtensionsWind DiaFilelEditViewTabsBookmarksHistoryExtensionsWindowHelp$0.| Daily - Platform • nowmeet.google.com / Daily - PlatformNikolay Yankov (Presenting)Inbox (2) - niC 3y 19798 eval(7 Service-Deskjr Foet Ameson• Cost | Claude x© Actions secrC C Dependatinny/prophet/actions/runs/24195639151/ob/70624720594?pr=468B AWSE3 SSHEl Datados * Clude5 CircieclD Sentry8 TaskslActions secQ 8•Tue 14 Apr 9:50• Dependabot: xHa Dependatс *Insights & Coachin.• Dev• Ux0 Al BookmarksE:1Summary|Run Flake8 & MypyRun details© UsageWorkfiow fileRun Flake8 & Mypy•, Search logs IStefka StoyanovaHDSet up Python> O Install Poetry|> • Load cached Poetry dependencies |> © Install dependencies using Poetry|v @ Run Flake81 •Run poetry run flakeß, -count -select-E9,F63,F7,F82,N293,N291,W292 --show-source -statistics•/src/eval/eval-ai-activity-type.py:330:63: N292 no nevline at end of fileasyncio.runtrun_cvaluatzon(AlactavityTypetvaluator, args)Error: Process conpleted with exit code 1.• Run Mypy|• Post Load cached Poetry dependencies |• Post Set up Python|› © Post Checkout code> O Complete job• 2026 0i4u6, Inc.HortroPrivacy Security Status Community Docsvonest menesnDo not share my personal information)25s7san GeorgievKovalik9:50 AM | Daily - Platform5:118 • Tue 14 Apr 9:50:35Chat6Nikolay YankovNikolay Ivanov...
|
NULL
|
NULL
|
NULL
|
8538
|
|
8537
|
FirefoxFileEoitViewHistoryBookmarksProfilesToolsWi FirefoxFileEoitViewHistoryBookmarksProfilesToolsWindowHelpgithub.com/liminny/prophet/pull/468/changes#diff-d2f9fea9bc428b119ae26a66453e8eab/dd0782d4a4d93075ca5ce05903d77e8Google GeminiJy 19798 evaluation for ai activity tvpes #468Platform Sprint 1 Q2 - Platform Tea( Jy 19798 evaluation for ai activ+ New lab8? OperAll commits-nikolaybiaivanov wants to merge 9 commits into master from JY-19798-evaluation-for-ai-activity-types Lsrc/eval/endpoints.pyE• Filter files...v = eval_ output/ai_activity_type/eval...v src/eval/eval-ai-activity-type.py L> • ground_truth..•aa-0.0 +1.330 ad>I inoutv F src/eval#endpoints.py+ eval-ai-activity-type.pyE .gitignoreDaily - Platform • now100%• Tue 14 Apr 9:50:33• 2 | 203 viewed|Awallind aporovali45 nannOI• Viewed+330 UUUUuVlewed+ #!/usr/bin/env python3+ 1mport asynclo+ import json+ import os+ trom pathlib import Path+ from typing import Any, Dict, List+ import aiohttp+ from src.component.call_processing.schemes.ai activity type_request importALACELVLEV voekequest11+ from src.component.call_processing.schemes.ai activity type_response importA1ACCIVICYIypekesponse+ trom src.eval.abstract_evaluator import Abstractevaluator+ from src.eval.collectors import collect test set as es collect test set+ trom src.eval.common_evaluator_runner import create_parser, run_evaluation+ from src.eval.endpoints import run endpoint as run endpoint http+ trom src,eval,models 1moort Evaluatzonkesult23Ask GeminiC A ESummarise page+ class AiActivityTypeEvaluator(AbstractEvaluator):ENDPOINT PATH = "call/ai-activity-type"REQUEST_MODEL = AiActivityTypeRequestRESrUNst MoDEL = ALACCIviLyIvpekesponseouTPUT_sUBFOLDER = "a1_activity_type"USES LLM EVALUATION = FalseREADS_FROM_DATASET_ FOLDER = TrueDATA SET: str = "initial"MINIMUM_REQUEST BODY_SIZE: int = 0SKLP_UNREPLAYABLE_ON_COLLECT: bool = TrueREPLAYABILITY CHECK TIMEOUT SECONDS: float = 20.0rurULAlL onuuNu Ikuin LlM. Dool = IalseGROUND TRUTH LIM ENDPOINT: str = "OPENAL:apt=5 2"LUMPAKE TU GKUUND IKuin: Str=esdeт 1nic selt, **kwaros. Any -> None:super().__init__(**kwargs)_repo_root = Path(_file_).parents (2]_datasets_root = _repo_root / "eval_output" / selt.OUTPUT_SUBFOLDER /"eval datasets"_dataset_dir = _datasets_root / selt.__class_.DATA_SElself._dataset_dir =_dataset_dirselt._inputs_air =_dataset_air / "Input"self. around truths dir= datacet dir "around truth'l...
|
NULL
|
NULL
|
NULL
|
8537
|
|
8536
|
FirefoxFileEoitViewHistoryBookmarksProfilesToolsWi FirefoxFileEoitViewHistoryBookmarksProfilesToolsWindow Helpgithub.com/jiminny/prophet/pull/468/changes#diff-d2f9fea9bc428b119ae26a66453e8eab7dd0782d4a4d93075ca5ce05903d77e8~ Google Gemini...l1 OpenJy 19798 evaluation for ai activity types #468All commitsnikolaybiaivanov wants to merge 9 commits into master from JY-19798-evaluation-for-ai-activity-types LPlatform Sprint 1 Q2 - Platform TeaGeminiPROsrc/eval/endpoints.pyEQ Filter files...( Jy 19798 evaluation for ai activ+ New lab• E eval_output/ai_activity_type/eval.….> • ground_truth>I inoutv src/eval/eval-ai-activity-type.py..•0a -0,0 +1,330 ggE src/evalE endpoints.pyE eval-ai-activity-type.py#.gitignore1 Daily - Platform • nowA• 2 1 203 viewed|• Awallind aporovall100%8 • Tue 14 Apr 9:50:3145 П0OOOT• Viewed+330 UuuuuVlewedHi LukasWhere should westart?Create image* Create musicBoost my dayHelp me learnWrite anythingO Enter a prompt for Gemini+New]summarise pagePro v+ #!/usr/bin/env python3+ 1mport asynclo+ import json+ import os+ from pathlib import Path+ from typing import Any, Dict, List+ import aiohttp+ from src.component.call_processing.schemes.ai activity type_request importALACELVLEV voekequest11+ from src.component.call_processing.schemes.ai_activity_type_response importA1ACCIVICYIypekesponse+ from src.eval.abstract_evaluator import AbstractEvaluator+ from src.eval.collectors import collect test set as es collect test set+ from src.eval.common_evaluator_runner import create_parser, run_evaluation+ from src.eval.endpoints import run endpoint as run endpoint http+ trom src,eval,models 1moort Evaluatzonkesult23+ class AiActivityTypeEvaluator(AbstractEvaluator):ENDPOINT_PATH = "call/ai-activity-type"REQUEST_MODEL = AiActivityTypeRequestRESrUNst MoDEL = ALACCIviLyIvpekesponseOUTPUT_SUBFOLDER = "ai_activity_type"USES_LLM_EVALUATION = FalseREADS_FROM_DATASET_FOLDER = TrueDATA_SET: str = "initial"MINIMUM_REQUEST_BODY_SIZE: int = 0SKIP_UNREPLAYABLE_ON_COLLECT: bool = TrueREPLAYABILITY_CHECK_TIMEOUT_SECONDS: float = 20.0rurULAlL onuuNu Ikuin LlM. Dool = IalseGROUND TRUTH LIM ENDPOINT: str = "OPENAL:apt=5 2"LUMPAKE TO GKUUND TKUin: Str=esdeт 1nic selt, **kwaros. Any -> None:super()._init_(**kwargs)_repo_root = Path(_file_).parents (2]_datasets_root =_repo_root / "eval_output" / self.OUTPUT_SUBFOLDER /"eval datasets"_dataset_dir = _datasets_root / self._class_.DATA_SETself._dataset_dir =_dataset_dirselt._inputs_air = _dataset_dar / "input"self. around truths dir = dataset dir f "around +ruth"...
|
NULL
|
NULL
|
NULL
|
8536
|
|
8535
|
DiaFilelEditViewTabsBookmarksHistoryExtensionsWind DiaFilelEditViewTabsBookmarksHistoryExtensionsWindowHelp$0.| Daily - Platform • nowmeet.google.com / Daily - PlatformNikolay Yankov (Presenting)Inbox (2) - niE:1Q 8•Tue 14 Apr 9:50C 3y 19798 eval(7 Service-Deskjr Foet Ameson• Cost | Claude x©Actions secrC C DependatActions sec• Dependabot: xHa Dependatinny/prophet/actions/runs/24195639151/ob/70624720594?pr=468B AWSE3 SSHс *El Datados * Clude5 CircieclD Sentry8 TaskslInsights & Coachin.• Dev• Ux.0 Al BookmarksRun Flake8 & Mypy•, Search logs ISummary|Run Flake8 & MypyRun details© UsageWorkfiow fileStefka StoyanovalHDSet up Python> O Install Poetry|> • Load cached Poetry dependencies |> © Install dependencies using Poetry|v @ Run Flake81 •Run poetry run flakeß, -count -select-E9,F63,F7,F82,N293,N291,W292 --show-source -statistics./src/eval/eval-al-activity-type.py:330:63: N292 no newline at end of fileasyncio.runtrun_cvaluatzon(AlactavityTypetvaluator, args)Error: Process conpleted with exit code 1.• Run Mypy|• Post Load cached Poetry dependencies |• Post Set up Python |› © Post Checkout code> O Complete job• 2026 0i4u6, Inc.HortroPrivacy Security Status Community Docsvonest menesn25s7sSteliyan GeorgievKovalik9:50 AM | Daily - Platform5:088 • Tue 14 Apr 9:50:32Chat6Nikolay YankovNikolay Ivanov...
|
NULL
|
NULL
|
NULL
|
8535
|
|
8534
|
FirefoxFileEoitViewHistoryBookmarksProfilesToolsWi FirefoxFileEoitViewHistoryBookmarksProfilesToolsWindowHelpgithub.com/fiminny/prophet/pull/468/changes#diff-d2f9fea9bc428b119ae26a66453e8eab/dd0782d4a4d93075ca5ce05903d77e8Google GeminiJy 19798 evaluation for ai activity tvpes #468Platform Sprint 1 Q2 - Platform Tea( Jy 19798 evaluation for ai activ+ New lab8? OperAll commits-nikolaybiaivanov wants to merge 9 commits into master from JY-19798-evaluation-for-ai-activity-types Lsrc/eval/endpoints.pyE• Filter files...v = eval_ output/ai_activity_type/eval...v src/eval/eval-ai-activity-type.py L> • ground_truth..•aa-0.0 +1.330 ad>I inoutv F src/eval#endpoints.py+ eval-ai-activity-type.pyE .gitignoreDaily - Platform • now100%• Tue 14 Apr 9:50:29• 2 | 203 viewed|Awallind aporovali45 nannOI• Viewed+330 UUUUuVlewed+ #!/usr/bin/env python3+ 1mport asynclo+ import json+ import os+ trom pathlib import Path+ from typing import Any, Dict, List+ import aiohttp+ from src.component.call_processing.schemes.ai activity type_request importALACELVLEV voekequest11+ from src.component.call_processing.schemes.ai activity type_response importA1ACCIVICYIypekesponse+ trom src.eval.abstract_evaluator import Abstractevaluator+ from src.eval.collectors import collect test set as es collect test set+ trom src.eval.common_evaluator_runner import create_parser, run_evaluation+ from src.eval.endpoints import run endpoint as run endpoint http+ trom src,eval,models 1moort Evaluatzonkesult23Ask GeminiNewsummarize page+ class AiActivityTypeEvaluator(AbstractEvaluator):ENDPOINT PATH = "call/ai-activity-type"REQUEST_MODEL = AiActivityTypeRequestRESrUNst MoDEL = ALACCIviLyIvpekesponseouTPUT_sUBFOLDER = "a1_activity_type"USES LLM EVALUATION = FalseREADS_FROM_DATASET_ FOLDER = TrueDATA SET: str = "initial"MINIMUM_REQUEST BODY_SIZE: int = 0SKLP_UNREPLAYABLE_ON_COLLECT: bool = TrueREPLAYABILITY CHECK TIMEOUT SECONDS: float = 20.0rurULAlL onuuNu Ikuin LlM. Dool = IalseGROUND TRUTH LIM ENDPOINT: str = "OPENAL:apt=5 2"LUMPAKE TU GKUUND IKuin: Str=esdeт 1nic selt, **kwaros. Any -> None:super().__init__(**kwargs)_repo_root = Path(_file_).parents (2]_datasets_root = _repo_root / "eval_output" / selt.OUTPUT_SUBFOLDER /"eval datasets"_dataset_dir = _datasets_root / selt.__class_.DATA_SElself._dataset_dir =_dataset_dirselt._inputs_air =_dataset_air / "Input"self. around truths dir= datacet dir "around truth'l...
|
NULL
|
NULL
|
NULL
|
8534
|
|
8533
|
Firefox File Edit View History Bookmarks Profiles Firefox File Edit View History Bookmarks Profiles Tools Window Help••+→ cPlatform Sprint 1 Q2 - Platform Tea( Jy 19798 evaluation for ai activ X+ New labi? Open© github.com/jiminny/prophet/pull/468/changes#diff-d2f9fea9bc428b119ae26a66453e8eab7dd0782d4a4d93075ca5ce05903d77e8Jy 19798 evaluation for ai activity types #468All commitsnikolaybiaivanov wants to merge 9 commits into master from JY-19798-evaluation-for-ai-activity-types> src/eval/endpoints.py 0 +Q Filter files...• E eval_output/ai_activity_type/eval….> D ground_truth>B input• E src/evalendpoints.pyD eval-ai-activity-type.py*.gitignorev src/eval/eval-ai-activity-type.py0c -0,0 +1,330 gg24•< 4o ll • | Daily - Platform • nowA100% C• 8 • Tue 14 Apr 9:50:28^ 2 1 203 viewed• Awallind aporovaliSuomitrevewy+5 -n Viewed+330 UUU Viewed+ #!/usr/bin/env python3|+ Import async1o+ import json+Import os+ Trom pathlib import Path+ Trom Lyoino 1mporl Any, UIcl L150+ Import dlonuuo+ from src.component.call_processing.schemes.ai activity_ type_request import AiActivityTypeRequest+ Trom src.component.call_processing.schemes.al_activity-type_response import Alactivitylyperesponse+ from src.eval.abstract_evaluator import AbstractEvaluator+ Trom src.eval.couleccors imporr collecc tesc ser as es collect test seu+ from src.eval.common_evaluator_runner import create_parser, run_evaluation+ from src.eval.endpoints import run_endpoint as run_endpoint_http+ trom src.eval.models import EvaluationResult+ class AiActivityTypeEvaluator(AbstractEvaluator):ENDPOINT_PATH = "call/ai-activity-type"REQUEST_MODEL = AiActivityTypeRequestRESPONSE_MODEL = AiActivityTypeResponseOUTPUT_SUBFOLDER = "ai_activity_type"USES_LLM_EVALUATION = FalseREADS_FROM_DATASET_FOLDER = TrueDATA_SET: str = "initial"MINIMUM_REQUEST_BODY_SIZE: int = 0SKIP_UNREPLAYABLE_ON_COLLECT: bool = TrueREPLAYABILITY_CHECK_TIMEOUT_SECONDS: float = 20.0POPULATE_GROUND_TRUTH_LLM: bool = FalseGROUND_TRUTH_LLM_ENDPOINT: str = "OPENAI:gpt-5_2"COMPARE_TO_GROUND_TRUTH: str = "es"def _init_(self, **kwargs: Any) -> None:super()._init_(**kwargs)_repo_root = Path(_file_).parents[2]_datasets_root = _repo_root / "eval_output" / self.OUTPUT_SUBFOLDER / "eval_datasets"_dataset_dir = _datasets_root / self._class_.DATA_SETself._dataset_dir =_dataset_dirself._inputs_dir = _dataset_dir / "input"self._ground_truths_dir = _dataset_dir / "ground_truth"async der coltect_test_seu(seut) → Listibicuistr, AnyJu:if self.es host name:...
|
NULL
|
NULL
|
NULL
|
8533
|
|
8532
|
JY-20543 add AJ reports User pilot tracking by Lak JY-20543 add AJ reports User pilot tracking by LakyLak · Pull Request #11932 · jiminny/app
Platform Sprint 1 Q2 - Platform Team - Scrum Board - Jira
Platform Sprint 1 Q2 - Platform Team - Scrum Board - Jira
Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet
Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet
Close tab
New Tab
Customize sidebar
Open Google Gemini (⌃X)
Tabs from other devices
Open history (⇧⌘H)
Open bookmarks (⌘B)
Skip to content
Skip to content
Open menu
Homepage (g then d)
jiminny
jiminny
prophet
prophet
Search or jump to…
Type
/
to search
Chat with Copilot
Open Copilot…
Create new...
Issues
Pull requests
Repositories
You have unread notifications(g then n)
Open user navigation menu
Repository navigation
Repository navigation
Code
Code
Issues
Issues
Pull requests (6)
Pull requests
(
6
)
Agents
Agents
Actions
Actions
Projects
Projects
Wiki
Wiki
Security and quality (20)
Security and quality
(
20
)
Insights
Insights
Settings
Settings
Important update
Important update
On April 24 we'll start using GitHub Copilot interaction data for AI model training unless you opt out.
Review this update
Review this update
and manage your preferences in your
GitHub account settings
GitHub account settings
.
Dismiss banner
Jy 19798 evaluation for ai activity types #468 Edit title
Jy 19798 evaluation for ai activity types
#
468
Edit title
Preview
Preview
Awaiting approval
Awaiting approval
Code
Code
Open...
|
Firefox
|
Jy 19798 evaluation for ai activity types by nikol Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet — Work...
|
github.com/jiminny/prophet/pull/468/changes#diff-d github.com/jiminny/prophet/pull/468/changes#diff-d2f9fea9bc428b119ae26a66453e8eab7dd0782d4a4d93075ca5ce05903d77e8...
|
8532
|
|
8531
|
Firefox File EditView History Bookmarks ProfilesT Firefox File EditView History Bookmarks ProfilesTools Window Help•..•+CPlatform Sprint 1 Q2 - Platform Tea@ Jy 19798 evaluation for ai activ *+ New lab© github.com/jiminny/prophet/pull/468/changes#diff-d2f9fea9bc428b119ae26a66453e8eab7dd0782d4a4d93075ca5ce05903d77e8i? OpenJyvoeValuat on Tor ai activity tvoes ·400All commitsnikolaybiaivanov wants to merge 9 commits into master from JY-19798-evaluation-for-ai-activity-types [Q Filter files...› src/eval/endpoints.py 0 *• E eval_output/ai_activity_type/eval….> • ground_truth> E input.• E src/evalendpoints.pyD eval-ai-activity-type.py*.gitignorev src/eval/eval-ai-activity-type.py*• @a -0,0 +1,330 @dlobl1 Daily - Platform • nowA100% C•8 • Tue 14 Apr 9:50:22^ 2 1 203 viewed• Awallind aporovaliSuomitrevewto+330 mViewed• Viewed+*!usrdin, env prtnons+ import asyncio+ 1mporl son4 + I import os+ from pathlib import Path+ Trom typing import Any, bict, List+ Import alonuto.+ from src.component.call_processing.schemes.ai activity_type_request import AlActivitylyperequest+ from src.component.call_processing.schemes.ai_activity_type_response import AiActivityTypeResponse+ from src.eval.abstract evaluator import AbstractEvaluator+ from src.eval.collectors import collect_test_set as es_collect_test_set+ from src.eval.common_evaluator_runner import create_ parser, run_evaluation+ from src.eval.endpoints import run_endpoint as run_endpoint_http+ from src.eval.models import EvaluationResult203+ class AiActivityTypeEvaluator(AbstractEvaluator):ENDPOINT_PATH = "call/ai-activity-type"REQUEST_MODEL = AiActivityTypeRequestRESPONSE_MODEL = AiActivityTypeResponseOUTPUT_SUBFOLDER = "ai_activity_type"USES_LLM_EVALUATION = FalseREADS_FROM_DATASET_FOLDER = TrueDATA_SET: str = "initial"MINIMUM_REQUEST_BODY_SIZE: int = 0SKIP_UNREPLAYABLE_ON_COLLECT: bool = TrueREPLAYABILITY CHECK TIMEOUT SECONDS: float = 20.0POPULATE GROUND TRUTH LLM: bool = FalseGROUND_TRUTH_LLM_ENDPOINT: str = "'OPENAI: gpt-5_2"COMPARE_TO_GROUND_TRUTH: str = "es"def _init__(self, **kwargs: Any) -> None:super()._init_(**kwargs)_repo_root = Path(_file_).parents(2]_datasets_root = _repo_root / "eval_output" / self.OUTPUT_SUBFOLDER / "eval_datasets"_dataset_dir = _datasets_root / self._class_.DATA_SETself._dataset_dir = _dataset_dirself._inputs_dir = _dataset_dir / "input"self._ground_truths_dir = _dataset_dir / "ground_truth"async def collect_test_set(self) -> List[Dict[str, Anyll:...
|
NULL
|
NULL
|
NULL
|
8531
|
|
8530
|
JY-20543 add AJ reports User pilot tracking by Lak JY-20543 add AJ reports User pilot tracking by LakyLak · Pull Request #11932 · jiminny/app
Platform Sprint 1 Q2 - Platform Team - Scrum Board - Jira
Platform Sprint 1 Q2 - Platform Team - Scrum Board - Jira
Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet
Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet
Close tab
New Tab
Customize sidebar
Open Google Gemini (⌃X)
Tabs from other devices
Open history (⇧⌘H)
Open bookmarks (⌘B)
Skip to content
Skip to content
Open menu
Homepage (g then d)
jiminny
jiminny
prophet
prophet
Search or jump to…
Type
/
to search
Chat with Copilot
Open Copilot…
Create new...
Issues
Pull requests
Repositories
You have unread notifications(g then n)
Open user navigation menu
Repository navigation
Repository navigation
Code
Code
Issues
Issues
Pull requests (6)
Pull requests
(
6
)
Agents
Agents
Actions
Actions
Projects
Projects
Wiki
Wiki
Security and quality (20)
Security and quality
(
20
)
Insights
Insights
Settings
Settings
Important update
Important update
On April 24 we'll start using GitHub Copilot interaction data for AI model training unless you opt out.
Review this update
Review this update
and manage your preferences in your
GitHub account settings
GitHub account settings
.
Dismiss banner
Jy 19798 evaluation for ai activity types #468 Edit title
Jy 19798 evaluation for ai activity types
#
468
Edit title
Preview
Preview
Awaiting approval
Awaiting approval
Code
Code
Open
nikolaybiaivanov
nikolaybiaivanov
wants to merge 9 commits into
master
master
from
JY-19798-evaluation-for-ai-activity-types
JY-19798-evaluation-for-ai-activity-types
Copy head branch name to clipboard
Lines changed: 25671 additions & 0 deletions
Conversation (3)
Conversation
(
3
)
Commits (9)
Commits
(
9
)
Checks (3)
Checks
(
3
)
Files changed (203)
Files changed
(
203
)
Pull Request Toolbar
Pull Request Toolbar
Collapse file tree
Open
Jy 19798 evaluation for ai activity types
Jy 19798 evaluation for ai activity types
#
468
All commits
All commits
nikolaybiaivanov
nikolaybiaivanov
wants to merge 9 commits into
master
master
from
JY-19798-evaluation-for-ai-activity-types
JY-19798-evaluation-for-ai-activity-types
Copy head branch name to clipboard
2
/
203
viewed
Awaiting approval
Awaiting approval
Submit review
Submit
review
Open diff view settings
Open overview panel
Open comments panel
(
0
)
Filter files…
Filter options
File tree
File tree
eval_output/ai_activity_type/eval_datasets/v2-from-logs-replayable
ground_truth
input
src/eval
endpoints.py
endpoints.py
eval-ai-activity-type.py
eval-ai-activity-type.py...
|
Firefox
|
Jy 19798 evaluation for ai activity types by nikol Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet — Work...
|
github.com/jiminny/prophet/pull/468/changes#diff-d github.com/jiminny/prophet/pull/468/changes#diff-d2f9fea9bc428b119ae26a66453e8eab7dd0782d4a4d93075ca5ce05903d77e8...
|
8530
|
|
8529
|
JY-20543 add AJ reports User pilot tracking by Lak JY-20543 add AJ reports User pilot tracking by LakyLak · Pull Request #11932 · jiminny/app
Platform Sprint 1 Q2 - Platform Team - Scrum Board - Jira
Platform Sprint 1 Q2 - Platform Team - Scrum Board - Jira
Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet
Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet
Close tab
New Tab
Customize sidebar
Open Google Gemini (⌃X)
Tabs from other devices
Open history (⇧⌘H)
Open bookmarks (⌘B)
Skip to content
Skip to content
Open menu
Homepage (g then d)
jiminny
jiminny
prophet
prophet
Search or jump to…
Type
/
to search
Chat with Copilot
Open Copilot…
Create new...
Issues
Pull requests
Repositories
You have unread notifications(g then n)
Open user navigation menu
Repository navigation
Repository navigation
Code
Code
Issues
Issues
Pull requests (6)
Pull requests
(
6
)
Agents
Agents
Actions
Actions
Projects
Projects
Wiki
Wiki
Security and quality (20)
Security and quality
(
20
)
Insights
Insights
Settings
Settings
Important update
Important update
On April 24 we'll start using GitHub Copilot interaction data for AI model training unless you opt out.
Review this update
Review this update
and manage your preferences in your
GitHub account settings
GitHub account settings
.
Dismiss banner
Jy 19798 evaluation for ai activity types #468 Edit title
Jy 19798 evaluation for ai activity types
#
468
Edit title
Preview
Preview
Awaiting approval
Awaiting approval
Code
Code
Open
nikolaybiaivanov
nikolaybiaivanov
wants to merge 9 commits into
master
master
from
JY-19798-evaluation-for-ai-activity-types
JY-19798-evaluation-for-ai-activity-types
Copy head branch name to clipboard
Lines changed: 25671 additions & 0 deletions
Conversation (3)
Conversation
(
3
)
Commits (9)
Commits
(
9
)
Checks (3)
Checks
(
3
)
Files changed (203)
Files changed
(
203
)
Pull Request Toolbar
Pull Request Toolbar
Collapse file tree
Open
Jy 19798 evaluation for ai activity types
Jy 19798 evaluation for ai activity types
#
468
All commits
All commits
nikolaybiaivanov
nikolaybiaivanov
wants to merge 9 commits into
master
master
from
JY-19798-evaluation-for-ai-activity-types
JY-19798-evaluation-for-ai-activity-types
Copy head branch name to clipboard
1
/
203
viewed
Awaiting approval
Awaiting approval
Submit review
Submit
review
Open diff view settings
Open overview panel
Open comments panel
(
0
)
Filter files…
Filter options
File tree
File tree
eval_output/ai_activity_type/eval_datasets/v2-from-logs-replayable
ground_truth
input
src/eval
endpoints.py
endpoints.py
eval-ai-activity-type.py
eval-ai-activity-type.py
.gitignore
.gitignore
This page has been optimized for large pull requests.
This page has been optimized for large pull requests.
This page has been optimized for large pull requests. Some browser features (like Find on Page or Select All) may not work as expected.
Switch to single file mode
Switch to single file mode
Collapse file...
|
Firefox
|
Jy 19798 evaluation for ai activity types by nikol Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet — Work...
|
github.com/jiminny/prophet/pull/468/changes#diff-d github.com/jiminny/prophet/pull/468/changes#diff-d2f9fea9bc428b119ae26a66453e8eab7dd0782d4a4d93075ca5ce05903d77e8...
|
8529
|
|
8528
|
* Firefox FileEoitViewHistory BookmarksProfilesToo * Firefox FileEoitViewHistory BookmarksProfilesTools Window Help© github.com/jiminny/prophet/pull/468/changes#diff-d2f9fea9bc428b119ae26a66453e8eab7dd0782d4a4d93075ca5ce05903d77e811 OpenJyvoeVluar on Tor ai activty tvoes #408All commitsnikolaybiaivanov wants to merge 9 commits into master from JY-19798-evaluation-for-ai-activity-types [Platform Sprint 1 Q2 - Platform TeaQ Filter files...v src/eval/endpoints.py L *@ Jy 19798 evaluation for ai activ *+ New Tab• E eval_output/ai_activity_type/eval….> • ground_truth> L input.• E src/evalE endpoints.pynyD eval-ai-activity-type.pyE gitignore1721731741aa -172,6 +172,7 @@ async def process_entry(idx: int, entry: Dict[str, Any]) - None:known_keys ={"content", "summary", "themes", "action_items","salesRep", "customer",176177is_empty = known_keys.isdisjoint(new_data.keys())@ -212,6 +213,10 @ async def process_entry(idx: int, entry: Dict[str, Anyl) -> None:2122132141"original_prompt": entry.get("original_prompt"),"current prompt": current prompt,"ground_truth": entry.get("ground_truth"),215216217"new_response": new_data,"valid_schema": False,"response_time": response_time,v src/eval/eval-ai-activity-type.py0c -0,0 +1,330 gg• | Daily - Platform • nowA100% CD8 • Tue 14 Apr 9:50:14^ 1 / 203 viewed• Awallind aporovalito uuuuu• Viewed172173174175 +1761771/8213214215216 [PHONE]21222known_keys = {"content", "summary", "themes", "action_items","saleskep", "customer","ai_activity_type", "ai_activity_type_reasoning",is_empty = known_keys.isdisjoint(new_data.keys())"original_prompt": entry.get("original_prompt"),"current_prompt": current_prompt,"ground_truth": entry.get("ground_truth"),"ground_truth_es": entry.get ("ground_truth_es"),"ground_truth_llm": entry.get("ground_truth_llm"),"ground_truth_llm_model": entry.get("ground_truth_llm_model"),"ground_truch_tum_reasoning: entry.gett ground_truch_lum_reasoning),"new_response": new_data,"valid_schema": ralse,"response_time"; response_time,+330 UUU Viewed+ #!/usr/bin/env python3+ import asyncio+ import json+Import os+ from pathlib import Path*Tron Lyoino 1mporl Any, UIcl, L15+ Import dloneuo+ from src.component.call_processing.schemes.ai activity_ type_request import AiActivityTypeRequest+Tron src,conbonent.call orocessino.schenes.al actlviry tvoe resconse 1nbort AlAculVitV viekesconse+ from src.eval.abstract_evaluator import AbstractEvaluator+ from src.eval.collectors import collect test set as es collect test set+ from src.eval.common_evaluator_runner import create_parser, run_evaluation+ from src.eval.endpoints import run endpoint as run endpoint http+ trom src.eval.models import EvaluationResult+ class AiActivityTypeEvaluator(AbstractEvaluator):ENDPUINI_PAIN = "call/al-activity-type"REQUEST_MODEL = AiActivityTypeRequestRESPONSE MODEL = AiActivitvTvpeResponseOUTPUT_SUBFOLDER = "ai_activity_type"USES_LLM_EVALUATION = FalseREADS_FROM_DATASET_FOLDER = TrueDATA_SET: str = "initial"...
|
NULL
|
NULL
|
NULL
|
8528
|
|
8527
|
Firefox FileEoitView History Bookmarks Profiles Firefox FileEoitView History Bookmarks Profiles Tools Window HelpCPlatform Sprint 1 Q2 - Platform Tea( Jy 19798 evaluation for ai activ X+ New labI1 Open©github.com/jiminny/prophet/pull/468/changes#diff-fc8a653dec2cd8e3d632fa5d32e0abd4b22176ce7619cc5752c4efe9e53f083bJy 19798 evaluation for ai activity types #468All commitsnikolaybiaivanov wants to merge 9 commits into master from JY-19798-evaluation-for-ai-activity-types [v src/eval/eval-ai-activity-type.pyQ Filter files...• E eval_output/ai_activity_type/eval….> • ground_truthinput.• E src/evalE endpoints.pyk* eval-ai-activity-type.pyE .gitignore>0 lbl O | Daily - Platform • nowA100% C• 8• Tue 14 Apr 9:50:13• 1/ 203 viewed• Awallino aporovaliVomitrevew+330 UUUUU• Viewed293294295296297299300301302304305306307308309310311312313314315316317318319321322323326328330+ if_name_ = "_main_":parser = create_parser(AiActivityTypeEvaluator)parser.add_argument("_-data-set",type=str,default="initial",dest="data_set",help="Dataset folder under eval output/ai activity_type/eval_datasets/ (derautt: initlal),parser.add_argument("--populate-ground-truth-llm",action="store_true",dest="populate_ground_truth_llm",parser.add_argument("--ground-truth-llm-endpoint",type=str,default="OPENAI : gpt-5_2",dest="ground_truth_llm_endpoint",parser.add_argument("--compare-to-ground-truth",type=str,choices=["es", "1lm"],default="es",dest="compare_to_ground_truth",args = parser.parse_args()AiActivityTypeEvaluator.DATA_SET = args.data_setAiActivityTypeEvaluator.POPULATE GROUND_TRUTH_LLM = bool(args.populate ground_truth_llmAiActivitvTvoeEvaluator.GROUND TRUTH LLM ENDPOINT = args.around truth 1lm endpointAiActivityTypeEvaluator.COMPARE_TO_GROUND_TRUTH = args.compare_to_ground_truthasyncio.run(run evaluation(AiActivityTypeEvaluator, arqs))> •gitignore @*+4 -uoov ViewedYou made it to the end....
|
NULL
|
NULL
|
NULL
|
8527
|
|
8526
|
DiaFilelEditViewTabsBookmarksHistoryExtensionsWind DiaFilelEditViewTabsBookmarksHistoryExtensionsWindowHelp$0.| Daily - Platform • nowmeet.google.com / Daily - PlatformNikolay Yankov (Presenting)Inbox (2) - niC 3y 19798 eval(7 Service-Deskjr Foet Ameson• Cost | Claude x©Actions secrC C DependatActions sec• Dependabot: xQ 8•Tue 14 Apr 9:50O Far Dependas xinny/prophet/actions/runs/24195639151/ob/70624720594?pr=468# AWS|E3 SSHс *El Datados * Clude5 CircieclD Sentry8 TaskslInsights & Coachin.• Dev• Ux0 Al BookmarksIEB)Summary|Run Flake8 & MypyRun details© UsageWorkfiow fileRun Flake8 & Mypy•, Search logs IStefka StoyanovaSet up Python> • Install Poetry |> • Load cached Poetry dependencies |> © Install dependencies using Poetry|v @ Run Flake81 •Run poetry run flakeß, -count -select-E9,F63,F7,F82,N293,N291,W292 --show-source -statistics./src/eval/eval-al-activity-type.py:330:63: N292 no newline at end of fileasyncio.runtrun_cvaluatzon(AlactavityTypetvaluator, args)Error: Process conpleted with exit code 1.• Run Mypy|• Post Load cached Poetry dependencies |• Post Set up Python|› © Post Checkout code> O Complete job• 2026 0i4u6, Inc.HortroPrivacy Security Status Community Docsvonest menesn25sSteliyan GeorgievKovalik9:50 AM | Daily - Platform4:498 • Tue 14 Apr 9:50:13ChatNikolay YankovNikolay Ivanov...
|
NULL
|
NULL
|
NULL
|
8526
|
|
8525
|
JY-20543 add AJ reports User pilot tracking by Lak JY-20543 add AJ reports User pilot tracking by LakyLak · Pull Request #11932 · jiminny/app
Platform Sprint 1 Q2 - Platform Team - Scrum Board - Jira
Platform Sprint 1 Q2 - Platform Team - Scrum Board - Jira
Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet
Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet
Close tab
New Tab
Customize sidebar
Open Google Gemini (⌃X)
Tabs from other devices
Open history (⇧⌘H)
Open bookmarks (⌘B)
Skip to content
Skip to content
Open menu
Homepage (g then d)
jiminny
jiminny
prophet
prophet
Search or jump to…
Type
/
to search
Chat with Copilot
Open Copilot…
Create new...
Issues
Pull requests
Repositories
You have unread notifications(g then n)
Open user navigation menu
Repository navigation
Repository navigation
Code
Code
Issues
Issues
Pull requests (6)
Pull requests
(
6
)
Agents
Agents
Actions
Actions
Projects
Projects
Wiki
Wiki
Security and quality (20)
Security and quality
(
20
)
Insights
Insights
Settings
Settings
Important update
Important update
On April 24 we'll start using GitHub Copilot interaction data for AI model training unless you opt out.
Review this update
Review this update
and manage your preferences in your
GitHub account settings
GitHub account settings
.
Dismiss banner
Jy 19798 evaluation for ai activity types #468 Edit title
Jy 19798 evaluation for ai activity types
#
468
Edit title
Preview
Preview
Awaiting approval
Awaiting approval
Code
Code
Open
nikolaybiaivanov
nikolaybiaivanov
wants to merge 9 commits into
master
master
from
JY-19798-evaluation-for-ai-activity-types
JY-19798-evaluation-for-ai-activity-types
Copy head branch name to clipboard
Lines changed: 25671 additions & 0 deletions
Conversation (3)
Conversation
(
3
)
Commits (9)
Commits
(
9
)
Checks (3)
Checks
(
3
)
Files changed (203)
Files changed
(
203
)
Pull Request Toolbar
Pull Request Toolbar
Collapse file tree
Open
Jy 19798 evaluation for ai activity types
Jy 19798 evaluation for ai activity types
#
468
All commits
All commits
nikolaybiaivanov
nikolaybiaivanov
wants to merge 9 commits into
master
master
from
JY-19798-evaluation-for-ai-activity-types
JY-19798-evaluation-for-ai-activity-types
Copy head branch name to clipboard
1
/
203
viewed
Awaiting approval
Awaiting approval
Submit review
Submit
review
Open diff view settings
Open overview panel
Open comments panel
(
0
)
Filter files…
Filter options
File tree
File tree
eval_output/ai_activity_type/eval_datasets/v2-from-logs-replayable
ground_truth
input
src/eval
endpoints.py
endpoints.py
eval-ai-activity-type.py
eval-ai-activity-type.py
.gitignore
.gitignore
This page has been optimized for large pull requests.
This page has been optimized for large pull requests.
This page has been optimized for large pull requests. Some browser features (like Find on Page or Select All) may not work as expected.
Switch to single file mode
Switch to single file mode
Collapse file
src/eval/eval-ai-activity-type.py
src/eval/eval-ai-activity-type.py
src/eval/eval-ai-activity-type.py
Copy file name to clipboard
Lines changed: 330 additions & 0 deletions
Not Viewed
Viewed
Comment on this file
More options
Original file line number
Original file line
Diff line number
Diff line change
@@ -0,0 +1,330 @@
1
+
#!/usr/bin/env python3
2
+
import
asyncio
3
+
import
json
4
+
import
os
5
+
from
pathlib
import
Path
6
+
from
typing
import
Any
,
Dict
,
List
7
+
8
+
import
aiohttp
9
+
10
+
from
src
.
component
.
call_processing
.
schemes
.
ai_activity_type_request
import
AiActivityTypeRequest
11
+
from
src
.
component
.
call_processing
.
schemes
.
ai_activity_type_response
import
AiActivityTypeResponse...
|
Firefox
|
Jy 19798 evaluation for ai activity types by nikol Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet — Work...
|
github.com/jiminny/prophet/pull/468/changes#diff-f github.com/jiminny/prophet/pull/468/changes#diff-fc8a653dec2cd8e3d632fa5d32e0abd4b22176ce7619cc5752c4efe9e53f083b...
|
8525
|
|
8524
|
JY-20543 add AJ reports User pilot tracking by Lak JY-20543 add AJ reports User pilot tracking by LakyLak · Pull Request #11932 · jiminny/app
Platform Sprint 1 Q2 - Platform Team - Scrum Board - Jira
Platform Sprint 1 Q2 - Platform Team - Scrum Board - Jira
Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet
Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet
Close tab
New Tab
Customize sidebar
Open Google Gemini (⌃X)
Tabs from other devices
Open history (⇧⌘H)
Open bookmarks (⌘B)
Skip to content
Skip to content
Open menu
Homepage (g then d)
jiminny
jiminny
prophet
prophet
Search or jump to…
Type
/
to search
Chat with Copilot
Open Copilot…
Create new...
Issues
Pull requests
Repositories
You have unread notifications(g then n)
Open user navigation menu
Repository navigation
Repository navigation
Code
Code
Issues
Issues
Pull requests (6)
Pull requests
(
6
)
Agents
Agents
Actions
Actions
Projects
Projects
Wiki
Wiki
Security and quality (20)
Security and quality
(
20
)
Insights
Insights
Settings
Settings
Important update
Important update
On April 24 we'll start using GitHub Copilot interaction data for AI model training unless you opt out.
Review this update
Review this update
and manage your preferences in your
GitHub account settings
GitHub account settings
.
Dismiss banner
Jy 19798 evaluation for ai activity types #468 Edit title
Jy 19798 evaluation for ai activity types
#
468
Edit title
Preview
Preview
Awaiting approval
Awaiting approval
Code
Code
Open
nikolaybiaivanov
nikolaybiaivanov
wants to merge 9 commits into
master
master
from
JY-19798-evaluation-for-ai-activity-types
JY-19798-evaluation-for-ai-activity-types
Copy head branch name to clipboard
Lines changed: 25671 additions & 0 deletions
Conversation (3)
Conversation
(
3
)
Commits (9)
Commits
(
9
)
Checks (3)
Checks
(
3
)
Files changed (203)
Files changed
(
203
)
Pull Request Toolbar
Pull Request Toolbar
Collapse file tree
Open
Jy 19798 evaluation for ai activity types
Jy 19798 evaluation for ai activity types
#
468
All commits
All commits
nikolaybiaivanov
nikolaybiaivanov
wants to merge 9 commits into
master
master
from
JY-19798-evaluation-for-ai-activity-types
JY-19798-evaluation-for-ai-activity-types
Copy head branch name to clipboard
1
/
203
viewed
Awaiting approval
Awaiting approval
Submit review
Submit
review
Open diff view settings
Open overview panel
Open comments panel
(
0
)
Filter files…
Filter options
File tree
File tree
eval_output/ai_activity_type/eval_datasets/v2-from-logs-replayable
ground_truth
input
src/eval
endpoints.py
endpoints.py
eval-ai-activity-type.py
eval-ai-activity-type.py
.gitignore
.gitignore
This page has been optimized for large pull requests.
This page has been optimized for large pull requests.
This page has been optimized for large pull requests. Some browser features (like Find on Page or Select All) may not work as expected.
Switch to single file mode
Switch to single file mode
Collapse file
src/eval/eval-ai-activity-type.py
src/eval/eval-ai-activity-type.py
src/eval/eval-ai-activity-type.py
Copy file name to clipboard
Lines changed: 330 additions & 0 deletions
Not Viewed
Viewed
Comment on this file
More options
Original file line number
Original file line
Diff line number
Diff line change
@@ -0,0 +1,330 @@
1
+
#!/usr/bin/env python3
2
+
import
asyncio
3
+
import
json
4
+
import
os
5
+
from
pathlib
import
Path
6
+
from
typing
import
Any
,
Dict
,
List
7
+
8
+
import
aiohttp
9
+
10
+
from
src
.
component
.
call_processing
.
schemes
.
ai_activity_type_request
import
AiActivityTypeRequest
11
+
from
src
.
component
.
call_processing
.
schemes
.
ai_activity_type_response
import...
|
Firefox
|
Jy 19798 evaluation for ai activity types by nikol Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet — Work...
|
github.com/jiminny/prophet/pull/468/changes#diff-f github.com/jiminny/prophet/pull/468/changes#diff-fc8a653dec2cd8e3d632fa5d32e0abd4b22176ce7619cc5752c4efe9e53f083b...
|
8524
|
|
8523
|
* Firefox File EditView History Bookmarks Profiles * Firefox File EditView History Bookmarks Profiles Tools Window HelpCPlatform Sprint 1 Q2 - Platform Tea( Jy 19798 evaluation for ai activ X+ New labI1 Open©github.com/jiminny/prophet/pull/468/changes#diff-fc8a653dec2cd8e3d632fa5d32e0abd4b22176ce7619cc5752c4efe9e53f083bJy 19798 evaluation for ai activity types #468All commitsnikolaybiaivanov wants to merge 9 commits into master from JY-19798-evaluation-for-ai-activity-types [v src/eval/eval-ai-activity-type.pyQ Filter files...• E eval_output/ai_activity_type/eval.…> D ground_truth> D input~ & src/evalendpoints.py* eval-ai-activity-type.pyE gitignore>0 lbl O | Daily - Platform • nowA100% C• 8• Tue 14 Apr 9:50:04• 1/ 203 viewed• Awallind aporovaliVomitrevew+330 UUUUU|• Viewedencboint vauneselT.eNDrUINi PAln.Ilimit=self.limit,language_filter=self.language_filter,min_prompt_length=self.min_prompt_length,is_json_response=self.IS_JSON_RESPONSE,exclude_languages=self.exclude_languages,[PASSWORD_DOTS] *+*++ +self._inputs_dir.mkdir(parents=True, exist_ok=True)self._ground_truths_dir.mkdir(parents=True, exist_ok=True)min size = self.MINIMUM REOUEST BODY SIZEsamples: List[Dict[str, Anyll = []skipped_no_response = 0skipped_too_small = 0skipped_unreplayable = 0skipped_probe_errors = 0endpoint_url = f"{self.api_host}/{self.ENDPOINT_PATH}"umeour = alontto.culentneour toraselt.KCPLAYABILIIY CHECK TIMEOUI SECUNDS)async with aiohttp.ClientSession(timeout=timeout) as session:for samole in raw samples:es_id = sample["id"]request boay = samplerequest bodyoriginal_response = sample.get("original_response")*if not original_response:skipped_no_response += 1continue1t man suze anc lencison,cunos.recvest oocv < min size.skipped_too_small += 11T SeLTaSALr UNKCrLATADLE UN LULLCCI:try:dsyne wiln sesszon.post enapone url,son-recuest boay as reso.1t resp.status = 200:body_text = await resp.text()iT"Lannor tino aculvlt in ooov text.skipped_unreplayable += 1except Exception:skipped probe errors += 1wiun ooen seut. 1nouls alr Ties 10r.50h, W dsTejson.dump(request_body, t, indent=2)with open(selt._ground_truths_dir / T"gt_tes_1dy•Json", "W") as T:json.dump(original_response, f, indent=2)samples.append(...
|
NULL
|
NULL
|
NULL
|
8523
|
|
8522
|
* Firefox File EditView History Bookmarks Profiles * Firefox File EditView History Bookmarks Profiles Tools Window HelpCPlatform Sprint 1 Q2 - Platform Tea( Jy 19798 evaluation for ai activ X+ New labI1 Open©github.com/jiminny/prophet/pull/468/changes#diff-fc8a653dec2cd8e3d632fa5d32e0abd4b22176ce7619cc5752c4efe9e53f083bJy 19798 evaluation for ai activity types #468All commitsnikolaybiaivanov wants to merge 9 commits into master from JY-19798-evaluation-for-ai-activity-types [v src/eval/eval-ai-activity-type.pyQ Filter files...• E eval_output/ai_activity_type/eval….> • ground_truth> D input• E src/evalendpoints.py* eval-ai-activity-type.pyE gitignore52 €6:2030 ll © | Daily - Platform • nowA100% C• 8• Tue 14 Apr 9:49:46• 1/ 203 viewed• Awallind aporovaliVlewedENDPOINT PATH = "call/ai-activity-tvoe"REQUEST_MODEL = AiActivityTypeRequestRESPONSE_MODEL = AiActivityTypeResponseOUTPUT_SUBFOLDER = "ai_activity_type"USES_LLM_EVALUATION = FalseREADS_FROM_DATASET_FOLDER = TrueDATA_SET: str = "initial"MINIMUM_REQUEST_BODY_SIZE: int = 0SKIP_UNREPLAYABLE_ON_COLLECT: bool = TrueREPLAYABILITY_CHECK_TIMEOUT_SECONDS: float = 20.0POPULATE_GROUND_TRUTH_LLM: bool = FalseGROUND_TRUTH_LLM_ENDPOINT: str = "OPENAI:gpt-5_2"COrPARE TO GROUND TRUTH: str = "es"def _init__(self, **kwargs: Any) -> None:super()._init_(**kwargs)_repo_root = Path(_file_).parents[2]_datasets_root = _repo_root / "eval_output" / self.OUTPUT_SUBFOLDER / "eval_datasets"_dataset_dir = _datasets_root / self._class_.DATA_SETself._dataset_dir =_dataset_dirself._inputs_dir =_dataset_dir / "input"self._ground_truths_dir =_dataset_dir / "ground_truth"async def collect_test_set(self) →> List[Dict[str, Any]l:if self.es_host_name:return awalt selt._collect_trom_est)return await self._collect_from_files() *async def _collect_from_es(self) → List[Dict[str, Anyll:if self. inputs dir.exists) or self. ground truths dir.exists):raise FileExistsError(f"Dataset folder already exists: {self._inputs_dir.parent).""Choose a different --data-set name or remove the folder manually."raw_samples = await es_collect_test_set(es_nost_name=self.es_host name.es_port=self.es_port,endpoint path=self.ENDPOINT PATH,Limit=self.limit,language_filter=self.language_filter,min_prompt_length=self.min_prompt_length,is_json_response=self.IS_JSON_RESPONSE,exclude_languages=self.exclude_languages,self._inputs_dir.mkdir(parents=True, exist_ok=True)...
|
NULL
|
NULL
|
NULL
|
8522
|
|
8521
|
JY-20543 add AJ reports User pilot tracking by Lak JY-20543 add AJ reports User pilot tracking by LakyLak · Pull Request #11932 · jiminny/app
Platform Sprint 1 Q2 - Platform Team - Scrum Board - Jira
Platform Sprint 1 Q2 - Platform Team - Scrum Board - Jira
Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet
Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet
Close tab
New Tab
Customize sidebar
Open Google Gemini (⌃X)
Tabs from other devices
Open history (⇧⌘H)
Open bookmarks (⌘B)
Skip to content
Skip to content
Open menu
Homepage (g then d)
jiminny
jiminny
prophet
prophet
Search or jump to…
Type
/
to search
Chat with Copilot
Open Copilot…
Create new...
Issues
Pull requests
Repositories
You have unread notifications(g then n)
Open user navigation menu
Repository navigation
Repository navigation
Code
Code
Issues
Issues
Pull requests (6)
Pull requests
(
6
)
Agents
Agents
Actions
Actions
Projects
Projects
Wiki
Wiki
Security and quality (20)
Security and quality
(
20
)
Insights
Insights
Settings
Settings
Important update
Important update
On April 24 we'll start using GitHub Copilot interaction data for AI model training unless you opt out.
Review this update
Review this update
and manage your preferences in your
GitHub account settings
GitHub account settings
.
Dismiss banner
Jy 19798 evaluation for ai activity types #468 Edit title
Jy 19798 evaluation for ai activity types
#
468
Edit title
Preview
Preview
Awaiting approval
Awaiting approval
Code
Code
Open
nikolaybiaivanov
nikolaybiaivanov
wants to merge 9 commits into
master
master
from
JY-19798-evaluation-for-ai-activity-types
JY-19798-evaluation-for-ai-activity-types
Copy head branch name to clipboard
Lines changed: 25671 additions & 0 deletions
Conversation (3)
Conversation
(
3
)
Commits (9)
Commits
(
9
)
Checks (3)
Checks
(
3
)
Files changed (203)
Files changed
(
203
)
Pull Request Toolbar
Pull Request Toolbar
Collapse file tree
Open
Jy 19798 evaluation for ai activity types
Jy 19798 evaluation for ai activity types
#
468
All commits
All commits
nikolaybiaivanov
nikolaybiaivanov
wants to merge 9 commits into
master
master
from
JY-19798-evaluation-for-ai-activity-types
JY-19798-evaluation-for-ai-activity-types
Copy head branch name to clipboard
1
/
203
viewed
Awaiting approval
Awaiting approval
Submit review
Submit
review
Open diff view settings
Open overview panel
Open comments panel
(
0
)
Filter files…
Filter options
File tree
File tree
eval_output/ai_activity_type/eval_datasets/v2-from-logs-replayable
ground_truth
input
src/eval
endpoints.py
endpoints.py
eval-ai-activity-type.py
eval-ai-activity-type.py
.gitignore
.gitignore
This page has been optimized for large pull requests.
This page has been optimized for large pull requests.
This page has been optimized for large pull requests. Some browser features (like Find on Page or Select All) may not work as expected.
Switch to single file mode
Switch to single file mode
Collapse file
src/eval/eval-ai-activity-type.py
src/eval/eval-ai-activity-type.py
src/eval/eval-ai-activity-type.py
Copy file name to clipboard
Lines changed: 330 additions & 0 deletions
Not Viewed
Viewed
Comment on this file
More options
Original file line number
Original file line
Diff line number
Diff line change
@@ -0,0 +1,330 @@
1
+
#!/usr/bin/env python3
2
+
import
asyncio
3
+
import
json
4
+
import
os
5
+
from
pathlib
import
Path
6
+
from
typing
import
Any
,
Dict
,
List
7
+
8
+
import
aiohttp
9
+
10
+
from
src
.
component
.
call_processing
.
schemes
.
ai_activity_type_request
import
AiActivityTypeRequest
11
+
from
src
.
component
.
call_processing
.
schemes
.
ai_activity_type_response
import
AiActivityTypeResponse
12...
|
Firefox
|
Jy 19798 evaluation for ai activity types by nikol Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet — Work...
|
github.com/jiminny/prophet/pull/468/changes#diff-f github.com/jiminny/prophet/pull/468/changes#diff-fc8a653dec2cd8e3d632fa5d32e0abd4b22176ce7619cc5752c4efe9e53f083b...
|
8521
|
|
8520
|
DiaFilelEditViewTabsBookmarksHistoryExtensionsWind DiaFilelEditViewTabsBookmarksHistoryExtensionsWindowHelp$0.| Daily - Platform • nowmeet.google.com / Daily - PlatformNikolay Yankov (Presenting)Inbox (2) - niE:1Q 8•C 3y 19798 eval(7 Service-Deskjr Foet Ameson• Cost | Claude x©Actions secrC C DependatActions sec• Dependabot: xHa Dependatinny/prophet/actions/runs/24195639151/ob/70624720594?pr=468B AWSE3 SSHс *El Datadog % Claude5 CircieclD Sentry8 TaskslInsights & Coachin.• Dev• UxRun Flake8 & Mypy•, Search logs I0 Al BookmarksSummary|Run Flake8 & MypyRun details© UsageWorkfiow fileStefka StoyanovaHDSet up Python> • Install Poetry |> • Load cached Poetry dependencies |> © Install dependencies using Poetry|v @ Run Flake81 •Run poetry run flakeß, -count -select-E9,F63,F7,F82,N293,N291,W292 --show-source -statistics./src/eval/eval-al-activity-type.py:330:63: N292 no newline at end of fileasyncio.runtrun_cvaluatzon(AlactavityTypetvaluator, args)Error: Process conpleted with exit code 1.• Run Mypy|• Post Load cached Poetry dependencies |• Post Set up Python |› © Post Checkout code> O Complete job• 2026 0i4u6, Inc.HortroPrivacy Security Status Community Docsvonest menesn12$1s25s7sSteliyan Georgievovalik9:49 AM Daily - Platform4:158 • Tue 14 Apr 9:49:40Chat6Nikolay YankovNikolay Ivanov...
|
NULL
|
NULL
|
NULL
|
8520
|
|
8519
|
JY-20543 add AJ reports User pilot tracking by Lak JY-20543 add AJ reports User pilot tracking by LakyLak · Pull Request #11932 · jiminny/app
Platform Sprint 1 Q2 - Platform Team - Scrum Board - Jira
Platform Sprint 1 Q2 - Platform Team - Scrum Board - Jira
Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet
Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet
Close tab
New Tab
Customize sidebar
Open Google Gemini (⌃X)
Tabs from other devices
Open history (⇧⌘H)
Open bookmarks (⌘B)
Skip to content
Skip to content
Open menu
Homepage (g then d)
jiminny
jiminny
prophet
prophet
Search or jump to…
Type
/
to search
Chat with Copilot
Open Copilot…
Create new...
Issues
Pull requests
Repositories
You have unread notifications(g then n)
Open user navigation menu
Repository navigation
Repository navigation
Code
Code
Issues
Issues
Pull requests (6)
Pull requests
(
6
)
Agents
Agents
Actions
Actions
Projects
Projects
Wiki
Wiki
Security and quality (20)
Security and quality
(
20
)
Insights
Insights
Settings
Settings
Important update
Important update
On April 24 we'll start using GitHub Copilot interaction data for AI model training unless you opt out.
Review this update
Review this update
and manage your preferences in your
GitHub account settings
GitHub account settings
.
Dismiss banner
Jy 19798 evaluation for ai activity types #468 Edit title
Jy 19798 evaluation for ai activity types
#
468
Edit title
Preview
Preview
Awaiting approval
Awaiting approval
Code
Code
Open
nikolaybiaivanov
nikolaybiaivanov
wants to merge 9 commits into
master
master
from
JY-19798-evaluation-for-ai-activity-types
JY-19798-evaluation-for-ai-activity-types
Copy head branch name to clipboard
Lines changed: 25671 additions & 0 deletions
Conversation (3)
Conversation
(
3
)
Commits (9)
Commits
(
9
)
Checks (3)
Checks
(
3
)
Files changed (203)
Files changed
(
203
)
Pull Request Toolbar
Pull Request Toolbar
Collapse file tree
Open
Jy 19798 evaluation for ai activity types
Jy 19798 evaluation for ai activity types
#
468
All commits
All commits
nikolaybiaivanov
nikolaybiaivanov
wants to merge 9 commits into
master
master
from
JY-19798-evaluation-for-ai-activity-types
JY-19798-evaluation-for-ai-activity-types
Copy head branch name to clipboard
1
/
203
viewed
Awaiting approval
Awaiting approval
Submit review
Submit
review
Open diff view settings
Open overview panel
Open comments panel
(
0
)
Filter files…
Filter options
File tree
File tree
eval_output/ai_activity_type/eval_datasets/v2-from-logs-replayable
ground_truth
input
src/eval
endpoints.py
endpoints.py
eval-ai-activity-type.py
eval-ai-activity-type.py
.gitignore
.gitignore
This page has been optimized for large pull requests.
This page has been optimized for large pull requests.
This page has been optimized for large pull requests. Some browser features (like Find on Page or Select All) may not work as expected.
Switch to single file mode
Switch to single file mode
Collapse file
src/eval/endpoints.py
src/eval/endpoints.py
src/eval/endpoints.py
Copy file name to clipboard
Expand all lines: src/eval/endpoints.py
Lines changed: 5 additions & 0 deletions
Not Viewed
Viewed
Comment on this file
More options
Original file line number
Original file line
Diff line number
Diff line change
@@ -172,6 +172,7 @@ async def process_entry(idx: int, entry: Dict[str, Any]) -> None:
172
known_keys
=
{
172
known_keys
=
{
173
"content"
,
"summary"
,
"themes"
,
"action_items"
,
173
"content"
,
"summary"
,
"themes"
,
"action_items"
,
174
"salesRep"
,
"customer"
,
174
"salesRep"
,
"customer"
,
175
+
"ai_activity_type"
,
"ai_activity_type_reasoning"
,
175
}
176
}
176
is_empty
=
known_keys
.
isdisjoint
(
new_data
.
keys
())...
|
Firefox
|
Jy 19798 evaluation for ai activity types by nikol Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet — Work...
|
github.com/jiminny/prophet/pull/468/changes#diff-f github.com/jiminny/prophet/pull/468/changes#diff-fc8a653dec2cd8e3d632fa5d32e0abd4b22176ce7619cc5752c4efe9e53f083b...
|
8519
|
|
8518
|
Firefox FileEoitYiewHistory BookmarksProfilesToo Firefox FileEoitYiewHistory BookmarksProfilesTools Window Help→Platform Sprint 1 Q2 - Platform Tea( Jy 19798 evaluation for ai activ X+ New labI1 Open©github.com/jiminny/prophet/pull/468/changes#diff-fc8a653dec2cd8e3d632fa5d32e0abd4b22176ce7619cc5752c4efe9e53f083bJy 19798 evaluation for ai activity types #468All commitsnikolaybiaivanov wants to merge 9 commits into master from JY-19798-evaluation-for-ai-activity-types [[PASSWORD_DOTS]Q Filter files...• E eval_output/ai_activity_type/eval….> • ground_truth> L input.• E src/evalE endpoints.py* eval-ai-activity-type.pyE .gitignorev src/eval/eval-ai-activity-type.py L@a -0,0 +1,330 @→0 • Daily - Platform • nowA100% C8 • Tue 14 Apr 9:49:34• 1/ 203 viewed• Awallind aporovali+330 -0000• Viewed|+ #!/usr/bin/env pythons+ import asyncio+ import Json+ import os+ from pathlib import Path+ from typing import Any, Dict, List+ Import alohttp+ from src.component.call_processing.schemes.ai_activity_type_request import AiActivityTypeRequest+ from src.component.call_processing.schemes.ai activity_ type_response import AiActivityTypeResponse+Tron src,eval,aostract evalvaror imbort Aostracuevaluaton+ from src.eval.collectors import collect_test_set as es_collect_test_set+ from src.eval.common_evaluator_runner import create_parser, run_evaluation+ from src.eval.endpoints import run_endpoint as run_endpoint_http+Trom src.eval.models 1mport evaluacionkesuut22 +28+ class AiActivityTypeEvaluator(AbstractEvaluator):ENDPOINT_PATH = "call/ai-activity-type"REQUEST_MODEL = AiActivityTypeRequestRESPONSE_MODEL = AiActivityTypepfsponseOUTPUT_SUBFOLDER = "ai_activity_type"USES_LLM_EVALUATION = FalseREADS_FROM_DATASET_FOLDER = TrueDATA_SET: str = "initial"MINIMUN REOUES BODY SI4E: 1nt = vSKLPLUNKEPLATABLE_UN_COLLECI: boOl = IrueREPLAYABILITY_CHECK_TIMEOUT_SECONDS: float = 20.0POPULATE_GROUND_TRUTH_LLM: bool = FalseGROUND_TRUTH_LLM_ENDPOINT: str = "OPENAI:gpt-5_2"COMPARE_TO_GROUND_TRUTH: str = "es"def _init_(self, **kwargs: Any) -> None:super().__init_(**kwargs)_repo_root = Path(_file_).parents[2]_datasets_root = _repo_root / "eval_output" / self.OUTPUT_SUBFOLDER / "eval_datasets"_dataset_dir = _datasets_root / self._class_.DATA_SETself. dataset dir = dataset dirself._inputs_dir =_dataset_dir / "input"self._ground_truths_dir =_dataset_dir / "ground_truth"async def collect test set(self) →> List[Dict[str, Anyl]:if self.es host name:203...
|
NULL
|
NULL
|
NULL
|
8518
|
|
8517
|
JY-20543 add AJ reports User pilot tracking by Lak JY-20543 add AJ reports User pilot tracking by LakyLak · Pull Request #11932 · jiminny/app
Platform Sprint 1 Q2 - Platform Team - Scrum Board - Jira
Platform Sprint 1 Q2 - Platform Team - Scrum Board - Jira
Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet
Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet
Close tab
New Tab
Customize sidebar
Open Google Gemini (⌃X)
Tabs from other devices
Open history (⇧⌘H)
Open bookmarks (⌘B)
Skip to content
Skip to content
Open menu
Homepage (g then d)
jiminny
jiminny
prophet
prophet
Search or jump to…
Type
/
to search
Chat with Copilot
Open Copilot…
Create new...
Issues
Pull requests
Repositories
You have unread notifications(g then n)
Open user navigation menu
Repository navigation
Repository navigation
Code
Code
Issues
Issues
Pull requests (6)
Pull requests
(
6
)
Agents
Agents
Actions
Actions
Projects
Projects
Wiki
Wiki
Security and quality (20)
Security and quality
(
20
)
Insights
Insights
Settings
Settings
Important update
Important update
On April 24 we'll start using GitHub Copilot interaction data for AI model training unless you opt out.
Review this update
Review this update
and manage your preferences in your
GitHub account settings
GitHub account settings
.
Dismiss banner
Jy 19798 evaluation for ai activity types #468 Edit title
Jy 19798 evaluation for ai activity types
#
468
Edit title
Preview
Preview
Awaiting approval
Awaiting approval
Code
Code
Open
nikolaybiaivanov
nikolaybiaivanov
wants to merge 9 commits into
master
master
from
JY-19798-evaluation-for-ai-activity-types
JY-19798-evaluation-for-ai-activity-types
Copy head branch name to clipboard
Lines changed: 25671 additions & 0 deletions
Conversation (3)
Conversation
(
3
)
Commits (9)
Commits
(
9
)
Checks (3)
Checks
(
3
)
Files changed (203)
Files changed
(
203
)
Pull Request Toolbar
Pull Request Toolbar
Collapse file tree
Open
Jy 19798 evaluation for ai activity types
Jy 19798 evaluation for ai activity types
#
468
All commits
All commits
nikolaybiaivanov
nikolaybiaivanov
wants to merge 9 commits into
master
master
from
JY-19798-evaluation-for-ai-activity-types
JY-19798-evaluation-for-ai-activity-types
Copy head branch name to clipboard
1
/
203
viewed
Awaiting approval
Awaiting approval
Submit review
Submit
review
Open diff view settings
Open overview panel
Open comments panel
(
0
)
Filter files…
Filter options
File tree
File tree
eval_output/ai_activity_type/eval_datasets/v2-from-logs-replayable
ground_truth
input
src/eval
endpoints.py
endpoints.py
eval-ai-activity-type.py
eval-ai-activity-type.py
.gitignore
.gitignore
This page has been optimized for large pull requests.
This page has been optimized for large pull requests.
This page has been optimized for large pull requests. Some browser features (like Find on Page or Select All) may not work as expected.
Switch to single file mode
Switch to single file mode
Collapse file
src/eval/endpoints.py
src/eval/endpoints.py
src/eval/endpoints.py
Copy file name to clipboard
Expand all lines: src/eval/endpoints.py
Lines changed: 5 additions & 0 deletions
Not Viewed
Viewed
Comment on this file
More options
Original file line number
Original file line
Diff line number
Diff line change
@@ -172,6 +172,7 @@ async def process_entry(idx: int, entry: Dict[str, Any]) -> None:
172
known_keys
=
{
172
known_keys
=
{
173
"content"
,
"summary"
,
"themes"
,
"action_items"
,
173
"content"
,
"summary"
,
"themes"...
|
Firefox
|
Jy 19798 evaluation for ai activity types by nikol Jy 19798 evaluation for ai activity types by nikolaybiaivanov · Pull Request #468 · jiminny/prophet — Work...
|
github.com/jiminny/prophet/pull/468/changes#diff-f github.com/jiminny/prophet/pull/468/changes#diff-fc8a653dec2cd8e3d632fa5d32e0abd4b22176ce7619cc5752c4efe9e53f083b...
|
8517
|
|
8516
|
Firefox File Edit View History Bookmarks Profiles Firefox File Edit View History Bookmarks Profiles Tools Window Help•••+→ cPlatform Sprint 1 Q2 - Platform Tea( Jy 19798 evaluation for ai activ XNew 1aoi? Open©github.com/jiminny/prophet/pull/468/changes#diff-bc37d034bad564583790a46f19d807abfe519c5671395fd494d8cce506c42947Jy 19798 evaluation for ai activity types #468All commitsnikolaybiaivanov wants to merge 9 commits into master from JY-19798-evaluation-for-ai-activity-types [v src/eval/eval-ai-activity-type.pyQ Filter files...• E eval_output/ai_activity_type/eval….> D ground_truthinput• E src/evalE endpoints.pyD eval-ai-activity-type.py* gitignore•< 4o ll • | Daily - Platform • now100% C• &• Tue 14 Apr 9:49:28• 1/ 203 viewed• Awallino aporovaliVomitrevew• Viewed292293295296297298300301302303304305306307308309310311312313314315316317318319320322323324326327+ if _name_ = "__main_":parser = create parser(AiActivityTypeEvaluator)parser.add_argument("-data-set",type=str,default="initial",dest="data_set",help="Dataset folder under eval output/ai activity_type/eval_datasets/ (default: initial)",parser.add arqument("—populate-ground-truth-1lm",detsone store_true,dest="populate_ground_truth_llm",narser.add araument("--ground-truth-llm-endpoint",type=str,default="OPENAI: gpt-5_2",dest="ground_truth_llm_endpoant,parser.add_argument("-compare-to-ground-truth",type=str,choices=["es", "1lm"],default="es",dest="compare_to_ground_truth",args = parser.parse_args()AiActivityTypeEvaluator.DATA_SET = args.data_setAiActivityTypeEvaluator.POPULATE_GROUND_TRUTH_LLM = bool(args.populate_ground_truth_llm)AiActivityTypeEvaluator.GROUND_TRUTH_LLM_ENDPOINT = args.ground_truth_llm_endpointAiActivityTypeEvaluator.COMPARE_TO_GROUND_TRUTH = args.compare_to_ground_truthasyncio.run(run_evaluation(AiActivityTypeEvaluator, args))> .gitignore & ++4-0== ViewedYou made it to the end!...
|
NULL
|
NULL
|
NULL
|
8516
|
|
8515
|
DiaFilelEditViewTabsBookmarksHistoryExtensionsWind DiaFilelEditViewTabsBookmarksHistoryExtensionsWindowHelp$0.| Daily - Platform • nowmeet.google.com / Daily - PlatformNikolay Yankov (Presenting)Inbox (2) - niIEB)Q 8•C 3y 19798 eval(7 Service-Deskjr Foet Ameson• Cost | Claude x©Actions secrC C DependatActions sec• Dependabot: xHa Dependatinny/prophet/actions/runs/24195639151/ob/70624720594?pr=468# AWS|E3 SSHс *El Datadog % Claude5 CircieclD Sentry8 TaskslInsights & Coachin.• Dev• UxRun Flake8 & Mypy•, Search logs I0 Al BookmarksSummary|Run Flake8 & MypyRun details© UsageWorkfiow fileSterka StoyanovaSet up Python> • Install Poetry |> • Load cached Poetry dependencies |> © Install dependencies using Poetry|v @ Run Flake81 •Run poetry run flakeß, -count -select-E9,F63,F7,F82,N293,N291,W292 --show-source -statistics./src/eval/eval-al-activity-type.py:330:63: N292 no newline at end of fileasyncio.runtrun_cvaluatzon(AlactavityTypetvaluator, args)Error: Process conpleted with exit code 1.• Run Mypy|• Post Load cached Poetry dependencies |• Post Set up Python |› © Post Checkout code> O Complete job• 2026 0i4u6, Inc.HortroPrivacy Security Status Community Docsvonest menesn25sSteliyan GeorgievKovalik9:49 AM Daily - Platform4:048 • Tue 14 Apr 9:49:28Chat6Nikolay YankovNikolay Ivanov...
|
NULL
|
NULL
|
NULL
|
8515
|
|
8514
|
Firefox FileEoitView History Bookmarks Profiles Firefox FileEoitView History Bookmarks Profiles Tools Window Help→Platform Sprint 1 Q2 - Platform Tea@ Jy 19798 evaluation for ai activ *+ New labI1 Open© github.com/jiminny/prophet/pull/468/changes#diff-bc37d034bad564583790a46f19d807abfe519c5671395fd494d8cce506c42947Jy 19798 evaluation for ai activity types #468All commitsnikolaybiaivanov wants to merge 9 commits into master from JY-19798-evaluation-for-ai-activity-types ®v src/eval/eval-ai-activity-type.pyQ Filter files...• E eval_output/ai_activity_type/eval….> • ground_truth> D input• E src/evalendpoints.pyD eval-ai-activity-type.pyE gitignore• < 4o ll © | Daily - Platform • nowA100% C• 8 • Tue 14 Apr 9:49:27• 0 / 203 viewed• Awallind aporovaliVomitrevew+330 UUUUU• Viewed304305306307308309310312313314315316317318319321322323324325326327328329330action="store_true",dest="populate_ground_truth_llm",parser.add_argument("--ground-truth-llm-endpoint",type=str,default="OPENAI: gpt-5_2",dest="ground_truth_llm_endpoint",parser.add_argument ("--compare-to-ground-truth",type=str,choices=["es", "1lm"],default="es",dest="compare_to_ground_truth",args = parser.parse_args()AiActivityTypeEvaluator.DATA_SET = args.data_setAiActivityTypeEvaluator.POPULATE GROUND_TRUTH_LLM = bool(args.populate ground_truth_llmAiActivitvTvoeEvaluator.GROUND TRUTH LLM ENDPOINT = args.around truth 1lm endpointAiActivityTypeEvaluator.COMPARE_TO_GROUND_TRUTH = args.compare_to_ground_truthasyncio.run(run evaluation(AiActivityTypeEvaluator, arqs))V .gitignore @*C -16,6 +16,10 @@ eval_output/identify_participants/*eval_output/identify_participants/eval_datasets/*leval_output/identify_participants/eval_datasets/v4-en/leval_output/identify_participants/eval_datasets/v4-non-en/*A Bnn LViewed...historyaws-creas.envprompt.txteval_output/identify_participants/eval_datasets/*leval output/identify participants/eval datasets/v4-en/leval_output/identify_participants/eval_datasets/v4-non-en/+ eval_output/ai_activity_type/eval_datasets/*+eva ououval aculvit tiereval caraseus/vi-tron-loos/+ leval_output/ai_activity_type/eval_datasets/v1-from-logs/**.historyaws-creas.envprompt.txtYou made it to the end!...
|
NULL
|
NULL
|
NULL
|
8514
|
|
8513
|
DiaFilelEditViewTabsBookmarksHistoryExtensionsWind DiaFilelEditViewTabsBookmarksHistoryExtensionsWindowHelp$0.| Daily - Platform • nowmeet.google.com / Daily - PlatformNikolay Yankov (Presenting)Inbox (2) - niC 3y 19798 eval(7 Service-Deskjr Foet Ameson• Cost | Claude x© Actions secrC C Dependatinny/prophet/actions/runs/24195639151/ob/70624720594?pr=468# AWS|E3 SSHEl Datadog % Claude5 CircieclD Sentry8 TaskslActions secQ 8•• Dependabot: xHa Dependatс *Insights & Coachin.• Dev• Ux0 Al BookmarksIEB)Summary|Run Flake8 & MypyRun details© UsageWorkfiow fileRun Flake8 & Mypy•, Search logs ISterka StoyanovaSet up Python> • Install Poetry |> • Load cached Poetry dependencies |> © Install dependencies using Poetry|v @ Run Flake81 •Run poetry run flakeß, -count -select-E9,F63,F7,F82,N293,N291,W292 --show-source -statistics•/src/eval/eval-ai-activity-type.py:330:63: N292 no nevline at end of fileasyncio.runtrun_cvaluatzon(AlactavityTypetvaluator, args)Error: Process conpleted with exit code 1.• Run Mypy|• Post Load cached Poetry dependencies |• Post Set up Python |› © Post Checkout code> O Complete job• 2026 0i4u6, Inc.HortroPrivacy Security Status Community Docsvonest menesnDo not share my personal information)25sSteliyan GeorgievKovalik9:49 AM Daily - Platform4:038 • Tue 14 Apr 9:49:27Chat6Nikolay YankovNikolay Ivanov...
|
NULL
|
NULL
|
NULL
|
8513
|
|
8512
|
Firefox FileEoitView History Bookmarks Profiles Firefox FileEoitView History Bookmarks Profiles Tools Window Help→Platform Sprint 1 Q2 - Platform Tea@ Jy 19798 evaluation for ai activ *+ New labI1 Open© github.com/jiminny/prophet/pull/468/changes#diff-bc37d034bad564583790a46f19d807abfe519c5671395fd494d8cce506c42947Jy 19798 evaluation for ai activity types #468All commitsnikolaybiaivanov wants to merge 9 commits into master from JY-19798-evaluation-for-ai-activity-typesv src/eval/eval-ai-activity-type.pyQ Filter files...• E eval_output/ai_activity_type/eval….> • ground_truth> D input~ & src/evalendpoints.pyD eval-ai-activity-type.pyE .gitignore• < 4o ll © | Daily - Platform • nowA100% C• 8• Tue 14 Apr 9:49:24• 0 / 203 viewed• Awallind aporovaliVomitrevew+330 UUUUU• Viewed304305306307308309310312313314315316317318319321action="store_true",dest="populate_ground_truth_llm",parser. add_argument("--ground-truth-llm-endpoint",type=str,default="OPENAI: gpt-5_2",dest="ground_truth_llm_endpoint",parser.add_argument("--compare-to-ground-truth",type=str,choices=["es", "1lm"],default="es",dest="compare_to_ground_truth",+ + +++ + + +args = parser.parse_args()AiActivityTypeEvaluator.DATA_SET = args.data_setAiActivityTypeEvaluator.POPULATE GROUND_TRUTH_LLM = bool(args.populate ground_truth_llmAiActivitvTvoeEvaluator.GROUND TRUTH LLM ENDPOINT = args.around truth 1lm endpointAiActivityTypeEvaluator.COMPARE TO GROUND TRUTH = arqs.compare to ground truthasyncio.run(run evaluation(AiActivityTypeEvaluator, arqs))V .gitignore @*C -16,6 +16,10 @@ eval_output/identify_participants/*eval_output/identify_participants/eval_datasets/*leval_output/identify_participants/eval_datasets/v4-en/leval outout/1dentity particivants/eva datasets/v4-non-en/+4 UUUUU_ Viewed22historyaws-creas.envprompt.txteval_output/identify_participants/eval_datasets/*leval output/identify participants/eval datasets/v4-en/leval_output/identify_participants/eval_datasets/v4-non-en/+ eval_output/ai_activity_type/eval_datasets/*+eva ououval aculvit tiereval caraseus/vi-tron-loos/+ leval_output/ai_activity_type/eval_datasets/v1-from-ogs/**.historyaws-creas.envprompt.txtYou made it to the end!...
|
NULL
|
NULL
|
NULL
|
8512
|
|
8511
|
Firefox FileFoitYiewHistory BookmarksProfiles To Firefox FileFoitYiewHistory BookmarksProfiles Tools Window Help•.••+Platform Sprint 1 Q2 - Platform Tea@ Jy 19798 evaluation for ai activ *+ New lab© github.com/jiminny/prophet/pull/468/changes#diff-fc8aG53dec2cd8e3d632fa5d32e0abd4b22176ce7619cc5752c4efe9e53f083b11 OpenJyvoeVluar on Tor ai activty tvoes #400All commitsnikolaybiaivanov wants to merge 9 commits into master from JY-19798-evaluation-for-ai-activity-types [eval_output/ai_activity_type/eval_datasets/v2-from-logs-replayable/ground_truth/gt_FlDTcZ0BCSsQHiSA_TYe•json [®Q Filter files...0d -0,0 +1,4 CФ• E eval_output/ai_activity_type/eval….> • ground_truth> D input• E src/evalendpoints.py* eval-ai-activity-type.pyE .gitignoreY eval_output/ai_activity_type/eval_datasets/v2-from-logs-replayable/ground_truth/gt_FVDKcZ0BCSsQHiSAaDZ-•jsondd-0,0 +1,4 adv eval output/ai activity tvpe/eval datasets/v2-from-loas-replavable/around truth/at G1AZcp0BCSs0HiSAHzbe, ison Q* aa -0,0 +1,4 caeval_output/ai_activity_type/eval_datasets/v2-from-logs-replayable/ground_truth/gt_G80fcZ0Bah[jYMX®tT03.json [•a -0,0 +1,4 caV eval output/ai activity tvpe/eval datasets/v2-from-loas-replavable/around truth/at q8SacZ0BRbeIYNJH50Ai.ison fL* aa -0,0 +1,4 cav eval output/ai activity type/eval datasets/v2-from-loqs-replayable/ground truth/qt GFD9cZ0BCSs0HiSAHzbU. ison [M•a -0,0 +1,4 @a• иC<40l• Daily - Platform • nowA100% C• Tue 14 Apr 9:49:23C0 Zus viewedi• Awallind aporovaliSubmit review -+4 UUUU• Viewed+ {"ai activity tvoe reasoning": "The call includes one Connectd rep (Tom Zini) and a client participant (SamarieetSingh). Call duration is 18 minutes 13 seconds, which is longer than 5 minutes. Both required criteria for this activitytype are met.","ai_activity_type": "Meeting Sat"+4 UUUU.Viewed+ {"ai activity type reasoning": "A Connectd representative (Georqe Kimpton-Moss) and prospect(s) (Alex Elnaugh / PrinaleMedia) both attended the call, and the call duration was 32 minutes 40 seconds, which is longer than 5 minutes. Bothcriteria in the activity type definition are satisfied.","ai_activity_type": "Meeting Sat"+400000 _ Viewed ••+ {"ai_activity_type_reasoning": "The call had both a Connectd rep (Barney Hall) and the prospect (James Bentley) listedas participants, and the call duration was 7 mins 23 seconds, which is longer than 5 minutes\u2014matching bothcriteria.","ai_activity_type": "Meeting Sat"+4 nuuno O Viewed"ai_activity_type_reasoning": "The call included both a Connectd rep (Daniel French) and a prospect/client (SeanTaylor) and the call duration was 11 minutes 11 seconds, which is longer than 5 minutes, meeting both criteria.","ai_activity_type": "Meeting Sat"+ 3+400000 _ Viewed ••+ 1"ai_activity_type_reasoning": "Transcript includes both Connectd rep James Baker and prospect Michael Nolan activelyparticipating. call duration is 1o minutes ie seconas, which is longer chan s minutes. boch criterta in the activity typedescription are satisfied.","ai activity type": "Meetina Sat"+4 00000|• Viewed...
|
NULL
|
NULL
|
NULL
|
8511
|
|
8510
|
DiaFilelEditViewTabsBookmarksHistoryExtensionsWind DiaFilelEditViewTabsBookmarksHistoryExtensionsWindowHelp$0.| Daily - Platform • nowmeet.google.com / Daily - PlatformNikolay Yankov (Presenting)Inbox (2) - niIEB)Q 8•C 3y 19798 eval(7 Service-Deskjr Foet Ameson• Cost | Claude x© Actions secrC C DependatActions sec• Dependabot: xHa Dependatinny/prophet/actions/runs/24195639151/ob/70624720594?pr=468B AWSE3 SSHс *El Datadog % Claude5 CircieclD Sentry8 TaskslInsights & Coachin.• Dev• UxRun Flake8 & Mypy•, Search logs I0 Al BookmarksSummary|Run Flake8 & MypyRun details© UsageWorkfiow fileSterka StoyanovaSet up Python> • Install Poetry |> • Load cached Poetry dependencies |> © Install dependencies using Poetry|v @ Run Flake81 •Run poetry run flakeß, -count -select-E9,F63,F7,F82,N293,N291,W292 --show-source -statistics./src/eval/eval-al-activity-type.py:330:63: N292 no newline at end of fileasyncio.runtrun_cvaluatzon(AlactavityTypetvaluator, args)Error: Process conpleted with exit code 1.• Run Mypy|• Post Load cached Poetry dependencies |• Post Set up Python |› © Post Checkout code> O Complete job• 2026 0i4u6, Inc.HortroPrivacy Security Status Community Docsvonest menesn25sSteliyan GeorgievKovalik9:49 AM Daily - Platform3:598 • Tue 14 Apr 9:49:23Chat6Nikolay YankovNikolay Ivanov...
|
NULL
|
NULL
|
NULL
|
8510
|
|
8509
|
Firefox FileFoit•.••+YiewHistory Bookmarks Profi Firefox FileFoit•.••+YiewHistory Bookmarks Profiles Tools Window HelpPlatform Sprint 1 Q2 - Platform Tea@ Jy 19798 evaluation for ai activ *+ New lab© github.com/jiminny/prophet/pull/468/changes#diff-fc8a653dec2cd8e3d632fa5d32e0abd4b22176ce7619cc5752c4efe9e53f083b11 OpenJyWvoeValation Torai activity Tvoes +400All commitsnikolaybiaivanov wants to merge 9 commits into master from JY-19798-evaluation-for-ai-activity-types [eval_output/ai_activity_type/eval_datasets/v2-from-logs-replayable/ground_truth/gt_FLDTcZ0BCSsQHiSA_TYe.jsonQ Filter files...0d -0,0 +1,4 CФ• E eval_output/ai_activity_type/eval….> • ground_truth> D input• E src/evalendpoints.py• eval-ai-activity-type.pyE.gitignoreY eval_output/ai_activity_type/eval_datasets/v2-from-logs-replayable/ground_truth/gt_FVDKcZ0BCSsQHiSAaDZ-•jsondd-0,0 +1,4 adv eval output/ai activity tvpe/eval datasets/v2-from-loas-replavable/around truth/at G1AZcp0BCSs0HiSAHzbe, ison rQ* aa -0,0 +1,4 caeval_output/ai_activity_type/eval_datasets/v2-from-logs-replayable/ground_truth/gt_G80fcZ0Bah[jYMX®tT03.json [..••a -0,0 +1,4 caV eval output/ai activity tvpe/eval datasets/v2-from-loas-replavable/around truth/at q8SacZ0BRbeIYNJH50Ai,ison (L* aa -0,0 +1,4 cav eval output/ai activity type/eval datasets/v2-from-loqs-replavyable/ground truth/qt GFD9cZ0BCSs0HiSAHzbU. ison [M•a -0,0 +1,4 @github.com/iiminny/prophet/pull/468/changes#diff-fc8a653dec2cd8e3d632fa5d32e0abd4b22176ce7619cc5752c4efe9e53f083bC<40l• Daily - Platform • nowA100% C8 • Tue 14 Apr 9:49:220 Zus viewed• Awallind aporovaliSubmit review -+4 UUUU|• Viewed+ {"ai activity tvoe reasoning": "The call includes one Connectd rep (Tom Zini) and a client participant (SamarieetSingh). Call duration is 18 minutes 13 seconds, which is longer than 5 minutes. Both required criteria for this activitytype are met.","ai_activity_type": "Meeting Sat"+4 UUUUViewed+ {"ai activity type reasoning": "A Connectd representative (Georqe Kimpton-Moss) and prospect(s) (Alex Elnaugh / PrinaleMedia) both attended the call, and the call duration was 32 minutes 40 seconds, which is longer than 5 minutes. Bothcriteria in the activity type definition are satisfied.","ai_activity_type": "Meeting Sat"+4 .0000 _ ViewedLJ ••<+ {"ai_activity_type_reasoning": "The call had both a Connectd rep (Barney Hall) and the prospect (James Bentley) listedas participants, and the call duration was 7 mins 23 seconds, which is longer than 5 minutes\u2014matching bothcriteria.","ai_activity_type": "Meeting Sat"+4 nuuno O Viewed"ai_activity_type_reasoning": "The call included both a Connectd rep (Daniel French) and a prospect/client (SeanTaylor) and the call duration was 11 minutes 11 seconds, which is longer than 5 minutes, meeting both criteria.","ai_activity_type": "Meeting Sat"+ 3+4 .0000 _ ViewedLJ ••<+ 1"ai_activity_type_reasoning": "Transcript includes both Connectd rep James Baker and prospect Michael Nolan activelyparticipating. call duration is 1o minutes ie seconas, which is longer chan s minutes. boch criterta in the activity typedescription are satisfied.","ai activity type": "Meeting Sat"+4 00000|• Viewed...
|
NULL
|
NULL
|
NULL
|
8509
|
|
8508
|
DiaFilelEditViewTabsBookmarksHistoryExtensionsWind DiaFilelEditViewTabsBookmarksHistoryExtensionsWindowHelp$0.Daily - Platform - nowmeet.google.com / Daily - PlatformNikolay Yankov (Presenting)Inbox (2) - niIEB)Q 8•C 3y 19798 eval(7 Service-Deskjr Foet Ameson• Cost | Claude x© Actions secrC C DependatActions sec• Dependabot: xHa Dependatinny/prophet/actions/runs/24195639151/ob/70624720594?pr=468B AWSE3 SSHс *El Datados * Clude5 CircieclD Sentry8 TaskslInsights & Coachin.• Del• UxRun Flake8 & Mypy•, Search logs I0 Al BookmarksSummary|Run Flake8 & MypyRun details© UsageWorkfiow fileSterka StoyanovaSet up Python> • Install Poetry |> • Load cached Poetry dependencies |> © Install dependencies using Poetry|v @ Run Flake81 •Run poetry run flakeß, -count -select-E9,F63,F7,F82,N293,N291,W292 --show-source -statistics./src/eval/eval-al-activity-type.py:330:63: N292 no newline at end of fileasyncio.runtrun_cvaluatzon(AlactavityTypetvaluator, args)Error: Process conpleted with exit code 1.• Run Mypy|• Post Load cached Poetry dependencies |• Post Set up Python |› © Post Checkout code> O Complete job• 2026 0i4u6, Inc.HortroPrivacy Security Status Community Docsvonest menesn25sSteliyan GeorgievKovalik9:49 AM Daily - Platform3:588 • Tue 14 Apr 9:49:22Chat6Nikolay YankovNikolay Ivanov...
|
NULL
|
NULL
|
NULL
|
8508
|
|
8507
|
Firefox FileEoitViewHistory BookmarksProfilesPla Firefox FileEoitViewHistory BookmarksProfilesPlatform Sprint 1 Q2 - Platform Tea@ Jy 19798 evaluation for ai activ *+ New labTools Window Help© github.com/jiminny/prophet/pull/468/changes#diff-fc8aG53dec2cd8e3d632fa5d32e0abd4b22176ce7619cc5752c4efe9e53f083b11 OpenJyvoeVluar on Tor ai activty tvoes #400All commitsnikolaybiaivanov wants to merge 9 commits into master from JY-19798-evaluation-for-ai-activity-types [v eval_output/ai_activity_type/eval_datasets/v2-from-logs-replayable/ground_truth/gt_7k6scZ0Bd4RvtuaT9GQV.json LQ Filter files...v eval_output/ai_activity_type/eval_datasets/v2-from-logs-replayable/ground_truth/gt_807ncZ0Bd4RvtuaTumTs.json L-• E eval_output/ai_activity_type/eval.…> • ground_truth> D input• E src/evalendpoints.py* eval-ai-activity-type.py*.gitignore0c -0,0 +1,4 CgY eval_output/ai_activity_type/eval_datasets/v2-from-logs-replayable/ground_truth/gt_8E6-cZ0Bd4RvtuaThmSx.json0a -0,0 +1,4eval_output/ai_activity_type/eval_datasets/v2-from-logs-replayable/ground_truth/gt_8k7mcZ0Bd4RvtuaT02Rf.json0c -0,0 +1,4 Cgeval_output/ai_activity_type/eval_datasets/v2-from-logs-replayable/ground_truth/gt_8U7VcZ0Bd4RvtuaT0WRt.json L0a -0,0 +1,4v eval output/ai activity type/eval datasets/v2-from-loqs-replayable/ground truth/gt 904Lcp0Bd4RvtuaTyGTO. ison [L∞a -0,0 +1,4 ca© Daily - Platform - nowA100% C8 • Tue 14 Apr 9:49:18C0 Zus viewedi• Awallind aporovaliVomitrevew+4 UUUUI_VIewea• Viewed"ai activity type reasoning": "Call metadata shows an SDR (Mads Hartmann Rasmussen) from Velatir calling Entech MarineA/S for 1m23s, typical of a short outbound cold call to book a demo. No transcript content indicating a demo, follow-up,or onboarding. The description matches an outbound cold call to book a demo.","ai_activity_type": "Outbound Phone Call"+4 UUUU!• Viewed+ 1"ai activity type reasoning": "Call metadata: 57-second call, vendor sales director Jake Bishop outbound to a contactat Hanna Instruments Ltd (a manufacturing company). No transcript content. Short scheduled call likely an initialoutreach/cold qualification attempt rather than discovery, demo, or scoping. Fits the Manufacturer cold prospectingdescription focused on quick relevance validation and booking a follow-up.","ai_activity_type": "Outreach and Qualification - Manufacturers"+ }+4-= Viewed•*•"ai_activity_type_reasoning": "Both Connectd rep (Jake Chweidan) and the prospect (Ben 0\u2019Neill) are listed asparticipants, and the call duration is 24 minutes 58 seconds, which is longerIthan 5 minutes. Therefore the criteria forthis activity type are met.","ai_activity_type": "Meeting Sat"+4 .0000 _ Viewed ••"ai_activity_type_reasoning": "Both a Connectd representative (Alex Rose) and a client participant (Kayleigh Laverick)auuended the care and the cal auramon was 25 mnures 40 seconas, whch exceeas o mnures uzort meerno boun reounreccriteria.","ai_activity_type": "Meeting Sat"+4 --- Viewed+ {"ai_activity_type_reasoning": "Call had both Connectd rep (Harry Field) and prospect (Roselle Allsop) present, but thecall duration was 4 minutes 48 seconds, which is less than the required >5 minutes threshold for the \"Meeting Sat)"activity type. Therefore the criteria are not fully met and no activity type applies.","ai_activity_type": null...
|
NULL
|
NULL
|
NULL
|
8507
|
|
8506
|
DiaFilelEditViewTabsBookmarksHistoryExtensionsWind DiaFilelEditViewTabsBookmarksHistoryExtensionsWindowHelp$0.Daily - Platform - nowmeet.google.com / Daily - PlatformNikolay Yankov (Presenting)Inbox (2) - niC 3y 19798 eval(7 Service-Deskjr Foet Ameson• Cost | Claude x© Actions secrC C Dependathttps://github.comfjminny/prophet/actions/runs/24195639151/ob/70624720594?pr=468B AWSE3 SSHEl Datados * Clude5 CircieclD Sentry8 TaskslRun Flake8 & MypyActions secQ 8•• Dependabot: xHa Dependatс *Insights & Coachin.• Dev• Ux0 Al BookmarksSummary|Run Flake8 & MypyRun details© UsageWorkfiow file•, Search logs ISteika Stoyanova• Set up Python |> • Install Poetry |> • Load cached Poetry dependencies |> © Install dependencies using Poetry|• • Run Flakes g1 •Run poetry run flakeß, -count -select-E9,F63,F7,F82,N293,N291,W292 --show-source -statistics./src/eval/eval-al-activity-type.py:330:63: k292 no newiline at end of fileasyncio.runtrun,cvaluatzon(alactavityTypetvaluator, args)Error: Process conpleted with exit code 1.• Run Mypy|• Post Load cached Poetry dependencies |• Post Set up Python|› © Post Checkout code> • Comolete icbl12$25s7sSteliyan Georgiev• 2026 0i4u6, Inc.HortroPrivacy Security Status Community Docsvanest wenesouvomsureurtaine cumeuKovalik9:49 AM Daily - Platform3:518 • Tue 14 Apr 9:49:16Chat6Nikolay YankovNikolay Ivanov...
|
NULL
|
NULL
|
NULL
|
8506
|
|
8505
|
FirefoxFileEoitViewHistory Bookmarks ProfilesTools FirefoxFileEoitViewHistory Bookmarks ProfilesToolsWindow Help© github.com/jiminny/prophet/pull/468/changes#diff-d2f9fea9bc428b119ae26a66453e8eab7dd0782d4a4d93075ca5ce05903d77e8jiminny / prophet &Platform Sprint 1 Q2 - Platform Tea‹> Code• Issues 12 Pull requests 6 Agents• Actions# Projects DD Wiki ® Security and quality 20 Insights ộ Settings@ Jy 19798 evaluation for ai activ® On April 24 we'll start using GitHub Copilot interaction data for Al model training unless you opt out. Review this update and manage your preferences in your GitHub account settings.+ New TabJy 19798 evaluation for ai activity types #468 °I1 Opennikolaybiaivanov wants to merge 9 commits into master from JY-19798-evaluation-for-ai-activity-typesConversation 3a Commits 9El ChecksFiles changed 203I_ All commits -Q Filter files....® This page has been optimized for large pull requests. Some browser features (like Find on Page or Select All) may not work as expected.• E eval_output/ai_activity_type/eval...• cround TrutnY eval_output/ai_activity_type/eval_datasets/v2-from-logs-replayable/ground_truth/gt_-E4rcp0Bd4RvtuaTN2RY.json> D input~ E src/evalendpoints.pyE eval-ai-activity-type.pyE gitignore0a -0,0 +1,4eval_output/ai_activity_type/eval_datasets/v2-from-logs-replayable/ground_truth/gt_7067cZ0Bd4RvtuaT1GTC.json∞ -0,0 +1,4 c@v eval output/ai activity type/eval datasets/v2-from-logs-replavable/ground truth/qt 7k6scZ0Bd4RvtuaT9GOV.ison [L•a -0,0 +1,4 caV eval output/ai activity tvpe/eval datasets/v2-from-loas-replavable/around truth/at 807ncZ0Bd4RvtuaTumTs. ison fL∞ -0,0 +1,4 @github.com/iiminny/prophet/pull/468/changes#diff-fc8a653dec2cd8e3d632fa52e0abd4b22176ce7619cc5752c4efe9e53f083bH N• < 4o ll © | Daily - Platform • nowA100% C8 • Tue 14 Apr 9:49:15Q Type to search• Awaiting approvalCode -Preview) -+25,671 mu000• 0 / 203 viewedSubmit review -Switch to single file mode+4 UuUUViewed+ {+"ai_activity_type_reasoning": "Both Connectd rep (Jake Chweidan) and prospect (Andrew Thomas) participated and callduration 1s 33+ minutes, exceeding minutes; theretore the criteria tor "Meeting Sat are met.","ai_activity_type": "Meeting Sat"+4 UUUUU ]Viewed ••"ai_activity_type_reasoning": "Vendor rep Toby Firkins (Connectd) and prospect Chris Lockyer both attended, and callduration is 12 minutes 11 seconds (>5 minutes), satisfying both criteria.","ai_activity_type": "Meeting Sat"+400000 _ Viewed |+ {"ai_activity_type_reasoning": "Both a Connectd rep (Alex Pertwee) and the prospect (Dan Hyde) attended the call, andthe call duration was 33 minutes 41 seconds, which is longer than 5 minutes\u2014meeting the criteria.","ai_activity_type": "Meeting Sat"+ 3+4 .0000 Viewed+ {"ai_activity_type_reasoning": "Call metadata shows an SDR (Mads Hartmann Rasmussen) from Velatir calling Entech MarineA/S for 1m23s, typical of a short outbound cold call to book a demo. No transcript content indicating a demo, follow-up,or onboarding. The description matches an outbound cold call to book a demo.","ai_activity_type": "Outbound Phone Call"+ }...
|
NULL
|
NULL
|
NULL
|
8505
|
|
8504
|
DiaFilelEditViewTabsBookmarksHistoryExtensionsWind DiaFilelEditViewTabsBookmarksHistoryExtensionsWindowHelp$0.| Daily - Platform • nowmeet.google.com / Daily - PlatformNikolay Yankov (Presenting)Q 8•Inbox (2) - niC 3y 19798 eval(7 Service-Deskjr Foet Ameson• Cost | Claude x© Actions secrC C DependatActions sec• Dependabot: xHa Dependathttps://github.comfjminny/prophet/actions/runs/24195639151/ob/70624720594?pr=468# AWS|E3 SSHс *El Datadog % Claude5 CircieclD Sentry8 TaskslInsights & Coachin.• Dev• UxD Al BookmarksE:1@ Summary|Run Flake8 & MypyRun details© UsageWorkfiow fileRun Flake8 & Mypy•, Search logs ISteika StoyanovaHDSet up Python> • Install Poetry |> • Load cached Poetry dependencies |> © Install dependencies using Poetry|v © Run Flake8./srC/eval/eval-ai-activity-type.py:330:63: N292 no nevline at end of 11keasyncio.runtrun,cvaluatzon(alactavitytypetvaluator, args)N292 no nevline at end of fileError: Process conpleted with exit code 1.• Run Mypy |• Post Load cached Poetry dependencies |• Post Set up Python |> © Post Checkout code |> • Complete job|25sSteliyan Georgiev• 2026 0i4u6, Inc.HertrorPrivacy Security Status Communitywonest wenesouenDo not share my personal informationKovalik9:49 AM Daily - Platform3:508 • Tue 14 Apr 9:49:15Chat6Nikolay YankovNikolay Ivanov...
|
NULL
|
NULL
|
NULL
|
8504
|
|
8503
|
FirefoxFileEoitViewHistory BookmarksProfilesToolsW FirefoxFileEoitViewHistory BookmarksProfilesToolsWindow Help© github.com/jiminny/prophet/pull/468/changes#diff-d2f9fea9bc428b119ae26a66453e8eab7dd0782d4a4d93075ca5ce05903d77e8jiminny / prophet &Platform Sprint 1 Q2 - Platform Tea‹> Code• Issues 12 Pull requests 6 Agents• Actions# Projects DD Wiki ® Security and quality 20 Insights sộs Settings@ Jy 19798 evaluation for ai activ® On April 24 we'll start using GitHub Copilot interaction data for Al model training unless you opt out. Review this update and manage your preferences in your GitHub account settings.+ New TabJy 19798 evaluation for ai activity types #468 °I1 Opennikolaybiaivanov wants to merge 9 commits into master from JY-19798-evaluation-for-ai-activity-typesConversation 3a Commits 9El ChecksFiles changed 203Il_ All commitsQ Filter files....® This page has been optimized for large pull requests. Some browser features (like Find on Page or Select All) may not work as expected.• E eval_output/ai_activity_type/eval...• cround TrutnY eval_output/ai_activity_type/eval_datasets/v2-from-logs-replayable/ground_truth/gt_-E4rcp0Bd4RvtuaTN2RY.json> D inputv E src/evalendpoints.py• eval-ai-activity-type.py* gitignore0a -0,0 +1,4eval_output/ai_activity_type/eval_datasets/v2-from-logs-replayable/ground_truth/gt_7067cZ0Bd4RvtuaT1GTC.json∞ -0,0 +1,4 c@v eval output/ai activity type/eval datasets/v2-from-loqs-replavable/ground truth/qt 7k6scZ0Bd4RvtuaT9GOV. ison [L•a -0,0 +1,4 @V eval output/ai activity tvpe/eval datasets/v2-from-loas-replavable/around truth/at 807ncZ0Bd4RvtuaTumTs. ison fL* aa -0,0 +1,4 ca-< 40 ll © | Daily - Platform • nowA100% C•8 • Tue 14 Apr 9:49:12Q Type to search• Awaiting approvalCode -Preview) -+25,671 m0000• 0 / 203 viewedSubmit review -Switch to single file mode+4 UUUULViewed+ {"ai_activity_type_reasoning": "Both Connectd rep (Jake Chweidan) and prospect (Andrew Thomas) participated and callduration 1s 33+ minutes, exceeding 5 minutes; theretore the criteria for "Meeting Sat are met.","ai_activity_type": "Meeting Sat"+4 UUUUU|Viewed ••"ai_activity_type_reasoning": "Vendor rep Toby Firkins (Connectd) and prospect Chris Lockyer both attended, and callduration is 12 minutes 11 seconds (>5 minutes), satisfying both criteria.","ai_activity_type": "Meeting Sat"+4 _ Viewed+ {"ai_activity_type_reasoning": "Both a Connectd rep (Alex Pertwee) and the prospect (Dan Hyde) attended the call, andthe call duration was 33 minutes 41 seconds, which is longer than 5 minutes\u2014meeting the criteria.","ai_activity_type": "Meeting Sat"+ 3+4 00000• Viewed|+ 1"ai_activity_type_reasoning": "Call metadata shows an SDR (Mads Hartmann Rasmussen) from Velatir calling Entech MarineA/S for 1m23s, typical of a short outbound cold call to book a demo. No transcript content indicating a demo, tou low-up,or onboarding. The description matches an outbound cold call to book a demo.","ai_activity_type": "Outbound Phone Call"+ }...
|
NULL
|
NULL
|
NULL
|
8503
|
|
8502
|
DiaFilelEdit|ViewTabsBookmarksHistoryExtensionsWin DiaFilelEdit|ViewTabsBookmarksHistoryExtensionsWindowHelpmeet.google.com / Daily - PlatformNikolay Yankov (Presenting)nbox (2) - nilП1 ОрC 3y 19798 eva(7 Service-Deskj Foet Ameson X• Cost | Claude xfActions secrCCt: Dependat xE SSHEl Dutadog @ # Claude O Circieci E Sentry>X • TasksJy 19798 evaluation for ai activity types #468its into master from JY-19798-evaluation-for-al-activity-typesSee analysis details on SonarOube Cloud)))This branch has not been deployedo ocooymeneReview requiredAt least 1 approving review is required by reviewers with write access.Some checks were not successtuli1 failing, 2 successful checks1 failing check2 successtul checks viting & Tyne Checkiog /Run.FlakeB.&.MyRy.(0ul_rsgut1s), Falling atter 57s|Q Buld / SonarCloud (pull_request) Successful in 7mv @3 SonarCloud Code Analysis Successful in 1ml- Quaity Gate passecMerging is blockedSquash and mergeYou can also merpe this with the command line. Yitw.command line instrastions.Add a commentWritePreviewIEO0RequiredStll in progress? Convert to dreft9:49 AM Daily - Platform$0.| Daily - Platform • now• Actions secrinsights & Coachin_• Dependabot: xQ 8•O Far Dependa:с *• Dev• Ux.D Al BookmarksStetka StoyanovaSteliyan GeorgievKovalik3:478 • Tue 14 Apr 9:49:11Chat6Nikolay YankovNikolay Ivanov...
|
NULL
|
NULL
|
NULL
|
8502
|
|
8501
|
DiaFilelEditViewTabsBookmarksHistoryExtensionsWind DiaFilelEditViewTabsBookmarksHistoryExtensionsWindowHelp$0.Daily - Platform - nowmeet.google.com / Daily - PlatformNikolay Yankov (Presenting)inbox (2) - niC 3y 19798 eva(7 Service-Deskje Foet Awesoe xCont| Claude x© Actions secreC C Dependat x• Actions secr• Dependabot: xQ 8•O Far Dependas xс *11 ОрE SSHEl Dutadog # Claude O Circieci E Sentry>X 0 Tasksinsights & CoachinJy 19798 evaluation for ai activity types #468its leto naster from JY-19798-evaluation-for-al-activity-types2. Invalid modei name: "OPENAI: gpt-5_2"src/eval/eval-al-activity-type.py line 32:CROUND_TRUTH,LUM,ENOPOINT: str • "OPENAI: gpt-5_2*spt-5_2 ls not a real OpenAl model name. This looks like a typo - was this meant to be gpt-40, gpt-4-turbo, orsimilar? This would cause silent falures il -populate-ground-truth-lte is used.Fix this →3. Dead code: ground_truth_es in endpoints.pysre/eval/endpoints.py line 216:"ground_truth_es*: entry-get("ground_truth_es"),This key is passed through but nowhere in the evaluator is ground,truth,es ever set on entries —|entry-get (*ground_truth_es*) will always return None . Either populate this field in the collector or remove the key.4. Missing newline at end of filesrc/eval/eval-al-activity-type-py is missing a trailing newline (flagged in the diff). Most linters/editors will warnaoout trisFix this →• Devux.Steika StoyanovaSteliyan GeorgievMinor Notes* Pul mn transcnpts. tne 100 mous ties contain reai cat transcrois (ur conersabons). Contfm tris is scocpooie pelyour data handling policies before merging.• COMPARE_TO_GROUNO_TRUTH = *es" naming: The value "es" is used to mean "compare against originalElasticsearch response*, but the variable name and choice value are somewhat opaque. A comment or rename (e.g."original" ) could improve readabilty.The mualuator lsc loe e pold - e. collecemfrentet/ cellte en ites geie epitne beny checkina and9:49 AM | Daily - Platform• 3:44|8 • Tue 14 Apr 9:49:08ChatNikolay YankovNikolay Ivanov...
|
NULL
|
NULL
|
NULL
|
8501
|
|
8500
|
FirefoxFileEoitViewHistory BookmarksProfilesToolsW FirefoxFileEoitViewHistory BookmarksProfilesToolsWindow Help© github.com/jiminny/prophet/pull/468/changes#diff-cac29662cfcc94ab0d41bf55628c52944063d9b3d1c18f441442c3c0da7f4fdejiminny / prophet &Platform Sprint 1 Q2 - Platform Tea‹> Code• Issues 12 Pull requests 6 Agents• Actions# Projects DD Wiki ® Security and quality 20 Insights sộs Settings@ Jy 19798 evaluation for ai activ® On April 24 we'll start using GitHub Copilot interaction data for Al model training unless you opt out. Review this update and manage your preferences in your GitHub account settings.+ New TabJy 19798 evaluation for ai activity types #468 °I1 Opennikolaybiaivanov wants to merge 9 commits into master from JY-19798-evaluation-for-ai-activity-typesConversation 3a Commits 9E ChecksFiles changed 203I_ All commits -Q Filter files…..® This page has been optimized for large pull requests. Some browser features (like Find on Page or Select All) may not work as expected.• E eval_output/ai_activity_type/eval...• cround TrutnY eval_output/ai_activity_type/eval_datasets/v2-from-logs-replayable/ground_truth/gt_-E4rcp0Bd4RvtuaTN2RY.json> inputv E src/evalE endpoints.pyD eva-ar-activity-type.py* gitignore0a -0,0 +1,4eval_output/ai_activity_type/eval_datasets/v2-from-logs-replayable/ground_truth/gt_7067cZ0Bd4RvtuaT1GTC.json∞ -0,0 +1,4 c@v eval output/ai activity type/eval datasets/v2-from-logs-replavable/ground truth/qt 7k6scZ0Bd4RvtuaT9GOV.ison [L•a -0,0 +1,4 @aV eval output/ai activity tvpe/eval datasets/v2-from-loas-replavable/around truth/at 807ncZ0Bd4RvtuaTumTs. ison fL* aa -0,0 +1,4 ca2<40 ll @ | Daily - Platform • nowA100% C•8 • Tue 14 Apr 9:49:04Q Type to search• Awaiting approvalCode -Preview) -+25,671 mu000• 0 / 203 viewedSubmit review -Switch to single file mode+4 UUUULViewed+ {+"ai_activity_type_reasoning": "Both Connectd rep (Jake Chweidan) and prospect (Andrew Thomas) participated and callduration 1s 33+ minutes, exceeding minutes; theretore the criteria tor "Meeting Sat are met.",+=°"ai_activity_type": "Meeting Sat"+4 UUUUU ]Viewed ••"ai_activity_type_reasoning": "Vendor rep Toby Firkins (Connectd) and prospect Chris Lockyer both attended, and callduration is 12 minutes 11 seconds (>5 minutes), satisfying both criteria.","ai_activity_type": "Meeting Sat"+400000 _ Viewed+ {"ai_activity_type_reasoning": "Both a Connectd rep (Alex Pertwee) and the prospect (Dan Hyde) attended the call, andthe call duration was 33 minutes 41 seconds, which is longer than 5 minutes\u2014meeting the criteria.","ai_activity_type": "Meeting Sat"+ 3+4 .0000_ Viewed+ {"ai_activity_type_reasoning": "Call metadata shows an SDR (Mads Hartmann Rasmussen) from Velatir calling Entech MarineA/S for 1m23s, typical of a short outbound cold call to book a demo. No transcript content indicating a demo, follow-up,or onboarding. The description matches an outbound cold call to book a demo.","ai_activity_type": "'Outbound Phone Call"+ }...
|
NULL
|
NULL
|
NULL
|
8500
|
|
8499
|
DialFilelEditViewTabsBookmarksHistoryExtensionsWin DialFilelEditViewTabsBookmarksHistoryExtensionsWindowHelpmeet.google.com / Daily - PlatformNikolay Yankov (Presenting)Inbox (2) - niC 3y 19798 eva(l Service-Desk xJr Foet AmesonD Cont | Claude@ Actions secreC C Dependat• Actions sec:https://jiminny.atlassian.net/jira/software/c/projects/JY/boards/37?selectedissue=Jy-197980 Projects#I AWS|E SSHE Datadog0 * Claudel5 CircieClSentry8 TasksInsights & Coachin.Platform Team800$ JY-18679 / E JY-19798READY FOR DEV 1Evaluation for AI Activity Typesexceed Fontawesomepockage limitsMAINTENANOSReady for Dev8-20564 1 П ****=DescriptionA few months ago we launch a new feature which uses AI to detect the activity type of a call. This is done by adding adescription to each activity type in the Playbook page. We still don't have automatic evaluation for this feature.• create automaue evaluacon lor as Acuvity lype reature• use data from Jiminny and other customers as a data set|• make sure we test with different teams - sales/cs and with different call types (conference, dialer calls)• use a smart modei to determine whether the data set can be set as a source of truth |•eheechehoecitwewhewientrrene weteecetminc• make sure the evaluation is accurate and the results are good and rellableSUDKaSKSAdd subtaskLinked work itemsrelates to• Jy-20510 Combine Al requests vor summaries, acton kems, xey poio, oActivityAddcomnen@3Code Review v*x Improve Technical Story~ Details• Nikolay IvanovAssign to meKODoh2 Galya DimitrovaUoyoe ononeQ Open with VS Code1 branch9 commits1 pull requestoutkd VatinieComponentsSub-ProductAOOrConLabels( Dependabot-• DelDux.5 days agol9:49 AM Daily - PlatformDaily - Platform - now8•O Far DependolD Al BookmarksGroup: QueriesDEPLOY &leature fiag reieaseSUPPORT TICKETSDeployed0 -20ses 15 4 •**=AI Review - Q1CTNHEUANANORCODeployed__19082 1o 400-wo sucn coumh "lype onentty 'Event"PLATTORM STARTUTTYDeployed8 -20384 05 11 ****=Missed deletion of stale0 m40n48009(Pipedrive) DI > status notlabeled, date data missingREOUCT CHURNDeployed0 -20014 05 Пl ****=Stefka StoyanovaSteliyan GeorgievValik:• 3:40|8 • Tue 14 Apr 9:49:05Chat6Nikolay YankovNikolay Ivanov...
|
NULL
|
NULL
|
NULL
|
8499
|